As part of this topic, let us see how to build solutions using collections, first with basic programming constructs and then with the map reduce APIs of packages such as itertools.
Problem Statement
- Get count by order status
- Read data from orders
- Table Structure: order_id, order_date, order_customer_id, order_status
- Data is in the local file system
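To make the table structure concrete, here is a minimal sketch of turning the file into a list of records. It assumes each line is comma-delimited with the four fields in the order listed above. The helper names `parse_order` and `read_orders` and the sample record are hypothetical, used only for illustration.

```python
def parse_order(line):
    """Split one comma-delimited line into a dict keyed by column name."""
    order_id, order_date, order_customer_id, order_status = line.strip().split(",")
    return {
        "order_id": int(order_id),
        "order_date": order_date,
        "order_customer_id": int(order_customer_id),
        "order_status": order_status,
    }


def read_orders(path):
    """Read every non-empty line of the file at `path` into a list of dicts."""
    with open(path) as orders_file:
        return [parse_order(line) for line in orders_file if line.strip()]


# Illustrative record (assumed format), not taken from the real data set
sample = parse_order("1,2013-07-25 00:00:00.0,11599,CLOSED")
print(sample["order_status"])
```

Calling `read_orders` with the location of the orders file on the local file system would return one dict per order.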
Design
- Read data from the files using open
- Create list using data from the file
- Iterate through the list and build a dict where the key is the order status
- In each iteration, look up the order status in the dict
- If the key exists, add 1 to its value
- If the key does not exist, add a new element to the dict with 1 as the value
- Finally we should get a dict which contains order status as key and count as value
- Solutions built with basic constructs do not have a clear separation of duties; that is where map reduce APIs come into the picture
- Using map reduce APIs from packages such as itertools gives us a clear separation of duties
- Map reduce APIs primarily come in handy when dealing with huge data sets on distributed frameworks such as Map Reduce, Spark, etc.
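The design can be sketched both ways, counting by `order_status` as the problem statement asks. This is a minimal illustration, not a definitive implementation: the records below are made-up samples in the assumed comma-delimited format, and the function names `count_by_status_loop` and `count_by_status_groupby` are hypothetical. The second version uses the built-in `map` together with `itertools.groupby`, which only groups adjacent equal keys, so the statuses are sorted first.

```python
from itertools import groupby

# Illustrative records; in practice these would be the lines read from the orders file
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "4,2013-07-25 00:00:00.0,8827,CLOSED",
    "5,2013-07-26 00:00:00.0,11318,COMPLETE",
    "6,2013-07-26 00:00:00.0,7130,COMPLETE",
]


def count_by_status_loop(lines):
    """Basic constructs: look up each status in a dict and update its count."""
    counts = {}
    for line in lines:
        status = line.strip().split(",")[3]
        if status in counts:
            counts[status] += 1   # key exists: add 1 to the value
        else:
            counts[status] = 1    # key missing: add it with 1 as the value
    return counts


def count_by_status_groupby(lines):
    """Map each line to its status, sort, group, then reduce each group to a count."""
    statuses = map(lambda line: line.strip().split(",")[3], lines)  # map step
    grouped = groupby(sorted(statuses))  # sort so equal statuses are adjacent
    return {status: sum(1 for _ in group) for status, group in grouped}  # reduce step


loop_counts = count_by_status_loop(orders)
groupby_counts = count_by_status_groupby(orders)
print(loop_counts)
print(groupby_counts)
```

In the loop version, parsing, lookup and counting are interleaved in one body. In the second version each step (extracting the status, grouping, counting) is a separate expression, which is the separation of duties described above.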