A MapReduce program has two phases: the Map phase and the Reduce phase. Between them, a Shuffle operation sorts and groups the output of the Map phase.
- Map task
- Shuffle (sort and group)
- Reduce task
Stock Quote Example
Download historical data for any listed company from Google Finance.
For example, this link provides one year of historical data for Google stock in CSV format.
This CSV file contains fields such as date, opening price, high, low, and closing price for each day.
Sample data below:
Now let's use Hadoop to find the highest stock price for a given year.
Mapper: this function extracts the required fields from the data above and creates a file as shown below:
May, 873.88
May, 863.87
May, 861.85
May, 846.8
May, 834.55
May, 824.72
April, 783.75
April, 779.55
April, 786.99
April, 805.75
April, 814.2
April, 814.83
April, 802.25
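The mapper step could be sketched roughly as follows in plain Python (this is not the Hadoop API). It assumes the CSV has a header row with `Date` and `High` columns and dates written as `YYYY-MM-DD`. The column names, the date format, the choice of the high price, and the two sample rows are illustrative assumptions, not taken from the actual download.

```python
import csv
import io

MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December")

# Hypothetical rows: only the High values come from the listing above;
# the dates are made up and every other field is a 0 placeholder.
SAMPLE_CSV = (
    "Date,Open,High,Low,Close\n"
    "2013-05-17,0,873.88,0,0\n"
    "2013-04-30,0,783.75,0,0\n"
)

def map_quotes(csv_text):
    """Emit one (month name, high price) pair per data row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        month_number = int(row["Date"].split("-")[1])  # assumes YYYY-MM-DD
        yield MONTH_NAMES[month_number - 1], float(row["High"])

pairs = list(map_quotes(SAMPLE_CSV))
for month, price in pairs:
    print(f"{month}, {price}")
```

Each input row produces exactly one key-value pair, with the month as the key and the price as the value.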
The Shuffle step sorts and groups the data for each month. It sorts the key-value pairs by key, so the input to the reducer function looks like this:
(May, [ 873.88, 863.87, 861.85, 846.8, 834.55, 824.72 ] )
(April, [ 783.75, 779.55, 786.99, 805.75, 814.2, 814.83, 802.25 ] )
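As a rough illustration of sort and group, the sketch below applies both to the mapper output listed earlier. It is plain Python rather than Hadoop's own shuffle, and the names `mapped` and `shuffle` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output as listed earlier in this post.
mapped = [
    ("May", 873.88), ("May", 863.87), ("May", 861.85),
    ("May", 846.8), ("May", 834.55), ("May", 824.72),
    ("April", 783.75), ("April", 779.55), ("April", 786.99),
    ("April", 805.75), ("April", 814.2), ("April", 814.83),
    ("April", 802.25),
]

def shuffle(pairs):
    """Sort pairs by key, then collect the values of each key into a list."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort on the key only
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]

grouped = shuffle(mapped)
for key, values in grouped:
    print(key, values)
```

Because the keys are sorted as strings here, April comes before May in the result. Either way, each key ends up with a single list of all its values.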
The Reducer function then loops through the values for each key and finds the highest price. For the sample above, the highest price overall is:
(May, 873.88)
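A minimal sketch of the reducer logic in plain Python (again not the Hadoop API). The function name `reduce_max` is hypothetical, and the April list is taken from the mapper output shown earlier.

```python
# Grouped input for the reducer, built from the mapper output in this post.
grouped = [
    ("May", [873.88, 863.87, 861.85, 846.8, 834.55, 824.72]),
    ("April", [783.75, 779.55, 786.99, 805.75, 814.2, 814.83, 802.25]),
]

def reduce_max(key, values):
    """Loop through one key's prices and return (key, highest price)."""
    highest = values[0]
    for value in values[1:]:
        if value > highest:
            highest = value
    return key, highest

# One reducer call per key, then pick the largest of the per-key results.
monthly_highs = [reduce_max(key, values) for key, values in grouped]
overall_high = max(monthly_highs, key=lambda pair: pair[1])

print(monthly_highs)
print(overall_high)
```

The reducer itself returns one result per month. Picking the single highest price across all months is a small extra step over those results.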
Data flow:
Input data > Mapper function > Shuffle (sort/group) > Reducer function > output file