Friday, May 10, 2013

How MapReduce Works - Stock Quote Example


A MapReduce program has two phases: the Map phase and the Reduce phase. A Shuffle operation, which sorts and groups the map output by key, is performed as data moves from the Map phase to the Reduce phase.

  • Map task
  • Shuffle - sort & group
  • Reduce task
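The three steps above can be sketched as a small in-process driver. This is only a simulation of the idea, assuming everything fits in memory on one machine; it is not Hadoop's API, and the name `run_mapreduce` is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple


def run_mapreduce(records: Iterable, mapper: Callable, reducer: Callable) -> List[Tuple]:
    """Run map -> shuffle -> reduce in a single process (hypothetical helper, no Hadoop)."""
    # Map task: each input record may emit zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]

    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce task: called once per key (in sorted key order) with that key's values.
    return [reducer(key, groups[key]) for key in sorted(groups)]


# Usage: the records are already (month, price) pairs, so the mapper passes them through.
prices = [("May", 873.88), ("April", 783.75), ("May", 863.87), ("April", 814.83)]
result = run_mapreduce(prices, mapper=lambda rec: [rec], reducer=lambda k, vs: (k, max(vs)))
print(result)  # [('April', 814.83), ('May', 873.88)]
```

In a real cluster the map and reduce calls run in parallel on different machines, and the shuffle moves data over the network; the driver above keeps only the order of the steps.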



Stock Quote Example

Download historical data for any listed company from Google Finance.
For example, historical data for Google stock covering the last year is available there in CSV format.
The CSV file contains fields such as Date, Open, High, Low and Close price for each day.
Sample data below:



Now let's use Hadoop to find the highest stock price for a given year.

Mapper - this function extracts the required fields (the month and the price) from the data above and emits key-value pairs like those shown below:


May, 873.88
May, 863.87
May, 861.85
May, 846.8
May, 834.55
May, 824.72
April, 783.75
April, 779.55
April, 786.99
April, 805.75
April, 814.2
April, 814.83
April, 802.25
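A mapper that produces pairs like these might look as follows. This is a sketch under assumptions not stated in the post: each CSV line is laid out as `Date,Open,High,Low,Close`, dates are written `YYYY-MM-DD`, the first line is a header starting with `Date`, and the High column is the price of interest. Adjust `DATE_COL`, `HIGH_COL` and the date format to match the file you actually download.

```python
from datetime import datetime
from typing import Iterator, Tuple

# Assumed (hypothetical) column layout: Date,Open,High,Low,Close with YYYY-MM-DD dates.
DATE_COL, HIGH_COL = 0, 2


def stock_mapper(line: str) -> Iterator[Tuple[str, float]]:
    """Emit (month name, daily high) for one CSV line; skip header and malformed lines."""
    fields = line.strip().split(",")
    if len(fields) <= HIGH_COL or fields[DATE_COL] == "Date":
        return  # header, blank or short line: emit nothing
    # "%B" gives the full English month name under Python's default C locale.
    month = datetime.strptime(fields[DATE_COL], "%Y-%m-%d").strftime("%B")
    yield month, float(fields[HIGH_COL])


# Usage with a made-up row (only the High value comes from the post).
sample = ["Date,Open,High,Low,Close", "2013-05-10,870.00,873.88,860.00,871.00"]
for line in sample:
    for month, price in stock_mapper(line):
        print(f"{month}, {price}")  # May, 873.88
```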


The Shuffle step sorts the key-value pairs by key and groups all the values for each month together.
After the shuffle, the input to the reducer function looks like this:

(April, [783.75, 779.55, 786.99, 805.75, 814.2, 814.83, 802.25])
(May, [873.88, 863.87, 861.85, 846.8, 834.55, 824.72])
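The sort-and-group step can be imitated with Python's `sorted` and `itertools.groupby`. This is a sketch of the idea only; in Hadoop the framework performs the shuffle for you, and the `shuffle` function below is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def shuffle(pairs: Iterable[Tuple[str, float]]) -> List[Tuple[str, List[float]]]:
    """Sort (key, value) pairs by key, then collect each key's values into a list."""
    # Sort on the key only; Python's sort is stable, so values keep their map order.
    ordered = sorted(pairs, key=itemgetter(0))
    # groupby merges runs of equal keys, which the sort has made contiguous.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


mapped = [("May", 873.88), ("May", 863.87), ("April", 783.75), ("April", 779.55)]
print(shuffle(mapped))
# [('April', [783.75, 779.55]), ('May', [873.88, 863.87])]
```

Note that keys come out in sorted order ("April" before "May"). The stable sort here keeps each key's values in the order they were mapped, but a reducer should not rely on any particular value order, since Hadoop does not guarantee one by default.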

The reducer function then loops through each month's list of prices and finds that month's highest price. The largest of these monthly maxima is the highest price for the last year:
(May, 873.88)
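The reduce logic described above can be sketched as a plain function that is called once per month. The function name `max_reducer` is hypothetical, and taking the maximum of the monthly results is done here outside the reducer; one common alternative (not described in the post) is to have the mapper emit a single constant key so that one reduce call sees every price for the year.

```python
from typing import List, Tuple


def max_reducer(month: str, prices: List[float]) -> Tuple[str, float]:
    """Reduce step: loop over one month's prices and keep the highest."""
    highest = prices[0]
    for price in prices[1:]:
        if price > highest:
            highest = price
    return month, highest


# Reducer input as produced by the shuffle step (values from the example above).
grouped = [
    ("April", [783.75, 779.55, 786.99, 805.75, 814.2, 814.83, 802.25]),
    ("May", [873.88, 863.87, 861.85, 846.8, 834.55, 824.72]),
]
monthly = [max_reducer(month, prices) for month, prices in grouped]
print(monthly)                             # [('April', 814.83), ('May', 873.88)]
print(max(monthly, key=lambda kv: kv[1]))  # ('May', 873.88)
```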

Data flow:
input data > Mapper function > Shuffle (sort/group) > Reducer function > output file

