top of page

MapReduce

MapReduce is a programming paradigm where the computation takes a set of input key/value pairs, and produces a set of output key/value pairs. The user of the MapReduce library expresses the computation as two functions: Map and Reduce.

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user’s reduce function via an iterator. This allows us to handle lists of values that are too large to fit in memory.

Consider the problem of counting the number of occurrences of each word in a large collection of documents. The user would write code similar to the following pseudo-code:

map(String key, String value): // key: document name & value:

document contents

for each word w in value:

EmitIntermediate(w, “1”);

reduce(String key, Iterator values): // key: a word & values: a list of

counts

int result = 0;

for each v in values:

result += ParseInt(v);

Emit(AsString(result));

The map function emits each word plus an associated count of occurrences (just `1′ in this simple example). The reduce function sums together all counts emitted for a particular word.

RECENT POSTS

FEATURED POSTS

Check back soon
Once posts are published, you’ll see them here.

FOLLOW US

  • Grey Facebook Icon
  • Grey Twitter Icon
  • Grey Instagram Icon
  • Grey Google+ Icon
  • Grey Pinterest Icon
bottom of page