Google's MapReduce patent and the future of Hadoop and CouchDB
Ars Technica recently published a good article about Google being awarded a software patent (by the USPTO) that covers the principle of distributed MapReduce.
The importance of this event lies in the fact that many of today's leading software companies use MapReduce-based projects. It is slightly scary for these players, especially the users of the Hadoop and CouchDB projects (see below for an introduction to both). Hadoop is a popular open source implementation of the MapReduce framework and is used by Yahoo, Amazon, IBM, Facebook, Rackspace, Hulu, the New York Times and many other companies.
But as the article suggests, it is very unlikely that Google will go after projects like Hadoop to enforce its patent, as that could lead to a backlash (from both software companies and developers) that might be tough to handle.
For those who don’t know about MapReduce, it is a software framework introduced by Google in a paper in 2004. As the paper specifies, “MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.”
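To make the quoted definition concrete, here is a minimal single-process sketch of the model in Python, using word counting as the example. It is not Google's or Hadoop's implementation, and it leaves out the distributed part entirely; the names `map_fn`, `reduce_fn` and `map_reduce` are made up for illustration.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map step: take a (document name, text) pair and emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce step: merge all intermediate values that share one key."""
    return key, sum(values)


def map_reduce(inputs, map_fn, reduce_fn):
    """Run map over every input pair, group by intermediate key, then reduce.

    A real framework would spread the map and reduce calls over many
    machines; this hypothetical driver does everything in one process.
    """
    groups = defaultdict(list)
    for key, value in inputs:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts)
```

The user only writes `map_fn` and `reduce_fn`; the grouping step in the middle is what the framework takes care of, and it is that step that gets distributed across a cluster.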
Google Code University has a lecture series on Cluster Computing and MapReduce.
Hadoop is a freely licensed Apache project: a Java software framework that supports data-intensive distributed applications.
Quoting Wikipedia: It enables applications to work with thousands of nodes and petabytes of data. It was originally developed to support distribution for the Nutch search engine project. In 2008, Yahoo launched the world’s largest Hadoop production application that runs on a more than 10,000 core Linux cluster and produces data that is now used in every Yahoo Web search query.
CouchDB is another interesting Apache project. It is an open source database that is not relational. Instead of storing data in rows and columns, the database manages a collection of JSON documents. Basically it is a document database server, accessible via a RESTful JSON API.
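As a rough idea of what "accessible via a RESTful JSON API" means in practice, the sketch below builds (but does not send) the HTTP request that would store one JSON document. It assumes a CouchDB server on the default port 5984 on localhost; the database name `articles`, the document ID and the helper `build_put_document` are invented for the example.

```python
import json
import urllib.request


def build_put_document(base_url, db, doc_id, doc):
    """Build a PUT request that stores `doc` under /<db>/<doc_id>.

    Hypothetical helper: it only constructs the request, so nothing
    here needs a running server.
    """
    url = f"{base_url}/{db}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed values: a local server, a database called "articles".
req = build_put_document(
    "http://localhost:5984",
    "articles",
    "mapreduce-patent",
    {"title": "Google's MapReduce patent", "tags": ["hadoop", "couchdb"]},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would actually send it, provided a server is running and the database already exists. The point is that a document is just a JSON body addressed by a URL, with no table schema involved.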

