MapReduce [the algorithm that Google uses for massively parallel computation] … is:
1. A giant step backward in the programming paradigm for large-scale data intensive applications
2. A sub-optimal implementation, in that it uses brute force instead of indexing
3. Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago
4. Missing most of the features that are routinely included in current DBMS
5. Incompatible with all of the tools DBMS users have come to depend on
In related news, cars that run off of nothing but sunlight and air are
1. A giant step backward
2. A sub-optimal implementation, in that they don’t use gasoline
3. Not novel at all — we had solar powered and compressed air powered cars 25 years ago
4. Missing most of the features that are routinely included in current gasoline powered cars
5. Incompatible with all of the tools that gasoline-engine mechanics use
Holy !@#$, these guys are dense.
The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968.
* Schemas are good.
* Separation of the schema from the application is good.
* High-level access languages are good.
MapReduce has learned none of these lessons and represents a throw back to the 1960s, before modern DBMSs were invented.
Look, I get their points. I like relational databases myself (or, rather, SQL-style databases, which a true database theorist will point out are not “true” relational databases).
…but arguing with success is kind of hard. I assert that it is objective truth that no relational database can possibly do the things that MapReduce does.
To say that MapReduce stinks because it “learned none of these lessons” is bunk. The Google guys are not dimwits. They clearly made a decision to trade off some features for others.
The feature of winning the search engine wars and making people into billionaires is a pretty good one, IMO.
The MapReduce community seems to feel that they have discovered an entirely new paradigm for processing large data sets. In actuality, the techniques employed by MapReduce are more than 20 years old.
Huh? They do?
MapReduce obviously has tons of ancestors, including vector processing.
Who says that it’s a new concept?
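For what it's worth, the lineage is easy to see in code. The names "map" and "reduce" come straight from the functional-programming primitives of the same name, and the Google paper itself credits Lisp for the idea. Here is a minimal single-machine sketch of the pattern in Python, using word counting as the example. The function names (map_phase, shuffle, reduce_phase, word_count) are my own, made up for illustration, and are not Google's API; the real system adds distribution across machines and fault tolerance, which this toy ignores.

```python
from collections import defaultdict
from functools import reduce


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fold each key's list of values into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(documents):
    """Hypothetical driver: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(documents)))
```

Calling word_count(["a b a", "B c"]) groups the pairs by word and sums each group. Note that there is no index anywhere: every document is scanned in full, which is exactly the brute-force approach the article complains about.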
4. MapReduce is missing features
All of the following features are routinely provided by modern DBMSs, and all are missing from MapReduce:
* Bulk loader — to transform input data in files into a desired format and load it into a DBMS
* Indexing — as noted above
* Updates — to change the data in the data base
* Transactions — to support parallel update and recovery from failures during update
* Integrity constraints — to help keep garbage out of the data base
* Referential integrity — again, to help keep garbage out of the data base
* Views — so the schema can change without having to rewrite the application program
In summary, MapReduce provides only a sliver of the functionality found in modern DBMSs.
Oh noz!
A clever five year old (?) tool is less polished and complete than some dusty, hidebound, thirty year old alternative concept.
In related news, few of the kids getting admitted to MIT and Caltech this year have 401(k)s as well funded as typical fifty year old engineers.
5. MapReduce is incompatible with the DBMS tools
A modern SQL DBMS has available all of the following classes of tools:
* Report writers (e.g., Crystal Reports) to prepare reports for human visualization
* Business intelligence tools (e.g., Business Objects or Cognos) to enable ad-hoc querying of large data warehouses
* Data mining tools (e.g., Oracle Data Mining or IBM DB2 Intelligent Miner) to allow a user to discover structure in large data sets
* Replication tools (e.g., Golden Gate) to allow a user to replicate data from one DBMS to another
* Database design tools (e.g., Embarcadero) to assist the user in constructing a data base.
True.
On the other hand, modern SQL DBMSs are incompatible with Google, Google Maps, Orkut, etc.
I’m sure that the Google execs are ** so ** upset that Crystal Reports doesn’t run on their data.
An “interesting” article by David J. DeWitt and Michael Stonebraker.
If you wondered what getting put out to pasture by a bunch of young turks sounds like, this is it.