I had a chance to talk to a Big Data sales manager 5 years ago.
He said, “Hadoop is a tough sale on the East Coast. Hoping Spark will help.”
Some of the reasons Hadoop has lost its lustre:
- Commercial Hadoop is licensed per server ($5,000+) times the number of cluster nodes ($millions)
- Hadoop requires rewriting any current reporting jobs using Java/Mapreduce. Large companies tend to not formally budget for maintenance or re-QA of existing applications
- Hadoop jobs tend to be duplicated for different departments and salespersons compared to dedicated internal reporting projects. Yahoo! went from 10 servers to around 1,000 Hadoop nodes for one of their datawarehouses
- Google, the inventor of Mapreduce, used C, which resulted in 3x faster results than Java, and have moved onto other graph systems, like Pregel
- vendors declined to monetize their distributed file system
DBA Pro Tip: Don’t use Hadoop, instead use summary tables and aggressive data retention.
Cloudera plummets 40% after CEO abruptly departs and company cuts forecast
An Update from MapR “As a privately held company we are unable to provide forward-looking statements regarding financial performance.” – Really?