EMC on Monday rolled out its own distribution of Hadoop in a move that integrates the open source big data software directly with its Greenplum intellectual property. The aim: Take on Cloudera.
The distribution, called Pivotal HD, is notable because it puts EMC in competition with Cloudera, which has a bevy of partners and is often seen as the Red Hat of big data. Many data warehousing players have built Hadoop connectors, but EMC went for a distribution so it could improve query response time. EMC's Pivotal HD could also drive sales of its Greenplum software and appliances.
Josh Klahr, vice president of products at EMC Greenplum, wasn't shy about the Cloudera comparison:
"We want to be competitive with Cloudera. When we beta (Pivotal HD) with customers we've been able to stop a Cloudera purchase decision. Every account we go into there's increasing interest and adoption of Hadoop. The interest ranges from experimental to large production deployments."
Klahr noted that Pivotal HD combines parts of Apache Hadoop, additions from the 100 developers EMC has on the project, and proprietary database tools.
Among the key points about Pivotal HD:
EMC said the rationale for its own distribution is that Hadoop interfaces in the enterprise aren't up to snuff and connectors are too slow. With Greenplum, EMC aims to provide components and tools that bridge big data and business intelligence software via SQL.
The biggest issue for the big data market is that Hadoop distributions are piling up. Cloudera, IBM and Hortonworks are a few key players and the field is growing.
Pivotal HD will be available as software only or embedded with an appliance.