Two of the top vendors in the rise of big data plan to come together in a deal likely to shake up Hadoop and other open source data processing frameworks — and leave big data users with fewer technology options.
The Cloudera-Hortonworks combination further narrows the number of commercial Hadoop distributions available to users, following previous dropouts by IBM and others over the past few years.
Cloudera-Hortonworks aims at cloud rivals
But the deal also will likely enable the new company to compete better against not only fellow Hadoop pioneer MapR Technologies, the other remaining independent vendor, but also Amazon EMR and the Google Cloud Dataproc managed service in the cloud. Microsoft also offers a Hadoop-based managed service in its Azure cloud, although the Azure HDInsight technology is based on the Hortonworks platform.
The plan, which was approved by both Cloudera and Hortonworks boards, calls for Cloudera shareowners to hold about 60% of the combined company. Still, both Cloudera CEO Tom Reilly and Hortonworks CEO Rob Bearden suggested the Cloudera-Hortonworks merger should be seen as a combination of equals. Reilly will serve as CEO after the merger is finalized, while Bearden will be on the board of directors but not have an operational role.
Both companies faced mounting challenges as publicly traded companies providing complex and ever-changing technology. Neither is profitable, and in recent years, cloud implementations of Hadoop have required both vendors to start to develop microservices versions of their platforms that embrace cloud storage architectures, such as the Amazon Simple Storage Service (S3).
Old rivalry put to rest
The rivalry between two of the earliest commercial Hadoop producers has been heated at times.
“We have been competitive for many years, and as competitors you get a lot of emotions. But competing has made our combined entity a better company,” Reilly said in a conference call with investment analysts and reporters. “I am confident that we will bring our two companies together because we both respect one another tremendously.”
As technology rapidly evolved in recent years, what was first called Hadoop evolved, too, largely casting off the MapReduce software framework that originally defined it. It hasn’t been a smooth road for either users or vendors.
Both Cloudera and Hortonworks over time began to play down the role of Hadoop in improving large-scale analytics within corporations.
Rationalizing the Hadoop market
Meanwhile, a reduction in the number of independent Hadoop distribution providers makes sense, according to independent analyst Curt Monash.
“One company per major open source project is enough,” Monash said. “There were too many ‘Hadoop companies’.”
The notion of a permanent Hadoop market category is suspect, he added.
Providing cloud versions of big data analytics tools while competing with each other has been taxing for both Cloudera and Hortonworks, said Doug Henschen, a Constellation Research analyst.
Moreover, the two companies have encountered competition in the form of the mainline cloud providers, chiefly AWS and Google, which have directed increasing attention to big data analytics applications built on Hadoop and related technologies.
“The move to the cloud by enterprises is sapping growth and revenue potential for Cloudera and Hortonworks such that both players can’t sustain strong and profitable growth,” Henschen said. “Amazon EMR and Spark services and similar Azure and Google services are seeing faster growth and, together, are capturing the lion’s share of the big data platforms market.”
In particular, the combination of Amazon EMR and S3 has put Cloudera and Hortonworks under pressure. In 2016, Gartner analyst Merv Adrian said AWS had become the largest Hadoop vendor based on number of users, with more than the combined total of all its rivals. Amazon’s growing market share pushed Cloudera and Hortonworks to make it easier to run their big data platforms in the cloud. That effort continues — for example, Hortonworks in June expanded its offerings on the Google, Microsoft and IBM clouds.
Past differences aside
Cloudera and Hortonworks have played important parts in growing a significant field, and they shouldn’t face large obstacles in rationalizing their different product lines, according to analytics industry veteran Thomas Dinsmore.
“The differences in their software are more around the margins. There are differences, but they are subtle,” said Dinsmore, senior director of competitor intelligence at DataRobot, a provider of machine learning and AI tools. DataRobot has partnerships with both companies that he said are expected to continue.
Dinsmore, who formerly worked at Cloudera, noted a difference in approaches to data science tooling between the two Hadoop vendors.
Cloudera offers the Cloudera Data Science Workbench, a data science platform that’s based on technology it acquired by buying startup Sense.io in 2016. On the other hand, beginning in 2017, Hortonworks has worked closely with IBM to provide a package of machine learning and AI tools based on the IBM Data Science Experience platform, Dinsmore said.
In the conference call about the merger, Hortonworks executives said the deal with IBM is scheduled to continue.
Executives from both companies said they would continue to sell their products separately until the Cloudera-Hortonworks deal is finalized. They pledged to support current users’ products for at least three years.
The executives expect the merger to close early in 2019, at which time they would be working on a unified platform they said would combine the strongest elements of each company’s technology.
Reilly said the merger would position the combined company to join engineering efforts for new cloud versions of their software, and pointed to expected cost savings to be gained in the pairing, including optimization of R&D work and expected efficiencies in corporate functions.
While the two companies share a foundation in open source Apache software, their approaches have differed. For its part, Hortonworks has pointedly claimed to hew most closely to Apache open source standards. Conversely, Cloudera has tended to put enterprise practicality somewhat ahead of standards purity, as has MapR.
“The biggest beneficiaries of the Cloudera-Hortonworks pairing will be the downstream users of Hortonworks that use cloud big data services accessed through their partner clouds,” said analyst Mike Matchett, founder of the Small World Big Data consultancy.
Meanwhile, Cloudera users may take solace in the potential for a code base “re-merge” with Hortonworks’ brand of big data, Matchett said.