Enterprise Hadoop will need to work with existing processes

At Hadoop Summit 2013

Jack Vaughan

A lot of ground has to be covered for the Hadoop Distributed File System to become enterprise Hadoop, but some of the steps to that end are emerging.

At last month's Hadoop Summit 2013 in San Jose, Calif., IT leaders from assorted industries said it is easy to get started using open source Hadoop clusters, but the next step is more difficult. They offered guidance for those who would move Hadoop from experiments to actual enterprise operations.

Implementers should start small, be prepared to bring in trainers and think up front about how multipetabyte Hadoop output will become part of operations and regular analytical workflows, according to participants in a panel titled "Real-world Insight into Hadoop in the Enterprise."

The general rush to try out Hadoop brings its own issues, said an IT manager at a large retailer.

"It can be daunting. You hear about all the things that Hadoop can solve. You get all this data, then you go off and you try to solve everything that you can think of," said panelist Ratnakar Lavu, a senior vice president for digital innovation at Kohl's Department Stores, based in Milwaukee.

Lavu said his group learned early on that small use cases were good places to get going.

"[Hadoop] is a whole new way of doing things. Start with something small that you can actually manage. It's about learning," he said.

Lavu also told would-be enterprise Hadoop implementers to be careful not to solve "problems that are already solved." Existing reports do not need to be redone in Hadoop just for the sake of changing platforms.

The Hadoop Distributed File System gained traction based on efforts of best-in-class systems programmers at top websites such as Yahoo, Google, Facebook and Twitter.

Moving this technology to enterprise operations takes different skills. Even Web stalwarts such as Salesforce.com have learned lessons while moving Hadoop into a support role for business-line decision makers at the company.

"When Hadoop comes to mind, too often it's only the data -- how big it is. But as you add more and more users, you have to think in terms of the compute [requirements] also. It is not just the storage," said Ramesh Koteshwar, business intelligence architect at Salesforce.

Looking forward, he anticipates a sizable part of the workforce asking questions about data garnered through Hadoop. "We expect hundreds and thousands of users on the Hadoop cluster," he said.

Security enablement is part of the process of bringing Hadoop to wider use in the corporation, he said. Hadoop use at Salesforce.com and elsewhere is very much still part of an exploratory procedure, and access and authentication are barriers that still must be hurdled on the track to enterprise deployment.

"When you 'productionize' [Hadoop], you need to think it over up front," Koteshwar said. "When you really want to bring it into the enterprise, you want to make sure there are security policies and processes in place in front of the Hadoop [cluster]."

Lavu concurred that the way you fit Hadoop systems into the overall organization is important. "It's about building the right processes and the right kind of systems, and the data feeds as well as the user training and adoption," he said. "Those are the pieces that enable us to be successful."

While there has been a lot to learn in Hadoop's early going, at least some of the frontier work has been done, one panelist suggested. That points to a benefit in coming to Hadoop now that more pieces of data infrastructure have moved into place.

"The starters of today are going to have a leg up on us," said Hadoop Summit panelist Neeraj Kumar. "We had to build a lot of ad hoc processes and solutions just because the previous versions of Hadoop lacked those features."

Kumar, who is vice president for enterprise architecture at Cardinal Health in Dublin, Ohio, agreed that teams should start small and should find a use case that is a "net-new capability" in the organization.

"You need to also understand the talent base of your own organization," he said, adding that Hadoop can be extremely disruptive to existing talent and creates a need to identify a new talent base.

He advised data managers to start thinking about Hadoop training issues early. Consultants can help, he said, but, "You do need talent on-site, on the ground." Would-be Hadoop implementers will have to make decisions with that in mind, according to Kumar.