Data warehousing architecture takes logical turn in big data era

Hybrids are all the rage in automotive circles, but the term is also gaining currency in data warehousing. A new style of hybrid or logical infrastructure, combining traditional enterprise data warehouses with emerging big data technologies, is being eyed to optimize how organizations process, manage and gain insights from their burgeoning stockpiles of both structured and unstructured data.

The advent of the big data phenomenon initially prompted visions of a separate, and maybe even dominant, data management environment for unstructured information. Some analysts and big data proponents went so far as to predict the demise of the enterprise data warehouse (EDW) as a central stockpile of business intelligence (BI) and analytics data. Now, though, the expectations have shifted: Instead of one technology replacing another, co-existence between the EDW, standalone analytical databases and big data systems such as Hadoop clusters and NoSQL databases is likely to be the name of the data warehousing architecture game going forward.

"Within the hybrid data ecosystem that we're dealing with today, the data warehouse is no longer the center of our data needs," said Shawn Rogers, vice president of research for BI and data warehousing at consultancy Enterprise Management Associates Inc. in Boulder, Colo. But, he added, the EDW "will continue to play a pivotal role in terms of storing and supplying information to make business decisions." There are various data resources that companies can make use of for BI and analytics, "and big data will just be one of them," Rogers said.

The way Rogers and other data warehousing analysts see it, big data environments typically will become an extension of the EDW, with processing workloads and data storage matched to the appropriate resource pool depending on business requirements and the type of data involved. For example, in many companies the EDW will continue to be the place where well-integrated and high-quality structured transactional data is accessed to enable business reporting and ad hoc querying along well-defined dimensions. Big data systems will be used to store, process and analyze rawer data -- primarily unstructured or semi-structured information, such as social media posts and activity reports, Internet clickstream data and machine-generated data captured from application and Web server logs, network monitoring devices and sensors.
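
To make the routing idea concrete, here is a minimal Python sketch. It assumes just two hypothetical resource pools, named "edw" and "big_data", and a deliberately simple rule drawn from the example above. Well-integrated structured data goes to the EDW. Rawer structured, semi-structured or unstructured data goes to the big data environment. The names `Workload`, `DataShape` and `route_workload` are illustrative and not part of any product. In practice, placement would depend on more business requirements than this single rule.

```python
from dataclasses import dataclass
from enum import Enum


class DataShape(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


@dataclass(frozen=True)
class Workload:
    name: str
    shape: DataShape
    # True for integrated, quality-checked data organized along well-defined dimensions
    curated: bool


def route_workload(w: Workload) -> str:
    """Return the (hypothetical) resource pool that should handle a workload."""
    if w.shape is DataShape.STRUCTURED and w.curated:
        return "edw"  # reporting and ad hoc queries on clean transactional data
    return "big_data"  # raw logs, clickstreams, sensor feeds, social media posts


workloads = [
    Workload("monthly sales report", DataShape.STRUCTURED, curated=True),
    Workload("clickstream sessionization", DataShape.SEMI_STRUCTURED, curated=False),
    Workload("social media sentiment", DataShape.UNSTRUCTURED, curated=False),
]
for w in workloads:
    print(f"{w.name}: {route_workload(w)}")
```

Run as is, this prints `edw` for the sales report and `big_data` for the other two workloads.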

"Think of the EDW as the retail store where people pick up data that's organized and packaged and ready for them to accept," explained Ron Bodkin, president of Think Big Analytics, a consulting and professional services firm in Mountain View, Calif., that focuses on big data analytics and other forms of advanced analytics. "The big data environment becomes the factory where people go in and work with raw materials to create new things and experiment to find out what's valuable."

In a research report released last year, Gartner Inc. analysts Mark Beyer and Donald Feinberg disputed the notion that the advent of big data marked the end of the road for the enterprise data warehouse. Instead, they predicted that the EDW would morph into what Gartner calls the logical data warehouse (LDW). In the report, they said the LDW concept centers on data processing and management logic rather than on physical data warehousing infrastructure.

In an interview, Beyer characterized the LDW as an information management and access engine more than a data repository. He said that shift requires a complete rethinking of how data is managed and where in a company's technology architecture different types of data should be processed to best support transformation, integration and analysis.

"The shift moves the focus from being a repository first and a data services engine second to being an information services platform first and a repository that's just one way to [manage and store data]," Beyer said. In an LDW setup, he explained, processing would take place in a separate data management layer rather than within individual systems, as is traditionally done.
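
One way to picture such a layer is as a facade. Users ask for a dataset by name, a catalog records which system holds it, and the filtering runs in the layer rather than inside each system. The Python sketch below is an illustration, not Gartner's design. It assumes in-memory stand-ins for the EDW and a Hadoop cluster, and `LogicalDataLayer`, `InMemoryBackend` and the dataset names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Backend(Protocol):
    """Anything that can return the rows of a named dataset."""
    def scan(self, dataset: str) -> Iterable[Row]: ...


class InMemoryBackend:
    """Hypothetical stand-in for a real system such as an EDW or a Hadoop cluster."""
    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def scan(self, dataset: str) -> Iterable[Row]:
        return iter(self._tables[dataset])


class LogicalDataLayer:
    """Hypothetical access layer: callers name a dataset, never a system."""
    def __init__(self) -> None:
        self._catalog: Dict[str, Backend] = {}

    def register(self, dataset: str, backend: Backend) -> None:
        self._catalog[dataset] = backend

    def query(self, dataset: str,
              where: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        backend = self._catalog[dataset]  # raises KeyError for unregistered datasets
        # The filter runs here, in the layer, not inside the backend system.
        return [row for row in backend.scan(dataset) if where(row)]


edw = InMemoryBackend({"orders": [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 35.5}]})
hadoop = InMemoryBackend({"web_logs": [{"path": "/cart", "status": 200},
                                       {"path": "/x", "status": 404}]})

layer = LogicalDataLayer()
layer.register("orders", edw)
layer.register("web_logs", hadoop)

big_orders = layer.query("orders", lambda row: row["amount"] > 100)
errors = layer.query("web_logs", lambda row: row["status"] >= 400)
print(big_orders)  # [{'id': 1, 'amount': 120.0}]
print(errors)      # [{'path': '/x', 'status': 404}]
```

Because callers only see `query`, a dataset could be moved to a different system by re-registering it, without any change to the calling code. That is the kind of transparency business users are after.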

EDW-big data mix means major changes

That kind of extended data warehousing architecture can provide organizations with much greater flexibility for orchestrating the storage and use of their data assets, according to Gartner -- but companies likely will have to make big changes to implement the new approach.

"The basic premise behind the new data warehouse is that it will combine the strengths of every engineering approach previously used to create a variety of architectural styles into a new model that supports easy switching between styles or a hybrid of diverse delivery approaches," Beyer and Feinberg wrote in their report. "Existing architectures must be altered radically to meet these new demands."

Most companies, even the early adopters of Hadoop and other big data technologies, are still in the formative stages of their big data management and analytics initiatives. At this point, many are feeling their way toward an integration strategy in an effort to keep their EDW, analytical database and big data environments from becoming separate data silos that don't adequately serve the information needs of the business.

The real challenge, Rogers said, lies in masking all of the integration complexity from business users, who ultimately just want to be able to access data that can help them make better and more informed decisions. "The issue is how to maintain a view for end users that makes it as transparent as possible, because they don't want to know whether they're talking to big data systems, Hadoop clusters or the EDW," he said. "They simply don't care."

Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for more than 25 years for a variety of trade and business publications and websites.


This was first published in September 2012