There's no question that data integration technology is a good thing. So why aren't businesses using it as much as they should be?
Data integration software has evolved significantly from the days when it primarily consisted of extract, transform and load (ETL) tools. The technologies available now can automate the process of integrating data from source systems around the world in real time if that's what companies want. Data integration tools can also increase IT productivity and make it easier to incorporate new data sources into data warehouses and business intelligence (BI) systems for users to analyze.
But despite tremendous gains in the capabilities and performance of data integration tools, as well as expanded offerings in the marketplace, much of the data integration projects in corporate enterprises are still being done through manual coding methods that are inefficient and often not documented. As a result, most companies haven't gained the productivity and code-reuse benefits that automated data integration processes offer. Instead, they're deluged with an ever-expanding backlog of data integration work, including the need to continually update and fix older, manually coded integration programs.
Even large companies that use automated tools to integrate and load data into their enterprise data warehouses are still relying on homegrown SQL scripts to load data marts, online analytical processing cubes and other data structures used in BI applications. And as you might expect, small and medium-sized businesses aren't widely using integration tools.
I think the biggest reason more organizations aren't taking advantage of data integration technology is they don't fully understand what it can do. Let's clear up some misconceptions about that.
Stuck in the integration past
Many IT managers don't realize how far data integration software has come in recent years. And it did have a long way to go. The first generation of ETL tools were simple code generators that were expensive and had limited functionality. Lots of companies that evaluated them found it more effective to develop their own custom integration code.
Second-generation ETL products offered more functionality, but they were primarily batch-oriented and didn't perform well. Based on those two sets of tools, many people in IT were left with the feeling that ETL software wasn't worth the effort to learn and wouldn't be able to meet their performance needs.
But what IT professionals should realize is that the current generation of data integration offerings consists of full-fledged suites that include ETL, enterprise application integration, real-time integration and data virtualization functionality as well as data cleansing and data profiling tools. The suites can support data integration processes in traditional batch mode or in real or near real time through the use of Web services. Built-in best practices can help improve both the software's performance and user productivity.
Meanwhile, vendors specializing in technologies such as data virtualization and complex event processing have emerged to offer more targeted alternatives to the suites. At this point, there's no good reason to be stuck in the past about the capabilities of automated integration tools.
SQL not the answer to all integration questions
Another common misperception is that manual SQL coding is sufficient to perform all data integration tasks. Although there is no shortage of people who can crank out SQL code, the reality is that data integration is often a much more complex undertaking than merely writing a sequential string of SQL statements. Manually written integration scripts can be laborious to create and usually do not scale or age well.
Over the years, developers with extensive experience working for software vendors have designed sophisticated workflows and data transformation routines to handle the myriad types of data integration that most enterprises need. IT and data management pros doing manual coding in user organizations typically don't have the same level of experience. In essence, instead of leveraging reusable integration workflows and transforms, these SQL coders are starting from a blank slate on each project.
Another issue is that IT departments generally do not invest in effective data integration training. This is a problem, even when IT groups do choose to use automated tools. Although they may engage in tool training, they neglect to learn data integration best practices or make an effort to fully understand how integration processes work. Without that kind of understanding, companies can't maximize the value of their data integration tools. Some end up just going back to relying on manual coding.
Because of these misconceptions, data integration in many organizations continues to be laborious and time-consuming -- much more so than it needs to be. To make matters even worse, enterprises can't truly leverage the data at their disposal, and they often are forced to invest in upgrading and expanding their IT infrastructures to support ineffective or inefficient data integration processes.
Fortunately, there are many capable tools available today and plenty of IT pros and consultants who are well versed in sound integration techniques and practices. But companies have to recognize that there's a problem, and viable solutions to that problem, before they can take advantage of what's out there to help break the data integration backlog.
ABOUT THE AUTHOR
Rick Sherman is the founder of Athena IT Solutions, a consultancy in Maynard, Mass., that focuses on BI, data integration and data warehousing. He is also an adjunct faculty member at Northeastern University's Graduate School of Engineering, and he offers independent analysis on his blog, The Data Doghouse. Email him at firstname.lastname@example.org.
This was first published in August 2012