A brief history of ETL

Bill Inmon

This article originally appeared on the BeyeNETWORK.

The early adopters of data warehousing included telecommunications organizations, insurance companies, banks and retailers. Then, after a while, data warehousing spread to many other companies.

Companies that were ripe for data warehousing:

  • ran transactions,

  • had a diversity of users,

  • were large, and 

  • were in a competitive marketplace.

As a rule, government organizations were the last to engage in data warehousing.

On March 15, 1995, Prism Solutions went public on the NASDAQ exchange (PRSM). After that, even more competitors entered the market. One entrant was Ab Initio, which specialized in the movement of very large amounts of data.

In a related space was enterprise application integration (EAI). EAI has many of the capabilities of a data warehouse extract, transform and load (ETL) tool when it comes to moving data about the corporation. However, EAI falls short when it comes to handling transformation and metadata management. Nevertheless, there is some degree of overlap between the worlds of ETL and EAI.

As data warehousing became accepted as a standard industry practice, the deal sizes of Prism, Informatica and others began to expand. ETL tools were originally priced at $25,000, but sales soon progressed to site sales. When priced for a site, the sale price increased to $1,000,000. Soon, Prism and Informatica were only interested in large deals.

ETI formed an alliance with IBM. Carleton was merged with Oracle. And, after experiencing an unstable stock market (in the middle of the dot-com bust), Prism Solutions, under Warren “Bunny” Weiss, was purchased by Ardent. Prism was bought for less than its market price, making the deal a take-under rather than a takeover.

The succession of ownership after this point was rapid. After buying Prism Solutions, Ardent was bought by Informix. Soon thereafter, IBM bought Informix. Then, IBM spun off Ascential, after funding Ascential with $1 billion. A few years later, IBM repurchased Ascential.

In the meantime, other consolidation was taking place. Business Objects purchased Acta.

Additionally, other companies began to produce their own versions of ETL. Microsoft produced a product, named DTS, that it represented as an ETL product. SAP and Oracle each produced their own versions of ETL as well. The SAP, Oracle, and Microsoft products were not designed for general-purpose ETL, in the sense that each supported only a limited set of target platforms. For example, SAP’s ETL produces output bound only for SAP BW.

In addition, SAP made a deal with Ascential to supply non-SAP data into an SAP environment.

By this point, ETL deal sizes had grown explosively. What had originally started at $25,000 was now far beyond that.

In addition, many other features had been added to the original software. There was parallelism, automated metadata capture, free-form coding, and outputs to Java and other languages.

By this time, data warehousing had begun to emerge as a concept that applied to small and midsize companies as well, and there was a gap in the marketplace. While there certainly was a need for ETL, the deal size was so large that midsize and smaller companies could not afford the ETL software.

In 2000, a new type of ETL was introduced. This was ETL for the midsize marketplace. The leader in this space is Talend. Talend has a functional ETL tool set, but at open systems prices. This means that there is affordability for the midsize world. Talend offers its basic kernel for free. The basic kernel can be downloaded from the Internet. Sitting on top of the Talend basic kernel are other features and services.

Talend fits into the marketplace with good functionality at a price significantly below that of any other competitor. This is indeed good news for the midsize companies who need ETL but who do not need the price tag of a full-blown ETL package offered to and used by much larger companies.

Looking to the future, there is the prospect of ETL for unstructured data. To date, unstructured data has been relegated to a world of its own. But increasingly, shops are recognizing that by moving unstructured data to the structured world, the organization can take advantage of the infrastructure for analysis that has already been built. Stated differently, organizations have already invested large amounts of money in a business intelligence infrastructure. By moving unstructured data to the existing business intelligence environment, much money and effort can be saved.

 Bill Inmon

Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.