Data marts as the basis of a data warehouse: Can this structure work?

Bill Inmon, BeyeNetwork Contributor

This article originally appeared on the BeyeNETWORK.

A corporation discovers that building independent data marts wasn’t what it needed. There is no integrated data and no foundation for reusability. There is a very high cost to the infrastructure because each department holds the same amount of detailed data. This requires a small army of programmers building and maintaining code that pulls data from the operational environment. Meanwhile, the window for pulling data from the operational environment has become nonexistent. Other than that, not much is wrong.

Now the organization wakes up and smells the coffee and discovers that it needs to build a real data warehouse.

What shortcuts can the organization take? What can be salvaged from the independent data marts that have been built?

In terms of physical infrastructure, not much can be salvaged. But in terms of the logical infrastructure, there is a fair amount that can be saved.

Building a real data warehouse doesn't necessarily mean that the existing data marts must be thrown away (although in some cases that is exactly what must happen!). Instead, the data marts can continue their existence, and when the data warehouse becomes available, the data marts will take their feeds of data from the data warehouse. In practice this is a bit more complicated, as some data needed by the data marts may not be available in the warehouse, or may only be available in raw form. But in general, the existence of a data warehouse is tangential to the life of the data mart.
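As a rough illustration of a data mart taking its feed from the data warehouse rather than from the operational environment, here is a minimal sketch using Python's built-in sqlite3 module. The table names (warehouse_sales, mart_sales_by_region), the columns, and the sample rows are all hypothetical and chosen only for the example; a real feed would involve far more than one summary query.

```python
import sqlite3


def build_warehouse(conn):
    """Create a tiny, hypothetical warehouse table of detailed, integrated sales."""
    conn.execute(
        "CREATE TABLE warehouse_sales ("
        "sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_sales (region, amount) VALUES (?, ?)",
        [("east", 100.0), ("east", 50.0), ("west", 75.0)],
    )


def refresh_mart(conn):
    """Rebuild the mart's summary table from the warehouse only.

    No operational system is touched here; the warehouse is the single source.
    """
    conn.execute("DROP TABLE IF EXISTS mart_sales_by_region")
    conn.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total_amount "
        "FROM warehouse_sales GROUP BY region"
    )
    return conn.execute(
        "SELECT region, total_amount FROM mart_sales_by_region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
rows = refresh_mart(conn)
print(rows)
```

The point of the sketch is only the direction of the feed: the mart keeps its own departmental view, but its data now comes from one integrated source.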

What can be gleaned from the independent data mart experience is a basis for gathering requirements for the data warehouse design. In that regard, the independent data marts are excellent guides to information requirements. Of course, information requirements from outside the world of the existing independent data marts must be factored into the design as well.
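One simple way to picture this kind of requirements gathering is to inventory the fields each existing data mart uses and see which ones recur. The sketch below is an assumption-laden illustration, not a design method: the mart names and field names are invented, and the function candidate_requirements is hypothetical.

```python
from collections import Counter


def candidate_requirements(mart_fields):
    """Return (every field any mart uses, fields used by more than one mart)."""
    counts = Counter(
        field for fields in mart_fields.values() for field in set(fields)
    )
    all_fields = sorted(counts)
    shared = sorted(field for field, n in counts.items() if n > 1)
    return all_fields, shared


# Hypothetical inventory of what three independent data marts hold today.
marts = {
    "finance_mart": ["customer_id", "invoice_amount", "invoice_date"],
    "marketing_mart": ["customer_id", "campaign_id", "region"],
    "sales_mart": ["customer_id", "region", "order_amount"],
}

all_fields, shared = candidate_requirements(marts)
print("candidate warehouse fields:", all_fields)
print("needed by more than one mart:", shared)
```

Fields that show up in several marts are likely candidates for integration in the warehouse. Requirements that no existing mart covers will not appear in such an inventory and still have to be gathered separately.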

The data warehouse still has to be built. The data warehouse has to be populated and integrated. As a rule, none of this work can be taken from the existing independent data marts.

Now, what about the idea of taking a large and populated data mart and expanding it into a real data warehouse? In my opinion, under normal circumstances, this is an exceedingly bad idea.

Suppose you decided to engineer and manufacture your own car. You start with a Cadillac engine and frame because you believe it will impress your friends. But then you add a turbocharger, a rear spoiler, treads instead of tires, fifties wings, etc. Now it really isn’t a car. Now no one knows what it is.

Are your friends still impressed? You no longer have a Cadillac but an unusual vehicle of use only to you.

Data marts are similar. They are optimized to look at data in a unique way. If you want to look at the data in a different manner, then you need another structure. Trying to take an existing structure and retrofit it into a general-purpose structure is as ridiculous as trying to make a Cadillac into something that is unique and serves only your needs.

Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.
