If so, that means the data flow could either be:
a) Source --> DM --> DW
b) Source --> DW --> DM
But it would never be:
a) Source <--> DM <--> DW
b) Source <--> DW <--> DM
However, I have seen designs where information is passed back to the source system from a data warehouse or data mart. So I need to understand: Are we breaking a principle if we do that, or is it acceptable?
After a while, it became apparent that the data warehouse held a far better customer list than any of the source systems did. You can probably see where this is going. We reasoned that if we sent the customer list from the data warehouse back to the source systems, then we could improve the quality of the data in the source systems themselves.
Take this one stage further. Suppose you have five source systems, all of which use customer data. From two of those systems you can create a good-quality, complete, up-to-date customer list. Why should you not create that list and feed it back to all five source systems? Well, we didn't have a good answer to that question, so we did it. And it worked.
Several years later, I began to hear about master data management (MDM) and decided I'd better read up on it and find out what it was. I discovered that we had invented it several years earlier! Of course, we hadn't really invented it; like many other people around the same time, we had simply figured out a sensible process that later became part of MDM.
Am I suggesting this approach to data flow is trivial to do? No. There are many issues that need to be dealt with, not the least of which is the control of database keys. Different source systems tend to use different keys. In a data warehouse, we normally add a surrogate key. How do you ensure that you supply the appropriate key values back to the appropriate source system? Answering that question can be non-trivial indeed. But ultimately, the gains to be had by feeding data back out of data warehouses can far outweigh the development work that's necessary to make it happen. As far as I'm concerned, there is simply no question: If the circumstances warrant doing so, it is perfectly good practice to send data from the warehouse back to your source systems.
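To make the key-control issue a little more concrete, here is a minimal sketch in Python of one way it might be handled: a crosswalk that records, for each warehouse surrogate key, the native key each source system uses for the same customer. Everything named here (`KeyCrosswalk`, `build_feedback`, the `customer_sk` column, the `crm` and `billing` system names and the sample keys) is hypothetical and invented for illustration. It is not a description of the system discussed above, and a real implementation would also need persistence, auditing and conflict handling.

```python
class KeyCrosswalk:
    """Hypothetical lookup from a warehouse surrogate key to the native
    key that each source system uses for the same customer."""

    def __init__(self):
        # (surrogate_key, source_system) -> native_key
        self._to_native = {}

    def register(self, surrogate_key, source_system, native_key):
        # Record which native key a source system uses for this customer.
        self._to_native[(surrogate_key, source_system)] = native_key

    def native_key(self, surrogate_key, source_system):
        # None means the source system has no record of this customer.
        return self._to_native.get((surrogate_key, source_system))


def build_feedback(master_rows, crosswalk, source_system):
    """Split the master customer list into rows that can be sent back
    under the target system's own key, and rows it has never seen."""
    updates, unmatched = [], []
    for row in master_rows:
        native = crosswalk.native_key(row["customer_sk"], source_system)
        if native is None:
            # No native key exists: do not guess one, set the row aside.
            unmatched.append(row)
        else:
            updates.append({"key": native, "name": row["name"]})
    return updates, unmatched


# Assumed sample data: two customers, two source systems.
crosswalk = KeyCrosswalk()
crosswalk.register(1, "crm", "C-100")
crosswalk.register(1, "billing", "88421")
crosswalk.register(2, "billing", "88507")

master = [
    {"customer_sk": 1, "name": "Acme Ltd"},
    {"customer_sk": 2, "name": "Bolt plc"},
]

crm_updates, crm_unmatched = build_feedback(master, crosswalk, "crm")
billing_updates, billing_unmatched = build_feedback(master, crosswalk, "billing")
```

In this sketch the surrogate key never leaves the warehouse: each source system receives rows addressed by its own key, and customers it does not yet hold are set aside for a separate decision (create them, or skip them) rather than being written under a guessed key.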
This was first published in August 2010