Data quality management is a crucial part of any data integration process. It can be considered the first step in integration, because quality data is the key to achieving profitable insights. Data integration analysis will not succeed until good data quality processes are in place.
Business intelligence (BI) relies heavily on dashboards and analytical tools, which require data to be integrated from various source systems. Before that integration takes place, data quality management is a must. The following tips will help you draft a competent data quality management policy for your organization.
1) Make data quality management part of the data lifecycle
Data quality management should be an ongoing effort, not a one-time project. Before undertaking data integration, the condition of the data should be assessed and brought up to a minimum level of quality.
Many tools are available today that help decipher the state of data. Yet manual data quality management will also be required, both in the early stages of the data lifecycle and for ongoing maintenance.
Manual data quality management requires physically examining a sample of the data and tracing any errors back to the data warehouse or source system for correction. Depending on the analytical need, the data is then treated accordingly.
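A manual spot check of this kind can be partly scripted. The sketch below samples a few records and flags those that fail basic validity rules, so they can be sent back to the source system for correction. The field names (`customer_id`, `email`) and the rules are hypothetical, chosen only for illustration:

```python
import random

def spot_check(records, sample_size=5, seed=42):
    """Sample records and flag those failing simple validity rules."""
    random.seed(seed)  # fixed seed so the same sample is drawn each run
    sample = random.sample(records, min(sample_size, len(records)))
    flagged = []
    for rec in sample:
        problems = []
        if not rec.get("customer_id"):
            problems.append("missing customer_id")
        if "@" not in rec.get("email", ""):
            problems.append("invalid email")
        if problems:
            flagged.append((rec, problems))
    return flagged

records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "", "email": "b@example.com"},
    {"customer_id": "C3", "email": "not-an-email"},
]

for rec, problems in spot_check(records, sample_size=3):
    print(rec["customer_id"] or "<none>", problems)
```

In practice the flagged records would be routed to the owner of the source system rather than corrected in the warehouse, keeping the fix at the point of capture.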
For example, to analyze sales reports for certain geographies, the numeric data for those areas will have to be consolidated. For a report detailing why a product is not faring well, a different set of data will have to be analyzed, including customer spending habits, social factors, and others.
2) Good data governance demands data quality management
The primary objective of data governance is to rid the BI process of unclean data. Data quality management improves the quality of reports. Wherever data is captured, the system owners need a strict governance program to tackle dirty data at the source. The people dealing with the source systems should also be educated so that they incorporate data quality management into their processes.
Top management has to ensure that the governance structure is followed by the application owners. It plays a significant role in baselining the minimum standard required for data quality management.
3) Dealing with duplicate data through data quality management
The issue of duplicate data will be taken care of if an organization follows a good master data management (MDM) practice. MDM, which governs non-transactional reference data, defines how data flows through an organization; a working MDM practice in turn implies that a good data governance structure is in place.
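At its simplest, the matching step in an MDM-style deduplication normalizes a match key and keeps one "golden" record per key. The sketch below is a deliberately minimal illustration, not a full MDM implementation: it matches on a crudely normalized name and keeps the most complete record (a basic survivorship rule). The field names are hypothetical:

```python
def normalize(name):
    # Crude match key: lowercase, strip punctuation and whitespace.
    return "".join(ch for ch in name.lower() if ch.isalnum())

def deduplicate(customers):
    # Survivorship rule: per match key, keep the record with the
    # most non-empty fields ("golden record").
    golden = {}
    for rec in customers:
        key = normalize(rec["name"])
        filled = sum(1 for v in rec.values() if v)
        if key not in golden or filled > golden[key][0]:
            golden[key] = (filled, rec)
    return [rec for _, rec in golden.values()]

customers = [
    {"name": "ACME Corp.", "phone": "", "city": "Boston"},
    {"name": "Acme Corp", "phone": "555-0100", "city": "Boston"},
    {"name": "Globex", "phone": "555-0200", "city": ""},
]

print(deduplicate(customers))
```

Real MDM tools use far richer matching (fuzzy comparison, address standardization, configurable survivorship), but the shape of the process, match then survive, is the same.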
4) Assign data stewards for good data quality management
A data steward would ensure good data quality management from the source systems, through the downstream applications, to the data warehouse.
This need not be a fixed or standalone role; it can be an additional responsibility of a team leader. Data stewards would have to spread awareness of the benefits of efficient data quality management and the importance of good data governance. People handling source systems may resist any change in their processes, so the importance of this endeavor has to be communicated.
The organization needs to appoint people who will take on this role as evangelists and carry the data quality management process forward.
5) Data profiling essential to data quality management
Data profiling depends on the type of data and its cardinality. Data quality management and profiling go hand in hand: profiling is a crucial part of the data quality management process, as it indicates which data needs correction. Automated tools are available to profile data, but on their own they do not ensure good data quality; the right processes and good governance are also important.
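A basic profile of a dataset typically reports, per column, the cardinality (number of distinct values), the null or blank rate, and the most frequent value. The sketch below computes exactly those three statistics over a list of row dictionaries; the sample columns (`region`, `amount`) are hypothetical:

```python
from collections import Counter

def profile(rows):
    """Per-column cardinality, null/blank rate, and most frequent value."""
    report = {}
    for col in rows[0].keys():
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v not in (None, "")]
        report[col] = {
            "cardinality": len(set(non_null)),
            "null_rate": 1 - len(non_null) / len(values),
            "top_value": Counter(non_null).most_common(1)[0][0] if non_null else None,
        }
    return report

rows = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 250},
    {"region": "West", "amount": None},
]

for col, stats in profile(rows).items():
    print(col, stats)
```

Even a profile this simple surfaces the issues the article describes: a high null rate points to a capture problem at the source, while unexpected cardinality (say, fifty spellings of the same region) points to a standardization or duplication problem.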
About the Author: Bijoy Abraham is IT Leader - BI/DW at CSC. He has over 12 years of hands-on experience in architecting, managing and delivering BI, data warehousing, information management, information governance, and metadata management projects for large global clients.
(As told to Sharon D’Souza)
This was first published in February 2011