Today, the term ‘database’ is often used interchangeably with ‘RDBMS’. Hierarchical database models come with their attached inflexibility (in terms of physically arranged data in parent and child nodes). As a result of these limitations, organizations now opt for migration of data to a more flexible and easy-to-use RDBMS, where data is represented as an entity (and attributes).
Select the right RDBMS
Before database migration to a RDBMS platform, it is necessary to carefully evaluate and choose the right vendor. While the basic features are common, multiple vendors differ in terms of their support for:
• Symmetric multiprocessing (SMP) and massive parallel processing (MMP) architecture.
• Procedural language.
• Utilities to download and upload data as well as capture changed data.
• Audit management.
• Performance and workload management. Some of the features to look out for on this front include materialized views, index, star-joins, and resource allocation.
• Specific data types and their limits.
• Support for online analytical processing and ETL.
A nine step migration strategy
After the platform has been chosen, your database migration process should comprise of the following steps:
Database migration step 1: Define a migration strategy that includes phasing (iterations), strategy on historical data, data quality management, validation methods, and decommissioning of legacy systems.
Database migration step 2: Design relational database objects based on application requirements and best practices of the target RDBMS platform.
Database migration step 3: Perform a reverse engineering exercise of your legacy data system to not only understand record layout, but also record creation logic, as it is not just the data, but the logic it represents that needs to be migrated to a new platform.
Database migration step 4: Mapping source to target structures and identifying rules for converting source data to fit target data structures.
Database migration step 5: Profiling source data to understand technical and functional compatibility with the target structures.
Database migration step 6: Designing a reliable migration process, consisting of extracting data from legacy systems, validating, cleansing, transforming it to fit the target data format, and loading it into the target database.
Database migration step 7: Building migration scripts for smaller volumes of data that don’t require major transformations; however, for more sophisticated transformations, ETL tools might be useful.
Database migration step 8: Conducting a mock run using production quality and volume data and validating results to take a final call.
Database migration step 9: Running the data migration scripts on production environment and migrating completely to the RDBMS.
Pointers for effective data migration
The success of a database migration project hinges upon a few other action points:
1. Setting up a data governance and migration program process at the onset to handle change management, data quality, historical data migration, data mapping conflicts, and testing data requirements.
2. Migration should not be viewed as a ‘technology exercise’; new functionality should be identified and implemented in the system to facilitate business buy-in.
3. Migration should be a controlled and repeatable process to be executed in a phased, iterative manner. Complex migration should be preceded by simpler pilots.
4. Target data structure should not be a replica of the source, but must be based on application requirements and best practices of the target platform. Audit columns should be added to enable identification of migrated data.
5. Special attention should be paid to master data as RDBMS facilitates better master data management. Redundancies from the hierarchical model must be removed during the migration process.
6. Based on regulatory requirements, appropriate history should be retained, as it not only reduces effort in terms of data to be profiled and migrated, but also reduces transformation complexity.
7. A low risk decommission plan that includes parallel runs is advisable.
8. Test data strategy should be based on the data quality. The age and volume of data used for testing should not be random and must be based on data requirement, data profiling results, and criticality of migration performance. The more critical the application, the more imperative it is to use production quality and volume data for testing.
Owing to fundamental data architectural differences between relational and other databases, legacy data migration can be a complex task, fraught with risks. However, these can be managed and mitigated with a well-defined database migration strategy and validation plan.
About the author: Venkat Iyer is the Director of Building Information Modeling (BIM) at Capgemini. Bharati Venkataraman is the Senior Manager for BIM Solutions at Capgemini, India.
This was first published in October 2010