Seriously, there are some core functions that most ETL tool vendors offer today. These include:
- The ability to deal with multiple input data formats (such as flat files, SQL databases, SAS data sets, etc.)
- Support of a GUI interface that allows the "drag and drop" of data elements from source to destination
- Built-in data profiling and analysis functionality allowing the examination of source data
- Built-in data transformation functions (such as data type conversion, date reformatting, etc.)
- Support for data quality/cleansing functionality (either built-in, or via interfaces to popular data quality tools like Group 1, DataFlux, or Business Objects)
- Metadata support, allowing for the capture and identification of data transformation and business rules
- Job scheduling functionality
- Version control of the ETL maps
- The ability to manipulate data either inside or outside the target database (aka: ETL versus ELT)
What you should avoid
My ETL guys suggest staying away from code-generating ETL tools, which are difficult to troubleshoot and de-bug. They also recommend steering clear of ETL tools that don't provide error tracking and logging functionality.
And a final thought: ETL isn't just for breakfast anymore. What I mean by this is that ETL isn't exclusive to data warehouse projects, but can solve a host of data movement and transformation problems. Best of luck with your tool selection!
For more information on ETL tools
This was first published in August 2010