Q: Can big data challenges be tackled data analytics techniques? What has been your experience about this at Yahoo!?
A: Lately the industry is observing an explosion in data growth with the rising popularity of Web 3.0 and social media. This data cannot be overlooked. Companies need to use some data analytics techniques to get insights about consumer behavior and industry trends. Given the explosive data being generated on social media, huge amounts of data (big data) needs to be analyzed.
Big data, as the name suggests, cannot be handled by traditional RDBMS. Data analytics techniques like scalable systems and software are needed to analyze terabytes and petabytes of data. To give an example, Yahoo! processes petabytes of data on Hadoop systems. Many business units have loosely implemented the data warehousing STAR model to be able to explore data and answer unknown business questions. Instead of investing in proprietary, commercial solutions which may not scale, Yahoo! chose the path of Hadoop to tackle exploration of petabytes of data and gain industry insights and consumer behavior.
We all know that Grid/ Hadoop is not yet mature enough to be able to replace traditional RDBMS systems for data warehousing needs. It requires huge investments in programmers to be able to explore data analytics techniques on it. Your focus should not completely shift to new data analytics techniques as they still lack features that the traditional RDBMS systems have (the ones that help in analyzing data quickly and with minimal effort). Engineers have to develop custom code or user-defined functions for their analysis needs. The adoption of data analytics techniques using open source technologies may entail these as the upfront costs.
This was first published in September 2011