Big Data is the hot new buzzword in IT circles. The proliferation of digital technologies and digital storage and recording media has created massive accumulations of diverse data (Big Data), which can be used for marketing and other purposes. This tip gives you a quick overview of Big Data.
What is Big Data?
Big Data refers to massive, often unstructured data sets that are beyond the processing capabilities
of traditional data management tools. Big Data can take up terabytes or petabytes of storage space in diverse
formats, including text, video, sound, and images. Good examples are websites such as
Facebook and Twitter, whose data grows by the day. Traditional relational database management
systems are not designed to handle data at this scale.
Kinds of Big Data
Big Data includes search indexes, image and video archives, social network data,
research data generated by R&D centers, weather and surveillance data from satellites and other
sources, archives of all kinds (such as company records and medical records), and the data generated in
data-heavy fields such as astronomy, genomics, and economics. All this data is now being stored
digitally, leading to a massive accumulation of digital data.
Technological impact
Big Data requires vast storage capacity and new kinds of data mining tools to make it accessible and useful. Major data storage vendors
such as EMC, IBM, and Hitachi are developing new products to meet Big Data needs. Companies such as
Greenplum (EMC) are investing heavily in Big Data mining tools. These tools require parallel processing capabilities and storage media with high data throughput
rates.
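To see why parallelism and throughput matter at this scale, a rough calculation helps. The sketch below estimates how long one full read of a data set takes. The 1 PB data set size, the 200 MB/s per-device read rate, and the device counts are assumed round numbers for illustration, not vendor figures, and the function name `scan_seconds` is hypothetical. It also assumes the data is spread evenly and that reads scale perfectly across devices, which real systems only approximate.

```python
# Illustrative estimate only: all sizes and rates below are assumptions.

def scan_seconds(dataset_bytes: float, bytes_per_second: float, devices: int = 1) -> float:
    """Time to read a data set once, assuming it is spread evenly across
    `devices` that are read in parallel with perfect scaling."""
    if devices < 1 or bytes_per_second <= 0:
        raise ValueError("devices must be >= 1 and throughput must be positive")
    return dataset_bytes / (bytes_per_second * devices)


PETABYTE = 10**15          # decimal petabyte, in bytes
THROUGHPUT = 200 * 10**6   # assumed 200 MB/s sustained read rate per device

if __name__ == "__main__":
    for devices in (1, 100, 1000):
        hours = scan_seconds(PETABYTE, THROUGHPUT, devices) / 3600
        print(f"{devices:>5} device(s): {hours:,.1f} hours")
```

Under these assumptions, a single device needs roughly 58 days to read the data set once, while 1,000 devices working in parallel need under an hour and a half.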
Impact of Big Data on datacenters and datacenter professionals
- Big Data generates vast quantities of data to work with, so datacenters will see larger workloads.
- Datacenter professionals must store, process, and secure a rapidly growing volume of
data. They may need to develop new skills to stay competitive.
What to consider while evaluating Big Data products
- Data storage vendors have begun introducing Big Data products. Pay attention to the maximum
file system size supported, and the maximum data throughput rate of the storage media being
offered.
- Focus on the data transfer rate rather than the IOPS (input/output operations per second) of the
storage media. High IOPS matters more when dealing with smaller quantities of data.
- Big Data needs software tools that can process data significantly faster than traditional data
mining tools. Make sure that the Big Data hardware and software are mutually
compatible. Vendors may recommend specific combinations of Big Data hardware and software that they
have tested and certified for compatibility. For optimum performance, consider
purchasing both solutions from the same vendor.
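The difference between IOPS and data transfer rate can be made concrete with a little arithmetic: sustained throughput is roughly IOPS multiplied by the size of each I/O operation. The sketch below compares two hypothetical device profiles. The IOPS figures, I/O sizes, and the function name `throughput_mb_per_s` are assumptions chosen for illustration, not measurements of any real product.

```python
def throughput_mb_per_s(iops: float, io_size_bytes: int) -> float:
    """Approximate transfer rate in decimal MB/s: IOPS x bytes per operation."""
    return iops * io_size_bytes / 1_000_000


# Hypothetical profile A: many small (4 KiB) operations, typical of
# transactional workloads.
small_block = throughput_mb_per_s(iops=50_000, io_size_bytes=4 * 1024)

# Hypothetical profile B: far fewer but large (1 MiB) sequential operations,
# closer to how bulk data scans behave.
large_block = throughput_mb_per_s(iops=2_000, io_size_bytes=1024 * 1024)

if __name__ == "__main__":
    print(f"Profile A: {small_block:.1f} MB/s")
    print(f"Profile B: {large_block:.1f} MB/s")
```

In this example, profile B delivers about ten times the transfer rate of profile A despite a much lower IOPS figure, which is why a headline IOPS number alone says little about how quickly a large data set can be moved.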
The future of Big Data
- The pool of digital data is expected to grow steadily. New digital data is being generated
every day by individuals on the internet, by governments and businesses, by universities and
research laboratories, by media houses, and by organizations of every size.
- There is a demand for a new generation of storage media and analytical software that can handle
the vast storage and processing requirements of Big Data.
- Big Data is tipped to be the hot new IT trend for 2011.
About the author: Anuj Sharma is an EMC-certified and NetApp-accredited professional. Sharma has experience in handling implementation projects related to SAN, NAS and BURA. He has also published several research papers on SAN and BURA technologies.
This was first published in March 2011
