March 1, 2023
You hear it almost as often as "cloud computing" these days. Around every corner of the internet is another headline about "Big Data", but what is it, exactly? When data sets grow so large and complex that they become difficult to manage with traditional databases or processing tools, that's Big Data. Almost every organization I talk to has its own definition of what encompasses Big Data: SMBs mention anything in the multi-terabyte range, enterprises are eyeing petabytes and exabytes, and government agencies (think the NSA's massive Utah data center) are sorting through zettabytes and yottabytes. The use of cloud computing to manage Big Data is on the rise, too.
How much is a yottabyte? To store (not process, just save) a yottabyte of information on the most compact microSDXC cards, you would need enough cards to build the Great Pyramid of Giza.
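As a rough back-of-the-envelope check, assuming 64 GB cards (a common top microSDXC capacity at the time of this comparison) and the standard microSD form factor, the math works out surprisingly close:

```python
# Back-of-the-envelope check: how much space would a yottabyte of
# microSDXC cards occupy? Assumes 64 GB per card (an assumption, not a
# spec from this article) and the 15 x 11 x 1 mm microSD form factor.
YOTTABYTE_BYTES = 10**24
CARD_BYTES = 64 * 10**9          # assumed capacity per card
CARD_VOLUME_MM3 = 15 * 11 * 1    # ~165 mm^3 per card

cards_needed = YOTTABYTE_BYTES / CARD_BYTES
total_volume_m3 = cards_needed * CARD_VOLUME_MM3 / 1e9  # mm^3 -> m^3

print(f"Cards needed: {cards_needed:.2e}")          # ~1.6e13 cards
print(f"Total volume: {total_volume_m3:,.0f} m^3")  # ~2.6 million m^3
# The Great Pyramid of Giza encloses roughly 2.6 million cubic meters.
```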
An infographic from Intel titled “What Happens in an Internet Minute” helps put Big Data into perspective. It illustrates that in one minute, 639,800 gigabytes of global IP data are transferred: the equivalent of 204 million emails sent, 61,141 hours of music streamed over Pandora, six million Facebook views from 277,000 logins, over two million Google searches, or 1.3 million video views, with 30 hours of video uploaded to YouTube. And the growth continues at a staggering pace. Today there are as many networked devices as there are people on Earth; by 2015, that number is projected to reach twice the global population. At that point, it would take a single person five years to watch all of the video crossing IP networks in just one second.
Further contributing to this growth is the rise of M2M, or Machine-to-Machine, communication. You may have seen the Cisco commercials mention the term “the Internet of Things”, covering everything that can be connected to a network, from talking light switches to automated doors to shopping carts, all centrally stored and controlled via the network. This is M2M in action. One GE division is equipping each of its 5,000 new turbines with 250 sensors, enabling real-time data processing at a centralized monitoring facility, where technicians watch for leading indicators of machine health such as bearing temperature, vibration, and exhaust readings. When readings fall outside predefined safe levels, GE technicians can get a jump start on fixes before mechanical errors or breakdowns occur, which in turn lets power be sold on a per-hour basis. GE states that “for some customers just one hour of stoppage can cost $2 million in electrical output”. With those kinds of costs, would you rather have a machine notify you that something is wrong before an outage, or watch a technician with a toolbox diagnosing a busted turbine? Big Data can take it one step further: through predictive analysis, technicians can discover when a turbine is likely to fail and what steps to take to prevent that failure.
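A minimal sketch of what that kind of threshold-based alerting might look like; the sensor names, safe ranges, and readings here are hypothetical illustrations, not GE's actual telemetry:

```python
# Hypothetical sensor readings from a single turbine, checked against
# predefined safe operating limits; any out-of-range value raises an alert.
SAFE_LIMITS = {
    "bearing_temp_c": (20.0, 95.0),    # assumed safe range, degrees C
    "vibration_mm_s": (0.0, 7.1),      # assumed safe range, mm/s
    "exhaust_temp_c": (300.0, 650.0),  # assumed safe range, degrees C
}

def check_readings(turbine_id, readings):
    """Return alerts for any reading outside its predefined safe range."""
    alerts = []
    for sensor, value in readings.items():
        low, high = SAFE_LIMITS[sensor]
        if not low <= value <= high:
            alerts.append(f"{turbine_id}: {sensor}={value} outside [{low}, {high}]")
    return alerts

# Example: a bearing running hot triggers an alert before a breakdown.
print(check_readings("GT-0042", {
    "bearing_temp_c": 103.5,
    "vibration_mm_s": 3.2,
    "exhaust_temp_c": 540.0,
}))
```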
With all this data being generated, the largest problem Big Data presents is how best to sort through and process everything. Data storage is only one obstacle; you also have to process and analyze the data. Many software companies create products that manufacturers use to collect data from their machines, analyze it, and integrate it into their business systems. The manufacturers can then use this machine data to understand usage and behavior, build models, and figure out new ways to drive value. While Enterprise Resource Planning (ERP) and similar tools have been common in manufacturing for some time, this is now true across all sectors.
At Green House Data, we see many organizations starting to leverage MapReduce algorithms and Hadoop software frameworks to pull value out of their data. For example, it’s not uncommon for a hospital to query its data for “the number of times women aged 30 to 40 received a mammogram in Wyoming”. Car dealers may search for the “age and sex of sedan buyers per day of the week” to help their sales force discover when they should market their new sedans and which demographic to focus on.
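In MapReduce terms, a query like the mammogram count is a map step that emits a key for each matching record and a reduce step that sums the counts per key. Here is a minimal in-memory sketch of that pattern; the record fields are invented for illustration, and a real job would run the same two phases in parallel across a Hadoop cluster:

```python
from collections import defaultdict

# Hypothetical patient records; a real job would stream these from HDFS.
records = [
    {"procedure": "mammogram", "sex": "F", "age": 34, "state": "WY"},
    {"procedure": "mammogram", "sex": "F", "age": 52, "state": "WY"},
    {"procedure": "x-ray",     "sex": "M", "age": 41, "state": "WY"},
    {"procedure": "mammogram", "sex": "F", "age": 38, "state": "CO"},
]

def map_phase(record):
    """Emit (state, 1) for women aged 30-40 who received a mammogram."""
    if (record["procedure"] == "mammogram"
            and record["sex"] == "F"
            and 30 <= record["age"] <= 40):
        yield (record["state"], 1)

def reduce_phase(pairs):
    """Sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

pairs = [pair for record in records for pair in map_phase(record)]
print(reduce_phase(pairs))  # {'WY': 1, 'CO': 1}
```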
Whether you're dealing with terabytes or exabytes of data, forward-thinking ideas are needed to keep up with the amount of data being generated and the resources needed to store and process it. For a cost-effective solution, small and large businesses alike are turning to the cloud, as cloud deployments enable fast-scaling, easily implemented, low-cost infrastructure, ideal for experimenting with and crunching ever-increasing data sets. Despite its murky definition, what Big Data boils down to is improving efficiency and increasing revenue by tracking and analyzing every aspect of your business. A valuable tool, indeed.
Posted By: Cortney Thompson