Big Data Uncovered?


I recently came across an eWeek article by Frank Ohlhorst on big data. You could easily say I have a few opinions on the matter of big data! :)

First, I agree with Frank’s first notion that big data is neither big nor new. The fact is, I’ve been saying things like “Dude, that’s a ton of data!” since I started notching out the opposite sides of floppies back in the 80s. Remember these?

Ohlhorst quickly follows up his vague handwaving about ‘big data’ as a new term with, “For most of its existence, big data has been out of the reach of small and midsize businesses (SMBs) because the storage and processing power needed to make this technology work is too expensive.”

For years, companies have been doing what they need to do to run a better business, regardless of whether or not it’s expensive. In manufacturing, it can actually cost a small company more to optimize how it efficiently makes tons of a cheap product than it costs a larger company to make a few units of a complex product. In the same vein, smaller businesses may have more complex business optimization processes than larger ones, and may need relatively larger amounts of data to solve those problems.

I agree with Frank that small businesses don’t always have the resources necessary to solve massive-scale problems, but again, the problems are relative. For example, small software startups don’t have project managers where larger ones do, not because they can’t afford them, but because they really don’t need them in a full-time capacity. I think this may be part of why SaaS has been a huge hit and why the term cloud has taken off. SaaS allows companies to tackle a wide variety of problems across the entire business, all the while providing cost-effective, high-tech solutions that solve problems in ways you never could before. For the first time in history the quality of business processes is experiencing sustained growth.

“These new cloud-based capabilities are on a growth path and are creating more opportunities for even the smallest of businesses to leverage big data without the traditional expenses of compute farms and massive storage arrays.”

Yes. However, compute farms and storage aren’t the main thing that these companies need. They need access to the raw data about their business, and the tools to extract the information on which they can take action. Figuring out your company’s problems requires brain power, understanding, data and tools. CPU and disks don’t solve complex problems. People do.

It’s All About Application Analytics

Ohlhorst also describes big data analytics as being comprised of three primary elements: volumes of unstructured data, processing power and algorithms. However, big data doesn’t always imply unstructured data. Log files, or what Loggly calls Big Time Data™, typically contain a large amount of structure. Dealing with structured data isn’t always easy, and if you write software that ‘expects’ a certain structured format, your analysis can be broken or flawed if it encounters data that doesn’t fit the structure you coded for. One way around this problem is to apply extra metadata to the data set. One technique for doing so is adding a search index to the data, which is the approach Google pioneered and what Splunk and Loggly do for log files or time series event data. By being able to run text searches on the data and interact with it in real time, or near real time, the user can focus on solving the problem.
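To make the search index idea a little more concrete, here is a toy sketch of an inverted index over log lines. It is only an illustration under my own assumptions, not how Loggly, Splunk or Google actually build theirs: the names tokenize, build_index and search, the token pattern, and the sample log lines are all made up for this example. The point is that a line that doesn’t fit the expected format is still searchable, because nothing here parses the structure.

```python
import re
from collections import defaultdict

# Hypothetical token pattern: runs of letters, digits, underscores and dots.
TOKEN_RE = re.compile(r"[a-z0-9_.]+")


def tokenize(line):
    """Lowercase a log line and split it into simple search terms."""
    return TOKEN_RE.findall(line.lower())


def build_index(lines):
    """Map each term to the set of line numbers that contain it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for term in tokenize(line):
            index[term].add(line_no)
    return index


def search(index, lines, query):
    """Return the lines that contain every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the posting sets; a term that was never seen matches nothing.
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return [lines[i] for i in sorted(matches)]


# Made-up sample data; the third line deliberately breaks the usual format.
sample_logs = [
    "2012-03-01T10:00:01 web01 ERROR timeout connecting to db01",
    "2012-03-01T10:00:02 web02 INFO request served in 12ms",
    "not a well formed line but still searchable: db01 restarted",
    "2012-03-01T10:00:05 web01 ERROR disk full on /var/log",
]

index = build_index(sample_logs)
error_web01 = search(index, sample_logs, "error web01")
db01_hits = search(index, sample_logs, "db01")
```

Here error_web01 holds the two ERROR lines from web01, and db01_hits picks up both the well-formed timeout line and the malformed restart line. A real system would add things like time-range filtering and incremental updates, but the lookup-by-term idea is the same.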

Ohlhorst continues, “For it to be true big data, there has to be lots of it, and most SMBs don’t generate that volume of data internally, which leads them to seek out alternative data sources. Here, the cloud delivers.” Not true. Big data should be defined as an amount of data that a human cannot reasonably digest. Generating large amounts of meaningful data is actually a bigger problem. Again, it’s about understanding the problem you have before you can solve it.

Yup. Ohlhorst explains that throughout 2012, data sets and others can be expected to grow exponentially. “The amount of data being generated globally increases by 40 percent a year,” according to the McKinsey Global Institute, a data analytics research firm. True. Accessing this data, mostly through the web, generates vast amounts of data as well. Ohlhorst continues that information needs to be organized, sorted and processed, and that takes computing power. Frankly, nowadays CPU is cheap enough that most of these problems can be solved on your laptop. Fast CPUs for crunching ‘big data’ aren’t the problem any more than a search engine’s main problem is crawling for data. The real bottleneck is adding meaning to the data so that a customer can digest it and act on it.

PaaS/IaaS Accessibility Is a Problem

I’m glad that Ohlhorst recognized that Amazon isn’t the only one in the game offering private cloud-based big data analytics platforms. He believes that since this technology is designed as a complete platform and not as a service, these platforms are still out of the reach of the SMB market.

Ohlhorst is right that these platforms are out of reach – but not just because they are designed as a complete platform and not as a service. I think it’s because SMBs don’t know they need it, don’t have the data to put on it, and don’t have the resources to manage it. There are plenty of hosted solutions out there (see SalesForce and their app marketplace) that provide some serious horsepower to the most important task – managing a company’s contacts.

And of course, there had to be a Splunk mention in his article. Splunk sells expensive enterprise software. Their software is oftentimes the most expensive piece of software a company has ever bought. Sounds like Oracle, eh? They aren’t converting big data analytics into cloud services; they are simply taking their product and turning a slimmed-down version into a cloud offering they can generate leads with. Any serious big data customer they land will have to buy that very expensive solution, install it on a bank of computers and then pay people to manage it. SaaS is not what Splunk is taking to the market when they IPO. It’s their hellaciously expensive software licenses.

Big Time Data. It’s in the future of your small business.
