Loggly Q&A: A Few Words from Robin Bloor on Big Data and Cloud Trends
Robin Bloor is an IT analyst, blogger, and co-founder of The Bloor Group. He has more than 30 years of experience in data and information management and is the author of several books, including The Electronic B@zaar: From the Silk Road to the eRoad, a book on e-commerce, and three IT books in the Dummies series on SOA, service management, and the cloud. Check out his latest work at the Inside Analysis blog. Follow Robin on Twitter at @robinbloor.
Loggly: What do you think about log management in the cloud?
Bloor: This is a very good idea because log records take up so much space and no one wants to look at them most of the time. Putting them in the cloud is sensible because, while you’re not querying them often, they’re still available as a semi-archive that you can get to pretty easily.
Loggly: How has the emergence of big data technology improved IT operations?
Bloor: First let’s define what we mean by big data. I think it’s a rather strange idea to put the word Big in front of Data. Back in 2003, Intel realized that they couldn’t keep improving processing speed because the chips would burn out. So they began to make chips that could execute twice as many instructions at once, which is what we call parallel operations. While parallel processing had its origins in WWII for cryptography, no one pursued that line in the computing industry for a long time. Now we can run many things at once, 16 cores on a chip, which means we can move 16,000 times faster as long as we have the software behind it. It’s like flying to Melbourne, which is typically a 16-hour flight, and having it take just a minute. So we are really in the era of big processing, not big data. Previously we couldn’t easily unlock the value from unstructured data because it took too long, but now we can with parallel processing. That brings us to log management, where we finally have new tools to look inside the data center. A company can grab all those log files and analyze them to understand what’s going on with their applications. IT people can get to log files they didn’t even know they had, such as those from network switches. That’s a great example of unearthing Dark Data. Speed of processing means that you can get logs from every point in the data center.
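As a rough illustration of the split-and-merge idea behind analyzing log files with many workers at once, here is a minimal Python sketch. It is not from the interview: the log format, the sample lines, and the function names are all made up for the example. It uses a thread pool to keep the sketch simple; for CPU-heavy parsing in CPython, a process pool would be the variant that actually spreads work across cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical log lines in a made-up "<LEVEL> <source> <message>" format.
SAMPLE_LOGS = [
    "ERROR switch-01 port 4 link down",
    "INFO app-01 request served",
    "WARN db-01 slow query",
    "ERROR app-02 job failed",
    "INFO switch-01 heartbeat",
    "INFO app-01 request served",
]


def count_levels(lines):
    """Count log lines per severity level within one chunk."""
    return Counter(line.split(" ", 1)[0] for line in lines if line.strip())


def chunked(lines, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_levels_concurrently(lines, chunk_size=2, workers=4):
    """Count each chunk independently, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_levels, chunked(lines, chunk_size)):
            total.update(partial)
    return total


if __name__ == "__main__":
    print(count_levels_concurrently(SAMPLE_LOGS))
```

Because each chunk is counted on its own and the partial results are simply added together, the same pattern carries over to many files or many machines, which is the property that makes large log volumes tractable.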
Loggly: What’s cool about this development for data center managers?
Bloor: I used to work on a help desk, and it was so hard to find answers when a job failed. Now you can look at those log files and work out why something went wrong. Instead of a few days going by and the user wanting to murder someone, now you have something called service. If you’re going to upgrade the database over the weekend, you can fix any issues before Monday when people start complaining. But you can also marry the log files with external data on customers, from the web or social media, and it can all be in a data warehouse for analysis. Of course the volume of these log files is getting really huge, into the terabytes. So companies are looking at using tools like Hadoop or a cloud service to help deal with the volume. You can of course increase the size of your data warehouse but that gets really expensive.
Loggly: How far down the big data maturity curve is IT operations?
Bloor: A perfect data center is instrumented everywhere. Systems know well ahead of time if something’s going to fail, and everyone across the IT department is dashboarded into the status. That scenario is not much of a reality yet. There might be one or two data centers that work like that today. I spoke with a CIO recently and asked him about service levels in IT: which ones are really being monitored 24/7 and instrumented properly? He said only the apps that would get him sacked if they failed. One problem is that even focusing on a few critical applications is difficult. There is so much interdependence in the data center that people don’t even know about. It’s like peeling back the layers of an onion: the deeper you get into it, the more you realize that there are still more layers. Data centers aren’t farther along in terms of big data intelligence because other priorities may be higher. A CTO I met with said he had a list of 20 projects that could save his company a lot of money, but he could only really focus on three of them. If he wanted to do more than that, he’d need to hire a consultant, and then you’re spending vast amounts of money on the project, versus simply a lot of money. IT organizations don’t always have enough capacity for change. They may not have the really smart people on hand to do it. Another problem is that the CIO will come in, stomp his foot, and ask why this thing isn’t at the top of the list. Then the IT people say okay, it’s now at the top. That’s just a common problem, and infrastructure and monitoring projects don’t always make the cut.
Loggly: Are cloud-based services, such as those for log file analysis or network monitoring, liberating IT performance management so that it’s not such a big endeavor to do this well?
Bloor: The cloud allows you to deploy resources much faster, and it’s a godsend for prototyping. If I have an application I want to test, I can do that in the cloud and I don’t have to worry about its impact on the data center. Another awesome thing is that data center space is very expensive, so you save there, and not having to move a data center is always a relief. Otherwise, there are going to be tears before bedtime.
Loggly: With continual innovations happening at cloud IaaS providers, won’t this question of visibility, trust and control get easier for companies?
Bloor: I do think that the control issue will change. AWS is looking into designing its own servers, which will be made specifically to run in the AWS cloud networks and will be very energy-efficient. While this may not happen for a few years, the idea that companies can always do better in their own data center is not going to hold. These new cloud servers will either run faster at the price users pay today or be offered at a lower price for the same speed. But it could take a very long time before everything is running in the cloud. I suspect the IBM mainframe will live longer than me, and I think I have pretty good longevity.
Loggly: Are the more data-driven companies more prone to deploying cloud technologies?
Bloor: Yes, but they still have their prejudices about what is okay to run in the cloud. The banking industry still won’t do certain things in the cloud. Various laws and risk management initiatives make them pretty inflexible. The other sector that is stuck is healthcare, because of HIPAA. The politicians who wrote HIPAA knew nothing about computers and, as a result, wrote really stupid laws. There are battles of a similar nature in Europe. Yet when you get to telecom companies and retailers, it’s a different story. Their business is to provide an environment where consumers will want to spend lots of money, so they collect a lot of data and use that to drive marketing and sales. And in gaming, every piece of information companies can gather on users is a source of potential revenue. There are tons of data points being collected every minute, and there’s really no need for low-level control. That makes the cloud very sensible for most Web-based companies. Many startups never had a data center to begin with, to keep their costs down. Once they are proven and established, and they get VC support, why move everything back inside?