Discover Performance: HP Software's community for IT leaders // January 2013
Can the cloud solve big data's new challenge?
The cofounder and CTO of Hadoop powerhouse Cloudera considers the next phase of taming big data, and whether the solution is in the cloud.
New issues come to enterprise IT all the time, but managing information in 2013 will have at least one continuing challenge: big data. In years past, enterprises were primarily engaged in digging trenches: Where is big data coming from? Why do we need it? And where do we store it? For many organizations, 2013 will be the start of the next step: What can we do with it?
To understand the ascendant challenges and opportunities, we spoke to Amr Awadallah, CTO of Cloudera, a leading developer of Hadoop solutions to big data’s challenges.
Data interactivity is job #1
The most critical issue that CIOs need to be thinking about in the year ahead, says Awadallah, is how to put their data to good use.
“CIOs need to figure out how to extract more value out of the data they have. How to use data not just as numbers that feed into a dashboard, but rather something that can become a product in itself or can help the business make more money,” he says.
Extracting value from data requires interactivity—the ability to dynamically change the view of data or drill down to greater granularity. “Many enterprises want to see that happen right away,” Awadallah says, “but you have to process and deliver large amounts of data in less than a second, and that’s where the challenge lies.”
To make this possible without shrinking or predetermining the available data, the marketplace is responding with two basic approaches:
- heavy parallelism: a fully distributed system using many nodes to attack the data set in parallel;
- in-memory: keeping all of the data in memory to facilitate new views of data without round trips to the file system.
In-memory systems generally provide the lowest latency, but at a higher cost, especially when data size reaches the hundreds of terabytes.
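As a rough illustration of the two approaches, here is a minimal Python sketch. It is not drawn from Cloudera or any Hadoop API: the sample records, the function names, and the use of threads to stand in for cluster nodes are all assumptions made for illustration. The first function partitions the data and aggregates the pieces in parallel before merging them; the second re-slices data that is already held in memory to produce a new view.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales records: (region, product, amount)
RECORDS = [
    ("east", "laptop", 1200),
    ("east", "phone", 800),
    ("west", "laptop", 1500),
    ("west", "phone", 700),
    ("east", "laptop", 1100),
    ("west", "tablet", 400),
]


def partial_totals(chunk, key_index):
    """Sum the amount column of one partition, grouped by the chosen column."""
    totals = Counter()
    for record in chunk:
        totals[record[key_index]] += record[2]
    return totals


def parallel_totals(records, key_index, workers=3):
    """Heavy parallelism: partition, aggregate each partition, then merge.

    Threads stand in for the nodes of a distributed system (an assumption
    for this sketch; a real cluster would ship the work to other machines).
    """
    chunks = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_totals, chunks, [key_index] * workers)
        merged = Counter()
        for part in partials:
            merged.update(part)  # Counter.update adds counts together
    return dict(merged)


def drill_down(records, region):
    """In-memory: build a finer-grained view from data already in RAM."""
    in_region = [r for r in records if r[0] == region]
    return dict(partial_totals(in_region, 1))


if __name__ == "__main__":
    print(parallel_totals(RECORDS, 0))  # totals per region
    print(drill_down(RECORDS, "east"))  # per-product totals within one region
```

In this toy version both functions finish instantly. The trade-off Awadallah describes only appears at scale, where the parallel route needs many machines and the in-memory route needs enough RAM to hold the full data set.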
Adjusting to server sprawl
When data grows, infrastructure grows in kind. Furthermore, “organizations are just getting way more sophisticated in how they run their business,” Awadallah says, which means that more business functions are being computerized.
The result is an orders-of-magnitude increase in servers. Traditionally, enterprises only had a few hundred servers, maybe a few thousand at the high end, Awadallah explains. Thanks to growing data and business complexity, today’s enterprises have to manage not a thousand servers, but tens of thousands. Thus, another key challenge is for enterprises to make their infrastructure more scalable from a management point of view.
“And both of these things—more data and more complexity—are driven by employees and customers being way more connected because of mobile devices,” Awadallah says—mobile devices that are constantly on, constantly connected, and constantly using services and creating data in the cloud.
Speaking of the cloud …
Data interactivity is the primary concern for the year ahead, with infrastructure sprawl close behind. A third problem, Awadallah says, is figuring out how to use the cloud to get outside help with enterprise data management without putting data or compliance at risk.
With an external cloud—third-party infrastructure-as-a-service, which includes services like Amazon Web Services (AWS), Microsoft Azure, and others—you don’t have to manage it yourself, Awadallah explains. You can leave it to someone else. But that strategy doesn’t work for everyone.
“There are some businesses that will never let their data live outside of their perimeter,” he says. “They look at their data as their bloodline, and nobody would like to have their blood outside the body. That’s why financial institutions, hospitals, and many other institutions might never adopt external cloud.”
But they do have alternatives. Awadallah says an internal cloud, which leverages virtualization to treat your internal infrastructure as a cloud service, is something that these organizations “will certainly move toward.”
Finally, a fully managed external cloud, where a bank of fully dedicated servers is managed by a third party in a secured environment that can’t be shared or accessed by anyone else, can be an acceptable middle ground for organizations that need infrastructure management but can’t abide the increased risks of an external cloud.
Awadallah discusses the importance of hierarchy in big data on the Discover Performance blog. To read more about the power of Hadoop, check out Vertica’s enterprise-level approach to Hadoop, which can integrate with Cloudera.