
Could your security and performance be in jeopardy?
Nearly half (3.2 billion, or 45%) of the seven billion people in the world used the Internet in 2015, according to a BBC news report. If you think all those people generate a huge amount of data (in the form of website visits, clicks, likes, tweets, photos, online transactions, and blog posts), wait for the data explosion that will happen when the Internet of Things (IoT) meets the Internet of People. Gartner, Inc. forecast that there will be twice as many–6.4 billion–Internet-connected gadgets (everything from light bulbs to baby diapers to connected cars) in use worldwide in 2016, up 30 percent from 2015, and will reach over 20 billion by 2020.
Companies of all sizes and in virtually every industry are struggling to manage the exploding amounts of data. To cope with the problem, many organizations are turning to solutions based on Apache Hadoop, the popular open-source software framework for storing and processing massive datasets. But purchasing, deploying, configuring, and fine-tuning a do-it-yourself (DIY) Hadoop cluster to work with your existing infrastructure can be much more challenging than many organizations expect, even if your company has the specialized skills needed to tackle the job.
But as both business and IT executives know all too well, managing big data involves far more than just dealing with storage and retrieval challenges—it requires addressing a variety of privacy and security issues as well. Beyond the brand damage that companies like Sony and Target have experienced in the last few years from data breaches, there’s also the likelihood that companies that fail to secure the life cycle of their big data environments will face regulatory consequences. Early last year, the Federal Trade Commission released a report on the Internet of Things that contains guidelines to promote consumer privacy and security. The Federal Trade Commission’s document, Careful Connections: Building Security in the Internet of Things, encourages companies to implement a risk-based approach and take advantage of best practices developed by security experts, such as using strong encryption and proper authentication.
While not calling for new legislation (due to the speed of innovation in the IoT space), the FTC report states that businesses and law enforcers have a shared interest in ensuring that consumers’ expectations about the security of IoT products are met. The report recommends several “time-tested” security best practices for companies processing IoT data, such as:
- Implementing “security by design” by building security into your products and services at the outset of your planning process, rather than grafting it on as an afterthought.
- Implementing a defense-in-depth approach that incorporates security measures at several levels.
Business and IT executives who try to follow the FTC’s big data security recommendations are likely to run into roadblocks, especially if you’re trying to integrate Hadoop with your existing IT infrastructure. The main problem with Hadoop is that is it wasn’t originally built with security in mind; it was developed solely to address massive distributed data storage and fast processing, which leads to the following threats:
- DIY Hadoop. A do-it-yourself Hadoop cluster presents inherent risks, especially since many times it’s developed without adequate security by a small group of people in a laboratory-type setting, closed off from a production environment. As a cluster grows from small project to advanced enterprise Hadoop, every period of growth—patching, tuning, verifying versions between Hadoop modules, OS libraries, utilities, user management, and so forth—becomes more difficult and time-consuming.
- Unauthorized access. Built under the principle of “data democratization”—so that all data is accessible by all users of the cluster— Hadoop has had challenges complying with certain compliance standards, such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment Card Industry Data Security Standard (PCI DSS). That’s due to the lack of access controls on data, including password controls, file and database authorization, and auditing.
- Data provenance. With open source Hadoop, it has been difficult to determine where a particular dataset originated and what data sources it was derived from. Which means you can end up basing critical business decisions on analytics taken from suspect or compromised data.
2X Faster Performance than DIY Hadoop
In his keynote at last year’s Oracle OpenWorld 2015, Intel CEO Brian Krzanich described work Intel has been doing with Oracle to build high performing datacenters using the pre-built Oracle Big Data Appliance, an integrated, optimized solution powered by the Intel Xeon processor family. Specifically, he referred to recent benchmark testing by Intel engineers that showed an Oracle Big Data Appliance solution with some basic tuning achieved nearly two times better performance than a comparable DIY cluster built on comparable hardware.
Not only is it faster, but it was designed to meet the security needs of the enterprise. Oracle Big Data Appliance automates the steps required to deploy a secure cluster – including complex tasks like setting up authentication, data authorization, encryption, and auditing. This dramatically reduces the amount of time required to both set up and maintain a secure infrastructure.
Do-it-yourself (DIY) Apache Hadoop clusters are appealing to many business and IT executives because of the apparent cost savings from using commodity hardware and free software distributions. As I’ve shown, despite the initial savings, DIY Hadoop clusters are not always a good option for organizations looking to get up to speed on an enterprise big data solution, both from a security and performance standpoint.
Find out how your company can move to an enterprise Big Data architecture with Oracle’s Big Data Platform at https://www.oracle.com/big-data.