Is Hadoop secure enough for the enterprise?
An ever-increasing number of organisations are turning to big data to gain valuable insight that can be acted on immediately to increase revenue, lower operating costs, or mitigate risk. For many companies, big data creates new, innovative opportunities and provides a competitive advantage. In fact, 70% of IT decision-makers believe an organisation’s ability to garner value from big data is critical to its future success.
The proliferation of mobile consumer and business devices (BYOD, Internet of Things) and the availability of cheaper storage (cloud) are vastly increasing the amount of data organisations amass. New technologies are required to harness the value of this growing volume of data, and companies are leveraging platforms such as Apache Hadoop to run cost-effective, large-scale analysis and processing. However, as with any new tool or platform, questions have arisen around its security and whether Hadoop is ready for production use in enterprise environments.
Putting security under scrutiny
Evolving from its experimental beginnings, Hadoop is now used in production environments across a wide range of industries. Diverse applications – from data warehouse optimisation and clickstream analysis to recommendation engines and fraud and anomaly detection – rely on Hadoop to efficiently manage and process data.
This proliferation has brought Hadoop’s security capabilities under scrutiny, along with a perception that it is insecure – largely a mischaracterisation. Hadoop is successfully used today in security-conscious sectors worldwide, such as healthcare, financial services and government. Rather than debating whether Hadoop is suitable for secure enterprise environments, the IT community should consider the more important question of identifying the right approach for each specific environment.
Deploying the native security capabilities correctly
Hadoop offers its own native security capabilities. For access to secured data, for example, authentication – provided through Kerberos integration – is always required. Access controls, or authorisation, grant or deny permission to specific data sets. Auditing can also be applied in many different ways, from demonstrating compliance with a business’s requirements to analysing user behaviour.
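These native controls are typically exercised from the command line. The following is a brief sketch on a Kerberos-enabled cluster; the principal, group name and path are hypothetical, and ACLs are assumed to be enabled on the NameNode:

```shell
# Authenticate to the cluster as a Kerberos principal (hypothetical realm).
kinit analyst@EXAMPLE.COM

# Authorisation: grant the 'audit' group read-only access to a data set
# via an HDFS ACL (requires dfs.namenode.acls.enabled=true), then verify
# the resulting permissions.
hdfs dfs -setfacl -m group:audit:r-x /data/transactions
hdfs dfs -getfacl /data/transactions
```

Access attempts, successful or otherwise, then appear in the HDFS audit log, which is where the auditing practices described above typically start.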
Hadoop also supports encryption. Network communications can be encrypted to prevent eavesdropping on sensitive data-in-motion. Encryption for data-at-rest, however, is often misunderstood and incorrectly used as a form of access control. Its correct role is to protect stored sensitive data even if the physical storage device is stolen.
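Both forms of encryption can be switched on with stock Hadoop tooling. As an illustrative sketch – the key name and directory are hypothetical, and a Hadoop KMS is assumed to be configured:

```shell
# Data-at-rest: create a key in the Hadoop KMS and use it to back an
# HDFS encryption zone; files written under the zone are encrypted
# transparently.
hadoop key create zone1key
hdfs dfs -mkdir /data/secure
hdfs crypto -createZone -keyName zone1key -path /data/secure

# Data-in-motion: wire encryption is enabled in hdfs-site.xml, e.g.
#   <property>
#     <name>dfs.encrypt.data.transfer</name>
#     <value>true</value>
#   </property>
```

Note that the encryption zone protects the stored blocks, not who may read them – authorisation still comes from the access controls described earlier.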
Obscuring sensitive elements within files is a complementary means of protecting data-at-rest: the data is rendered non-sensitive whilst retaining its analytical value. Many third-party Hadoop vendors can manage this style of protection.
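One simple form of this obscuring is keyed pseudonymisation: a sensitive field is replaced by a keyed hash of its value, so the original can no longer be read but the field still joins and groups consistently across data sets. A minimal sketch using HMAC-SHA256 via `openssl` – the key and sample value are purely illustrative:

```shell
# Replace a sensitive field with a keyed hash (HMAC-SHA256). The same
# input always yields the same token, preserving joins and group-bys,
# while the original value is not recoverable without the key.
MASK_KEY="demo-key"   # illustrative; in practice held in a key store
printf '%s' "alice@example.com" \
  | openssl dgst -sha256 -hmac "$MASK_KEY" \
  | awk '{print $NF}'
```

In a real pipeline this transformation would run as records are ingested into the cluster, so only the tokenised form ever lands in Hadoop.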
Security to suit business needs
Security in Hadoop deployments is, naturally, even more restrictive in environments with highly sensitive data.
In some deployment models, for example, network protection schemes such as firewalls secure Hadoop clusters by allowing only trusted users to access them. As the most elementary type of implementation, this doesn’t depend on any of Hadoop’s distinct security capabilities. As an extension, this model can also prevent direct logins to cluster servers, with users instead granted data access through edge nodes linked with Hadoop’s fundamental security controls.
In more sophisticated approaches, Hadoop’s native security controls can open access to a broader set of users while the data itself is made available only to those who are authorised. In yet more advanced environments, fully deployed Hadoop security capabilities, in conjunction with analytics and monitoring tools on the clusters, can even detect and impede intrusions and other rogue acts.
Hadoop is already used by many companies operating with sensitive data, demonstrating that it can serve as a secure environment. But, as with all new technologies, teams considering deploying Hadoop must first decide what matters most before investigating which of its specific features best support those priorities. Talking to third-party security providers and Hadoop vendors is also a great way to gain further insight, helping organisations conduct a rollout that best reflects their requirements.
DIY Hadoop standards
While there are several options for managing access control in Hadoop, there is no universal standard, and it falls to the security professionals within each company to investigate and determine the best option for their unique environment. Different technologies take different approaches, whether build-as-you-go, a more data-centric style, or follow-the-data.
This lack of standards shouldn’t stop an organisation from embracing Hadoop. These varied approaches allow controls to be applied at different levels – no different from what is seen in other enterprise systems. And with so much sensitive data and so many production environments already relying on Hadoop, its security capabilities are only set to improve as adoption continues to flourish.