Tokenization: A new approach to enterprise data security
As enterprises seek to protect data from cybercriminals, internal theft or even accidental loss, encryption and key management have become increasingly important and proven weapons in the security arsenal for data stored in databases, files and applications, and for data in transit. No one needs to be reminded of the many high-profile, reputation-damaging and costly data breaches that organizations across industries and governments have suffered over the past few years.
To protect consumers, industry bodies such as the Payment Card Industry Security Standards Council have instituted security mandates such as the Data Security Standard (PCI DSS), and governments have passed privacy laws. While these mandates and laws require companies to take certain steps to protect consumer and patient information such as credit card numbers and various types of Personally Identifiable Information (PII), CISOs must also protect company-confidential information ranging from employee records to intellectual property. Almost always, this means finding the best way to secure many types of data stored on a variety of hardware, from mobile devices to desktops, servers and mainframes, and in many different applications and databases. Further, as some companies have learned the hard way, being compliant does not equate to being secure: breaches have occurred at companies that had taken the necessary steps to pass PCI DSS compliance audits.
Companies typically rely on strong local encryption to protect data. While effective, it does present some challenges. For example, encrypted data takes more space than unencrypted data. Trying to fit the larger cipher text of a 16-digit credit card number back into the 16-digit field poses a “square peg into a round hole” kind of storage problem with consequences that ripple through the business applications that use the data. Storing encrypted values in place of the original data often requires companies to contract for costly programming modifications to existing applications and databases. What’s more, for businesses that must comply with PCI DSS, any system that contains encrypted card data is “in scope” for PCI DSS compliance and audits. Every in-scope system adds to the cost and complexity of compliance.
To reduce the points of risk as well as the scope of PCI DSS audits, and to provide another level of security, a new data security model—tokenization—is gaining traction with CISOs who need to protect all manner of confidential information in an IT environment.
What is Tokenization?
With traditional encryption, when a database or application needs to store sensitive data such as credit card or national insurance numbers, those values are encrypted and then the cipher text is returned to the original location. With tokenization, a token—or surrogate value—is returned and stored in place of the original data. The token is a reference to the actual cipher text, which is usually stored in a central data vault. This token can then be safely used by any file, application, database or backup medium throughout the organization, thus minimizing the risk of exposing the actual sensitive data. Because you can control the format of the token, and because the token is consistent for all instances of a particular sensitive data value, your business and analytical applications continue seamless operation.
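To make the concept concrete, here is a minimal sketch in Python of a token vault and token server rolled into one object; the names (TokenVault, tokenize, detokenize), the random surrogate format and the in-memory storage are illustrative assumptions only, and a real deployment would encrypt the stored values and centralize the vault behind access controls.

```python
import secrets

class TokenVault:
    """Illustrative in-memory vault mapping surrogate tokens to protected values.
    In practice the values would be encrypted and the vault centrally managed."""

    def __init__(self):
        self._value_by_token = {}   # token -> original sensitive value
        self._token_by_value = {}   # value -> token, so repeated values reuse one token

    def tokenize(self, value: str) -> str:
        # Reuse the existing token if this value has already been seen
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = secrets.token_hex(8)        # random surrogate, not derived from the value
        self._value_by_token[token] = value
        self._token_by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only callers able to reach the vault (with proper credentials) resolve tokens
        return self._value_by_token[token]

vault = TokenVault()
token = vault.tokenize("4929123456781234")
print(token)                     # surrogate value, safe to store in applications
print(vault.detokenize(token))   # original value, recoverable only via the vault
```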
Tokenization is an alternative data protection architecture that is ideal for some organizations’ requirements. It reduces the number of points where sensitive data is stored within an enterprise, making it easier to manage and more secure. It is much like keeping the Crown Jewels in the Tower of London: both are single repositories of valuable items, well guarded and easily managed.
The newest form of tokenization, called Format Preserving Tokenization, creates a token—or surrogate value—that represents and fits precisely in the place of the original data, instead of the larger amount of storage required by encrypted data. Additionally, to maintain some of the business context of the original value, certain portions of the data can be retained within the token that is generated. The encrypted data the token represents is then locked in the central data vault.
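As a rough sketch of the idea (not any vendor's algorithm), the snippet below builds a surrogate that keeps the 16-digit format and the last four digits of a card number while randomizing the rest; a production token server would additionally guarantee uniqueness and record the mapping in the central vault.

```python
import secrets

def format_preserving_token(pan: str, keep_last: int = 4) -> str:
    """Illustrative only: build a surrogate that fits the original 16-digit field
    and retains the trailing digits to preserve some business context."""
    digits = [c for c in pan if c.isdigit()]
    if len(digits) != 16:
        raise ValueError("expected a 16-digit card number")
    random_part = "".join(str(secrets.randbelow(10)) for _ in range(16 - keep_last))
    return random_part + "".join(digits[-keep_last:])

print(format_preserving_token("4929123456781234"))   # e.g. '8340917265431234'
```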
Because tokens are not mathematically derived from the original data, exposing a token is arguably safer than exposing even the encrypted value. A token can be passed around the network between applications, databases and business processes safely, all the while leaving the encrypted data it represents securely stored in the central repository. Authorized applications that need access to the encrypted data can retrieve it only with proper credentials and a token issued from a token server, providing an extra layer of protection for sensitive information and preserving storage space at data collection points.
Tokenization enables organizations to better protect sensitive information throughout the entire enterprise by replacing it with data surrogate tokens. Tokenization not only addresses the unanticipated complexities introduced by traditional encryption, but can also minimize the number of locations where sensitive data resides given that the cipher text is only stored centrally. Shrinking this footprint can help organizations simplify their operations and reduce the risk of breach.
Replacing encrypted data with tokens also gives organizations a way to reduce the number of employees who can access sensitive data, dramatically shrinking the risk of internal data theft. Under the tokenization model, only authorized employees have access to encrypted data such as customer information, and even fewer have access to the clear text, decrypted data.
Tokenization in an enterprise
The most effective token servers combine tokenization with encryption, hashing and masking to deliver an intelligent and flexible data security strategy. Under the tokenization model, data that needs to be encrypted is passed to the token server where it is encrypted and stored in the central data vault. The token server then issues a token, which is placed into applications or databases where required. When an application or database needs access to the encrypted value, it makes a call to the token server using the token to request the full value.
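From the application's point of view, that request flow might look something like the following; the host name, endpoint paths, JSON fields and credential scheme are assumptions made for illustration, not any particular product's API.

```python
import requests

TOKEN_SERVER = "https://tokenserver.example.internal"   # hypothetical token server
CREDENTIALS = ("app-user", "app-secret")                 # hypothetical credentials

def store_card(pan: str) -> str:
    """Send the sensitive value to the token server; the application keeps only the token."""
    resp = requests.post(f"{TOKEN_SERVER}/tokenize",
                         json={"value": pan}, auth=CREDENTIALS, timeout=5)
    resp.raise_for_status()
    return resp.json()["token"]

def retrieve_card(token: str) -> str:
    """Exchange a token for the protected value; only authorized callers may do this."""
    resp = requests.post(f"{TOKEN_SERVER}/detokenize",
                         json={"token": token}, auth=CREDENTIALS, timeout=5)
    resp.raise_for_status()
    return resp.json()["value"]
```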
Referential integrity poses another challenge where various applications (e.g., data warehouses) and databases use sensitive data values as primary or foreign keys to run queries and perform data analysis. When those fields are encrypted, such operations are often impeded because encryption algorithms are, by design, randomized: encrypting the same plain text value (a credit card number, for instance) does not always produce the same cipher text. While there are methods to make encryption consistent, removing that randomization introduces risks of its own. A consistent, format-preserving token eliminates this issue.
With format preserving tokenization, the relationship between data and token is preserved even when encryption keys are rotated, because the central data vault holds a single encrypted version of each original plain text field. As a result, the returned token is always the same whenever a given data value is tokenized anywhere in the enterprise. Since the token server maintains a strict one-to-one relationship between token and data value, tokens can be used as primary and foreign keys, and referential integrity is assured whenever the protected field appears across multiple data sets. And since only one record is created for each data value (and its token) within the data vault, storage space requirements are minimized.
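To make the referential-integrity point concrete, the sketch below (an assumption-laden toy, not a real token server) uses a deterministic lookup-or-create mapping so that the same card number always yields the same token, allowing two data sets to be joined on the token without ever exposing the plain text.

```python
import secrets

_token_by_value: dict[str, str] = {}   # one-to-one mapping, as the token server guarantees

def tokenize(value: str) -> str:
    # Deterministic lookup-or-create: the same value always maps to the same token
    if value not in _token_by_value:
        _token_by_value[value] = secrets.token_hex(8)
    return _token_by_value[value]

# Two systems independently store the same customer's card as a token
orders = [{"order_id": 1, "card": tokenize("4929123456781234")},
          {"order_id": 2, "card": tokenize("4929123456781234")}]
chargebacks = [{"case_id": 77, "card": tokenize("4929123456781234")}]

# Join on the token itself: referential integrity holds, plain text is never needed
disputed = [o["order_id"] for o in orders
            if any(c["card"] == o["card"] for c in chargebacks)]
print(disputed)   # [1, 2]
```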
Maintaining referential integrity is also useful for complying with European privacy laws that regulate the electronic transfer of social insurance numbers across international borders. Using tokens in place of encrypted values meets the requirements of the law, yet still allows for data analysis across borders.
Tokenization in practice
There are two scenarios where implementing a token strategy can be beneficial: to reduce the number of places sensitive encrypted data resides; and to reduce the scope of a PCI DSS audit. The hub and spoke model is the same for both. The hub contains three components: a centralized encryption key manager to manage the lifecycle of keys; a token server to encrypt data and generate tokens; and a central data vault to hold the encrypted values, or cipher text. The spokes are the endpoints where sensitive data originates such as point-of-sale terminals in retail stores or the servers in a department, call center or website.
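A hedged sketch of how those hub components might be wired together is shown below; the class names, interfaces and placeholder "encryption" are illustrative assumptions, not a description of any specific product.

```python
import secrets

class KeyManager:
    """Hub component: manages the lifecycle of encryption keys."""
    def current_key(self) -> bytes:
        return b"0" * 32   # placeholder; a real key manager generates, rotates and protects keys

class DataVault:
    """Hub component: central store for the encrypted values (cipher text)."""
    def __init__(self):
        self._cipher_by_token = {}
    def put(self, token: str, cipher_text: bytes) -> None:
        self._cipher_by_token[token] = cipher_text
    def get(self, token: str) -> bytes:
        return self._cipher_by_token[token]

class TokenServer:
    """Hub component: protects incoming values, stores them in the vault, issues tokens."""
    def __init__(self, keys: KeyManager, vault: DataVault):
        self._keys, self._vault = keys, vault
    def tokenize(self, value: str) -> str:
        cipher_text = value.encode()   # placeholder for real encryption under keys.current_key()
        token = secrets.token_hex(8)
        self._vault.put(token, cipher_text)
        return token

# A spoke (e.g. a point-of-sale terminal) sends card data to the hub and keeps only the token
hub = TokenServer(KeyManager(), DataVault())
print(hub.tokenize("4929123456781234"))
```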
Conclusion
Tokenization reduces the scope of risk, data storage requirements and changes to applications and databases, while maintaining referential integrity and streamlining the auditing process for regulatory compliance. Suited to heterogeneous IT environments that use mainframes and distributed systems for back office applications along with a variety of endpoints, tokenization offers a number of benefits to CISOs tasked with protecting all types of confidential information. The higher the volume of data and the more types of sensitive data you collect and protect, the more valuable tokenization becomes. Fortunately, incorporating tokenization requires little more than adding a token server and a central data vault. For companies that need to comply with PCI DSS, tokenization has the added advantage of taking applications, databases and systems out of scope, reducing the complexity and cost of initial compliance and annual audits.