Revelations in recent weeks about the US National Security Agency’s (NSA) data collection programs have more than validated the many concerns Australian organisations and regulators have about data sovereignty when moving sensitive data into cloud services.

The leaks have only reminded us that any data accessible over a network is vulnerable. Data can be copied by thieves who have a particular penchant for credit card records and login credentials, or by government actors and their commercial co-conspirators.
Putting your data in the cloud increases the attack surface: there are simply more ways for someone to gain unauthorised access to it.
Cloud Protection Gateways, sometimes referred to as ‘Cloud Security Gateways’, are one attempt to solve this problem for companies that wish to protect their data against security breaches at their cloud provider. The aim is that even if someone gains unauthorised access to the provider’s databases, the data as it sits in the cloud is unusable.
Data masking
The problem isn’t entirely new. For quite some time people have wanted to have copies of production data for other purposes, testing and training chief among them.
Generating a full set of appropriate test data is hard work, and there are always corner cases you’ve forgotten to consider. People with no surname, for example, or odd characters in their name. Using real data from production helps to test the system properly.
But that creates new problems, not least that there is legislation (such as the Privacy Act 1988) that requires certain data to be kept secure. Credit card numbers, name and address details, phone numbers, that sort of thing.
If you copy data into a test or training system, not only do a lot more people now have access to that data, many of whom shouldn’t, but these systems are frequently not as secure.
A long-standing solution has been data masking, where parts of the data are changed or removed when the data is copied into the test environment. For example, a credit card number is ‘masked’ so that the first 12 digits are all X: XXXX XXXX XXXX 1234. Now no one looking at the test system can figure out what your credit card number is, no matter how hard they try.
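As a rough illustration of the masking described above, here is a minimal Python sketch. The function name `mask_card_number` and its parameters are made up for this example and are not taken from any particular masking product.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "X") -> str:
    """Replace every digit except the last `visible` digits with `mask_char`.

    Spaces and other separators are left in place so the layout is preserved.
    """
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_to_mask = max(total_digits - visible, 0)

    masked = []
    for ch in card_number:
        if ch.isdigit() and digits_to_mask > 0:
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)


print(mask_card_number("4111 1111 1111 1234"))  # XXXX XXXX XXXX 1234
```

Note that the function keeps no record of the digits it replaced, so the masked value cannot be turned back into the original.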
This isn’t actually any good for a cloud system where you have to be able to read the data back. By masking the data, you’ve actually lost information, and though it might be secure, it’s also useless.
Tokenisation versus encryption
If you want to be able to read the data again later, you need to use a different method, such as encryption. Most iTnews readers will be fairly well acquainted with encryption, but there’s also a different method you may not be as well versed in, called tokenisation.
Tokenisation replaces the real data with some fake data – or to be precise, a token that is essentially a lookup key for the real data. The real data is stored in a lookup database, and the token replaces the original data in the datastore.
The token can match the format of the original data (a 16-character numeric string, a variable-length character string with an ‘@’ in the middle, and so on), whereas encryption is often more restrictive about field lengths and available characters in order to preserve its effectiveness.
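The idea of a format-matching token backed by a lookup database can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `TokenVault` class is hypothetical, the "database" is an in-memory dictionary, and a real product would store the lookup table in a secured, persistent datastore.

```python
import secrets
import string


class TokenVault:
    """Hypothetical in-memory lookup store mapping tokens to real values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    @staticmethod
    def _random_like(value: str) -> str:
        # Swap digits for random digits and letters for random letters,
        # leaving punctuation such as '@' or '.' where it is.
        out = []
        for ch in value:
            if ch.isdigit():
                out.append(secrets.choice(string.digits))
            elif ch.isalpha():
                out.append(secrets.choice(string.ascii_letters))
            else:
                out.append(ch)
        return "".join(out)

    def tokenise(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        for _ in range(100):  # retry on the (unlikely) chance of a collision
            token = self._random_like(value)
            if token not in self._token_to_value:
                self._token_to_value[token] = value
                self._value_to_token[value] = token
                return token
        raise RuntimeError("could not generate a unique token")

    def detokenise(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()
card_token = vault.tokenise("4111111111111234")
email_token = vault.tokenise("jane@example.com")
print(card_token, email_token)
print(vault.detokenise(card_token))  # 4111111111111234
```

The token carries no information about the original value; anyone without access to the vault sees only random characters in the right shape.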
The downside of tokenisation is that it requires extra storage for the lookup tables. Encrypted data takes up close to the same amount of space as the original, setting aside de-duplication and compression, which should not be effective on properly encrypted data because it looks random.
However, encryption is more computationally intensive, whereas a key lookup is a relatively simple transaction for a system to make.
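The point about compression is easy to check for yourself. The sketch below uses random bytes from `os.urandom` as a stand-in for ciphertext (an assumption made to keep the example within the Python standard library; well-encrypted data should look similarly random) and compares how well it compresses against repetitive plain text.

```python
import os
import zlib

# Repetitive plain text, the kind of data compression handles well.
plain = b"name,address,phone\n" * 500

# Random bytes of the same length, standing in for encrypted data.
random_like_ciphertext = os.urandom(len(plain))

compressed_plain = zlib.compress(plain)
compressed_random = zlib.compress(random_like_ciphertext)

print("original size:         ", len(plain))
print("compressed plain text: ", len(compressed_plain))
print("compressed random data:", len(compressed_random))
```

The plain text shrinks to a small fraction of its size, while the random data stays at roughly its original length.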
There is always a performance overhead to using these tools. Like most things to do with security, it’s all a matter of evaluating what you’re trying to do and figuring out how to achieve the best trade-off between ease-of-use and security.
Data protection gateways
The solutions we’ve reviewed act as gateways: proxy servers, essentially. They sit between users of your data and the cloud services. They dynamically encrypt and decrypt, or tokenise and de-tokenise, the data, protecting it as it moves into the cloud and retrieving it to present to authorised users.
The data you choose to protect is tokenised or encrypted before it gets to the cloud service, so if your cloud provider has some sort of security breach, the data stored at rest in the cloud is protected. The encryption keys (and token lookup tables) are stored on the gateway server, which you can keep on your premises to maintain control.
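To make the gateway idea concrete, here is a minimal Python sketch of the write and read paths using tokenisation. Everything in it is hypothetical: the `Gateway` class, the `PROTECTED_FIELDS` policy and the `tok_` prefix are invented for illustration, a plain dictionary stands in for the cloud service, and it does not reflect how CipherCloud or PerspecSys are actually implemented.

```python
import secrets

# Hypothetical policy: which fields must never reach the cloud in the clear.
PROTECTED_FIELDS = {"card_number", "phone"}


class Gateway:
    """Toy proxy that tokenises protected fields on the way to the cloud."""

    def __init__(self, cloud_store: dict):
        self._cloud = cloud_store  # stand-in for the remote cloud service
        self._vault = {}           # token -> real value, kept on premises

    def _tokenise(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._vault[token] = value
        return token

    def write(self, record_id: str, record: dict) -> None:
        # Replace protected values with tokens before anything leaves the gateway.
        outbound = {
            key: self._tokenise(value) if key in PROTECTED_FIELDS else value
            for key, value in record.items()
        }
        self._cloud[record_id] = outbound

    def read(self, record_id: str) -> dict:
        # Fetch from the cloud and swap tokens back for the real values.
        stored = self._cloud[record_id]
        return {
            key: self._vault.get(value, value) if key in PROTECTED_FIELDS else value
            for key, value in stored.items()
        }


cloud = {}
gateway = Gateway(cloud)
customer = {"name": "Alex", "card_number": "4111 1111 1111 1234", "phone": "02 5550 0100"}
gateway.write("cust-1", customer)

print(cloud["cust-1"])         # what the cloud provider holds: tokens only
print(gateway.read("cust-1"))  # what an authorised user sees via the gateway
```

Someone who breaches the cloud store in this sketch gets only the tokens; without the vault held on the gateway, they cannot be mapped back to the real values.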
For the purposes of this study we’ve chosen two of the more interesting new flavours of data protection gateways, CipherCloud and PerspecSys. Read on for the review...