What to watch in the Dixons Carphone breach...

There are a number of interesting takeaways from the data breach announced by European electronics firm Dixons Carphone earlier this week.

First, the breach provides partial validation of the new chip-and-PIN technology.  Many compromised cards had this technology, and because secondary authenticators like CVV values and PINs were NOT compromised, these consumers may be relatively safe.  This is also a validation of the general principle that it is good design to set up multiple necessary points of failure that attackers must compromise before real damage is done.

Second, the Dixons Carphone breach will be worth following going forward as it may involve violations of the new GDPR data privacy regulations.  If Dixons is punished under the new law they may be among the first and their case will set a tone for how the law is applied going forward.

Finally, an interesting side note is the prior 2015 breach at Dixons, which proceeded by a rather innocuous attack vector: an out-of-date WordPress site...

Data breach and identity theft

Identity theft is a growing problem and probably one of the creepier threats to your finances at large today.  There are many things we can personally do to protect ourselves, such as shredding documents when they are no longer needed and regularly checking credit reports.  However, consumer data lost by businesses in data breaches is a major source of the sensitive data (names, credit card numbers, and so on) that fuels criminals.  In some cases, the most famous being the loss of information on nearly 150 million consumers at Equifax, there isn't a whole lot for private citizens to do.  This is why it's so important that the public become involved in driving the next wave of data privacy technology - the consumer is the principal beneficiary and can take a role in making sure that enterprise is doing what needs to be done to protect them.

Blockchain and Data Privacy

There are many businesses considering putting data (supply chain data, for example) on a blockchain but not much conversation about the liabilities this presents.  In most blockchain architectures, all the data on the blockchain is held by every node on the network, and so each of these nodes has the power to lose this data in a breach.  Given how much trouble the world is having protecting data stored on centralized servers, reproducing the data many times is likely to produce an unprecedented data privacy crisis as multiplication of opportunity produces a multitude of new breaches.

What is P.I.I. (a.k.a. PII or personally identifiable information)?

An important category of data is personally identifiable information, often referred to as P.I.I. or PII, and its name suggests accurately what it is: information that can be used to identify an individual person.  There are many very familiar examples like name, social security number, address, etc.  Some more arcane examples are the sorts of things one needs to supply as a secondary verification of identity at the bank, such as mother's maiden name.  P.I.I. is often an explicitly spelled-out regulatory category, but there are a number of pieces of information that are considered P.I.I. across jurisdictions, and the philosophy defining P.I.I. is consistent even when the level of inclusiveness is not.  (Is the name of your first pet personally identifiable information?)

P.I.I. is important because privacy is not just about what information is available, but what information can be tied back to an individual.  Medical records provide a great example.  If someone steals your medical records, this is perhaps not so bad if they lack the information to tie these records back to you - from their perspective, they don't have your medical records but only some unknown person's medical records.

 

How can Ghost PII improve the security of what I am building?

Capnion's API is intended to permit developers to enhance the security of their applications using the Ghost PII protocol.  Below I will walk through one, hopefully familiar, example of a transaction involving personal information and explain how its security can be improved with Ghost PII.

We've all had to tell a bank website our mother's maiden name, or pass on some similar private information, to prove our identity at some point.  This is an unfortunate situation, as protecting privacy requires moving around more of the very information that jeopardizes privacy.  Capnion's technology has the power to fix this situation - in particular, it can ensure that no computer ever needs to hold your unencrypted response in memory.

Here's how it works: software integrated into your browser encrypts your response (your mother's maiden name) when you type it in, and this encrypted response is all that is ever sent to the bank.  It is all that is sent when you open your account and establish your security questions, just as it is all that is sent when you prove your identity later.  The bank only holds encrypted data and never has the ability to decrypt it.  When the bank needs to check your answer, you can grant them permission to request a special key from Capnion's API that they can use to compare the two ciphertexts you gave them.  This special key lets the bank know whether you gave the same answer both times and nothing else.

This transaction is an example of what is called a zero-knowledge proof.  You have proven to the bank that you are who you say, and 'zero-knowledge' refers to the fact that the bank has learned nothing about your mother's maiden name. 
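
The flow above (encrypt on the client, store only ciphertexts at the bank, fetch a comparison key on request) can be illustrated with a toy sketch.  This is NOT Capnion's protocol or API: the masking scheme, the KeyService class, and every function name here are assumptions made up for illustration, and the scheme is not a vetted cryptographic construction.

```python
import hashlib
import secrets

# Toy parameters: arithmetic modulo the Mersenne prime 2**127 - 1.
# Illustration only; not Capnion's scheme and not production cryptography.
P = 2**127 - 1


def _to_field(answer: str) -> int:
    """Hash a normalized answer to a nonzero number modulo P."""
    digest = hashlib.sha256(answer.strip().lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


class KeyService:
    """Hypothetical stand-in for a key service that never sees ciphertexts."""

    def __init__(self):
        self._masks = {}

    def new_mask(self, label: str) -> int:
        # A fresh random mask per submission, so equal answers encrypt differently.
        mask = secrets.randbelow(P - 1) + 1
        self._masks[label] = mask
        return mask

    def comparison_key(self, label_a: str, label_b: str) -> int:
        # Ratio of the two masks; the inverse is computed via Fermat's little theorem.
        inverse_b = pow(self._masks[label_b], P - 2, P)
        return self._masks[label_a] * inverse_b % P


def encrypt(answer: str, mask: int) -> int:
    """Client side: multiply the hashed answer by the secret mask."""
    return _to_field(answer) * mask % P


def same_answer(cipher_a: int, cipher_b: int, key: int) -> bool:
    """Bank side: compare two ciphertexts using only the comparison key."""
    return cipher_a == key * cipher_b % P


service = KeyService()
enrolled = encrypt("Smith", service.new_mask("enrollment"))
attempt = encrypt("smith ", service.new_mask("login"))
key = service.comparison_key("enrollment", "login")
print(same_answer(enrolled, attempt, key))
```

In this sketch the bank holds only the two ciphertexts and the comparison key, while the key service holds only the masks.  A mismatch would still reveal a ratio of hashed answers in this toy version, which is one reason a real deployment would need a properly analyzed scheme.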

How do I use Ghost PII?

Just how Ghost PII works is a bit technical but that doesn't mean you need to be a rocket scientist to use it.  The core of Ghost PII is an API, maintained by Capnion, that provides you with specialized encryption keys for doing homomorphic computations (computations on encrypted data) on personally identifiable information like name, social security number, etc.

In a prototypical use case, the first thing you should do when you obtain personally identifiable information is call Capnion's API for an encryption key.  Once the data is encrypted correctly there is no way to lose it to an attacker unless your system and Capnion's suffer total breaches at exactly the same time, and Capnion's system is designed with top priority given to both security and conservative data governance.

Imagine that down the road you need to compare encrypted addresses, perhaps out of concern that two addresses differ only in a superficial way, like replacement of "Road" with "Rd." or a similar abbreviation.  You can then request a specialized key that will permit you to compute the number of characters that two encrypted addresses have in common without any need of decrypting.

This has many benefits.  You never need to decrypt, which improves security.  You have cut out the time and computational resources you might have spent on decrypting and re-encrypting, which saves money and time (which is also money).  You do not need to know for sure what kind of entity resolution you want to do down the road when you encrypt, nor do you need to grant the analyst examining duplicated addresses permission to see this personal information, and together these break down the opposition between security and convenience.  Ghost PII is a pure win for your business because it essentially eliminates a hard tradeoff, one that has forced many businesses to work with plaintext in the past.
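
To make the address comparison described above concrete, here is a toy sketch that counts matching character positions in two masked addresses without unmasking either one.  It is an assumption-laden illustration, not Capnion's algorithm: the per-byte additive masking, the function names, and the choice to compare position by position are all hypothetical, and in the architecture described above the pads would live with the key service instead of being returned to the caller.

```python
import secrets


def encrypt(text: str):
    """Mask each byte with a fresh random pad byte (a one-time pad, modulo 256).

    Returns (ciphertext, pad).  In this sketch the pad plays the role of the
    secret held by a key service; it is returned here only to keep the
    example self-contained.
    """
    data = text.encode("utf-8")
    pad = bytes(secrets.randbelow(256) for _ in data)
    cipher = bytes((m + r) % 256 for m, r in zip(data, pad))
    return cipher, pad


def comparison_key(pad_a: bytes, pad_b: bytes) -> bytes:
    """Hypothetical specialized key: per-position difference of the two pads."""
    return bytes((ra - rb) % 256 for ra, rb in zip(pad_a, pad_b))


def count_matching_positions(cipher_a: bytes, cipher_b: bytes, key: bytes) -> int:
    """Count positions where the underlying bytes agree, using ciphertexts only."""
    return sum(
        (ca - cb - k) % 256 == 0
        for ca, cb, k in zip(cipher_a, cipher_b, key)
    )


cipher_1, pad_1 = encrypt("12 Main Road")
cipher_2, pad_2 = encrypt("12 Main Rd.")
key = comparison_key(pad_1, pad_2)
print(count_matching_positions(cipher_1, cipher_2, key))
```

For these two strings the count is 9, the length of the shared prefix "12 Main R".  The analyst running the comparison sees only ciphertexts and the comparison key, although this toy version would also leak byte differences at the positions that do not match.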

Ghost PII

Personally identifiable information, commonly abbreviated PII, refers to information like a name, social security number, etc.  It is sometimes a formal regulatory category and it is among the more sensitive information commonly lost in data breaches - to lose a person's medical records, for example, is more serious if there is information that can be used to tie a particular person to those records.  Much of this PII is notable for not having a whole lot of content: your social security number doesn't say much about you on its own, but is rather an arbitrary number (originally) used to help the government organize records about you.

Capnion has developed a specialized cryptographic protocol called Ghost PII that lets businesses work with your personally identifiable information while it is still encrypted, permitting them to keep it encrypted at all times.  Let me give some detail on how it works.  Any really secure method of encryption should produce two different ciphertexts when applied to the same social security number twice... without homomorphic encryption, there would be no way to determine if two ciphertexts had come from the same social security number originally without decrypting.  This constant need of decryption is part of what drives the breach crisis.  Capnion's Ghost PII is a technique and set of software tools for encrypting data that allows linking records on encrypted identifying numbers, determining which ciphertexts came from the same social security number without need of decrypting.

The Cost of a Breach

The costs of a data breach, to the company breached and to the public, are considerable.  There are direct costs from things like PR and legal fees from ensuing lawsuits.  There are indirect costs, perhaps more formidable, from damage to reputation and loss of trade secrets.  Each breach is different, but it is common to estimate the cost of a breach at around $140 per lost record and some discussion of these estimates is given at this link.

Once again, each breach is different and institutions may have unique liabilities.  One interesting example was the case of Independent News and Media, discussed at this link, an Irish media company that suffered a data breach.  The case was interesting because their data contained secrets about their confidential sources and thus the breach presented a threat to journalistic freedom of general public interest.

If none of these things move you, it is still never fun to lose your job.

Breaches and Human Frailty

Computer security would probably be easier if there were no humans involved.  Almost anything you would do to protect your system can be nullified by sufficiently negligent or malicious actions by your employees.  After-the-fact analyses of data breaches bear this intuition out, like the one described at the link below, which found that 1 in 4 data breaches was the work of insiders.

https://www.theregister.co.uk/2018/04/10/verizon_dbir/

One of the costs of a data breach is embarrassment, and this risk is heightened when there is a potential for a juicy, shareable headline with phrases like "employee worked with outside criminal".

https://www.usatoday.com/story/tech/2018/04/20/many-1-5-million-accounts-may-have-been-compromised-suntrust-banks/535687002/

This suggests the power of Ghost PII: Why keep holding data that presents this sort of danger if you can get things done without it?  Do you need to know a customer's SSN if you were only using it to link records?  The answer to the latter question is "No!" and Capnion is working to build a world where no one has the power to cause a breach because they have no need of it.

What is homomorphic encryption? Who cares?

It seems reasonable to presume that to do any sort of work on encrypted data, you should need to decrypt it first, but this is not the case.  Suppose you have two numbers a and b as well as encryption and decryption algorithms Enc and Dec.  These algorithms are said to be homomorphic with respect to addition (substitute multiplication throughout the following if you like) if there is a third algorithm Add(_,_) such that Dec(Add(Enc(a),Enc(b))) = a + b.

The very short story here is that homomorphic encryption is about doing work on encrypted data without needing to decrypt it, or otherwise learn about it, and still getting the right answer.
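
The identity Dec(Add(Enc(a),Enc(b))) = a + b can be checked with a small sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme.  Paillier is used here only as a familiar example and is not claimed to be what Ghost PII uses; the function names are made up for this sketch, and the primes are tiny so the arithmetic stays readable, which makes it completely insecure.

```python
import math
import secrets


def keygen(p=1789, q=1861):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)


def enc(n, m):
    """Encrypt m (0 <= m < n) with generator g = n + 1 and fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def dec(n, private_key, c):
    """Decrypt with the private key (lam, mu)."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)


n, private_key = keygen()
a, b = 20, 22
total = dec(n, private_key, add(n, enc(n, a), enc(n, b)))
print(total == a + b)
```

Whoever calls add never touches the private key, so the sum is computed by a party that cannot read a or b.  The result is correct as long as a + b stays below n.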

It's a big problem today how much plaintext is lying around.  There is a data breach announced almost every day and the data lost was rarely encrypted because someone needed to do work on it.  Working on encrypted data directly allows full-time encryption, and full-time encryption will allow a standard of security that ends data breaches for good.

The problem we want to solve...

People ask too rarely why the data lost in a breach wasn't encrypted.  The answer, in the past at least, has been that someone needed to do work on that data (analytics, ETL, etc.).  However, it has become possible to do this sort of work directly on ciphertext, answering questions about encrypted data without need of decrypting it, and this permits keeping sensitive data encrypted at all times.  This is what we are working on at Capnion!