What is(n't) blockchain to infosec? Part One: Blockchain is Immutability

The central, novel property of blockchain is immutability.  This means that records cannot be changed once they are accepted, and implicit here also is that each record receives a timestamp that cannot be changed once it is agreed upon.  Immutability was key to the success of the Bitcoin network as it guaranteed there would be no tampering with the older sections of the ledger.  Immutability is also the property by which blockchain can offer something genuinely new to information security, but to see clearly how this works we should first examine some basic concepts in cryptography and blockchain architecture.

The mysticism around blockchain imagines it as being everywhere and nowhere, but a blockchain is tangibly a group of databases on a number of different computers, commonly called nodes, communicating constantly via a cryptographic protocol in order to make sure they are all keeping records in the same way.  This protocol isn't an innovation in cryptography in the pure sense, but rather a bunch of old ingredients linked together, including a very common ingredient called a cryptographic hash function.  Informally, the important properties of these hash functions are that 1) they are easy to compute, but very difficult to invert, 2) their output depends chaotically on every little bit of input, and 3) they (hopefully almost) never produce the same output given two different inputs.  If I give you an output from a hash function, you are going to have a hell of a time finding an input that produces this output unless I give you mine.
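
To make the three properties a little more concrete, here is a minimal sketch using SHA-256 from Python's standard library.  The choice of SHA-256 and the two sample messages are assumptions for illustration only; nothing above commits to a particular hash function.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Two inputs that differ in a single character (illustrative messages).
first = sha256_hex("pay Alice 10 coins")
second = sha256_hex("pay Alice 11 coins")

print(first)
print(second)
# Property 2: a tiny change in the input scrambles the whole output.
print(differing_bits(first, second), "of 256 bits differ")
```

Computing the digest is instant (property 1), while nothing in the printed output gives a practical route back to the message that produced it.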

So thus far we have databases sharing cryptographic information to try and stay on the same page.  This could be a big headache if we are working with a lot of data, and this is where the blocks, their arrangement in a chain, and our hash function get put to work.  All the data goes into blocks as we get it, and we put the data of the (n-1)-th block into a hash function and include this output in the n-th block.  Because of property 2) listed above, this means that any change in a block will radically change all the hashes appearing in all later blocks, while properties 1) and 3) make it extremely difficult to cheat and come up with new, fake blocks that prevent these radical changes.  Thus, the efforts of our many databases to stay on the same page need focus only on the most recent block, as any violation of prior consensus will upset the hash appearing in this most recent block.  This resistance to changes in prior consensus, which we previously called immutability, is what makes the whole enterprise workable, both computationally and on the level of human trust.
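
The chaining described above can be sketched in a few lines of Python.  This is a toy under stated assumptions: the block layout (index, data, prev_hash), the JSON serialization, and the function names are all hypothetical, and real blockchains add consensus, signatures, and much more on top.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(records: list) -> list:
    """Put each record in a block that stores the hash of the block before it."""
    chain = []
    prev_hash = "0" * 64  # placeholder value for the very first block
    for index, data in enumerate(records):
        block = {"index": index, "data": data, "prev_hash": prev_hash}
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def first_broken_link(chain: list):
    """Return the index of the first block whose stored prev_hash is stale, or None."""
    for n in range(1, len(chain)):
        if chain[n]["prev_hash"] != block_hash(chain[n - 1]):
            return n
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(first_broken_link(chain))  # no tampering yet

# Quietly rewrite history in the oldest block.
chain[0]["data"] = "alice pays bob 500"
print(first_broken_link(chain))  # the very next block no longer matches
```

A cheater could repair the broken link by recomputing every later block, but doing so changes the hash of the most recent block, which is exactly the value the other nodes are comparing.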

Immutability is useful in information security because it guarantees tamper resistance.  We might want to ensure that malicious actors are not doctoring our records towards their own ends, and the transmission of hashes from one block to the next ensures that this is very difficult or impossible.  But... in the blockchain context, immutability depends on the collaboration of the nodes of our network and their consensus process.  The role played by this consensus process, and how the consensus process determines what blockchain can and can't do for information security, will be the topic of the next post in this series.

Singapore, Healthcare Consolidation, and Data Security

Singapore Health Services was recently hit by a massive breach with 1.5 million records lost.  Although 1.5 million would still be an eye-popping number in the United States, in Singapore this breach affects one in four citizens - comparable to a breach affecting more than 80 million Americans.  This 80 million number seems hard to imagine, but it is becoming more and more plausible as ongoing consolidation in healthcare drives ever greater centralization in data storage.

Even in the past few months, Cigna has purchased Express Scripts for $67 bn and CVS has bought Aetna for $69 bn while the widespread expectation is that the approval of the AT&T & Time Warner merger can only prompt more consolidation.  The time is coming when breaches at single firms, including healthcare firms holding medical information, will compromise big percentages or even majorities of American consumers.

Lessons from Timehop

The Timehop breach, and this TechCrunch article about it in particular, have a lot to teach about security as well as the media optics surrounding security.  Timehop is overhauling their security in response to the breach and this has inevitably exposed them to public questions about what they were doing with their security before - it is more than awkward to have TechCrunch stating "questions should be asked why it took an incident response to trigger a “more pervasive” security overhaul."  Pervasive encryption is really an urgent "must" at this point and this is exactly why Capnion is working to minimize its burden on the rest of your business.

The breach itself was another example of lax security, as the attackers got in through a cloud computing account without two-factor authentication.  Most cloud infrastructure services, including AWS and DigitalOcean for example, offer this feature for free and encourage you to use it aggressively.  You need only download an (also free) app like Duo Mobile to get started.  Not only is this kind of security a must, but the spectacular revelation that you don't have it is an invitation for consumers to question your competence.

The ALERRT breach and the many public risks that breaches pose

Conversation about data breaches often focuses on consumer data held by businesses, but there are all sorts of databases out there that might be dangerous in the wrong hands.  The recently announced breach at ALERRT (Advanced Law Enforcement Rapid Response Training) is a great example.  More detail can be found here.

ALERRT is an organization that provides active shooter response training to law enforcement officers.  The compromised database unfortunately presents significant risks, not only to the 100,000+ law enforcement officers whose personal information was directly compromised but also to the public in general via the information on likely targets and response readiness in municipalities across the country.

Exactis & Breaches at Aggregators

The data breach at Exactis is notable for its gravity, how it has probably been under-reported, and for how it provides a window into the buying and selling of consumer data behind the scenes. 

You can find more detailed reporting here.

It is common, much more than the general public is aware of, for companies to buy and sell databases of information on consumers for sales purposes.  Credit rating agencies are certainly not the only ones who strive to keep a record on the habits of every American.  Unfortunately, the companies that do this sort of aggregation are positioned to do special damage if they are compromised.

What is P.I.I.? P.I.I. as Regulatory Category

The informal definition of P.I.I. (personally identifiable information) is plain - it is information that can be used to identify an individual person.  It is important, though, that this informal definition is a bit fuzzy around the boundaries and P.I.I. is also a formal regulatory category defined differently in different jurisdictions.

For example, in the United States, NIST defines P.I.I. as "any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information."

On the other hand, the European Union's General Data Protection Regulation does not utilize the concept of P.I.I. but rather the more inclusive concept of "personal data", which covers any information relating to an identified or identifiable person, whether or not that particular piece of information could itself be used to identify them.

No Data, No Breach

The most sensitive data often has the least real content.  Your social security number, for example, doesn't say much of anything about you - you might say it was a bit of math that refers to you.  Why should businesses carry it around like it really means something and thus put you at risk?  They could replace it with something else that remembers enough to do the work of your SSN but not enough to be useful to a hacker.  This is the Capnion philosophy: no data, no breach.
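
One generic way to picture "something else that remembers enough" is a keyed token that stands in for the SSN when linking records.  The sketch below uses an ordinary HMAC from Python's standard library and is only an illustration of the idea, not a description of Capnion's own technology; the function name make_token and the sample numbers are made up for this example.

```python
import hashlib
import hmac
import secrets


def make_token(ssn: str, secret_key: bytes) -> str:
    """Replace an SSN with a keyed token that still supports equality checks."""
    normalized = ssn.replace("-", "").strip()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Assumption: the key lives somewhere safer than the database holding the tokens.
secret_key = secrets.token_bytes(32)

token_a = make_token("078-05-1120", secret_key)  # placeholder numbers,
token_b = make_token("078051120", secret_key)    # not real people's SSNs
token_c = make_token("219-09-9999", secret_key)

print(token_a == token_b)  # same SSN, so the records can still be linked
print(token_a == token_c)  # different SSN, so no match
```

The tokens do the record-matching work of an SSN, but a thief who gets only the tokens learns nothing useful without the key.  Note that this simple approach is deterministic, so it is weaker than the encryption-based methods discussed in other posts here.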

Zero-knowledge proofs and the total-knowledge status quo

There are all sorts of processes out there that are really about proof but rarely stated this way.  When you call the bank and verify your identity with your mother's maiden name, they are not interested in the name per se but in the proof that you know it.  Record linkage processes behind the scenes essentially operate on proof, done in the CPU of a computer, that a collection of records all refers to the same person in real life - just what a person's name is doesn't matter, but how it corresponds to other names in other records.  It's not a term in wide circulation, but you might call these total-knowledge proofs in that the information about the names is exposed.

There is a cryptographic technique called a zero-knowledge proof that allows these linkages and verifications to be performed without giving away anything about the data in question.  Such proofs are a natural fit for the P.I.I. (personally identifiable information) held by businesses about consumers, as this information is rarely of interest in its own right but is instead used for the sort of matching and identification mentioned.  Capnion's position is that these zero-knowledge methods should replace their total-knowledge counterparts throughout the economy, eliminating the need for many businesses to ever hold unencrypted data on consumers.

What to watch in the Dixons Carphone breach...

There are a number of interesting takeaways from the data breach announced by European electronics firm Dixons Carphone earlier this week.

First, the breach provides partial validation of the new chip-and-pin technology.  Many compromised cards had this new technology and, as secondary authenticators like CVV values and PINs were NOT compromised, these consumers may be relatively safe.  This is also a validation of the general principle that it is good design to set up multiple necessary points of failure that attackers must compromise before real damage is done.

Second, the Dixons Carphone breach will be worth following going forward as it may involve violations of the new GDPR data privacy regulations.  If Dixons is punished under the new law they may be among the first and their case will set a tone for how the law is applied going forward.

Finally, an interesting side note is the prior 2015 breach at Dixons, which proceeded by a rather innocuous attack vector: an out-of-date WordPress site...

Data breach and identity theft

Identity theft is a growing problem and probably one of the creepier threats to your finances at large today.  There are many things we can personally do to protect ourselves, such as shredding documents when they are no longer needed, regularly checking credit reports, etc.  However, data on consumers lost by businesses in data breaches is a major source of the sensitive data (names, credit card numbers, and so on) that fuels criminals.  In some cases, the most famous being the loss of information on nearly 150 million consumers at Equifax, there isn't a whole lot for private citizens to do.  This is why it's so important that the public become involved in driving the next wave of data privacy technology - the consumer is the principal beneficiary and can take a role in making sure that enterprise is doing what needs to be done to protect them.

Blockchain and Data Privacy

There are many businesses considering putting data (supply chain data, for example) on a blockchain, but there is not much conversation about the liabilities this presents.  In most blockchain architectures, all the data on the blockchain is actually held by all the nodes on the network, and each of these nodes then has the power to lose this data in a breach.  Given how much trouble the world is having protecting data stored on centralized servers, reproducing the data many times is likely to produce an unprecedented data privacy crisis as multiplication of opportunity produces a multitude of new breaches.

What is P.I.I. (a.k.a. PII or personally identifiable information)?

An important category of data is personally identifiable information, often referred to as P.I.I. or PII, and its name suggests accurately what it is: information that can be used to identify an individual person.  There are many very familiar examples like name, social security number, address, etc.  Some more arcane examples are the sorts of things one needs to supply as a secondary verification of identity at the bank, such as mother's maiden name.  P.I.I. is often an explicitly spelled-out regulatory category, but there are a number of pieces of information that are considered P.I.I. across jurisdictions, and the philosophy defining P.I.I. is consistent even when the level of inclusiveness is not.  (Is the name of your first pet personally identifiable information?)

P.I.I. is important because privacy is not just about what information is available, but what information can be tied back to an individual.  Medical records provide a great example.  If someone steals your medical records, this is perhaps not so bad if they lack the information to tie these records back to you - from their perspective, they don't have your medical records but only some unknown person's medical records.

 

How can Ghost PII improve the security of what I am building?

Capnion's API is intended to permit developers to enhance the security of their applications using the Ghost PII protocol.  Below I will walk through one, hopefully familiar, example of a transaction involving personal information and explain how its security can be improved with Ghost PII.

We've all had to tell a bank website our mother's maiden name, or pass on some similar private information, to prove our identity at some point.  This is an unfortunate situation as protecting privacy requires moving around more information that jeopardizes privacy.  Capnion's technology has the power to fix this situation - in particular, it can ensure that no computer ever needs to hold your unencrypted response in memory at any time.

Here's how it works: software integrated into your browser encrypts your response (your mother's maiden name) when you type it in, and this encrypted response is all that is ever sent to the bank.  It is all that is sent when you open your account and establish your security questions, just as it is all that is sent when you prove your identity later.  The bank only holds encrypted data and never has the ability to decrypt it.  When the bank needs to check your answer, you can grant them permission to request a special key from Capnion's API that they can use to compare the two ciphertexts you gave them.  This special key lets the bank know whether you gave the same answer both times and nothing else.

This transaction is an example of what is called a zero-knowledge proof.  You have proven to the bank that you are who you say, and 'zero-knowledge' refers to the fact that the bank has learned nothing about your mother's maiden name. 

How do I use Ghost PII?

Just how Ghost PII works is a bit technical but that doesn't mean you need to be a rocket scientist to use it.  The core of Ghost PII is an API, maintained by Capnion, that provides you with specialized encryption keys for doing homomorphic computations (computations on encrypted data) on personally identifiable information like name, social security number, etc.

In a prototypical use case, the first thing you should do when you obtain personally identifiable information is call Capnion's API for an encryption key.  Once the data is encrypted correctly there is no way to lose it to an attacker unless your system and Capnion's suffer total breaches at exactly the same time, and Capnion's system is designed both with top priority given to security and conservative data governance.

Imagine that down the road you need to compare encrypted addresses, perhaps out of concern that two addresses differ only in a superficial way, like replacement of "Road" with "Rd." or a similar abbreviation.  You can then request a specialized key that will permit you to compute the number of characters that two encrypted addresses have in common without any need of decrypting.

This has many benefits.  You never need to decrypt, which improves security.  You have cut out the time and computational resources you might have spent on decrypting and re-encrypting, which saves money and time (which is also money).  You do not need to know for sure what kind of entity resolution you want to do down the road when you encrypt, nor do you need to grant the analyst examining duplicated addresses permission to see this personal information, and these break down the opposition between security and convenience.  Ghost PII is a pure win for your business because it essentially eliminates a hard tradeoff, one that has forced many businesses to work with plaintext in the past.

Ghost PII

Personally identifiable information, commonly abbreviated PII, refers to information like a name, social security number, etc.  It is sometimes a formal regulatory category and it is among the more sensitive information commonly lost in data breaches - to lose a person's medical records, for example, is more serious if there is information that can be used to tie a particular person to those records.  Much of this PII is notable for not having a whole lot of content: your social security number doesn't say much about you on its own, but it is rather an arbitrary number (originally) used to help the government organize records about you.

Capnion has developed a specialized cryptographic protocol called Ghost PII that lets businesses work with your personally identifiable information while it is still encrypted, permitting them to keep it encrypted at all times.  Let me give some detail on how it works.  Any really secure method of encryption should produce two different ciphertexts when applied to the same social security number twice, so without homomorphic encryption there would be no way to determine if two ciphertexts had come from the same social security number originally without decrypting.  This constant need of decryption is part of what drives the breach crisis.  Capnion's Ghost PII is a technique and set of software tools for encrypting data that allows linking records on encrypted identifying numbers, determining which ciphertexts came from the same social security number without need of decrypting.

The Cost of a Breach

The costs of a data breach, to the company breached and to the public, are considerable.  There are direct costs from things like PR and legal fees from ensuing lawsuits.  There are indirect costs, perhaps more formidable, from damage to reputation and loss of trade secrets.  Each breach is different, but it is common to estimate the cost of a breach at around $140 per lost record and some discussion of these estimates is given at this link.
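
As a back-of-the-envelope illustration of the per-record estimate, the arithmetic looks like this.  The $140 figure is the rough estimate quoted above; the breach sizes are hypothetical, and treating cost as a straight multiple of records lost is a simplifying assumption that will not hold for every breach.

```python
def estimated_breach_cost(records_lost: int, cost_per_record: float = 140.0) -> float:
    """Rough breach cost: records lost times an assumed flat cost per record."""
    return records_lost * cost_per_record


# Hypothetical breach sizes, smallest to largest.
for records in (10_000, 100_000, 1_500_000):
    cost = estimated_breach_cost(records)
    print(f"{records:>9,} records -> ${cost:,.0f}")
```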

Once again, each breach is different and institutions may have unique liabilities.  One interesting example was the case of Independent News and Media, discussed at this link, an Irish media company that suffered a data breach.  The case was interesting because their data contained secrets about their confidential sources and thus the breach presented a threat to journalistic freedom of general public interest.

If none of these things move you, it is still never fun to lose your job.

Breaches and Human Frailty

Computer security would probably be easier if there were no humans involved.  Almost anything you would do to protect your system can be nullified by sufficiently negligent or malicious actions by your employees.  After-the-fact analyses of data breaches bear this intuition out, like that described at the link below.  It found that 1 in 4 data breaches was the work of insiders.

https://www.theregister.co.uk/2018/04/10/verizon_dbir/

One of the costs of a data breach is embarrassment, and this risk is heightened when there is a potential for a juicy, shareable headline with phrases like "employee worked with outside criminal".

https://www.usatoday.com/story/tech/2018/04/20/many-1-5-million-accounts-may-have-been-compromised-suntrust-banks/535687002/

This suggests the power of Ghost PII: Why keep holding data that presents this sort of danger if you can get things done without it?  Do you need to know what a customer's SSN is if you were only using it to link records?  The answer to the latter question is "No!" and Capnion is working to build a world where no one has the power to cause a breach because they have no need of it.

What is homomorphic encryption? Who cares?

It seems reasonable to presume that to do any sort of work on encrypted data, you should need to decrypt it first, but this is not the case.  Suppose you have two numbers a and b as well as encryption and decryption algorithms Enc and Dec.  These algorithms are said to be homomorphic in addition (substitute in multiplication throughout the following if you like) if there is a third algorithm Add(_,_) such that Dec(Add(Enc(a),Enc(b))) = a + b.
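
For readers who want to see Enc, Dec, and Add in action, here is a toy sketch of a well-known additively homomorphic scheme, the Paillier cryptosystem.  It is offered only as an illustration of the definition above and is not a description of Capnion's own methods; the primes are deliberately tiny (an assumption made to keep the numbers readable), so this is nowhere near secure.

```python
import math
import secrets

# Toy key material: real deployments use primes hundreds of digits long.
P, Q = 1009, 1013
N = P * Q
N_SQUARED = N * N
G = N + 1
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAMBDA, -1, N)  # modular inverse, needs Python 3.8+


def enc(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness each time."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in 1..N-1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQUARED) * pow(r, N, N_SQUARED)) % N_SQUARED


def dec(c: int) -> int:
    """Decrypt a ciphertext back to the integer it hides."""
    x = pow(c, LAMBDA, N_SQUARED)
    return ((x - 1) // N * MU) % N


def add(c1: int, c2: int) -> int:
    """Combine two ciphertexts so the result decrypts to the sum (mod N)."""
    return (c1 * c2) % N_SQUARED


a, b = 20, 22
print(dec(add(enc(a), enc(b))) == a + b)
```

Whoever runs add only ever sees ciphertexts, and in this scheme "adding" the hidden numbers happens to mean multiplying the ciphertexts.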

The very short story here is that homomorphic encryption is about doing work on encrypted data without needing to decrypt it, or otherwise learn about it, and still getting the right answer.

It's a big problem today how much plaintext is lying around.  There is a data breach announced almost every day and the data lost was rarely encrypted because someone needed to do work on it.  Working on encrypted data directly allows full-time encryption, and full-time encryption will allow a standard of security that ends data breaches for good.

The problem we want to solve...

People ask too rarely why the data lost in a breach wasn't encrypted.  The answer, in the past at least, has been that someone needed to do work on that data (analytics, ETL, etc.).  However, it has become possible to do this sort of work directly on ciphertext, answering questions about encrypted data without need of decrypting it, and this permits keeping sensitive data encrypted at all times.  This is what we are working on at Capnion!