Defeating Model Inversion

Another prominent attack on the OWASP Top 10 list is model inversion: given the output of an ML model, the attacker approximately recovers its input. This is a serious security and privacy concern. Consider a face authentication service. When you enroll for face authentication, the service uses a face recognition model to compute a vector representation of your face (called an embedding) and stores it on the server. When you authenticate later, the service recomputes the embedding and checks whether it is close to the one stored on the server. Now imagine the server is hacked and the embedding vectors are leaked, much as passwords are leaked in common database breaches. In the last few years, powerful model inversion attacks have been demonstrated that can recover the original face fairly accurately given a breached embedding vector. The question is: how can we protect users' face images from being recovered from the breached information? The problem I describe here isn't specific to face authentication; it applies to vector representations of different modalities (text, image, sound) generated by ML models from sensitive inputs, whenever model inversion attacks are a threat.
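For concreteness, here is a minimal sketch of the enrollment/verification flow described above. The embed function, the cosine-similarity check, and the threshold are illustrative assumptions, not details of any particular deployed service; the point is only that the server ends up storing a raw embedding, which is exactly what model inversion attacks target.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.85  # hypothetical threshold; real systems tune this empirically


def embed(face_image: np.ndarray) -> np.ndarray:
    """Placeholder for a face recognition model mapping an image to an embedding."""
    raise NotImplementedError("substitute a real face embedding model here")


def enroll(face_image: np.ndarray) -> np.ndarray:
    # The server stores this raw embedding -- the value at risk if the database leaks.
    return embed(face_image)


def authenticate(stored_embedding: np.ndarray, face_image: np.ndarray) -> bool:
    # Verification only needs the fresh embedding to be *close* to the stored one,
    # not identical to it.
    probe = embed(face_image)
    cosine = np.dot(stored_embedding, probe) / (
        np.linalg.norm(stored_embedding) * np.linalg.norm(probe)
    )
    return cosine >= SIMILARITY_THRESHOLD
```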

It is worth noting that the textbook defense for protecting password strings, namely hashing the password along with a random salt, does not work in this context. Why? Vector embeddings generated by ML models are often locality-sensitive (or fuzzy), i.e., they map similar inputs to similar vector representations or embeddings. This noise tolerance is precisely why they are useful. Applying a standard cryptographic hash to the embedding vector destroys locality-sensitivity, i.e., a small change in the input will no longer match the original hash value stored at the service, breaking the desired functionality (see the sketch below). To address this threat, we conceive a novel cryptographic primitive that we call a fuzzy one-way hash. Put simply, it must be both locality-sensitive (fuzzy) and hard to invert even approximately (one-way). Our initial work shows attacks against the ad-hoc mechanisms currently being used in place of the idealized primitive we propose. We also show a negative result about the existence of statistically secure fuzzy one-way hashes. Computationally secure schemes may still exist, but we are not aware of any. All in all, our proposed notion of fuzzy one-way hashing can be seen as the first systematic characterization of a primitive that addresses model inversion attacks generically.
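The toy sketch below illustrates why salted cryptographic hashing breaks the matching functionality. The embeddings are synthetic and the 1% perturbation is a stand-in for the noise between two captures of the same face: the two embeddings remain close under cosine similarity, yet their salted SHA-256 digests disagree, so the service can no longer verify the user.

```python
import hashlib
import os

import numpy as np


def salted_hash(embedding: np.ndarray, salt: bytes) -> str:
    # Standard cryptographic hashing: any change in the input flips the digest entirely.
    return hashlib.sha256(salt + embedding.tobytes()).hexdigest()


rng = np.random.default_rng(0)
salt = os.urandom(16)

enrolled = rng.standard_normal(128).astype(np.float32)
# A re-captured face yields a slightly perturbed embedding of the same person.
probe = enrolled + 0.01 * rng.standard_normal(128).astype(np.float32)

cosine = np.dot(enrolled, probe) / (np.linalg.norm(enrolled) * np.linalg.norm(probe))
print(f"cosine similarity: {cosine:.4f}")  # close to 1.0 -- the embeddings still match
print(salted_hash(enrolled, salt) == salted_hash(probe, salt))  # False -- the hashes do not
```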

Relevant Publications
On Inversion Attacks and Countermeasures for Leaked Vector Representations
Louise Xu, Mallika Prabhakar, Prateek Saxena
In Review, 2024.