Cryptographic Primitives for ML Security
ML systems are increasingly deployed in security-critical applications, but traditional defenses often rely on heuristics that lack formal guarantees. In this project, we explore how cryptographic primitives can provide provable guarantees for model integrity, provenance, and privacy.
We study data repudiation (proving that a given data point was not used in model training) and give the first algebraic conditions for the unforgeability of stochastic gradient descent (SGD), which guarantee that a model checkpoint cannot be forged from different training data.
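As a hedged illustration of the setting (the notation below is generic SGD, not our precise algebraic conditions): one SGD step maps the current checkpoint w_t and a mini-batch B_t to

\[ w_{t+1} = w_t - \eta \,\nabla_w \mathcal{L}(w_t; B_t), \]

and a forgery is an alternative batch \(B_t' \neq B_t\), drawn from different training data, that reproduces the same checkpoint sequence. Unforgeability asks when the observed checkpoints pin down the data that produced them, so that a non-membership claim can be verified against the checkpoints alone.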
We develop CLUE-Mark, the first provably undetectable watermarking scheme for diffusion models. Unlike prior schemes, it leverages the hardness of the continuous learning with errors (CLWE) problem to guarantee undetectability against all efficient attacks, including steganographic ones, while preserving output quality, making it suitable for verifying the provenance of AI-generated content.
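Undetectability is meant here in the standard cryptographic sense. As a hedged sketch (illustrative notation, not the exact definition from the scheme): writing Gen for the unwatermarked generator and Gen_wm for the watermarked generator keyed with a secret watermarking key, we require that for every probabilistic polynomial-time distinguisher \(\mathcal{A}\),

\[ \bigl|\, \Pr[\mathcal{A}(x)=1 : x \leftarrow \mathsf{Gen}] - \Pr[\mathcal{A}(x)=1 : x \leftarrow \mathsf{Gen}_{\mathrm{wm}}] \,\bigr| \le \mathrm{negl}(\lambda), \]

where \(\lambda\) is the security parameter. Under the CLWE assumption, no efficient adversary without the key can tell watermarked outputs from unwatermarked ones, which is why the guarantee covers every efficient attack rather than a fixed list of known ones.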
To defend against model inversion attacks and preserve the privacy of training data, we introduce a cryptographic embedding-protection scheme that ensures fuzzy one-wayness with respect to the L2 norm. It supports similarity search, such as in face-authentication ML systems, while preventing reconstruction of sensitive training data from model outputs; a minimal sketch of this functionality follows below.
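The sketch below shows only the similarity-search functionality such a scheme must preserve; the names, embedding dimension, and threshold are illustrative, and the cryptographic protection of the embeddings themselves is not shown.

```python
import numpy as np

def l2_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))

def authenticate(query: np.ndarray, enrolled: np.ndarray, threshold: float = 0.8) -> bool:
    """Accept the query if its embedding lies within an L2 ball around the
    enrolled template. The threshold is an illustrative value; real systems
    calibrate it on validation data."""
    return l2_distance(query, enrolled) <= threshold

# Illustrative usage with random 128-dimensional embeddings.
rng = np.random.default_rng(0)
enrolled_template = rng.normal(size=128)
query_same = enrolled_template + rng.normal(scale=0.01, size=128)  # small perturbation of the same identity
query_other = rng.normal(size=128)                                 # unrelated identity

print(authenticate(query_same, enrolled_template))   # expected: True
print(authenticate(query_other, enrolled_template))  # expected: False
```

Fuzzy one-wayness asks that such threshold comparisons remain possible on the protected embeddings while inverting an embedding back to the underlying data (e.g., a face image) stays computationally infeasible.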