
Our paper “Deep Classifier Mimicry without Data Access” has been accepted at AISTATS 2024 as an oral presentation

    Our paper “Deep Classifier Mimicry without Data Access” has been accepted at the International Conference on Artificial Intelligence and Statistics (AISTATS) 2024. We are honored to have been selected for an oral presentation!

    In the paper, we present a method for data-free and model-agnostic knowledge distillation. The essence of knowledge distillation is to train a model from scratch to mimic an already trained one, i.e., to extract its knowledge. When we speak of data-free knowledge distillation, we mean that we do not rely on access to any original or real data at all; that is, we do not know what the model was trained on, only what it was trained for. When we speak of model-agnostic knowledge distillation, we mean that we also do not assume the model follows a particular structure, so we do not craft solutions that, e.g., mimic a specific internal layer or access its representations. We show that for classifiers, such a knowledge distillation process can be built through diffusion and contrastive learning. We call the resulting method CAKE: Contrastive Abductive Knowledge Extraction. In summary, we diffuse pairs of random synthetic noise examples so that they attract each other until they just barely fall on opposite sides of a decision boundary. Colloquially speaking, we try to empirically “cover” the decision boundary as closely as possible, without needing to mirror the original training data distribution.
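    To make the two-stage recipe above concrete, here is a minimal, self-contained sketch: first, pairs of noise samples are optimized so that they attract each other in input space while the teacher still assigns them to different classes; then a student is trained on these boundary-hugging samples using only the teacher's outputs. This is purely an illustration under simplifying assumptions; the stand-in teacher network, the gradient-based attraction/disagreement losses, and all hyperparameters are placeholders and do not reproduce the actual CAKE objective or implementation from the paper.

```python
# Illustrative sketch only, NOT the authors' implementation of CAKE.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in pretrained "teacher" classifier: treated as a black box, only its
# outputs are queried (no access to training data or internal layers).
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Student trained from scratch; its architecture need not match the teacher's.
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

def synthesize_boundary_pairs(n_pairs=128, steps=200, lr=0.1, attract=1.0):
    """Move pairs of noise samples toward each other while keeping the
    teacher's predictions for the two pair members apart, so that each pair
    ends up straddling a decision boundary (simplified stand-in for the
    paper's contrastive diffusion)."""
    x_a = torch.randn(n_pairs, 32, requires_grad=True)
    x_b = torch.randn(n_pairs, 32, requires_grad=True)
    opt = torch.optim.Adam([x_a, x_b], lr=lr)
    for _ in range(steps):
        logits_a, logits_b = teacher(x_a), teacher(x_b)
        # Attraction: pull each pair together in input space.
        attraction = (x_a - x_b).pow(2).mean()
        # Disagreement: keep the teacher's predictions for the pair dissimilar,
        # i.e. encourage the two samples to fall into different classes.
        disagreement = -F.kl_div(F.log_softmax(logits_a, dim=1),
                                 F.softmax(logits_b, dim=1),
                                 reduction="batchmean")
        loss = attract * attraction + disagreement
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.cat([x_a, x_b]).detach()

# Distillation: train the student to mimic the teacher on the synthetic samples.
synthetic = synthesize_boundary_pairs()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(100):
    t_logits = teacher(synthetic)
    s_logits = student(synthetic)
    loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits, dim=1),
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```

    The point the sketch mirrors is that only the teacher's input-output behavior is queried: no original training data and no access to internal representations are needed.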

    For more details, read the abstract below and the full paper:

    “Access to pre-trained models has recently emerged as a standard across numerous machine learning domains. Unfortunately, access to the original data the models were trained on may not equally be granted. This makes it tremendously challenging to fine-tune, compress models, adapt continually, or to do any other type of data-driven update. We posit that original data access may however not be required. Specifically, we propose Contrastive Abductive Knowledge Extraction (CAKE), a model-agnostic knowledge distillation procedure that mimics deep classifiers without access to the original data. To this end, CAKE generates pairs of noisy synthetic samples and diffuses them contrastively toward a model’s decision boundary. We empirically corroborate CAKE’s effectiveness using several benchmark datasets and various architectural choices, paving the way for broad application.”

    Abstract of our paper “Deep Classifier Mimicry without Data Access”