Official implementation of SPECTRE: Defending Against Backdoor Attacks Using Robust Covariance Estimation.
Prerequisites
- Python 3.9
- Julia 1.6
- Poetry is recommended for Python package management.
A requirements.txt is provided as a fallback for use with pip or Anaconda.
Installation
poetry install
julia --project=. -e "using Pkg; Pkg.instantiate()"
Experiments are named using a specific convention:
{model}-{trainer}-{source_label}{target_label}-{m}x{attack_type}{eps_times_n}
- model: Can be r18 for a ResNet-18 or r32p for ResNet-32.
- trainer: Can be sgd or ranger. This also selects a set of hyperparameters (learning rate schedule, weight decay, etc.) that work well for that optimizer. ranger is recommended for r18 models and sgd is recommended for r32p models.
- source_label and target_label: Integers from 0 to 9 corresponding to labels of CIFAR-10.
- m: An integer, which is the number of ways to split the attack. We tried values of 1, 2, and 3.
- attack_type: Can be p for pixel attacks or s for periodic (i.e. sinusoidal) attacks.
- eps_times_n: Integer number of poisoned samples.
Example: name=r32p-sgd-94-1xp500
The files related to experiment $name are stored in the directory output/$name.
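As an illustration of the naming convention, the sketch below parses an experiment name into its fields and derives the output directory. The `ExperimentName` type, the `parse_experiment_name` and `output_dir` helpers, and the regular expression are hypothetical and not part of this repository. The sketch assumes each label is a single digit, as the range 0 to 9 implies.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Hypothetical pattern, written from the convention described above.
_NAME_RE = re.compile(
    r"^(?P<model>r18|r32p)-(?P<trainer>sgd|ranger)-"
    r"(?P<source_label>\d)(?P<target_label>\d)-"
    r"(?P<m>\d+)x(?P<attack_type>[ps])(?P<eps_times_n>\d+)$"
)


class ExperimentName(NamedTuple):
    model: str
    trainer: str
    source_label: int
    target_label: int
    m: int
    attack_type: str
    eps_times_n: int


def parse_experiment_name(name: str) -> ExperimentName:
    """Split an experiment name into its fields, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid experiment name: {name!r}")
    fields = match.groupdict()
    return ExperimentName(
        model=fields["model"],
        trainer=fields["trainer"],
        source_label=int(fields["source_label"]),
        target_label=int(fields["target_label"]),
        m=int(fields["m"]),
        attack_type=fields["attack_type"],
        eps_times_n=int(fields["eps_times_n"]),
    )


def output_dir(name: str) -> Path:
    """Directory holding the files of an experiment: output/$name."""
    return Path("output") / name


exp = parse_experiment_name("r32p-sgd-94-1xp500")
print(exp)
print(output_dir("r32p-sgd-94-1xp500"))
```

For the example name, this reads as a ResNet-32 trained with sgd, source label 9, target label 4, a 1-way pixel attack, and 500 poisoned samples.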
Initial training
First we train a model on the poisoned dataset.
poetry run python train.py $name
This should save a PyTorch serialized model to output/$name/model.pth.
Compute hidden representations
Next we run the training data through the network and save the hidden representations to a file to be read later.
poetry run python rep_saver.py $name
This should save NumPy serialized arrays to output/$name/label_$label_reps.npy for $label from 0 to 9.
Usually, we are only interested in the file corresponding to the target label.
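To inspect the saved representations from Python, something like the following sketch may help. The `reps_path` and `load_target_reps` helpers and the `root` parameter are hypothetical names introduced here; only the file layout output/$name/label_$label_reps.npy comes from the description above. No assumption is made about the shape or orientation of the stored array.

```python
from pathlib import Path

import numpy as np


def reps_path(name: str, label: int, root: Path = Path("output")) -> Path:
    """Build the path output/$name/label_$label_reps.npy."""
    return root / name / f"label_{label}_reps.npy"


def load_target_reps(
    name: str, target_label: int, root: Path = Path("output")
) -> np.ndarray:
    """Load the representations saved for the target label."""
    path = reps_path(name, target_label, root)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run rep_saver.py first")
    return np.load(path)


# For the example experiment, the target label is 4.
print(reps_path("r32p-sgd-94-1xp500", 4).as_posix())
```

Once rep_saver.py has run, `load_target_reps("r32p-sgd-94-1xp500", 4)` would return the array for the target label, and printing its `shape` and `dtype` is a quick sanity check.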
Run defences
We read the representations and execute the filters against them, producing three sample masks specifying which samples should be used for retraining.
julia --project=. run_filters.jl $name
This produces three files in output/$name/:
- mask-pca-target.npy for the PCA defense.
- mask-kmeans-target.npy for the Clustering defense.
- mask-rcov-target.npy for the SPECTRE defense.
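The following sketch shows one way a sample mask could be applied with NumPy. It assumes, without confirmation from the repository, that a mask is a one-dimensional boolean array with one entry per sample, where True means the sample is kept for retraining. The `apply_keep_mask` helper is hypothetical and the data is synthetic; check the dtype and shape of a real mask file before relying on this.

```python
import numpy as np


def apply_keep_mask(samples: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return the rows of `samples` whose mask entry is True.

    Assumption: `mask` is boolean with one entry per sample (True = keep).
    """
    mask = np.asarray(mask)
    if mask.dtype != np.bool_:
        raise TypeError("this sketch assumes a boolean mask")
    if mask.ndim != 1 or mask.shape[0] != samples.shape[0]:
        raise ValueError("mask must have exactly one entry per sample")
    return samples[mask]


# Synthetic stand-ins for a dataset of 5 samples and a mask over it.
samples = np.arange(10).reshape(5, 2)
mask = np.array([True, False, True, True, False])

kept = apply_keep_mask(samples, mask)
print(f"kept {kept.shape[0]} of {samples.shape[0]} samples")
```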
Retrain the networks on the cleaned datasets
poetry run python train.py $name $mask_name
This reads the mask from output/$name/$mask_name.npy and trains the network from scratch on the resulting masked dataset.
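The four steps above can be collected into a small driver. The sketch below only builds and prints the command lines already given in this README. The `pipeline_commands` helper is a hypothetical name, and the choice of `mask-rcov-target` as the mask is just an example.

```python
import shlex
from typing import List


def pipeline_commands(name: str, mask_name: str) -> List[List[str]]:
    """Return the argv lists for the full pipeline, in execution order."""
    return [
        # 1. Train a model on the poisoned dataset.
        ["poetry", "run", "python", "train.py", name],
        # 2. Save the hidden representations.
        ["poetry", "run", "python", "rep_saver.py", name],
        # 3. Run the filters to produce the sample masks.
        ["julia", "--project=.", "run_filters.jl", name],
        # 4. Retrain from scratch on the masked dataset.
        ["poetry", "run", "python", "train.py", name, mask_name],
    ]


commands = pipeline_commands("r32p-sgd-94-1xp500", "mask-rcov-target")
for argv in commands:
    print(shlex.join(argv))
```

To actually execute the steps, each `argv` could be passed to `subprocess.run(argv, check=True)` from the repository root, so that a failing step stops the pipeline.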
For attacks not implemented here, you will need to find a way to obtain the hidden representations of the network in npy format.
You can then put it in a directory under output with an arbitrary name, as long as the name ends in {eps_times_n}, which is needed to determine how many samples to remove.
You can then pass that name to run_filters.jl.
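Before running the filters on a custom directory, it can be useful to check that its name ends in an integer. The sketch below is a hypothetical check written for this README; how run_filters.jl actually parses the name is not shown here and may differ. The directory name `my-attack-500` is made up for illustration.

```python
import re


def trailing_sample_count(dir_name: str) -> int:
    """Read the run of digits at the end of a directory name."""
    match = re.search(r"(\d+)$", dir_name)
    if match is None:
        raise ValueError(f"{dir_name!r} does not end in an integer")
    return int(match.group(1))


print(trailing_sample_count("my-attack-500"))
```

Note that all trailing digits are read as one number, so a non-digit separator before {eps_times_n} avoids ambiguity: under this sketch, a name such as `attack2500` would be read as 2500 samples.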