activation-clustering poison detection

AI / ML · transform

(Dataset, Model) -> CleanedDataset

Detect poisoned training data by clustering the activation vectors from a model's deep layers separately for each class label, then removing the outlier sub-clusters found within each class.
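A minimal sketch of the idea, assuming the deep-layer activations have already been extracted into a NumPy array with one row per training sample. It is not the implementation from the attributed project: the function names (`reduce_dims`, `two_means`, `flag_suspects`), the PCA step, the choice of two clusters per class, and the `size_threshold` heuristic are all illustrative assumptions.

```python
import numpy as np

def reduce_dims(acts, n_components):
    """Project activations onto their top principal components (PCA via SVD)."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def two_means(points, n_iter=50):
    """Plain 2-means clustering; returns a 0/1 assignment per row."""
    # Deterministic init (an assumption made for reproducibility): start from
    # the two points at the extremes of the first coordinate.
    first = points[:, 0]
    centers = np.stack([points[first.argmin()], points[first.argmax()]]).astype(float)
    assign = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = points[assign == k].mean(axis=0)
    return assign

def flag_suspects(activations, labels, n_components=2, size_threshold=0.35):
    """Return a boolean mask marking samples in unusually small sub-clusters."""
    suspect = np.zeros(len(labels), dtype=bool)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        if len(idx) < 2:
            continue
        # Cluster only the samples that share this class label.
        assign = two_means(reduce_dims(activations[idx], n_components))
        sizes = np.bincount(assign, minlength=2)
        small = sizes.argmin()
        # Hypothetical heuristic: a sub-cluster holding well under half of the
        # class is treated as the outlier group.
        if 0 < sizes[small] / len(idx) < size_threshold:
            suspect[idx[assign == small]] = True
    return suspect

# Synthetic stand-in for real activations: class 0 hides 10 shifted samples.
rng = np.random.default_rng(0)
clean0 = rng.normal(0.0, 1.0, size=(90, 16))
poison0 = rng.normal(8.0, 1.0, size=(10, 16))
clean1 = rng.normal(-4.0, 1.0, size=(100, 16))
activations = np.vstack([clean0, poison0, clean1])
labels = np.array([0] * 100 + [1] * 100)
is_poison = np.zeros(200, dtype=bool)
is_poison[90:100] = True

suspect = flag_suspects(activations, labels)
cleaned_activations = activations[~suspect]  # rows kept for retraining
```

The relative-size test is only one way to decide which sub-cluster is anomalous; a class with no poison usually splits into two roughly even halves and is left alone, but the threshold would need tuning on real data.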

Problem it solves

Stealthy backdoor poisoning attacks plant malicious behavior through samples inserted into the training set without shifting the statistical means of the raw inputs, so checks on input statistics alone do not reveal them.

Consumes

Dataset, Model

Emits

CleanedDataset

Distilled from 1 source

The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.

Trusted-AI/adversarial-robustness-toolbox (GitHub)