Interpreting CNNs Under Adversarial Attacks

This work investigates how convolutional neural networks (CNNs) use different frequency components of input images, revealing that many adversarial vulnerabilities stem from a reliance on high-frequency features. The authors introduce Occluded Frequency, a metric that quantifies each frequency band's contribution to a model's predictions. They show that adversarial attacks perturb high-frequency content, and that robust models, particularly adversarially trained ones, tend to depend more on low-frequency information, which improves their resilience to perturbations.
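The band-wise occlusion idea can be sketched with a simple FFT mask: transform an image to the frequency domain, zero out one band, and invert the transform to see what the model receives. This is a minimal illustration of the general technique, not the paper's exact metric, and the function names are hypothetical:

```python
import numpy as np

def frequency_band_mask(shape, radius, keep_low=True):
    """Boolean mask over the centered 2-D FFT spectrum.

    Frequencies within `radius` of the spectrum center (low frequencies)
    pass when keep_low=True; otherwise only higher frequencies pass.
    """
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low = dist <= radius
    return low if keep_low else ~low

def occlude_frequencies(image, radius, keep_low=True):
    """Zero out one frequency band of a grayscale image.

    This is the kind of occlusion used to probe which frequency bands
    a model's prediction depends on.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    masked = spectrum * frequency_band_mask(image.shape, radius, keep_low)
    return np.real(np.fft.ifft2(np.fft.ifftshift(masked)))

# Example: the low- and high-frequency components are complementary,
# so they sum back to the original image.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low = occlude_frequencies(img, radius=8, keep_low=True)
high = occlude_frequencies(img, radius=8, keep_low=False)
print(np.allclose(low + high, img))  # True
```

Feeding the low-pass and high-pass versions of an input to a trained CNN and comparing its predictions is one way to estimate how much each band contributes.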

Refer to the paper (Wang et al., 2020) for details.

References

  1. Towards Frequency-Based Explanation for Robust CNN
    Zifan Wang, Yilin Yang, Ankit Shrivastava, et al.
    CoRR, Apr 2020