How Well do Feature Visualizations Support Causal Understanding of CNN Activations?

Published in NeurIPS 2021

Zimmermann, R. S., Borowski, J., Geirhos, R., Bethge, M., Wallis, T. S. A. and Brendel, W., How Well do Feature Visualizations Support Causal Understanding of CNN Activations?

A precise understanding of why units in an artificial network respond to certain stimuli would constitute a big step towards explainable artificial intelligence. One widely used approach towards this goal is to visualize unit responses via activation maximization. These feature visualizations are purported to provide humans with precise information about the image features that *cause* a unit to be activated — an advantage over other alternatives like highly activating dataset samples. If humans indeed gain causal insight from visualizations, this should enable them to predict the effect of an intervention, such as how occluding a certain patch of the image (say, a dog's head) changes a unit's activation. Here, we test this hypothesis by asking humans to decide which of two square occlusions causes a larger change to a unit's activation. Both a large-scale crowdsourced experiment and measurements with experts show that on average the extremely activating feature visualizations by Olah et al. 2017 indeed help humans on this task ($68 \pm 4$% accuracy; baseline performance without any visualizations is $60 \pm 3$%). However, they do not provide any significant advantage over other visualizations (such as dataset samples), which yield very similar performance ($65 \pm 3$% to $66 \pm 5$% accuracy). Taken together, we propose an objective psychophysical task to quantify the benefit of unit-level interpretability methods for humans, and find no evidence that a widely used feature visualization method provides humans with better "causal understanding" of unit activations than simple alternative visualizations.

Project website    Full paper

@article{zimmermann2021feature,
 author  = {Zimmermann, Roland S. and
            Borowski, Judy and
            Geirhos, Robert and
            Bethge, Matthias and
            Wallis, Thomas S. A. and
            Brendel, Wieland},
 title   = {How Well do Feature Visualizations
            Support Causal Understanding
            of CNN Activations?},
 journal = {CoRR},
 volume  = {abs/2106.12447},
 year    = {2021},
}