Demystifying amortized causal discovery with transformers


Journal article


Francesco Montagna, Max Cairney-Leeming, Dhanya Sridhar, Francesco Locatello
arXiv.org, 2024

Cite

APA
Montagna, F., Cairney-Leeming, M., Sridhar, D., & Locatello, F. (2024). Demystifying amortized causal discovery with transformers. arXiv.org.


Chicago/Turabian
Montagna, Francesco, Max Cairney-Leeming, Dhanya Sridhar, and Francesco Locatello. “Demystifying Amortized Causal Discovery with Transformers.” arXiv.org (2024).


MLA
Montagna, Francesco, et al. “Demystifying Amortized Causal Discovery with Transformers.” arXiv.org, 2024.


BibTeX

@article{francesco2024a,
  title = {Demystifying amortized causal discovery with transformers},
  year = {2024},
  journal = {arXiv.org},
  author = {Montagna, Francesco and Cairney-Leeming, Max and Sridhar, Dhanya and Locatello, Francesco}
}

Abstract

Supervised learning approaches for causal discovery from observational data often achieve competitive performance despite seemingly avoiding explicit assumptions that traditional methods make for identifiability. In this work, we investigate CSIvA (Ke et al., 2023), a transformer-based model promising to train on synthetic data and transfer to real data. First, we bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations. Consistent with classical approaches, good performance is achieved when we have a good prior on the test data, and the underlying model is identifiable. At the same time, we find new trade-offs. Training on datasets generated from different classes of causal models, unambiguously identifiable in isolation, improves the test generalization. Performance is still guaranteed, as the ambiguous cases resulting from the mixture of identifiable causal models are unlikely to occur (which we formally prove). Overall, our study finds that amortized causal discovery still needs to obey identifiability theory, but it also differs from classical methods in how the assumptions are formulated, trading more reliance on assumptions on the noise type for fewer hypotheses on the mechanisms.
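
To make the notion of identifiability in the abstract concrete, the following is a minimal sketch of a textbook bivariate example. It is not taken from the paper and does not involve CSIvA. It assumes a linear model y = x + noise and compares Gaussian noise (where the causal direction cannot be recovered from observational data) with uniform noise (where it can). The function names `simulate` and `residual_dependence` are hypothetical, and the dependence score used here (correlation between squared regressor and squared residual) is a crude proxy for illustration, not a proper independence test.

```python
import numpy as np


def gaussian_noise(rng, n):
    """Standard normal noise (assumed for the non-identifiable case)."""
    return rng.normal(size=n)


def uniform_noise(rng, n):
    """Uniform noise on [-1, 1] (assumed for the identifiable case)."""
    return rng.uniform(-1.0, 1.0, size=n)


def simulate(noise, n=50_000, seed=0):
    """Sample from the linear model x -> y with y = x + noise."""
    rng = np.random.default_rng(seed)
    x = noise(rng, n)
    y = x + noise(rng, n)
    return x, y


def residual_dependence(regressor, target):
    """Fit target = slope * regressor + residual by least squares and
    return |corr(regressor**2, residual**2)|.

    A value near zero is compatible with an independent residual;
    a clearly non-zero value indicates dependence.
    """
    regressor = regressor - regressor.mean()
    target = target - target.mean()
    slope = (regressor @ target) / (regressor @ regressor)
    residual = target - slope * regressor
    return abs(np.corrcoef(regressor**2, residual**2)[0, 1])


if __name__ == "__main__":
    for name, noise in [("gaussian", gaussian_noise), ("uniform", uniform_noise)]:
        x, y = simulate(noise)
        forward = residual_dependence(x, y)   # candidate direction x -> y
        backward = residual_dependence(y, x)  # candidate direction y -> x
        print(f"{name:8s} forward={forward:.3f} backward={backward:.3f}")
```

Under these assumptions, both directions look equally plausible with Gaussian noise, while with uniform noise only the true direction leaves a residual that appears independent of the regressor. This is the kind of assumption on the noise type that the abstract refers to.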