Course syllabus#

Content#

Eight three-hour sessions presenting key concepts and methods of machine learning for econometrics. The course focuses on flexible models and causal inference in high dimensions. Most sessions combine theoretical considerations with hands-on practical applications in Python or R.

The last session is devoted to group presentations of the course evaluation project.

For now, the website covers only the following topics:

    1. Statistical learning and regularized linear models

    2. Flexible models for tabular data

    3. Reminders of potential outcomes and Directed Acyclic Graphs

    4. Event studies: Causal methods for panel data

Motivation#

High dimensions motivate the methods covered: sparsity in confounders (lasso, double lasso), nonlinearities in confounders (double machine learning), and heterogeneity of effects (generic machine learning).

Session 1 – Statistical learning and regularized linear models#

  • Reminders of statistical learning: bias-variance tradeoff, appropriate representations, over- and under-fitting

  • Regularized regression: lasso, ridge, elastic net, post-lasso

  • Practical session:

    • Common pitfalls in the interpretation of coefficients of linear models.

  • References:

    • Estève et al. (2022)

    • Hastie et al. (2017)
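A minimal illustration of these estimators with scikit-learn, on a made-up sparse simulation (not the course's official notebook): when only a few coefficients are truly nonzero, the lasso zeroes out most of the irrelevant ones, while the ridge keeps all of them small.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3, -2, 1.5, -1, 0.5]          # sparse truth: 5 active coefficients
y = X @ beta + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lasso = Lasso(alpha=0.1).fit(X_tr, y_tr)  # L1 penalty: sparse solution
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)  # L2 penalty: shrinks, never zeroes

n_nonzero = int(np.sum(lasso.coef_ != 0))
print(f"lasso nonzero coefficients: {n_nonzero} / {p}")
print(f"lasso test R^2: {lasso.score(X_te, y_te):.2f}")
print(f"ridge test R^2: {ridge.score(X_te, y_te):.2f}")
```

In practice the penalty strength `alpha` would be chosen by cross-validation (e.g. `LassoCV`) rather than fixed as here.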

Session 2 – Flexible models for tabular data#

  • Trees, random forests, boosting

  • Cross-validation, nested cross-validation

  • Practical session: Hyper-parameter selection for flexible models

  • References:
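The distinction between plain and nested cross-validation can be sketched as follows with scikit-learn, on simulated data (the grid and model choices are illustrative): the inner loop selects hyper-parameters, and the outer loop evaluates the whole selection procedure on data it never saw, avoiding the optimistic bias of reporting the inner score.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Simulated nonlinear regression problem
X, y = make_friedman1(n_samples=300, noise=1.0, random_state=0)

# Inner loop: grid search selects hyper-parameters on each training split
inner = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, None], "min_samples_leaf": [1, 5]},
    cv=3,
)

# Outer loop: scores the *whole* selection procedure on held-out folds,
# so the reported performance is not inflated by the hyper-parameter search
outer_scores = cross_val_score(inner, X, y, cv=3)
print(f"nested-CV R^2: {outer_scores.mean():.2f} +/- {outer_scores.std():.2f}")
```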

Session 3 – Potential outcomes, Directed Acyclic Graphs, confounder selection#

  • Reminders on causal inference: prediction vs. causation, potential outcomes, asking a sound causal question (PICO)

  • Causal graphs, the front-door criterion, and valid adjustment sets.

  • Practical Session: DAGs, valid and invalid adjustment sets, with simple linear models and simulations. Introduction to the DoubleML package.

  • References:
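A toy simulation in the spirit of this practical session (variable names and parameter values are made up; the course notebook uses the DoubleML package): with a confounder W affecting both treatment and outcome, the empty adjustment set is invalid and the naive comparison is biased, while adjusting for W recovers the true effect.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 10_000
w = rng.normal(size=n)                          # confounder: W -> D and W -> Y
d = (w + rng.normal(size=n) > 0).astype(float)  # binary treatment
y = 2.0 * d + 3.0 * w + rng.normal(size=n)      # true effect of D on Y is 2

# Invalid adjustment set (empty): the raw comparison mixes in the effect of W
naive = y[d == 1].mean() - y[d == 0].mean()

# Valid adjustment set {W}: conditioning on the confounder recovers the effect
adjusted = LinearRegression().fit(np.column_stack([d, w]), y).coef_[0]

print(f"naive difference in means: {naive:.2f}")
print(f"adjusting for W:           {adjusted:.2f}")
```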

Session 4a – Event studies: Causal methods for panel data#

  • A causal approach to Difference-in-Differences

  • Synthetic controls

  • Interrupted time series analysis and state space models

  • Practical session: Comparison of different methods for panel data

  • References:
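As a warm-up for this comparison, the basic 2x2 difference-in-differences estimator can be computed by hand on simulated panel-style data (the group/period structure and effect size are made up for illustration): differencing over time removes time-invariant group differences, and differencing across groups removes the common time trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8_000
group = rng.integers(0, 2, n)   # 1 = treated group
post = rng.integers(0, 2, n)    # 1 = after the policy
effect = 1.5                    # true treatment effect
y = (0.5 * group                # time-invariant group difference
     + 1.0 * post               # common time trend shared by both groups
     + effect * group * post    # treatment hits only the treated group, post-policy
     + rng.normal(size=n))

cell = lambda g, t: y[(group == g) & (post == t)].mean()
did = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
print(f"difference-in-differences estimate: {did:.2f}")
```

The same number can be obtained as the interaction coefficient in the regression of y on group, post, and group x post.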

Session 4b – Double-lasso for statistical inference#
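A sketch of the partialling-out version of the double lasso on simulated data (all names and parameter values here are illustrative, not the course's notebook): residualize both the outcome and the treatment on the high-dimensional controls with a lasso, then regress the residuals on each other, following Frisch-Waugh-Lovell logic with variable selection.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 500, 100
X = rng.normal(size=(n, p))               # high-dimensional controls
gamma = np.zeros(p); gamma[:3] = 1.0      # sparse effect of controls on D
beta = np.zeros(p); beta[:3] = 1.0        # sparse effect of controls on Y
d = X @ gamma + rng.normal(size=n)
theta = 0.5                               # true treatment effect
y = theta * d + X @ beta + rng.normal(size=n)

# Step 1: residualize Y and D on X, with lasso doing the variable selection
ry = y - LassoCV(cv=5).fit(X, y).predict(X)
rd = d - LassoCV(cv=5).fit(X, d).predict(X)

# Step 2: OLS of residualized Y on residualized D recovers theta
theta_hat = LinearRegression().fit(rd.reshape(-1, 1), ry).coef_[0]
print(f"double-lasso estimate of theta: {theta_hat:.2f}")
```

Using the lasso twice (once for each nuisance regression) is what makes the final estimate insensitive to small selection mistakes, unlike a single naive lasso of y on (d, X).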

Session 5 – Double machine learning: Neyman-orthogonality#

  • Importance of sample splitting for double machine learning

  • Doubly robust estimation (also known as augmented inverse propensity weighting)

  • Debiased (or double) machine learning, Neyman-orthogonality, method-of-moments

  • Practical session: The Effect of Gun Ownership on Gun-Homicide Rates

  • References:
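These three ingredients can be sketched together on simulated data (a hand-rolled illustration under made-up parameter values, not the gun-ownership notebook): fit the nuisance models on one fold, evaluate the AIPW score on the other (cross-fitting), and average the scores. The AIPW moment is Neyman-orthogonal, so first-order errors in the nuisance estimates do not bias the ATE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 4_000
x = rng.normal(size=(n, 5))
e_true = 1 / (1 + np.exp(-x[:, 0]))     # true propensity score
d = rng.binomial(1, e_true)
y = 1.0 * d + x[:, 0] + np.sin(x[:, 1]) + rng.normal(size=n)  # true ATE = 1

scores = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(x):
    # Nuisances are fit on the *other* fold: this is the sample splitting
    prop = LogisticRegression().fit(x[train], d[train])
    m1 = RandomForestRegressor(random_state=0).fit(
        x[train][d[train] == 1], y[train][d[train] == 1])
    m0 = RandomForestRegressor(random_state=0).fit(
        x[train][d[train] == 0], y[train][d[train] == 0])

    e = np.clip(prop.predict_proba(x[test])[:, 1], 0.05, 0.95)
    mu1, mu0 = m1.predict(x[test]), m0.predict(x[test])
    dt, yt = d[test], y[test]
    # AIPW (doubly robust) score: outcome-model part plus weighted residuals
    scores[test] = (mu1 - mu0
                    + dt * (yt - mu1) / e
                    - (1 - dt) * (yt - mu0) / (1 - e))

ate = scores.mean()
print(f"cross-fitted AIPW estimate of the ATE: {ate:.2f}")
```

Production code would use the DoubleML package covered in the course rather than this manual loop; the loop is only meant to make the moment condition and the cross-fitting explicit.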

Session 6 – Heterogeneous treatment effects#

Session 7 – Heterogeneous treatment effects#

  • Practical session:

    • Gaillac and L’Hour (2019), Chapter 6

    • Kitagawa and Tetenov (2018)
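One simple baseline for estimating heterogeneous effects is the T-learner, sketched here on a made-up randomized experiment (not the course's exercise): fit a separate outcome model per treatment arm and take the difference of their predictions as the conditional average treatment effect (CATE).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 5_000
x = rng.uniform(-1, 1, size=(n, 3))
tau = np.where(x[:, 0] > 0, 2.0, 0.0)   # effect exists only where x0 > 0
d = rng.binomial(1, 0.5, n)             # randomized treatment
y = tau * d + x[:, 1] + rng.normal(scale=0.5, size=n)

# T-learner: one outcome model per arm, CATE = difference of predictions
m1 = RandomForestRegressor(random_state=0).fit(x[d == 1], y[d == 1])
m0 = RandomForestRegressor(random_state=0).fit(x[d == 0], y[d == 0])
cate = m1.predict(x) - m0.predict(x)

print(f"mean CATE where x0 > 0:  {cate[x[:, 0] > 0].mean():.2f}")
print(f"mean CATE where x0 <= 0: {cate[x[:, 0] <= 0].mean():.2f}")
```

Causal forests (Wager and Athey, 2018) and the R-learner (Nie and Wager, 2021) covered in these sessions improve on this baseline by targeting the effect function directly rather than differencing two separate fits.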

Bibliography#

[1]

Loïc Estève, Guillaume Lemaitre, Olivier Grisel, Gael Varoquaux, Arturo Amor, Lilian, Benoit Rospars, Thomas Schmitt, Lucy Liu, Bruno P. Kinoshita, hackmd-deploy, ph4ge, Peter Steinbach, Alexandre Boucaud, Benson Muite, Jérémie du Boisberranger, Michael Notter, Pierre, Shane P, alagarrigue, Mehrdad Mohammadian, and parmentelat. Inria/scikit-learn-mooc: third mooc session. Zenodo, 2022. URL: https://doi.org/10.5281/zenodo.7220307, doi:10.5281/zenodo.7220307.

[2]

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference, and prediction. 2017.

[3]

Kevin P Murphy. Probabilistic machine learning: an introduction. MIT press, 2022.

[4]

Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. Applied causal inference powered by ML and AI. arXiv preprint arXiv:2403.02467, 2024. URL: https://causalml-book.org.

[5]

Stefan Wager. Stats 361: causal inference. Stanford University, 2020. URL: https://web.stanford.edu/~swager/stats361.pdf.

[6]

Tyler J VanderWeele. Principles of confounder selection. European journal of epidemiology, 34:211–219, 2019.

[7]

C Gaillac and J L’Hour. Machine learning for econometrics, lecture notes ensae paris. Lecture notes, 2019.

[8]

Alberto Abadie. Using synthetic controls: feasibility, data requirements, and methodological aspects. Journal of economic literature, 59(2):391–425, 2021.

[9]

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.

[10]

Xinkun Nie and Stefan Wager. Quasi-oracle estimation of heterogeneous treatment effects. Biometrika, 108(2):299–319, 2021.

[11]

Toru Kitagawa and Aleksey Tetenov. Who should be treated? empirical welfare maximization methods for treatment choice. Econometrica, 86(2):591–616, 2018.