Yunyan Duan

Data Science | Linguistics | Cognitive Science

All posts in one long list


Resources for learning causal inference

Causal inference is a family of statistical methods for estimating the causal effect of a factor on an outcome. While common statistical tools (e.g. regression) describe correlations between variables, causal inference goes beyond correlation and tries to estimate the causal influence of one variable on another. What ‘causal’ actually means can be a difficult philosophical question, but we can simply say that variable A (called an ‘intervention’ or a ‘treatment’) has a causal effect on variable B (called an ‘outcome’) if B would be different without A. The counterfactual (what would have happened without A) is thus a useful concept for thinking about causality. Nowadays, data scientists may need causal inference to estimate the effect of an intervention from observational data when an A/B test of that intervention is unavailable or too expensive.
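To make the counterfactual idea concrete, here is a minimal sketch in Python. It assumes an imaginary world in which both outcomes of every unit are known (with and without the treatment), which is never the case in real data. The `Unit` class, the numbers, and the function names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    y_without: float  # outcome B if the unit does NOT receive treatment A
    y_with: float     # outcome B if the unit DOES receive treatment A
    treated: bool     # which of the two worlds actually happened


# Hypothetical toy data; in practice only one of the two outcomes is observed.
units = [
    Unit(y_without=10.0, y_with=12.0, treated=True),
    Unit(y_without=8.0, y_with=11.0, treated=False),
    Unit(y_without=15.0, y_with=15.0, treated=True),
    Unit(y_without=9.0, y_with=13.0, treated=False),
]


def average_treatment_effect(units):
    """Mean of the individual effects y_with - y_without (needs both worlds)."""
    effects = [u.y_with - u.y_without for u in units]
    return sum(effects) / len(effects)


def naive_difference(units):
    """Difference in observed means between treated and untreated units."""
    treated = [u.y_with for u in units if u.treated]
    control = [u.y_without for u in units if not u.treated]
    return sum(treated) / len(treated) - sum(control) / len(control)


print(average_treatment_effect(units))  # true average effect in this toy world
print(naive_difference(units))          # what a simple comparison would report
```

In this toy data the naive comparison is larger than the true average effect, because the treated units happen to have higher outcomes even without the treatment. Causal inference methods try to close that gap using only the observed outcomes.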

A famous example of the pitfalls of looking only at correlations in the data is Simpson’s paradox, in which a trend that appears within every subgroup reverses or disappears when the subgroups are pooled. Since confounding factors are widespread in observational data, misinterpretations based on correlation are very likely. This may not be a problem if the task is not interpretation: prediction may actually benefit from correlated features as long as the data and the model generalize. But if interpretation is what one really cares about, then one should look for the causal effect of the intervention on the outcome. Whenever possible, an A/B test should be carried out and conclusions should be based on the experimental data, as this approach is the gold standard for measuring an intervention’s effect. If an A/B test is not possible, as when studying the influence of a state-wide policy or some economic phenomenon, then causal inference should be adopted instead of other commonly used statistical tools. Even so, one should always be cautious when drawing causal conclusions from observational data, as the assumptions of a causal inference method may or may not be satisfied.
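Simpson’s paradox is easy to reproduce with a few numbers. The sketch below uses made-up recovery counts (not real data) in which the treatment looks better within each severity group but worse once the groups are pooled, because most treated units are severe cases. The last function shows one simple adjustment, assuming severity is the only confounder.

```python
# Made-up counts for illustration only: (recovered, total) per severity and arm.
counts = {
    "mild":   {"treated": (18, 20), "untreated": (64, 80)},
    "severe": {"treated": (40, 80), "untreated": (8, 20)},
}


def stratum_rates(counts):
    """Recovery rate per arm within each severity group."""
    return {
        group: {arm: ok / n for arm, (ok, n) in arms.items()}
        for group, arms in counts.items()
    }


def pooled_rates(counts):
    """Recovery rate per arm after ignoring severity (pooling the groups)."""
    totals = {}
    for arms in counts.values():
        for arm, (ok, n) in arms.items():
            ok_sum, n_sum = totals.get(arm, (0, 0))
            totals[arm] = (ok_sum + ok, n_sum + n)
    return {arm: ok / n for arm, (ok, n) in totals.items()}


def adjusted_difference(counts):
    """Average the within-group differences, weighted by group size."""
    rates = stratum_rates(counts)
    sizes = {g: sum(n for _, n in arms.values()) for g, arms in counts.items()}
    total = sum(sizes.values())
    return sum(
        sizes[g] / total * (rates[g]["treated"] - rates[g]["untreated"])
        for g in counts
    )


by_group = stratum_rates(counts)
pooled = pooled_rates(counts)
print(by_group)                     # treated is better in both groups
print(pooled)                       # treated looks worse after pooling
print(adjusted_difference(counts))  # positive once severity is accounted for
```

The pooled comparison and the adjusted one point in opposite directions here. Which one to trust depends on the causal structure, which the data alone do not reveal.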

Here is a list of resources for beginners who may want to use causal inference in their work:

  • I highly recommend starting with this series of blog posts, as they are very beginner-friendly and give an overview of causal inference.

  • For a high-level understanding of causal inference, I recommend The Book of Why by Judea Pearl. You may also check out the author’s page for more books/tutorials on related topics.

  • There are many powerful Python and R packages. Here are my picks:
    • CausalML, a Python package from Uber. Easy to use, with many causal inference methods.
    • EconML and DoWhy, two Python packages from Microsoft. Great documentation and theoretical explanations.
    • grf, an R package that implements a tree-based algorithm called ‘causal forest’. See these papers for more details (paper1, paper2).
  • The above-mentioned packages provide good tutorials and examples. In addition, this page lists several industrial use cases with slides and code, as presented at KDD 2021. This page shows a detailed example of running meta-learners using CausalML.

  • Some resources in Chinese:
    • This blog for introduction to causal inference, especially propensity score matching.
    • This series of notes for a glimpse of The Book of Why and more illustrations of the theory.
    • This talk for a general introduction to causal inference.
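The list above mentions meta-learners. As a rough illustration of the idea, here is a sketch of a T-learner, one kind of meta-learner: fit one outcome model on the treated units, another on the untreated units, and take the difference of their predictions as the estimated effect. This is plain Python with a one-feature least-squares line as the base model; it does not use the CausalML API, and the data and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def t_learner(xs, treated, ys):
    """Return a function x -> estimated treatment effect at x."""
    x1 = [x for x, t in zip(xs, treated) if t]
    y1 = [y for y, t in zip(ys, treated) if t]
    x0 = [x for x, t in zip(xs, treated) if not t]
    y0 = [y for y, t in zip(ys, treated) if not t]
    a1, b1 = fit_line(x1, y1)  # outcome model for treated units
    a0, b0 = fit_line(x0, y0)  # outcome model for untreated units
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


# Noise-free toy data: untreated y = 1 + 2x, treated y = 4 + 3x,
# so the true effect at x is 3 + x.
xs = [0.0, 1.0, 2.0, 3.0, 0.5, 1.5, 2.5, 3.5]
treated = [False, False, False, False, True, True, True, True]
ys = [1 + 2 * x if not t else 4 + 3 * x for x, t in zip(xs, treated)]

effect_at = t_learner(xs, treated, ys)
print(effect_at(2.0))  # estimated effect for a unit with x = 2
```

With real data the base models would usually be more flexible (for example, gradient-boosted trees), and the estimate is only meaningful if the treated and untreated groups are comparable given the features.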

Tower of Babel in Fanworks: An Analysis of AO3 Tags

This is an analysis of fandom tags in terms of their media and language. It also serves as a guide to plotting in Python.

Report here: report.

Code here: code.

How to build this website

Here is what I did to build this, my very first website. I used Jekyll, a static website generator, to build the website, and GitHub Pages to host it.

For building the website with Jekyll, I found this video very useful. You can start with a theme you like from Jekyll Themes; my theme is jekyllDecent.

For publishing the website with GitHub Pages, I basically followed this tutorial and this video (I didn’t follow their suggestion to modify head.html, and my website works just fine).

Tips:

  • Installing Jekyll: If you use a Mac like I do, it’s likely that gem install jekyll does not work (you may check this troubleshooting page). Try sudo gem install jekyll instead.

  • Publishing the website: You need to go to _config.yml and change url: and baseurl: to match the base hostname & protocol for your site. If you use the same theme as I do, modify robots.txt as well. Make sure your url starts with https://, e.g. https://<yourusername>.github.io; using http://... may lead to a display error like “This page is trying to load scripts from unauthenticated sources” (find the solution here).

  • To change the order in which your pages appear in the navigation, go to _pages and add a weight: <#orderYouWant> parameter to the front matter (the section at the top that contains layout:) of each page.
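As a rough sketch of the last two tips, the settings might look like the fragment below. The values are placeholders: the empty baseurl assumes a user site served from the root of https://<yourusername>.github.io, and the layout, title, and weight values in the page front matter are assumptions for illustration, so check what your theme expects.

```yaml
# _config.yml (placeholders; adjust to your own site)
url: "https://<yourusername>.github.io"  # https, not http
baseurl: ""                              # assumption: site is served from the root

# Front matter at the top of a page in _pages (hypothetical values):
# ---
# layout: page
# title: About
# weight: 2    # smaller or larger first depends on how the theme sorts
# ---
```

The front matter is shown commented out only so that both pieces fit in one fragment; in the actual page file the lines are written without the leading `#`.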