Yunyan Duan

Data Science | Computational Linguistics | Cognitive Science

Rational models of eye movements in reading

(Research at Northwestern University)

To understand how eye movement control and word identification work in reading, we use rational analysis and computational modeling to study how different sources of information (visual, linguistic, and contextual) interact to identify words and drive eye movement decisions.
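As a toy illustration of the rational-analysis framing, and not the models used in this research, word identification can be treated as Bayesian inference. A prior over candidate words (standing in for linguistic and contextual information) is combined with the likelihood of noisy visual samples of each letter. The symmetric letter-noise model, the `accuracy` parameter, and the function names below are all hypothetical assumptions.

```python
import math

def letter_likelihood(observed: str, true_letter: str, accuracy: float = 0.8) -> float:
    """P(observed letter | true letter) under a hypothetical symmetric noise model."""
    # With probability `accuracy` the true letter is perceived; otherwise the
    # percept is spread uniformly over the other 25 letters of the alphabet.
    if observed == true_letter:
        return accuracy
    return (1.0 - accuracy) / 25

def word_posterior(samples, prior, accuracy=0.8):
    """Posterior over candidate words given noisy visual samples.

    samples: list of observed strings, each the same length as the candidates.
    prior: dict mapping word -> prior probability from context (hypothetical).
    """
    log_scores = {}
    for word, p in prior.items():
        score = math.log(p)
        # Each sample contributes one likelihood term per letter position.
        for sample in samples:
            for obs, true in zip(sample, word):
                score += math.log(letter_likelihood(obs, true, accuracy))
        log_scores[word] = score
    # Normalize in log space for numerical stability.
    max_log = max(log_scores.values())
    unnorm = {w: math.exp(s - max_log) for w, s in log_scores.items()}
    total = sum(unnorm.values())
    return {w: v / total for w, v in unnorm.items()}
```

With no samples the posterior equals the prior. Each additional sample (roughly, more time spent fixating a word) multiplies in more likelihood terms and sharpens the posterior. In a toy model like this, one *hypothetical* eye-movement rule would be to move on once the most probable word crosses a confidence threshold.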

Reduce word error rate in speech-to-text transcriptions

(Internship at Tencent, Summer 2020)

  • Evaluated and improved the quality of Mandarin speech-to-text subtitles to increase user engagement with an audiobook app.
  • Measured word error rate (WER) by annotating representative samples from a corpus of 1.8k documents (8 million tokens). Then analyzed the root causes of word errors to identify a specific genre to target for improvement.
  • Implemented and evaluated text error correction pipelines built from different components, including language models, neural network models, and named entity recognition (NER) models, and reported the best-performing model based on the evaluation results.
  • Integrated the model that performed best in offline evaluation into the subtitle feature, reducing word error rate and ensuring satisfactory subtitle quality.
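For reference, word error rate is the token-level edit distance between a reference transcript and the system output, divided by the number of reference tokens. The following is a minimal sketch that assumes transcripts are already segmented into tokens. For Mandarin, this means word segmentation beforehand; using single characters instead gives character error rate. The example sentence is made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference).

    Both arguments are lists of tokens.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one token")
    # dist[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference tokens
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[n][m] / n

# One substitution (听 -> 看) out of four reference tokens: 0.25
print(word_error_rate(["我", "喜欢", "听", "书"], ["我", "喜欢", "看", "书"]))
```

To get a corpus-level figure over annotated samples, sum the edit counts and the reference lengths across all samples before dividing. Averaging per-sample rates instead would overweight short samples.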

Predict readmission in pediatric ICU from text data

(Research Assistant at Feinberg School of Medicine, Northwestern University, 2016 – 2017)

Electronic health records, especially social workers’ notes, can carry useful information for predicting readmission after a pediatric ICU stay. We operationalize this idea by extracting text features from these unstructured notes and training classifiers to predict the probability of readmission.
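The project description does not say which text features or classifiers were used. As a minimal illustration of the general approach, the following sketch uses a bag-of-words multinomial naive Bayes classifier with add-one smoothing, which returns a readmission probability for a note. The tokenizer, class name, and methods are hypothetical, and naive Bayes is a swapped-in example rather than the project's model.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphabetic tokens; real clinical text needs much more care."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesText:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        """docs: list of note strings; labels: list of class labels (e.g. 0/1)."""
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        """Return a dict mapping each class to its posterior probability."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # ignore words never seen in training
                    s += math.log((self.word_counts[c][tok] + 1) / (self.total[c] + v))
            scores[c] = s
        # Softmax over log scores, shifted by the max for numerical stability.
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}
```

Usage follows the familiar fit/predict pattern. Call `NaiveBayesText().fit(notes, labels)` with 1 marking readmission, then read `predict_proba(note)[1]`. A real pipeline would also need patient-level train/test splits and calibration checks before the probabilities are trusted.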

Two Shanghai library open data challenge projects

(Part-time)

  • (2019) Architecture highlights in Shanghai: We (a team of 7) developed a website featuring historically important buildings in Shanghai. I was responsible for the core backend logic, a text classification model that sorted buildings into general categories based on their short descriptions.
  • (2018) Word evolution in ancient Chinese poems: I developed a website to help researchers gain insights into word evolution, stylistic change, and social evolution as reflected in ancient Chinese poems over hundreds of years. I implemented the website in Python/Django and visualized data patterns with R Shiny.
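One basic signal behind the word-evolution view is a word's relative frequency in each period. Periods contribute very different amounts of text, so raw counts would be misleading. The following is a minimal sketch that assumes each poem is already tokenized and tagged with a period label. Tokenizing by character is a simplification, since many classical Chinese words are single characters. The function name is hypothetical.

```python
from collections import Counter

def relative_frequency_by_period(poems, word):
    """Return {period: occurrences of `word` per 1,000 tokens}.

    poems: iterable of (period, tokens) pairs, where tokens is a list of strings.
    """
    word_counts = Counter()
    token_totals = Counter()
    for period, tokens in poems:
        token_totals[period] += len(tokens)
        word_counts[period] += sum(1 for t in tokens if t == word)
    # Normalize per 1,000 tokens so periods with more surviving poems
    # are not over-represented.
    return {
        period: 1000 * word_counts[period] / total
        for period, total in token_totals.items()
        if total > 0
    }
```

For example, pass `[("唐", list("床前明月光")), ("宋", list("明月几时有"))]` with the word `"光"` to compare the two periods. Plotting the resulting series over time is the kind of pattern a dashboard such as R Shiny can display.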

Causal inference

I use causal inference techniques (e.g., propensity score weighting, meta-learners, uplift modeling) to understand whether and when a strategy helps product growth.
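As a small illustration of propensity score weighting, the following is the inverse-propensity-weighted (Horvitz-Thompson) estimate of the average treatment effect. In this sketch the propensity is estimated as the treated share within each discrete user segment. That is a simplifying assumption; a real analysis would usually fit a model such as logistic regression on many covariates. The row format and function name are hypothetical.

```python
from collections import defaultdict

def ipw_ate(rows):
    """IPW estimate of the average treatment effect.

    rows: list of (segment, treated, outcome) tuples with treated in {0, 1}.
    Propensity e(segment) = share of treated rows in that segment (assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [n_rows, n_treated]
    for seg, t, _ in rows:
        counts[seg][0] += 1
        counts[seg][1] += t
    n = len(rows)
    treated_sum = control_sum = 0.0
    for seg, t, y in rows:
        e = counts[seg][1] / counts[seg][0]
        # Weighting requires overlap: every segment needs treated and control rows.
        if not 0 < e < 1:
            raise ValueError(f"no overlap in segment {seg!r}")
        if t == 1:
            treated_sum += y / e          # weight treated rows by 1 / e
        else:
            control_sum += y / (1 - e)    # weight control rows by 1 / (1 - e)
    return treated_sum / n - control_sum / n
```

When propensities are estimated from segment shares like this, the estimate equals a size-weighted average of the within-segment differences in mean outcome. That makes it easy to see how weighting removes confounding by segment. Consider a case where one segment is mostly treated and already converts well. A naive treated-versus-control comparison would credit the strategy with that segment's baseline, but the weighted estimate does not.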