Data

The Emutivo research has leveraged datasets collected both externally and internally. Each dataset is described below, along with any papers our team has published using it. The git repository linked after each paper contains that paper's feature datasets, model code, results, and/or additional visualizations. All internal datasets were collected under WPI IRB 00007374, File 18-0031, first approved on 23 October 2017.

LEMURS

Collection of the Leveraging Early Mental Health Uncovering Risk for Suicide (LEMURS) dataset began in November 2025 and is ongoing under Prof. Rundensteiner and Prof. Dixon-Gordon. Learn more here.

DepreST-CAT

The DepreST Call and Text (DepreST-CAT) dataset was collected between December 2020 and April 2021 by Miranda Reisch, ML Tlachac, and Prof. Rundensteiner. It contains retrospective call and text logs from over 369 participants crowd-sourced through Prolific, labeled with demographics, PHQ-9 depression screening scores, and GAD-7 anxiety screening scores.
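Several of the datasets on this page are labeled with raw PHQ-9 (0 to 27) and GAD-7 (0 to 21) scores. The sketch below shows one way to turn those scores into the commonly used severity bands and into a binary screening label. It is a minimal illustration, assuming the standard published band boundaries and a cutoff of 10; the function names are hypothetical, and the individual papers may binarize scores with different thresholds.

```python
from typing import List, Tuple

# Commonly used severity bands (inclusive score ranges) for each instrument.
PHQ9_BANDS: List[Tuple[int, int, str]] = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 19, "moderately severe"),
    (20, 27, "severe"),
]
GAD7_BANDS: List[Tuple[int, int, str]] = [
    (0, 4, "minimal"),
    (5, 9, "mild"),
    (10, 14, "moderate"),
    (15, 21, "severe"),
]


def severity(score: int, bands: List[Tuple[int, int, str]]) -> str:
    """Return the severity label whose inclusive range contains the score."""
    for low, high, label in bands:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the instrument's valid range")


def screen_positive(score: int, cutoff: int = 10) -> bool:
    """Binary screening label; the cutoff of 10 is an assumption, not a rule."""
    return score >= cutoff


if __name__ == "__main__":
    for phq9 in (3, 12, 22):
        print(phq9, severity(phq9, PHQ9_BANDS), screen_positive(phq9))
```

Keeping the bands as data rather than nested `if` statements makes it easy to swap in a different threshold scheme when reproducing a specific paper.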

Paper: https://dl.acm.org/doi/10.1145/3534596
Data: https://github.com/mltlachac/DepreST-CAT

If you use the DepreST-CAT dataset, cite:

ML Tlachac, Ricardo Flores, Miranda Reisch, Katie Housekeeper, Elke Rundensteiner, “DepreST-CAT: Retrospective Smartphone Call and Text Logs Collected During the COVID-19 Pandemic to Screen for Mental Illnesses”, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), vol. 6, no. 2, 2022.

Additional papers that leverage the DepreST-CAT logs include:

StudentSADD

The Student Suicidal Ideation and Depression Detection (StudentSADD) dataset was collected between August 2020 and January 2021 by the 2020 REU team and the 2020-2021 MQP team, advised by ML Tlachac and Prof. Rundensteiner with assistance from Ermal Toto. It contains text prompts, unscripted voice transcripts, and openSMILE features extracted from both unscripted and scripted voice recordings, labeled with demographics and PHQ-9 depression screening scores from over 300 college student participants.

Paper: https://dl.acm.org/doi/10.1145/3534604
Data: StudentSADD data access
Code: StudentSADD code for baseline models

To obtain access to the data, please fill out the StudentSADD data agreement form and email it to gr-studentsadd@wpi.edu from an email account affiliated with a higher education institution.

If you use the StudentSADD dataset, cite:

ML Tlachac, Ricardo Flores, Miranda Reisch, Rimsha Kayastha, Nina Taurich, Veronica Melican, Connor Bruneau, Hunter Caouette, Joshua Lovering, Ermal Toto, Elke Rundensteiner, “StudentSADD: Rapid Mobile Depression and Suicidal Ideation Screening of College Students during the Coronavirus Pandemic”, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), vol. 6, no. 2, 2022.

Additional papers that leverage the StudentSADD data include:

EMU

The Early Mental Health Uncovering (EMU) framework screens for mental illness using both active and passive modalities. The EMU dataset was collected by the 2019-2020 MQP team, advised by Ermal Toto, ML Tlachac, and Prof. Rundensteiner. It contains scripted and unscripted voice recordings, retrospective smartphone logs, and Twitter data, labeled with demographics, PHQ-9 depression screening scores, and GAD-7 anxiety screening scores, from over 60 crowd-sourced participants. The publicly available portions of the EMU dataset can be accessed through the EMU GitHub repository. Additional EMU features are described in the linked papers and are available in their corresponding GitHub repositories.

If you use the EMU dataset, cite:

ML Tlachac, Ermal Toto, Joshua Lovering, Rimsha Kayastha, Nina Taurich, Elke Rundensteiner, “EMU: Early Mental Health Uncovering Framework and Dataset”, 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021.

If you use the text message or Twitter subset of this dataset, cite:

ML Tlachac and Elke Rundensteiner, “Screening for depression with retrospectively harvested private versus public text”, IEEE Journal of Biomedical and Health Informatics (JBHI), vol. 24, no. 11, 2020, pp. 3326-3332.

Moodable

The Mood Assessment Capable (Moodable) framework assesses depression using retrospectively harvested smartphone and social media data. Moodable was the first of our datasets collected at WPI, gathered by the 2017-2018 MQP team advised by Ermal Toto, Prof. Agu, and Prof. Rundensteiner. It contains scripted voice recordings, retrospective smartphone logs, and social media data labeled with PHQ-9 depression screening scores from over 300 crowd-sourced participants. The available Moodable features are described in the linked papers and are accessible through their corresponding GitHub repositories.

If you use the Moodable dataset, cite:

Ada Dogrucu, Alex Perucic, Anabella Isaro, Damon Ball, Ermal Toto, Elke A. Rundensteiner, Emmanuel Agu, Rachel Davis-Martin, Edwin Boudreaux, “Moodable: On feasibility of instantaneous depression assessment using machine learning on voice samples and retrospectively harvested smartphone and social media data”, Smart Health, 2020.

If you use the text message or Twitter subset of this dataset, cite:

DAIC-WOZ & E-DAIC

The external Wizard-of-Oz (WOZ) and Extended-WOZ subsets of the Distress Analysis Interview Corpus (DAIC) contain clinical interviews with mental illness labels. We have used the audio, transcript, and facial feature components of these interviews to screen for depression and PTSD.

StudentLife

The external StudentLife dataset contains sensor and survey data from 48 students. We have leveraged the GPS component of this dataset to screen for depression.

Paper: Classifying Depression in Imbalanced Datasets Using an Autoencoder-Based Anomaly Detection Approach

Emotivo

This project began with emotion detection. Emotivo is our name for the combination of three external databases: the Surrey Audio-Visual Expressed Emotion (SAVEE) database, the RML Emotion Database, and the Berlin Database of Emotional Speech. RML uses six basic emotions (anger, disgust, fear, happiness, sadness, and surprise), while SAVEE and Berlin add a neutral state.
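Because the combined corpora do not share an identical label set, pooling them requires a single shared label space. The sketch below builds that space as the union of per-corpus labels and maps each sample's label to an integer class id. It is an illustrative assumption, not the method used in the Emotivo work: the helper names are hypothetical, only the RML and SAVEE label sets described above are filled in, and Berlin labels would need their own mapping to these English names before being added.

```python
from typing import Dict, List


def build_label_space(corpora: Dict[str, List[str]]) -> List[str]:
    """Return the sorted union of emotion labels across all corpora."""
    labels = set()
    for names in corpora.values():
        labels.update(names)
    return sorted(labels)


def encode(labels: List[str], space: List[str]) -> List[int]:
    """Map emotion labels to integer class ids in the shared label space.

    Raises KeyError if a label is not part of the shared space.
    """
    index = {name: i for i, name in enumerate(space)}
    return [index[label] for label in labels]


# Label sets as described in the text: six basic emotions, plus neutral for SAVEE.
RML = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
SAVEE = RML + ["neutral"]

space = build_label_space({"RML": RML, "SAVEE": SAVEE})

if __name__ == "__main__":
    print(space)
    print(encode(["anger", "neutral", "surprise"], space))
```

Sorting the union keeps class ids stable regardless of the order in which corpora are registered; note that a class such as neutral only receives samples from the corpora that define it, which is worth checking when balancing the pooled data.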

Paper: Improving Emotion Detection with Sub-clip Boosting (gitlab)