Data
Emutivo research has leveraged datasets collected both externally and internally. Each dataset is described below, along with the papers our team has published using that data. The git repository links following each paper contain the feature datasets, model code, results, and/or additional visualizations for that paper. All internal datasets were collected under WPI IRB 00007374 File 18-0031, first approved 23 October 2017.
DepreST-CAT
The DepreST Call and Text (DepreST-CAT) dataset was collected between December 2020 and April 2021 by Miranda Reisch, ML Tlachac, and Prof. Rundensteiner. The DepreST-CAT dataset contains retrospective call and text logs labeled with demographics, PHQ-9 depression screening scores, and GAD-7 anxiety screening scores from over 369 Prolific crowd-sourced participants.
If you use the DepreST-CAT dataset, cite:
ML Tlachac, Ricardo Flores, Miranda Reisch, Katie Housekeeper, Elke Rundensteiner, “DepreST-CAT: Retrospective Smartphone Call and Text Logs Collected During the COVID-19 Pandemic to Screen for Mental Illnesses”, ACM Proceedings on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), vol. 6, no. 2, 2022.
Additional papers that leverage the DepreST-CAT logs include:
- Paper: Mental Health and Mobile Communication Profiles of Crowdsourced Participants (github)
- Paper: Symptom Detection with Text Message Log Distributions for Holistic Depression and Anxiety Screening (github)
- Paper: Left on Read: Reply Latency for Anxiety & Depression Screening (github)
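The PHQ-9 and GAD-7 scores that label the DepreST-CAT logs are often converted to binary screening labels before modeling. The following is a minimal sketch of that step, assuming a cutoff of 10 on both scales (a commonly used threshold, not necessarily the one used in the papers above). The function names and example scores are hypothetical and are not taken from the DepreST-CAT files or repositories.

```python
# Hypothetical helpers for illustration; not part of any Emutivo repository.
PHQ9_MAX = 27        # PHQ-9: nine items, each scored 0-3
GAD7_MAX = 21        # GAD-7: seven items, each scored 0-3
DEFAULT_CUTOFF = 10  # assumed threshold; adjust to match the study design


def binarize(score: int, max_score: int, cutoff: int = DEFAULT_CUTOFF) -> int:
    """Return 1 if the score meets the cutoff, else 0; reject invalid scores."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score {score} is outside the range 0-{max_score}")
    return int(score >= cutoff)


def label_participant(phq9: int, gad7: int) -> dict:
    """Map one participant's raw scores to binary screening labels."""
    return {
        "depression": binarize(phq9, PHQ9_MAX),
        "anxiety": binarize(gad7, GAD7_MAX),
    }


# Made-up scores, used only to show the call.
print(label_participant(phq9=12, gad7=4))
```

Validating the score range before thresholding helps catch data-entry or parsing errors early, since an out-of-range score would otherwise be silently labeled.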
StudentSADD
The Student Suicidal Ideation and Depression Detection (StudentSADD) dataset was collected between August 2020 and January 2021 by the 2020 REU team and 2020-2021 MQP team advised by ML Tlachac and Prof. Rundensteiner with assistance from Ermal Toto. The StudentSADD dataset contains text prompts, unscripted voice transcripts, unscripted voice openSMILE features, and scripted voice openSMILE features labeled with demographics and PHQ-9 depression screening scores from over 300 college student participants.
- Paper: https://dl.acm.org/doi/10.1145/3534604
- Data: StudentSADD data access
- Code: StudentSADD code for baseline models
To obtain access to the data, please fill out the StudentSADD data agreement form and email it to gr-studentsadd@wpi.edu using an email account affiliated with a higher education institution.
If you use the StudentSADD dataset, cite:
ML Tlachac, Ricardo Flores, Miranda Reisch, Rimsha Kayastha, Nina Taurich, Veronica Melican, Connor Bruneau, Hunter Caouette, Joshua Lovering, Ermal Toto, Elke Rundensteiner, “StudentSADD: Rapid Mobile Depression and Suicidal Ideation Screening of College Students during the Coronavirus Pandemic”, ACM Proceedings on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), vol. 6, no. 2, 2022.
Additional papers that leverage the StudentSADD data include:
- Paper: BERT Variants for Depression Screening with Typed and Transcribed Responses
- Paper: Early Mental Health Uncovering with Short Scripted and Unscripted Voice Recordings (github)
EMU
The Early Mental Health Uncovering (EMU) framework is for mental illness screening with active and passive modalities. The EMU dataset was collected by the 2019-2020 MQP team advised by Ermal Toto, ML Tlachac, and Prof. Rundensteiner. This dataset contains scripted and unscripted voice recordings, retrospective smartphone logs, and Twitter data labeled with demographics, PHQ-9 depression screening scores, and GAD-7 anxiety screening scores from over 60 crowd-sourced participants. Available portions of the EMU dataset are accessible through the EMU GitHub repository. Additional EMU dataset features are described in the following linked papers and accessible through the corresponding linked GitHub repositories.
- Paper: Mobile Communication Log Time Series to Detect Depressive Symptoms (github)
- Paper: Text Generation to Aid Depression Detection: A Comparative Study of Conditional Sequence Generative Adversarial Networks (github)
- Paper: Left on Read: Reply Latency for Anxiety & Depression Screening (github)
- Paper: Automated Construction of Lexicons to Improve Depression Screening with Text Messages (github)
- Paper: Early Mental Health Uncovering with Short Scripted and Unscripted Voice Recordings (github)
- Paper: EMU: Early Mental Health Uncovering Framework and Dataset (github)
- Paper: Mobile Depression Screening with Time Series of Text Logs and Call Logs (github)
- Paper: Screening for Suicidal Ideation with Text Messages (github)
- Paper: Topological Data Analysis to Engineer Features from Audio Signals for Depression Detection (github)
- Paper: Audio-based Depression Screening using Sliding Window Sub-clip Pooling (gitlab)
- Paper: Screening for depression with retrospectively harvested private versus public text (github)
- Paper: Depression screening from text message reply latency (github)
If you use the EMU dataset, cite:
ML Tlachac, Ermal Toto, Joshua Lovering, Rimsha Kayastha, Nina Taurich, Elke Rundensteiner, “EMU: Early Mental Health Uncovering Framework and Dataset”, 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 2021.
If you use the text message or Twitter subset of this dataset, cite:
ML Tlachac and Elke Rundensteiner, “Screening for depression with retrospectively harvested private versus public text,” IEEE Journal of Biomedical and Health Informatics (BHI), vol. 24, no. 11, 2020, pp. 3326-3332.
Moodable
The Mood Assessment Capable (Moodable) framework is for depression assessment with retrospectively harvested smartphone and social media data. The Moodable dataset was the first dataset collected by a team at WPI, namely the 2017-2018 MQP team advised by Ermal Toto, Prof. Agu, and Prof. Rundensteiner. This dataset contains scripted voice recordings, retrospective smartphone logs, and social media data labeled with PHQ-9 depression screening scores from over 300 crowd-sourced participants. Available Moodable dataset features are described in the following linked papers and accessible through the corresponding linked GitHub repositories.
- Paper: Mobile Communication Log Time Series to Detect Depressive Symptoms (github)
- Paper: Text Generation to Aid Depression Detection: A Comparative Study of Conditional Sequence Generative Adversarial Networks (github)
- Paper: Left on Read: Reply Latency for Anxiety & Depression Screening (github)
- Paper: Automated Construction of Lexicons to Improve Depression Screening with Text Messages (github)
- Paper: Mobile Depression Screening with Time Series of Text Logs and Call Logs (github)
- Paper: Screening for Suicidal Ideation with Text Messages (github)
- Paper: Moodable: On feasibility of instantaneous depression assessment using machine learning on voice samples and retrospectively harvested smartphone and social media data
- Paper: Topological Data Analysis to Engineer Features from Audio Signals for Depression Detection (github)
- Paper: Audio-based Depression Screening using Sliding Window Sub-clip Pooling (gitlab)
- Paper: You’re Making Me Depressed: Leveraging Texts from Contact Subsets to Predict Depression (github)
- Paper: Screening for depression with retrospectively harvested private versus public text (github)
- Paper: Depression screening from text message reply latency (github)
If you use the Moodable dataset, cite:
Ada Dogrucu, Alex Perucic, Anabella Isaro, Damon Ball, Ermal Toto, Elke A. Rundensteiner, Emmanuel Agu, Rachel Davis-Martin, Edwin Boudreaux, “Moodable: On feasibility of instantaneous depression assessment using machine learning on voice samples and retrospectively harvested smartphone and social media data,” Smart Health, 2020.
If you use the text message or Twitter subset of this dataset, cite:
ML Tlachac and Elke Rundensteiner, “Screening for depression with retrospectively harvested private versus public text,” IEEE Journal of Biomedical and Health Informatics (BHI), vol. 24, no. 11, 2020, pp. 3326-3332.
DAIC-WOZ & E-DAIC
The external Wizard-of-Oz (WOZ) and Extended-WOZ subsets of the Distress Analysis Interview Corpus (DAIC) contain clinical interviews with mental illness labels. We have leveraged the audio, transcript, and facial feature components of these interviews to screen for depression and PTSD.
- Paper: DeepScreen: Boosting Depression Screening Performance with an Auxiliary Task
- Paper: Multi-Task Learning Using Facial Features for Mental Health Screening
- Paper: Text Generation to Aid Depression Detection: A Comparative Study of Conditional Sequence Generative Adversarial Networks (github)
- Paper: Temporal Facial Features for Depression Screening
- Paper: AudiFace: Multimodal Deep Learning for Depression Screening
- Paper: Transfer Learning for Depression Screening from Follow-up Clinical Interview Questions
- Paper: Ensembles of BERT for Depression Classification (github)
- Paper: Depression Screening Using Deep Learning on Follow-up Questions in Clinical Interviews
- Paper: AudiBERT: A Deep Transfer Learning Multimodal Classification Framework for Depression Screening (colab)
- Paper: Topological Data Analysis to Engineer Features from Audio Signals for Depression Detection (github)
- Paper: Audio-based Depression Screening using Sliding Window Sub-clip Pooling (gitlab)
StudentLife
The external StudentLife dataset contains sensor and survey data from 48 students. We have leveraged the GPS component of this dataset to screen for depression.
- Paper: Classifying Depression in Imbalanced Datasets Using an Autoencoder-Based Anomaly Detection Approach
Emotivo
This project started by performing emotion detection. Emotivo is our name for the combination of the external Surrey Audio-Visual Expressed Emotion (SAVEE) database, the external RML Emotion Database, and the external Berlin Database of Emotional Speech. RML used 6 basic emotions: anger, disgust, fear, happiness, sadness, and surprise, while SAVEE and Berlin added a neutral state.