Supplementary Material for: Leveraging NLP for Psychiatric Phenotyping from Spanish EHR: Enabling the Investigation of Transdiagnostic Symptom Profiles at Scale
Posted on 2025-06-07, 05:55. Authored by figshare admin karger, DeLaHoz J.F., Frydman-Gani C., Arias A., PerezVallejo M., LondoñoMartínez J.D., Mena L., Seroussi A., Service S.K., Diaz-Zuluaga A.M., Ramirez-Diaz A.M., Valencia-Echeverry J., Castaño M., Reus V.I., Bui A.A.T., Freimer N.B., Lopez-Jaramillo C., OldeLoohuis L.M.
Clinical notes in electronic health records (EHRs) offer valuable insight into the symptom profiles and trajectories of patients with severe mental illness (SMI). However, systematically extracting symptoms at scale remains a challenge, especially in languages other than English. We developed a lightweight, accurate, and interpretable natural language processing (NLP) algorithm to extract psychiatric phenotypes from Spanish clinical notes.
We selected a set of 136 core psychiatric phenotypes and annotated 4,000 clinical note sections (e.g., Chief complaint, Plan; called “documents”) and 240 complete visit notes (called “entries”) from two psychiatric hospitals in Colombia: Hospital Mental de Antioquia (HOMO) and Clínica San Juan de Dios Manizales (CSJDM). For phenotypes meeting frequency and inter-annotator reliability thresholds, we developed three NLP algorithms (HOMO, CSJDM, and COMBINED) for phenotype extraction and context labeling (e.g., negation, family history, uncertainty). We evaluated performance at the document and entry levels, as well as across hospitals.
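To make the idea of context labeling concrete, the following is a minimal sketch of a trigger-window rule of the kind often used for this task. It is not the authors' algorithm: the trigger lists, the label names, the function `label_context`, and the five-token window are all hypothetical choices made for illustration.

```python
import re

# Hypothetical trigger lexicon, for illustration only (not the authors' lexicon).
CONTEXT_TRIGGERS = {
    "negated": ["no", "niega", "sin"],
    "family_history": ["madre", "padre", "hermano", "hermana"],
    "uncertain": ["posible", "probable", "al parecer"],
}


def label_context(text, phenotype_term, window=5):
    """Return the sorted context labels whose trigger appears within
    `window` tokens before any mention of `phenotype_term`.

    Returns an empty list when no trigger is found or when the term
    does not occur in the text.
    """
    tokens = re.findall(r"\w+", text.lower())
    term_tokens = re.findall(r"\w+", phenotype_term.lower())
    n = len(term_tokens)
    labels = set()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != term_tokens:
            continue
        # Join the preceding tokens so multi-word triggers can match.
        preceding = " ".join(tokens[max(0, i - window):i])
        for label, triggers in CONTEXT_TRIGGERS.items():
            for trigger in triggers:
                if re.search(r"\b" + re.escape(trigger) + r"\b", preceding):
                    labels.add(label)
    return sorted(labels)


print(label_context("Paciente niega alucinaciones auditivas", "alucinaciones"))
print(label_context("Madre con posible depresión", "depresión"))
```

A rule of this form stays interpretable because each label can be traced back to a specific trigger term, at the cost of missing contexts expressed after the mention or outside the window.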
Document-level performance at both hospitals was high (average F1 scores of 0.84 and 0.85). Moreover, on phenotypes meeting our document-level performance threshold of F1≥0.7, entry-level performance was high as well (average F1 of 0.75 and 0.78), as was cross-hospital transportability of the algorithms (F1 of 0.75 HOMO-to-CSJDM and 0.77 CSJDM-to-HOMO). The COMBINED algorithm improved overall recall without significantly decreasing precision (F1 of 0.78 and 0.77 on HOMO and CSJDM, respectively).
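For readers less familiar with the metric, the sketch below shows how precision, recall, and F1 are computed from true-positive, false-positive, and false-negative counts, and how an average F1 could be taken over phenotypes that meet a threshold such as F1≥0.7. The function names and the example counts are hypothetical, and treating "average F1" as an unweighted mean over phenotypes is an assumption, not something the abstract specifies.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    Each metric is defined as 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_f1_above_threshold(counts, threshold=0.7):
    """Keep phenotypes whose F1 meets `threshold` and average their F1.

    `counts` maps a phenotype name to a (tp, fp, fn) tuple.
    Returns (sorted kept phenotype names, unweighted mean F1 or None).
    """
    f1_by_phenotype = {
        name: precision_recall_f1(*c)[2] for name, c in counts.items()
    }
    kept = sorted(n for n, f1 in f1_by_phenotype.items() if f1 >= threshold)
    if not kept:
        return kept, None
    return kept, sum(f1_by_phenotype[n] for n in kept) / len(kept)


# Hypothetical counts for two made-up phenotype labels.
example_counts = {"phenotype_a": (8, 2, 2), "phenotype_b": (1, 4, 4)}
print(mean_f1_above_threshold(example_counts))
```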
Applying our algorithm for 50 high-performing phenotypes to the notes of 9,737 SMI patients highlighted the transdiagnostic nature of many core SMI phenotypes; 44/50 phenotypes were recorded in over 10% of patients across diagnoses. Multiple correspondence analysis further revealed variation in symptom space across diagnoses; while major depressive disorder and schizophrenia form distinct clusters, patients with bipolar disorder span the entire phenotypic spectrum.
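One way such a prevalence check could be carried out is sketched below. It reads "over 10% of patients across diagnoses" as "more than 10% of patients within every diagnosis group", which is an assumption about the authors' criterion. The function name, the input layout, and the toy records are hypothetical.

```python
from collections import defaultdict


def transdiagnostic_phenotypes(patients, min_fraction=0.10):
    """Return the sorted phenotypes recorded in more than `min_fraction`
    of patients within every diagnosis group.

    `patients` is an iterable of (diagnosis, set_of_phenotypes) pairs,
    one pair per patient.
    """
    totals = defaultdict(int)                     # patients per diagnosis
    hits = defaultdict(lambda: defaultdict(int))  # phenotype -> diagnosis -> count
    for diagnosis, phenotypes in patients:
        totals[diagnosis] += 1
        for phenotype in phenotypes:
            hits[phenotype][diagnosis] += 1
    return sorted(
        phenotype
        for phenotype, by_dx in hits.items()
        if all(by_dx.get(dx, 0) / totals[dx] > min_fraction for dx in totals)
    )


# Toy records with made-up labels, one tuple per patient.
toy_patients = [
    ("BD", {"insomnia", "mania"}),
    ("MDD", {"insomnia"}),
    ("SCZ", {"insomnia", "hallucinations"}),
]
print(transdiagnostic_phenotypes(toy_patients))
```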
Our tool enables the systematic investigation of psychiatric symptoms recorded in psychiatric notes, facilitating large-scale studies in Spanish-speaking populations.