Background: Linguistic analysis, notably using conceptually derived linguistic categories, has been used to quantify various aspects of serious mental illness. It could also help in understanding paranoia, defined in terms of perceived intentional threats from others. However, paranoia, and the language used to express it, may vary with demographic factors, notably race and sex.
Aims: This study aims to extend prior findings linking linguistic expression to symptoms of serious mental illness. It focuses on paranoia and evaluates the potential moderating roles of race and sex, using data from two archived studies that employed two different speaking tasks.
Methods: We hypothesized that regularized regression using a limited set of linguistic-category features derived from these speaking tasks would accurately classify clinical ratings of paranoia. We further hypothesized that these relationships would vary by race (Black versus White) and sex (male versus female).
Results: Unexpectedly, model accuracy did not differ by race or sex, suggesting no overt demographic bias or differential functioning in our models.
Conclusions: The results highlight both the strengths and the limitations of using linguistic analysis to understand paranoia. Examining variation in how paranoia is scored could improve model accuracy across demographic groups.
Keywords: Positive symptoms; linguistics; machine learning; natural language processing; serious mental illness.