Special issue: Research report
Category fluency, latent semantic analysis and schizophrenia: a candidate gene approach
Introduction
A complex combination of susceptibility genes and environmental factors is assumed to contribute to the overall clinical presentation of psychiatric disorders. Applying a reductionist approach to the diverse presenting phenomenology is not only daunting, but also likely overlooks many of the associated deficits in the case of schizophrenia (but see Morar et al., 2011), where cognitive deficits are central to the neurodevelopmental course of the illness (Elvevåg & Weinberger, 2001). With such complex medical disorders, one way to reduce the complexity of genetic effects is the ‘intermediate phenotype’ approach, which argues that putative risk genes should show greater effects at the intermediate level. Applied to psychiatry, this research strategy argues for bridging the gap between the emergent psychosis and the effects of genes on cells that directly modulate neurocognition (Goldberg and Weinberger, 2004; Meyer-Lindenberg and Weinberger, 2006; Tan et al., 2008). Such a research framework is appealing (but see Flint and Munafo (2007) for a different opinion), as the resulting intermediate phenotypes (e.g., working memory, episodic memory, semantic memory) are more amenable to systematic neurobiological research than the transient phenomenology (Elvevåg & Weinberger, 2009). Crucially, in psychiatric disorders it is at this intermediate phenotype level that genetic associations often show both stronger penetrance (Tan et al., 2008) and inheritance (Snitz, MacDonald, & Carter, 2006) than at the level of clinical diagnosis. Consequently, two major challenges emerge, namely the unavoidable refinement of the intermediate phenotype and the management of the huge amount of data resulting from investigations of intermediate phenotypes.
Given the growing importance of genome-wide association studies (GWAS) in neuropsychiatric research, it is increasingly apparent that intermediate phenotypes are potentially the means by which genomic discoveries will be made, but may also be limiting factors. Indeed, this new approach is orders of magnitude more complex than any enterprise embarked on hitherto in psychiatric genetics and arguably requires sophisticated phenotypes in order to unravel the complexities, and thus eventually the pathologies, within neural functional systems. Bilder and colleagues argue that cognitive ontologies need to be developed and refined, not only to enable greater consistency and collaboration in research, but also to facilitate connections between intermediate phenotypes and genes (Bilder et al., 2009).
One crucial part of this puzzle is a modern cognitive neuroscientific re-operationalization of common psychometric concepts and terms. Here we focus on one of the most widely used neuropsychological tests – the category fluency task – to illustrate the current limitations of the ‘verbal descriptions’ of the underlying cognitive constructs and the issues that emerge when trying to explore the genetic architecture of the associated constructs. Specifically, the recall process likely involves a search for meanings as reflected in the ‘clustering’ of words in the output. Many approaches have been employed to examine the structure of the clustering, but they are often problematic given the subjective judgements of cluster boundaries, or have turned out to be simply unreliable (Voorspoels et al., 2013). We have previously adopted Latent Semantic Analysis (LSA) as an objective and reliable methodology to chart the flow of meaning in words and discourse (Elvevåg, Foltz, Weinberger, & Goldberg, 2007), and briefly describe this technique below. Our current motivation is that the ‘content’ of words has rarely been considered a useful candidate in investigations concerning genomics. This absence may be partially due to the notoriously subjective and labour-intensive efforts required to quantify the content of words. However, advances in computational linguistics provide a viable framework within which the meanings of words can be rigorously investigated.
Latent Semantic Analysis (LSA) is a statistical approach to the acquisition and representation of meaning, which allows similarities among the elements of a language (e.g., words, sentences, or passages) to be computed based on word co-occurrence patterns in large corpora of naturally produced discourse. It is a computational model of meaning that closely mimics human understanding of the contextual use of language and has been widely used for information retrieval, machine understanding of text, and applications such as automated essay scoring (for an overview, see Landauer, Kintsch, McNamara, & Dennis, 2007). Unlike standard keyword-based methods, LSA can detect subtle aspects of semantic content. It has also been applied to cognitive modelling of learning and memory processes and to computing coherence in language and thought processes. The reduced-dimension semantic representation from LSA allows terms, or groups of terms, to be compared by computing their semantic similarity (see Supplementary Methods for further details and an example).
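The core idea of LSA can be illustrated with a minimal sketch. It is not the pipeline used in this study (see Supplementary Methods for that); the five-document toy corpus, the choice of two dimensions and all function names are assumptions made purely for illustration, and real applications typically use very large corpora, a term-weighting step and several hundred dimensions. The sketch builds a term-by-document count matrix, reduces it with a truncated singular value decomposition, and compares terms by the cosine of their reduced vectors.

```python
import numpy as np

# Toy corpus invented for illustration; each string is one "document".
# Note that "dog" and "cat" never appear in the same document.
docs = [
    "dog pet fur",
    "cat pet fur",
    "lion zoo",
    "zebra zoo",
    "tiger zoo",
]

# Term-by-document count matrix (rows: terms, columns: documents).
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1.0

# Truncated SVD: keep only the k largest singular values (k = 2 is an
# arbitrary choice that suits this tiny example).
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
term_vectors = U[:, :k] * s[:k]  # one k-dimensional vector per term


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def term(word):
    """Reduced-dimension vector for a single term."""
    return term_vectors[index[word]]


def group(words):
    """Vector for a group of terms, taken here as the sum of term vectors."""
    return np.sum([term(w) for w in words], axis=0)


# Keyword-style comparison on raw counts versus comparison in LSA space.
raw_sim = cosine(counts[index["dog"]], counts[index["cat"]])
lsa_sim = cosine(term("dog"), term("cat"))
cross_sim = cosine(term("dog"), term("lion"))
group_sim = cosine(group(["dog", "cat"]), term("pet"))

print("raw dog/cat:", round(raw_sim, 3))
print("LSA dog/cat:", round(lsa_sim, 3))
print("LSA dog/lion:", round(cross_sim, 3))
print("LSA {dog, cat}/pet:", round(group_sim, 3))
```

In this toy setup the raw count vectors for 'dog' and 'cat' share no document, so a keyword-style comparison treats them as unrelated, whereas in the reduced space they lie close together because both co-occur with 'pet' and 'fur'. Terms from the unrelated 'zoo' documents remain dissimilar to them.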
In the case of the category fluency task, the total number of words produced has been shown to be an important metric, and poor performance (i.e., production of substantially fewer words than expected based on demographically based normative data) has been associated with a variety of clinical disorders, including schizophrenia (Bokat and Goldberg, 2003; Lezak, 1995). A possible common mechanism associated with less than optimal performance on this simple task relates to speed of performance, but many other components are involved, namely language, speech, verbal learning and recall, semantic organization (Schwartz, Baldo, Graves, & Brugger, 2003), and fluency in general.
Methods
To explore the genetic architecture of traditional measures (e.g., number of valid words generated) and LSA-derived measures of verbal fluency (e.g., average vector length; measures described in Section 2.1) we adopted a candidate gene approach and focused on SNPs with known function that were available from genome-wide association SNP chips. Note that although for some SNPs the function is known based on the sequence of the DNA (e.g., whether there is an amino acid change), for most SNPs the
Discussion
We have shown that genes previously associated with verbal fluency (DISC1) and verbal learning and recall (ZNF804A and KIAA0319) were associated and replicated using traditional measures of category fluency (e.g., the number of valid words generated to the word ‘animal’) and also to a novel LSA-derived measure of average vector length, which is a measure of the quality of information retrieved. We further found that the genes associated with verbal learning and recall were significantly more
Acknowledgements
This publication has emanated from research conducted with the financial support of Science Foundation Ireland and the Marie-Curie Action COFUND under Grant Number 11/SIRG/B2183 to Dr. Nicodemus. Dr. Elvevåg was supported by the Northern Norwegian Regional Health Authority (Helse Nord RHF). All calculations were performed on the Lonsdale cluster maintained by the Trinity Centre for High Performance Computing. This cluster was funded through grants from Science Foundation Ireland. This research
References (58)
- et al. Finding suitable phenotypes for genetic studies of schizophrenia: heritability and segregation analysis. Biological Psychiatry (2008).
- et al. Letter and category fluency in schizophrenic patients: a meta-analysis. Schizophrenia Research (2003).
- Twin study of verbal and spatial abilities. Personality and Individual Differences (1996).
- et al. Quantifying incoherence in speech: an automated methodology and novel application to schizophrenia. Schizophrenia Research (2007).
- et al. Cognitive state and connectivity effects of the genome-wide psychosis variant in ZNF804A. NeuroImage (2011).
- et al. Genes and the parsing of cognitive processes. Trends in Cognitive Sciences (2004).
- et al. Decoding the genetics of speech and language. Current Opinion in Neurobiology (2013).
- et al. Effect of CACNA1C rs1006737 on neural correlates of verbal fluency in healthy individuals. NeuroImage (2010).
- et al. Whole genome association scan for genetic polymorphisms influencing information processing speed. Biological Psychology (2011).
- et al. Genetics of human episodic memory: dealing with complexity. Trends in Cognitive Sciences (2011).
- DISC1 at 10: connecting psychiatric genetics and neuroscience. Trends in Molecular Medicine.
- PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics.
- Genetic influences on prefrontal activation during a verbal fluency task in adults: a twin study based on multichannel near-infrared spectroscopy. NeuroImage.
- Pervasive influence of semantics in letter and category fluency: a multidimensional approach. Brain and Language.
- Text-based analysis of genes, proteins, aging, and cancer. Mechanisms of Ageing and Development.
- ZNF804A may be associated with executive control of attention. Genes Brain and Behaviour.
- Evidence for the involvement of ZNF804A in cognitive processes of relevance to reading and spelling. Translational Psychiatry.
- Cognitive ontologies for neuropsychiatric phenomics research. Cognitive Neuropsychiatry.
- Variation in DISC1 affects hippocampal structure and function and increases risk for schizophrenia. PNAS (USA).
- Impact of DISC1 variation on neuroanatomical neurocognitive phenotypes. Molecular Psychiatry.
- Knockdown of the dyslexia-associated gene Kiaa0319 impairs temporal responses to speech stimuli in rat primary auditory cortex. Cerebral Cortex.
- Evidence of IQ-modulated association between ZNF804A gene polymorphism and cognitive function in schizophrenia patients. Neuropsychopharmacology.
- Evidence for a genetic aetiology in reading disability in twins. Nature.
- California verbal learning test.
- Neuropsychology in context of the neurodevelopmental model of schizophrenia.
- Introduction: genes, cognition and neuropsychiatry. Cognitive Neuropsychiatry.
- The endophenotype concept in psychiatric genetics. Psychological Medicine.
- Latent semantic variables are associated with formal thought disorder and adaptive behavior in older inpatients with schizophrenia. Cortex.
- A primate-specific, brain isoform of KCNH2 affects cortical physiology, cognition, neuronal repolarization and risk of schizophrenia. Nature Medicine.