Elsevier

Cortex

Volume 55, June 2014, Pages 182-191

Special issue: Research report
Category fluency, latent semantic analysis and schizophrenia: a candidate gene approach

https://doi.org/10.1016/j.cortex.2013.12.004

Abstract

Background

Category fluency is a widely used task that relies on multiple neurocognitive processes and is a sensitive assay of cortical dysfunction, including in schizophrenia. The test requires naming as many words belonging to a given category (e.g., animals) as possible within a short period of time. The core metrics are the overall number of words produced and the number of errors, that is, words generated that do not belong to the target category. We combine a computational linguistic approach with a candidate gene approach to examine the genetic architecture of this traditional fluency measure.
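As a minimal illustration of these two core metrics, the Python sketch below scores a single trial. The membership set passed in (here a toy set of animals) is a hypothetical stand-in for whatever exemplar list or rater judgement a real study would use, and the case-folding and the separate tally of repetitions are assumptions of the sketch; scoring conventions vary across studies.

```python
def score_fluency(responses, category_members):
    """Score one category-fluency trial (illustrative sketch).

    responses: words in the order the participant produced them.
    category_members: set of lower-case words accepted as valid
    exemplars (a hypothetical lookup standing in for rater judgement).
    """
    seen = set()
    valid, errors, repeats = 0, 0, 0
    for raw in responses:
        word = raw.strip().lower()   # assumption: case and spacing ignored
        if word not in category_members:
            errors += 1              # non-member of the target category
        elif word in seen:
            repeats += 1             # repeated exemplar, not credited again
        else:
            seen.add(word)
            valid += 1               # first valid mention counts
    return {"valid": valid, "errors": errors, "repeats": repeats}


animals = {"dog", "cat", "horse", "cow", "lion"}   # toy membership list
print(score_fluency(["Dog", "cat", "table", "dog", "lion"], animals))
# {'valid': 3, 'errors': 1, 'repeats': 1}
```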

Methods

In addition to the standard metric of overall word count, we applied a computational approach to semantics, Latent Semantic Analysis (LSA), to analyse the clustering pattern of the words generated, as it likely reflects the search in memory for meanings. Because fluency performance probably also recruits verbal learning and recall processes, we included two standard measures of these processes: the Wechsler Memory Scale and the California Verbal Learning Test (CVLT). To explore the genetic architecture of traditional and LSA-derived fluency measures, we employed a candidate gene approach focused on SNPs with known function that were available from a recent genome-wide association study (GWAS) of schizophrenia. The selected candidate genes had previously been associated with language and speech, verbal learning and recall processes, and processing speed. In total, 39 coding SNPs were analysed in 665 subjects.
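One generic way to test a single SNP against a fluency score is an additive model: regress the score on allele count (0, 1 or 2) by ordinary least squares, then judge each SNP against a threshold corrected for the 39 tests. The sketch below is an illustrative assumption, not the authors' pipeline. It ignores covariates and population structure, uses a normal approximation for the p-value, and applies a simple Bonferroni correction; dedicated software such as PLINK (Purcell et al., 2007) handles these matters properly. The toy genotypes and scores are invented.

```python
import math

N_SNPS = 39
BONFERRONI_ALPHA = 0.05 / N_SNPS   # about 0.00128 per test


def additive_snp_test(genotypes, scores):
    """OLS regression of a fluency score on allele count (0, 1, 2).

    Returns (beta, t, p). The two-sided p-value uses a normal
    approximation to the t distribution, adequate only for large n.
    Covariates and population structure are ignored (simplification).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_s = sum(scores) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (s - mean_s) for g, s in zip(genotypes, scores))
    beta = sxy / sxx                      # change in score per extra allele
    intercept = mean_s - beta * mean_g
    rss = sum((s - intercept - beta * g) ** 2 for g, s in zip(genotypes, scores))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of the slope
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal p-value
    return beta, t, p


# Invented toy data for ten people; n is far too small for the normal
# approximation and is used here only to show the mechanics.
g = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
y = [10, 12, 13, 15, 18, 17, 11, 14, 19, 13]
beta, t, p = additive_snp_test(g, y)
print(round(beta, 2), p < BONFERRONI_ALPHA)   # 3.5 True
```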

Results and discussion

Given the modest sample size, the results should be regarded as exploratory and preliminary. Nevertheless, the data clearly illustrate how extracting the meaning from participants' responses, by analysing the actual content of words, generates useful and neurocognitively viable metrics. We discuss three replicated SNPs in the genes ZNF804A, DISC1 and KIAA0319, as well as the potential for computational analyses of linguistic and textual data in other genomics tasks.

Introduction

A complex combination of susceptibility genes and environmental factors is assumed to contribute to the overall clinical presentation of psychiatric disorders. Applying a reductionist approach to the diverse presenting phenomenology is not only daunting, but likely overlooks many of the associated deficits in the case of schizophrenia (but see Morar et al., 2011), where cognitive deficits are central to the neurodevelopmental course of the illness (Elvevåg & Weinberger, 2001). With such complex medical disorders, one way to reduce the complexity of genetic effects is the ‘intermediate phenotype’ approach, which holds that putative risk genes should show greater effects at the intermediate level. Applied to psychiatry, this research strategy aims to bridge the gap between the emergent psychosis and the effects of genes on cells that directly modulate neurocognition (Goldberg and Weinberger, 2004, Meyer-Lindenberg and Weinberger, 2006, Tan et al., 2008). Such a research framework is appealing (but see Flint & Munafò, 2007, for a different opinion), as the resulting intermediate phenotypes (e.g., working memory, episodic memory, semantic memory) are more amenable to systematic neurobiological research than the transient phenomenology (Elvevåg & Weinberger, 2009). Crucially, in psychiatric disorders it is at this intermediate phenotype level that genetic associations often show both stronger penetrance (Tan et al., 2008) and inheritance (Snitz, MacDonald, & Carter, 2006) than at the level of clinical diagnosis. Consequently, two major challenges emerge: the unavoidable refinement of the intermediate phenotypes themselves, and the management of the very large amounts of data that investigating them produces.

Given the growing importance of genome-wide association studies (GWAS) in neuropsychiatric research, it is increasingly apparent that intermediate phenotypes may be the means by which genomic discoveries are made, but may also be a limiting factor. Indeed, this new approach is orders of magnitude more complex than any previous enterprise in psychiatric genetics, and arguably requires sophisticated phenotypes to unravel the complexities, and eventually the pathologies, within neural functional systems. Bilder and colleagues argue that cognitive ontologies need to be developed and refined not only to enable greater consistency and collaboration in research, but also to facilitate connections between intermediate phenotypes and genes (Bilder et al., 2009).

One crucial part of this puzzle is a modern cognitive neuroscientific re-operationalization of common psychometric concepts and terms. Here we focus on one of the most widely used neuropsychological tests, the category fluency task, to illustrate the current limitations of the ‘verbal descriptions’ of the underlying cognitive constructs and the issues that emerge when trying to explore the genetic architecture of those constructs. Specifically, the recall process likely involves a search for meanings, as reflected in the ‘clustering’ of words in the output. Many approaches have been used to examine the structure of this clustering, but they are often problematic because they rely on subjective judgements of cluster boundaries, or they have turned out to be simply unreliable (Voorspoels et al., 2013). We have previously adopted Latent Semantic Analysis (LSA) as an objective and reliable methodology to chart the flow of meaning in words and discourse (Elvevåg, Foltz, Weinberger, & Goldberg, 2007), and briefly describe this technique below. Our current motivation is that the ‘content’ of words has rarely been considered a useful candidate measure in genomic investigations. This absence may be partly due to the notoriously subjective and labour-intensive effort required to quantify the content of words. However, advances in computational linguistics provide a viable framework within which the meanings of words can be rigorously investigated.
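To make ‘the flow of meaning’ through a fluency list concrete, one can compare each response with the next in a semantic space and also average the lengths of the response vectors. The sketch below assumes every word has already been mapped to a vector, as an LSA space would provide; the two-dimensional `toy` vectors are hypothetical. It also assumes that ‘average vector length’ means the mean Euclidean norm of the response vectors, which is our reading rather than the paper's stated definition.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fluency_semantic_profile(words, vectors):
    """Return (consecutive-pair cosines, mean vector length) for one list.

    vectors maps word -> semantic vector. Words without a vector are
    skipped. Mean Euclidean norm is an assumed reading of 'vector length'.
    """
    vecs = [vectors[w] for w in words if w in vectors]
    steps = [cosine(a, b) for a, b in zip(vecs, vecs[1:])]
    lengths = [math.sqrt(sum(x * x for x in v)) for v in vecs]
    mean_length = sum(lengths) / len(lengths) if lengths else 0.0
    return steps, mean_length


# Hypothetical 2-D vectors: two pets on one axis, a sea animal on the other.
toy = {"dog": [3.0, 0.0], "cat": [4.0, 0.0], "shark": [0.0, 2.0]}
steps, mean_length = fluency_semantic_profile(["dog", "cat", "shark"], toy)
print(steps, mean_length)   # [1.0, 0.0] 3.0
```

Under this reading, a high cosine between neighbouring responses suggests the participant is staying within a cluster, and a sharp drop (here from 1.0 to 0.0) is one plausible marker of a switch to a new cluster.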

Latent Semantic Analysis (LSA) is a statistical approach to the acquisition and representation of meaning, which allows similarities among the elements of a language (e.g., words, sentences, or passages) to be computed from word co-occurrence patterns in large corpora of naturally produced discourse. LSA is a computational model of meaning that closely mimics human understanding of the contextual use of language, and it has been widely used for information retrieval, machine understanding of text, and applications such as automated essay scoring (for an overview, see Landauer, Kintsch, McNamara, & Dennis, 2007). Unlike standard keyword-based methods, LSA can detect subtle aspects of semantic content; for example, two passages can be judged similar even when they share few or no words. LSA has also been applied to cognitive modelling of learning and memory processes and to computing coherence in language and thought processes. The reduced-dimensional semantic representation produced by LSA allows individual terms, or groups of terms, to be compared by computing their semantic similarity (see Supplementary Methods for further details and an example).
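The core LSA steps (a term-by-document count matrix, a truncated singular value decomposition, and cosine similarity between the reduced term vectors) can be sketched in pure Python as follows. This is a toy illustration under stated assumptions: real LSA spaces are built from large corpora, often apply a weighting such as log-entropy before the decomposition, and keep a few hundred dimensions, whereas here raw counts, an invented five-document corpus and k = 2 are used. The SVD is approximated by power iteration with deflation rather than a library routine, and the helper names are hypothetical.

```python
import math


def term_document_matrix(docs):
    """Raw counts: one row per vocabulary word, one column per document."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.split():
            counts[row[w]][j] += 1.0
    return vocab, counts


def lsa_term_vectors(A, k, iters=300):
    """Rows of U_k S_k from a rank-k SVD of A, via power iteration.

    Repeatedly finds the dominant eigenvector u of G = A A^T, records
    sigma = sqrt(eigenvalue), then deflates G by removing that component.
    """
    m, n = len(A), len(A[0])
    G = [[sum(A[i][c] * A[j][c] for c in range(n)) for j in range(m)]
         for i in range(m)]
    components = []
    for _ in range(k):
        u = [1.0 / (i + 1) for i in range(m)]   # arbitrary positive start
        for _ in range(iters):
            w = [sum(G[i][j] * u[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            u = [x / norm for x in w]
        Gu = [sum(G[i][j] * u[j] for j in range(m)) for i in range(m)]
        lam = sum(a * b for a, b in zip(u, Gu))     # Rayleigh quotient
        components.append((math.sqrt(max(lam, 0.0)), u))
        G = [[G[i][j] - lam * u[i] * u[j] for j in range(m)] for i in range(m)]
    return [[sigma * u[t] for sigma, u in components] for t in range(m)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = ["dog cat pet", "cat dog animal",
        "stock market trade", "market stock price", "stock market"]
vocab, A = term_document_matrix(docs)
vecs = dict(zip(vocab, lsa_term_vectors(A, k=2)))
print(cosine(vecs["dog"], vecs["cat"]))     # close to 1: same contexts
print(cosine(vecs["dog"], vecs["stock"]))   # close to 0: unrelated contexts
```

With k = 2, each of the two topic blocks in this toy corpus gets one dimension, so words from different blocks end up orthogonal; in a realistic space built from a large corpus the similarities are graded rather than all-or-none.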

In the category fluency task, the total number of words produced has been shown to be an important metric: poor performance (i.e., producing substantially fewer words than expected from demographically based normative data) has been associated with a variety of clinical disorders, including schizophrenia (Bokat and Goldberg, 2003, Lezak, 1995). One possible common mechanism underlying poor performance on this simple task is processing speed, but the task also draws on many other components, including language, speech, verbal learning and recall, semantic organization (Schwartz, Baldo, Graves, & Brugger, 2003), and fluency in general.

Section snippets

Methods

To explore the genetic architecture of traditional measures (e.g., number of valid words generated) and LSA-derived measures of verbal fluency (e.g., average vector length; measures described in Section 2.1) we adopted a candidate gene approach and focused on SNPs with known function that were available from genome-wide association SNP chips. Note that although for some SNPs the function is known based on the sequence of the DNA (e.g., whether there is an amino acid change), for most SNPs the

Discussion

We have shown that genes previously associated with verbal fluency (DISC1) and verbal learning and recall (ZNF804A and KIAA0319) showed replicated associations with traditional measures of category fluency (e.g., the number of valid words generated for the category ‘animal’) and with a novel LSA-derived measure of average vector length, which is a measure of the quality of information retrieved. We further found that the genes associated with verbal learning and recall were significantly more

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland and the Marie-Curie Action COFUND under Grant Number 11/SIRG/B2183 to Dr. Nicodemus. Dr. Elvevåg was supported by the Northern Norwegian Regional Health Authority (Helse Nord RHF). All calculations were performed on the Lonsdale cluster maintained by the Trinity Centre for High Performance Computing. This cluster was funded through grants from Science Foundation Ireland. This research

References (58)

  • D.J. Porteous et al.

    DISC1 at 10: connecting psychiatric genetics and neuroscience

    Trends in Molecular Medicine

    (2011)
  • S. Purcell et al.

    PLINK: a toolset for whole-genome association and population-based linkage analysis

    American Journal of Human Genetics

    (2007)
  • E. Sakakibara et al.

    Genetic influences on prefrontal activation during a verbal fluency task in adults: a twin study based on multichannel near-infrared spectroscopy

    Neuroimage

    (2014)
  • S. Schwartz et al.

    Pervasive influence of semantics in letter and category fluency: a multidimensional approach

    Brain and Language

    (2003)
  • J.R. Semeiks et al.

    Text-based analysis of genes, proteins, aging, and cancer

    Mechanisms of Ageing and Development

    (2005)
  • Z. Balog et al.

    ZNF804A may be associated with executive control of attention

    Genes, Brain and Behavior

    (2011)
  • J. Becker et al.

    Evidence for the involvement of ZNF804A in cognitive processes of relevance to reading and spelling

    Translational Psychiatry

    (2012)
  • R.M. Bilder et al.

    Cognitive ontologies for neuropsychiatric phenomics research

    Cognitive Neuropsychiatry

    (2009)
  • J.H. Callicott et al.

    Variation in DISC1 affects hippocampal structure and function and increases risk for schizophrenia

    PNAS (USA)

    (2005)
  • M.A. Carless et al.

    Impact of DISC1 variation on neuroanatomical neurocognitive phenotypes

    Molecular Psychiatry

    (2011)
  • T.M. Centanni et al.

    Knockdown of the dyslexia-associated gene Kiaa0319 impairs temporal responses to speech stimuli in rat primary auditory cortex

    Cerebral Cortex

    (2013)
  • M. Chen et al.

    Evidence of IQ-modulated association between ZNF804A gene polymorphism and cognitive function in schizophrenia patients

    Neuropsychopharmacology

    (2012)
  • J.C. DeFries et al.

    Evidence for a genetic aetiology in reading disability in twins

    Nature

    (1987)
  • D.C. Delis et al.

    California verbal learning test

    (1987)
  • B. Elvevåg et al.

    Neuropsychology in context of the neurodevelopmental model of schizophrenia

  • B. Elvevåg et al.

    Introduction: genes, cognition and neuropsychiatry

    Cognitive Neuropsychiatry

    (2009)
  • J. Flint et al.

    The endophenotype concept in psychiatric genetics

    Psychological Medicine

    (2007)
  • K. Holshausen et al.

    Latent semantic variables are associated with formal thought disorder and adaptive behavior in older inpatients with schizophrenia

    Cortex

    (2013)
  • S.J. Huffaker et al.

    A primate-specific, brain isoform of KCNH2 affects cortical physiology, cognition, neuronal repolarization and risk of schizophrenia

    Nature Medicine

    (2009)

Cited by (68)

    • Cognitive-perceptual and disorganized schizotypal traits are nonlinearly related to atypical semantic content on tasks of semantic fluency

      2021, Journal of Psychiatric Research
      Citation Excerpt :

      Indeed, prior content-based work examining atypicality in schizotypy also finds that higher total schizotypy is related to more atypical responses in semantic fluency (Kiang and Kutas, 2006; Minor and Cohen, 2012). Some studies in schizophrenia using non-traditional scoring methods employ latent semantic analysis (LSA) and find that patients exhibit greater semantic incoherence during semantic fluency tasks compared to unaffected controls (Holshaussen et al., 2014; Nicodemus et al., 2014); however, these results have not been replicated in schizotypy (Marggraf et al., 2018). Limitations of the one existing schizotypy study using LSA include: 1) small sample size, 2) restricted range of schizotypy scores, as individuals who scored >95th percentile on any SPQ-BRU factor was included, 3) use of pre-existing corpora, which may not include all words generated in the study sample and could yield inaccurate results.
