Implementing Appropriate Multivariate Methods for Higher Quality Results from Genetic Association Studies in Substance Abuse Populations

Title: Implementing Appropriate Multivariate Methods for Higher Quality Results from Genetic Association Studies in Substance Abuse Populations
Author(s): Beaton, Derek (ORCID: 0000-0001-6118-4366)
Advisor: Abdi, Hervé
Date Created: 2017-05
Format: Dissertation
Keywords: Least squares; Genetics; Substance abuse; Correspondence analysis (Statistics)
Abstract: For nearly a century, detecting the genetic contributions to cognitive and behavioral phenomena has been a core interest of psychological research. Recently, this interest has been reinvigorated across many related domains, especially psychiatric research. Furthermore, genotyping technologies (e.g., microarrays) that provide genetic data, such as single nucleotide polymorphisms (SNPs), are routinely available and easily accessible to almost any researcher. These SNPs, which represent pairs of nucleotide letters (e.g., AA, AG, or GG) found at specific positions on human chromosomes, are best considered as categorical variables. However, a categorical coding scheme can make it difficult to analyze their relationships with behavioral, diagnostic, or clinical measurements, because most multivariate techniques developed for analyzing relationships between sets of variables are designed for quantitative variables. Furthermore, there are many (not just one or a few) genetic contributions to complex behaviors and disorders such as substance abuse, so multivariate techniques are required to fully understand them. To address this problem, I present a generalization of partial least squares (PLS), a technique used to extract the information common to two different data tables measured on the same observations. This generalization, called partial least squares correspondence analysis (PLS-CA), is specifically tailored for the analysis of categorical and mixed (“heterogeneous”) data types. I further extend PLS-CA with a ridge-like regularization called Smoothed PLS-CA (SmooPLS-CA). SmooPLS-CA adjusts for overfitting and noise that can lead to the interpretation of spurious effects in high-dimensional, low-sample-size data such as genetics and genomics. PLS-CA and SmooPLS-CA were both applied to two genetic data sets within substance use disorders (SUDs) that focused on a large number of genes: an archived set (“discovery”) and an external set (“validation”).
The goal of using two data sets was to discover markers of SUDs in one set and then validate those markers in an independent and completely sequestered set. SmooPLS-CA showed no advantage over standard PLS-CA: bootstrap resampling techniques provided robust results regardless of regularization. Finally, multiple genes were identified as contributors to a broad case-control (i.e., SUDs vs. control group) effect. Some of the identified genes play key roles in the glutamatergic (e.g., GRIN2B) and dopaminergic (e.g., CCKBR) systems, whereas other genes play complex or even undefined roles (e.g., PRKCE). In sum, there are many robust, albeit small, genetic effects that contribute to SUDs, as opposed to only a few large effects.
Degree Name: PHD
Degree Level: Doctoral
Persistent Link: http://hdl.handle.net/10735.1/5403
Type: text
Degree Program: Cognition and Neuroscience

Files in this item

File                             Size      Format
BEATON-DISSERTATION-2017.pdf     73.07Mb   PDF
BSR3__Flip_SAGE__COMP1s.csv      11.44Kb   Unknown
BSR3_Matrix_ImpSS_L100_C1.csv    1.436Kb   Unknown
BSR3_Matrix_ImpSS.csv            5.325Kb   Unknown
BSR3_Matrix_4G.csv               13.33Kb   Unknown
BSR3_Matrix_2G.csv               4.133Kb   Unknown
