A Bayesian Hierarchical Framework for Pathway Analysis in Genome-Wide Association Studies








Genome-wide association studies (GWAS) aim to identify genetic variants, typically single nucleotide polymorphisms (SNPs), associated with a disease or trait. A commonly used analytic strategy in GWAS is to test for association with one SNP at a time. However, such a strategy lacks power to detect associations that are caused by the joint effects of multiple variants, each with a modest effect of its own. Pathway analysis jointly tests the combined effects of all SNPs in all genes belonging to a molecular pathway. This analysis is usually more powerful than single-SNP analyses for detecting joint effects of variants in a pathway. Moreover, because pathways are defined by biological function, a significant result lends itself more easily to interpretation. In this dissertation, we develop a Bayesian hierarchical model that fully models the natural three-level hierarchy inherent in pathway structure, namely SNP, gene, and pathway, unlike most other methods, which combine such information in ad hoc ways. We model the effects at each level conditional on the effects of the levels preceding them within the generalized linear model framework. This joint modeling allows not only detection of associated pathways but also testing for association with genes within significant pathways and SNPs within significant genes in a hierarchical manner, which can be useful for follow-up studies. To deal with the high dimensionality of such a unified model, we regularize the regression coefficients through an appropriate choice of priors. We fit the model using a combination of Iteratively Weighted Least Squares and Expectation-Maximization algorithms to estimate the posterior modes and their standard errors. The inference is carried out in a hierarchical manner from pathways to genes to SNPs. A hierarchical false discovery rate (FDR) procedure is used for multiplicity adjustment of the entire inference procedure.
We also explore the utility of the effective number of parameters, a concept proposed in the Bayesian literature, in our context of multiplicity adjustment using the hierarchical FDR. To study the proposed approach, we conduct simulations with samples generated under realistic linkage disequilibrium patterns obtained from the HapMap project. We find that our method has higher power than some standard approaches in several settings for identifying pathways that contain multiple variants of modest effect. Moreover, it can also pinpoint associated genes once a pathway is implicated, a feature unavailable in other methods. We also find that using the effective number of parameters can boost the power to detect associated genes and helps distinguish them from the null genes. We apply the proposed method to two GWAS datasets on breast and renal cancer.
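
The top-down inference from pathways to genes to SNPs can be illustrated with a minimal sketch. It is not the dissertation's actual procedure, which this abstract does not specify: it assumes that a p-value is already available for every pathway, gene, and SNP, and it simply applies the Benjamini-Hochberg step-up rule within each family of siblings, descending only into units that were rejected. The names `hierarchical_bh` and `pathways`, the nested-dictionary layout, and all p-values are hypothetical, and the sketch omits any level adjustment needed to control the FDR over the whole tree.

```python
def benjamini_hochberg(pvals, q):
    """Return the set of indices rejected by the BH step-up rule at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its BH threshold.
        if pvals[i] <= q * rank / m:
            k = rank
    return set(order[:k])


def hierarchical_bh(family, q, prefix=()):
    """Test one family of siblings, then recurse into rejected members only.

    family maps a name to (p_value, children), where children is another
    family dict or None for a leaf (a SNP).
    """
    names = list(family)
    pvals = [family[name][0] for name in names]
    rejected = benjamini_hochberg(pvals, q)
    found = []
    for i in sorted(rejected):
        path = prefix + (names[i],)
        found.append(path)
        children = family[names[i]][1]
        if children:
            found.extend(hierarchical_bh(children, q, path))
    return found


# Hypothetical p-values, made up purely for illustration.
pathways = {
    "pathway_A": (0.001, {
        "gene_1": (0.004, {"snp_1": (0.002, None), "snp_2": (0.6, None)}),
        "gene_2": (0.7, {"snp_3": (0.01, None)}),
    }),
    "pathway_B": (0.8, {
        "gene_3": (0.0001, {"snp_4": (0.0001, None)}),
    }),
}

found = hierarchical_bh(pathways, q=0.05)
for path in found:
    print(" > ".join(path))
```

In this toy input, `gene_3` is never tested even though its p-value is small, because its parent `pathway_B` is not rejected. That gating is the sense in which the inference is hierarchical.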



Bayesian statistical decision theory, Single nucleotide polymorphisms, Expectation-maximization algorithms, Dimensional analysis