Bayesian Statistical Methods for Urinary Microbiome Data Analysis
Microbiome data are generated by high-throughput next-generation sequencing technology. These data are typically characterized by zero inflation, overdispersion, high dimensionality, sample heterogeneity, non-linearity, and compositionality. Three popular areas of microbiome research require statistical methods that can account for these characteristics: detecting differentially abundant taxa across phenotype groups, identifying associations between the microbiome and covariates, and constructing microbiome networks to characterize ecological associations among microbes. These three areas are referred to as differential abundance analysis, integrative analysis, and network analysis, respectively. Bayesian statistical methods account for uncertainty in model parameter estimation, provide posterior summaries that are easy to interpret, and can handle small sample sizes, unlike frequentist parametric statistical methods. Here, we present three Bayesian statistical methods applied to urinary microbiome data on recurrent urinary tract infections in postmenopausal women, collected in a collaborative study between The University of Texas at Dallas and The University of Texas Southwestern Medical Center. First, we present our Bayesian Proportion Test, which performs differential abundance analysis to determine whether taxonomic functional data differ significantly between experimental groups. Second, we present our Bayesian Correlation Test, which supports exploratory integrative analysis of associations between microbiome and clinical data. Last, we present our Bayesian stochastic block model with a Markov random field prior, which performs community detection using information from both the adjacency matrix and the taxonomic tree hierarchy.
To the best of our knowledge, current stochastic block models use only the network information given by the adjacency matrix, and none incorporates information from the taxonomic tree hierarchy. The inclusion of taxonomic tree information is therefore the novelty of our model, and we demonstrate its superior performance over other commonly used methods. We also show that including the taxonomic tree information does not adversely affect model performance, even when this information is not relevant to community detection.