Prioritization of causal genes from genome-wide association studies by Bayesian data integration across loci
0
by Zeinab Mousavi, Marios Arvanitis, ThuyVy Duong, Jennifer A. Brody, Alexis Battle, Nona Sotoodehnia, Ali Shojaie, Dan E. Arking, Joel S. Bader
Motivation: Genome-wide association studies (GWAS) have identified genetic variants, usually single-nucleotide polymorphisms (SNPs), associated with human traits, including disease and disease risk. These variants (or causal variants in linkage disequilibrium with them) usually affect the regulation or function of a nearby gene. A GWAS locus can span many genes, however, and prioritizing which gene or genes in a locus are most likely to be causal remains a challenge. Better prioritization and prediction of causal genes could reveal disease mechanisms and suggest interventions. Results: We describe a new Bayesian method, termed SigNet for significance networks, that combines information both within and across loci to identify the most likely causal gene at each locus. The SigNet method builds on existing methods that focus on individual loci with evidence from gene distance and expression quantitative trait loci (eQTL) by sharing information across loci using protein-protein and gene regulatory interaction network data. In an application to cardiac electrophysiology with 226 GWAS loci, only 46 (20%) have within-locus evidence from Mendelian genes, protein-coding changes, or colocalization with eQTL signals. At the remaining 180 loci lacking functional information, SigNet selects 56 genes other than the minimum distance gene, equal to 31% of the information-poor loci and 25% of the GWAS loci overall. Assessment by pathway enrichment demonstrates improved performance by SigNet. Review of individual loci shows literature evidence for genes selected by SigNet, including PMP22 as a novel causal gene candidate.