Novel statistical methods for genome-wide association summary statistics
Author: Ning, Zheng
Date: 2020-09-11
Location: Atrium, Nobel väg 12B, Karolinska Institutet, Solna
Time: 09.00
Department: Inst för medicinsk epidemiologi och biostatistik / Dept of Medical Epidemiology and Biostatistics
Abstract
A general objective of genetic studies is to understand the genetic basis of complex traits such as height, body mass index (BMI), and disease endpoints. Such research has been facilitated by the completion of the Human Genome Project and the development of high-throughput technologies. With high-throughput genotyping and sequencing technologies, information on millions of genetic markers can be measured for each individual.
The most widely used strategy for detecting associations between genetic variants and a complex trait is the genome-wide association study (GWAS). Because the genetic architecture of most complex traits is highly polygenic, the signal-to-noise ratio is usually tiny. Thus, especially in human populations, GWAS often requires large samples to obtain sufficient power. Unfortunately, given the restrictions on sharing individual-level data, it is often not feasible to pool data from different cohorts. Nevertheless, each cohort can report and share GWAS summary statistics, such as sample sizes, allele frequencies, estimates of genetic effect sizes, and their standard errors for the genetic markers across the genome. Therefore, one recent focus in statistical methodology development for genetic studies has been on meta-analysis techniques using summary-level data. The objective of this thesis is to develop novel statistical genetics methods based on GWAS summary statistics and to apply these methods to better understand the genetic architecture underlying complex traits.
In Study I, we developed a Selection Operator for JOint analyzing multiple SNPs (SOJO). We mathematically proved and empirically showed that the least absolute shrinkage and selection operator (LASSO) can be fitted using GWAS summary-level data. Compared with stepwise selection procedures, SOJO performs better in variable selection. SOJO is useful for detecting additional variants with independent effects and assessing the magnitude of allelic heterogeneity within loci. In Study II, we developed a High-Definition Likelihood (HDL) method to improve the accuracy of genetic correlation estimation using GWAS summary statistics. Compared with the state-of-the-art method LD Score regression (LDSC), HDL achieves higher statistical power to detect genetic correlations between phenotypes by fully accounting for linkage disequilibrium (LD) information across the genome. In Study III, we introduced a four-level strategy for replication of loci detected by multi-trait GWAS methods. The four methods provide different degrees of replication strength, useful for providing additional evidence when a locus has been discovered and replicated by multivariate analysis of variance (MANOVA) or other multi-trait methods. The replication methods require only summary association statistics and are straightforward to apply to multi-trait GWAS analyses. In Study IV, using GWAS summary statistics, we developed a method named Genetic Correlation Contrast for Causality (G3C) as a more robust test for the existence and direction of causal relationships between phenotypes. In contrast to Mendelian Randomization (MR), G3C does not rely on the assumption of no horizontal pleiotropy. G3C takes full advantage of genome-wide genetic association data and accounts for underlying genetic correlations between complex traits.
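As a rough illustration of the Study I claim that LASSO can be fitted from summary-level data, the sketch below is a generic coordinate-descent solver, not the SOJO implementation. It assumes standardized genotypes and phenotype, so that the LASSO objective depends on the data only through the marginal associations r = X'y/n and the LD correlation matrix R = X'X/n (which in practice would presumably be estimated from a reference sample). The function names `soft_threshold` and `lasso_from_summary` are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used in LASSO coordinate updates."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_from_summary(r, R, lam, n_iter=1000, tol=1e-10):
    """Minimise 0.5 * b'Rb - b'r + lam * ||b||_1 by coordinate descent.

    Assumptions (not taken from the thesis): r holds marginal effect
    estimates on the standardized scale (X'y / n) and R is the LD
    correlation matrix (X'X / n). No individual-level data are needed.
    """
    r = np.asarray(r, dtype=float)
    R = np.asarray(R, dtype=float)
    b = np.zeros(len(r))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Marginal signal for SNP j after removing the contribution
            # of all other SNPs currently in the model.
            partial = r[j] - R[j] @ b + R[j, j] * b[j]
            new_bj = soft_threshold(partial, lam) / R[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Toy example: two SNPs in LD (correlation 0.5) with made-up marginal effects.
R_toy = np.array([[1.0, 0.5], [0.5, 1.0]])
r_toy = np.array([0.6, 0.2])
print(lasso_from_summary(r_toy, R_toy, lam=0.1))
```

In this toy example the second SNP's marginal signal is explained by its LD with the first, so the penalised joint model can drop it; this is the kind of joint, within-locus selection the abstract describes, here shown under simplified assumptions.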
List of papers:
I. Zheng Ning, Youngjo Lee, Peter K. Joshi, James F. Wilson, Yudi Pawitan, and Xia Shen. A selection operator for summary association statistics reveals allelic heterogeneity of complex traits. The American Journal of Human Genetics. 101: 903–912.
II. Zheng Ning, Yudi Pawitan, and Xia Shen. High-definition likelihood inference of genetic correlations across human complex traits. Nature Genetics. 52: 859–864.
III. Zheng Ning, Yakov A. Tsepilov, Sodbo Zh. Sharapov, Zhipeng Wang, Alexander K. Grishenko, Xiao Feng, Masoud Shirali, Peter K. Joshi, James F. Wilson, Yudi Pawitan, Chris S. Haley, Yurii S. Aulchenko, and Xia Shen. Nontrivial replication of loci detected by multi-trait methods. [Submitted]
IV. Zheng Ning, Peter K. Joshi, Youngjo Lee, James F. Wilson, Yudi Pawitan, and Xia Shen. Inferring causation from heterogeneity in genetic correlations of complex traits. [Manuscript]
Institution: Karolinska Institutet
Supervisor: Shen, Xia
Co-supervisor: Pawitan, Yudi
Issue date: 2020-08-20
Publication year: 2020
ISBN: 978-91-7831-887-2