Karolinska Institutet

Novel statistical methods for genome-wide association summary statistics

thesis
posted on 2024-09-02, 15:05 authored by Zheng Ning

A general objective of genetic studies is to understand the genetic basis of complex traits such as height, body mass index (BMI), and disease endpoints. Such research has been facilitated by the completion of the Human Genome Project and the development of high-throughput technologies. With high-throughput genotyping and sequencing technologies, information on millions of genetic markers can be measured for each individual.

The most widely used strategy for detecting associations between genetic variants and a complex trait is the genome-wide association study (GWAS). Because the genetic architecture of most complex traits is highly polygenic, the signal-to-noise ratio is usually very low. Thus, especially in human populations, GWAS often requires large samples to obtain sufficient power. Unfortunately, given the restrictions on sharing individual-level data, it is often not feasible to pool data from different cohorts. Nevertheless, each cohort can report and share GWAS summary statistics, such as sample sizes, allele frequencies, estimates of genetic effect sizes, and their standard errors for the genetic markers across the genome. Therefore, one recent focus in statistical methodology development for genetic studies has been on meta-analysis techniques using summary-level data. The objective of this thesis is to develop novel statistical genetics methods based on GWAS summary statistics and to apply these methods to better understand the genetic architecture underlying complex traits.
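To make the idea of combining cohorts through summary-level data concrete, the sketch below shows a standard fixed-effect inverse-variance-weighted meta-analysis for a single marker. It is a generic illustration, not a method from this thesis; the function name `inverse_variance_meta` and the example numbers are hypothetical, and it assumes the cohorts are independent and report effects on the same scale and for the same allele.

```python
import math


def inverse_variance_meta(betas, ses):
    """Combine per-cohort effect estimates for one marker.

    betas: per-cohort effect size estimates (same allele, same scale).
    ses:   per-cohort standard errors of those estimates.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    Assumes independent cohorts and a common true effect (fixed-effect model).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical summary statistics from two cohorts for one marker.
beta, se, z, p = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
```

Only the effect estimates and standard errors are needed here, which is why this kind of analysis remains possible when individual-level data cannot be shared.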

In Study I, we developed a Selection Operator for JOint analyzing multiple SNPs (SOJO). We mathematically proved and empirically showed that the least absolute shrinkage and selection operator (LASSO) can be computed using only GWAS summary-level data. Compared with stepwise selection procedures, SOJO performs better in variable selection. SOJO is useful for detecting additional variants with independent effects and for assessing the magnitude of allelic heterogeneity within loci.

In Study II, we developed a High-Definition Likelihood (HDL) method to improve the accuracy of genetic correlation estimation using GWAS summary statistics. Compared with the state-of-the-art method, LD Score regression (LDSC), HDL achieves higher statistical power to detect genetic correlations between phenotypes by fully accounting for linkage disequilibrium (LD) information across the genome.

In Study III, we introduced a four-level strategy for replication of loci detected by multi-trait GWAS methods. The four methods provide different degrees of replication strength and are useful for providing additional evidence when a locus has been discovered and replicated by multivariate analysis of variance (MANOVA) or other multi-trait methods. The replication methods require only summary association statistics and are straightforward to apply to multi-trait GWAS analyses.

In Study IV, using GWAS summary statistics, we developed a method named Genetic Correlation Contrast for Causality (G3C) as a more robust test for the existence and direction of causal relationships between phenotypes. In contrast to Mendelian Randomization (MR), G3C does not rely on the assumption of no horizontal pleiotropy. G3C takes full advantage of genome-wide genetic association data and accounts for underlying genetic correlations between complex traits.
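As a rough illustration of why a LASSO fit is possible without individual-level data (the claim in Study I), the sketch below runs textbook coordinate descent on the covariance form of the LASSO objective, which depends on the data only through a marker correlation matrix and a vector of marginal marker-trait correlations. This is not the SOJO implementation. The names `lasso_from_summary`, `R`, `r`, and `lam` are hypothetical, and the sketch assumes standardized genotypes and phenotype, with `R` taken from an LD reference and `r` derived from marginal GWAS estimates.

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_from_summary(R, r, lam, n_iter=1000, tol=1e-10):
    """Minimize 0.5 * b'Rb - r'b + lam * sum(|b_j|) by coordinate descent.

    R:   marker correlation (LD) matrix as a list of lists (assumed input).
    r:   marginal marker-trait correlations (assumed input).
    lam: non-negative penalty parameter.
    """
    p = len(r)
    b = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Marginal signal at marker j after removing what the other
            # currently selected markers explain through LD.
            partial = r[j] - sum(R[j][k] * b[k] for k in range(p) if k != j)
            new_bj = soft_threshold(partial, lam) / R[j][j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Hypothetical two-marker locus: marker 2 is associated only through LD
# (correlation 0.9) with marker 1, so its coefficient is shrunk to zero.
R = [[1.0, 0.9], [0.9, 1.0]]
r = [0.30, 0.27]
b = lasso_from_summary(R, r, lam=0.05)
```

In this toy locus the second marker's marginal association is fully explained by the first, so only one variant is retained, which is the kind of joint selection among correlated markers that the abstract describes.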

List of scientific papers

I. Zheng Ning, Youngjo Lee, Peter K. Joshi, James F. Wilson, Yudi Pawitan, and Xia Shen. A selection operator for summary association statistics reveals allelic heterogeneity of complex traits. The American Journal of Human Genetics. 101: 903–912.
https://doi.org/10.1016/j.ajhg.2017.09.027

II. Zheng Ning, Yudi Pawitan, and Xia Shen. High-definition likelihood inference of genetic correlations across human complex traits. Nature Genetics. 52: 859–864.
https://doi.org/10.1038/s41588-020-0653-y

III. Zheng Ning, Yakov A. Tsepilov, Sodbo Zh. Sharapov, Zhipeng Wang, Alexander K. Grishenko, Xiao Feng, Masoud Shirali, Peter K. Joshi, James F. Wilson, Yudi Pawitan, Chris S. Haley, Yurii S. Aulchenko, and Xia Shen. Nontrivial replication of loci detected by multi-trait methods. [Submitted]

IV. Zheng Ning, Peter K. Joshi, Youngjo Lee, James F. Wilson, Yudi Pawitan, and Xia Shen. Inferring causation from heterogeneity in genetic correlations of complex traits. [Manuscript]

History

Defence date

2020-09-11

Department

  • Department of Medical Epidemiology and Biostatistics

Publisher/Institution

Karolinska Institutet

Main supervisor

Shen, Xia

Co-supervisors

Pawitan, Yudi

Publication year

2020

Thesis type

  • Doctoral thesis

ISBN

978-91-7831-887-2

Number of supporting papers

4

Language

  • eng

Original publication date

2020-08-20

Author name in thesis

Ning, Zheng

Original department name

Department of Medical Epidemiology and Biostatistics

Place of publication

Stockholm
