Computational Epigenetics & Bioinformatics

Research Interests

Dr. Sun specializes in the development of statistical methods and bioinformatics software, and in the computational analysis of high-throughput sequencing data, such as Bisulfite-Seq, ChIP-Seq, and RNA-Seq, to extract biological insights, especially into transcriptional and epigenetic regulation in development, aging, and disease, particularly cancer.

Dr. Sun’s primary research lies in bioinformatics, a growing research area in which computational, mathematical, and statistical methods are applied to solve biological problems. Biology is digital by nature, as demonstrated by the fact that all living organisms store their genetic information in DNA using four nucleotides. The detection of the 5th and 6th nucleotides, 5-methylcytosine and 5-hydroxymethylcytosine, which are epigenetic modifications of cytosine, has also become digital through the widely used bisulfite sequencing and the promising single-molecule direct sequencing. In addition to DNA methylation, the other two major epigenetic factors are histone modification and nucleosome positioning. Long noncoding RNA has also gained attention in the past several years. With fast-advancing technology, the question of how the epigenome interacts with the genome and transcriptome is gradually being approached through systematic computation on genome-scale big data.

Research Highlights

One Bioinformatics Project

        CDIF (Credible Difference) method and MOABS software (Genome Biology 2014)

        Other papers in submission and preparation

Two DNA Methylation Perturbation Projects

        Dnmt3a Single Knockout (Nature Genetics 2012)

        Dnmt3a Dnmt3b Double Knockout (Cell Stem Cell 2014)

One Aging Project ==>> Three Data Mining Papers

        DNA Methylation Canyon from deep WGBS data (Nature Genetics 2014)

        Epigenome in Hematopoietic Stem Cell Aging (Cell Stem Cell 2014)

        Novel HSC Specific LncRNA from deep RNA-Seq data (Cell Stem Cell 2015)




Credible Difference measures both statistical and biological significance of methylation difference 

D Sun, et al. MOABS: Model based Analysis of Bisulfite Sequencing data. Genome Biology 15 (2), R38. (2014)


Currently the most popular method for testing differential methylation of a single cytosine is Fisher’s exact test. However, Fisher’s exact test makes two assumptions that render the test inexact: (1) it assumes a contingency table with fixed margins, and (2) it assumes a point-null hypothesis on a continuous variable. Here we propose an exact numerical solution, available as a C++ source library and an R package, “exactNumCI”, to calculate the confidence interval (CI) for a single methylation ratio (statistically, a binomial proportion), the CI for the difference of two binomial proportions, and the CI for the difference of differences, for example the difference in 5hmC ratio between two samples. Based on this truly exact solution, we propose the credible difference, which is a conservative estimate of the methylation ratio difference adjusted by sequencing depth. For ranking cytosines, the credible difference combines biological significance and statistical significance better than the Fisher’s exact test p-value does. This model and concept form the methods part of the manuscript.
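The idea of a depth-adjusted, conservative difference can be illustrated with a short Python sketch. This is not the exact numerical solution implemented in MOABS or “exactNumCI”: it approximates the interval by Monte Carlo sampling, and it assumes a uniform prior on each methylation ratio and a rule that takes the interval bound nearest to zero (or zero when the interval covers zero). The function names and parameters are hypothetical and chosen only for illustration.

```python
import random


def difference_interval(k1, n1, k2, n2, level=0.95, draws=20000, seed=0):
    """Approximate interval for p1 - p2, given methylated (k) and total (n) read counts.

    Monte Carlo approximation (assumption): not the exact numerical method of the paper.
    """
    rng = random.Random(seed)
    # Assumption: uniform prior, so each ratio has a Beta(k + 1, n - k + 1) posterior.
    diffs = sorted(
        rng.betavariate(k1 + 1, n1 - k1 + 1) - rng.betavariate(k2 + 1, n2 - k2 + 1)
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1.0 - tail) * draws) - 1]
    return lower, upper


def credible_difference(k1, n1, k2, n2, **kwargs):
    """Conservative difference: the interval bound nearest zero, or 0 if the interval covers zero.

    This decision rule is an assumption made for the sketch.
    """
    lower, upper = difference_interval(k1, n1, k2, n2, **kwargs)
    if lower > 0:
        return lower
    if upper < 0:
        return upper
    return 0.0


# Same observed ratios (0.8 vs 0.2) at two sequencing depths.
deep = credible_difference(40, 50, 10, 50)
shallow = credible_difference(4, 5, 1, 5)
print(deep, shallow)
```

Both comparisons have the same observed difference of 0.6, but the deeply sequenced pair yields the larger value, because its interval is narrower. This is the sense in which such a measure is adjusted by sequencing depth.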




A complete, accurate and efficient solution for DNA methylation analysis of WGBS (Whole Genome Bisulfite Sequencing) data 


D Sun, et al. MOABS: Model based Analysis of Bisulfite Sequencing data. Genome Biology 15 (2), R38. (2014).


We have developed “MOABS”, model based analysis of bisulfite sequencing data. It provides a complete, accurate, efficient, and biologist-friendly solution for the analysis of large-scale DNA methylation data, from the single-cytosine level to the region level. It currently integrates four modules seamlessly: bisulfite read alignment; methylation ratio calling; identification of hypomethylated regions and variably methylated regions in one sample; and identification of differential methylation between two samples, along with other downstream analyses. By wrapping BSMAP to use threads and clusters efficiently, it maps 2 billion 100 bp reads in 10 hours on a cluster with 160 CPUs. It uses advanced algorithms for per-base methylation calling and for detection of differential methylation, so that 2 billion aligned reads from two conditions, covering around 30 million CpGs in human, can be processed in 1 hour on 24 CPUs, while other R packages take more than 1 day.




De novo DNA Methylation Balances Hematopoietic Stem Cell Self-Renewal and Differentiation


GA Challen, D Sun^, et al. Dnmt3a is essential for hematopoietic stem cell differentiation. Nature Genetics 44 (1), 23-31 (2012)


Cytosine methylation is an epigenetic mark usually associated with gene repression. Despite a requirement for de novo DNA methylation for differentiation of embryonic stem cells, its role in somatic stem cells is unknown. Using conditional ablation, we show that loss of either, or both, Dnmt3a or Dnmt3b, progressively impedes hematopoietic stem cell (HSC) differentiation during serial in vivo passage. Concomitantly, HSC self-renewal is immensely augmented in absence of either Dnmt3, particularly Dnmt3a. Dnmt3-KO HSCs show upregulation of HSC multipotency genes and downregulation of early differentiation factors, and the differentiated progeny of Dnmt3-KO HSCs exhibit hypomethylation and incomplete repression of HSC-specific genes. HSCs lacking Dnmt3a manifest hyper-methylation of CpG islands and hypo-methylation of genes which are highly correlated with human hematologic malignancies. These data establish that aberrant DNA methylation has direct pathologic consequences for somatic stem cell development, leading to inefficient differentiation and maintenance of a self-renewal program.



Dnmt3b has roles distinct from those of Dnmt3a in maintaining the balance of hematopoietic stem cells


GA Challen^, D Sun^, et al. Dnmt3a and Dnmt3b Have Overlapping and Distinct Functions in Hematopoietic Stem Cells. Cell Stem Cell 15 (3), 350-364. (2014).


Hematopoietic stem cells (HSCs) are the precursors of the hematopoietic system responsible for the lifelong production of blood and bone marrow. Given the emerging importance of epigenetic regulation in HSC fate decisions and malignant transformation, we investigated the role of the DNA methyltransferase Dnmt3b through genetic ablation in HSCs - either alone or in combinatorial deletion with its paralog Dnmt3a. While conditional inactivation of Dnmt3b alone in adult HSCs had minor functional impact, simultaneous deletion of Dnmt3a and Dnmt3b was synergistic, resulting in a severe block in differentiation and enhanced HSC self-renewal. Dnmt3a/b-null HSCs displayed activated β-catenin signaling, partly accounting for the differentiation block. Loss of Dnmt3a in HSCs resulted in global DNA hypomethylation, but a paradoxical hypermethylation of CpG islands, most of which was eliminated in Dnmt3a/b-null HSCs. These data demonstrate distinct roles for Dnmt3b in HSC differentiation and provide unprecedented resolution into the epigenetic regulation of HSC fate decisions.



DNA Methylation Canyon maintained by competition between Dnmt3a and Tets 


M Jeong^, D Sun^, et al. Large conserved domains of low DNA methylation maintained by Dnmt3a. Nature Genetics 46, 17–23. (2014).


Gains and losses in DNA methylation are prominent genomic features of all mammalian cell types. To gain insight into mechanisms that could promote shifts in DNA methylation patterns and thus contribute to cell fate, including malignant transformation, we performed genome-wide mapping of 5-methylcytosine and 5-hydroxymethylcytosine in purified hematopoietic stem cells (HSCs). We discovered exceptionally large DNA methylation lacunae that span highly conserved domains frequently containing transcription factors and are quite distinct from CpG islands and shores. The genes in about half of these “methylation canyons” are coated with repressive histone marks. The remaining canyons are coated with activating histone marks and harbor genes highly expressed in HSCs. Their cliffs are demarked by 5-hydroxymethylcytosine and become eroded in the absence of DNA methyltransferase 3a (Dnmt3a). Genes within expanding canyons are highly enriched for those dysregulated in human leukemias. Hence, the novel epigenetic landscape we describe may provide a mechanism for the regulation of hematopoiesis and, if disrupted, may contribute to leukemia development. 





Data Mining on Aging of Hematopoietic Stem Cell Epigenome and Transcriptome 


D Sun, et al. Epigenetic dysregulation underlying HSC aging. Cell Stem Cell 14 (5), 673-688. (2014)


We profiled the transcriptome, DNA methylome, and histone modifications from purified young and old murine bone marrow-derived hematopoietic stem cells (HSCs). Somatic stem cells replenish differentiated cells of many tissues, but cannot avert the process of aging. A comprehensive and integrated genomic analysis of young and aged somatic stem cells is needed to elucidate stem cell-intrinsic aging mechanisms that erode their function. Transcriptome analysis indicates reduced TGF-β signaling perturbing genes involved in HSC proliferation and differentiation with age. Examination of histone modifications shows that the repressive Polycomb H3K27me3 mark increases with age; concomitantly, H3K4me3 marking breadth increases across many genes associated with HSC identity. DNA methylation increases, particularly at binding sites for transcription factors involved in HSC differentiation, and decreases at those associated with HSC maintenance. Together these changes reinforce the self-renewal program and diminish differentiation capacity, paralleling their behavior. Ribosomal biogenesis emerges as a particular target of HSC aging, with increased transcription of ribosomal protein as well as RNA genes, and marked hypomethylation of rRNA genes. Together, the data provide the first detailed epigenetic signature of purified HSCs, which will serve as a reference epigenome for young and aged adult stem cells.




Long Non-coding RNAs Control Hematopoietic Stem Cell (HSC) Function 


M Luo^, M Jeong^, D Sun^, et al. Long Non-Coding RNAs Control Hematopoietic Stem Cell Function. Cell Stem Cell 16, 426-438. (2015).


Long non-coding RNAs (lncRNAs) have recently emerged as new players in gene expression regulation. Whether and how lncRNAs might control hematopoietic stem cell (HSC) function remains largely unknown. Here, we profiled the transcriptome of purified long-term HSCs by deep RNA-sequencing and identified 503 un-annotated transcripts, of which 323 are predicted to be lncRNAs. Comparison of their expression in differentiated lineages represented by B cells and granulocytes revealed 159 that are likely to be HSC-specific novel lncRNAs. These genes share many epigenetic features with protein-coding genes, and the expression of many is regulated by DNA methylation. Knockdown of LncHSC-1 and LncHSC-2 suggests they regulate HSC lineage differentiation. Taken together, we comprehensively identify HSC-specific lncRNAs and demonstrate their role in regulating HSC function, adding an additional layer to the HSC regulatory circuit.