Finding functional variants in complex diseases using DNaseI hypersensitivity maps
Abstract
The genetic risk for developing complex diseases most likely arises from many common, noncoding variants. While genome-wide association studies (GWAS) have identified associations between genetic loci and disease phenotypes, they are unable to pinpoint the causal, functional variant(s). This problem is compounded by our limited understanding of the regulatory role of noncoding sequences. We investigated the use of DNaseI hypersensitivity (DHS), a generic marker for open chromatin and regulatory elements, to identify candidate functional variants in noncoding DNA, on the hypothesis that variants within DHS regions are more likely to be functional. Analyzing publicly available ENCODE DHS data produced by groups at the University of Washington and Duke University, we find that 12-13% of the genome is defined as DHS. This may be a conservative figure, since data saturation has not been reached and additional cell-type-specific DHS sites remain to be found. In characterizing DHS distribution and sequence features, we find a two- to three-fold enrichment of DHS in exons and a two-fold enrichment within 5 kb of transcription start sites, relative to the genomic average. In addition, DHS sites are GC-rich and repeat-poor, and they tend to cluster. DHS regions are enriched for HapMap SNPs (1.2- to 1.3-fold) as well as for GWAS phenotype-associated SNPs (1.8- to 2-fold). Although the lack of data saturation limits the power of this approach today, the enrichment of phenotype-associated SNPs suggests that DHS data hold promise for predicting functional variants in noncoding DNA and prioritizing them for more targeted assays.
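The genome-coverage and fold-enrichment figures in the abstract compare an observed fraction with an expected one. The following is a minimal sketch, assuming fold enrichment is computed as the fraction of SNPs that fall inside DHS regions divided by the fraction of the genome covered by DHS, and assuming 0-based, half-open intervals as in BED files. The function names and toy coordinates are hypothetical illustrations, not the authors' pipeline or data.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def contains(merged, starts, pos):
    """True if pos lies inside one of the sorted, non-overlapping merged intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < merged[i][1]


def dhs_enrichment(dhs_by_chrom, snps, genome_length):
    """Return (fraction of genome covered by DHS, fold enrichment of SNPs in DHS).

    dhs_by_chrom: {chrom: [(start, end), ...]} possibly overlapping DHS calls
    snps: [(chrom, position), ...]
    genome_length: total genome size in bp (hypothetical input; assumed, e.g., non-gap bases)
    """
    merged = {c: merge_intervals(iv) for c, iv in dhs_by_chrom.items()}
    starts = {c: [s for s, _ in m] for c, m in merged.items()}

    # Expected fraction: share of the genome that is DHS (overlaps counted once).
    covered = sum(e - s for m in merged.values() for s, e in m)
    genome_fraction = covered / genome_length

    # Observed fraction: share of SNPs that land inside a DHS region.
    hits = sum(
        1 for chrom, pos in snps
        if chrom in merged and contains(merged[chrom], starts[chrom], pos)
    )
    snp_fraction = hits / len(snps)

    return genome_fraction, snp_fraction / genome_fraction


if __name__ == "__main__":
    # Toy data: 250 of 1000 bp are DHS, and 4 of 8 SNPs fall inside DHS.
    dhs = {"chr1": [(0, 100), (50, 150)], "chr2": [(0, 100)]}
    snps = [("chr1", 10), ("chr1", 120), ("chr1", 140), ("chr2", 50),
            ("chr1", 700), ("chr2", 900), ("chr2", 999), ("chr1", 300)]
    frac, fold = dhs_enrichment(dhs, snps, genome_length=1000)
    print(frac, fold)  # 0.25 2.0
```

Merging overlapping calls before measuring coverage matters because DHS sites from different cell types frequently overlap; counting them separately would inflate the genome fraction and deflate the apparent enrichment. A fold value above 1 means SNPs land in DHS more often than the DHS share of the genome alone would predict.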