Supplementary Material for: Evolutionary implications of pericentromeric gene expression in humans
Dataset posted on 12.11.2004 by Mudge J.M., Jackson M.S.
Human pericentromeric sequences are enriched for recent sequence duplications. The continual creation and shuffling of these duplications can create novel intron-exon structures, and it has been suggested that these regions function as gene nurseries. However, these sequences are also rich in satellite repeats, which can repress transcription, and analyses of chromosomes 10 and 21 have suggested that they are transcript poor. Here, we investigate the relationship between pericentromeric duplication and transcription by analyzing the in silico transcriptional profiles within the proximal 1.5 Mb of genomic sequence on all human chromosome arms in relation to duplication status. We identify an ∼5× excess of transcripts specific to cancer and/or testis in pericentromeric duplications compared to surrounding single copy sequence, with the expression of >50% of all transcripts in duplications being restricted to these tissues. We also identify an ∼5× excess of transcripts in duplications which contain large quantities of interspersed repeats. These results indicate that the transcriptional profiles of duplicated and single copy sequences within pericentromeric DNA are distinct, suggesting that pericentromeric instability is unlikely to represent a common route for gene creation but may have a disproportionate effect upon genes whose function is restricted to the germ line.
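As a rough illustration of what an "∼5× excess" comparison between duplicated and single copy sequence can mean, the sketch below models it as a ratio of per-megabase transcript densities. This is an assumption made for illustration: the abstract does not state how the excess was normalized, and the function names (`transcript_density`, `fold_excess`) and the counts used below are hypothetical, not taken from the dataset.

```python
def transcript_density(n_transcripts: int, length_mb: float) -> float:
    """Return transcripts per megabase of sequence."""
    if length_mb <= 0:
        raise ValueError("length_mb must be positive")
    return n_transcripts / length_mb


def fold_excess(dup_count: int, dup_mb: float,
                single_count: int, single_mb: float) -> float:
    """Ratio of transcript density in duplicated vs. single copy sequence.

    Assumption (hypothetical): the excess is modeled as a ratio of per-Mb
    densities; the study's own normalization may differ.
    """
    single_density = transcript_density(single_count, single_mb)
    if single_density == 0:
        raise ValueError("single copy density is zero; ratio undefined")
    return transcript_density(dup_count, dup_mb) / single_density


# Hypothetical counts for illustration only (not from the dataset):
# 30 transcripts in 2.0 Mb of duplications vs. 12 in 4.0 Mb of single copy.
print(fold_excess(dup_count=30, dup_mb=2.0, single_count=12, single_mb=4.0))  # → 5.0
```

Normalizing by sequence length matters here because duplicated and single copy portions of the proximal 1.5 Mb differ in size from arm to arm; comparing raw counts would conflate transcript richness with the amount of sequence in each class.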