Are there rearrangement hotspots in the human genome?

We previously identified homogenization events between these AZFa-HERVs that are incompatible with this model (Figure 10a), and suggested a double-crossover mechanism, which might occur either in a single meiosis or via single unequal crossovers in different meioses; in other words, through the deletion of a duplicated intermediate (Figure 10b).

The recent finding of these AZFa duplicated intermediates in the general population, and their apparent compatibility with fertility, suggest that the latter two-step scenario is more plausible [ 25 ].

This second mechanism for paralog homogenization has the potential to homogenize much longer tracts of segmental duplications than the mismatch repair-based mechanism. The same may also be true of other segmental duplications underlying known genomic disorders where the reciprocal duplications of pathogenic deletions can be passed on from parent to offspring.

The constitutively haploid nature of the Y chromosome means that the substrates for the unequal crossing-over event that generates the duplicated intermediate must be sister chromatids.

The unequal crossing-over event that results in deletion back to two copies might be either intra-chromosomal or between sister chromatids.

The demonstration that gene conversion can exaggerate levels of sequence divergence over and above those observed at non-duplicated loci in the same genome has important implications for evolutionary studies investigating variation in mutation rates between loci.

Comparisons between sequence divergence at duplicated and non-duplicated loci are likely to be confounded by the presence of gene conversion in the former but not the latter. This factor may help to explain large discrepancies between recent estimates of the male-driven mutation rate in humans [ 26-30 ].

Against this background it is intriguing to note the recent study identifying reduced sequence divergence within long palindromic sequences undergoing gene conversion on the Y chromosome [ 12 ].

These palindromes have approximately The authors of this study proposed that by reducing sequence divergence, gene conversion might provide a means to maintain the functional integrity of genes residing in these palindromes.

The simulations presented here indicate that gene conversion alone is unlikely to be capable of reducing orthologous sequence divergence. The reduction in sequence divergence is more likely to be due to an interplay of factors. For example, if gene conversion and base substitution were GC-biased in opposing directions, gene conversion events might preferentially maintain the ancestral state of a paralogous sequence variant.

Gene conversion between paralogous sequences has important consequences for the analysis of human genomic diversity. It has recently been inferred from the preferential mapping of dbSNP entries to segmental duplications that some dbSNP entries are not true single-nucleotide polymorphisms (SNPs), but are misassembled paralogous sequence variants (PSVs) [ 24 ].

However, this assertion does not take into account the possibility that gene conversion can elevate levels of sequence diversity at segmental duplications [ 31 ]. This suggestion draws support from both empirical [ 32 ] and theoretical [ 22 , 33 , 34 ] studies. Our analyses reveal that gene conversion provides a mechanism to elevate nucleotide variability by introducing new variants from paralogous sequences. Indeed, the simulations presented here for interspecific sequence divergence are formally no different from the processes operating at non-recombining loci to generate intraspecific diversity.

Gene conversion generates haplotypes with complex evolutionary histories. Further work is required to explore the haplotypic structure within segmental duplications.

Each primate AZFa-HERV was amplified in its entirety, either in a single amplification using primers located in flanking single-copy sequence, or by using two overlapping long PCR reactions, each using one internal primer and one primer in the proximal- or distal-specific flanking sequences. Primer sequences are documented in Additional data file 1.

Sequences were aligned using Se-Al [ 39 ]. The alignment is documented in Additional data file 2. Jukes-Cantor distances and neighbor-joining trees were calculated using Phylip [ 40 ]. Phylogenetic networks [ 41 ] were constructed using SplitsTree [ 42 ]. Sliding-window analyses of sequence similarities, concerted indices and directionality indices were performed using code written in Interactive Data Language 5. These sliding windows were bp long and were analysed at 15 bp intervals across the alignment.
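The sliding-window similarity scan described above can be sketched in Python (the original analysis used Interactive Data Language). This is a minimal illustration, not the authors' code: the function name, the gap handling and the `window` parameter are assumptions, and the window length is left to the caller because it is not stated in the text. Only the 15 bp step comes from the description.

```python
def sliding_window_similarity(seq_a, seq_b, window, step=15):
    """Return (start, similarity) pairs for two aligned sequences.

    `window` is a hypothetical parameter (the window length is not given
    in the text); `step` defaults to the 15 bp interval described above.
    Similarity is the fraction of identical, ungapped alignment columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        # ASSUMPTION: columns with a gap in either sequence are ignored.
        pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if not pairs:
            results.append((start, None))
            continue
        matches = sum(1 for x, y in pairs if x == y)
        results.append((start, matches / len(pairs)))
    return results
```

Calling `sliding_window_similarity(proximal, distal, window=w)` on two rows of the alignment yields one similarity value per window start, which can then be plotted along the alignment.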

Concerted evolution can be defined as the maintenance of homogeneity of nucleotide sequences among duplicated sequences within a species, although the nucleotide sequences change over time. We devised a statistic we call the concerted index (CI) to quantify this within-species similarity in relation to the observed variation between species. If D_p1p2 represents the distance between two orthologous sequences p1 and p2 in terms of the percentage of variant nucleotides between them, and D_p1d1 represents the distance between two paralogous sequences p1 and d1 (using the sequence nomenclature of paralogs and orthologs shown in Figure 5), then the CI is calculated using the equation:

Consequently, when sequences are evolving in a concerted fashion, the mean distance between orthologs is relatively high, but the distance between paralogs is low, and the CI will tend to 1. This statistic is extended to the current situation where three species are represented by calculating the mean of the CI across each of the three possible pairwise comparisons.
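The CI equation itself is not reproduced in this text, so the sketch below uses one form that is consistent with the description: one minus the ratio of mean paralog distance to mean ortholog distance, which tends to 1 when paralogs are similar and orthologs are divergent. That formula, and every function name here, is an assumption for illustration and may differ from the published definition.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Percentage of variant nucleotides between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def concerted_index(p1, d1, p2, d2):
    """ASSUMED form: CI = 1 - mean(paralog distance) / mean(ortholog distance).

    p1/d1 are the proximal/distal copies in species 1, p2/d2 in species 2.
    Returns None when the orthologs are identical (ratio undefined).
    """
    d_orth = mean([p_distance(p1, p2), p_distance(d1, d2)])
    d_para = mean([p_distance(p1, d1), p_distance(p2, d2)])
    if d_orth == 0:
        return None
    return 1.0 - d_para / d_orth


def mean_ci(species):
    """Average the CI over all pairwise species comparisons.

    `species` maps a species name to a (proximal, distal) sequence pair.
    """
    values = []
    for a, b in combinations(sorted(species), 2):
        (p1, d1), (p2, d2) = species[a], species[b]
        ci = concerted_index(p1, d1, p2, d2)
        if ci is not None:
            values.append(ci)
    return mean(values) if values else None
```

With three species, `mean_ci` averages the three pairwise values, mirroring the extension described above.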

This equation could be extended to include any model of sequence evolution in the distance calculation. However, the high levels of similarity between the sequences being analysed here means that any such correction has negligible impact in this analysis.

The distribution of CI values is strongly bimodal in this analysis (data not shown), thus clearly distinguishing between those portions of the alignment that are undergoing concerted evolution and those that are not.

The directionality index (DI) measures the difference between orthologous sequence divergence at proximal and distal copies of a duplicated sequence, as a function of the mean orthologous sequence divergence.

Thus if there is strong proximal-to-distal directionality, the discrepancy between proximal and distal orthologous divergences will generate a more negative DI. High distal-to-proximal directionality will generate a more positive DI and with minimal directionality, the DI will tend towards 0.

Monte Carlo stochastic simulations were written in Interactive Data Language 5. The simulation models the post-speciation evolution of a pair of 10 kb duplicated sequences in two daughter species, for example human and chimpanzee. A model of sequence divergence is implemented in which each base in the four sequences is equally mutable, and is capable of undergoing reversions and parallel mutations.

In addition, infrequent gene conversion events between paralogs are incorporated at random, but limited to the first half of the sequences.

This does not imply that gene conversion was absent before speciation, only that it is immaterial, given that variation between paralogs accumulates even in the presence of gene conversion, as homogenization is rarely perfect.

The probability P that the conversion tract length is n nucleotides long is given by the equation:. This conversion tract is positioned at random within the 5 kb portion of the duplicated sequence that is capable of undergoing gene conversion. The directionality of the gene conversion event is stochastically assigned according to a single parameter, x, which reflects the probability that the gene conversion donor is the proximal sequence, and the acceptor is the distal sequence.

After an amount of evolutionary time equivalent to a fixed number of generations, each simulation is halted and pairwise sequence similarities are calculated in non-overlapping bp windows across the 10 kb sequence for each pair of orthologs and paralogs.
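The simulation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' IDL code: the tract-length equation is not reproduced in this text, so a geometric distribution is assumed; the paralogs are assumed identical at speciation; and all function names, parameter names and default values are hypothetical.

```python
import random

BASES = "ACGT"


def point_mutation(seq, rng):
    """Mutate one uniformly chosen base to a different base (in place).
    Reversions and parallel mutations arise naturally from repeated hits."""
    i = rng.randrange(len(seq))
    seq[i] = rng.choice([b for b in BASES if b != seq[i]])


def tract_length(rng, mean_len, max_len):
    """ASSUMPTION: geometric tract lengths with the given mean, capped."""
    n = 1
    while n < max_len and rng.random() > 1.0 / mean_len:
        n += 1
    return n


def gene_conversion(prox, dist, rng, x, region, mean_len):
    """Copy a random tract within the first `region` bases from donor to
    acceptor; with probability x the donor is the proximal copy."""
    n = tract_length(rng, mean_len, region)
    start = rng.randrange(region - n + 1)
    donor, acceptor = (prox, dist) if rng.random() < x else (dist, prox)
    acceptor[start:start + n] = donor[start:start + n]


def simulate(length=10_000, steps=20_000, p_mut=0.05, p_conv=0.001,
             x=0.5, mean_len=200, seed=0):
    """Evolve proximal/distal copies in two species (p1, d1 and p2, d2)."""
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    # ASSUMPTION: all four copies start identical at speciation.
    seqs = {name: list(ancestor) for name in ("p1", "d1", "p2", "d2")}
    for _ in range(steps):
        for seq in seqs.values():
            if rng.random() < p_mut:
                point_mutation(seq, rng)
        for prox, dist in (("p1", "d1"), ("p2", "d2")):
            if rng.random() < p_conv:
                # Conversion is limited to the first half of the sequence.
                gene_conversion(seqs[prox], seqs[dist], rng, x,
                                length // 2, mean_len)
    return seqs


def window_similarity(a, b, window):
    """Fraction of identical sites in non-overlapping windows."""
    return [sum(s == t for s, t in zip(a[i:i + window], b[i:i + window])) / window
            for i in range(0, len(a) - window + 1, window)]


def mean_similarity(pair, replicates, window, **params):
    """Average windowed similarity for one sequence pair over replicates."""
    totals = None
    for seed in range(replicates):
        seqs = simulate(seed=seed, **params)
        sims = window_similarity(seqs[pair[0]], seqs[pair[1]], window)
        totals = sims if totals is None else [t + s for t, s in zip(totals, sims)]
    return [t / replicates for t in totals]
```

For example, `mean_similarity(("p1", "p2"), replicates=100, window=500, x=1.0)` averages the proximal ortholog similarity over 100 runs with strictly proximal-to-distal conversion. Comparing the first and second halves of the resulting profile separates the effect of conversion from that of mutation alone.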

Each simulation is replicated 1, times under the same parameters, and the sequence similarities for each pairwise comparison are averaged over all replications.

Additional data available with the online version of this paper comprise: a table of primers used in amplifying and sequencing the AZFa-HERVs (Additional data file 1); the alignment of proximal and distal AZFa-HERVs from human, chimpanzee and gorilla in FASTA format (Additional data file 2); and the source code for simulations of gene conversion used in this study, written in Interactive Data Language (IDL) (Additional data file 3).

Once we have sequenced genomes in the previous course, we would like to compare them to determine how species have evolved and what makes them different.

In the first half of the course, we will compare two short biological sequences, such as genes. In the second half of the course, we will "zoom out" to compare entire genomes, where we see large-scale mutations called genome rearrangements, seismic events that have heaved around large blocks of DNA over millions of years of evolution.

Looking at the human and mouse genomes, we will ask ourselves: just as earthquakes are much more likely to occur along fault lines, are there locations in our genome that are "fragile" and more susceptible to be broken as part of genome rearrangements? We will see how combinatorial algorithms will help us answer this question. Finally, you will learn how to apply popular bioinformatics software tools to solve problems in sequence alignment, including BLAST.

We will find it convenient to represent a circular chromosome with genes x_1, ..., x_n as a cycle in which directed edges (one per gene) alternate with undirected edges (connecting adjacent genes). The directions of the edges correspond to signs (strand) of the genes. We label the tail and head of a directed edge x_i as x_i^t and x_i^h, respectively.

Vertex x_i^t is called the obverse of vertex x_i^h, and vice versa. Vertices in a chromosome connected by an undirected edge are called adjacent. We represent a genome as a graph consisting of disjoint cycles (one for each chromosome). Let P be a genome represented as a collection of alternating black-obverse cycles (a cycle is alternating if the colors of its edges alternate). For any two black edges (u, v) and (x, y) in the genome graph P, we define a 2-break rearrangement as the replacement of these edges with either a pair of edges (u, x), (v, y), or a pair of edges (u, y), (v, x) (Figure 2).
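The genome graph and the 2-break operation can be sketched in Python as follows. This is an illustrative encoding, not taken from the paper: genes are nonzero signed integers, a gene end is a hypothetical `(gene, "t")` or `(gene, "h")` tuple, black edges are stored as unordered pairs, and all function names are assumptions.

```python
def black_edges(genome):
    """Black (adjacency) edges of a genome given as a list of circular
    chromosomes, each a list of nonzero signed gene numbers."""
    edges = set()
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):
            out_end = (abs(a), "h" if a > 0 else "t")  # where gene a is left
            in_end = (abs(b), "t" if b > 0 else "h")   # where gene b is entered
            edges.add(frozenset((out_end, in_end)))
    return edges


def two_break(edges, uv, xy, cross=False):
    """Replace black edges (u, v), (x, y) with (u, x), (v, y),
    or with (u, y), (v, x) when cross=True."""
    (u, v), (x, y) = uv, xy
    old = {frozenset(uv), frozenset(xy)}
    if len(old) != 2 or not old <= edges:
        raise ValueError("both edges must be distinct black edges of the genome")
    if cross:
        new = {frozenset((u, y)), frozenset((v, x))}
    else:
        new = {frozenset((u, x)), frozenset((v, y))}
    return (edges - old) | new


def chromosomes(edges):
    """Read chromosomes back by walking alternating black-obverse cycles."""
    nxt = {}
    for edge in edges:
        a, b = tuple(edge)
        nxt[a], nxt[b] = b, a
    seen, result = set(), []
    for gene in sorted({g for g, _ in nxt}):
        if gene in seen:
            continue
        chrom, entry = [], (gene, "t")
        while entry[0] not in seen:
            g, end = entry
            seen.add(g)
            chrom.append(g if end == "t" else -g)       # entered at tail: +g
            exit_end = (g, "h" if end == "t" else "t")  # cross the obverse edge
            entry = nxt[exit_end]                       # follow the black edge
        result.append(chrom)
    return result
```

On the single chromosome (+1 +2 +3 +4), a 2-break on the edges (1h, 2t) and (3h, 4t) gives a reversal of the segment 2..3 with one choice of replacement edges and a fission into two circular chromosomes with the other, which is why a single operation covers several standard rearrangements.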

This definition of elementary rearrangement operations follows the standard definitions of reversals, translocations, fissions, and fusions for the case of circular chromosomes. For circular chromosomes, fusions and translocations are not distinguishable. The 2-break rearrangements can be generalized as follows.

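Given k black edges forming a matching on 2k vertices, we define a k-break as the replacement of these edges with a set of k black edges forming another matching on the same 2k vertices.

Let P and Q be two signed genomes on the same set of genes. The breakpoint graph G(P, Q) is defined on the set V of gene ends (the tail and head of every gene), with edges of three colors: obverse, black, and gray. Edges of each color form a matching on V: the obverse matching (pairs of obverse vertices), the black matching (adjacent vertices in P), and the gray matching (adjacent vertices in Q). Every pair of matchings forms a collection of alternating cycles in G(P, Q), called black-gray, black-obverse, and gray-obverse cycles, respectively. The chromosomes of the genome P (respectively, Q) can be read along black-obverse (respectively, gray-obverse) cycles.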

The black-gray cycles in the breakpoint graph play an important role in analyzing rearrangements [ 28 ] (see Chapter 10 of [ 29 ] for background information on genome rearrangements). The k-break distance d_k(P, Q) between circular genomes P and Q is defined as the minimum number of k-breaks required to transform one genome into the other. Every k-break in the genome P corresponds to a transformation of the breakpoint graph G(P, Q). Since the breakpoint graph of two identical genomes is a collection of trivial black-gray cycles, each with one black and one gray edge (the identity breakpoint graph), the problem of transforming the genome P into the genome Q by k-breaks can be formulated as the problem of transforming the breakpoint graph G(P, Q) into the identity breakpoint graph G(Q, Q).

Different from the genomic distance problem [ 5 , 30 , 31 ] (for linear multichromosomal genomes), the 2-break distance problem for circular multichromosomal genomes has a trivial solution, first given in [ 32 ] in a slightly different context. For the sake of completeness, we reproduce a proof from [ 33 ]:

Theorem 1. The 2-break distance between circular genomes P and Q is d_2(P, Q) = b(P, Q) - c(P, Q), where b(P, Q) and c(P, Q) are the numbers of black edges and black-gray cycles in G(P, Q), respectively.

Proof. Since every 2-break adds two new edges, it can create at most two new black-gray cycles. On the other hand, since every 2-break removes two old edges, it should remove at least one old black-gray cycle. Hence every 2-break increases the number of black-gray cycles by at most one.

While 2-breaks correspond to standard rearrangements, 3-breaks add transposition-like operations (transpositions and inverted transpositions) as well as three-way fissions to the set of rearrangements (Figure 3).

Different from standard rearrangements (modeled as 2-breaks), transpositions introduce three breaks in the genome, making them notoriously difficult to analyze.

Despite many studies, the complexity of sorting by transpositions remains unknown [ 41-45 ]. A 3-break on edges (u, v), (x, y), and (z, t) corresponding to a transposition of the segment y…t from one chromosome to another. A transposition cuts off a segment of one chromosome and inserts it into the same or another chromosome.

Underlining shows a piece of chromosome that was transposed from one chromosome to another.

Let c_odd(P, Q) be the number of black-gray cycles in the breakpoint graph G(P, Q) with an odd number of black edges (odd cycles).

Theorem 2. The 3-break distance between circular genomes P and Q is d_3(P, Q) = (b(P, Q) - c_odd(P, Q)) / 2, where b(P, Q) is the number of black edges in G(P, Q).

Proof. It is easy to see that as soon as there is a nontrivial black-gray odd cycle in the breakpoint graph G(P, Q), it can be split into three odd cycles by a 3-break, thus increasing the number of odd cycles by two. On the other hand, if there exists a black-gray even cycle, it can be split into two odd cycles, thus again increasing the number of odd cycles by two.

For the sake of completeness, below we formulate the duality theorem for the k-break distance for an arbitrary k from [ 27 ].

Theorem 3. The k-break distance between circular genomes P and Q is.

Sankoff summarized arguments against the fragile breakage model (FBM) in the following sentence [ 21 ]:

And we cannot infer whether mutually randomized synteny block orderings derived from two divergent genomes were created through bona fide breakpoint re-use or rather through noise introduced in block construction or through processes other than reversals and translocations. The flaw in the first argument was revealed in [ 20 ].

In this paper, we study transformations between the human genome H and the mouse genome M with 3-breaks, using the synteny blocks from [ 46 ] and assume that all chromosomes are circular. While analyzing linear chromosomes would be more adequate than analyzing their circularized versions, it poses additional algorithmic challenges that remain beyond the scope of this paper. The related paper [ 47 ] addressed these challenges and demonstrated that switching from linear to circular chromosomes does not lead to significant changes in the multi-break distance.
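For circular genomes, the 2-break and 3-break distances can be computed directly from the black-gray cycles of the breakpoint graph. The Python sketch below uses the standard closed forms d_2 = b - c and d_3 = (b - c_odd) / 2, where b is the number of black edges, c the number of black-gray cycles, and c_odd the number of cycles with an odd number of black edges. The signed-integer genome encoding and all function names are assumptions for illustration, and the toy genomes in the usage note are not the human and mouse synteny blocks.

```python
def adjacency(genome):
    """Map each gene end to the gene end adjacent to it in the genome.

    A genome is a list of circular chromosomes, each a list of nonzero
    signed gene numbers; a gene end is a (gene, "t") or (gene, "h") tuple.
    """
    match = {}
    for chrom in genome:
        for a, b in zip(chrom, chrom[1:] + chrom[:1]):
            u = (abs(a), "h" if a > 0 else "t")
            v = (abs(b), "t" if b > 0 else "h")
            match[u], match[v] = v, u
    return match


def black_gray_cycles(p, q):
    """Return the number of black edges in each black-gray cycle of G(P, Q)."""
    black, gray = adjacency(p), adjacency(q)
    if black.keys() != gray.keys():
        raise ValueError("genomes must be defined on the same set of genes")
    seen, sizes = set(), []
    for start in black:
        if start in seen:
            continue
        size, v = 0, start
        while v not in seen:
            seen.add(v)
            w = black[v]   # follow a black edge (adjacency in P)
            seen.add(w)
            size += 1
            v = gray[w]    # then a gray edge (adjacency in Q)
        sizes.append(size)
    return sizes


def two_break_distance(p, q):
    """d_2 = b - c: black edges minus black-gray cycles."""
    sizes = black_gray_cycles(p, q)
    return sum(sizes) - len(sizes)


def three_break_distance(p, q):
    """d_3 = (b - c_odd) / 2: black edges minus odd cycles, halved."""
    sizes = black_gray_cycles(p, q)
    odd = sum(1 for s in sizes if s % 2 == 1)
    return (sum(sizes) - odd) // 2
```

As a check on a toy example, P = (+1 +2 +3) and Q = (+1 +3 +2) differ by a single transposition: the breakpoint graph is one cycle with three black edges, so the sketch returns a 3-break distance of 1 and a 2-break distance of 2.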

While this is a high breakpoint re-use rate (inconsistent with RBM, the random breakage model, and the scan statistics), this estimate relies on the assumption that each 3-break on the evolutionary path from H to M makes three breaks (complete 3-breaks).

In reality, some 3-breaks can make two breaks (incomplete 3-breaks), as 2-breaks are particular cases of 3-breaks, reducing the estimate for the number of breakpoint re-uses. Moreover, the minimum number of breakpoint re-uses may be achieved on a suboptimal evolutionary path from H to M. The rebuttal of RBM raises a question about finding a transformation of H into M by 3-breaks that makes the minimal number of individual breaks.

The following theorem shows that there exists a series of 3-breaks that makes the minimum number of breaks while transforming P into Q.

Theorem 4. Consider a shortest series of complete 3-breaks transforming every odd black-gray cycle into a trivial cycle and every even black-gray cycle into trivial cycles and a single cycle with two black edges. Corollary 5. Every transformation between the circularized human genome H and mouse genome M by 3-breaks requires at least breakpoint re-uses implying that there exist rearrangement hotspots in the human genome.

This is still higher than the expected breakpoint re-use rate of RBM as computed by scan statistics (see [ 4 ] and simulations in the next section). Below, we show how the number of breaks made in a series of 3-breaks depends on the number of complete 3-breaks in this series. Theorem 6. Corollary 7. Corollary 7 gives the lower bound for the breakpoint re-use rate as a function of the number of complete 3-breaks. For the human genome H and mouse genome M, this lower bound is shown in Figure 4.

Figure 4. A lower bound for the breakpoint re-use rate as a function of the number of complete 3-breaks in a series of 3-breaks between the circularized human and mouse genomes, based on conserved segments from [ 46 ].

Corollaries 5 and 7 address only the case of circularized chromosomes, and further analysis is needed to extend them to the case of linear chromosomes (see [ 47 ]). Recently, Bergeron et al. A more realistic analysis of 3-breaks leads to a much higher estimate of the breakpoint re-use (see Figure 4).

The papers [ 7 , 21 ] claim that deletion of some synteny blocks in [ 4 ] may create an appearance of breakpoint re-use even if there was no breakpoint re-use at all.

Science News. ScienceDaily, 13 November Public Library of Science.
