EqualTDRL: illustrating equivalent tandem duplication random loss rearrangements

Background: To study the differences between two unichromosomal circular genomes, e.g., mitochondrial genomes, under the tandem duplication random loss (TDRL) rearrangement it is important to consider the whole set of potential TDRL rearrangement events that could have taken place. The reason is that for two given circular gene orders there can exist different TDRL rearrangements that transform one of the gene orders into the other. Hence, a TDRL event cannot always be reconstructed only from the knowledge of the circular gene order before a TDRL event and the circular gene order after it.

Results: We present the program EqualTDRL that computes and illustrates the complete set of TDRLs for pairs of circular gene orders that differ by only one TDRL. EqualTDRL considers the circularity of the given genomes and certain restrictions on the TDRL rearrangements. Examples for the latter are sequences of genes that have to be conserved during a TDRL or pairs of genes that frame intergenic regions which might represent remnants of duplicated genes. Additionally, EqualTDRL makes it possible to determine the set of TDRLs that are minimum with respect to the number of duplicated genes.

Conclusion: EqualTDRL helps scientists to study the complete set of TDRLs that possibly could have taken place in the evolution of mitochondrial genomes. EqualTDRL is implemented in C++ using the ggplot2 package of the open source programming language R and is freely available from http://pacosy.informatik.uni-leipzig.de/equaltdrl.

Keywords: Circular permutation, Gene order, Genome rearrangement, Mitochondria, Tandem duplication random loss

*Correspondence: thartmann@informatik.uni-leipzig.de
Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, Leipzig University, Augustusplatz 10, D-04109 Leipzig, Germany
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Hartmann et al. BMC Bioinformatics (2018) 19:192

Background

The genetic information of species is stored in DNA (or RNA) molecules. These molecules are called chromosomes and can either be linear or circular. The set of all these molecules of a species forms its genome. The genome consists of genes which are DNA segments with certain functions. Mutations can modify the arrangement or the multiplicity of the genes within the genome. Such mutations are called rearrangements. The research field of genome rearrangement analysis tries to explain the differences between two genomes that are represented by the arrangement of their genes, in order to infer phylogenetic information. The following two central problems exist in this research field [1]. The distance problem aims to determine the distance between two genomes, i.e., the smallest number of rearrangements (of certain types) that are needed to transform one gene order into the other. The sorting problem asks for a shortest sequence of rearrangements for such a transformation. Such a sequence is called a shortest scenario. In case that costs can be assigned to different types of rearrangements the distance problem (the sorting problem) asks for the minimum total cost for the transformation from one gene order to the other (respectively, for a corresponding minimum cost sequence of rearrangements).

One important type of rearrangements is the tandem duplication random loss (TDRL) rearrangement. A TDRL consists of a tandem duplication of a contiguous set of genes followed by a random loss of one copy of each duplicated gene. TDRLs have occurred several times in the evolution of mitochondrial genomes, e.g., in gulper eels [2] and in millipedes [3]. Several mechanisms that explain a TDRL have been discussed in the biological literature, e.g., slipped strand mispairing during replication or imprecise termination [4]. It has been shown that TDRLs are a major factor of gene order evolution for mitochondrial genomes [4–6].

The TDRL rearrangement was initially studied formally for linear genomes in [7]. The cost of a TDRL rearrangement is defined as $\alpha^k$, where $\alpha \geq 1$ is a parameter and $k \in \mathbb{N}$ is the number of genes that are influenced by the TDRL. For the cases $\alpha = 1$ and $\alpha \geq 2$ polynomial time algorithms that solve the sorting problem (and therefore the distance problem) have been presented in [7]. In addition, it has been shown for $\alpha = 1$ that it is sufficient to consider TDRLs that duplicate the whole genome, because the cost of every TDRL is the same in this case. Here we consider the case $\alpha = 1$ as well. Note that this definition of a TDRL considers explicitly only whole genome duplications but implicitly also partial genome duplications, as explained in detail in the “Implementation” section.

The authors of [3] studied an alternative version of the TDRL rearrangement where genes with the same orientation or genes that belong to the same transcript are lost jointly, i.e., the loss is not completely random but depends on gene orientation or transcript structure.

In order to reconstruct genome rearrangements reliably it is important to consider the possibility of alternative shortest rearrangement scenarios and equivalent rearrangements, i.e., rearrangements that when applied to the same gene order also lead to the same resulting gene order. This has been studied for the inversion, the transposition and the double cut and join (DCJ) rearrangement [8, 9]. Equivalent inversions for signed circular permutations, which represent circular genomes, have been discussed in [10] and all shortest scenarios for the sorting of signed permutations have been studied in depth in [11–14]. Since every transposition can be represented by a TDRL, the set of equivalent transpositions has been studied in [15]. The set of all shortest scenarios of the problem to sort a multichromosomal genome by DCJ rearrangements has been investigated in [16]. In this work the authors also determined the exact number of optimal shortest scenarios for a particular set of problem instances.

An analysis of TDRL rearrangements on circular genomes has been presented in [15]. It has been shown that the circularity of the genomes should be considered, since an unfavorable choice of linear representatives (of the circular genomes) may lead to an overestimation of the TDRL distance. In addition, it has been shown that it is not always possible to uniquely reconstruct a TDRL only from the knowledge of the two circular gene orders before and after the application of the TDRL because there exist several TDRLs that can explain the change from one circular genome to the other.

Software tools that regard TDRL rearrangements for the construction of evolutionary scenarios are CREx [17], CREx2 [18], and TreeREx [19]. However, these tools present only a single shortest scenario in the case that a TDRL is present in a shortest scenario. Hence, when a scenario contains a TDRL, the tools do not show all possible alternative TDRLs. Therefore, it is important to further analyze the solutions that are generated, and the software EqualTDRL that we present in this paper is designed to support scientists in such an analysis.

EqualTDRL uses results from [15] to consider the circularity of the given gene orders. It provides figures that illustrate all equivalent TDRL operations for two given gene orders that differ by only a single TDRL. Therefore, EqualTDRL helps scientists to study the whole set of TDRLs that possibly could have taken place. This is beneficial in several aspects of inferring a most reliable TDRL. For example: i) for identifying whether the gene loss of a TDRL is random or is dependent on gene orientation or transcript structure, ii) for finding a (partial duplication) TDRL that duplicates only a minimum number of genes, iii) to determine a subset of TDRLs that satisfy certain conditions, e.g., to preserve certain sequences of genes that cannot be broken during a TDRL, and iv) for identifying the positions of potential duplication remnants for a further analysis of the nucleotide sequence.

This article is organized as follows. In the “Implementation” section that follows, a formal background is given on circular gene orders and TDRL rearrangements. Further, an overview of EqualTDRL is presented. In the “Results and discussion” section the benefits of EqualTDRL are shown for a biological example of mitochondrial gene orders. The article ends with a conclusion in the “Conclusion” section.

Implementation

Methods

In this article it is assumed that the genes in a genome (before and after a TDRL) are not duplicated and that genes do not overlap. Therefore, a gene order of a (circular) genome can be represented by a (circular) permutation. A permutation $\pi$ of length $n$, denoted by $\pi = (\pi(1) \ldots \pi(n))$, is a bijection $\pi\colon [1:n] \to [1:n]$. The set of all permutations of length $n$ is denoted by $S_n$. The shift operation $\phi\colon S_n \to S_n$ is defined by $\pi = (\pi(1) \ldots \pi(n)) \mapsto (\pi(2) \ldots \pi(n)\,\pi(1))$ and for $k \in \mathbb{N}_{>0}$ the $k$-shift is $\phi^k \circ \pi$, where $\phi^k := \phi \circ \phi^{k-1}$ and $\phi^1 := \phi$. Note that with $f \circ g$ the composition of two functions $f$ and $g$ is denoted, i.e., $(f \circ g)(x) := f(g(x))$. With $\sim$ we denote the equivalence relation on $S_n$, where $\pi, \pi' \in S_n$ are equivalent, denoted by $\pi \sim \pi'$, if and only if there exists an $m \in [1:n]$ with $\phi^m(\pi') = \pi$. The equivalence class $\pi^\circ := [\pi]_\sim$ of $\sim$ on $S_n$ is called a circular permutation of length $n$ and the set of all circular permutations of length $n$ is denoted by $S_n^\circ$. In other words, a circular permutation $\pi^\circ \in S_n^\circ$ is the set of all permutations that are equivalent with respect to $\sim$, i.e., $\pi^\circ = \{\pi, \phi(\pi), \ldots, \phi^{n-1}(\pi)\}$ or, less formally, a circular permutation is the set of all permutations that become equivalent when the first and the last element of a permutation are considered to be adjacent. Figure 1a shows an illustration of a circular permutation. Each $\pi \in \pi^\circ$ is called a representative of $\pi^\circ$ and the representative which starts with element $p \in [1:n]$ is denoted by $\pi_p$. Circular permutations are used as a formal model for unichromosomal circular genomes, e.g., mitochondrial genomes, in which each element represents a gene and each representative stands for a possible linearization of the considered genome.

A TDRL $\tau^\circ\colon S_n^\circ \to S_n^\circ$ is a bijection that is denoted by a triple $(F, S, p)$, where $(F, S)$ is a bipartition of $[1:n]$, i.e., $F, S \subset [1:n]$, $F \cap S = \emptyset$, and $F \cup S = [1:n]$, and $p \in [1:n]$. Element $p$ is called the origin of $(F, S, p)$ and it denotes the position where the whole genome duplication of $\pi^\circ$ starts, resulting in a duplicated intermediate. The sets $F$ and $S$ denote the sets of genes that are kept (i.e., they are not deleted) in the first and second copy of $\pi^\circ$ in the duplicated intermediate, respectively. Here “first” and “second” are defined with respect to $\pi_p$. The elements that are deleted in the duplicated intermediate are called lost elements. The effect of a TDRL $\tau^\circ = (F, S, p)$ on $\pi^\circ$ is defined as $\tau^\circ \circ \pi^\circ := [\tau \circ \pi_p]_\sim$, where $\tau \circ \pi_p$ is a rearranged permutation of $\pi_p$ such that the elements of $F$ are moved in front of the elements of $S$ and the relative order of all elements of $F$ (respectively $S$) of $\pi_p$ is unchanged. More precisely, $\tau \circ \pi_p$ is the (linear) permutation for which it holds that 1) if $i \in F$, $j \in S$, then $i$ is to the left of $j$ in $\tau \circ \pi_p$ and 2) if $i, j \in F$ or $i, j \in S$, then $i$ is to the left of $j$ in $\tau \circ \pi_p$ if and only if $i$ is to the left of $j$ in $\pi_p$. Note that by this definition, a TDRL $(F, S, p)$, which maps a set $\pi^\circ$ of permutations to another set of permutations, can be visualized by reordering the elements of the representative $\pi_p$ according to the sets $F$ and $S$, see Fig. 1b for an example. For the combinatorics of TDRLs it is irrelevant whether $F$ or $S$ is the set of genes that are preserved in the original part of the duplicated intermediate, since both applications result in the same circular permutation.

Fig. 1 Application of TDRL ({4, 5}, {1, 2, 3}, 1) to the circular permutation $[(1\,4\,2\,5\,3)]_\sim$. The resulting circular permutation is $[(4\,5\,1\,2\,3)]_\sim$. The circular permutation $[(1\,4\,2\,5\,3)]_\sim$ (respectively $[(4\,5\,1\,2\,3)]_\sim$) is represented in (a) by a circular illustration on the top (respectively bottom) which gives the corresponding representatives when read in clockwise direction. Whereas (a) shows the application of the TDRL by using circular illustrations, (b) illustrates the same application by applying a tandem duplication to the representative $\pi_1 \in [(1\,4\,2\,5\,3)]_\sim$ followed by the subsequent loss of one copy of every duplicated gene.
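The application illustrated in Fig. 1 can be reproduced with a few lines of Python. The following is only a minimal sketch under stated assumptions and is not part of EqualTDRL (which is implemented in C++ and R): the helper names `representative`, `apply_tdrl`, and `canonical` are hypothetical, and a circular permutation is modelled as a Python list holding any one of its linear representatives.

```python
def representative(perm, p):
    """Return the representative pi_p, i.e. the rotation of perm that starts with gene p."""
    i = perm.index(p)
    return perm[i:] + perm[:i]


def apply_tdrl(perm, F, S, p):
    """Apply the TDRL (F, S, p) and return one linear representative of the result.

    Genes in F are kept in the first copy and genes in S in the second copy of
    the duplicated intermediate, so the result lists the F-genes before the
    S-genes while preserving their relative order in pi_p.
    """
    F, S = set(F), set(S)
    if F & S or (F | S) != set(perm):
        raise ValueError("(F, S) must be a bipartition of the genes")
    pi_p = representative(perm, p)
    kept_first = [g for g in pi_p if g in F]
    kept_second = [g for g in pi_p if g in S]
    return kept_first + kept_second


def canonical(perm):
    """Rotate to start with the smallest gene, so that two lists describe the
    same circular permutation exactly when their canonical forms are equal."""
    return representative(perm, min(perm))


pi = [1, 4, 2, 5, 3]
sigma = apply_tdrl(pi, F={4, 5}, S={1, 2, 3}, p=1)
print(sigma)  # [4, 5, 1, 2, 3]

# Interchanging F and S gives another representative of the same circular permutation.
swapped = apply_tdrl(pi, F={1, 2, 3}, S={4, 5}, p=1)
print(canonical(sigma) == canonical(swapped))  # True
```

Rotating both results to a common start element is one simple way to compare circular permutations; here it confirms that the roles of F and S are interchangeable for this example.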
In Fig. 1b, the elements of $F = \{4, 5\}$ are kept in the first copy (illustrated by a bright gray) and the elements of $S = \{1, 2, 3\}$ are kept in the second copy (illustrated by a dark gray) of the duplicated intermediate. Elements that are lost during this process are crossed out. Permutation $\tau \circ \pi_1 = (4\,5\,1\,2\,3)$ is a representative of the resulting circular permutation, namely $[\tau \circ \pi_1]_\sim = [(4\,5\,1\,2\,3)]_\sim$.

Formally, $(F, S, p) \circ \pi^\circ = (S, F, p) \circ \pi^\circ$ [15]. Note that the strandedness of a gene (also called its orientation) is not relevant for this paper since a TDRL does not change the strandedness of genes. This is also the reason why we represent a gene order as an (unsigned) permutation.

Two TDRLs are called equivalent if their application to the same (circular) permutation results in the same (circular) permutation. It was shown that the number of equivalent TDRLs is $2n^2$ if the TDRLs are identity maps and $2n$ otherwise, where $n$ is the length of permutation $\pi^\circ$ [15]. The TDRL distance of $\pi^\circ$ and $\sigma^\circ$ is $d(\pi^\circ, \sigma^\circ) = \min(\{d \in \mathbb{N} \mid \exists \text{ TDRLs } \tau_1^\circ, \ldots, \tau_d^\circ\colon \tau_d^\circ \circ \ldots \circ \tau_1^\circ \circ \pi^\circ = \sigma^\circ\})$.

The definitions related to TDRL rearrangements and circular permutations are exemplified for the circular permutation $\pi^\circ = [(1\,4\,2\,5\,3)]_\sim$. See Fig. 1 for an illustration. The representatives of $\pi^\circ$ are $\pi_1 = (1\,4\,2\,5\,3)$, $\pi_2 = (2\,5\,3\,1\,4)$, $\pi_3 = (3\,1\,4\,2\,5)$, $\pi_4 = (4\,2\,5\,3\,1)$, and $\pi_5 = (5\,3\,1\,4\,2)$, i.e., $\pi^\circ = \{\pi_1, \ldots, \pi_5\}$. It holds that $\pi_2 = \phi^2(\pi_1)$, $\pi_3 = \phi^2(\pi_2)$, $\pi_4 = \phi^2(\pi_3)$, $\pi_5 = \phi^2(\pi_4)$, and $\pi_1 = \phi^2(\pi_5)$. The application of TDRL $\tau^\circ$ = ({4, 5}, {1, 2, 3}, 1) to $\pi^\circ$ gives $[(4\,5\,1\,2\,3)]_\sim$. Since genes 1 and 2 are in $S$ and 1 is to the left of 2 in $\pi_1$, it holds that 1 is to the left of 2 in $(4\,5\,1\,2\,3)$. Also, since $5 \in F$ and $3 \in S$, it holds that 3 is to the right of 5 in $(4\,5\,1\,2\,3)$. Figure 1 illustrates the application of $\tau^\circ$ to $[(1\,4\,2\,5\,3)]_\sim$. Since ({3, 4}, {1, 2, 5}, 5) $\circ\, \pi^\circ$ = ({4, 5}, {1, 2, 3}, 1) $\circ\, \pi^\circ$ it holds that TDRLs ({3, 4}, {1, 2, 5}, 5) and ({4, 5}, {1, 2, 3}, 1) are equivalent.

To see that the formal model also covers partial duplication TDRLs (i.e., TDRLs where not all genes are duplicated) consider a partial duplication TDRL that transforms $\pi^\circ$ into $\sigma^\circ$. For every element $e \in [1:n]$ it holds that either $e \in F'$ (i.e., $e$ is kept in the first copy), $e \in S'$ (i.e., $e$ is kept in the second copy), or $e \in N'$ (i.e., $e$ is not duplicated). Then the TDRL $(F, S, p)$, where $F = S'$, $S = N' \cup F'$, and $p$ is the unique non-duplicated element adjacent to one element of the second copy in the partially duplicated intermediate, gives the same circular output permutation (see Fig. 2 for an example). Consequently, the same rearrangement can be achieved by a TDRL that duplicates all elements [15].

Fig. 2 Partial duplication TDRL and a corresponding TDRL that achieves the same circular output permutation $[(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Notation is as in Fig. 1. The left-hand side shows a partial tandem duplication of the sequence 3 4 5 6 of $[(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ followed by a subsequent loss of the elements 3 and 5 in the first copy, and 4 and 6 in the second copy, i.e., $F' = \{4, 6\}$, $S' = \{3, 5\}$, and the elements of $N' = \{1, 2, 7, 8\}$ are not duplicated. The same rearrangement can be achieved by the TDRL $(F, S, p)$, where $F = S'$, $S = N' \cup F'$, and $p = 7$ is the unique non-duplicated element adjacent to the second copy. The corresponding TDRL ({3, 5}, {1, 2, 4, 6, 7, 8}, 7) is illustrated on the right-hand side

EqualTDRL

The software tool EqualTDRL calculates for two circular permutations $\pi^\circ$ and $\sigma^\circ$ that represent two circular unichromosomal genomes the distances $d(\pi^\circ, \sigma^\circ)$ and $d(\sigma^\circ, \pi^\circ)$. In the case that $d(\pi^\circ, \sigma^\circ) = 1$ (respectively $d(\sigma^\circ, \pi^\circ) = 1$) EqualTDRL produces an illustration which shows all equivalent TDRL rearrangements, i.e., all triples $(F, S, p)$ that transform $\pi^\circ$ into $\sigma^\circ$ or vice versa. An example of such an illustration is given in Fig. 3 for the circular permutations $\iota^\circ = [(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ and $\pi^\circ = [(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$ with $d(\iota^\circ, \pi^\circ) = 1$. Figure 3 illustrates all equivalent TDRLs that transform $\iota^\circ$ into $\pi^\circ$: ({3, 5, 7, 8}, {1, 2, 4, 6}, 1), ({1, 3, 5, 7, 8}, {2, 4, 6}, 2), ({4, 6}, {1, 2, 3, 5, 7, 8}, 3), ({1, 2, 5, 7, 8}, {3, 4, 6}, 4), ({3, 6}, {1, 2, 4, 5, 7, 8}, 5), ({1, 2, 4, 7, 8}, {3, 5, 6}, 6), ({3, 5}, {1, 2, 4, 6, 7, 8}, 7), ({3, 5, 7}, {1, 2, 4, 6, 8}, 8), and all TDRLs that can be obtained from the listed TDRLs by interchanging the sets $F$ and $S$. Recall that TDRL ({3, 5}, {1, 2, 4, 6, 7, 8}, 7) is illustrated in Fig. 2.

Moreover, EqualTDRL can illustrate all partial TDRLs in addition to the whole genome TDRLs. This is shown in Fig. 3 where a black square surrounds the circle of a gene if and only if the gene is part of the duplicated sequence of a partial duplication TDRL. Note that for partial TDRLs it is not important whether genes that are excluded from the duplication are considered to be in $F$ or in $S$ since they are not duplicated and therefore no copy of these genes is lost. As an example, consider the origin 3 in Fig. 3. The figure shows that TDRL ({4, 6}, {1, 2, 3, 5, 7, 8}, 3), which considers the whole permutation to be duplicated, can be replaced by a partial duplication TDRL that only duplicates the sequence 3 4 5 6 and contains the genes 4, 6 in the first copy (with respect to the origin) of the duplicated intermediate and genes 3, 5 in the second copy of the duplicated intermediate. See Fig. 2 for an illustration of the specified partial duplication TDRL and a corresponding TDRL that duplicates the whole permutation.

Finally, EqualTDRL is also able to illustrate only those TDRLs that satisfy the following types of conditions which can be given by the user: i) specific sets of sequences of genes that are conserved by a TDRL and ii) intergenic regions that are framed by specific pairs of genes. Both conditions and the type of input that they require are explained in the following.

For the first type of condition the user has to specify the corresponding sets of genes. Then, EqualTDRL proceeds as explained in the following. Consider a set of genes $G \subset [1:n]$, two circular permutations $\pi^\circ$ and $\sigma^\circ$ such that $d(\pi^\circ, \sigma^\circ) = 1$, and a TDRL $\tau^\circ = (F, S, p)$ with $\tau^\circ \circ \pi^\circ = \sigma^\circ$. When $G$ is used as a condition EqualTDRL only considers equivalent TDRLs $(F', S', p')$ of $\tau^\circ$ such that either $G \subseteq F'$ or $G \subseteq S'$. Therefore, if there exists such an equivalent TDRL, all genes of $G$ are lost in the same copy and the loss of this TDRL depends on $G$. In the case that set $G$ contains exactly all genes that belong to the same transcript the first type of condition is used to determine whether or not the loss of a TDRL depends on a given transcript structure. Note that EqualTDRL can also use multiple gene sets. For an example consider Fig. 3 that illustrates all equivalent TDRLs that transform $\iota^\circ = [(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ into $\pi^\circ = [(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Let 1 2 and 7 8 be sequences of genes (of $\pi^\circ$ and $\iota^\circ$) that shall be conserved by a TDRL, hence $G_1 = \{1, 2\}$ and $G_2 = \{7, 8\}$. If these conditions are specified, then EqualTDRL provides an illustration similar to Fig. 3 but with the difference that it does not show the TDRLs for origin 2 and 8, i.e., ({1, 3, 5, 7, 8}, {2, 4, 6}, 2) and ({3, 5, 7}, {1, 2, 4, 6, 8}, 8). These TDRLs are not illustrated since there is an $i \in [1:2]$ for which neither $G_i \subseteq F$ nor $G_i \subseteq S$. However, for all other origins $p \in \{1, 3, 4, 5, 6, 7\}$ it holds that $G_1$ and $G_2$ are subsets of either $F$ or $S$.

The second type of condition requires finding all possible TDRLs that allow intergenic regions to be deduced between pairs of genes that are given to EqualTDRL as an input from the user. To see this, consider two circular permutations $\pi^\circ$ and $\sigma^\circ$, a TDRL $\tau^\circ = (F, S, p)$ such that $\tau^\circ \circ \pi^\circ = \sigma^\circ$, and a pair of two distinct genes $x, y \in [1:n]$ that is given by the user. Such a pair of genes should frame an intergenic region in the genome that is represented by $\sigma^\circ$. Then EqualTDRL considers only equivalent TDRLs $(F', S', p')$ of $\tau^\circ$ where at least one gene of the duplicated intermediate that is with respect to $\pi_{p'}$ between $x$ and $y$ is deleted, i.e., there exists at least one gene $z \in [1:n] \setminus \{x, y\}$ such that $z$ is between $x$ and $y$ in the duplicated intermediate, $z$ is lost, and either $x \in F'$ and $y \in S'$ or if $x, y \in F'$ (respectively $x, y \in S'$) then $z \in S'$ (respectively $z \in F'$). For such a TDRL the intergenic region between $x$ and $y$ in the genome that is represented by $\sigma^\circ$ can be explained by an incomplete deletion of the gene (or the set of genes) that were lost between $x$ and $y$. Therefore, if such a TDRL exists EqualTDRL provides evidence that an intergenic region might be a remnant of a gene (or the set of genes) that is formed by an incomplete deletion of the same. For an example consider Fig. 3 that illustrates all equivalent TDRLs that transform $\iota^\circ = [(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ into $\pi^\circ = [(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Assume that an intergenic region is between genes 3 and 5 in $\pi^\circ$ and that one is interested to know which TDRLs (and which corresponding gene losses) applied to $\iota^\circ$ could possibly result in this arrangement. Hence $x = 3$ and $y = 5$ are chosen. If this condition is given, EqualTDRL would produce Fig. 3. Hence, all illustrated TDRLs allow the intergenic region between 3 and 5 to be deduced. This holds because in all TDRLs of Fig. 3 either $3 \in F$ and $5 \in S$ (e.g., origin 5), $5 \in F$ and $3 \in S$ (e.g., origin 4), or $3, 5 \in F$ (respectively $3, 5 \in S$) and the gene 4, which is between 3 and 5 in the duplicated intermediate, is lost. If, for example, condition $x = 1$ and $y = 2$ is used, then EqualTDRL provides a figure that illustrates only TDRL ({1, 3, 5, 7, 8}, {2, 4, 6}, 2) of Fig.
3 (i.e., the row of Fig. 3 with origin 2). This holds since $1 \in F$, $2 \in S$, and 3 is between 1 and 2 in $\pi$. For all other origins $p \in [1:8] \setminus \{2\}$ it holds that $1, 2 \in F$ (respectively $1, 2 \in S$) and there does not exist an element $z$ between 1 and 2 in the duplicated intermediate that is lost.

Fig. 3 Output created by EqualTDRL: all TDRL rearrangements that transform $[(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ into $[(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Each row illustrates the TDRL rearrangements $(F, S, p)$, where $p$ is the origin (y-axis) and a gene (x-axis) is in the set $F$ (respectively $S$) if the corresponding circle is filled with white colour (respectively black colour). A square surrounding a circle shows that the corresponding gene is part of the duplicated sequence of the corresponding partial duplication TDRL

Results and discussion

Experiment

In this section we show on the basis of mitochondrial gene orders how EqualTDRL can be used to find a plausible TDRL rearrangement. Specifically, the following two mitochondrial gene orders are considered: i) the gene order of Prionoglaris stygia (GenBank, accession: MG255141.1), which represents the ancestral mitochondrial gene order of the Pancrustacea [20], and ii) the mitochondrial gene order of the Trogiomorpha species Lepidopsocidae sp. (RefSeq, accession: NC_004816.1) and Dorypteryx domestica (GenBank, accession: MG255136.1) that has been published at the NCBI RefSeq release 84 [21] and the NCBI GenBank release 221 [22]. Note that the mitochondrial genomes of Lepidopsocidae sp. and Dorypteryx domestica comprise the same gene order. Both gene orders have recently been discussed in [20]. In this publication the authors studied the phylogenetic relationships between different barklice species of Psocoptera (a taxon which includes all considered species) and estimated the history of rearrangements using the condition-based coding algorithm MacClade [23], which does not support TDRL rearrangements, and the TreeREx software [19]. For both given mitochondrial gene orders the TreeREx analysis resulted in a large TDRL rearrangement which transforms the gene order of Prionoglaris stygia into the gene order of Lepidopsocidae sp. by duplicating 34 of 38 genetic markers (37 genes plus the control region). The corresponding TDRL is illustrated in Fig. 4 for the origin cox2. Since for circular gene orders several equivalent TDRLs exist [15], we investigate the set of all possible TDRL rearrangements in the following.

Since we also want to consider the intergenic sequences, the mitochondrial genomes of the Trogiomorpha species Lepidopsocidae sp. and Dorypteryx domestica have been reannotated with an extended version of MITOS (unpublished, http://mitos2.bioinf.uni-leipzig.de) [24]. In order to examine whether an intergenic region contains remnants of genes caused by an incomplete gene loss, the following data analysis was executed for Lepidopsocidae sp. and Dorypteryx domestica. For every tRNA and ribosomal gene (respectively protein-coding gene) the Covariance Models (respectively Hidden Markov Models) that are used in MITOS were applied i) to search in every intergenic region for every gene sequence and ii) if a gene (or a remnant of a gene) has been found, to (locally) align the gene sequence to the corresponding intergenic region. The analysis was carried out for the tRNA and ribosomal genes with CMsearch and CMalign from the Infernal 1.1rc4 software package [25] and for protein-coding genes with HMMsearch and HMMalign from the HMMER 3.1b1 software package [26]. Moreover, for both mitochondrial gene orders the conserved sequences of genes (i.e., the sequences of genes that occur in both gene orders and are maximal with respect to inclusion) were calculated.

Then EqualTDRL was used to determine TDRLs that transform the mitochondrial gene order of Prionoglaris stygia into the mitochondrial gene order of Lepidopsocidae sp. under four different objectives: A) to show all possible TDRLs, B) to show all TDRLs that do not break any conserved sequence of genes, C) to highlight a subset of the TDRLs of (B) that provide additional evidence for the origin of intergenic regions outside of conserved sequences by an incomplete gene loss, and D) to find all TDRLs that minimize the number of duplicated genes.

Results

The gene orders which result from the annotation that has been done with MITOS for the mitochondrial genomes of Prionoglaris stygia, Lepidopsocidae sp., and Dorypteryx domestica are equal to the gene orders that have been discussed in [20].

Fig. 4 a Output created by EqualTDRL: complete set of TDRLs that rearrange the mitochondrial gene order of Prionoglaris stygia into the mitochondrial gene order of Lepidopsocidae sp. Protein-coding and ribosomal genes are denoted by their names and one capital letter indicates the amino acid for the tRNAs. b A star denotes a TDRL that does not break a conserved gene sequence. A pentagon highlights the TDRL that has been presented in [20]. A diamond highlights the TDRL which duplicates the minimum number of genes (the corresponding minimum TDRL is illustrated in Fig.
5)

A linear representation of the mitochondrial gene orders of Prionoglaris stygia and Lepidopsocidae sp. (respectively Dorypteryx domestica) that are used in this study is shown in Fig. 5. It can be seen that both gene orders contain the following conserved sequences: i) trnC trnY cox1 trnL2, ii) trnK trnD atp8 atp6 cox3, iii) trnG nad3 trnA, iv) trnS1 trnE, v) trnF nad5 trnH nad4 nad4l trnT trnP nad6 cob, and vi) nad1 trnL1 rrnL trnV rrnS CR, where CR denotes the control region.

Fig. 5 TDRL with a minimum number of duplicated genes that rearranges the mitochondrial gene order of Prionoglaris stygia (top) via the gene order after the duplication (middle) into the gene order of Lepidopsocidae sp. (bottom). Sequences that are bounded by a thick black line are conserved in both gene orders. Horizontal dots indicate the circularity of the gene orders. Intergenic regions of Lepidopsocidae sp. are indicated by gray squares. The sequence of genes that is rearranged is framed by a dashed square. Gene abbreviations and notation as in Fig. 4

The analysis with MITOS shows that the genome of Lepidopsocidae sp. contains 10 intergenic regions with a length of at least 10 base pairs, whereby only the three regions between trnQ and nad2 (79 base pairs), nad2 and trnC (105 base pairs), and cob and nad1 (58 base pairs) are not contained in a conserved sequence. It is worth mentioning that the intergenic region between nad2 and trnC cannot be found in a less restrictive annotation with MITOS (when an E-value exponent of 1 is used instead of the default value 2 for BLAST [27] and CMsearch). In this less restrictive annotation the putative intergenic region is completely assigned to nad2. The mitochondrial genome of Dorypteryx domestica contains 4 intergenic regions with a length of at least 10 base pairs: between trnQ and nad2 (50 base pairs), trnC and trnY (31 base pairs), trnN and trnF (25 base pairs), and cob and nad1 (21 base pairs). Note that the intergenic region between trnC and trnY is contained in a conserved interval. Since Lepidopsocidae sp. and Dorypteryx domestica have the same gene order, which is different from the ancestral Pancrustacea gene order, we assume that the TDRL studied in the following happened before the speciation of the Trogiomorpha species. Therefore, we consider only intergenic regions that occur in both mitochondrial genomes of Lepidopsocidae sp. and Dorypteryx domestica. Thus in the analysis that follows we consider the intergenic regions between trnQ and nad2, and between cob and nad1. These intergenic regions are particularly interesting for further analysis since the presence of a gene remnant in those intergenic regions can be used to identify the subset of TDRLs that can explain the presence of the remnants as a result of an incomplete gene loss of a TDRL.

Figure 4a was generated with EqualTDRL and shows all equivalent TDRLs that transform the gene order of Prionoglaris stygia into the gene order of Lepidopsocidae sp. It also shows the set of duplicated genes for every TDRL. In Fig. 4b symbols were (manually) added to the right of a TDRL if and only if the TDRL exhibits specific characteristics: those that do not break any of the conserved sequences of genes and those that support conjectures that can explain the presence of the intergenic regions as a result of an incomplete gene loss. These indications can be used to further search for gene remnants. Interestingly, it can be seen in Fig. 4 that the TDRL which has been presented in [20] preserves all conserved sequences of genes and it also explains the presence of both considered intergenic regions.

The HMMER and Infernal software packages were used to identify potential remnants of genes. The experiments showed that trnM and trnS1 can be found in the intergenic region of Lepidopsocidae sp. between trnQ and nad2 with an E-value of 0.005 and 0.016, respectively. In addition several protein-coding genes hit the intergenic region between trnQ and nad2. However, the E-values of these hits are larger than 0.2 and therefore less reliable. Interestingly, no hit with an E-value smaller than 1 was found for both intergenic regions of Dorypteryx domestica and the intergenic region between cob and nad1 of Lepidopsocidae sp. A table that summarizes all hits in the intergenic regions, their E-values, and the corresponding alignments can be found in Additional file 1: Table S3 and Figures S1–S10.

Figure 4 shows that there exists a TDRL which duplicates only 29 of 38 genetic markers, which is 5 fewer than the TDRL that has been presented in [20]. Interestingly, this TDRL also preserves all conserved gene sequences and provides evidence for the presence of the intergenic regions between trnQ and nad2, and cob and nad1 as a result of an incomplete gene loss of the genes trnM and trnS2, respectively. While the origin of the former intergenic region by the loss of trnM is supported weakly by our analysis, the origin of the latter intergenic region has no support from our sequence/structural similarity based analysis. The corresponding TDRL is illustrated in Fig. 5.

Altogether, the scenario presented in [20] partially agrees with the results that are presented in this article. However, due to the circularity of the mitochondrial gene orders it is important to consider the complete set of TDRL rearrangements that can explain the gene order of Lepidopsocidae sp. When it is considered to be relevant that a TDRL duplicates only a minimum number of genes, the results computed with EqualTDRL weaken the support for the TDRL rearrangement that has been presented in [20] in favor of the TDRL rearrangement that is shown in Fig. 5. Moreover, the TDRL presented in Fig. 5 gets additional support by the detection of a putative remnant of the trnM gene in the intergenic region of Lepidopsocidae sp. between trnQ and nad2.

Conclusion

In this article we have presented the tool EqualTDRL, which illustrates all equivalent TDRLs for a pair of gene orders that differ by one TDRL. EqualTDRL considers the circularity of genomes and helps to study their differences in consideration of all TDRL predictions that possibly could have taken place. Thereby, it helps to identify TDRLs that satisfy different biological constraints. For example, a requirement might be that the TDRL duplicates only a minimum number of genes or that the TDRL explains the presence of intergenic regions. In addition, EqualTDRL helps scientists to determine whether the gene loss of a TDRL is random or might depend on gene orientation or transcript structure. It has been shown for two example mitochondrial gene orders how EqualTDRL can be used to identify more plausible TDRLs that possibly could have taken place.

Availability and requirements

Project name: EqualTDRL
Project home page: http://pacosy.informatik.uni-leipzig.de/equaltdrl
Operating system(s): Linux distribution
Programming language: C++, R
Other requirements: ggplot2 package of R
License: none
Any restrictions to use by non-academics: none

Additional file

Additional file 1: Supplement. Additional file that contains the MITOS2 annotations and alignments of the mitochondrial genomes analyzed in the current study. (PDF 222 kb)

Abbreviations

References

1. Fertin G, Labarre A, Rusu I, Tannier E, Vialette S. Combinatorics of Genome Rearrangements. Cambridge: MIT Press; 2009.
2. Inoue JG, Miya M, Tsukamoto K, Nishida M. Evolution of the deep-sea gulper eel mitochondrial genomes: Large-scale gene rearrangements originated within the eels. Mol Biol Evol. 2003;20(11):1917–24.
3. Lavrov DV, Boore JL, Brown WM. Complete mtDNA sequences of two millipedes suggest a new model for mitochondrial gene rearrangements: duplication and nonrandom loss. Mol Biol Evol. 2002;19(2):163–9.
4. Boore JL. The duplication/random loss model for gene rearrangement exemplified by mitochondrial genomes of deuterostome animals. In: Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families. Dordrecht: Springer; 2000. p. 133–47.
5. Bernt M, Braband A, Schierwater B, Stadler PF. Genetic aspects of mitochondrial genome evolution. Mol Phyl Evol. 2012;69:328–38.
6. San Mauro D, Gower DJ, Zardoya R, Wilkinson M. A hotspot of gene order rearrangement by tandem duplication and random loss in the vertebrate mitochondrial genome. Mol Biol Evol. 2005;23(1):227–34.
7. Chaudhuri K, Chen K, Mihaescu R, Rao S. On the tandem duplication-random loss model of genome rearrangement. In: Proc. 17th Ann. ACM-SIAM Symp. Discrete Algorithm (SODA ’06). Philadelphia: Society for Industrial and Applied Mathematics; 2006. p. 564–70.
8. Yancopoulos S, Attie O, Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics. 2005;21(16):3340–6.
9. Bergeron A, Mixtacki J, Stoye J. A unifying view of genome rearrangements. In: Proc. 6th Int’l Workshop Algorithms in Bioinformatics (WABI ’06). LNCS, vol. 4175. Berlin: Springer; 2006. p. 163–73.
10. Meidanis J, Walter M, Dias Z. Reversal distance of signed circular chromosomes. Technical report IC-00-23. 2000.
11. Bergeron A, Chauve C, Hartman T, St-Onge K.
On the properties of CR: Control region; TDRL: Tandem duplication random loss sequences of reversals that sort a signed permutation. In: Proceedings of JOBIM, vol. 2; 2002. p. 99–108. 12. Siepel AC. An algorithm to enumerate sorting reversals for signed Funding permutations. J Comput Biol. 2003;10(3-4):575–97. TH was funded by a PhD student fellowship from the Leipzig University. We 13. Braga MD, Sagot M-F, Scornavacca C, Tannier E. The solution space of acknowledge support from the German Research Foundation (DFG) and sorting by reversals. Proc. 3rd Int’l Symp. Bioinforma Res Appl (ISBRA ’07). Leipzig University within the program of Open Access Publishing. The funding 2007;4463:293–304. body did not play any roles in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript. 14. Braga MDV, Sagot M-F, Scornavacca C, Tannier E. Exploring the solution space of sorting by reversals, with experiments and an application to evolution. IEEE ACM T Comput Bi. 2008;5(3):348–56. Availability of data and materials 15. Hartmann T, Chu A-C, Middendorf M, Bernt M. Combinatorics of tandem The dataset analyzed in the current study is available in the NCBI database [21]. duplication random loss mutations on circular genomes. IEEE ACM T The code of EqualTDRL and examples can be found at http://pacosy.informatik. Comput Bi. 2016;15(1):83–95. uni-leipzig.de/equaltdrl. The remainder of the data generated during this study 16. Braga MD, Stoye J. The solution space of sorting by DCJ. J Comput Biol. is included in this published article and its supplementary information file. 2010;17(9):1145–65. 17. Bernt M, Merkle D, Ramsch K, Fritzsch G, Perseke M, Bernhard D, Authors’ contributions Schlegel M, Stadler P, Middendorf M. CREx: inferring genomic TH and MB wrote and tested the EqualTDRL code. MM made substantive rearrangements based on common intervals. Bioinformatics. 2007;23(21): intellectual contributions to this paper. 
TH wrote the manuscript. All authors 2957–8. have read and approved the final manuscript. 18. Hartmann T, Bernt M, Middendorf M. An Exact Algorithm for Sorting by Weighted Preserving Genome Rearrangements. IEEE ACM T Comput Bi. Ethics approval and consent to participate 2018. In press. Not applicable. 19. Bernt M, Merkle D, Middendorf M. An algorithm for inferring mitogenome rearrangements in a phylogenetic tree. In: Proc. 6th Int’l Competing interests Workshop Comparative Genomics (RECOMB-CG ’08). Berlin: Springer; The authors declare that they have no competing interests. 2008. p. 143–57. 20. Yoshizawa K, Johnson KP, Sweet AD, Yao I, Ferreira RL, Cameron SL. Publisher’s Note Mitochondrial phylogenomics and genome rearrangements in the Springer Nature remains neutral with regard to jurisdictional claims in barklice (insecta: Psocodea). Mol Phyl Evol. 2017. published maps and institutional affiliations. 21. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts Author details and proteins. Nucleic Acids Res. 2007;35(Database issue):61–5. Swarm Intelligence and Complex Systems Group, Faculty of Mathematics 22. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, and Computer Science, Leipzig University, Augustusplatz 10, D-04109 Leipzig, Ostell J, Sayers EW. Genbank. Nucleic Acids Res. 2012;41(D1):36–42. Germany. Helmholtz Centre for Environmental Research - UFZ, 23. Maddison WP, Maddison DR. MacClade: analysis of phylogeny and Permoserstraße 15, D-04318 Leipzig, Germany. character evolution. version 3.0. Sunderland: Sinauer Associates; 1992. 24. Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Received: 22 January 2018 Accepted: 30 April 2018 Pütz J, Middendorf M, Stadler PF. MITOS: Improved de novo metazoan mitochondrial genome annotation. Mol Phyl Evol. 2013;69(2):313–9. Hartmann et al. BMC Bioinformatics (2018) 19:192 Page 10 of 10 25. 
Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29(22):2933–5. 26. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(suppl_2):29–37. 27. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. Submit your next manuscript to BioMed Central and we will help you at every step: • We accept pre-submission inquiries � Our selector tool helps you to find the most relevant journal � We provide round the clock customer support � Convenient online submission � Thorough peer review � Inclusion in PubMed and all major indexing services � Maximum visibility for your research Submit your manuscript at www.biomedcentral.com/submit http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Bioinformatics Springer Journals

Publisher: Springer Journals
Copyright: Copyright © 2018 by The Author(s)
Subject: Life Sciences; Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Algorithms
eISSN: 1471-2105
DOI: 10.1186/s12859-018-2170-x

Abstract

Background: To study the differences between two unichromosomal circular genomes, e.g., mitochondrial genomes, under the tandem duplication random loss (TDRL) rearrangement it is important to consider the whole set of potential TDRL rearrangement events that could have taken place. The reason is that for two given circular gene orders there can exist different TDRL rearrangements that transform one of the gene orders into the other. Hence, a TDRL event cannot always be reconstructed only from the knowledge of the circular gene order before a TDRL event and the circular gene order after it.

Results: We present the program EqualTDRL that computes and illustrates the complete set of TDRLs for pairs of circular gene orders that differ by only one TDRL. EqualTDRL considers the circularity of the given genomes and certain restrictions on the TDRL rearrangements. Examples for the latter are sequences of genes that have to be conserved during a TDRL or pairs of genes that frame intergenic regions which might represent remnants of duplicated genes. Additionally, EqualTDRL makes it possible to determine the set of TDRLs that are minimum with respect to the number of duplicated genes.

Conclusion: EqualTDRL helps scientists to study the complete set of TDRLs that possibly could have taken place in the evolution of mitochondrial genomes. EqualTDRL is implemented in C++ using the ggplot2 package of the open source programming language R and is freely available from http://pacosy.informatik.uni-leipzig.de/equaltdrl.

Keywords: Circular permutation, Gene order, Genome rearrangement, Mitochondria, Tandem duplication random loss

*Correspondence: thartmann@informatik.uni-leipzig.de. Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, Leipzig University, Augustusplatz 10, D-04109 Leipzig, Germany. Full list of author information is available at the end of the article.

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Hartmann et al. BMC Bioinformatics (2018) 19:192

Background

The genetic information of species is stored in DNA (or RNA) molecules. These molecules are called chromosomes and can either be linear or circular. The set of all these molecules of a species forms its genome. The genome consists of genes which are DNA segments with certain functions. Mutations can modify the arrangement or the multiplicity of the genes within the genome. Such mutations are called rearrangements. The research field of genome rearrangement analysis tries to explain the differences between two genomes that are represented by the arrangement of their genes, in order to infer phylogenetic information. The following two central problems exist in this research field [1]. The distance problem aims to determine the distance between two genomes, i.e., the smallest number of rearrangements (of certain types) that are needed to transform one gene order into the other. The sorting problem asks for a shortest sequence of rearrangements for such a transformation. Such a sequence is called a shortest scenario. In case that costs can be assigned to different types of rearrangements the distance problem (the sorting problem) asks for the minimum total cost for the transformation from one gene order to the other (respectively, for a corresponding minimum cost sequence of rearrangements).

One important type of rearrangements is the tandem duplication random loss (TDRL) rearrangement. A TDRL consists of a tandem duplication of a contiguous set of genes followed by a random loss of one copy of each duplicated gene. TDRLs have occurred several times in the evolution of mitochondrial genomes, e.g., in gulper eels [2] and in millipedes [3]. Several mechanisms that explain a TDRL have been discussed in the biological literature, e.g., slipped strand mispairing during replication or imprecise termination [4]. It has been shown that TDRLs are a major factor of gene order evolution for mitochondrial genomes [4–6].

The TDRL rearrangement was initially studied formally for linear genomes in [7]. The cost of a TDRL rearrangement is defined as $\alpha^k$, where $\alpha \ge 1$ is a parameter and $k \in \mathbb{N}$ is the number of genes that are influenced by the TDRL. For the cases $\alpha = 1$ and $\alpha \ge 2$ polynomial time algorithms that solve the sorting problem (and therefore the distance problem) have been presented in [7]. In addition, it has been shown for $\alpha = 1$ that it is sufficient to consider TDRLs that duplicate the whole genome, because the cost of every TDRL is the same in this case. Here we consider the case $\alpha = 1$ as well. Note that this definition of a TDRL considers explicitly only whole genome duplications but implicitly also partial genome duplications, as explained in detail in the “Implementation” section.

The authors of [3] studied an alternative version of TDRL rearrangement where genes with the same orientation or genes that belong to the same transcript are lost jointly, i.e., the loss is not completely random but depends on gene orientation or transcript structure.

In order to reconstruct genome rearrangements reliably it is important to consider the possibility of alternative shortest rearrangement scenarios and equivalent rearrangements, i.e., rearrangements that when applied to the same gene order lead also to the same resulting gene order. This has been studied for the inversion, the transposition and the double cut and join (DCJ) rearrangement [8, 9]. Equivalent inversions for signed circular permutations, which represent circular genomes, have been discussed in [10] and all shortest scenarios for the sorting of signed permutations have been studied in depth in [11–14]. Since every transposition can be represented by a TDRL, the set of equivalent transpositions has been studied in [15]. The set of all shortest scenarios of the problem to sort a multichromosomal genome by DCJ rearrangements has been investigated in [16]. In this work the authors also determined the exact number of optimal shortest scenarios for a particular set of problem instances.

An analysis of TDRL rearrangements on circular genomes has been presented in [15]. It has been shown that the circularity of the genomes should be considered, since the TDRL distance for an unfavorable choice of linear representatives (of the circular genomes) may lead to an overestimation of the distance. In addition, it has been shown that it is not always possible to uniquely reconstruct a TDRL only from the knowledge of the two circular gene orders before and after the application of the TDRL because there exist several TDRLs that can explain the change from one circular genome to the other.

Software tools that regard TDRL rearrangements for the construction of evolutionary scenarios are CREx [17], CREx2 [18], and TreeREx [19]. However, these tools present only a single shortest scenario in the case that a TDRL is present in a shortest scenario. Hence, when a scenario contains a TDRL, the tools do not show all possible alternative TDRLs. Therefore, it is important to further analyze the solutions that are generated, and the software EqualTDRL that we present in this paper is designed to support scientists in such an analysis.

EqualTDRL uses results from [15] to consider the circularity of the given gene orders. It provides figures that illustrate all equivalent TDRL operations for two given gene orders that differ by only a single TDRL. Therefore, EqualTDRL helps scientists to study the whole set of TDRLs that possibly could have taken place. This is beneficial in several aspects of inferring a most reliable TDRL. For example: i) for identifying whether the gene loss of a TDRL is random or is dependent on gene orientation or transcript structure, ii) for finding a (partial duplication) TDRL that duplicates only a minimum number of genes, iii) to determine a subset of TDRLs that satisfy certain conditions, e.g., to preserve certain sequences of genes that cannot be broken during a TDRL, and iv) for identifying the positions of potential duplication remnants for a further analysis of the nucleotide sequence.

This article is organized as follows. In the next section, “Implementation”, a formal background is given on circular gene orders and TDRL rearrangements. Further, an overview on EqualTDRL is presented. In the “Results and discussion” section the benefits of EqualTDRL are shown for a biological example of mitochondrial gene orders. The article ends with a conclusion in the “Conclusion” section.

Implementation

Methods

In this article it is assumed that the genes in a genome (before and after a TDRL) are not duplicated and that genes do not overlap. Therefore, a gene order of a (circular) genome can be represented by a (circular) permutation. A permutation $\pi$ of length $n$, denoted by $\pi = (\pi(1) \ldots \pi(n))$, is a bijection $\pi: [1:n] \to [1:n]$. The set of all permutations of length $n$ is denoted by $S_n$. The shift operation $\varphi: S_n \to S_n$ is defined by $\pi = (\pi(1) \ldots \pi(n)) \mapsto (\pi(2) \ldots \pi(n)\,\pi(1))$ and for $k \in \mathbb{N}_{>0}$ the $k$-shift is $\varphi^k \circ \pi$, where $\varphi^k := \varphi \circ \varphi^{k-1}$ and $\varphi^1 := \varphi$. Note that with $f \circ g$ the composition of two functions $f$ and $g$ is denoted, i.e., $(f \circ g)(x) := f(g(x))$. With $\sim$ we denote the equivalence relation on $S_n$, where $\pi, \pi' \in S_n$ are equivalent, denoted by $\pi \sim \pi'$, if and only if there exists an $m \in [1:n]$ with $\varphi^m(\pi) = \pi'$. The equivalence class $\pi^\circ := [\pi]_\sim$ of $\sim$ on $S_n$ is called a circular permutation of length $n$ and the set of all circular permutations of length $n$ is denoted by $S_n^\circ$. In other words, a circular permutation $\pi^\circ \in S_n^\circ$ is the set of all permutations that are equivalent with respect to $\sim$, i.e., $\pi^\circ = \{\pi, \varphi(\pi), \ldots, \varphi^{n-1}(\pi)\}$ or, less formally, a circular permutation is the set of all permutations that become equivalent when the first and the last element of a permutation are considered to be adjacent. Figure 1a shows an illustration of a circular permutation. Each $\pi \in \pi^\circ$ is called a representative of $\pi^\circ$ and the representative which starts with element $p \in [1:n]$ is denoted by $\pi_p$. Circular permutations are used as a formal model for unichromosomal circular genomes, e.g., mitochondrial genomes, in which each element represents a gene and each representative stands for a possible linearization of the considered genome.

A TDRL $\tau^\circ: S_n^\circ \to S_n^\circ$ is a bijection that is denoted by a triple $(F, S, p)$, where $(F, S)$ is a bipartition of $[1:n]$, i.e., $F, S \subset [1:n]$, $F \cap S = \emptyset$, and $F \cup S = [1:n]$, and $p \in [1:n]$. Element $p$ is called the origin of $(F, S, p)$ and it denotes the position where the whole genome duplication of $\pi^\circ$ starts, resulting in a duplicated intermediate. The sets $F$ and $S$ denote the sets of genes that are kept (i.e., they are not deleted) in the first and second copy of $\pi^\circ$ in the duplicated intermediate, respectively. Here “first” and “second” are defined with respect to $\pi_p$. The elements that are deleted in the duplicated intermediate are called lost elements. The effect of a TDRL $\tau^\circ = (F, S, p)$ on $\pi^\circ$ is defined as $\tau^\circ \circ \pi^\circ := [\tau \circ \pi_p]_\sim$, where $\tau \circ \pi_p$ is a rearranged permutation of $\pi_p$ such that the elements of $F$ are moved in front of the elements of $S$ and the relative order of all elements of $F$ (respectively $S$) of $\pi_p$ is unchanged. More precisely, $\tau \circ \pi_p$ is the (linear) permutation for which it holds that 1) if $i \in F$, $j \in S$, then $i$ is to the left of $j$ in $\tau \circ \pi_p$ and 2) if $i, j \in F$ or $i, j \in S$, then $i$ is to the left of $j$ in $\tau \circ \pi_p$ if and only if $i$ is to the left of $j$ in $\pi_p$. Note that by this definition, a TDRL $(F, S, p)$, which maps a set $\pi^\circ$ of permutations to another set of permutations, can be visualized by reordering the elements of the representative $\pi_p$ according to the sets $F$ and $S$, see Fig. 1b for an example. For the combinatorics of TDRLs it is irrelevant whether $F$ or $S$ is the set of genes that are preserved in the original part of the duplicated intermediate, since both applications result in the same circular permutation [15], i.e., $(F, S, p) \circ \pi^\circ = (S, F, p) \circ \pi^\circ$. Note that the strandedness of a gene (also called its orientation) is not relevant for this paper since a TDRL does not change the strandedness of genes. This is also the reason why we represent a gene order as an (unsigned) permutation.

Fig. 1 Application of TDRL $(\{4, 5\}, \{1, 2, 3\}, 1)$ to the circular permutation $[(1\,4\,2\,5\,3)]_\sim$. The resulting circular permutation is $[(4\,5\,1\,2\,3)]_\sim$. The circular permutation $[(1\,4\,2\,5\,3)]_\sim$ (respectively $[(4\,5\,1\,2\,3)]_\sim$) is represented in (a) by a circular illustration on the top (respectively bottom) which gives the corresponding representatives when read in clockwise direction. Whereas (a) shows the application of the TDRL by using circular illustrations, (b) illustrates the same application by applying a tandem duplication to the representative $\pi_1 \in [(1\,4\,2\,5\,3)]_\sim$ followed by the subsequent loss of one copy of every duplicated gene. In particular, the elements of $F = \{4, 5\}$ are kept in the first copy (illustrated by a bright gray) and the elements of $S = \{1, 2, 3\}$ are kept in the second copy (illustrated by a dark gray) of the duplicated intermediate. Elements that are lost during this process are crossed out. Permutation $\tau \circ \pi_1 = (4\,5\,1\,2\,3)$ is a representative of the resulting circular permutation, namely $[\tau \circ \pi_1]_\sim = [(4\,5\,1\,2\,3)]_\sim$

Two TDRLs are called equivalent if their application to the same (circular) permutation results in the same (circular) permutation. It was shown in [15] that the number of equivalent TDRLs depends only on the length $n$ of the permutation $\pi^\circ$ and on whether the TDRLs are identity maps; if they are not identity maps, it is $2n$. The TDRL distance of $\pi^\circ$ and $\sigma^\circ$ is $d(\pi^\circ, \sigma^\circ) = \min(\{d \in \mathbb{N} \mid \exists \text{ TDRLs } \tau_1^\circ, \ldots, \tau_d^\circ : \tau_d^\circ \circ \ldots \circ \tau_1^\circ \circ \pi^\circ = \sigma^\circ\})$.

The definitions related to TDRL rearrangements and circular permutations are exemplified for the circular permutation $\pi^\circ = [(1\,4\,2\,5\,3)]_\sim$. See Fig. 1 for an illustration. The representatives of $\pi^\circ$ are $\pi_1 = (1\,4\,2\,5\,3)$, $\pi_2 = (2\,5\,3\,1\,4)$, $\pi_3 = (3\,1\,4\,2\,5)$, $\pi_4 = (4\,2\,5\,3\,1)$, and $\pi_5 = (5\,3\,1\,4\,2)$, i.e., $\pi^\circ = \{\pi_1, \ldots, \pi_5\}$. It holds that $\pi_2 = \varphi^2(\pi_1)$, $\pi_3 = \varphi^2(\pi_2)$, $\pi_4 = \varphi^2(\pi_3)$, $\pi_5 = \varphi^2(\pi_4)$, and $\pi_1 = \varphi^2(\pi_5)$. The application of TDRL $\tau^\circ = (\{4, 5\}, \{1, 2, 3\}, 1)$ to $\pi^\circ$ gives $[(4\,5\,1\,2\,3)]_\sim$. Since genes 1 and 2 are in $S$ and 1 is to the left of 2 in $\pi_1$, it holds that 1 is to the left of 2 in $(4\,5\,1\,2\,3)$. Also, since $5 \in F$ and $3 \in S$, it holds that 3 is to the right of 5 in $(4\,5\,1\,2\,3)$. Figure 1 illustrates the application of $\tau^\circ$ to $[(1\,4\,2\,5\,3)]_\sim$. Since $(\{3, 4\}, \{1, 2, 5\}, 5) \circ \pi^\circ = (\{4, 5\}, \{1, 2, 3\}, 1) \circ \pi^\circ$ it holds that TDRLs $(\{3, 4\}, \{1, 2, 5\}, 5)$ and $(\{4, 5\}, \{1, 2, 3\}, 1)$ are equivalent.

To see that the formal model also covers partial duplication TDRLs (i.e., TDRLs where not all genes are duplicated) consider a partial duplication TDRL that transforms $\pi^\circ$ into $\sigma^\circ$. For every element $e \in [1:n]$ it holds that either $e \in F'$ (i.e., $e$ is kept in the first copy), $e \in S'$ (i.e., $e$ is kept in the second copy), or $e \in N$ (i.e., $e$ is not duplicated). Then the TDRL $(F, S, p)$, where $F = S'$, $S = N \cup F'$, and $p$ being the unique non-duplicated element adjacent to one element of the second copy in the partially duplicated intermediate, gives the same circular output permutation (see Fig. 2 for an example). Consequently, the same rearrangement can be achieved by a TDRL that duplicates all elements [15].

Fig. 2 Partial duplication TDRL and a corresponding TDRL that achieves the same circular output permutation $[(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Notation is as in Fig. 1. The left-hand side shows a partial tandem duplication of the sequence 3 4 5 6 of $[(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ followed by a subsequent loss of the elements 3 and 5 in the first copy, and 4 and 6 in the second copy, i.e., $F' = \{4, 6\}$, $S' = \{3, 5\}$, and the elements of $N = \{1, 2, 7, 8\}$ are not duplicated. The same rearrangement can be achieved by the TDRL $(F, S, p)$, where $F = S'$, $S = N \cup F'$, and $p = 7$ is the unique non-duplicated element adjacent to the second copy. The corresponding TDRL $(\{3, 5\}, \{1, 2, 4, 6, 7, 8\}, 7)$ is illustrated on the right-hand side

EqualTDRL

The software tool EqualTDRL calculates for two circular permutations $\pi^\circ$ and $\sigma^\circ$ that represent two circular unichromosomal genomes the distances $d(\pi^\circ, \sigma^\circ)$ and $d(\sigma^\circ, \pi^\circ)$. In the case that $d(\pi^\circ, \sigma^\circ) = 1$ (respectively $d(\sigma^\circ, \pi^\circ) = 1$) EqualTDRL produces an illustration which shows all equivalent TDRL rearrangements, i.e., all triples $(F, S, p)$ that transform $\pi^\circ$ into $\sigma^\circ$ or vice versa. An example of such an illustration is given in Fig. 3 for the circular permutations $\iota^\circ = [(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ and $\pi^\circ = [(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$ with $d(\iota^\circ, \pi^\circ) = 1$. Figure 3 illustrates all equivalent TDRLs that transform $\iota^\circ$ into $\pi^\circ$: $(\{3, 5, 7, 8\}, \{1, 2, 4, 6\}, 1)$, $(\{1, 3, 5, 7, 8\}, \{2, 4, 6\}, 2)$, $(\{4, 6\}, \{1, 2, 3, 5, 7, 8\}, 3)$, $(\{1, 2, 5, 7, 8\}, \{3, 4, 6\}, 4)$, $(\{3, 6\}, \{1, 2, 4, 5, 7, 8\}, 5)$, $(\{1, 2, 4, 7, 8\}, \{3, 5, 6\}, 6)$, $(\{3, 5\}, \{1, 2, 4, 6, 7, 8\}, 7)$, $(\{3, 5, 7\}, \{1, 2, 4, 6, 8\}, 8)$, and all TDRLs that can be obtained from the listed TDRLs by interchanging the sets $F$ and $S$. Recall that TDRL $(\{3, 5\}, \{1, 2, 4, 6, 7, 8\}, 7)$ is illustrated in Fig. 2.

Fig. 3 Output created by EqualTDRL: all TDRL rearrangements that transform $[(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ into $[(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Each row illustrates the TDRL rearrangements $(F, S, p)$, where $p$ is the origin (y-axis) and a gene (x-axis) is in the set $F$ (respectively $S$) if the corresponding circle is filled with white colour (respectively black colour). A square surrounding a circle shows that the corresponding gene is part of the duplicated sequence of the corresponding partial duplication TDRL

Moreover, EqualTDRL can illustrate all partial TDRLs in addition to the whole genome TDRLs. This is shown in Fig. 3 where a black square surrounds the circle of a gene if and only if the gene is part of the duplicated sequence of a partial duplication TDRL. Note that for partial TDRLs it is not important whether genes that are excluded from the duplication are considered to be in $F$ or in $S$ since they are not duplicated and therefore no copy of these genes is lost. As an example, consider the origin 3 in Fig. 3. The figure shows that TDRL $(\{4, 6\}, \{1, 2, 3, 5, 7, 8\}, 3)$, which considers the whole permutation to be duplicated, can be replaced by a partial duplication TDRL that only duplicates the sequence 3 4 5 6 and contains the genes 4, 6 in the first copy (with respect to the origin) of the duplicated intermediate and genes 3, 5 in the second copy of the duplicated intermediate. See Fig. 2 for an illustration of the specified partial duplication TDRL and a corresponding TDRL that duplicates the whole permutation.

Finally, EqualTDRL is also able to illustrate only those TDRLs that satisfy the following types of conditions which can be given by the user: i) specific sets of sequences of genes that are conserved by a TDRL and ii) intergenic regions that are framed by specific pairs of genes. Both conditions and the type of input that they require are explained in the following.

For the first type of condition the user has to specify the corresponding sets of genes. Then, EqualTDRL proceeds as explained in the following. Consider a set of genes $G \subset [1:n]$, two circular permutations $\pi^\circ$ and $\sigma^\circ$ such that $d(\pi^\circ, \sigma^\circ) = 1$, and a TDRL $\tau^\circ = (F, S, p)$ with $\tau^\circ \circ \pi^\circ = \sigma^\circ$. When $G$ is used as a condition EqualTDRL only considers equivalent TDRLs $(F', S', p')$ of $\tau^\circ$ such that either $G \subseteq F'$ or $G \subseteq S'$. Therefore, if there exists such an equivalent TDRL all genes of $G$ are lost in the same copy and the loss of this TDRL depends on $G$. In the case that set $G$ contains exactly all genes that belong to the same transcript the first type of condition is used to determine whether or not the loss of a TDRL depends on a given transcript structure. Note that EqualTDRL can also use multiple gene sets. For an example consider Fig. 3 that illustrates all equivalent TDRLs that transform $\iota^\circ = [(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ into $\pi^\circ = [(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Let 1 2 and 7 8 be sequences of genes (of $\pi^\circ$ and $\iota^\circ$) that shall be conserved by a TDRL, hence $G_1 = \{1, 2\}$ and $G_2 = \{7, 8\}$. If these conditions are specified, then EqualTDRL provides an illustration similar to Fig. 3 but with the difference that it does not show the TDRLs for origin 2 and 8, i.e., $(\{1, 3, 5, 7, 8\}, \{2, 4, 6\}, 2)$ and $(\{3, 5, 7\}, \{1, 2, 4, 6, 8\}, 8)$. These TDRLs are not illustrated since for at least one $i \in [1:2]$ neither $G_i \subseteq F$ nor $G_i \subseteq S$. However, for all other origins $p \in \{1, 3, 4, 5, 6, 7\}$ it holds that $G_1$ and $G_2$ are subsets of either $F$ or $S$.

The second type of condition requires finding all possible TDRLs that can explain intergenic regions between pairs of genes that are given to EqualTDRL as an input from the user. To see this, consider two circular permutations $\pi^\circ$ and $\sigma^\circ$, a TDRL $\tau^\circ = (F, S, p)$ such that $\tau^\circ \circ \pi^\circ = \sigma^\circ$, and a pair of two distinct genes $x, y \in [1:n]$ that is given by the user. Such a pair of genes should frame an intergenic region in the genome that is represented by $\sigma^\circ$. Then EqualTDRL considers only equivalent TDRLs $(F', S', p')$ of $\tau^\circ$ where at least one gene of the duplicated intermediate that is between $x$ and $y$ is deleted, i.e., there exists at least one gene $z \in [1:n] \setminus \{x, y\}$ such that $z$ is between $x$ and $y$ in the duplicated intermediate, $z$ is lost, and either $x \in F'$ and $y \in S'$ or if $x, y \in F'$ (respectively $x, y \in S'$) then $z \in S'$ (respectively $z \in F'$). For such a TDRL the intergenic region between $x$ and $y$ in the genome that is represented by $\sigma^\circ$ can be explained by an incomplete deletion of the gene (or the set of genes) that were lost between $x$ and $y$. Therefore, if such a TDRL exists EqualTDRL provides evidence that an intergenic region might be a remnant of a gene (or the set of genes) that is formed by an incomplete deletion of the same. For an example consider Fig. 3 that illustrates all equivalent TDRLs that transform $\iota^\circ = [(1\,2\,3\,4\,5\,6\,7\,8)]_\sim$ into $\pi^\circ = [(1\,2\,4\,6\,3\,5\,7\,8)]_\sim$. Assume that an intergenic region is between genes 3 and 5 in $\pi^\circ$ and that one is interested to know which TDRLs (and which corresponding gene losses) applied to $\iota^\circ$ could possibly result in this arrangement. Hence $x = 3$ and $y = 5$ are chosen. If this condition is given, EqualTDRL would produce Fig. 3. Hence, all illustrated TDRLs can explain the intergenic region between 3 and 5. This holds because in all TDRLs of Fig. 3 either $3 \in F$ and $5 \in S$ (e.g., origin 5), $5 \in F$ and $3 \in S$ (e.g., origin 4), or $3, 5 \in F$ (respectively $3, 5 \in S$) and the gene 4, which is between 3 and 5 in the duplicated intermediate, is lost. If, for example, condition $x = 1$ and $y = 2$ is used, then EqualTDRL provides a figure that illustrates only TDRL $(\{1, 3, 5, 7, 8\}, \{2, 4, 6\}, 2)$ of Fig. 3 (i.e., the row of Fig. 3 with origin 2). This holds since $1 \in F$, $2 \in S$, and 3 is between 1 and 2 in $\pi$. For all other origins $p \in [1:8] \setminus \{2\}$ it holds that $1, 2 \in F$ (respectively $1, 2 \in S$) and there does not exist an element $z$ between 1 and 2 in the duplicated intermediate that is lost.

Results and discussion

Experiment

In this section we show on the basis of mitochondrial gene orders how EqualTDRL can be used to find a plausible TDRL rearrangement. Specifically, the following two mitochondrial gene orders are considered: i) the [...]

[...] taxon which includes all considered species) and estimated the history of rearrangements using the condition-based coding algorithm MacClade [23], which does not support TDRL rearrangements, and the TreeREx software [19]. For both given mitochondrial gene orders the TreeREx analysis resulted in a large TDRL rearrangement which transforms the gene order of Prionoglaris stygia into the gene order of Lepidopsocidae sp. by duplicating 34 of 38 genetic markers (37 genes plus the control region). The corresponding TDRL is illustrated in Fig. 4 for the origin cox2. Since for circular gene orders several equivalent TDRLs exist [15], we investigate the set of all possible TDRL rearrangements in the following.

Since we also want to consider the intergenic sequences, the mitochondrial genomes of the Trogiomorpha species Lepidopsocidae sp. and Dorypteryx domestica have been reannotated with an extended version of MITOS (unpublished, http://mitos2.bioinf.uni-leipzig.de) [24]. In order to examine whether an intergenic region contains remnants of genes caused by an incomplete gene loss, the following data analysis was executed for Lepidopsocidae sp. and Dorypteryx domestica. For every tRNA and ribosomal gene (respectively protein-coding gene) the Covariance Models (respectively Hidden Markov Models) that are used in MITOS were applied i) to search in every intergenic region for every gene sequence and ii) if a gene (or a remnant of a gene) has been found, to (locally) align the gene sequence to the corresponding intergenic region. The analysis was carried out for the tRNA and ribosomal genes with CMsearch and CMalign from the Infernal 1.1rc4 software package [25] and for protein-coding genes with HMMsearch and HMMalign from the HMMER 3.1b1 software package [26]. Moreover, for both mitochondrial gene orders the conserved sequences of genes (i.e., the sequences of genes that occur in both gene orders and are maximal with respect to inclusion) were calculated.

Then EqualTDRL was used to determine TDRLs that transform the mitochondrial gene order of Prionoglaris stygia into the mitochondrial gene order of Lepidopsocidae sp.
under four different objectives: A) to show all gene order of Prionoglaris stygia (Genbank, accession: possible TDRLs, B) to show all TDRLs that do not break MG255141.1), which represents the ancestral mitochon- any conserved sequence of genes, C) to highlight a sub- drial gene order of the Pancrustacea [20], and ii) the set of the TDRLs of (B) that provide additional evidence mitochondrial gene order of the Trogiomorpha species for the origin of intergenic regions outside of conserved Lepidopsocidae sp. (RefSeq, accession: NC_004816.1) and sequences by an incomplete gene loss, and D) to find all Dorypteryx domestica (Genbank, accession: MG255136.1) TDRLs that minimize the number of duplicated genes. that has been published at the NCBI RefSeq release 84 [21] and the NCBI Genbank release 221 [22]. Note that Results the mitochondrial genomes of Lepidopsocidae sp. and The gene orders which result from the annotation that has Dorypteryx domestica comprise the same gene order. Both been done with MITOS for the mitochondrial genomes gene orders have recently been discussed in [20]. In this of Prionoglaris stygia, Lepidopsocidae sp.,and Dorypteryx domestica are equal to the gene orders that have been publication the authors studied the phylogenetic relation- discussed in [20]. ships between different barklice species of Psocoptera (a Hartmann et al. BMC Bioinformatics (2018) 19:192 Page 7 of 10 a b Fig. 4 a Output created by EqualTDRL: complete set of TDRLs that rearrange the mitochondrial gene order of Prionoglaris stygia into the mitochondrial gene order of Lepidopsocidae sp. Protein-coding and ribosomal genes are denoted by their names and one capital letter indicates the amino acid for the tRNAs. b A star denotes a TDRL that does not break a conserved gene sequence. A pentagon highlights the TDRL that has been presented in [20]. A diamond highlights the TDRL which duplicates the minimum number of genes (the corresponding minimum TDRL is illustrated in Fig. 
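As a rough illustration of objectives A) and B), the following Python sketch enumerates equivalent TDRLs for the toy example of Fig. 3 by brute force and then filters them by conserved gene sets. It is not the EqualTDRL implementation. The function names are hypothetical, and the sketch assumes that a TDRL (F, S, p) duplicates the linearization of the circular gene order that starts at the origin p, keeps the genes of F in the first copy and the genes of S in the second copy, and is normalized so that p ∈ S. These assumptions reproduce the triples quoted in the text for origins 2, 7 and 8.

```python
from itertools import combinations


def rotate_to(order, start):
    """Linearize a circular gene order so that it begins with `start`."""
    i = order.index(start)
    return order[i:] + order[:i]


def apply_tdrl(order, first, origin):
    """Apply the TDRL (F, S, p) under the assumptions stated above:
    genes in `first` (F) survive in the first copy, all others (S) in the second."""
    linear = rotate_to(list(order), origin)
    kept_first = [g for g in linear if g in first]
    kept_second = [g for g in linear if g not in first]
    return kept_first + kept_second


def same_circular(a, b):
    """True if two linear orders represent the same circular permutation."""
    return rotate_to(list(a), b[0]) == list(b)


def equivalent_tdrls(source, target):
    """Brute-force all triples (F, S, p), with p in S and F non-empty,
    that transform `source` into `target` (exponential, toy sizes only)."""
    genes = list(source)
    found = []
    for origin in genes:
        others = [g for g in genes if g != origin]
        for size in range(1, len(others) + 1):
            for combo in combinations(others, size):
                first = set(combo)
                if same_circular(apply_tdrl(source, first, origin), target):
                    found.append((first, set(genes) - first, origin))
    return found


def preserves(tdrl, conserved_sets):
    """Objective B: every conserved set lies completely in F or completely in S."""
    first, second, _ = tdrl
    return all(g <= first or g <= second for g in conserved_sets)


iota = [1, 2, 3, 4, 5, 6, 7, 8]
pi = [1, 2, 4, 6, 3, 5, 7, 8]

tdrls = equivalent_tdrls(iota, pi)                      # objective A
conserved = [{1, 2}, {7, 8}]                            # G1 and G2 from the text
kept = [t for t in tdrls if preserves(t, conserved)]    # objective B

print([p for _, _, p in tdrls])  # origins 1..8, one TDRL per origin
print([p for _, _, p in kept])   # [1, 3, 4, 5, 6, 7]
```

Under these assumptions the sketch finds one TDRL per origin and drops origins 2 and 8 once G₁ = {1, 2} and G₂ = {7, 8} must be conserved, in line with the example discussed for Fig. 3. The subset enumeration is exponential in the number of genes and is only meant for toy inputs, not for the 38 markers of the mitochondrial example.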
A linear representation of the mitochondrial gene orders of Prionoglaris stygia and Lepidopsocidae sp. (respectively Dorypteryx domestica) that are used in this study is shown in Fig. 5. It can be seen that both gene orders contain the following conserved sequences: i) trnC trnY cox1 trnL2, ii) trnK trnD atp8 atp6 cox3, iii) trnG nad3 trnA, iv) trnS1 trnE, v) trnF nad5 trnH nad4 nad4l trnT trnP nad6 cob, and vi) nad1 trnL1 rrnL trnV rrnS CR, where CR denotes the control region.

The analysis with MITOS shows that the genome of Lepidopsocidae sp. contains 10 intergenic regions with a length of at least 10 base pairs, whereby only the three regions between trnQ and nad2 (79 base pairs), nad2 and trnC (105 base pairs), and cob and nad1 (58 base pairs) are not contained in a conserved sequence. It is worth mentioning that the intergenic region between nad2 and trnC cannot be found in a less restrictive annotation with MITOS (when an E-value exponent of 1 is used instead of the default value 2 for BLAST [27] and CMsearch). In this less restrictive annotation the putative intergenic region is completely assigned to nad2. The mitochondrial genome of Dorypteryx domestica contains 4 intergenic regions with a length of at least 10 base pairs: between trnQ and nad2 (50 base pairs), trnC and trnY (31 base pairs), trnN and trnF (25 base pairs), and cob and nad1 (21 base pairs). Note that the intergenic region between trnC and trnY is contained in a conserved interval.

Since Lepidopsocidae sp. and Dorypteryx domestica have the same gene order, which is different from the ancestral Pancrustacea gene order, we assume that the TDRL studied in the following happened before the speciation of the Trogiomorpha species. Therefore, we consider only intergenic regions that occur in both mitochondrial genomes of Lepidopsocidae sp. and Dorypteryx domestica. Thus, in the analysis that follows we consider the intergenic regions between trnQ and nad2, and between cob and nad1. These intergenic regions are particularly interesting for further analysis since the presence of a gene remnant in those intergenic regions can be used to identify the subset of TDRLs that can explain the presence of the remnants as a result of an incomplete gene loss of a TDRL.

Figure 4a was generated with EqualTDRL and shows all equivalent TDRLs that transform the gene order of Prionoglaris stygia into the gene order of Lepidopsocidae sp. It also shows the set of duplicated genes for every TDRL. In Fig. 4b symbols were (manually) added to the right of a TDRL if and only if the TDRL exhibits specific characteristics: those that do not break any of the conserved sequences of genes and those that support conjectures that can explain the presence of the intergenic regions as a result of an incomplete gene loss. These indications can be used to further search for gene remnants. Interestingly, it can be seen in Fig. 4 that the TDRL which has been presented in [20] preserves all conserved sequences of genes and also explains the presence of both considered intergenic regions.

The HMMER and Infernal software packages were used to identify potential remnants of genes. The experiments showed that trnM and trnS1 can be found in the intergenic region of Lepidopsocidae sp. between trnQ and nad2 with an E-value of 0.005 and 0.016, respectively. In addition, several protein-coding genes hit the intergenic region between trnQ and nad2. However, the E-values of these hits are larger than 0.2 and therefore less reliable. Interestingly, no hit with an E-value smaller than 1 was found for both intergenic regions of Dorypteryx domestica and the intergenic region between cob and nad1 of Lepidopsocidae sp. A table that summarizes all hits in the intergenic regions, their E-values, and the corresponding alignments can be found in Additional file 1: Table S3 and Figures S1–S10.

Figure 4 shows that there exists a TDRL which duplicates only 29 of 38 genetic markers, which is 5 fewer than the TDRL that has been presented in [20]. Interestingly, this TDRL also preserves all conserved gene sequences and provides evidence for the presence of the intergenic regions between trnQ and nad2, and cob and nad1, as a result of an incomplete gene loss of the genes trnM and trnS2, respectively. While the origin of the former intergenic region by the loss of trnM is supported weakly by our analysis, the origin of the latter intergenic region has no support from our sequence/structural similarity based analysis. The corresponding TDRL is illustrated in Fig. 5.

Fig. 5 TDRL with a minimum number of duplicated genes that rearranges the mitochondrial gene order of Prionoglaris stygia (top) via the gene order after the duplication (middle) into the gene order of Lepidopsocidae sp. (bottom). Sequences that are bounded by a thick black line are conserved in both gene orders. Horizontal dots indicate the circularity of the gene orders. Intergenic regions of Lepidopsocidae sp. are indicated by gray squares. The sequence of genes that is rearranged is framed by a dashed square. Gene abbreviations and notation as in Fig. 4

Altogether, the scenario presented in [20] partially agrees with the results that are presented in this article. However, due to the circularity of the mitochondrial gene orders it is important to consider the complete set of TDRL rearrangements that can explain the gene order of Lepidopsocidae sp. When it is considered to be relevant that a TDRL duplicates only a minimum number of genes, the results computed with EqualTDRL weaken the support for the TDRL rearrangement that has been presented in [20] in favor of the TDRL rearrangement that is shown in Fig. 5. Moreover, the TDRL presented in Fig. 5 gets additional support from the detection of a putative remnant of the trnM gene in the intergenic region of Lepidopsocidae sp. between trnQ and nad2.

Conclusion

In this article we have presented the tool EqualTDRL, which illustrates all equivalent TDRLs for a pair of gene orders that differ by one TDRL. EqualTDRL considers the circularity of genomes and helps to study their differences in consideration of all TDRL predictions that possibly could have taken place. Thereby, it helps to identify TDRLs that satisfy different biological constraints. For example, a requirement might be that the TDRL duplicates only a minimum number of genes or that the TDRL allows one to explain the presence of intergenic regions. In addition, EqualTDRL helps scientists to determine whether the gene loss of a TDRL is random or might depend on gene orientation or transcript structure. It has been shown for two example mitochondrial gene orders how EqualTDRL can be used to identify more plausible TDRLs that possibly could have taken place.

Availability and requirements

Project name: EqualTDRL
Project home page: http://pacosy.informatik.uni-leipzig.de/equaltdrl
Operating system(s): Linux distribution
Programming language: C++, R
Other requirements: ggplot2 package of R
License: none
Any restrictions to use by non-academics: none

Additional file

Additional file 1: Supplement. Additional file that contains the MITOS2 annotations and alignments of the mitochondrial genomes analyzed in the current study. (PDF 222 kb)

Abbreviations

CR: Control region; TDRL: Tandem duplication random loss

Funding

TH was funded by a PhD student fellowship from the Leipzig University. We acknowledge support from the German Research Foundation (DFG) and Leipzig University within the program of Open Access Publishing. The funding body did not play any roles in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Availability of data and materials

The dataset analyzed in the current study is available in the NCBI database [21]. The code of EqualTDRL and examples can be found at http://pacosy.informatik.uni-leipzig.de/equaltdrl. The remainder of the data generated during this study is included in this published article and its supplementary information file.

Authors' contributions

TH and MB wrote and tested the EqualTDRL code. MM made substantive intellectual contributions to this paper. TH wrote the manuscript. All authors have read and approved the final manuscript.

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author details

Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, Leipzig University, Augustusplatz 10, D-04109 Leipzig, Germany. Helmholtz Centre for Environmental Research - UFZ, Permoserstraße 15, D-04318 Leipzig, Germany.

Received: 22 January 2018 Accepted: 30 April 2018

References

1. Fertin G, Labarre A, Rusu I, Tannier E, Vialette S. Combinatorics of Genome Rearrangements. Cambridge: MIT Press; 2009.
2. Inoue JG, Miya M, Tsukamoto K, Nishida M. Evolution of the deep-sea gulper eel mitochondrial genomes: Large-scale gene rearrangements originated within the eels. Mol Biol Evol. 2003;20(11):1917–24.
3. Lavrov DV, Boore JL, Brown WM. Complete mtDNA sequences of two millipedes suggest a new model for mitochondrial gene rearrangements: duplication and nonrandom loss. Mol Biol Evol. 2002;19(2):163–9.
4. Boore JL. The duplication/random loss model for gene rearrangement exemplified by mitochondrial genomes of deuterostome animals. In: Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and the Evolution of Gene Families. Dordrecht: Springer; 2000. p. 133–47.
5. Bernt M, Braband A, Schierwater B, Stadler PF. Genetic aspects of mitochondrial genome evolution. Mol Phyl Evol. 2012;69:328–38.
6. San Mauro D, Gower DJ, Zardoya R, Wilkinson M. A hotspot of gene order rearrangement by tandem duplication and random loss in the vertebrate mitochondrial genome. Mol Biol Evol. 2005;23(1):227–34.
7. Chaudhuri K, Chen K, Mihaescu R, Rao S. On the tandem duplication-random loss model of genome rearrangement. In: Proc. 17th Ann. ACM-SIAM Symp. Discrete Algorithm (SODA '06). Philadelphia: Society for Industrial and Applied Mathematics; 2006. p. 564–70.
8. Yancopoulos S, Attie O, Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics. 2005;21(16):3340–6.
9. Bergeron A, Mixtacki J, Stoye J. A unifying view of genome rearrangements. In: Proc. 6th Int'l Workshop Algorithms in Bioinformatics (WABI '06). LNCS, vol. 4175. Berlin: Springer; 2006. p. 163–73.
10. Meidanis J, Walter M, Dias Z. Reversal distance of signed circular chromosomes. Technical report IC-00-23. 2000.
11. Bergeron A, Chauve C, Hartman T, St-Onge K. On the properties of sequences of reversals that sort a signed permutation. In: Proceedings of JOBIM, vol. 2; 2002. p. 99–108.
12. Siepel AC. An algorithm to enumerate sorting reversals for signed permutations. J Comput Biol. 2003;10(3-4):575–97.
13. Braga MD, Sagot M-F, Scornavacca C, Tannier E. The solution space of sorting by reversals. Proc. 3rd Int'l Symp. Bioinforma Res Appl (ISBRA '07). 2007;4463:293–304.
14. Braga MDV, Sagot M-F, Scornavacca C, Tannier E. Exploring the solution space of sorting by reversals, with experiments and an application to evolution. IEEE ACM T Comput Bi. 2008;5(3):348–56.
15. Hartmann T, Chu A-C, Middendorf M, Bernt M. Combinatorics of tandem duplication random loss mutations on circular genomes. IEEE ACM T Comput Bi. 2016;15(1):83–95.
16. Braga MD, Stoye J. The solution space of sorting by DCJ. J Comput Biol. 2010;17(9):1145–65.
17. Bernt M, Merkle D, Ramsch K, Fritzsch G, Perseke M, Bernhard D, Schlegel M, Stadler P, Middendorf M. CREx: inferring genomic rearrangements based on common intervals. Bioinformatics. 2007;23(21):2957–8.
18. Hartmann T, Bernt M, Middendorf M. An Exact Algorithm for Sorting by Weighted Preserving Genome Rearrangements. IEEE ACM T Comput Bi. 2018. In press.
19. Bernt M, Merkle D, Middendorf M. An algorithm for inferring mitogenome rearrangements in a phylogenetic tree. In: Proc. 6th Int'l Workshop Comparative Genomics (RECOMB-CG '08). Berlin: Springer; 2008. p. 143–57.
20. Yoshizawa K, Johnson KP, Sweet AD, Yao I, Ferreira RL, Cameron SL. Mitochondrial phylogenomics and genome rearrangements in the barklice (insecta: Psocodea). Mol Phyl Evol. 2017.
21. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35(Database issue):61–5.
22. Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. Genbank. Nucleic Acids Res. 2012;41(D1):36–42.
23. Maddison WP, Maddison DR. MacClade: analysis of phylogeny and character evolution. version 3.0. Sunderland: Sinauer Associates; 1992.
24. Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J, Middendorf M, Stadler PF. MITOS: Improved de novo metazoan mitochondrial genome annotation. Mol Phyl Evol. 2013;69(2):313–9.
25. Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013;29(22):2933–5.
26. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 2011;39(suppl_2):29–37.
27. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.

Journal

BMC Bioinformatics, Springer Journals

Published: May 30, 2018
