FlexiDot: highly customizable, ambiguity-aware dotplots for visual sequence analyses

Abstract

Summary: FlexiDot is a cross-platform dotplot suite generating high quality self, pairwise and all-against-all visualizations. To improve dotplot suitability for comparison of consensus and error-prone sequences, FlexiDot harbors routines for strict and relaxed handling of ambiguities and substitutions. Our shading modules facilitate dotplot interpretation and motif identification by adding information on sequence annotations and sequence similarities. Combined with collage-like outputs, FlexiDot supports simultaneous visual screening of large sequence sets, enabling dotplot use for routine analyses.

Availability and implementation: FlexiDot is implemented in Python 2.7. Software and documentation are freely available at http://github.com/molbio-dresden/flexidot.

Contact: tony.heitkam@tu-dresden.de

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

First described five decades ago (Gibbs and McIntyre, 1970), dotplots remain effective tools for sequence exploration, still conveying key messages in current research (Hosaka et al., 2017). Dotplots allow characterization of complex or repetitive sequences, visual detection of DNA motifs and identification of modular similarities between sequences. Despite advances in dotplot algorithms and availability of different software tools, essential features are missing and, if available, scattered across various tools (Table 1, Supplementary Material). This prompted us to combine established functionalities (e.g. all-against-all modes) with new features for dotplot improvement (base ambiguity handling, shading, integration of annotations), while retaining usability and customizability.

Table 1. Feature list of commonly used dotplot tools

| Tools | Ambiguity handling | Annotation shading | All-against-all mode | Batch analyses | Interactive GUI | Input: DNA–DNA | Input: DNA–protein | Input: protein–protein | Multiple output formats | Reverse complement | Self/pairwise collages | Similarity shading | Strict/relaxed matching | Citation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FlexiDot | + | + | + | + | – | + | – | + | + | + | + | + | +/+ | here |
| Dotmatcher | – | – | – | + | – | + | – | + | + | – | – | – | –/+ | Rice et al. (2000) |
| Dotter | – | – | + | – | + | + | + | + | – | + | – | – | +/+ | Sonnhammer and Durbin (1995) |
| Dottup | – | – | – | + | – | + | – | + | + | – | – | – | +/– | Rice et al. (2000) |
| Gepard | – | + | + | – | + | + | – | + | – | + | – | – | +/– | Krumsiek et al. (2007) |
| PolyDot | – | – | + | + | – | + | – | + | + | – | – | – | +/– | Rice et al. (2000) |
| YASS webserver | + | – | – | – | + | + | – | + | – | + | – | – | +/+ | Noé et al. (2005) |

GUI, Graphical user interface.

2 Features and implementation

FlexiDot is a multi-purpose dotplot suite for publication-ready dotplots, handling self, pairwise and all-against-all comparisons with individual and combined visualizations (Fig. 1A–C, see Supplementary Material for details).
We want to highlight that (i) our mismatch and ambiguity handling enables analyses of degenerate consensus sequences and error-prone long reads (Fig. 1B), and that (ii) our sequence similarity and annotation-based shadings for self and all-against-all representations (Fig. 1A and C, respectively) convey descriptive information to facilitate sequence interpretation.

Fig. 1. Visual sequence comparison by FlexiDot with window size 10 using six artificial test sequences. (A) Self dotplot collage. The Seq2 dotplot is shaded with custom annotations. (B) Influence of ambiguity and mismatch handling on pairwise dotplots. (C) All-against-all dotplot of the six sequences with ambiguity handling and similarity shading.

The FlexiDot algorithm identifies matches, transforms them into diagonals and creates clear vector images (pdf, svg) or standard raster graphics (png). Less stringent matching is possible by addressing ambiguous residues specifically or by allowing a defined number of substitutions. A tabular output with lengths of the longest match (longest common subsequence, LCS) of all sequence pairs is provided.

FlexiDot integrates highly customizable shadings:
(i) Self dotplot regions can be highlighted according to their sequence annotation, provided as a general feature format (GFF) file (Fig. 1A).
(ii) All-against-all comparisons can be shaded according to the LCS length in forward, reverse or both directions (Fig. 1C).
(iii) The user can provide a matrix with numerical values (e.g. identities) to guide shading. Matrix values can be displayed in the dotplot.
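The matching step described above (window-based matches, optional ambiguity handling, a tolerated number of substitutions, and the longest match per sequence pair) can be illustrated with a small sketch. This is not FlexiDot's implementation: it is a naive, quadratic-time Python 3 illustration, and the names `residues_match`, `window_matches`, `longest_diagonal`, `wordsize` and `max_substitutions` are hypothetical. It also assumes that "relaxed" ambiguity handling means two residues match whenever their IUPAC base sets overlap.

```python
# Illustrative sketch only; not taken from the FlexiDot source code.
# IUPAC nucleotide codes mapped to the bases each code can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def residues_match(a, b, ambiguity=True):
    """Strict mode: identical characters. Relaxed mode: overlapping base sets."""
    a, b = a.upper(), b.upper()
    if not ambiguity:
        return a == b
    return bool(set(IUPAC.get(a, a)) & set(IUPAC.get(b, b)))


def window_matches(seq_a, seq_b, wordsize=10, max_substitutions=0, ambiguity=True):
    """Return (i, j) window start positions that match within the tolerance."""
    hits = []
    for i in range(len(seq_a) - wordsize + 1):
        for j in range(len(seq_b) - wordsize + 1):
            mismatches = 0
            for k in range(wordsize):
                if not residues_match(seq_a[i + k], seq_b[j + k], ambiguity):
                    mismatches += 1
                    if mismatches > max_substitutions:
                        break  # too many substitutions in this window
            else:
                hits.append((i, j))  # loop finished without exceeding the limit
    return hits


def longest_diagonal(hits, wordsize):
    """Length (in residues) of the longest run of consecutive hits on one diagonal."""
    hit_set = set(hits)
    best = 0
    for i, j in hit_set:
        if (i - 1, j - 1) in hit_set:
            continue  # not the start of a diagonal run
        run = 1
        while (i + run, j + run) in hit_set:
            run += 1
        best = max(best, run + wordsize - 1)
    return best


seq_a = "ACGTACGTAC"
seq_b = "ACGTRCGTAC"  # R stands for A or G

strict = window_matches(seq_a, seq_b, wordsize=4, ambiguity=False)
relaxed = window_matches(seq_a, seq_b, wordsize=4, ambiguity=True)
print(longest_diagonal(strict, 4))   # 5
print(longest_diagonal(relaxed, 4))  # 10
```

In this toy example, the ambiguous residue R interrupts the main diagonal under strict matching, whereas relaxed matching restores a continuous diagonal across the full sequence length. Allowing one substitution per window (`max_substitutions=1`) in strict mode has a similar effect here.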
FlexiDot uses Python 2.7 with numpy, matplotlib, biopython, regex, colormap and colour libraries. It is operated from the command line under Windows, Linux, and Mac. Input sequences are either specified as single or multi-fasta, or automatically detected in the working directory.

3 Application

As demonstrated for a variety of use cases in the Supplementary Material, FlexiDot creates publication-ready figures for complex sequences. This facilitates: evaluation of tandem repeat higher order structures of error-prone long reads, e.g. as seen in Sevim et al. (2016) and Symonova et al. (2017); combined depiction of sequence structure and functional annotations; identification of conserved motifs in related sequences; gene or repeat comparisons using degenerate consensus sequences (Schwichtenberg et al., 2016; Weber et al., 2013); and analysis of terminal or internal inverted or direct repeats, e.g. for transposable element annotation (Hosaka et al., 2017).

Acknowledgements

We sincerely thank Michael Standke for help with the algorithm, as well as Beatrice Weber and Björn Langer for code testing and valuable feedback.

Funding

This work was supported by the German Federal Ministry of Education and Research [KMU-innovativ-18 grant 031B0224B] and the German Federal Ministry of Food and Agriculture [“Fachagentur Nachwachsende Rohstoffe e.V.” (FNR) grant 22031714].

Conflict of Interest: none declared.

References

Gibbs A.J., McIntyre G.A. (1970) The diagram: a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem., 16, 1–11.
Hosaka A. et al. (2017) Evolution of sequence-specific anti-silencing systems in Arabidopsis. Nat. Commun., 8, 2161.
Krumsiek J. et al. (2007) Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics, 23, 1026–1028.
Noé L., Kucherov G. (2005) YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res., 33, W540–W543.
Rice P. et al. (2000) EMBOSS: the European molecular biology open software suite. Trends Genet., 16, 276–277.
Schwichtenberg K. et al. (2016) Diversification, evolution and methylation of short interspersed nuclear element families in sugar beet and related Amaranthaceae species. Plant J., 85, 229–244.
Sevim V. et al. (2016) Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing. Bioinformatics, 32, 1921–1924.
Sonnhammer E.L., Durbin R. (1995) A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis. Gene, 167, GC1–GC10.
Symonova R. et al. (2017) Higher-order organisation of extremely amplified, potentially functional and massively methylated 5S rDNA in European pikes (Esox sp.). BMC Genomics, 18, 391.
Weber B. et al. (2013) Highly diverse chromoviruses of Beta vulgaris are classified by chromodomains and chromosomal integration. Mob. DNA, 4, 8.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Bioinformatics, Oxford University Press

Publisher: Oxford University Press
Copyright: © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
ISSN: 1367-4803
eISSN: 1460-2059
DOI: 10.1093/bioinformatics/bty395


Journal: Bioinformatics, Oxford University Press

Published: May 14, 2018
