MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data

Tomáš Pluskal; Sandra Castillo; Alejandro Villar-Briones; Matej Orešič

doi:10.1186/1471-2105-11-395


Background: Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2.

Results: A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings. Newly introduced functionality includes the identification of peaks using online databases, MSn data support, improved isotope pattern support, scatter plot visualization, and a new method for peak list alignment based on the random sample consensus (RANSAC) algorithm. The performance of the RANSAC alignment was evaluated using synthetic datasets as well as actual experimental data, and the results were compared to those obtained using other alignment algorithms.

Conclusions: MZmine 2 is freely available under a GNU GPL license and can be obtained from the project website at: http://mzmine.sourceforge.net/. The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.

Background
Mass spectrometry (MS) coupled with online separation methods, such as liquid chromatography (LC), is commonly applied for differential and quantitative profiling of biological samples in metabolomic and proteomic research. Such approaches are useful in the domains of systems biology, functional genomics, and biomarker discovery. One of the ongoing challenges of such molecular profiling approaches is the development of better data processing methods. Several software packages have been developed for this purpose and have been extensively reviewed by Katajamaa and Orešič [1].

The recent introduction of mzML, an open and universal format for MS data [2], represents an important milestone in the effort to address the issues of MS data exchange and standardization. It also underlines the need for a flexible and universal software framework that provides the necessary support for data import, export, and visualization, thus allowing the rapid development of specialized data-processing methods.

MZmine was first introduced in 2005 as an open-source software toolbox for LC-MS data processing [3]. The first version of MZmine defined the data analysis workflow and implemented simple methods for data processing and visualization [3,4]. The software has been applied to numerous metabolomic analyses [5-10], and comparative studies with other related software packages have been performed [9,11]. A weakness of MZmine was the insufficient modularity of its initial design, which limited the possibility of expanding the software with new methods developed by the scientific community. For this reason, the new release, MZmine 2, was completely redesigned to support modularity. Here we describe the architecture of MZmine 2 as well as its basic features. We also introduce a new and efficient method for peak list alignment that was implemented in MZmine 2.

G0 Cell Unit, Okinawa Institute of Science and Technology (OIST), Onna, Okinawa, Japan
© 2010 Pluskal et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Pluskal et al. BMC Bioinformatics 2010, 11:395 http://www.biomedcentral.com/1471-2105/11/395

Implementation
MZmine 2 was developed using Java technology and is therefore platform independent. The software has been tested on the Windows, Mac OS X, and Linux platforms. We focused on three main aims during the software design and implementation.

First, the framework should be flexible and allow for easy and straightforward development of new data processing modules. We addressed this by keeping a strict separation between the application core and the individual modules for data processing and visualization (Figure 1). A compact data model was designed, and the code of each Java class was kept short and intuitive. To support the development of new modules, we provided an online tutorial, available at the project web site.

Second, the graphical interface of the application should be intuitive and easy to use. For this purpose, critical data processing methods such as peak picking were linked to embedded visualization modules, providing online previews during parameter setup. Additionally, applying any data-processing method in MZmine 2 does not remove the original (unprocessed) data, so the user can return to previous results or to the raw data at any stage of data processing.

Third, the software should provide good support for processing high-resolution MS data, e.g., data obtained from Orbitrap or Fourier transform ion cyclotron resonance MS instruments. We designed the data import and peak detection modules to maintain the precision of the imported data without any degradation due to inadequate resampling. Because high-resolution data implies an increased data volume, MZmine 2 was tested and optimized with large datasets (on the order of gigabytes).

The flexibility of the Java environment allows MZmine 2 to take advantage of several open-source libraries, including JFreeChart (http://www.jfree.org/jfreechart/) for the TIC, spectra, 2D and other visualizers, VisAD (http://www.ssec.wisc.edu/~billh/visad.html) for the 3D visualizer, Chemistry Development Kit (CDK) [12] for calculating isotopic distributions, JChemPaint (http://jchempaint.sourceforge.net/) for rendering 2D molecular structures, and Jmol (http://jmol.sourceforge.net/) for rendering 3D molecular structures. These libraries are included in the MZmine 2 distribution.

Figure 1 MZmine 2 software architecture and its main modules.

Results
The typical MS data processing workflow comprises raw data file import, filtering/smoothing (optional), peak picking, peak list deisotoping, alignment, gap filling, and normalization [4]. The MZmine 2 modules cover all these workflow stages and also include additional functionality for the visualization and interpretation of the results. Only features new to MZmine 2 are described in this section.

Project management
One of the new core features of MZmine 2 is project management, which allows the user to track and store intermediate results. Each data-processing step can be performed multiple times with different parameters, and the results can be observed and compared. The data processing pipeline settings (e.g., algorithms and parameters used, reference peak lists) can be stored for future applications. Direct export of the peak list data to comma-separated values (CSV) or XML files is also possible.

Raw data file format support
MZmine 2 can read and process both unit mass resolution and accurate mass resolution MS data in both continuous and centroid modes, including fragmentation (MSn) scans. Raw data import is modularized, and the currently supported file formats are mzML (1.0 and 1.1), mzXML (2.0, 2.1 and 3.0), mzData (1.04 and 1.05), NetCDF, and the RAW format used natively by Thermo Fisher Scientific instruments (requires installation of Thermo Xcalibur). Support for other file formats can be implemented as additional plug-ins.

Data visualization
MZmine 2 includes several visualization modules (Figure 2), all of which were newly implemented for this release. Following the goal of providing the user with an intuitive interface, the visualizers automatically annotate raw data with the obtained peak picking and identification results, allowing for quick orientation when large amounts of data are being processed.

Quantitative results in the form of peak lists may be observed using a table visualizer or chart-plotting modules (Figure 2I). The scatter plot visualizer (Figure 2F) has proven to be very useful for efficient comparison of multiple samples [13].

Figure 2 Screenshot of MZmine 2 showing multiple visualization modules. The specific panels included are: (A) imported samples, (B) peak lists including single peak list contents, (C) peak shapes for an identified metabolite across multiple samples, (D) MS/MS spectrum of a metabolite, (E) combined base peak plot for multiple samples, (F) scatter plot of peak areas across two samples, (G) 2D plot of a detected peak, mass-to-charge ratio vs. retention time, (H) 3D view of a detected peak, and (I) intensity plot for specific peaks across multiple samples.
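The Implementation section above names two design rules: data-processing modules are kept strictly separate from the application core, and running a module never removes the data it was applied to. A minimal Python sketch of those two rules follows. It is not MZmine 2's actual Java API; `PeakList`, `Project`, `register`, `run`, and `intensity_filter` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical minimal data model: each row is (m/z, retention time, intensity).
PeakRow = Tuple[float, float, float]


@dataclass(frozen=True)
class PeakList:
    name: str
    rows: Tuple[PeakRow, ...]  # immutable, so a module cannot edit its input


# A processing module takes a peak list plus parameters and returns a NEW peak list.
ProcessingModule = Callable[[PeakList, dict], PeakList]


@dataclass
class Project:
    """Core: knows nothing about specific algorithms, only registered modules."""
    modules: Dict[str, ProcessingModule] = field(default_factory=dict)
    peak_lists: List[PeakList] = field(default_factory=list)

    def register(self, name: str, module: ProcessingModule) -> None:
        self.modules[name] = module

    def run(self, name: str, source: PeakList, params: dict) -> PeakList:
        result = self.modules[name](source, params)
        self.peak_lists.append(result)  # the source peak list stays in the project
        return result


def intensity_filter(peak_list: PeakList, params: dict) -> PeakList:
    """Example plug-in: keep rows whose intensity reaches params['min_intensity']."""
    kept = tuple(row for row in peak_list.rows if row[2] >= params["min_intensity"])
    return PeakList(peak_list.name + " filtered", kept)
```

Making `PeakList` immutable and having `run` append a new result instead of replacing its input is one simple way to obtain the "return to previous results" behavior described under Implementation and Project management.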
Peak detection
Feature detection is a critical step in MS data processing. The peak detection methods and their implementations should be flexible enough to deal with great differences in data obtained from different instruments, such as variable mass resolution, chromatographic resolution and peak shape, or background noise. In MZmine 2, peak detection is performed in several customizable steps (Figure 3). Previews are provided to allow for optimal selection of parameter values.

In the first step (Figure 3A), each MS spectrum is processed individually and converted to pairs of m/z and intensity values (in other words, each mass spectrum is centroided). Several algorithms are provided as plug-ins, each suitable for a different type of mass spectra. The “Local maxima” algorithm is a simple algorithm suitable for demonstrating the process: it detects each local maximum in the spectrum. The “Recursive threshold” algorithm is based on an earlier method implemented in MZmine [3,4] and adds two parameters, the minimum and maximum peak m/z width. This method reduces false positives by avoiding the detection of noise peaks. The “Wavelet transform” algorithm is particularly suitable for noisy data. It processes each spectrum using a continuous wavelet transform, matching the m/z peaks to the “Mexican hat” wavelet model. This algorithm is based on a previously reported method [14]. The “Exact mass” algorithm assumes high-quality spectra (high mass resolution, low noise) and determines the center of each m/z peak using the “full width at half maximum” paradigm: the m/z value is placed in the middle of the line that crosses the peak at half of its maximum intensity (as shown in the insets in Figure 3A). Finally, the “Centroid” algorithm is suitable for already centroided data. It detects all data points above the specified noise level as m/z peaks.

Data obtained by Fourier transform mass spectrometry instruments provide very high mass resolution, but suffer from the presence of noise signals known as “shoulder peaks” (Figure 3B). These peaks are residues of the Fourier transform function calculated by the instrument, and their intensity is usually below 5% of the intensity of the main (true) m/z peak. To remove these noise peaks, we introduced an optional filtration plug-in that builds a theoretical model (such as Gaussian or Lorentzian) with a given mass resolution around each peak, and removes all noise peaks below this model. Peaks are processed in order of decreasing intensity. In the preview (Figure 3B), the main m/z signal is indicated in red, while the shoulder peaks subject to removal are indicated in yellow. Again, other filtration algorithms can be implemented as plug-ins.

The next step consists of an algorithm that connects consecutive m/z values spanning over multiple scans into chromatogram objects. The default algorithm provided by MZmine 2 connects m/z values in the order of their intensity, with the most intense peaks connected first. A chromatogram spanning a given minimal time range is constructed for each m/z value (within a user-defined tolerance). Each chromatogram is then deconvoluted into individual chromatographic peaks (Figure 3C). Several algorithms are provided as plug-ins. The “Baseline cut-off” algorithm recognizes each chromatographic peak that has an intensity above a given minimum level and spans over a given minimum time range. The “Noise amplitude” algorithm adds another parameter specifying the intensity range that is considered noisy. The algorithm then finds the intensity level where most of the noise is concentrated and sets the baseline level to this intensity, individually for each chromatogram. After the baseline is set, the procedure is the same as in the “Baseline cut-off” algorithm. The Savitzky-Golay algorithm uses the smoothed second derivative of the chromatogram curve to detect the borders of individual peaks. The “Local minimum search” algorithm attempts to identify local minima in the chromatogram as border points between individual peaks. Several restrictions are placed on possible peak shapes, such as minimum absolute and relative intensities, or a minimum ratio between peak maximum and edge.

We also implemented an experimental module that fits the (potentially noisy) set of data points of each deconvoluted peak with an ideal peak model such as a Gaussian or Exponentially Modified Gaussian (Figure 3D). Such an approach may reduce the chromatographic noise between samples, but the practical applicability of this method has not yet been thoroughly validated.

Figure 3 Peak detection modules with previews. (A) Mass detection (centroiding) module. Recognized m/z peaks are shown in red. In the insets, details of a single m/z peak are shown, indicating the full width at half maximum approach to the m/z value calculation. (B) Fourier transform mass spectrometry shoulder peaks filter. In the preview panel, the main detected peak is indicated with the red line, while shoulder peaks are indicated with the yellow lines. (C) Peak deconvolution. Each individual recognized peak within the chromatogram is indicated by a different color. (D) Experimental peak shape modeler. A Gaussian peak model (pink) is fitted to the deconvoluted chromatographic peak’s data points (blue).

Peak identification
Assignment of intuitive metabolite or peptide names to detected m/z values greatly assists with the process of data interpretation. In MZmine 2, identification of peaks can be performed either by searching a custom database of m/z values and retention times, or by connecting to an online resource such as PubChem [15], KEGG [16], METLIN [17], or HMDB [18] directly from the MZmine 2 interface (Figure 4). For each ion subjected to identification, its neutral molecular mass ($m_{\mathrm{neutral}}$) is calculated from its m/z value. For that purpose, the charge of the ion ($z$) can be automatically determined from its isotope pattern. The ionization mode (positive or negative) and the ionization adduct (e.g., H+, Na+, K+) are selected by the user as parameters. The neutral mass is then calculated as

$m_{\mathrm{neutral}} = (m/z \times z) \pm m_{\mathrm{adduct}}$

where the sign (±) is defined by the ionization mode and $m_{\mathrm{adduct}}$ is the mass of the selected ionization adduct. The neutral mass $m_{\mathrm{neutral}}$ is the primary term for the database search, within a user-specified tolerance. Isotopic pattern similarity can be used as a second filter to select optimal candidates, by comparing the ratios of the detected isotopes with those of the matching isotopes in the predicted isotopic pattern of the database compound. Because the online identification module is itself modularized, support for other molecular databases can be easily added. For proteomic applications, a module allowing the identification of peptide peaks using the MASCOT [19] search engine and MS/MS spectra is under development.

Figure 4 Peak identification using the PubChem Compound database. (A) A peak list showing the row selected for identification. (B) Dialog for setting search parameters. (C) Table of candidates obtained from the database within a given mass tolerance. (D) 2D and 3D structural views of the candidate compound.
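The “Local maxima” and “Exact mass” centroiding steps described under Peak detection can be sketched as follows. This is a hypothetical Python illustration, not MZmine 2 code: profile data for one spectrum are assumed to be two parallel sequences (`mz`, `intensities`), local maxima above a noise level are taken as apexes, and each centroid m/z is the midpoint of the two points where the linearly interpolated profile crosses half of the apex intensity.

```python
from typing import List, Sequence, Tuple


def local_maxima(intensities: Sequence[float], noise_level: float) -> List[int]:
    """Indices of profile points above noise_level that are local maxima."""
    return [
        i for i in range(1, len(intensities) - 1)
        if intensities[i] > noise_level
        and intensities[i] >= intensities[i - 1]
        and intensities[i] > intensities[i + 1]
    ]


def _crossing(x0: float, y0: float, x1: float, y1: float, level: float) -> float:
    """m/z at which the straight line between two profile points reaches `level`."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def fwhm_center(mz: Sequence[float], intensities: Sequence[float], apex: int) -> float:
    """Centre of the peak at `apex`: midpoint of the half-maximum crossings."""
    half = intensities[apex] / 2.0
    left = apex
    while left > 0 and intensities[left - 1] >= half:
        left -= 1
    right = apex
    while right < len(mz) - 1 and intensities[right + 1] >= half:
        right += 1
    # Interpolate between the outermost point at/above half maximum and its neighbour
    # below it; if the profile ends before dropping below half maximum, use the edge.
    left_mz = mz[0] if left == 0 else _crossing(
        mz[left - 1], intensities[left - 1], mz[left], intensities[left], half)
    right_mz = mz[-1] if right == len(mz) - 1 else _crossing(
        mz[right], intensities[right], mz[right + 1], intensities[right + 1], half)
    return (left_mz + right_mz) / 2.0


def exact_mass_centroids(mz: Sequence[float], intensities: Sequence[float],
                         noise_level: float) -> List[Tuple[float, float]]:
    """(m/z, intensity) pairs for every local maximum above the noise level."""
    return [(fwhm_center(mz, intensities, i), intensities[i])
            for i in local_maxima(intensities, noise_level)]
```

For an asymmetric peak the half-maximum midpoint differs from the apex m/z, which is exactly the effect the insets of Figure 3A illustrate.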
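The shoulder-peak filter for Fourier transform data can be sketched in the same way. Assumptions for this illustration: the Lorentzian variant of the model is used, its full width at half maximum is derived from the user-given mass resolution as m/z divided by resolution, and a peak counts as “below the model” if its intensity is lower than the Lorentzian of any stronger, already accepted peak evaluated at its m/z. The function names are hypothetical.

```python
from typing import List, Tuple

Centroid = Tuple[float, float]  # (m/z, intensity)


def lorentzian(x: float, center: float, height: float, fwhm: float) -> float:
    """Lorentzian peak of the given height and full width at half maximum."""
    gamma = fwhm / 2.0
    return height * gamma ** 2 / ((x - center) ** 2 + gamma ** 2)


def remove_shoulder_peaks(peaks: List[Centroid], resolution: float) -> List[Centroid]:
    """Drop centroids lying under the Lorentzian model of a stronger kept peak.

    The model width is derived from the resolving power: FWHM = m/z / resolution.
    """
    kept: List[Centroid] = []
    # Process peaks in order of decreasing intensity, as described in the text.
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        under_model = any(
            intensity < lorentzian(mz, k_mz, k_int, k_mz / resolution)
            for k_mz, k_int in kept
        )
        if not under_model:
            kept.append((mz, intensity))
    return sorted(kept)
```

At a resolution of 100,000, a 2% signal 0.01 m/z away from a strong peak at m/z 500 falls under the model and is removed, while a genuine peak one m/z unit away is far outside the model's tails and survives.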
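The default chromatogram builder connects m/z values across scans, most intense first. The sketch below is a simplified, hypothetical reading of that description: each data point joins the first existing chromatogram whose seed m/z lies within `mz_tol` and that has no point yet in the same scan; otherwise it seeds a new chromatogram. Chromatograms spanning less than `min_span` (in retention-time units) are discarded. MZmine 2's actual matching rules may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Scan = Tuple[float, List[Tuple[float, float]]]  # (retention time, [(m/z, intensity)])


@dataclass
class Chromatogram:
    mz: float  # m/z of the most intense (seed) data point
    points: Dict[float, float] = field(default_factory=dict)  # rt -> intensity

    def span(self) -> float:
        return max(self.points) - min(self.points)


def build_chromatograms(scans: List[Scan], mz_tol: float,
                        min_span: float) -> List[Chromatogram]:
    # Collect every centroided data point together with its scan retention time.
    points = [(intensity, mz, rt) for rt, peaks in scans for mz, intensity in peaks]
    chromatograms: List[Chromatogram] = []
    # Most intense points first, so they seed the chromatograms.
    for intensity, mz, rt in sorted(points, reverse=True):
        target = next((c for c in chromatograms
                       if abs(c.mz - mz) <= mz_tol and rt not in c.points), None)
        if target is None:
            target = Chromatogram(mz)
            chromatograms.append(target)
        target.points[rt] = intensity
    # Keep only chromatograms covering the minimal time range.
    return [c for c in chromatograms if c.span() >= min_span]
```

Seeding with the most intense points means that the m/z assigned to each chromatogram comes from its strongest, and usually most accurate, measurement.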
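The “Baseline cut-off” deconvolution rule (a peak must stay above a minimum intensity and span a minimum time range) translates directly into code. A minimal sketch, assuming the chromatogram is a retention-time-sorted list of `(rt, intensity)` pairs; `baseline_cutoff` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (retention time, intensity)


def baseline_cutoff(chromatogram: List[Point], min_height: float,
                    min_duration: float) -> List[List[Point]]:
    """Split an RT-sorted chromatogram into peaks above min_height."""
    peaks: List[List[Point]] = []
    current: List[Point] = []
    for rt, intensity in chromatogram:
        if intensity > min_height:
            current.append((rt, intensity))   # still inside a peak
        elif current:
            peaks.append(current)             # intensity dropped: close the peak
            current = []
    if current:
        peaks.append(current)
    # Keep only runs that last at least min_duration.
    return [p for p in peaks if p[-1][0] - p[0][0] >= min_duration]
```

The “Noise amplitude” variant would differ only in how `min_height` is chosen: it is estimated per chromatogram from where most of the noise is concentrated instead of being a fixed user value.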
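The neutral-mass formula under Peak identification reduces to one line per ionization mode: for a positive ion the adduct mass is subtracted, for a negative ion it is added. For example, protonated glucose observed at m/z 181.0707 with z = 1 in positive mode gives $m_{\mathrm{neutral}} = 181.0707 - 1.0073 = 180.0634$, the monoisotopic mass of glucose. The sketch below implements the formula exactly as written (a single adduct mass per ion) plus a plain absolute-tolerance lookup. The adduct table holds standard ion masses; the function names and the small in-memory database are hypothetical.

```python
from typing import Dict, List, Tuple

# Monoisotopic ion masses (u) for common adducts; electron mass already removed.
ADDUCT_MASS: Dict[str, float] = {"H+": 1.007276, "Na+": 22.989218, "K+": 38.963158}


def neutral_mass(mz: float, z: int, adduct: str, positive_mode: bool) -> float:
    """m_neutral = (m/z * z) -/+ m_adduct: minus in positive mode, plus in negative."""
    m_adduct = ADDUCT_MASS[adduct]
    return mz * z - m_adduct if positive_mode else mz * z + m_adduct


def search(m_neutral: float, database: List[Tuple[str, float]], tol: float) -> List[str]:
    """Names of compounds whose monoisotopic mass lies within +/- tol of m_neutral."""
    return [name for name, mass in database if abs(mass - m_neutral) <= tol]
```

In practice the resulting candidate list would then be narrowed further by the isotope-pattern comparison described in the text.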
The module is itself modularized, support for other mole- remaining data is tested against the fitted model and if a cular databases can be easily added. For proteomic value fits well, it is considered a part of the model. applications, a module allowing identification of pep- Finally, the model is evaluated and when the iteration is tide peaks using the MASCOT [19] search engine and finished, the model with the most data points fitted to it MS/MS spectra is under development. is considered the best. The RANSAC method of alignment makes use of two user-defined two-dimensional windows, the RANSAC RANdom SAmple Consensus (RANSAC) aligner window (RW) and Alignment window (AW), respec- The purpose of peak list alignment is to match relevant peaks across multiple samples. The original MZmine tively. The RW is defined by the m/z threshold rm and software introduced a simple alignment algorithm that retention time threshold rr , and AW constitutes the first creates an empty master peak list and then aligns same m/z threshold rm but a different retention time each peak from given peak lists (samples) to the best threshold ar . The retention time threshold in RW candidate of the master list using a two-dimensional should be as big as the maximum observed deviation in alignment window (AW) represented by user-specified the retention time among all peaks. The procedure for m/z and retention time tolerances. If no suitable candi- aligning a sample S with the master list L is as follows: date is found, a new row is created in the master list. In Step 1: For every row i in L, let MZmine 2, this algorithm is referred to as the “Join r = the average retention time of all individual peaks aligner”. One disadvantage of the Join aligner is the in the row inability to cope with a non-linear deviation of the m = average m/z of all individual peaks in the row retention times among samples. 
For this purpose, we RW =[(m, r)| m - rm ≤ m ≤ m + rm and r - rr ≤ i i 0 i 0 i 0 introduced a new peak list alignment method based on r ≤ r + rr ], the RANSAC window for row i. i 0 the RANSAC algorithm. Then, for row i in L,markallpeaksinsampleS in The RANSAC algorithm [20] is a non-deterministic RW as candidate alignments. iterative algorithm that estimates parameters of a math- Step 2: Build a scatter plot representation of all candi- ematical model from a set of observed data, which may date alignments, and apply the RANSAC algorithm to Pluskal et al. BMC Bioinformatics 2010, 11:395 Page 7 of 11 http://www.biomedcentral.com/1471-2105/11/395 build a candidate model for alignment. This model First, 12 synthetic datasets were created using samples represents a list of matching retention times. from 12 different lipidomic studies. A single sample Step 3: Apply the locally-weighted scatterplot smooth- from each study was used as a seed to create a synthetic ing (LOESS) method for regression [21] on all points in set of 20 samples. These 20 samples contained identical the model obtained with RANSAC. information (peaks), but a random non-linear deviation Step 4: Using this regression model, for each row i in in the retention time was introduced into each one. The L, predict the correction for the retention time shift to MZmine 2 projects of all 12 datasets are available on- locate the new center (m ,r’ ) of the alignment window line (see Dataset download). Each dataset was aligned i i using the RANSAC aligner and Join aligner with three AW . RANSAC alignment can correct the retention time deviation by centering the position of the AW to the different retention time tolerance thresholds (50 s, 20 s, correct position in the new sample. and 5 s). Parameters used for alignment are specified in Thus, the alignment window AW =[(m, r)| m - rm Table 1. Run times of the RANSAC aligner were mea- i i 0 ≤ m ≤ m + rm and r’ - ar ≤ r ≤ r’ + ar ] sured and are reported in Table 2. 
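The RANSAC loop described above can be sketched in a few lines. This is a hedged illustration, not the MZmine 2 implementation: it assumes the non-linear model is the cubic polynomial passing exactly through the 4 randomly drawn (master retention time, sample retention time) pairs (the text states only that 4 points are used to find a non-linear model), that `threshold` is the largest residual for a point to count as fitting, and that `min_fraction` is the minimum share of points a model must explain. These two parameters loosely mirror the “Threshold value” and “Minimum number of points” rows of Table 1. The LOESS smoothing of Step 3 is omitted, and all names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (retention time in master list, retention time in sample)


def cubic_through(support: Sequence[Point], x: float) -> float:
    """Evaluate, at x, the cubic passing exactly through 4 support points (Lagrange form)."""
    total = 0.0
    for j, (xj, yj) in enumerate(support):
        term = yj
        for k, (xk, _) in enumerate(support):
            if k != j:
                term *= (x - xk) / (xj - xk)
        total += term
    return total


def ransac(points: List[Point], iterations: int, threshold: float,
           min_fraction: float, seed: int = 0) -> Optional[List[Point]]:
    """Return the inliers of the best cubic model, or None if no model qualifies."""
    if len(points) < 4:
        return None
    rng = random.Random(seed)
    best: Optional[List[Point]] = None
    for _ in range(iterations):
        support = rng.sample(points, 4)  # random minimal subset
        if len({x for x, _ in support}) < 4:
            continue  # repeated x values: the cubic is undefined
        inliers = [(x, y) for x, y in points
                   if abs(cubic_through(support, x) - y) <= threshold]
        # Keep the model explaining the most points, if it explains enough of them.
        if len(inliers) >= min_fraction * len(points) and (
                best is None or len(inliers) > len(best)):
            best = inliers
    return best
```

Because each iteration is independent, raising `iterations` increases the chance of drawing an outlier-free subset, which is the sense in which the probability of a good result grows with the number of iterations.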
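Steps 1, 4, and 5 (collecting candidates in $RW_i$, shifting the alignment window to $(m_i, r'_i)$, and joining inside $AW_i$) can be sketched as below. Assumptions: peaks are `(m/z, retention time)` tuples, `correct` stands in for the regression model of Steps 2 and 3 (any function mapping $r_i$ to $r'_i$), and the best candidate inside $AW_i$ is the one with the smallest tolerance-normalized distance. That scoring rule is chosen for illustration only, because the text does not specify how the Join aligner ranks candidates. Creating new master-list rows for unmatched peaks is left out, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, retention time)


def in_window(peak: Peak, center: Peak, mz_tol: float, rt_tol: float) -> bool:
    return abs(peak[0] - center[0]) <= mz_tol and abs(peak[1] - center[1]) <= rt_tol


def candidate_pairs(master: List[Peak], sample: List[Peak],
                    rm0: float, rr0: float) -> List[Tuple[float, float]]:
    """Step 1: (r_i, r) pairs for every sample peak inside RW_i of each master row."""
    return [(row[1], peak[1]) for row in master for peak in sample
            if in_window(peak, row, rm0, rr0)]


def align(master: List[Peak], sample: List[Peak], rm0: float, ar0: float,
          correct: Callable[[float], float]) -> Dict[int, int]:
    """Steps 4-5: map master row index -> sample peak index using the shifted AW_i."""
    matches: Dict[int, int] = {}
    used = set()
    for i, (m_i, r_i) in enumerate(master):
        center = (m_i, correct(r_i))  # (m_i, r'_i)
        best, best_score = None, float("inf")
        for j, peak in enumerate(sample):
            if j in used or not in_window(peak, center, rm0, ar0):
                continue
            score = abs(peak[0] - m_i) / rm0 + abs(peak[1] - center[1]) / ar0
            if score < best_score:
                best, best_score = j, score
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches
```

With a constant 30 s shift between master list and sample and a 5 s alignment window, passing `lambda r: r + 30.0` as `correct` lets every row find its true partner, whereas the identity function (no correction) leaves most rows unmatched or matched to the wrong peak.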
RANSAC aligner performance
Two types of errors can be introduced during the alignment process [11]: either two non-related peaks are matched, or the matching of two related peaks is omitted. A variable called “precision” represents the proportion of true alignments out of all alignments found by the algorithm. The proportion of peaks that are correctly aligned by the algorithm out of all true alignments inside the dataset is called “recall”. Together, these two variables represent the quality of the alignment. To test whether the newly introduced RANSAC algorithm performs better than the Join aligner, the results of two different approaches were compared.

First, 12 synthetic datasets were created using samples from 12 different lipidomic studies. A single sample from each study was used as a seed to create a synthetic set of 20 samples. These 20 samples contained identical information (peaks), but a random non-linear deviation in retention time was introduced into each one. The MZmine 2 projects of all 12 datasets are available online (see Dataset download). Each dataset was aligned using the RANSAC aligner and the Join aligner with three different retention time tolerance thresholds (50 s, 20 s, and 5 s). The parameters used for alignment are specified in Table 1. Run times of the RANSAC aligner were measured and are reported in Table 2. Precision and recall values were calculated, and the average results are shown in Figure 6 (numerical results are available in Additional file 1). Only the RANSAC aligner achieved 100% precision and 100% recall on these synthetic datasets.

Our second approach for the comparison was to use the real proteomic (P1 and P2) and metabolomic (M1 and M2) datasets introduced by Lange et al. [11], together with their tables of “ground truth” alignments and an evaluation script for calculating alignment precision and recall values. We applied the MZmine 2 Join and RANSAC aligners to all the datasets with the parameters specified in Table 1. Run times of the RANSAC aligner are reported in Table 2. Precision and recall values were calculated using the provided evaluation script and compared to already published results in Table 3. We used the latest evaluation results available at http://msbi.ipb-halle.de/msbi/caap at the time of writing. Compared to the Join aligner, the RANSAC aligner provided better results in 11 of 13 alignments, with worse results obtained in only a single case (P2 dataset fraction 00). We assume that the high number of features in this fraction (over 6800 rows after alignment) made it somewhat difficult for the RANSAC algorithm to build a suitable model. Notably, in all fractions of dataset P1, the RANSAC aligner provided the best results among all the tested algorithms. Complete datasets P1, P2, M1, and M2, as well as all alignment results, are available online (see Dataset download).

Table 1 Parameter values used for aligning the 12 synthetic data sets and the real proteomic (P1 and P2) and metabolomic (M1 and M2) data sets using the RANSAC and Join aligners

| Parameter | 12 synthetic data sets | Proteomics data set P1 | Proteomics data set P2 | Metabolomics data set M1 | Metabolomics data set M2 |
|---|---|---|---|---|---|
| m/z tolerance | 0.05 m/z | 1.5 m/z | 1.5 m/z | 0.03 m/z | 0.025 m/z |
| RT tolerance after correction (min:s) | 0:25 | 2:30 | 2:30 | 0:50 | 0:30 |
| RT tolerance (min:s) | 0:50 | 3:30 | 5:00 | 0:30 | 0:30 |
| RANSAC iterations | 5000 | 50000 | 50000 | 15000 | 15000 |
| Minimum number of points | 20% | 2.00% | 0.10% | 20.00% | 20.00% |
| Threshold value | 4 seconds | 4 seconds | 15 seconds | 4 seconds | 4 seconds |
| Non-linear model | yes | yes | no | yes | yes |
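The precision and recall definitions under RANSAC aligner performance can be computed as follows, assuming (as a simplification) that each alignment result is scored as the set of peak pairs implied by its aligned rows. The evaluation script of Lange et al. [11] used for Table 3 may count alignments differently, so this illustrates the definitions only and is not a re-implementation of that script; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def pairs(rows: Iterable[Iterable[str]]) -> Set[FrozenSet[str]]:
    """All unordered peak pairs implied by aligned rows (each row = peak IDs)."""
    return {frozenset(p) for row in rows for p in combinations(sorted(row), 2)}


def precision_recall(found_rows: List[List[str]],
                     true_rows: List[List[str]]) -> Tuple[float, float]:
    found, true = pairs(found_rows), pairs(true_rows)
    correct = len(found & true)
    # Precision: correct / all alignments found; recall: correct / all true alignments.
    precision = correct / len(found) if found else 0.0
    recall = correct / len(true) if true else 0.0
    return precision, recall
```

An aligner that only ever makes safe matches scores high precision but low recall; one that merges aggressively does the opposite, which is why the two values are reported together.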
Table 2 Run times of the RANSAC aligner for aligning the 12 synthetic data sets and the real proteomic (P1 and P2) and metabolomic (M1 and M2) data sets

| Data set | Run 1 (min) | Run 2 (min) | Run 3 (min) | Average (min) |
|---|---|---|---|---|
| Synthetic data set 1 | 0.17 | 0.15 | 0.16 | 0.16 |
| Synthetic data set 2 | 0.32 | 0.31 | 0.31 | 0.31 |
| Synthetic data set 3 | 0.44 | 0.41 | 0.42 | 0.42 |
| Synthetic data set 4 | 0.46 | 0.45 | 0.45 | 0.45 |
| Synthetic data set 5 | 0.62 | 0.67 | 0.74 | 0.68 |
| Synthetic data set 6 | 0.39 | 0.38 | 0.39 | 0.39 |
| Synthetic data set 7 | 0.54 | 0.54 | 0.55 | 0.55 |
| Synthetic data set 8 | 0.25 | 0.26 | 0.25 | 0.25 |
| Synthetic data set 9 | 0.79 | 0.80 | 0.84 | 0.81 |
| Synthetic data set 10 | 0.73 | 0.73 | 0.72 | 0.73 |
| Synthetic data set 11 | 5.24 | 4.11 | 5.17 | 4.84 |
| Synthetic data set 12 | 7.79 | 7.78 | 7.64 | 7.74 |
| M1 | 62.27 | 58.08 | 63.56 | 61.30 |
| M2 | 147.64 | 163.62 | 146.79 | 152.69 |
| P1 fraction 000 | 4.95 | 6.49 | 7.72 | 6.39 |
| P1 fraction 020 | 0.50 | 0.50 | 0.57 | 0.52 |
| P1 fraction 040 | 0.76 | 0.70 | 0.65 | 0.70 |
| P1 fraction 060 | 1.11 | 1.06 | 1.14 | 1.10 |
| P1 fraction 080 | 0.61 | 0.57 | 0.67 | 0.62 |
| P1 fraction 100 | 0.46 | 0.48 | 0.51 | 0.48 |
| P2 fraction 000 | 22.47 | 22.94 | 21.12 | 22.18 |
| P2 fraction 020 | 1.35 | 1.31 | 1.18 | 1.28 |
| P2 fraction 040 | 0.65 | 0.73 | 0.71 | 0.70 |
| P2 fraction 080 | 0.31 | 0.36 | 0.39 | 0.35 |
| P2 fraction 100 | 0.47 | 0.43 | 0.49 | 0.46 |

Run times were obtained on an AMD Opteron 1.8 GHz dual-core system with 10 GB RAM, running Linux.

Figure 6 Performance comparison of RANSAC aligner and Join aligner for 12 synthetic datasets. For each dataset, peak lists were aligned using the RANSAC aligner and the Join aligner with three different retention time tolerance thresholds (50 s, 20 s, and 5 s). The plot shows the average recall and precision values for all datasets. Error bars indicate standard deviations.

Table 3 Performance comparison of MZmine 2 alignment methods (right side of the table) to previously published results (left side of the table) obtained using several different software packages [11]

Columns msInspect through XCMS: results published by Lange et al. (2008), available at the time of writing at http://msbi.ipb-halle.de/msbi/caap. Last two columns: MZmine 2 results.

| Data set | Fraction | Measure | msInspect | MZmine | OpenMS | SpecArray | XAlign | XCMS (version 0.6) without RT correction | XCMS (version 0.6) with RT correction | MZmine 2 Join aligner | MZmine 2 RANSAC aligner |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P1 | 00 | Recall | 0.52 | 0.81 | 0.86 | 0.61 | 0.82 | 0.72 | 0.62 | 0.80 | 0.86 |
| P1 | 00 | Precision | 0.38 | 0.81 | 0.86 | 0.61 | 0.82 | 0.54 | 0.58 | 0.80 | 0.86 |
| P1 | 20 | Recall | 0.56 | 0.90 | 0.92 | 0.62 | 0.85 | 0.88 | 0.81 | 0.90 | 0.93 |
| P1 | 20 | Precision | 0.45 | 0.90 | 0.92 | 0.62 | 0.85 | 0.84 | 0.80 | 0.90 | 0.93 |
| P1 | 40 | Recall | 0.63 | 0.90 | 0.94 | 0.75 | 0.87 | 0.92 | 0.81 | 0.87 | 0.94 |
| P1 | 40 | Precision | 0.48 | 0.90 | 0.94 | 0.75 | 0.87 | 0.85 | 0.80 | 0.87 | 0.94 |
| P1 | 60 | Recall | 0.73 | 0.84 | 0.96 | 0.71 | 0.87 | 0.91 | 0.78 | 0.89 | 0.97 |
| P1 | 60 | Precision | 0.54 | 0.84 | 0.96 | 0.71 | 0.87 | 0.80 | 0.75 | 0.89 | 0.97 |
| P1 | 80 | Recall | 0.70 | 0.94 | 0.96 | 0.74 | 0.90 | 0.94 | 0.89 | 0.94 | 0.97 |
| P1 | 80 | Precision | 0.57 | 0.94 | 0.96 | 0.74 | 0.90 | 0.88 | 0.88 | 0.94 | 0.97 |
| P1 | 100 | Recall | 0.82 | 0.94 | 0.94 | 0.77 | 0.96 | 0.95 | 0.96 | 0.95 | 0.96 |
| P1 | 100 | Precision | 0.56 | 0.94 | 0.94 | 0.77 | 0.96 | 0.89 | 0.96 | 0.95 | 0.96 |
| P2 | 00 | Recall | 0.23 | 0.62 | 0.77 | 0.07 | 0.65 | 0.70 | 0.58 | 0.63 | 0.56 |
| P2 | 00 | Precision | 0.07 | 0.49 | 0.65 | 0.05 | 0.49 | 0.31 | 0.44 | 0.53 | 0.49 |
| P2 | 20 | Recall | 0.67 | 0.87 | 0.92 | 0.57 | 0.84 | 0.89 | 0.86 | 0.81 | 0.93 |
| P2 | 20 | Precision | 0.24 | 0.71 | 0.77 | 0.42 | 0.70 | 0.55 | 0.66 | 0.69 | 0.78 |
| P2 | 40 | Recall | 0.44 | 0.79 | 0.76 | 0.60 | 0.71 | 0.72 | 0.72 | 0.74 | 0.78 |
| P2 | 40 | Precision | 0.26 | 0.76 | 0.74 | 0.41 | 0.69 | 0.56 | 0.69 | 0.73 | 0.77 |
| P2 | 80 | Recall | 0.73 | 0.60 | 0.80 | 0.65 | 0.58 | 0.64 | 0.49 | 0.61 | 0.61 |
| P2 | 80 | Precision | 0.34 | 0.56 | 0.70 | 0.44 | 0.56 | 0.50 | 0.45 | 0.58 | 0.61 |
| P2 | 100 | Recall | 0.82 | 0.80 | 0.90 | 0.63 | 0.85 | 0.95 | 0.85 | 0.85 | 0.88 |
| P2 | 100 | Precision | 0.39 | 0.64 | 0.75 | 0.44 | 0.69 | 0.65 | 0.69 | 0.71 | 0.75 |
| M1 | | Recall | 0.27 | 0.92 | 0.87 | - | 0.88 | 0.98 | 0.94 | 0.90 | 0.91 |
| M1 | | Precision | 0.46 | 0.73 | 0.69 | - | 0.70 | 0.60 | 0.70 | 0.74 | 0.74 |
| M2 | | Recall | 0.23 | 0.98 | 0.93 | - | 0.93 | 0.97 | 0.98 | 0.98 | 0.98 |
| M2 | | Precision | 0.47 | 0.84 | 0.79 | - | 0.79 | 0.58 | 0.78 | 0.83 | 0.83 |

Conclusions
The development of MZmine 2 was motivated by the need for a flexible and modular software platform that would allow the bioinformatic and analytical community to contribute new methods for specific stages of MS-based data processing. Great emphasis was placed on achieving three main goals: a flexible, extendable, and modular design; a user-friendly graphical interface; and good support for high-resolution MS data. The authors of this manuscript work in the field of metabolomics using an LC-MS analytical platform, and therefore the currently developed modules were tested mainly on LC-MS data. The flexibility of MZmine 2, however, allows for easy expansion to other dataset types such as gas chromatography-MS, as well as interoperation with popular proteomics search engines such as MASCOT.

Several other software packages have been introduced for LC-MS based data processing, such as XCMS [22], Trans Proteomic Pipeline [23], Trequips [24], OpenMS-TOPP [25], and ProteoWizard [26]. None of these tools, however, shares the goals of MZmine 2; most of them are command-line oriented with fixed feature sets, and aim specifically at either proteomic or metabolomic research. Rather than a single piece of software, the developmental aim of MZmine 2 is to create a universal platform through which researchers can contribute individual processing modules and implement and share novel ideas, spanning multiple research fields and analytical methods.

MZmine 2 is available for download at the project WWW site, together with a printable manual, an animated tutorial, a module development tutorial, and further relevant project information such as a source code repository and developers’ mailing list. The current version of the framework is already suitable for processing large batches of data, for both targeted and non-targeted analyses, and has been applied in metabolomic research [13,27].

Dataset download
The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash:
X19bvFk4++ SVz0ngXab4YQ Qu389r / SBAOevlKh2f5bNyxDnvYiOQhqmU0r+ + rIknzgCsg8SNWWJVWtlhURkA+= eoea8MAAAAAAABm9w=
The hash may be used to validate that the files were published as part of this manuscript’s dataset, and to check that the data have not changed since publication.

Availability and requirements
- Project name: MZmine 2
- Project home page: http://mzmine.sourceforge.net
- Operating system(s): Platform independent
- Programming language: Java
- Other requirements: Java Runtime Environment (JRE) 1.6, Java3D
- License: GNU GPL

Additional material
Additional file 1: Numerical values for Figure 6. Precision and recall values of RANSAC and Join aligner results for 12 synthetic data sets.

Abbreviations
AW: Alignment window; LC-MS: Liquid chromatography-mass spectrometry; MS: Mass spectrometry; RANSAC: Random sample consensus; RW: RANSAC window

Acknowledgements
We thank the present and past MZmine 2 contributors Mikko Katajamaa, Yosuke Kawasaki, Jarkko Miettinen, John Rush, Marco Schaerfke, and Sasha Tkachev. We also thank the Okinawa Institute of Science and Technology Promotion Corporation for providing the funding and Mitsuhiro Yanagida for supporting the MZmine 2 development in his laboratory. We are very grateful to the developers of open-source libraries such as JFreeChart, VisAD, Jmol, and CDK. This work was in part supported by the EU-funded project ETHERPATHS (FP7-KBBE-222639, http://www.etherpaths.org/).

Author details
G0 Cell Unit, Okinawa Institute of Science and Technology (OIST), Onna, Okinawa, Japan.
Quantitative Biology and Bioinformatics, VTT Technical Research Centre of Finland, Espoo, Finland.

Authors’ contributions
TP designed the data model and overall architecture of the MZmine 2 framework and implemented most of the raw data visualization and peak identification modules. SC implemented the project serialization and RANSAC aligner. AVB implemented the peak detection module with previews, scatter plot and histogram visualizers, and isotope pattern support, and contributed to the online database search module development. MO participated in software testing and provided feedback on the framework design. All authors read and approved the final manuscript.

Received: 6 May 2010 Accepted: 23 July 2010 Published: 23 July 2010

References
1. Katajamaa M, Oresic M: Data processing for mass spectrometry-based metabolomics. J Chromatogr A 2007, 1158(1-2):318-328.
2. Orchard S, Hoogland C, Bairoch A, Eisenacher M, Kraus HJ, Binz PA: Managing the data explosion. A report on the HUPO-PSI Workshop. August 2008, Amsterdam, The Netherlands. Proteomics 2009, 9(3):499-501.
3. Katajamaa M, Miettinen J, Oresic M: MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics 2006, 22(5):634-636.
4. Katajamaa M, Oresic M: Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics 2005, 6:179.
5. Laaksonen R, Katajamaa M, Paiva H, Sysi-Aho M, Saarinen L, Junni P, Lutjohann D, Smet J, Van Coster R, Seppanen-Laakso T, Lehtimäki T, Soini J, Oresic M: A systems biology strategy reveals biological pathways and plasma biomarker candidates for potentially toxic statin-induced changes in muscle. PLoS One 2006, 1:e97.
6. Oresic M, Simell S, Sysi-Aho M, Nanto-Salonen K, Seppanen-Laakso T, Parikka V, Katajamaa M, Hekkala A, Mattila I, Keskinen P, Yetukuri L, Reinikainen A, Lähde J, Suortti T, Hakalax J, Simell T, Hyöty H, Veijola R, Ilonen J, Lahesmaa R, Knip M, Simell O: Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes. J Exp Med 2008, 205(13):2975-2984.
7. Gopalacharyulu PV, Velagapudi VR, Lindfors E, Halperin E, Oresic M: Dynamic network topology changes in functional modules predict responses to oxidative stress in yeast. Mol Biosyst 2009, 5(3):276-287.
8. Medina-Gomez G, Gray SL, Yetukuri L, Shimomura K, Virtue S, Campbell M, Curtis RK, Jimenez-Linan M, Blount M, Yeo GS, Lopez M, Seppänen-Laakso T, Ashcroft FM, Oresic M, Vidal-Puig A: PPAR gamma 2 prevents lipotoxicity by controlling adipose tissue expandability and peripheral lipid metabolism. PLoS Genet 2007, 3(4):e64.
9. Kind T, Tolstikov V, Fiehn O, Weiss RH: A comprehensive urinary metabolomic approach for identifying kidney cancer. Anal Biochem 2007, 363(2):185-195.
10.
26. Kessner D, Chambers M, Burke R, Agus D, Mallick P: ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24(21):2534-2536.
27. Oresic M, Seppanen-Laakso T, Yetukuri L, Backhed F, Hanninen V: Gut microbiota affects lens and retinal lipid composition. Exp Eye Res 2009, 89(5):604-607.

Cite this article as: Pluskal et al.: MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 2010 11:395.
Timischl B, Dettmer K, Kaspar H, Thieme M, Oefner PJ: Development of a quantitative, validated capillary electrophoresis-time of flight-mass spectrometry method with integrated high-confidence analyte identification for metabolomics. Electrophoresis 2008, 29(10):2203-2214. 11. Lange E, Tautenhahn R, Neumann S, Gropl C: Critical assessment of alignment procedures for LC-MS proteomics and metabolomics measurements. BMC Bioinformatics 2008, 9(1):375. 12. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 2003, 43(2):493-500. 13. Pluskal T, Nakamura T, Villar-Briones A, Yanagida M: Metabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbation. Mol Biosyst 2010, 6(1):182-198. 14. Tautenhahn R, Bottcher C, Neumann S: Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 2008, 9:504. 15. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009, , 37 Web Server: W623-633. 16. Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000, 28(1):27-30. 17. Smith CA, O’Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, Custodio DE, Abagyan R, Siuzdak G: METLIN: a metabolite mass spectral database. Ther Drug Monit 2005, 27(6):747-751. 18. Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, Hau DD, Psychogios N, Dong E, Bouatra S, Mandal R, Sinelnikov I, Xia J, Jia L, Cruz JA, Lim E, Sobsey CA, Shrivastava S, Huang P, Liu P, Fang L, Peng J, Fradette R, Cheng D, Tzur D, Clements M, Lewis A, De Souza A, Zuniga A, Dawe M, Xiong Y, Clive D, Greiner R, Nazyrova A, Shaykhutdinov R, Li L, Vogel HJ, Forsythe I: HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 2009, , 37 Database: D603-610. 19. 
Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20(18):3551-3567. 20. Fischler MA, Bolles RC: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Comm Of the ACM 1981, 24:381-395. 21. Cleveland WS, Devlin SJ: Locally weighted regression - an approach to regression-analysis by local fitting. J Am Stat Assoc 1988, 83(403):596-610. 22. Benton HP, Wong DM, Trauger SA, Siuzdak G: XCMS2: processing tandem mass spectrometry data for metabolite identification and structural Submit your next manuscript to BioMed Central characterization. Anal Chem 2008, 80(16):6382-6389. and take full advantage of: 23. Keller A, Eng J, Zhang N, Li XJ, Aebersold R: A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005, • Convenient online submission 1:2005.0017. 24. Gehlenborg N, Yan W, Lee IY, Yoo H, Nieselt K, Hwang D, Aebersold R, • Thorough peer review Hood L: Prequips–an extensible software platform for integration, • No space constraints or color ﬁgure charges visualization and analysis of LC-MS/MS proteomics data. Bioinformatics • Immediate publication on acceptance 2009, 25(5):682-683. 25. Kohlbacher O, Reinert K, Gropl C, Lange E, Pfeifer N, Schulz-Trieglaff O, • Inclusion in PubMed, CAS, Scopus and Google Scholar Sturm M: TOPP–the OpenMS proteomics pipeline. Bioinformatics 2007, • Research which is freely available for redistribution 23(2):e191-197. Submit your manuscript at www.biomedcentral.com/submit http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png BMC Bioinformatics Springer Journals http://www.deepdyve.com/lp/springer-journals/mzmine-2-modular-framework-for-processing-visualizing-and-analyzing-bmEnK0UdL0

(GehlenborgNYanWLeeIYYooHNieseltKHwangDAebersoldRHoodLPrequips--an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics dataBioinformatics200925568268310.1093/bioinformatics/btp00519129212)
GehlenborgNYanWLeeIYYooHNieseltKHwangDAebersoldRHoodLPrequips--an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics dataBioinformatics200925568268310.1093/bioinformatics/btp00519129212
GehlenborgNYanWLeeIYYooHNieseltKHwangDAebersoldRHoodLPrequips--an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics dataBioinformatics200925568268310.1093/bioinformatics/btp00519129212, GehlenborgNYanWLeeIYYooHNieseltKHwangDAebersoldRHoodLPrequips--an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics dataBioinformatics200925568268310.1093/bioinformatics/btp00519129212
(OresicMSimellSSysi-AhoMNanto-SalonenKSeppanen-LaaksoTParikkaVKatajamaaMHekkalaAMattilaIKeskinenPYetukuriLReinikainenALähdeJSuorttiTHakalaxJSimellTHyötyHVeijolaRIlonenJLahesmaaRKnipMSimellODysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetesJ Exp Med2008205132975298410.1084/jem.2008180019075291)
OresicMSimellSSysi-AhoMNanto-SalonenKSeppanen-LaaksoTParikkaVKatajamaaMHekkalaAMattilaIKeskinenPYetukuriLReinikainenALähdeJSuorttiTHakalaxJSimellTHyötyHVeijolaRIlonenJLahesmaaRKnipMSimellODysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetesJ Exp Med2008205132975298410.1084/jem.2008180019075291
OresicMSimellSSysi-AhoMNanto-SalonenKSeppanen-LaaksoTParikkaVKatajamaaMHekkalaAMattilaIKeskinenPYetukuriLReinikainenALähdeJSuorttiTHakalaxJSimellTHyötyHVeijolaRIlonenJLahesmaaRKnipMSimellODysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetesJ Exp Med2008205132975298410.1084/jem.2008180019075291, OresicMSimellSSysi-AhoMNanto-SalonenKSeppanen-LaaksoTParikkaVKatajamaaMHekkalaAMattilaIKeskinenPYetukuriLReinikainenALähdeJSuorttiTHakalaxJSimellTHyötyHVeijolaRIlonenJLahesmaaRKnipMSimellODysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetesJ Exp Med2008205132975298410.1084/jem.2008180019075291
W. Cleveland, S. Devlin (1988)
Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting
Journal of the American Statistical Association, 83
Article number: 2005.0017
(LangeETautenhahnRNeumannSGroplCCritical assessment of alignment procedures for LC-MS proteomics and metabolomics measurementsBMC Bioinformatics20089137510.1186/1471-2105-9-37518793413)
LangeETautenhahnRNeumannSGroplCCritical assessment of alignment procedures for LC-MS proteomics and metabolomics measurementsBMC Bioinformatics20089137510.1186/1471-2105-9-37518793413
LangeETautenhahnRNeumannSGroplCCritical assessment of alignment procedures for LC-MS proteomics and metabolomics measurementsBMC Bioinformatics20089137510.1186/1471-2105-9-37518793413, LangeETautenhahnRNeumannSGroplCCritical assessment of alignment procedures for LC-MS proteomics and metabolomics measurementsBMC Bioinformatics20089137510.1186/1471-2105-9-37518793413
(BentonHPWongDMTraugerSASiuzdakGXCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterizationAnal Chem200880166382638910.1021/ac800795f18627180)
BentonHPWongDMTraugerSASiuzdakGXCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterizationAnal Chem200880166382638910.1021/ac800795f18627180
BentonHPWongDMTraugerSASiuzdakGXCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterizationAnal Chem200880166382638910.1021/ac800795f18627180, BentonHPWongDMTraugerSASiuzdakGXCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterizationAnal Chem200880166382638910.1021/ac800795f18627180
R. Hooper, B. Aulenbach (1993)
Managing the Data Explosion
Civil Engineering, 63
(KohlbacherOReinertKGroplCLangeEPfeiferNSchulz-TrieglaffOSturmMTOPP--the OpenMS proteomics pipelineBioinformatics2007232e19119710.1093/bioinformatics/btl29917237091)
KohlbacherOReinertKGroplCLangeEPfeiferNSchulz-TrieglaffOSturmMTOPP--the OpenMS proteomics pipelineBioinformatics2007232e19119710.1093/bioinformatics/btl29917237091
KohlbacherOReinertKGroplCLangeEPfeiferNSchulz-TrieglaffOSturmMTOPP--the OpenMS proteomics pipelineBioinformatics2007232e19119710.1093/bioinformatics/btl29917237091, KohlbacherOReinertKGroplCLangeEPfeiferNSchulz-TrieglaffOSturmMTOPP--the OpenMS proteomics pipelineBioinformatics2007232e19119710.1093/bioinformatics/btl29917237091
T. Pluskal, Takahiro Nakamura, Alejandro Villar-Briones, M. Yanagida (2009)
Metabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbation.
Molecular bioSystems, 6 1
(PerkinsDNPappinDJCreasyDMCottrellJSProbability-based protein identification by searching sequence databases using mass spectrometry dataElectrophoresis199920183551356710.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-210612281)
PerkinsDNPappinDJCreasyDMCottrellJSProbability-based protein identification by searching sequence databases using mass spectrometry dataElectrophoresis199920183551356710.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-210612281
PerkinsDNPappinDJCreasyDMCottrellJSProbability-based protein identification by searching sequence databases using mass spectrometry dataElectrophoresis199920183551356710.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-210612281, PerkinsDNPappinDJCreasyDMCottrellJSProbability-based protein identification by searching sequence databases using mass spectrometry dataElectrophoresis199920183551356710.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-210612281
(GopalacharyuluPVVelagapudiVRLindforsEHalperinEOresicMDynamic network topology changes in functional modules predict responses to oxidative stress in yeastMol Biosyst20095327628710.1039/b815347g19225619)
GopalacharyuluPVVelagapudiVRLindforsEHalperinEOresicMDynamic network topology changes in functional modules predict responses to oxidative stress in yeastMol Biosyst20095327628710.1039/b815347g19225619
GopalacharyuluPVVelagapudiVRLindforsEHalperinEOresicMDynamic network topology changes in functional modules predict responses to oxidative stress in yeastMol Biosyst20095327628710.1039/b815347g19225619, GopalacharyuluPVVelagapudiVRLindforsEHalperinEOresicMDynamic network topology changes in functional modules predict responses to oxidative stress in yeastMol Biosyst20095327628710.1039/b815347g19225619
Colin Smith, Grace Maille, E. Want, Chuan Qin, S. Trauger, T. Brandon, D. Custodio, R. Abagyan, G. Siuzdak (2005)
METLIN: A Metabolite Mass Spectral Database
Therapeutic Drug Monitoring, 27
S. Orchard, C. Hoogland, A. Bairoch, M. Eisenacher, Hans-Joachim Kraus, P. Binz (2009)
Managing the Data Explosion A Report on the HUPO‐PSI Workshop August 2008, Amsterdam, The Netherlands
PROTEOMICS, 9
(KatajamaaMMiettinenJOresicMMZmine: toolbox for processing and visualization of mass spectrometry based molecular profile dataBioinformatics200622563463610.1093/bioinformatics/btk03916403790)
KatajamaaMMiettinenJOresicMMZmine: toolbox for processing and visualization of mass spectrometry based molecular profile dataBioinformatics200622563463610.1093/bioinformatics/btk03916403790
KatajamaaMMiettinenJOresicMMZmine: toolbox for processing and visualization of mass spectrometry based molecular profile dataBioinformatics200622563463610.1093/bioinformatics/btk03916403790, KatajamaaMMiettinenJOresicMMZmine: toolbox for processing and visualization of mass spectrometry based molecular profile dataBioinformatics200622563463610.1093/bioinformatics/btk03916403790
(TimischlBDettmerKKasparHThiemeMOefnerPJDevelopment of a quantitative, validated capillary electrophoresis-time of flight-mass spectrometry method with integrated high-confidence analyte identification for metabolomicsElectrophoresis200829102203221410.1002/elps.20070051718409164)
TimischlBDettmerKKasparHThiemeMOefnerPJDevelopment of a quantitative, validated capillary electrophoresis-time of flight-mass spectrometry method with integrated high-confidence analyte identification for metabolomicsElectrophoresis200829102203221410.1002/elps.20070051718409164
TimischlBDettmerKKasparHThiemeMOefnerPJDevelopment of a quantitative, validated capillary electrophoresis-time of flight-mass spectrometry method with integrated high-confidence analyte identification for metabolomicsElectrophoresis200829102203221410.1002/elps.20070051718409164, TimischlBDettmerKKasparHThiemeMOefnerPJDevelopment of a quantitative, validated capillary electrophoresis-time of flight-mass spectrometry method with integrated high-confidence analyte identification for metabolomicsElectrophoresis200829102203221410.1002/elps.20070051718409164
M. Fischler, R. Bolles (1981)
Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography
Commun. ACM, 24
Yanli Wang, Jewen Xiao, Tugba Suzek, Jian Zhang, Jiyao Wang, S. Bryant (2009)
PubChem: a public information system for analyzing bioactivities of small molecules
Nucleic Acids Research, 37
(KatajamaaMOresicMProcessing methods for differential analysis of LC/MS profile dataBMC Bioinformatics2005617910.1186/1471-2105-6-17916026613)
KatajamaaMOresicMProcessing methods for differential analysis of LC/MS profile dataBMC Bioinformatics2005617910.1186/1471-2105-6-17916026613
KatajamaaMOresicMProcessing methods for differential analysis of LC/MS profile dataBMC Bioinformatics2005617910.1186/1471-2105-6-17916026613, KatajamaaMOresicMProcessing methods for differential analysis of LC/MS profile dataBMC Bioinformatics2005617910.1186/1471-2105-6-17916026613
O. Kohlbacher, K. Reinert, C. Gröpl, E. Lange, N. Pfeifer, Ole Schulz-Trieglaff, M. Sturm (2007)
TOPP - the OpenMS proteomics pipeline
Bioinformatics, 23 2
M. Katajamaa, M. Orešič (2007)
Data processing for mass spectrometry-based metabolomics.
Journal of chromatography. A, 1158 1-2
(OrchardSHooglandCBairochAEisenacherMKrausHJBinzPAManaging the data explosion. A report on the HUPO-PSI Workshop. August 2008, Amsterdam, The NetherlandsProteomics20099349950110.1002/pmic.20080083819132688)
OrchardSHooglandCBairochAEisenacherMKrausHJBinzPAManaging the data explosion. A report on the HUPO-PSI Workshop. August 2008, Amsterdam, The NetherlandsProteomics20099349950110.1002/pmic.20080083819132688
OrchardSHooglandCBairochAEisenacherMKrausHJBinzPAManaging the data explosion. A report on the HUPO-PSI Workshop. August 2008, Amsterdam, The NetherlandsProteomics20099349950110.1002/pmic.20080083819132688, OrchardSHooglandCBairochAEisenacherMKrausHJBinzPAManaging the data explosion. A report on the HUPO-PSI Workshop. August 2008, Amsterdam, The NetherlandsProteomics20099349950110.1002/pmic.20080083819132688
(KindTTolstikovVFiehnOWeissRHA comprehensive urinary metabolomic approach for identifying kidney cancerAnal Biochem2007363218519510.1016/j.ab.2007.01.02817316536)
KindTTolstikovVFiehnOWeissRHA comprehensive urinary metabolomic approach for identifying kidney cancerAnal Biochem2007363218519510.1016/j.ab.2007.01.02817316536
KindTTolstikovVFiehnOWeissRHA comprehensive urinary metabolomic approach for identifying kidney cancerAnal Biochem2007363218519510.1016/j.ab.2007.01.02817316536, KindTTolstikovVFiehnOWeissRHA comprehensive urinary metabolomic approach for identifying kidney cancerAnal Biochem2007363218519510.1016/j.ab.2007.01.02817316536
H. Benton, Diana Wong, S. Trauger, Gary Siuzdak (2008)
XCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterization.
Analytical chemistry, 80 16
(FischlerMABollesRCRandom sample consensus: a paradigm for model fitting with applications to image analysis and automated cartographyComm Of the ACM19812438139510.1145/358669.358692)
FischlerMABollesRCRandom sample consensus: a paradigm for model fitting with applications to image analysis and automated cartographyComm Of the ACM19812438139510.1145/358669.358692
FischlerMABollesRCRandom sample consensus: a paradigm for model fitting with applications to image analysis and automated cartographyComm Of the ACM19812438139510.1145/358669.358692, FischlerMABollesRCRandom sample consensus: a paradigm for model fitting with applications to image analysis and automated cartographyComm Of the ACM19812438139510.1145/358669.358692
M Kanehisa (2000)
10.1093/nar/28.1.27
Nucleic Acids Res, 28
M. Katajamaa, Jarkko Miettinen, M. Orešič (2006)
MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data
Bioinformatics, 22 5
T. Kind, V. Tolstikov, O. Fiehn, R. Weiss (2007)
A comprehensive urinary metabolomic approach for identifying kidney cancerr.
Analytical biochemistry, 363 2
DS Wishart, C Knox, AC Guo, R Eisner, N Young, B Gautam, DD Hau, N Psychogios, E Dong, S Bouatra, R Mandal, I Sinelnikov, J Xia, L Jia, JA Cruz, E Lim, CA Sobsey, S Shrivastava, P Huang, P Liu, L Fang, J Peng, R Fradette, D Cheng, D Tzur, M Clements, A Lewis, A De Souza, A Zuniga, M Dawe, Y Xiong, D Clive, R Greiner, A Nazyrova, R Shaykhutdinov, L Li, HJ Vogel, I Forsythe (2009)
Nucleic Acids Res
R. Laaksonen, M. Katajamaa, Hannu Päivä, M. Sysi-Aho, L. Saarinen, P. Junni, D. Lütjohann, J. Smet, R. Coster, T. Seppänen-Laakso, T. Lehtimäki, J. Soini, M. Orešič (2006)
A Systems Biology Strategy Reveals Biological Pathways and Plasma Biomarker Candidates for Potentially Toxic Statin-Induced Changes in Muscle
PLoS ONE, 1
(LaaksonenRKatajamaaMPaivaHSysi-AhoMSaarinenLJunniPLutjohannDSmetJVan CosterRSeppanen-LaaksoTLehtimäkiTSoiniJOresicMA systems biology strategy reveals biological pathways and plasma biomarker candidates for potentially toxic statin-induced changes in musclePLoS One20061e9710.1371/journal.pone.000009717183729)
LaaksonenRKatajamaaMPaivaHSysi-AhoMSaarinenLJunniPLutjohannDSmetJVan CosterRSeppanen-LaaksoTLehtimäkiTSoiniJOresicMA systems biology strategy reveals biological pathways and plasma biomarker candidates for potentially toxic statin-induced changes in musclePLoS One20061e9710.1371/journal.pone.000009717183729
LaaksonenRKatajamaaMPaivaHSysi-AhoMSaarinenLJunniPLutjohannDSmetJVan CosterRSeppanen-LaaksoTLehtimäkiTSoiniJOresicMA systems biology strategy reveals biological pathways and plasma biomarker candidates for potentially toxic statin-induced changes in musclePLoS One20061e9710.1371/journal.pone.000009717183729, LaaksonenRKatajamaaMPaivaHSysi-AhoMSaarinenLJunniPLutjohannDSmetJVan CosterRSeppanen-LaaksoTLehtimäkiTSoiniJOresicMA systems biology strategy reveals biological pathways and plasma biomarker candidates for potentially toxic statin-induced changes in musclePLoS One20061e9710.1371/journal.pone.000009717183729
N. Gehlenborg, Wei Yan, Inyoul Lee, Hyuntae Yoo, K. Nieselt, D. Hwang, R. Aebersold, L. Hood (2009)
Prequips - an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics data
Bioinformatics, 25 5
D Kessner, M Chambers, R Burke, D Agus, P Mallick (2008)
ProteoWizard: open source software for rapid proteomics tools development
Bioinformatics, 24
(WishartDSKnoxCGuoACEisnerRYoungNGautamBHauDDPsychogiosNDongEBouatraSMandalRSinelnikovIXiaJJiaLCruzJALimESobseyCAShrivastavaSHuangPLiuPFangLPengJFradetteRChengDTzurDClementsMLewisADe SouzaAZunigaADaweMXiongYCliveDGreinerRNazyrovaAShaykhutdinovRLiLVogelHJForsytheIHMDB: a knowledgebase for the human metabolomeNucleic Acids Res200937 DatabaseD60361010.1093/nar/gkn81018953024)
WishartDSKnoxCGuoACEisnerRYoungNGautamBHauDDPsychogiosNDongEBouatraSMandalRSinelnikovIXiaJJiaLCruzJALimESobseyCAShrivastavaSHuangPLiuPFangLPengJFradetteRChengDTzurDClementsMLewisADe SouzaAZunigaADaweMXiongYCliveDGreinerRNazyrovaAShaykhutdinovRLiLVogelHJForsytheIHMDB: a knowledgebase for the human metabolomeNucleic Acids Res200937 DatabaseD60361010.1093/nar/gkn81018953024
WishartDSKnoxCGuoACEisnerRYoungNGautamBHauDDPsychogiosNDongEBouatraSMandalRSinelnikovIXiaJJiaLCruzJALimESobseyCAShrivastavaSHuangPLiuPFangLPengJFradetteRChengDTzurDClementsMLewisADe SouzaAZunigaADaweMXiongYCliveDGreinerRNazyrovaAShaykhutdinovRLiLVogelHJForsytheIHMDB: a knowledgebase for the human metabolomeNucleic Acids Res200937 DatabaseD60361010.1093/nar/gkn81018953024, WishartDSKnoxCGuoACEisnerRYoungNGautamBHauDDPsychogiosNDongEBouatraSMandalRSinelnikovIXiaJJiaLCruzJALimESobseyCAShrivastavaSHuangPLiuPFangLPengJFradetteRChengDTzurDClementsMLewisADe SouzaAZunigaADaweMXiongYCliveDGreinerRNazyrovaAShaykhutdinovRLiLVogelHJForsytheIHMDB: a knowledgebase for the human metabolomeNucleic Acids Res200937 DatabaseD60361010.1093/nar/gkn81018953024
D. Wishart, Craig Knox, Anchi Guo, Roman Eisner, N. Young, Bijaya Gautam, D. Hau, N. Psychogios, Edison Dong, Souhaila Bouatra, R. Mandal, I. Sinelnikov, J. Xia, Leslie Jia, Joseph Cruz, Emilia Lim, Constance Sobsey, S. Shrivastava, Paul Huang, Philip Liu, Lydia Fang, Jun Peng, R. Fradette, D. Cheng, D. Tzur, M. Clements, A. Lewis, Andrea Souza, Azaret Zuniga, Margot Dawe, Yeping Xiong, D. Clive, R. Greiner, A. Nazyrova, R. Shaykhutdinov, Liang Li, H. Vogel, I. Forsythe (2008)
HMDB: a knowledgebase for the human metabolome
Nucleic Acids Research, 37
A Keller (2005)
10.1038/msb4100024
Mol Syst Biol, 1
(KatajamaaMOresicMData processing for mass spectrometry-based metabolomicsJ Chromatogr A200711581-231832810.1016/j.chroma.2007.04.02117466315)
KatajamaaMOresicMData processing for mass spectrometry-based metabolomicsJ Chromatogr A200711581-231832810.1016/j.chroma.2007.04.02117466315
KatajamaaMOresicMData processing for mass spectrometry-based metabolomicsJ Chromatogr A200711581-231832810.1016/j.chroma.2007.04.02117466315, KatajamaaMOresicMData processing for mass spectrometry-based metabolomicsJ Chromatogr A200711581-231832810.1016/j.chroma.2007.04.02117466315
D. Perkins, D. Pappin, D. Creasy, J. Cottrell (1999)
Probability‐based protein identification by searching sequence databases using mass spectrometry data
ELECTROPHORESIS, 20
G. Medina-Gómez, Sarah Gray, L. Yetukuri, Kenju Shimomura, S. Virtue, M. Campbell, R. Curtis, M. Jimenez-Linan, M. Blount, Giles Yeo, Miguel López, T. Seppänen-Laakso, Frances Ashcroft, M. Orešič, A. Vidal-Puig (2007)
PPAR gamma 2 Prevents Lipotoxicity by Controlling Adipose Tissue Expandability and Peripheral Lipid Metabolism
PLoS Genetics, 3
(ClevelandWSDevlinSJLocally weighted regression - an approach to regression-analysis by local fittingJ Am Stat Assoc19888340359661010.2307/2289282)
ClevelandWSDevlinSJLocally weighted regression - an approach to regression-analysis by local fittingJ Am Stat Assoc19888340359661010.2307/2289282
ClevelandWSDevlinSJLocally weighted regression - an approach to regression-analysis by local fittingJ Am Stat Assoc19888340359661010.2307/2289282, ClevelandWSDevlinSJLocally weighted regression - an approach to regression-analysis by local fittingJ Am Stat Assoc19888340359661010.2307/2289282
(SmithCAO'MailleGWantEJQinCTraugerSABrandonTRCustodioDEAbagyanRSiuzdakGMETLIN: a metabolite mass spectral databaseTher Drug Monit200527674775110.1097/01.ftd.0000179845.53213.3916404815)
SmithCAO'MailleGWantEJQinCTraugerSABrandonTRCustodioDEAbagyanRSiuzdakGMETLIN: a metabolite mass spectral databaseTher Drug Monit200527674775110.1097/01.ftd.0000179845.53213.3916404815
SmithCAO'MailleGWantEJQinCTraugerSABrandonTRCustodioDEAbagyanRSiuzdakGMETLIN: a metabolite mass spectral databaseTher Drug Monit200527674775110.1097/01.ftd.0000179845.53213.3916404815, SmithCAO'MailleGWantEJQinCTraugerSABrandonTRCustodioDEAbagyanRSiuzdakGMETLIN: a metabolite mass spectral databaseTher Drug Monit200527674775110.1097/01.ftd.0000179845.53213.3916404815
Ralf Tautenhahn, C. Böttcher, S. Neumann (2008)
Highly sensitive feature detection for high resolution LC/MS
BMC Bioinformatics, 9
Birgit Timischl, K. Dettmer, Hannelore Kaspar, Marian Thieme, P. Oefner (2008)
Development of a quantitative, validated Capillary electrophoresis‐time of flight – mass spectrometry method with integrated high‐confidence analyte identification for metabolomics
ELECTROPHORESIS, 29
P. Gopalacharyulu, V. Velagapudi, Erno Lindfors, E. Halperin, M. Orešič (2009)
Dynamic network topology changes in functional modules predict responses to oxidative stress in yeast.
Molecular bioSystems, 5 3
H. Ogata, S. Goto, Kazushige Sato, W. Fujibuchi, H. Bono, M. Kanehisa (1999)
KEGG: Kyoto Encyclopedia of Genes and Genomes
Nucleic acids research, 27 1
(TautenhahnRBottcherCNeumannSHighly sensitive feature detection for high resolution LC/MSBMC Bioinformatics2008950410.1186/1471-2105-9-50419040729)
TautenhahnRBottcherCNeumannSHighly sensitive feature detection for high resolution LC/MSBMC Bioinformatics2008950410.1186/1471-2105-9-50419040729
TautenhahnRBottcherCNeumannSHighly sensitive feature detection for high resolution LC/MSBMC Bioinformatics2008950410.1186/1471-2105-9-50419040729, TautenhahnRBottcherCNeumannSHighly sensitive feature detection for high resolution LC/MSBMC Bioinformatics2008950410.1186/1471-2105-9-50419040729
A Keller, J Eng, N Zhang, XJ Li, R Aebersold (2005)
A uniform proteomics MS/MS analysis platform utilizing open XML file formats
Mol Syst Biol, 1
(KanehisaMGotoSKEGG: kyoto encyclopedia of genes and genomesNucleic Acids Res2000281273010.1093/nar/28.1.2710592173)
KanehisaMGotoSKEGG: kyoto encyclopedia of genes and genomesNucleic Acids Res2000281273010.1093/nar/28.1.2710592173
KanehisaMGotoSKEGG: kyoto encyclopedia of genes and genomesNucleic Acids Res2000281273010.1093/nar/28.1.2710592173, KanehisaMGotoSKEGG: kyoto encyclopedia of genes and genomesNucleic Acids Res2000281273010.1093/nar/28.1.2710592173
M. Orešič, S. Simell, M. Sysi-Aho, K. Näntö-Salonen, T. Seppänen-Laakso, V. Parikka, M. Katajamaa, A. Hekkala, I. Mattila, P. Keskinen, L. Yetukuri, A. Reinikainen, J. Lähde, T. Suortti, J. Hakalax, T. Simell, H. Hyöty, R. Veijola, J. Ilonen, R. Lahesmaa, M. Knip, O. Simell (2008)
Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes
The Journal of Experimental Medicine, 205
E. Lange, Ralf Tautenhahn, S. Neumann, C. Gröpl (2008)
Critical assessment of alignment procedures for LC-MS proteomics and metabolomics measurements
BMC Bioinformatics, 9
(PluskalTNakamuraTVillar-BrionesAYanagidaMMetabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbationMol Biosyst20106118219810.1039/b908784b20024080)
PluskalTNakamuraTVillar-BrionesAYanagidaMMetabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbationMol Biosyst20106118219810.1039/b908784b20024080
PluskalTNakamuraTVillar-BrionesAYanagidaMMetabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbationMol Biosyst20106118219810.1039/b908784b20024080, PluskalTNakamuraTVillar-BrionesAYanagidaMMetabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbationMol Biosyst20106118219810.1039/b908784b20024080
(KessnerDChambersMBurkeRAgusDMallickPProteoWizard: open source software for rapid proteomics tools developmentBioinformatics200824212534253610.1093/bioinformatics/btn32318606607)
KessnerDChambersMBurkeRAgusDMallickPProteoWizard: open source software for rapid proteomics tools developmentBioinformatics200824212534253610.1093/bioinformatics/btn32318606607
KessnerDChambersMBurkeRAgusDMallickPProteoWizard: open source software for rapid proteomics tools developmentBioinformatics200824212534253610.1093/bioinformatics/btn32318606607, KessnerDChambersMBurkeRAgusDMallickPProteoWizard: open source software for rapid proteomics tools developmentBioinformatics200824212534253610.1093/bioinformatics/btn32318606607
M. Orešič, T. Seppänen-Laakso, L. Yetukuri, F. Bäckhed, V. Hänninen (2009)
Gut microbiota affects lens and retinal lipid composition.
Experimental eye research, 89 5
C. Steinbeck, Y. Han, S. Kuhn, O. Horlacher, E. Luttmann, E. Willighagen (2003)
The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics
Journal of Chemical Information and Computer Sciences, 43(2):493-500
M. Katajamaa, M. Orešič (2005)
Processing methods for differential analysis of LC/MS profile data
BMC Bioinformatics, 6
Y. Wang, J. Xiao, T. O. Suzek, J. Zhang, J. Wang, S. H. Bryant (2009)
PubChem: a public information system for analyzing bioactivities of small molecules
Nucleic Acids Research, 37 (Web Server issue):W623-W633
G. Medina-Gomez, S. L. Gray, L. Yetukuri, K. Shimomura, S. Virtue, M. Campbell, R. K. Curtis, M. Jimenez-Linan, M. Blount, G. S. Yeo, M. Lopez, T. Seppänen-Laakso, F. M. Ashcroft, M. Orešič, A. Vidal-Puig (2007)
PPAR gamma 2 prevents lipotoxicity by controlling adipose tissue expandability and peripheral lipid metabolism
PLoS Genetics, 3(4):e64
A. Keller, J. Eng, N. Zhang, X. J. Li, R. Aebersold (2005)
A uniform proteomics MS/MS analysis platform utilizing open XML file formats
Molecular Systems Biology, 1:2005.0017

Publisher: Springer Journals
Copyright: Copyright © 2010 by Pluskal et al; licensee BioMed Central Ltd.
Subject: Life Sciences; Bioinformatics; Microarrays; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Combinatorial Libraries; Algorithms
eISSN: 1471-2105
DOI: 10.1186/1471-2105-11-395
pmid: 20650010

Abstract

Background: Mass spectrometry (MS) coupled with online separation methods is commonly applied for differential and quantitative profiling of biological samples in metabolomic as well as proteomic research. Such approaches are used for systems biology, functional genomics, and biomarker discovery, among others. An ongoing challenge of these molecular profiling approaches, however, is the development of better data processing methods. Here we introduce a new generation of a popular open-source data processing toolbox, MZmine 2.

Results: A key concept of the MZmine 2 software design is the strict separation of core functionality and data processing modules, with emphasis on easy usability and support for high-resolution spectra processing. Data processing modules take advantage of embedded visualization tools, allowing for immediate previews of parameter settings. Newly introduced functionality includes the identification of peaks using online databases, MS$^n$ data support, improved isotope pattern support, scatter plot visualization, and a new method for peak list alignment based on the random sample consensus (RANSAC) algorithm. The performance of the RANSAC alignment was evaluated using synthetic datasets as well as actual experimental data, and the results were compared to those obtained using other alignment algorithms.

Conclusions: MZmine 2 is freely available under a GNU GPL license and can be obtained from the project website at: http://mzmine.sourceforge.net/. The current version of MZmine 2 is suitable for processing large batches of data and has been applied to both targeted and non-targeted metabolomic analyses.

Background

Mass spectrometry (MS) coupled with online separation methods, such as liquid chromatography (LC), is commonly applied for differential and quantitative profiling of biological samples in metabolomic and proteomic research. Such approaches are useful in the domains of systems biology, functional genomics, and biomarker discovery. One of the ongoing challenges of such molecular profiling approaches is the development of better data processing methods. Several software packages have been developed for this purpose, and have been extensively reviewed by Katajamaa and Orešič [1].

The recent introduction of mzML, an open and universal format for MS data [2], represents an important milestone in the effort to address the issues of MS data exchange and standardization. It also underlines the need for a flexible and universal software framework to provide the necessary support for data import, export, and visualization, thus allowing the rapid development of specialized data-processing methods.

MZmine was first introduced in 2005 as an open-source software toolbox for LC-MS data processing [3]. The first version of MZmine defined the data analysis workflow and implemented simple methods for data processing and visualization [3,4]. The software has been applied to numerous metabolomic analyses [5-10] and comparative studies with other related software packages have been performed [9,11]. A weakness of MZmine was insufficient modularity in its initial design, which limited the possibility of expanding the software with new methods developed by the scientific community. For this reason, the new release, MZmine 2, was completely redesigned to support modularity. Here we describe the architecture of MZmine 2 as well as its basic features. We also introduce a new and efficient method for peak list alignment that was implemented in MZmine 2.

G0 Cell Unit, Okinawa Institute of Science and Technology (OIST), Onna, Okinawa, Japan
© 2010 Pluskal et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Pluskal et al. BMC Bioinformatics 2010, 11:395 Page 2 of 11 http://www.biomedcentral.com/1471-2105/11/395

Implementation

MZmine 2 was developed using Java technology, and is therefore platform independent. The software has been tested on the Windows, Mac OS X, and Linux platforms. We focused on three main aims during the software design and implementation.

First, the framework should be flexible and allow for easy and straightforward development of new data processing modules. We addressed this by keeping a strict separation between the application core and individual modules for data processing and visualization (Figure 1). A compact data model was designed and the code of each Java class was kept short and intuitive. To support the development of new modules, we provided an online tutorial available at the project web site.

Second, the graphical interface of the application should be intuitive and easy to use. For this purpose, critical data processing methods such as peak picking were linked to embedded visualization modules, providing online previews during parameter setup. Additionally, the use of any data-processing method in MZmine 2 does not remove the original (unprocessed) data, giving the user the option to return to previous results or raw data at any stage of data processing.

The third goal was to provide good support for processing high-resolution MS data, e.g., as obtained from Orbitrap or Fourier transform ion cyclotron resonance MS instruments. We designed the data import and peak detection modules to maintain the precision of the imported data without any degradation due to inadequate resampling. Because high-resolution data imply an increased data volume, MZmine 2 was tested and optimized with large datasets (on the order of gigabytes).

The flexibility of the Java environment allows MZmine 2 to take advantage of several open-source libraries, including JFreeChart (http://www.jfree.org/jfreechart/) for the TIC, spectra, 2D and other visualizers, VisAD (http://www.ssec.wisc.edu/~billh/visad.html) for the 3D visualizer, the Chemistry Development Kit (CDK) [12] for calculating isotopic distributions, JChemPaint (http://jchempaint.sourceforge.net/) for rendering 2D molecular structures, and Jmol (http://jmol.sourceforge.net/) for rendering 3D molecular structures. These libraries are included in the MZmine 2 distribution.

Figure 1 MZmine 2 software architecture and its main modules.

Results

The typical MS data processing workflow comprises raw data file import, filtering/smoothing (optional), peak picking, peak list deisotoping, alignment, gap filling, and normalization [4]. The MZmine 2 modules cover all these workflow stages and also include additional functionality for the visualization and interpretation of the results. Only features new to MZmine 2 are described in this section.

Project management

One of the new core features of MZmine 2 is project management, which allows the user to track and store intermediate results. Each data-processing step can be performed multiple times with different parameters and the results can be observed and compared. The data processing pipeline settings (e.g., algorithms and parameters used, reference peak lists) can be stored for future applications. Direct export of the peak list data to comma-separated values (CSV) or XML files is also possible.

Raw data file format support

MZmine 2 can read and process both unit mass resolution and accurate mass resolution MS data in both continuous and centroid modes, including fragmentation (MS$^n$) scans. Raw data import is modularized and the currently supported file formats are mzML (1.0 and 1.1), mzXML (2.0, 2.1 and 3.0), mzData (1.04 and 1.05), NetCDF, and the RAW format used natively by Thermo Fisher Scientific instruments (requires installation of Thermo Xcalibur). Support for other file formats can be implemented as additional plug-ins.

Data visualization

MZmine 2 includes several visualization modules (Figure 2), all of which were newly implemented for this release. Following the goal of providing the user with an intuitive interface, the visualizers automatically annotate raw data with the obtained peak picking and identification results, allowing for quick orientation when large amounts of data are being processed. Quantitative results in the form of peak lists may be observed using a table visualizer or chart-plotting modules (Figure 2I). The scatter plot visualizer (Figure 2F) has proven to be very useful for efficient comparison of multiple samples [13].

Figure 2 Screenshot of MZmine 2 showing multiple visualization modules. The specific panels included are: (A) imported samples, (B) peak lists including single peak list contents, (C) peak shapes for an identified metabolite across multiple samples, (D) MS/MS spectrum of a metabolite, (E) combined base peak plot for multiple samples, (F) scatter plot of peak areas across two samples, (G) 2D plot of a detected peak, mass-to-charge ratio vs. retention time, (H) 3D view of a detected peak, and (I) intensity plot for specific peaks across multiple samples.

Peak detection

Feature detection is a critical step in MS data processing. The peak detection methods and their implementations should be flexible enough to deal with great differences in data obtained from different instruments, such as variable mass resolution, chromatographic resolution and peak shape, or background noise. In MZmine 2, peak detection is performed in several customizable steps (Figure 3). Previews are provided to allow for optimal selection of parameter values.

In the first step (Figure 3A), each MS spectrum is processed individually and converted to pairs of m/z and intensity values (in other words, each mass spectrum is centroided). Several algorithms are provided as plug-ins, each suitable for a different type of mass spectra. The “Local maxima” algorithm is a simple algorithm suitable for demonstrating the process: it detects each local maximum in the spectrum. The “Recursive threshold” algorithm is based on an earlier method implemented in MZmine [3,4] and adds two additional parameters of minimum and maximum peak m/z width. This method reduces the false positives by avoiding detection of noise peaks. The “Wavelet transform” algorithm is particularly suitable for noisy data. It processes each spectrum using continuous wavelet transform, matching the m/z peaks to the “Mexican hat” wavelet model. This algorithm is based on a previously reported method [14]. The “Exact mass” algorithm assumes high quality spectra (high mass resolution, low noise) and determines the center of each m/z peak using the “full width at half maximum” paradigm: the m/z value is placed in the middle of the line that crosses the peak at half of the maximum intensity (as shown in the insets in Figure 3A). Finally, the “Centroid” algorithm is suitable for already centroided data. It detects all data points above the specified noise level as m/z peaks.

Data obtained by Fourier transform mass spectrometry instruments provide very high mass resolution, but suffer from the presence of noise signals known as “shoulder peaks” (Figure 3B). These peaks are residues of the Fourier transform function calculated by the instrument and their intensity is usually below 5% of the intensity of the main (true) m/z peak. To remove these noise peaks, we introduced an optional filtration plug-in that builds a theoretical model (such as Gaussian or Lorentzian) with given mass resolution around each peak, and removes all noise peaks below this model. Peaks are processed in the order of decreasing intensity. In the preview (Figure 3B), the main m/z signal is indicated by the red color, while the shoulder peaks subject to removal are indicated in yellow. Again, it is possible to implement other filtration algorithms as plug-ins.

The next step consists of an algorithm that connects consecutive m/z values spanning over multiple scans into chromatogram objects. The default algorithm provided by MZmine 2 connects m/z values in the order of their intensity, with the most intense peaks connected first. A chromatogram spanning a given minimal time range is constructed for each m/z value (within user-defined tolerance). Each chromatogram is then deconvoluted into individual chromatographic peaks (Figure 3C). Several algorithms are provided as plug-ins. The “Baseline cut-off” algorithm recognizes each chromatographic peak that has an intensity above a given minimum level and spans over a given minimum time range. The “Noise amplitude” algorithm adds another parameter specifying the intensity range that is considered noise. The algorithm then finds the intensity level where most of the noise is concentrated and sets the baseline level to this intensity, individually for each chromatogram. Following the setting of the baseline, the procedure is the same as in the “Baseline cut-off” algorithm. The Savitzky-Golay algorithm uses the smoothed second derivative of the chromatogram curve to detect the borders of individual peaks. The “Local minimum search” algorithm attempts to identify local minima in the chromatogram as border points between individual peaks. Several restrictions are placed on possible peak shapes, such as minimum absolute and relative intensities, or a minimum ratio between peak maximum and edge.

We also implemented an experimental module, which fits the (potentially noisy) set of data points of each deconvoluted peak with an ideal peak model such as Gaussian or Exponentially Modified Gaussian (Figure 3D). Such an approach may reduce the chromatographic noise between samples, but the practical applicability of this method has not yet been thoroughly validated.

Figure 3 Peak detection modules with previews. (A) Mass detection (centroiding) module. Recognized m/z peaks are shown in red. In the insets, details of a single m/z peak are shown, indicating the full width at half maximum approach to the m/z value calculation. (B) Fourier transform mass spectrometry shoulder peaks filter. In the preview panel, the main detected peak is indicated with the red line, while shoulder peaks are indicated with the yellow lines. (C) Peak deconvolution. Each individual recognized peak within the chromatogram is indicated by a different color. (D) Experimental peak shape modeler. A Gaussian peak model (pink) is fitted to the deconvoluted chromatographic peak’s data points (blue).

Peak identification

Assignment of intuitive metabolite or peptide names to detected m/z values greatly assists with the process of data interpretation. In MZmine 2, identification of peaks can be performed either by searching a custom database of m/z values and retention times, or by connecting to an online resource such as PubChem [15], KEGG [16], METLIN [17], or HMDB [18] directly from the MZmine 2 interface (Figure 4). For each ion subjected to identification, its neutral molecular mass ($m_{neutral}$) is calculated from its m/z value. For that purpose, the charge of the ion ($z$) can be automatically determined from its isotope pattern. Ionization mode (positive or negative) and ionization adduct (e.g., H+, Na+, K+) are selected by the user as parameters. Neutral mass is then calculated as

$$m_{neutral} = (m/z \times z) \pm m_{adduct}$$

where the sign ($\pm$) is defined by the ionization mode and $m_{adduct}$ is the mass of the selected ionization adduct. The neutral mass $m_{neutral}$ is the primary term for database search, within user-specified tolerance. Isotopic pattern similarity can be used as a second filter to select optimal candidates, by comparing the ratios of the detected isotopes and matching isotopes from the predicted isotopic pattern of the database compound. Because the online identification module is itself modularized, support for other molecular databases can be easily added. For proteomic applications, a module allowing identification of peptide peaks using the MASCOT [19] search engine and MS/MS spectra is under development.

Figure 4 Peak identification using the PubChem Compound database. (A) A peak list showing the row selected for identification. (B) Dialog for setting search parameters. (C) Table of candidates obtained from the database within a given mass tolerance. (D) 2D and 3D structural views of the candidate compound.

RANdom SAmple Consensus (RANSAC) aligner

The purpose of peak list alignment is to match relevant peaks across multiple samples. The original MZmine software introduced a simple alignment algorithm that creates an empty master peak list and then aligns each peak from given peak lists (samples) to the best candidate of the master list using a two-dimensional alignment window (AW) represented by user-specified m/z and retention time tolerances. If no suitable candidate is found, a new row is created in the master list. In MZmine 2, this algorithm is referred to as the “Join aligner”. One disadvantage of the Join aligner is its inability to cope with a non-linear deviation of the retention times among samples. For this purpose, we introduced a new peak list alignment method based on the RANSAC algorithm.

The RANSAC algorithm [20] is a non-deterministic iterative algorithm that estimates parameters of a mathematical model from a set of observed data, which may include outliers. The probability of obtaining a good result increases with the number of iterations. In each iteration, a random subset of observed data points is selected and a model is fit to this data. In our specific case, we used 4 points to find a non-linear model. The remaining data points are tested against the fitted model, and each point that fits well is considered part of the model. Each model is then evaluated, and when the iterations are finished, the model with the most data points fitted to it is considered the best.

The RANSAC method of alignment makes use of two user-defined two-dimensional windows, the RANSAC window (RW) and the Alignment window (AW). The RW is defined by the m/z threshold $rm_0$ and retention time threshold $rr_0$, and AW uses the same m/z threshold $rm_0$ but a different retention time threshold $ar_0$. The retention time threshold in RW should be as large as the maximum observed deviation in the retention time among all peaks. The procedure for aligning a sample $S$ with the master list $L$ is as follows:

Step 1: For every row $i$ in $L$, let $r_i$ be the average retention time and $m_i$ the average m/z of all individual peaks in the row, and let

$$RW_i = \{(m, r) \mid m_i - rm_0 \le m \le m_i + rm_0 \text{ and } r_i - rr_0 \le r \le r_i + rr_0\}$$

be the RANSAC window for row $i$. Then, for each row $i$ in $L$, mark all peaks of sample $S$ that fall inside $RW_i$ as candidate alignments.
Step 2: Build a scatter plot representation of all candidate alignments, and apply the RANSAC algorithm to build a candidate model for alignment. This model represents a list of matching retention times.
Step 3: Apply the locally-weighted scatterplot smoothing (LOESS) method for regression [21] on all points in the model obtained with RANSAC.
Step 4: Using this regression model, for each row $i$ in $L$, predict the correction for the retention time shift to locate the new center $(m_i, r'_i)$ of the alignment window $AW_i$. RANSAC alignment can thus correct the retention time deviation by centering the AW on the correct position in the new sample:

$$AW_i = \{(m, r) \mid m_i - rm_0 \le m \le m_i + rm_0 \text{ and } r'_i - ar_0 \le r \le r'_i + ar_0\}$$

Step 5: For each row $i$ in $L$, apply the Join algorithm for alignment using the alignment window $AW_i$.

Figure 5 shows a preview of the RANSAC alignment in MZmine 2. Each dot represents a candidate alignment of two peaks. Red dots represent those candidate alignments that were fitted to the best model (blue line).

Figure 5 RANSAC aligner. Dialog shows preview of RANSAC alignment of two peak lists using the given parameters. Each possible candidate alignment (peak pair) within a defined m/z and retention time tolerance is shown as a dot. A model is fitted to the data (blue line) and red dots indicate those fitting to the model and therefore selected for the final alignment.

RANSAC aligner performance

Two types of errors can be introduced during the alignment process [11]. Either two non-related peaks could be matched, or the matching of two related peaks could be omitted. A variable called “precision” represents the proportion of true alignments out of all alignments found by the algorithm. The proportion of peaks that are correctly aligned by the algorithm out of all true alignments inside the dataset is called “recall”. These two variables together represent the quality of the alignment. To test whether the newly introduced RANSAC algorithm performs better than the Join alignment, the results of two different approaches were compared.

First, 12 synthetic datasets were created using samples from 12 different lipidomic studies. A single sample from each study was used as a seed to create a synthetic set of 20 samples. These 20 samples contained identical information (peaks), but a random non-linear deviation in the retention time was introduced into each one. The MZmine 2 projects of all 12 datasets are available online (see Dataset download). Each dataset was aligned using the RANSAC aligner and Join aligner with three different retention time tolerance thresholds (50 s, 20 s, and 5 s). Parameters used for alignment are specified in Table 1. Run times of the RANSAC aligner were measured and are reported in Table 2. Precision and recall values were calculated and the average results are shown in Figure 6 (numerical results are available in Additional file 1). Only the RANSAC algorithm achieved 100% in both precision and recall on these synthetic data sets.

Our second approach for the comparison was to use the real proteomic (P1 and P2) and metabolomic (M1 and M2) datasets introduced by Lange et al. [11], together with their tables of “ground truth” alignments and an evaluation script for calculating the alignment precision and recall values. We applied the MZmine 2 Join and RANSAC aligners to align all the datasets with the parameters specified in Table 1. Run times of the RANSAC aligner are reported in Table 2. Precision and recall values were calculated using the provided evaluation script and compared to already published results in Table 3. We used the latest available evaluation results published at http://msbi.ipb-halle.de/msbi/caap at the time of writing. Compared to the Join aligner, the RANSAC aligner provided better results in 11 of 13 alignments, with worse results obtained in only a single case (P2 dataset fraction 00). We assume that the high number of features in this fraction (over 6800 rows after alignment) made it somewhat difficult for the RANSAC algorithm to build a suitable model. Notably, in all fractions of dataset P1, the RANSAC aligner provided the best results among all the tested algorithms. Complete datasets P1, P2, M1, and M2, as well as all alignment results, are available online (see Dataset download).

Table 1 Parameter values used for aligning the 12 synthetic data sets and the real proteomic (P1 and P2) and metabolomic (M1 and M2) data sets using the RANSAC and Join aligners
Parameter | 12 synthetic data sets | Proteomics P1 | Proteomics P2 | Metabolomics M1 | Metabolomics M2
m/z tolerance | 0.05 m/z | 1.5 m/z | 1.5 m/z | 0.03 m/z | 0.025 m/z
RT tolerance after correction (min:s) | 0:25 | 2:30 | 2:30 | 0:50 | 0:30
RT tolerance (min:s) | 0:50 | 3:30 | 5:00 | 0:30 | 0:30
RANSAC iterations | 5000 | 50000 | 50000 | 15000 | 15000
Minimum number of points | 20% | 2.00% | 0.10% | 20.00% | 20.00%
Threshold value | 4 s | 4 s | 15 s | 4 s | 4 s
Non-linear model | yes | yes | no | yes | yes
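The five alignment steps described in the RANSAC aligner subsection can be illustrated with a small Python sketch. It is not the MZmine 2 (Java) implementation, and it makes several simplifying assumptions: the retention time drift is modelled as a straight line fitted from 2-point random subsets (we assume this roughly corresponds to the "Non-linear model: no" setting in Table 1, whereas the paper uses 4 points for its non-linear model), Step 3 uses an ordinary least-squares refit on the RANSAC inliers instead of LOESS, and Step 5 greedily takes the closest unmatched peak inside $AW_i$ rather than the Join aligner's own scoring. The names `Peak`, `candidate_pairs`, `fit_line`, `ransac_line` and `ransac_align` are hypothetical; `rm0`, `rr0` and `ar0` mirror the window thresholds defined above, and `threshold` is assumed to play the role of the "Threshold value" parameter (maximum retention-time residual, in seconds, for an inlier).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    rt: float  # retention time in seconds


def candidate_pairs(master, sample, rm0, rr0):
    """Step 1: (row rt, peak rt) for every sample peak inside the RANSAC window RW_i."""
    return [
        (row.rt, peak.rt)
        for row in master
        for peak in sample
        if abs(peak.mz - row.mz) <= rm0 and abs(peak.rt - row.rt) <= rr0
    ]


def fit_line(points):
    """Least-squares line y = a*x + b; returns None if all x values coincide."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(pairs, iterations=1000, threshold=4.0, seed=0):
    """Step 2: keep the line supported by the most candidate pairs (inliers)."""
    if len(pairs) < 2:
        raise ValueError("need at least two candidate pairs")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        model = fit_line(rng.sample(pairs, 2))  # minimal random subset
        if model is None:
            continue
        a, b = model
        inliers = [(x, y) for x, y in pairs if abs(a * x + b - y) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if not best_inliers:
        raise ValueError("no usable model found")
    # Step 3 (simplified): least-squares refit on the inliers instead of LOESS.
    return fit_line(best_inliers)


def ransac_align(master, sample, rm0, rr0, ar0, iterations=1000, threshold=4.0):
    """Steps 4-5: move each alignment window AW_i to r'_i, then take the closest peak."""
    pairs = candidate_pairs(master, sample, rm0, rr0)
    a, b = ransac_line(pairs, iterations, threshold)
    matches, used = {}, set()
    for i, row in enumerate(master):
        corrected_rt = a * row.rt + b  # predicted centre r'_i in the sample's time scale
        in_window = [
            (abs(peak.rt - corrected_rt), j)
            for j, peak in enumerate(sample)
            if j not in used
            and abs(peak.mz - row.mz) <= rm0
            and abs(peak.rt - corrected_rt) <= ar0
        ]
        if in_window:
            _, j = min(in_window)
            matches[i] = j  # master row index -> sample peak index
            used.add(j)
    return matches


if __name__ == "__main__":
    # Synthetic master list and a sample whose retention times drift linearly.
    master = [Peak(200.0 + 10 * k, 60.0 + 30 * k) for k in range(10)]
    sample = [Peak(p.mz, 1.02 * p.rt + 8.0) for p in master]
    sample.append(Peak(200.0, 75.0))  # decoy: inside RW_0 but off the drift line
    print(ransac_align(master, sample, rm0=0.05, rr0=20.0, ar0=2.0))
```

In this toy example the decoy peak falls inside the wide RANSAC window of the first row but not on the fitted drift line, so it should be rejected as an outlier, and each master row should be matched to its drifted counterpart within the narrow alignment window. This is the motivation for using two windows: a wide RW to collect candidates for the model, and a narrow AW centred on the corrected retention time for the final match.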
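The precision and recall definitions given in the RANSAC aligner performance subsection reduce to simple set arithmetic. The sketch below assumes that each alignment is represented as a pair of peak identifiers. This is a simplification: the evaluation script used in this study is not reproduced here, and the function name `precision_recall` and the identifiers are hypothetical.

```python
def precision_recall(found, truth):
    """Precision = correct / all reported alignments; recall = correct / all true alignments."""
    found, truth = set(found), set(truth)
    correct = len(found & truth)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical peak identifiers: (peak in sample A, peak in sample B).
truth = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
found = {("A1", "B1"), ("A2", "B2"), ("A3", "B4")}
print(precision_recall(found, truth))  # -> (0.6666666666666666, 0.5)
```

Here the wrong match ("A3", "B4") lowers precision, while the two true alignments that were not reported lower recall, which is why the two measures are reported together.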
Conclusions
The development of MZmine 2 was motivated by the need for a flexible and modular software platform that would allow the bioinformatic and analytical community to contribute new methods for specific stages of MS-based data processing. Great emphasis was placed on achieving three main goals: a flexible, extendable, and modular design; a user-friendly graphical interface; and good support for high-resolution MS data. The authors of this manuscript work in the field of metabolomics using an LC-MS analytical platform; therefore, the currently developed modules were tested mainly on LC-MS data. The flexibility of MZmine 2, however, allows easy expansion to other data types, such as gas chromatography-MS, as well as interoperation with popular proteomics search engines such as MASCOT.

Several other software packages have been introduced for LC-MS based data processing, such as XCMS [22], the Trans-Proteomic Pipeline [23], Prequips [24], OpenMS-TOPP [25], and ProteoWizard [26]. None of these tools, however, shares the goals of MZmine 2; most of them are command-line oriented with fixed feature sets and aim specifically at either proteomic or metabolomic research.

Table 2 Run times of the RANSAC aligner for aligning the 12 synthetic data sets and the real proteomic (P1 and P2) and metabolomic (M1 and M2) data sets

Data set                Run time (min)
                            Run 1    Run 2    Run 3  Average
Synthetic data set 1         0.17     0.15     0.16     0.16
Synthetic data set 2         0.32     0.31     0.31     0.31
Synthetic data set 3         0.44     0.41     0.42     0.42
Synthetic data set 4         0.46     0.45     0.45     0.45
Synthetic data set 5         0.62     0.67     0.74     0.68
Synthetic data set 6         0.39     0.38     0.39     0.39
Synthetic data set 7         0.54     0.54     0.55     0.55
Synthetic data set 8         0.25     0.26     0.25     0.25
Synthetic data set 9         0.79     0.80     0.84     0.81
Synthetic data set 10        0.73     0.73     0.72     0.73
Synthetic data set 11        5.24     4.11     5.17     4.84
Synthetic data set 12        7.79     7.78     7.64     7.74
M1                          62.27    58.08    63.56    61.30
M2                         147.64   163.62   146.79   152.69
P1, fraction 000             4.95     6.49     7.72     6.39
P1, fraction 020             0.50     0.50     0.57     0.52
P1, fraction 040             0.76     0.70     0.65     0.70
P1, fraction 060             1.11     1.06     1.14     1.10
P1, fraction 080             0.61     0.57     0.67     0.62
P1, fraction 100             0.46     0.48     0.51     0.48
P2, fraction 000            22.47    22.94    21.12    22.18
P2, fraction 020             1.35     1.31     1.18     1.28
P2, fraction 040             0.65     0.73     0.71     0.70
P2, fraction 080             0.31     0.36     0.39     0.35
P2, fraction 100             0.47     0.43     0.49     0.46

Run times were obtained on an AMD Opteron 1.8 GHz dual-core system with 10 GB RAM, running Linux.

Figure 6 Performance comparison of RANSAC aligner and Join aligner for 12 synthetic datasets. For each dataset, peak lists were aligned using the RANSAC aligner and the Join aligner with three different retention time tolerance thresholds (50 s, 20 s, and 5 s). The plot shows the average recall and precision values for all datasets. Error bars indicate standard deviations.

Table 3 Performance comparison of MZmine 2 alignment methods (right side of the table) to previously published results (left side of the table) obtained using several different software packages [11]

Columns (1) to (7): results published by Lange et al. (2008) [11], available at the time of writing at http://msbi.ipb-halle.de/msbi/caap: (1) msInspect; (2) MZmine (version 0.6); (3) OpenMS; (4) SpecArray; (5) XAlign; (6) XCMS without RT correction; (7) XCMS with RT correction. Columns (8) and (9): MZmine 2 results: (8) Join aligner; (9) RANSAC aligner. RT: retention time; -: no value reported.

Data set and measure         (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)
Proteomics data set P1
  fraction 00   Recall      0.52  0.81  0.86  0.61  0.82  0.72  0.62  0.80  0.86
                Precision   0.38  0.81  0.86  0.61  0.82  0.54  0.58  0.80  0.86
  fraction 20   Recall      0.56  0.90  0.92  0.62  0.85  0.88  0.81  0.90  0.93
                Precision   0.45  0.90  0.92  0.62  0.85  0.84  0.80  0.90  0.93
  fraction 40   Recall      0.63  0.90  0.94  0.75  0.87  0.92  0.81  0.87  0.94
                Precision   0.48  0.90  0.94  0.75  0.87  0.85  0.80  0.87  0.94
  fraction 60   Recall      0.73  0.84  0.96  0.71  0.87  0.91  0.78  0.89  0.97
                Precision   0.54  0.84  0.96  0.71  0.87  0.80  0.75  0.89  0.97
  fraction 80   Recall      0.70  0.94  0.96  0.74  0.90  0.94  0.89  0.94  0.97
                Precision   0.57  0.94  0.96  0.74  0.90  0.88  0.88  0.94  0.97
  fraction 100  Recall      0.82  0.94  0.94  0.77  0.96  0.95  0.96  0.95  0.96
                Precision   0.56  0.94  0.94  0.77  0.96  0.89  0.96  0.95  0.96
Proteomics data set P2
  fraction 00   Recall      0.23  0.62  0.77  0.07  0.65  0.70  0.58  0.63  0.56
                Precision   0.07  0.49  0.65  0.05  0.49  0.31  0.44  0.53  0.49
  fraction 20   Recall      0.67  0.87  0.92  0.57  0.84  0.89  0.86  0.81  0.93
                Precision   0.24  0.71  0.77  0.42  0.70  0.55  0.66  0.69  0.78
  fraction 40   Recall      0.44  0.79  0.76  0.60  0.71  0.72  0.72  0.74  0.78
                Precision   0.26  0.76  0.74  0.41  0.69  0.56  0.69  0.73  0.77
  fraction 80   Recall      0.73  0.60  0.80  0.65  0.58  0.64  0.49  0.61  0.61
                Precision   0.34  0.56  0.70  0.44  0.56  0.50  0.45  0.58  0.61
  fraction 100  Recall      0.82  0.80  0.90  0.63  0.85  0.95  0.85  0.85  0.88
                Precision   0.39  0.64  0.75  0.44  0.69  0.65  0.69  0.71  0.75
Metabolomics data sets
  M1            Recall      0.27  0.92  0.87     -  0.88  0.98  0.94  0.90  0.91
                Precision   0.46  0.73  0.69     -  0.70  0.60  0.70  0.74  0.74
  M2            Recall      0.23  0.98  0.93     -  0.93  0.97  0.98  0.98  0.98
                Precision   0.47  0.84  0.79     -  0.79  0.58  0.78  0.83  0.83
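Table 3 and Figure 6 summarize alignment quality as recall and precision with respect to a ground-truth alignment. The measures used by Lange et al. [11] are defined on their own ground-truth consensus features and are not reproduced here. As a rough illustration only, the Python sketch below assumes a simplified pairwise variant: an alignment is a list of rows (sets of peaks), precision is the fraction of predicted co-aligned peak pairs that also occur in the ground truth, and recall is the fraction of ground-truth pairs that the aligner recovers. The helper names and the (sample, peak index) identifiers are hypothetical and are not part of MZmine 2.

```python
from itertools import combinations

# Hypothetical helpers for illustration only; not MZmine 2 code and not the
# exact measure of Lange et al. A peak is any hashable identifier, here a
# (sample_name, peak_index) tuple; an alignment is a list of rows, each row
# being the set of peaks grouped together by the aligner.


def co_aligned_pairs(rows):
    """Return every unordered pair of peaks that share a row."""
    pairs = set()
    for row in rows:
        for a, b in combinations(row, 2):
            pairs.add(frozenset((a, b)))  # frozenset makes {a, b} == {b, a}
    return pairs


def pairwise_precision_recall(predicted, truth):
    """Pairwise precision and recall of a predicted alignment vs. ground truth."""
    predicted_pairs = co_aligned_pairs(predicted)
    true_pairs = co_aligned_pairs(truth)
    hits = len(predicted_pairs & true_pairs)
    precision = hits / len(predicted_pairs) if predicted_pairs else 0.0
    recall = hits / len(true_pairs) if true_pairs else 0.0
    return precision, recall


if __name__ == "__main__":
    # Ground truth: peak 1 is the same compound in samples A, B, and C;
    # peak 2 is the same compound in samples A and B.
    truth = [
        {("A", 1), ("B", 1), ("C", 1)},
        {("A", 2), ("B", 2)},
    ]
    # A conservative aligner that leaves ("C", 1) in a row of its own.
    predicted = [
        {("A", 1), ("B", 1)},
        {("A", 2), ("B", 2)},
        {("C", 1)},
    ]
    print(pairwise_precision_recall(predicted, truth))  # (1.0, 0.5)
```

In this pairwise view, leaving a peak in a row of its own costs recall but not precision, while merging peaks of different compounds into one row costs precision.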
Rather than a single piece of software, the developmental aim of MZmine 2 is to create a universal platform through which researchers can contribute individual processing modules and implement and share novel ideas spanning multiple research fields and analytical methods.

MZmine 2 is available for download at the project website, together with a printable manual, an animated tutorial, a module development tutorial, and further relevant project information such as a source code repository and developers' mailing list. The current version of the framework is already suitable for processing large batches of data for both targeted and non-targeted analyses, and has been applied in metabolomic research [13,27].

Dataset download
The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash:

X19bvFk4++SVz0ngXab4YQQu389r/SBAOevlKh2f5bNyxDnvYiOQhqmU0r++rIknzgCsg8SNWWJVWtlhURkA+=eoea8MAAAAAAABm9w=

The hash may be used to validate that the files were published as part of this manuscript's dataset and to check that the data have not changed since publication.

Availability and requirements
• Project name: MZmine 2
• Project home page: http://mzmine.sourceforge.net
• Operating system(s): Platform independent
• Programming language: Java
• Other requirements: Java Runtime Environment (JRE) 1.6, Java3D
• License: GNU GPL

Abbreviations
AW: Alignment window; LC-MS: Liquid chromatography-mass spectrometry; MS: Mass spectrometry; RANSAC: Random sample consensus; RW: RANSAC window

Additional material
Additional file 1: Numerical values for Figure 6. Precision and recall values of RANSAC and Join aligner results for 12 synthetic data sets.

Acknowledgements
We thank the present and past MZmine 2 contributors Mikko Katajamaa, Yosuke Kawasaki, Jarkko Miettinen, John Rush, Marco Schaerfke, and Sasha Tkachev. We also thank the Okinawa Institute of Science and Technology Promotion Corporation for providing the funding and Mitsuhiro Yanagida for supporting the MZmine 2 development in his laboratory. We are very grateful to the developers of open-source libraries such as JFreeChart, VisAD, Jmol, and CDK. This work was in part supported by the EU-funded project ETHERPATHS (FP7-KBBE-222639, http://www.etherpaths.org/).

Author details
G0 Cell Unit, Okinawa Institute of Science and Technology (OIST), Onna, Okinawa, Japan.
Quantitative Biology and Bioinformatics, VTT Technical Research Centre of Finland, Espoo, Finland.

Authors' contributions
TP designed the data model and overall architecture of the MZmine 2 framework and implemented most of the raw data visualization and peak identification modules. SC implemented the project serialization and the RANSAC aligner. AVB implemented the peak detection module with previews, the scatter plot and histogram visualizers, and isotope pattern support, and contributed to the development of the online database search module. MO participated in software testing and provided feedback on the framework design. All authors read and approved the final manuscript.

Received: 6 May 2010 Accepted: 23 July 2010 Published: 23 July 2010

References
1. Katajamaa M, Oresic M: Data processing for mass spectrometry-based metabolomics. J Chromatogr A 2007, 1158(1-2):318-328.
2. Orchard S, Hoogland C, Bairoch A, Eisenacher M, Kraus HJ, Binz PA: Managing the data explosion. A report on the HUPO-PSI Workshop. August 2008, Amsterdam, The Netherlands. Proteomics 2009, 9(3):499-501.
3. Katajamaa M, Miettinen J, Oresic M: MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics 2006, 22(5):634-636.
4. Katajamaa M, Oresic M: Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics 2005, 6:179.
5. Laaksonen R, Katajamaa M, Paiva H, Sysi-Aho M, Saarinen L, Junni P, Lutjohann D, Smet J, Van Coster R, Seppanen-Laakso T, Lehtimäki T, Soini J, Oresic M: A systems biology strategy reveals biological pathways and plasma biomarker candidates for potentially toxic statin-induced changes in muscle. PLoS One 2006, 1:e97.
6. Oresic M, Simell S, Sysi-Aho M, Nanto-Salonen K, Seppanen-Laakso T, Parikka V, Katajamaa M, Hekkala A, Mattila I, Keskinen P, Yetukuri L, Reinikainen A, Lähde J, Suortti T, Hakalax J, Simell T, Hyöty H, Veijola R, Ilonen J, Lahesmaa R, Knip M, Simell O: Dysregulation of lipid and amino acid metabolism precedes islet autoimmunity in children who later progress to type 1 diabetes. J Exp Med 2008, 205(13):2975-2984.
7. Gopalacharyulu PV, Velagapudi VR, Lindfors E, Halperin E, Oresic M: Dynamic network topology changes in functional modules predict responses to oxidative stress in yeast. Mol Biosyst 2009, 5(3):276-287.
8. Medina-Gomez G, Gray SL, Yetukuri L, Shimomura K, Virtue S, Campbell M, Curtis RK, Jimenez-Linan M, Blount M, Yeo GS, Lopez M, Seppänen-Laakso T, Ashcroft FM, Oresic M, Vidal-Puig A: PPAR gamma 2 prevents lipotoxicity by controlling adipose tissue expandability and peripheral lipid metabolism. PLoS Genet 2007, 3(4):e64.
9. Kind T, Tolstikov V, Fiehn O, Weiss RH: A comprehensive urinary metabolomic approach for identifying kidney cancer. Anal Biochem 2007, 363(2):185-195.
10. Timischl B, Dettmer K, Kaspar H, Thieme M, Oefner PJ: Development of a quantitative, validated capillary electrophoresis-time of flight-mass spectrometry method with integrated high-confidence analyte identification for metabolomics. Electrophoresis 2008, 29(10):2203-2214.
11. Lange E, Tautenhahn R, Neumann S, Gropl C: Critical assessment of alignment procedures for LC-MS proteomics and metabolomics measurements. BMC Bioinformatics 2008, 9(1):375.
12. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E: The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 2003, 43(2):493-500.
13. Pluskal T, Nakamura T, Villar-Briones A, Yanagida M: Metabolic profiling of the fission yeast S. pombe: quantification of compounds under different temperatures and genetic perturbation. Mol Biosyst 2010, 6(1):182-198.
14. Tautenhahn R, Bottcher C, Neumann S: Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 2008, 9:504.
15. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH: PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009, 37(Web Server issue):W623-633.
16. Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000, 28(1):27-30.
17. Smith CA, O'Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, Custodio DE, Abagyan R, Siuzdak G: METLIN: a metabolite mass spectral database. Ther Drug Monit 2005, 27(6):747-751.
18. Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, Hau DD, Psychogios N, Dong E, Bouatra S, Mandal R, Sinelnikov I, Xia J, Jia L, Cruz JA, Lim E, Sobsey CA, Shrivastava S, Huang P, Liu P, Fang L, Peng J, Fradette R, Cheng D, Tzur D, Clements M, Lewis A, De Souza A, Zuniga A, Dawe M, Xiong Y, Clive D, Greiner R, Nazyrova A, Shaykhutdinov R, Li L, Vogel HJ, Forsythe I: HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 2009, 37(Database issue):D603-610.
19. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20(18):3551-3567.
20. Fischler MA, Bolles RC: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981, 24:381-395.
21. Cleveland WS, Devlin SJ: Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 1988, 83(403):596-610.
22. Benton HP, Wong DM, Trauger SA, Siuzdak G: XCMS2: processing tandem mass spectrometry data for metabolite identification and structural characterization. Anal Chem 2008, 80(16):6382-6389.
23. Keller A, Eng J, Zhang N, Li XJ, Aebersold R: A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005, 1:2005.0017.
24. Gehlenborg N, Yan W, Lee IY, Yoo H, Nieselt K, Hwang D, Aebersold R, Hood L: Prequips: an extensible software platform for integration, visualization and analysis of LC-MS/MS proteomics data. Bioinformatics 2009, 25(5):682-683.
25. Kohlbacher O, Reinert K, Gropl C, Lange E, Pfeifer N, Schulz-Trieglaff O, Sturm M: TOPP: the OpenMS proteomics pipeline. Bioinformatics 2007, 23(2):e191-197.
26. Kessner D, Chambers M, Burke R, Agus D, Mallick P: ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24(21):2534-2536.
27. Oresic M, Seppanen-Laakso T, Yetukuri L, Backhed F, Hanninen V: Gut microbiota affects lens and retinal lipid composition. Exp Eye Res 2009, 89(5):604-607.

doi:10.1186/1471-2105-11-395
Cite this article as: Pluskal et al.: MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 2010, 11:395.

Journal

BMC Bioinformatics – Springer Journals

Published: Jul 23, 2010
