The Open Microscopy Environment (OME) Data Model and XML file: open tools for informatics and quantitative analysis in biological imaging

Goldberg et al. Genome Biology 2005, 6:R47 [http://genomebiology.com/2005/6/5/R47]

Abstract

The Open Microscopy Environment (OME) defines a data model and a software implementation to serve as an informatics framework for imaging in biological microscopy experiments, including representation of acquisition parameters, annotations and image analysis results. OME is designed to support high-content cell-based screening as well as traditional image analysis applications. The OME Data Model, expressed in Extensible Markup Language (XML) and realized in a traditional database, is both extensible and self-describing, allowing it to meet emerging imaging and analysis needs.

Rationale

Biological microscopy has always required an 'imaging' capability: traditionally, the image of a sample was drawn on paper, or with the advent of light-sensitive film, recorded on media that conveniently allowed reproduction. The advent of digital detectors in microscopy has progressively expanded imaging capacity, transforming the biological microscope into an assay device that linearly measures the flux of light at different points in a cell or tissue. Almost all the vast clinical and research applications of digital imaging microscopy treat the recorded microscope image as a quantitative measurement. This is especially true for fluorescence or bioluminescence, where the signal recorded at any point in the sample gives a direct measure of the number of target molecules in the sample [1-4]. Numerical analytic methods extract information from quantitative image data that cannot be gleaned by simple inspection [5-7]. Growing interest in high-throughput cell-based screening of small molecule, RNAi, and expression libraries (high-content screening) has highlighted the large volume of data these methods generate and the requirement for informatics tools for biological images [8-10].

In its most basic form, an image-informatics system must accurately store image data obtained from microscopes with a wide range of imaging modes and capabilities, along with accessory information (termed metadata) that describes the experiment, the acquisition system, and basic information about the user, experimenter, date, and so on [11,12]. At first glance, it might appear that these requirements can be met by applying some of the tools that underpin modern biology, such as the informatics approaches developed for genomics. However, it is worth comparing a genome-sequencing experiment to a cellular imaging experiment. In genomics, knowledge of the type of automated sequencer that was used to determine the DNA sequence ATGGAC... is not necessary to interpret the sequence. Moreover, the result ATGGAC... is deterministic - no further analysis is required to 'know' the sequence, and in general, the same result will be obtained from other samples from the same organism. By contrast, an image of a cell can only be understood if we know what type of cell it is, how it has been grown and prepared for imaging, which stains or fluorescent tags have been used to label subcellular structures, and the imaging methodology that was used to record it. For image processing, knowledge of the optical transfer function, spectral properties and noise characteristics of the microscope are all critical. Interpretation of results from image analysis requires knowledge of the precise characteristics of the algorithms used to extract quantitative information from images. Indeed, deriving information from images is completely dependent on contextual information that may vary from experiment to experiment. These requirements are not met by traditional genomics tools and thus demand a new kind of bioinformatics focused on experimental metadata and analytic results.
In the absence of integrated solutions to image data management, it has become standard practice to migrate large amounts of data through multiple file formats as different analysis or visualization methods are employed. Moreover, while some commercial microscope image formats record system configuration parameters, this information is always lost during file format conversion or data migration. Once an analysis is carried out, the results are usually exported to a spreadsheet program like Microsoft Excel for further calculations or graphing. The connections between the results of image analyses, a graphical output, the original image data and any intermediate steps are lost, so that it is impossible to systematically dissect or query all the elements of the data analysis chain. Finally, the data model used in any imaging system varies from site to site, depending on the local experimental and acquisition system. It can also change over time, as new acquisition systems, imaging technologies, or even new assays are developed. The development and application of new imaging techniques and analytic tools will only accelerate, but the requirements for coherent data management and adaptability of the data model remain unsolved. It is clear that a new approach to data management for digital imaging is necessary.

It might be possible to address these problems using a single image data standard or a central data repository. However, a single data format specified by a standards body breaks the requirement for local extensibility and would therefore be ignored. A central image data depository that stores sets of images related to specific publications has been proposed [13,14], but this cannot happen without adaptable data management systems in each lab or facility. The only viable approach is the provision of a standardized data model that supports local extensibility. Local instances of the data model that store site-specific data and manage access to it must be provided along with a mechanism for data sharing or migration between sites. These requirements are shared by other data-intensive methodologies (for example, mass spectrometry and two-dimensional gel electrophoresis). Thus, a major challenge is the design and implementation of a system for the multidimensional images, experimental metadata, and analytical results that are commonly generated in biological microscopy, one that will also be generally adaptable to many different types of data.

To make it possible to manipulate and share image data as readily as genomic data, we are building an image-management system geared to the specific needs of quantitative microscopy. The major focus of the Open Microscopy Environment (OME) [11,15] is not on creating image-analysis algorithms, but rather on the development of software and protocols that allow image data from any microscope to be stored, shared and transformed without loss of image data or information about the experimental setting, the imaging system or the processing software. OME provides a data model that can integrate with other efforts to define experimental, genomic, and biological ontologies [16-19] and that is suitable for traditional low-volume microscopy and for high-throughput image-based screening. This data model is implemented in a relational database and application server to import, store, process, view and export data. The OME Data Model is also implemented in an Extensible Markup Language (XML) file format that makes it possible to transfer OME files between OME databases and exchange them with other software, including that provided by commercial vendors. OME does not replace or compete with existing commercial software for controlling microscopes, acquiring images or performing image restoration. Instead, it serves as a neutral broker among a multitude of otherwise incompatible software tools.
In our previous work [11], we described the conceptual foundation for an image informatics system. In this report we describe the implementation of this system, including details of the OME XML file format, a description of how images are represented both in the file format and in the data model, the application of semantic types for metadata extensibility as well as their use in modular image analysis, and recently developed software that makes use of this system and is targeted at end-users. The current version of OME focuses on fluorescence microscopy, but the underlying schema and file specifications can be extended to support any type of microscope image. The OME XML file format has already gained acceptance within the microscopy community. At the time of writing, two companies support the format in their current commercial offerings (Applied Precision, Issaquah, WA and Bitplane, Zurich, Switzerland), and it has been proposed as a standard recommendation for image data migration by the European Advanced Microscopy Network [20]. Immediate applications for OME within biomedical research include the characterization of dynamic cell and tissue structures for basic research, high-content cell-based screening and high-performance clinical microscopy.

Definition of an image

All imaging experiments occur within specific temporal and spatial limits. In OME, we define an image as a five-dimensional (5D) structure containing multiple two-dimensional (2D) frames (Figure 1a). Each frame has dimensions (x, y) that correspond to the image plane of the microscope and is recorded from an array detector (for example a CCD camera in a wide-field microscope) or generated by a two-dimensional raster scan (for example, a laser scanning confocal microscope). Each frame has a specified focal position z, a wavelength, or more generally channel, c, and timepoint t. The extent of a 5D-image is unlimited. The time and channel dimensions may be continuous or discrete. For example, the image may contain an entire spectrum at each pixel as in Fourier Transform Infrared (FTIR) imaging, or it may consist of a set of discrete wavelengths such as commonly seen in fluorescence microscopy. Similarly, there may be a continuous series of time points that are evenly spaced, as in a video stream, or the image may contain unevenly spaced, discrete time points. Images that are not continuous in space are treated as separate images even though they may be part of the same experiment. For example, visiting several places on a microscope slide or a microtiter plate will result in as many separate images. Finally, the meaning of the pixel values recorded in each frame is determined by the imaging method performed (Figure 1b).

Figure 1. The mode of acquisition defines the pixel image data. The meaning of a 2D-image recorded from a digital microscope imaging system varies depending on how it is collected. Almost all of the different modes in (a) and (b) can be combined to analyze cell structure and behavior. All of the parameters and configurations must be somehow recorded for the interpretation of the pixel data in an image. (a) The spatial, spectral and temporal context of an image (optical sections, spectral coding, timelapse, stage position) is used to generate more information about the cell under study. Changing stage position, focus, spectral range or time of imaging all expand the meaning of an image. Modified from [33]. (b) The two aspects of the image data collection that define the pixel data: the contrast method (brightfield, phase, DIC, Hoffman modulation, oblique illumination, polarized light, darkfield, fluorescence) and the imaging mode (wide-field, laser scanning confocal, spinning disk confocal, multi-photon, structured illumination, single molecule, total internal reflection, fluorescence lifetime, fluorescence correlation, second harmonic generation). A variety of methods are used to generate contrast in modern biological imaging; in addition, the imaging method used to record the data also has meaning.

The OME Data Model

To solve the problems of data interoperability and extensibility, we have developed a definition, or ontology, of the different data types and relationships included in an imaging experiment. The OME Data Model integrates binary image data and all information regarding the image acquisition and processing, and any results generated during analysis. In this way, all aspects of the data acquisition, processing, and analysis remain linked and can be used by any analysis or visualization application. Groups of Images can be organized into 'Datasets' and 'Projects'. (Throughout this paper, when referring specifically to OME objects (such as Projects, Datasets, Images, Pixels, and Features), they are capitalized.) Datasets are user-defined groups of images that are always analyzed together: an example would be images from a single immunofluorescence experiment. An image may belong to one or more datasets. Projects in turn are collections of datasets, and any given dataset may belong to one or more projects. Each project and dataset has its own name, description and owner.

The OME Data Model allows for other types of image collections. Explicit support is included for high-content assays (HCAs) conducted on microtiter plates or other arraying formats. In this case, the OME Data Model allows for an additional grouping hierarchy: 'Plates', 'Screens', 'Wells', and 'Samples'. Samples are groups of images from one well, Plates are groups of Wells, and Screens are groups of Plates. Just like Projects and Datasets, each level of the hierarchy has its own set of identifiers. It is also possible for a given plate to belong to multiple screens, thereby providing a logical mechanism for reuse of the same collection of data for different analyses. Similarly, a mechanism is provided for categorizing images into arbitrary user-defined groups, as sketched below.
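As a minimal sketch of how a document might express this grouping, an Image can point at the Dataset(s) it belongs to and a Dataset at its Project(s); the reference element names (ProjectRef, DatasetRef) and the LSID authority shown here are illustrative rather than quoted from the published schema [24]:

   <Project ID="urn:lsid:example.org:Project:1" Name="Membrane dynamics"/>
   <Dataset ID="urn:lsid:example.org:Dataset:7" Name="GFP timelapse">
      <ProjectRef ID="urn:lsid:example.org:Project:1"/>
   </Dataset>
   <Image ID="urn:lsid:example.org:Image:42" Name="cell01">
      <DatasetRef ID="urn:lsid:example.org:Dataset:7"/>
      <!-- acquisition metadata and Pixels omitted -->
   </Image>

Because an Image may carry more than one such dataset reference, and a Dataset more than one project reference, the same data can be reused in several groupings without duplication.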
An additional level of hierarchy below images included in the OME Data Model is 'Features'. Although there is some conflict of nomenclature in what is considered an image feature between areas of machine learning and traditional image analysis, in OME's case, image features are 'regions' in an image (for example cells or nuclei). Numerical descriptors used for classification content are then referred to as 'Signatures' [21]. The OME Data Model allows features to contain other features, so that, for example, the relationship between a cell, a nucleus and a nucleolus can be expressed (see the sketch below). At present, we do not specify an ontology for the kinds of information an image feature may contain. Any information obtained by segmentation algorithms, or other algorithms that define Features, is stored using the data model's extensibility mechanism (see Semantic types below).
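For example, the containment of a nucleolus within a nucleus within a cell might be captured with nested Feature elements along the following lines; this is a sketch only, and the attribute shown is hypothetical:

   <Feature Name="cell 3">
      <Feature Name="nucleus">
         <Feature Name="nucleolus"/>
      </Feature>
      <!-- measurements produced by segmentation would be attached to each
           Feature as semantic-type attributes under CustomAttributes -->
   </Feature>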
Semantic types

All information in the OME Data Model can be reduced to 'semantic types' (STs). In most ways, this is merely a name or label given to a piece of information, but in OME it has additional consequences. STs can describe information at four levels in the OME hierarchy: Global, Dataset, Image and Feature. Global STs are used to describe 'Experimenters', 'Groups', 'Microscopes', and so on - items that are applicable to all images in an OME database. Dataset STs are used to describe information about datasets - information pertinent to a collection of images. Image STs describe information pertinent to images, and feature STs describe information about image features - objects or 'blobs' within images. In our nomenclature, the data type is an ST, and the data itself is an attribute. For example, the 'Pixels' data type is an Image ST, and a particular set of Pixels is an attribute of a particular Image. Throughout this paper, XML elements defined in the OME XML schema are placed within angle brackets (<>).

Data model extensibility

Standardizing access to data solves many problems, but could severely limit the types of data that might be stored. Because it is not possible to define a priori what kinds of imaging experiments and analyses will be performed, it is not possible to design a data model to contain this information ahead of time. For this reason, we have included a mechanism for describing new types of data in the OME Data Model. As one of our goals is to define a common ontology for light microscopy, the STs that make up this ontology are part of the 'core set', whereas other STs can be locally defined to address evolving imaging needs. Since the data model contains its own description, it can be extended in arbitrary ways. As these extensions become commonly used, the STs that define them can be incorporated into the core set. The initial core set is concerned chiefly with acquisition parameters so that image data can be interpreted unambiguously. As the project evolves, analytical STs will be incorporated into the core set in order to achieve interoperability not only at the level of interpreting raw image data, but also at the level of interpreting image analysis results.

Consider an example where a commercial software vendor might specify additional metadata in the timing information for acquisition of Z sections in an XYZ 3D stack of image planes. As the timing information would pertain to specific images, this new data type would be declared as an Image ST. More specifically, since the timing information pertains to individual planes within the 5D Image, a set of plane indexes would be included in the definition referring to a specific plane. The timing information itself can be expressed as a delta-time or an absolute time (or both), and may have units that are either implied or made explicit. Regardless of how the timing is expressed, it is understood that any software that uses this newly declared ST agrees on the convention adopted and the precise meaning of the data it represents. This agreement on meaning allows any software application to exchange acquisition timing information with any other.

Using OME XML (see OME XML file below), this declaration would be stored in the <SemanticTypeDefinitions> element in the XML document, while the timing information itself (the attributes) would be stored under the <CustomAttributes> element for the specific image. The names of the elements under <CustomAttributes> match the names of the STs, and the data itself goes into the element's attributes. For example:

   <CustomAttributes>
      <AcquisitionTiming theZ='0' theC='0' theT='0' deltaT='0.001'/>
   </CustomAttributes>
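The corresponding declaration names the new type, states the level it applies to, and lists its elements and their data types. A sketch of what such a declaration might look like is given below; the declaration syntax is abbreviated, and the element and attribute names (SemanticType, Element, AppliesTo) should be checked against the published schema [24,25]:

   <SemanticTypeDefinitions>
      <SemanticType Name="AcquisitionTiming" AppliesTo="I">
         <!-- "I" marks this as an Image-level semantic type -->
         <Element Name="theZ"   DataType="integer"/>
         <Element Name="theC"   DataType="integer"/>
         <Element Name="theT"   DataType="integer"/>
         <Element Name="deltaT" DataType="float"/>
      </SemanticType>
   </SemanticTypeDefinitions>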
Importantly, our open-source implementation of OME (see below) will automatically expand its database schema when it comes across an ST definition, and will populate the resulting tables when it comes across the data in <CustomAttributes>. This approach allows for immense flexibility in the ontologies OME can support.

IDs and references

OME has adopted the Life Science ID (LSID) system of data registration [22]. Since LSIDs are universally unique, every piece of information stored using the OME Data Model can be traced to its source - regardless of how it was produced. Every OME element that has an ID attribute may follow the LSID format, but this is not a requirement. If a particular ID does not follow the LSID format (it does not start with 'urn:lsid:'), it must be assumed that this is a 'brand new' object. While this is a valid assumption for data, it may not be valid for an instrument description. For this reason, actual globally unique LSIDs are preferred whenever possible - especially for global data (such as Experimenters, Screens, Plates, Microscopes). If the object is identified with a proper LSID, it can be referred to from other documents. In this way, a single document can be used to describe a microscope and its components, and subsequent documents containing images can refer to these components by LSID. There are open-source implementations of LSID servers (resolvers) and clients developed by IBM Life Sciences available online [22] that make it possible to resolve an LSID remotely. Although we plan to incorporate LSID resolution into OME software tools, at the time of writing, support for LSIDs is only incorporated into the OME Data Model.

The globally unique nature of LSIDs allows OME to trace every piece of information back to its origin. Provenance and data history will be discussed in a future report detailing the OME analysis system, but the use of LSIDs and a representation of data history is sufficient to determine the origin of every piece of information about an image - from precisely where, when and how the image was acquired, through any analysis that was done, to any structured information or conclusions that were derived as a result of analysis. LSIDs allow preservation of this chain of provenance regardless of the number of intermediate documents, and of the proprietary or open-source OME-compatible software systems that operated on this information.
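In practice this means that a facility can publish its instrument description once, under a stable LSID, and every subsequent image document simply points at it. The fragment below sketches the idea; the LSID authority and the reference element name are invented for illustration and are not quoted from the schema:

   <!-- instrument.ome: written once for the facility -->
   <Instrument ID="urn:lsid:microscopy.example.ac.uk:Instrument:DV-1"> ... </Instrument>

   <!-- experiment.ome: written for each acquisition -->
   <Image ID="urn:lsid:microscopy.example.ac.uk:Image:2005-0007">
      <InstrumentRef ID="urn:lsid:microscopy.example.ac.uk:Instrument:DV-1"/>
   </Image>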
The OME XML file

The OME Data Model serves as the foundation of two tools we have developed to address the requirement for extensible image data management. The first addresses the absence of a universally recognized image data file format. We have built an XML-based implementation of the OME Data Model that can be used by manufacturers of acquisition hardware and developers of image-processing and analysis software who may not want to invent their own image format. With this definition, it is possible to specify a minimal set of commonly used parameters during image acquisition in light microscopy, analogous to the MIAME standard that defines a minimal set of information about microarray experiments [23]. All the characteristics of the OME Data Model described above are reproduced in the OME XML file. Along with each 5D image (that is, the binary pixels), the OME XML file contains all of the associated metadata. The OME file schema [24] and the full documentation for the schema [25] are available online. A description of how the schema is designed and its relationship to other OME schemas is also available online [26]. Figures 2, 3, 4 highlight some of the features of the schema. In these figures, the highest level in the schema is on the left side of the diagram, and the elements defined in it are read moving from left to right.

Why XML?

The structure of the OME XML document is defined in XML Schema, which is a standard language for defining XML document structure [27]. The use of XML and a publicly available schema allows OME documents to be used in several ways that are not possible with current image formats. For example, modern browsers incorporate XML parsers, and are able to display the information contained in XML with the use of a style sheet, thus allowing customized display of data in the document using a standard browser without additional software. The use of XML also allows us to take advantage of its growing popularity in various unrelated fields - including a great deal of software written for XML, such as databases, editing tools, and parsing libraries. Finally, and perhaps most important, XML is a plain-text format. As a last resort, it can be opened in any text editor and the information it contains can simply be read by a person. This inherent openness is one of its most desirable features for representing scientific data.
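Attaching a style sheet of this kind requires only a standard processing instruction at the top of the document; the style-sheet file name below is, of course, hypothetical:

   <?xml version="1.0" encoding="UTF-8"?>
   <?xml-stylesheet type="text/xsl" href="ome-summary.xsl"?>
   <OME> ... </OME>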
Defining the OME file using XML Schema allows other advantages. The document structure is specified in a form that can be parsed, which allows third-party software to validate XML documents against our published schema. This formal specification allows other parties to implement this format without the potential misunderstanding and incompatibility that is common with textual descriptions of file formats. For example, several manufacturers are either developing or have developed support for the OME file format independently of each other and, to a large extent, independently of our group of developers. No exchange of intellectual property or reverse engineering is necessary to accomplish this. The XML Schema is the definitive documentation for reading and writing OME XML files, used in the same way by third-party developers for proprietary software, as well as by ourselves for our own open-source implementation.

There are a few disadvantages to XML worth considering. A commonly perceived weakness of XML is that its human-readable design is often at odds with the storage of binary data. Since the bulk of an image file is represented by the pixels in the image and not the metadata, this might be perceived as a serious problem. A related problem is that XML is verbose - XML files are often much larger than their binary equivalents, and image files are already quite large. The proposed format addresses these two concerns by storing binary data in plain text and reducing file size using compression.

The standard approach to representing binary data in XML is with the use of base64 encoding. A 24-digit base 2 binary number (three bytes) is converted to a 4-digit base 64 number (four bytes), with each digit represented as a text character using all the numbers, upper- and lowercase letters and two punctuation marks. This conversion inflates the size of the binary data by 25%. To mitigate this increase in size, OME XML specifies compression of the pixels on a per-plane basis in either bzip2 or gzip, both patent-free compression schemes available in open-source form online. Owing to the high compressibility of image data, OME XML files are in practice much smaller than their equivalents in other formats, usually a half to a third the size of uncompressed binary data. Because the compressed stream is still encoded in base64, it still incurs the 25% overhead, but on a much smaller piece of binary data. Of course text is itself easily compressed, and the gzip format is a standard encoding for XML, so any XML software library will transparently read and write these compressed files even though the compressed file will no longer be readable by standard text editors. However, this secondary compression will only eliminate the base64 encoding overhead - it will not further compress already compressed planes.
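As a worked example of the overhead: a 512 x 512 plane of 16-bit pixels occupies 524,288 bytes; encoded directly in base64 it would grow by a third, to roughly 699,000 bytes, whereas compressing the plane first and then base64-encoding the much smaller compressed stream typically yields a fraction of the original size. In the file, each plane might then appear as a separate element along these lines; the attribute names shown (including Compression) are illustrative and should be checked against the published schema [24]:

   <Pixels DimensionOrder="XYZCT" PixelType="uint16"
           SizeX="512" SizeY="512" SizeZ="30" SizeC="2" SizeT="1">
      <BinData Compression="bzip2"> ... </BinData>  <!-- plane z=0, c=0, t=0 -->
      <BinData Compression="bzip2"> ... </BinData>  <!-- plane z=1, c=0, t=0 -->
      <!-- one BinData element per 2D plane -->
   </Pixels>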
There are limitations to the use of this compression scheme. Performing the compression on a per-plane basis allows only limited random access to the planes. The entire XML file need not be kept in memory in order to access arbitrary planes by index, but a file offset cannot be calculated for a given plane because planes have different sizes when compressed. Instead, the entire file has to be scanned first in order to determine the file offsets for each plane index. It is important to note that the primary goal of the OME XML file format is not raw performance, but interoperability above all else, using widely accepted standards and practices for information exchange. As the OME XML file format has gained acceptance, a demand for a high-performance variant has begun to emerge, and we are examining several possibilities that preserve the metadata structure that we have defined, but allow rapid reading and writing from disc.

Figure 2. High-level view of the elements in the OME file schema. This figure (and Figures 3 and 4) should be read from left to right. A data type (for example, OME) is defined by a number of elements. In this case, OME is defined by Project, Dataset, Experiment, Image, and so on. Each of these elements can be defined by their own individual elements. The Image and Instrument elements are expanded in Figures 3 and 4. The full XML schema is available [24]; the full documentation for the schema is also available [25]. Symbols: +, one or more elements of this type; ?, optional element or attribute; *, zero or more elements of this type; 1, choose one from a list of elements; D, the value of this element/attribute is constrained to one of several values, a range, or a text pattern (see the online documentation for more details [25]).

Schema overview

Figure 2 shows the main elements of the OME XML file schema. As discussed above, each image is defined as being part of a dataset and project, and when necessary, a given plate and screen. The stored data is also related to the experimenter who collected the data and his or her group. Any additional types of global data, including customized or vendor-specific data, can be defined at this level. Images and Instruments are defined as discussed below. Many of the elements contain IDs that uniquely identify that data element (for example, Experimenter and Dataset). If these identifiers follow the LSID format they are considered globally unique and can be used as references between other OME XML documents or remote OME installations.

This format allows for an arbitrary number of images to be described and their relationships and grouping patterns specified in a single document. Conversely, the file may describe only the imaging equipment, users, or other parameters at a given site and not contain any images; subsequent documents can refer to these items by LSID. Or, as is done in other formats, the file can be used to specify a single image and its accompanying metadata. As any information not specified in the schema must be represented as well, a section is dedicated to defining new types of information (<SemanticTypeDeclarations>). The information itself is specified at the appropriate hierarchy level within the <CustomAttributes> elements that exist in <OME>, <Dataset>, <Image> and <Feature>.
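Putting these pieces together, the overall shape of a document that carries one image, its grouping and its acquisition context is roughly as follows; this is a structural sketch only, listing elements named in the text and figures, with required attributes omitted:

   <OME>
      <Experimenter ID="..."/>
      <Group ID="..."/>
      <Project ID="..."/>
      <Dataset ID="..."/>
      <Experiment ID="..."/>
      <Instrument ID="..."/>
      <Image ID="...">
         <!-- ChannelInfo, Pixels, Feature and CustomAttributes live here -->
      </Image>
      <SemanticTypeDefinitions> <!-- locally defined STs --> </SemanticTypeDefinitions>
      <CustomAttributes> <!-- global attributes --> </CustomAttributes>
   </OME>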
The least developed aspect of the OME schema is the Experiment description. Although clearly a critical part of the metadata, the design of this ontology is under development by many other groups (for example, MIAME/MAGE, Gene Ontology (GO), Proteomics Standards Initiative (PSI), and the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE)) [16-19], and we are experimenting with several scenarios for merging these efforts with OME. At present, several of these projects, including OME, are evaluating the new Web Ontology Language (OWL) recommendation from the World Wide Web Consortium (W3C) to standardize ontology specification for the Semantic Web initiative [28]. At the moment, Experiment is defined in simple unstructured text entered by the user. This situation reflects our goals of not only defining a data model or ontology, but also building the tools for using that model in demanding, experimentally relevant, data-intensive applications. However, it is worth noting that a separate group has represented the OME Data Model within the Resource Description Framework (RDF), and has begun using this implementation [29]. We are currently studying an implementation of OME in OWL, and whether an RDF-based system provides the performance required for large-scale imaging applications.

The OME Instrument type

Figure 3. The Instrument element in the OME file schema. The data elements that define the acquisition system parameters are shown. For these descriptions, we have incorporated suggestions from many colleagues and commercial partners [32]. Symbols are as in Figure 2.

The OME Instrument type (Figure 3) provides a description of the data-acquisition instrument and defines the actual instrument as well as available configuration choices such as the objective lens, detector, and filter sets. Instrument also defines the use and configuration of lasers or arc lamps and includes a specification for a secondary illumination source (for example, a photoablation laser). Once defined in the Instrument, the specific components used to acquire an image (or a channel within an image) are referenced from within the Image or its ChannelInfo elements. The <Instrument> element is meant to define a static instrument composed of several components: one or more light sources, one or more detectors, filters, objectives, and so on. Because it does not change from image to image and has a globally unique LSID, it does not need to be defined in every OME file with images collected from it. The Image elements within the OME file contain references to the instrument's components along with any necessary parameters for their use (that is, detector gain). The Instrument may also contain several optical transfer functions (OTFs), which can be referred to from the ChannelInfo element, allowing each channel within a set of pixels to specify its own OTF.
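An Instrument defined once for a facility might therefore look something like the following sketch. The component element names (LightSource, Detector, Objective, Filter, OTF) follow the description above, but the attributes and their values are placeholders invented for illustration:

   <Instrument ID="urn:lsid:example.ac.uk:Instrument:1">
      <Microscope Model="..." Manufacturer="..."/>    <!-- attribute values are placeholders -->
      <LightSource ID="urn:lsid:example.ac.uk:LightSource:1"> <Laser Wavelength="488"/> </LightSource>
      <Detector ID="urn:lsid:example.ac.uk:Detector:1" Type="CCD" Gain="1.0"/>
      <Objective ID="urn:lsid:example.ac.uk:Objective:1" LensNA="1.4" Magnification="100"/>
      <Filter ID="urn:lsid:example.ac.uk:Filter:1"/>
      <OTF ID="urn:lsid:example.ac.uk:OTF:1"/>
   </Instrument>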
The OME Image type

Figure 4. The Image element in the OME file schema. The data elements that define an image in the OME file are shown. These include the image itself (Pixels), and a variety of characteristics of the image data and display parameters. Symbols are as in Figure 2.

The OME Image type (Figure 4) provides a description of the structure, format, and display of the image data. There are references to the light source, spectral filtering, imaging method, and display settings used for each channel. The actual binary data, referred to as 'Pixels', are also stored in this part of the schema. A set of Pixels is a 5D-structure containing multiple 2D-frames collected across focus (z), wavelength or channel (c), and time (t), as described above. Sets of Pixels that are not continuous in space are treated as separate images even though they may be part of the same experiment.

The Image's binary pixels are compressed and encoded in base64 as described above, with one plane per <BinData> element. The schema allows for more than one set of Pixels in an Image. A given image may consist of the original 'raw' pixels and a set of processed pixels, as is often done for deconvolution or restoration microscopy. Because these two sets of pixels share the same acquisition metadata, they are grouped together in the same image.

A critical feature in this specification is a definition of what the data stored in 'Pixels' actually mean. The meaning of the pixels is stored as three attributes in <ChannelInfo>: Mode, ContrastMethod, and IlluminationType. Mode describes the microscopy method used to generate the pixels, and can take on values such as 'Wide-field', 'Laser-scanning confocal', and so on. ContrastMethod describes how contrast is developed in the type of microscopy used and can contain terms such as 'BrightField', 'DIC', or 'Fluorescence'. The IlluminationType attribute describes how the sample was illuminated and can contain values of 'Transmitted', 'Epifluorescence', and 'Oblique'. Together these terms and their controlled vocabulary describe how the pixels were acquired. Each <ChannelInfo> has several internal elements that allow further refinement of the acquisition parameters by referring to components defined in the <Instrument>, such as filters and light sources. Each channel in the image has its own <ChannelInfo>, allowing the description of multimode images.

The metadata associated with a channel have an additional important feature made possible with the nested <ChannelComponent> element. In a fluorescence experiment, each fluorescence channel would be described by a <ChannelInfo>, and each of these would contain a single <ChannelComponent> referring to an index in the c dimension of the Pixels. However, in several imaging modes, each channel may contain several components. For example, in fluorescence-lifetime imaging, each fluorescence channel may contain 128 bins of fluorescence-lifetime data. The image may consist of lifetime measurements for several fluorescence channels. In this case, each fluorescence channel would still be represented by a single <ChannelInfo>, but each of those would have 128 <ChannelComponent>s. This allows the channel dimension to effectively represent two dimensions - a logical channel containing all of the metadata and one or more components representing the actual data. The same mechanism can be used to represent data from FTIR imaging.
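The channel metadata for these two cases might be sketched as follows - an ordinary fluorescence channel with one component, and a lifetime channel whose components map onto many planes along the c dimension. Attribute names beyond Mode, ContrastMethod and IlluminationType (for example Index) are illustrative:

   <ChannelInfo Name="FITC" Mode="Wide-field" ContrastMethod="Fluorescence"
                IlluminationType="Epifluorescence">
      <ChannelComponent Index="0"/>   <!-- single component: plane index 0 along c -->
   </ChannelInfo>
   <ChannelInfo Name="GFP lifetime" Mode="Wide-field" ContrastMethod="Fluorescence"
                IlluminationType="Epifluorescence">
      <ChannelComponent Index="1"/>   <!-- lifetime bin 1 -->
      <ChannelComponent Index="2"/>   <!-- lifetime bin 2 -->
      <!-- ... one component per lifetime bin, up to 128 ... -->
   </ChannelInfo>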
Updating the OME file specification

The OME XML file has been developed with input from the OME consortium and a number of commercial partners (see Figure 3 legend). However, the specification for this format is incomplete and doubtless will be updated to accommodate unanticipated requirements. Moreover, as new data acquisition methods develop, new data semantics and elements will be required. However, modifications to the specification for this file must occur in stages, preceded by announcements, if it is to be used as an export format. The OME file allows modifications to the schema to be implemented and tested through the Custom Attributes type. Proposed new types and elements can be tested and modified there and, when fully worked out and agreed upon by the OME community, merged into the main schema.

The OME database

It is formally possible to use a library of OME XML files as a data warehouse. A true image informatics system, however, must also maintain a record of all transactions with the data warehouse, including all data transformations and analyses. Storing and recording image data is a first step; a defined set of interfaces and access methods to the data must also be provided. For this reason, we have developed a second implementation of the OME Data Model as a relational database that is accessed using a series of services and interfaces. All of these tools are open source and licensed under the GNU Lesser General Public License (LGPL) [30]. The initial design has been described previously [11] and a description of more recent updates is available [15]. Image metadata are captured by the OME database when it imports a recognized file format, and are then available either by accessing the database directly or through a variety of interfaces into the OME database. These will be the subject of a future publication, but source code and documentation are available [31]. An important consequence is that all commonly available types of metadata are stored in common tables. It is not necessary to know the format of the underlying file in order to access this information. For example, to find the exposure time for a particular image, one would look in the same table regardless of the commercial imaging system used to record the data.

The use of an OME database as a record of all data transformations contrasts with the standard approach to image processing. In a stand-alone analysis program, data relationships are specified by the programmer and are therefore 'hard-coded'. The results, while useful, do not usually link to the original data or other analyses. In an OME database, an identical algorithm can be used, but the resulting data are returned to the database, and are linked to the algorithm that produced them. A subsequent analysis can gather its inputs from the database as well, without having to link directly to the previous algorithm. The links between measurements, results and the image data can be incorporated into other analyses defined by the user. Trends and relationships between these can easily be tested. Most important, the complete transactional record of data elements is known and is available, in effect creating a transfer function for data analysis. This kind of data provenance for biological microscopy has traditionally been confined to lab notebooks, filenames, or experimenters' memories; with OME, it is stored, managed, and available in a generally accessible form.

To function as planned, OME must ensure that the requirements of different processing and analysis tools are satisfied before execution. To accomplish this, STs are used to govern what kinds of information can flow between analysis modules. In OME, analysis modules can exchange information only if the output of one has the same ST as the input of the next. This principle means that information can flow only between logically and semantically similar data types, not simply between numerically similar data types. This ensures that users employ algorithms in a logically consistent manner without necessarily an intimate knowledge of the algorithm itself. We have used this concept to implement a user tool called 'Chain Builder' (Figure 5a). This Java tool accesses the STs in an OME database and allows a user to 'chain' analysis modules together, linking separate modules by matching the output STs of one module with the input STs of the next. Thus OME uses 'strong semantic typing', not only to store and maintain data and metadata, but also to define permitted workflows and potential data relationships.

Figure 5b shows a second example of the use of STs. In this example, a data manager (Figure 5b, left) displays the Projects, Datasets, and Images belonging to one OME user. Right-clicking a Dataset opens a Dataset browser (Figure 5b, middle) and displays image thumbnails obtained from the OME database. The browser accesses data associated with specific STs to define how an array of thumbnails should be presented to the user. In this case, the cell-cycle position of the cell in each image is used to define the layout (a more in-depth description of this tool is in preparation). Finally, a 5D-image viewer (Figure 5b, right) allows viewing of the individual images, with display parameters based on data obtained from an OME database associated with appropriate STs (signal min, max, mean, and so on).

Figure 5. Using STs for visualization in OME. Examples of the use of STs for visualization of data within an OME database are shown. These tools are Java applications that access OME via the OME remote framework [34]. All OME code is available [31]. (a) The Chain Builder, a tool that enables a user to build analysis chains by ensuring that the input requirements of a given module are satisfied by outputs from previous modules. This is achieved by accessing the STs for the inputs and outputs within an OME database. (b) The DataManager, DatasetBrowser and 5DViewer. The DataManager shows the relationships between Projects, Datasets and Images within an OME database. The DatasetBrowser modifies the display method for images within a given dataset depending on the values of data stored as STs within an OME database. The 5DViewer allows visualization of individual images based on STs in an OME database.

Data migration

Under most circumstances, the contents of a single OME database will be available only to the local lab or facility. However, data sharing and migration is often critical for collaborations or when investigators move to a new site. In OME, database export is achieved using the OME XML file. OME Images can be exported, along with their metadata and analytic results, and exposed to external software tools or imported to a second OME database. This strategy solves the file-format problem that has so far plagued digital microscopy.
OME database extensibility

It is clear that the OME Data Model, and its representation in a specific instance of an OME database, will be adapted to support local experimental requirements. We have implemented this within the OME server code simply by loading an OME XML file containing new STs and updating the existing database on the fly. However, an inherent problem in supporting schema extension is a potential for incompatibility between different schemas. If an OME database exports an OME XML file with a locally modified data model, how can that file be accessed by another OME site? Since OME defines what are considered core STs, all other STs must be defined within the same document that contains data pertaining to them. During import, local STs and imported STs are considered equal if their names, elements and element types are equal. In this way, if the structure of an ST can be agreed upon, the information it describes can be seamlessly integrated across different OME installations. If the structure of an extended ST is not agreed upon beforehand, then the STs are considered incompatible and their data are kept separate. If, however, two STs have the same name but different elements or element types, a name collision will result, and the import will be rejected until the discrepancy is resolved. Because the agreed-on meaning and structure of STs are essentially a social contract and are not defined more formally, these name collisions must be resolved manually. A common approach to resolving name collisions is the use of namespaces - essentially a prefix to differentiate similar names from different schemas. While namespaces solve the immediate problem of collision, they do not address the underlying problem - that ST names and their meanings have not been agreed on. The disadvantage of using namespaces is that they would not allow the information in these STs to be used interchangeably, and it is this interoperability rather than mere coexistence that is the desired result.
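The collision case is easiest to see with a concrete sketch: two sites that independently declare an ST with the same name but different elements cannot have their data merged automatically. The declarations below are purely illustrative, using the same abbreviated declaration syntax as the earlier sketch:

   <!-- site A -->
   <SemanticType Name="CellCount" AppliesTo="I">
      <Element Name="Count" DataType="integer"/>
   </SemanticType>

   <!-- site B: same name, different structure - import is rejected until resolved -->
   <SemanticType Name="CellCount" AppliesTo="I">
      <Element Name="Count"      DataType="integer"/>
      <Element Name="Confidence" DataType="float"/>
   </SemanticType>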
Discussion

We have designed and built OME as a data storage, management and analysis system for biological microscopy. The data model used by OME is represented in two distinct ways: a set of open-source software tools that use a relational database for information storage, and an XML-based file format used for transmission of this information and storage outside of databases. The OME XML file format allows the exchange of highly structured information between independently developed imaging systems, which we believe is a major hurdle in microscopy today. The XML schema provides support for image data, experimental and image metadata, and any generated analytic results. The use of a self-describing XML schema allows this format to satisfy local requirements and enables a strategy for updating schemas to satisfy new, incoming data types. This approach provides the infrastructure to support systematic quantitative image analysis, and satisfies an indispensable need as high-throughput imaging gains wider acceptance as an assay system for functional genomic assays.

Our implementation of a relational database for digital microscopy satisfies the absolute requirement for local extensibility of data models. We acknowledge the impossibility of defining a single standard that encompasses all biological microscope image data. However, using the self-describing OME XML file, we can mediate between different data models, and when necessary, update a local model so that it can send or receive data from a different model. In this way, OME considers data dialects as a compromise between a universal data language and a universe of separate languages. In general, although the current OME system is focused on biological microscopy, its concepts, and much of its architecture, can be adapted to any data-intensive activity.
Acknowledgements

We gratefully acknowledge helpful discussions with our academic and commercial partners [32]. Research in the authors' laboratories is supported by grants from the Wellcome Trust (068046 to J.R.S.), the National Institutes of Health (I.G.G.), the Harvard Institute of Chemistry and Cell Biology (P.K.S.), and NIH grant GM068762 (P.K.S.). J.R.S. is a Wellcome Trust Senior Research Fellow.

References

1. Phair RD, Misteli T: Kinetic modelling approaches to in vivo imaging. Nat Rev Mol Cell Biol 2001, 2:898-907.
2. Eils R, Athale C: Computational imaging in cell biology. J Cell Biol 2003, 161:477-481.
3. Lippincott-Schwartz J, Snapp E, Kenworthy A: Studying protein dynamics in living cells. Nat Rev Mol Cell Biol 2001, 2:444-456.
4. Wouters FS, Verveer PJ, Bastiaens PI: Imaging biochemistry inside cells. Trends Cell Biol 2001, 11:203-211.
5. Ponti A, Machacek M, Gupton SL, Waterman-Storer CM, Danuser G: Two distinct actin networks drive the protrusion of migrating cells. Science 2004, 305:1782-1786.
6. Huang K, Murphy RF: Boosting accuracy of automated classification of fluorescence microscope images for location proteomics. BMC Bioinformatics 2004, 5:78.
7. Hu Y, Murphy RF: Automated interpretation of subcellular patterns from immunofluorescence microscopy. J Immunol Methods 2004, 290:93-105.
8. Yarrow JC, Feng Y, Perlman ZE, Kirchhausen T, Mitchison TJ: Phenotypic screening of small molecule libraries by high throughput cell imaging. Comb Chem High Throughput Screen 2003, 6:279-286.
9. Simpson JC, Wellenreuther R, Poustka A, Pepperkok R, Wiemann S: Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Rep 2000, 1:287-292.
10. Conrad C, Erfle H, Warnat P, Daigle N, Lorch T, Ellenberg J, Pepperkok R, Eils R: Automatic identification of subcellular phenotypes on human cell arrays. Genome Res 2004, 14:1130-1136.
11. Swedlow JR, Goldberg I, Brauner E, Sorger PK: Informatics and quantitative analysis in biological imaging. Science 2003, 300:100-102.
12. Huang K, Lin J, Gajnak JA, Murphy RF: Image content-based retrieval and automated interpretation of fluorescence microscope images via the Protein Subcellular Location Image Database. Proc IEEE Symp Biomed Imaging 2002:325-328.
13. Carazo JM, Stelzer EH, Engel A, Fita I, Henn C, Machtynger J, McNeil P, Shotton DM, Chagoyen M, de Alarcon PA, et al.: Organising multi-dimensional biological image information: the BioImage Database. Nucleic Acids Res 1999, 27:280-283.
14. Schuldt A: Images to reveal all? Nat Cell Biol 2004, 6:909.
15. Open Microscopy Environment [http://openmicroscopy.org]
16. MGED NETWORK: MGED Ontology [http://mged.sourceforge.net/ontologies/MGEDontology.php]
17. Gene Ontology [http://www.geneontology.org]
18. MGED NETWORK: MISFISHIE Standard Working Group [http://mged.sourceforge.net/misfishie]
19. OBO Main [http://obo.sourceforge.net]
20. EAMNET [http://www.embl-heidelberg.de/eamnet/html/downloads.html]
21. Murphy RF: Automated interpretation of protein subcellular location patterns: implications for early cancer detection and assessment. Ann NY Acad Sci 2004, 1020:124-131.
22. Sourceforge.net: Project Info - LSID [http://sourceforge.net/projects/lsid]
23. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al.: Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nat Genet 2001, 29:365-371.
24. Open Microscopy Environment OME: XML Schema 1.0 [http://openmicroscopy.org/XMLschemas/OME/FC/ome.xsd]
25. Schema Doc: ome.xsd [http://openmicroscopy.org/XMLschemas/OME/FC/ome_xsd/index.html]
26. XML Schemata: OME XML Overview [http://openmicroscopy.org.uk/api/xml/OME]
27. Extensible Markup Language (XML) [http://www.w3.org/XML]
28. OWL Web Ontology Reference Language [http://www.w3.org/TR/owl-ref]
29. Hunter J, Drennan J, Little S: Realizing the hydrogen economy through semantic web technologies. IEEE Intell Syst 2004, 19:40-47.
30. GNU Lesser General Public License [http://www.gnu.org/copyleft/lesser.html]
31. Open Microscopy Environment: CVS (UK) [http://cvs.openmicroscopy.org.uk]
32. About OME - Commercial Partners [http://www.openmicroscopy.org/about/partners.html]
33. Andrews PD, Harper IS, Swedlow JR: To 5D and beyond: quantitative fluorescence microscopy in the postgenomic era. Traffic 2002, 3:29-36.
34. Remote Framework - Introduction [http://openmicroscopy.org.uk/api/remote]

The Open Microscopy Environment (OME) Data Model and XML file: open tools for informatics and quantitative analysis in biological imaging

Loading next page...
 
/lp/springer-journals/the-open-microscopy-environment-ome-data-model-and-xml-file-open-tools-x0t8Jj24FQ

References (53)

Publisher
Springer Journals
Copyright
2005 Goldberg et al.; licensee BioMed Central Ltd.
eISSN
1474-760X
DOI
10.1186/gb-2005-6-5-r47
Publisher site
See Article on Publisher Site

Abstract

The Open Microscopy Environment (OME) defines a data model and a software implementation to serve as an informatics framework for imaging in biological microscopy experiments, including representation of acquisition parameters, annotations and image analysis results. OME is designed to support high-content cell-based screening as well as traditional image analysis applications. The OME Data Model, expressed in Extensible Markup Language (XML) and realized in a traditional database, is both extensible and self-describing, allowing it to meet emerging imaging and analysis needs. the sample [1-4]. Numerical analytic methods extract infor- Rationale Biological microscopy has always required an 'imaging' capa- mation from quantitative image data that cannot be gleaned bility: traditionally, the image of a sample was drawn on by simple inspection [5-7]. Growing interest in high-through- paper, or with the advent of light-sensitive film, recorded on put cell-based screening of small molecule, RNAi, and expres- media that conveniently allowed reproduction. The advent of sion libraries (high-content screening) has highlighted the digital detectors in microscopy has progressively expanded large volume of data these methods generate and the require- imaging capacity, transforming the biological microscope ment for informatics tools for biological images [8-10]. into an assay device that linearly measures the flux of light at different points in a cell or tissue. Almost all the vast clinical In its most basic form, an image-informatics system must and research applications of digital imaging microscopy treat accurately store image data obtained from microscopes with the recorded microscope image as a quantitative measure- a wide range of imaging modes and capabilities, along with ment. This is especially true for fluorescence or biolumines- accessory information (termed metadata) that describe the cence, where the signal recorded at any point in the sample experiment, the acquisition system, and basic information gives a direct measure of the number of target molecules in about the user, experimenter, date, and so on [11,12]. At first Genome Biology 2005, 6:R47 R47.2 Genome Biology 2005, Volume 6, Issue 5, Article R47 Goldberg et al. http://genomebiology.com/2005/6/5/R47 glance, it might appear that these requirements can be met by images related to specific publications has been proposed applying some of the tools that underpin modern biology, [13,14], but this cannot happen without adaptable data man- such as the informatics approaches developed for genomics. agement systems in each lab or facility. The only viable However, it is worth comparing a genome-sequencing exper- approach is the provision of a standardized data model that iment to a cellular imaging experiment. In genomics, knowl- supports local extensibility. Local instances of the data model edge of the type of automated sequencer that was used to that store site-specific data and manage access to it must be determine the DNA sequence ATGGAC... is not necessary to provided along with a mechanism for data sharing or migra- interpret the sequence. Moreover, the result ATGGAC... is tion between sites. These requirements are shared by other deterministic - no further analysis is required to 'know' the data-intensive methodologies (for example, mass spectrome- sequence, and in general, the same result will be obtained try and two-dimensional gel electrophoresis). Thus, a major from other samples from the same organism. 
Instead, it serves as a neutral broker among a multitude of otherwise incompatible software tools.

Finally, the data model used in any imaging system varies from site to site, depending on the local experimental and acquisition system.
It can also change over time, as new acquisition systems, imaging technologies, or even new assays are developed. The development and application of new imaging techniques and analytic tools will only accelerate, but the requirement for coherent data management and adaptability of the data model remain unsolved. It is clear that a new approach to data management for digital imaging is necessary. It might be possible to address these problems using a single image data standard or a central data repository, but a single data format specified by a standards body breaks the requirement for local extensibility and would therefore be ignored.

In our previous work [11], we described the conceptual foundation for an image informatics system. In this report we describe the implementation of this system, including details of the OME XML file format, a description of how images are represented both in the file format and in the data model, and the application of semantic types for metadata extensibility as well as their use in modular image analysis; we also describe recently developed software that makes use of this system and is targeted at end-users. The current version of OME focuses on fluorescence microscopy, but the underlying schema and file specifications can be extended to support any type of microscope image. The OME XML file format has already gained acceptance within the microscopy community. At the time of writing, two companies support the format in their current commercial offerings (Applied Precision, Issaquah, WA and Bitplane, Zurich, Switzerland), and it has been proposed as a standard recommendation for image data migration by the European Advanced Microscopy Network [20]. Immediate applications for OME within biomedical research include the characterization of dynamic cell and tissue structures for basic research, high-content cell-based screening and high-performance clinical microscopy.

Definition of an image

All imaging experiments occur within specific temporal and spatial limits. In OME, we define an image as a five-dimensional (5D) structure containing multiple two-dimensional (2D) frames (Figure 1a). Each frame has dimensions (x, y) that correspond to the image plane of the microscope and is recorded from an array detector (for example, a CCD camera in a wide-field microscope) or generated by a two-dimensional raster scan (for example, a laser scanning confocal microscope). Each frame has a specified focal position z, a wavelength, or more generally channel, c, and a timepoint t. The extent of a 5D-image is unlimited. The time and channel dimensions may be continuous or discrete. For example, the image may contain an entire spectrum at each pixel, as in Fourier Transform Infrared (FTIR) imaging, or it may consist of a set of discrete wavelengths, as is common in fluorescence microscopy. Similarly, there may be a continuous series of time points that are evenly spaced, as in a video stream, or the image may contain unevenly spaced, discrete time points. Images that are not continuous in space are treated as separate images even though they may be part of the same experiment. For example, visiting several places on a microscope slide or a microtiter plate will result in as many separate images. Finally, the meaning of the pixel values recorded in each frame is determined by the imaging method performed (Figure 1b).
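The 5D structure described here maps naturally onto an array indexed by x, y, z, c and t. The following is a minimal illustrative sketch only; the axis order and names are our own choice for the example and are not part of the OME specification, and NumPy is assumed to be available.

import numpy as np

# A minimal 5D "image" in the sense described above: a stack of 2D (x, y)
# frames indexed by focal position z, channel c and timepoint t.
# The axis order (t, c, z, y, x) is an arbitrary choice for this sketch.
SIZE_T, SIZE_C, SIZE_Z, SIZE_Y, SIZE_X = 3, 2, 10, 512, 512
image_5d = np.zeros((SIZE_T, SIZE_C, SIZE_Z, SIZE_Y, SIZE_X), dtype=np.uint16)

def get_plane(img, t, c, z):
    """Return the single 2D frame recorded at timepoint t, channel c, focus z."""
    return img[t, c, z]

plane = get_plane(image_5d, t=0, c=1, z=5)
print(plane.shape)  # (512, 512): one frame from the detector or raster scan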
The OME Data Model

To solve the problems of data interoperability and extensibility, we have developed a definition, or ontology, of the different data types and relationships included in an imaging experiment. The OME Data Model integrates binary image data with all information regarding image acquisition and processing, and any results generated during analysis. In this way, all aspects of the data acquisition, processing, and analysis remain linked and can be used by any analysis or visualization application. Groups of Images can be organized into 'Datasets' and 'Projects'. (Throughout this paper, when referring specifically to OME objects (such as Projects, Datasets, Images, Pixels, and Features), they are capitalized.) Datasets are user-defined groups of images that are always analyzed together: an example would be the images from a single immunofluorescence experiment. An image may belong to one or more datasets. Projects in turn are collections of datasets, and any given dataset may belong to one or more projects. Each project and dataset has its own name, description and owner.

The OME Data Model allows for other types of image collections. Explicit support is included for high-content assays (HCAs) conducted on microtiter plates or other arraying formats. In this case, the OME Data Model allows for an additional grouping hierarchy: 'Plates', 'Screens', 'Wells', and 'Samples'. Samples are groups of images from one well, Plates are groups of Wells, and Screens are groups of Plates. Just like Projects and Datasets, each level of the hierarchy has its own set of identifiers. It is also possible for a given plate to belong to multiple screens, thereby providing a logical mechanism for reuse of the same collection of data for different analyses. Similarly, a mechanism is provided for categorizing images into arbitrary user-defined groups.

An additional level of hierarchy below images in the OME Data Model is 'Features'. Although there is some conflict of nomenclature between machine learning and traditional image analysis over what is considered an image feature, in OME's case image features are 'regions' in an image (for example, cells or nuclei); numerical descriptors used for classification are then referred to as 'Signatures' [21]. The OME Data Model allows features to contain other features, so that, for example, the relationship between a cell, a nucleus and a nucleolus can be expressed. At present, we do not specify an ontology for the kinds of information an image feature may contain. Any information obtained by segmentation algorithms, or other algorithms that define Features, is stored using the data model's extensibility mechanism (see Semantic types below).
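To make the containment relationships concrete, the sketch below expresses the two grouping hierarchies described above as plain Python data structures. The class and field names are ours, chosen only to mirror the object names in the text; the many-to-many links (an Image in several Datasets, a Plate in several Screens) are shown simply as shared references in lists.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Feature:            # a region within an image, for example a cell or a nucleus
    name: str
    children: List["Feature"] = field(default_factory=list)  # e.g. a nucleolus inside a nucleus

@dataclass
class Image:
    name: str
    features: List[Feature] = field(default_factory=list)

@dataclass
class Dataset:            # images that are always analyzed together
    name: str
    images: List[Image] = field(default_factory=list)

@dataclass
class Project:            # a collection of datasets
    name: str
    datasets: List[Dataset] = field(default_factory=list)

# Screening hierarchy: Screens group Plates, Plates group Wells,
# and Wells group Samples (images from one well).
@dataclass
class Sample:
    images: List[Image] = field(default_factory=list)

@dataclass
class Well:
    samples: List[Sample] = field(default_factory=list)

@dataclass
class Plate:
    wells: List[Well] = field(default_factory=list)

@dataclass
class Screen:
    plates: List[Plate] = field(default_factory=list)

# An image may appear in more than one dataset (and a plate in more than one screen):
img = Image("cell_001")
d1, d2 = Dataset("control", [img]), Dataset("all_timepoints", [img])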
Semantic types

All information in the OME Data Model can be reduced to 'semantic types' (STs). In most ways, an ST is merely a name or label given to a piece of information, but in OME it has additional consequences. STs can describe information at four levels in the OME hierarchy: Global, Dataset, Image and Feature. Global STs are used to describe 'Experimenters', 'Groups', 'Microscopes', and so on - items that are applicable to all images in an OME database. Dataset STs are used to describe information about datasets - information pertinent to a collection of images. Image STs describe information pertinent to images, and Feature STs describe information about image features - objects or 'blobs' within images. In our nomenclature, the data type is an ST, and the data itself is an attribute. For example, the 'Pixels' data type is an Image ST, and a particular set of Pixels is an attribute of a particular Image. Throughout this paper, XML elements defined in the OME XML schema are placed within angle brackets (<>).

Figure 1. The mode of acquisition defines the pixel image data. The meaning of a 2D-image recorded from a digital microscope imaging system varies depending on how it is collected. Almost all of the different modes in (a) and (b) can be combined to analyze cell structure and behavior. All of the parameters and configurations must be recorded for the interpretation of the pixel data in an image. (a) The spatial, spectral and temporal context of an image (optical sections, spectral coding, timelapse; changes in focus, wavelength, time and stage position) is used to generate more information about the cell under study. Changing stage position, focus, spectral range or time of imaging all expand the meaning of an image. Modified from [33]. (b) The two aspects of the image data collection that define the pixel data: the contrast method (brightfield, phase, DIC, Hoffman modulation, oblique illumination, polarized light, darkfield, fluorescence) and the imaging mode (wide-field, laser scanning confocal, spinning disk confocal, multi-photon, structured illumination, single molecule, total internal reflection, fluorescence lifetime, fluorescence correlation, second harmonic generation). A variety of methods are used to generate contrast in modern biological imaging, and the imaging method used to record the data also has meaning.

Data model extensibility

Standardizing access to data solves many problems, but could severely limit the types of data that might be stored. Because it is not possible to define a priori what kinds of imaging experiments and analyses will be performed, it is not possible to design a data model to contain this information ahead of time. For this reason, we have included a mechanism for describing new types of data in the OME Data Model. As one of our goals is to define a common ontology for light microscopy, the STs that make up this ontology are part of the 'core set', whereas other STs can be locally defined to address evolving imaging needs. Since the data model contains its own description, it can be extended in arbitrary ways. As these extensions become commonly used, the STs that define them can be incorporated into the core set. The initial core set is concerned chiefly with acquisition parameters, so that image data can be interpreted unambiguously. As the project evolves, analytical STs will be incorporated into the core set in order to achieve interoperability not only at the level of interpreting raw image data, but also at the level of interpreting image analysis results.
Consider an example where a commercial software vendor might specify additional metadata in the timing information for acquisition of Z sections in an XYZ 3D stack of image planes. As the timing information would pertain to specific images, this new data type would be declared as an Image ST. More specifically, since the timing information pertains to individual planes within the 5D Image, a set of plane indexes would be included in the definition referring to a specific plane. The timing information itself can be expressed as a delta-time or an absolute time (or both), and may have units that are either implied or made explicit. Regardless of how the timing is expressed, it is understood that any software that uses this newly declared ST agrees on the convention adopted and the precise meaning of the data it represents. This agreement on meaning allows any software application to exchange acquisition timing information with any other.

Using OME XML (see 'The OME XML file' below), this declaration would be stored in the <SemanticTypeDefinitions> element in the XML document, while the timing information itself (the attributes) would be stored under the <CustomAttributes> element for the specific image. The names of the elements under <CustomAttributes> match the names of the STs, and the data itself goes into the element's attributes. For example:

<CustomAttributes>
    <AcquisitionTiming theZ='0' theC='0' theT='0' deltaT='0.001'/>
</CustomAttributes>

Importantly, our open-source implementation of OME (see below) will automatically expand its database schema when it comes across an ST definition, and will populate the resulting tables when it comes across the data in <CustomAttributes>. This approach allows for immense flexibility in the ontologies OME can support.
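As a rough illustration of what 'expanding the database schema when it comes across an ST definition' could involve, the sketch below turns a hypothetical semantic-type description into a SQL table definition. The dictionary layout, type names and table naming here are invented for the example and are not the representation used by the OME server.

# Hypothetical ST description: a name plus (element, SQL type) pairs.
acquisition_timing_st = {
    "name": "AcquisitionTiming",
    "elements": [("theZ", "INTEGER"), ("theC", "INTEGER"),
                 ("theT", "INTEGER"), ("deltaT", "DOUBLE PRECISION")],
}

def create_table_sql(st, level="IMAGE"):
    """Build a CREATE TABLE statement for a semantic type at a given hierarchy level."""
    cols = [f"{name} {sqltype}" for name, sqltype in st["elements"]]
    # Every attribute row is linked back to the object it describes (here, an image).
    cols.insert(0, f"{level.lower()}_id INTEGER NOT NULL")
    return f"CREATE TABLE {st['name'].lower()} ({', '.join(cols)});"

print(create_table_sql(acquisition_timing_st))
# CREATE TABLE acquisitiontiming (image_id INTEGER NOT NULL, theZ INTEGER, ...)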
IDs and references

OME has adopted the Life Science ID (LSID) system of data registration [22]. Since LSIDs are universally unique, every piece of information stored using the OME Data Model can be traced to its source, regardless of how it was produced. Every OME element that has an ID attribute may follow the LSID format, but this is not a requirement. If a particular ID does not follow the LSID format (it does not start with 'urn:lsid:'), it must be assumed to refer to a 'brand new' object. While this is a valid assumption for data, it may not be valid for an instrument description. For this reason, actual globally unique LSIDs are preferred whenever possible, especially for global data (such as Experimenters, Screens, Plates, and Microscopes). If the object is identified with a proper LSID, it can be referred to from other documents. In this way, a single document can be used to describe a microscope and its components, and subsequent documents containing images can refer to these components by LSID. There are open-source implementations of LSID servers (resolvers) and clients developed by IBM Life Sciences available online [22] that make it possible to resolve an LSID remotely. Although we plan to incorporate LSID resolution into OME software tools, at the time of writing support for LSIDs is only incorporated into the OME Data Model.

The globally unique nature of LSIDs allows OME to trace every piece of information back to its origin. Provenance and data history will be discussed in a future report detailing the OME analysis system, but the use of LSIDs and a representation of data history are sufficient to determine the origin of every piece of information about an image: from precisely where, when and how the image was acquired, through any analysis that was done, to any structured information or conclusions that were derived as a result of analysis. LSIDs allow preservation of this chain of provenance regardless of the number of intermediate documents, and of the proprietary or open-source OME-compatible software systems that operated on this information.
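The check implied by the text - an ID is treated as an LSID only if it starts with 'urn:lsid:' - is simple to state in code, as in the sketch below, which also splits an identifier into the usual authority, namespace, object and optional revision fields. This is a convenience illustration only, not the resolver code referenced in [22], and the example authority and namespace are made up.

def is_lsid(identifier: str) -> bool:
    # Per the convention described above: anything not starting with
    # 'urn:lsid:' is treated as a brand-new, locally defined object.
    return identifier.lower().startswith("urn:lsid:")

def split_lsid(identifier: str):
    """Split an LSID into authority, namespace, object and optional revision."""
    if not is_lsid(identifier):
        raise ValueError("not an LSID: " + identifier)
    parts = identifier.split(":")[2:]          # drop the 'urn' and 'lsid' prefixes
    authority, namespace, obj = parts[0], parts[1], parts[2]
    revision = parts[3] if len(parts) > 3 else None
    return authority, namespace, obj, revision

# A made-up identifier in LSID form (authority and namespace are illustrative only):
print(split_lsid("urn:lsid:example.org:OME_Instrument:12345"))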
The OME XML file

The OME Data Model serves as the foundation of two tools we have developed to address the requirement for extensible image data management. The first addresses the absence of a universally recognized image data file format. We have built an XML-based implementation of the OME Data Model that can be used by manufacturers of acquisition hardware and developers of image-processing and analysis software who may not want to invent their own image format. With this definition, it is possible to specify a minimal set of commonly used parameters during image acquisition in light microscopy, analogous to the MIAME standard that defines a minimal set of information about microarray experiments [23]. All the characteristics of the OME Data Model described above are reproduced in the OME XML file. Along with each 5D image (that is, the binary pixels), the OME XML file contains all of the associated metadata. The OME file schema [24] and the full documentation for the schema [25] are available online. A description of how the schema is designed and its relationship to other OME schemas is also available online [26]. Figures 2, 3 and 4 highlight some of the features of the schema. In these figures, the highest level in the schema is on the left side of the diagram, and the elements defined in it are read moving from left to right.

Why XML?

The structure of the OME XML document is defined in XML Schema, a standard language for defining XML document structure [27]. The use of XML and a publicly available schema allows OME documents to be used in several ways that are not possible with current image formats. For example, modern browsers incorporate XML parsers and are able to display the information contained in XML with the use of a style sheet, allowing customized display of data in the document using a standard browser without additional software. The use of XML also allows us to take advantage of its growing popularity in various unrelated fields, and of the great deal of software written for XML, including databases, editing tools, and parsing libraries. Finally, and perhaps most important, XML is a plain-text format. As a last resort, it can be opened in any text editor and the information it contains can simply be read by a person. This inherent openness is one of its most desirable features for representing scientific data.

Defining the OME file using XML Schema has other advantages. The document structure is specified in a form that can be parsed, which allows third-party software to validate XML documents against our published schema. This formal specification allows other parties to implement the format without the potential misunderstanding and incompatibility that are common with textual descriptions of file formats. For example, several manufacturers are either developing or have developed support for the OME file format independently of each other and, to a large extent, independently of our group of developers. No exchange of intellectual property or reverse engineering is necessary to accomplish this. The XML Schema is the definitive documentation for reading and writing OME XML files, used in the same way by third-party developers for proprietary software as by ourselves for our own open-source implementation.
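Because the schema is published [24], validation can be performed with any XML Schema-aware library. The sketch below uses the third-party lxml package and local copies of the schema and a document; the file names are placeholders, and whether a given document validates will of course depend on the schema version in use.

from lxml import etree  # third-party library; any XML Schema validator would do

# Assumed local copies of the published schema [24] and of a document to check.
schema = etree.XMLSchema(etree.parse("ome.xsd"))
document = etree.parse("my_experiment.ome.xml")

if schema.validate(document):
    print("document conforms to the published OME schema")
else:
    # schema.error_log lists the offending elements and attributes
    print(schema.error_log)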
Figure 2. High-level view of the elements in the OME file schema. This figure (and Figures 3 and 4) should be read from left to right. A data type (for example, OME) is defined by a number of elements; in this case, OME is defined by Project, Dataset, Experiment, Image, and so on. Each of these elements can be defined by their own individual elements. The Image and Instrument elements are expanded in Figures 3 and 4. The full XML schema is available [24], as is the full documentation for the schema [25]. Symbols: +, one or more elements of this type; ?, optional element or attribute; *, zero or more elements of this type; 1, choose one from a list of elements; D, the value of this element/attribute is constrained to one of several values, a range, or a text pattern (see the online documentation for more details [25]).

There are a few disadvantages to XML worth considering. A commonly perceived weakness of XML is that its human-readable design is often at odds with the storage of binary data. Since the bulk of an image file is represented by the pixels in the image and not the metadata, this might be perceived as a serious problem. A related problem is that XML is verbose: XML files are often much larger than their binary equivalents, and image files are already quite large. The proposed format addresses these two concerns by storing binary data in plain text and reducing file size using compression.

The standard approach to representing binary data in XML is the use of base64 encoding. A 24-digit base 2 binary number (three bytes) is converted to a 4-digit base 64 number (four bytes), with each digit represented as a text character drawn from the numbers, upper- and lowercase letters and two punctuation marks. This conversion inflates the size of the binary data by 25%. To mitigate this increase in size, OME XML specifies compression of the pixels on a per-plane basis in either bzip2 or gzip, both patent-free compression schemes available in open-source form online. Owing to the high compressibility of image data, OME XML files are in practice much smaller than their equivalents in other formats, usually a half to a third the size of the uncompressed binary data. Because the compressed stream is still encoded in base64, it still incurs the 25% overhead, but on a much smaller piece of binary data. Of course, text is itself easily compressed, and the gzip format is a standard encoding for XML, so any XML software library will transparently read and write these compressed files, even though the compressed file will no longer be readable by standard text editors. However, this secondary compression will only eliminate the base64 encoding overhead - it will not further compress the already compressed planes.
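The encoding of a single plane can be reproduced with nothing more than the Python standard library, as in the sketch below. The surrounding element name (<BinData>) follows the description later in the text, but the exact attributes expected by the schema should be taken from the published documentation [24,25] rather than from this example.

import base64
import gzip

def encode_plane(raw_bytes: bytes) -> str:
    """Compress one 2D plane with gzip, then base64-encode it for embedding in XML."""
    return base64.b64encode(gzip.compress(raw_bytes)).decode("ascii")

def decode_plane(text: str) -> bytes:
    """Reverse the encoding: base64-decode, then decompress."""
    return gzip.decompress(base64.b64decode(text))

plane = bytes(512 * 512 * 2)                    # one 512 x 512 frame of 16-bit zeros
encoded = encode_plane(plane)                   # the text that would sit inside <BinData>
assert decode_plane(encoded) == plane
print(len(plane), "raw bytes ->", len(encoded), "characters of base64 text")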
There are limitations to the use of this compression scheme. Performing the compression on a per-plane basis allows only limited random access to the planes: the entire XML file need not be kept in memory in order to access arbitrary planes by index, but a file offset cannot be calculated for a given plane because the planes differ in size once compressed. Instead, the entire file has to be scanned first in order to determine the file offsets for each plane index. It is important to note that the primary goal of the OME XML file format is not raw performance but interoperability above all else, using widely accepted standards and practices for information exchange. As the OME XML file format has gained acceptance, a demand for a high-performance variant has begun to emerge, and we are examining several possibilities that preserve the metadata structure we have defined but allow rapid reading and writing from disc.

Schema overview

Figure 2 shows the main elements of the OME XML file schema. As discussed above, each image is defined as being part of a dataset and project and, when necessary, a given plate and screen. The stored data are also related to the experimenter who collected the data and his or her group. Any additional types of global data, including customized or vendor-specific data, can be defined at this level. Images and Instruments are defined as discussed below. Many of the elements contain IDs that uniquely identify that data element (for example, Experimenter, Dataset). If these identifiers follow the LSID format, they are considered globally unique and can be used as references between other OME XML documents or remote OME installations.

This format allows an arbitrary number of images to be described and their relationships and grouping patterns specified in a single document. Conversely, the file may describe only the imaging equipment, users, or other parameters at a given site and not contain any images; subsequent documents can refer to these items by LSID. Or, as is done in other formats, the file can be used to specify a single image and its accompanying metadata. As any information not specified in the schema must be represented as well, a section is dedicated to defining new types of information (<SemanticTypeDeclarations>). The information itself is specified at the appropriate hierarchy level within the <CustomAttributes> elements that exist in <OME>, <Dataset>, <Image> and <Feature>.

The least developed aspect of the OME schema is the Experiment description. Although clearly a critical part of the metadata, the design of this ontology is under development by many other groups (for example, MIAME/MAGE, Gene Ontology (GO), Proteomics Standards Initiative (PSI), and the minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE)) [16-19], and we are experimenting with several scenarios for merging these efforts with OME. At present, several of these projects, including OME, are evaluating the new Web Ontology Language (OWL) recommendation from the World Wide Web Consortium (W3C) to standardize ontology specification for the Semantic Web initiative [28]. At the moment, Experiment is defined in simple unstructured text entered by the user. This situation reflects our goals of not only defining a data model or ontology, but also building the tools for using that model in demanding, experimentally relevant, data-intensive applications. However, it is worth noting that a separate group has represented the OME Data Model within the Resource Description Framework (RDF) and has begun using this implementation [29]. We are currently studying an implementation of OME in OWL, and whether an RDF-based system provides the performance required for large-scale imaging applications.
Figure 3. The Instrument element in the OME file schema. The data elements that define the acquisition system parameters are shown. For these descriptions, we have incorporated suggestions from many colleagues and commercial partners [32]. Symbols are as in Figure 2.

The OME Instrument type

The OME Instrument type (Figure 3) provides a description of the data-acquisition instrument and defines the actual instrument as well as available configuration choices such as the objective lens, detector, and filter sets. Instrument also defines the use and configuration of lasers or arc lamps and includes a specification for a secondary illumination source (for example, a photoablation laser). Once defined in the Instrument, the specific components used to acquire an image (or a channel within an image) are referenced from within the Image or its ChannelInfo elements. The <Instrument> element is meant to define a static instrument composed of several components: one or more light sources, one or more detectors, filters, objectives, and so on. Because it does not change from image to image and has a globally unique LSID, it does not need to be defined in every OME file with images collected from it. The Image elements within the OME file contain references to the instrument's components along with any necessary parameters for their use (that is, detector gain). The Instrument may also contain several optical transfer functions (OTFs), which can be referred to from the ChannelInfo element, allowing each channel within a set of pixels to specify its own OTF.

The OME Image type

The OME Image type (Figure 4) provides a description of the structure, format, and display of the image data. There are references to the light source, spectral filtering, imaging method, and display settings used for each channel. The actual binary data, referred to as 'Pixels', are also stored in this part of the schema. A set of Pixels is a 5D-structure containing multiple 2D-frames collected across focus (z), wavelength or channel (c), and time (t), as described above. Sets of Pixels that are not continuous in space are treated as separate images even though they may be part of the same experiment.

The Image's binary pixels are compressed and encoded in base64 as described above, with one plane per <BinData> element. The schema allows for more than one set of Pixels in an Image: a given image may consist of the original 'raw' pixels and a set of processed pixels, as is often done for deconvolution or restoration microscopy. Because these two sets of pixels share the same acquisition metadata, they are grouped together in the same image.

A critical feature in this specification is a definition of what the data stored in 'Pixels' actually mean. The meaning of the pixels is stored as three attributes in <ChannelInfo>: Mode, ContrastMethod, and IlluminationType. Mode describes the microscopy method used to generate the pixels, and can take on values such as 'Wide-field', 'Laser-scanning confocal', and so on. ContrastMethod describes how contrast is developed in the type of microscopy used and can contain terms such as 'BrightField', 'DIC', or 'Fluorescence'. The IlluminationType attribute describes how the sample was illuminated and can contain values of 'Transmitted', 'Epifluorescence', and 'Oblique'. Together these terms and their controlled vocabulary describe how the pixels were acquired. Each <ChannelInfo> has several internal elements that allow further refinement of the acquisition parameters by referring to components defined in the <Instrument>, such as filters and light sources. Each channel in the image has its own <ChannelInfo>, allowing the description of multimode images.

The metadata associated with a channel have an additional important feature made possible by the nested <ChannelComponent> element. In a fluorescence experiment, each fluorescence channel would be described by a <ChannelInfo>, and each of these would contain a single <ChannelComponent> referring to an index in the c dimension of the Pixels. However, in several imaging modes, each channel may contain several components. For example, in fluorescence-lifetime imaging, each fluorescence channel may contain 128 bins of fluorescence-lifetime data, and the image may consist of lifetime measurements for several fluorescence channels. In this case, each fluorescence channel would still be represented by a single <ChannelInfo>, but each of those would have 128 <ChannelComponent>s. This allows the channel dimension to effectively represent two dimensions - a logical channel containing all of the metadata, and one or more components representing the actual data. The same mechanism can be used to represent data from FTIR imaging.
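The following is a hedged sketch of the channel description discussed above, built with the Python standard library. The element and attribute names follow those mentioned in the text (<ChannelInfo>, <ChannelComponent>, Mode, ContrastMethod, IlluminationType), but their exact spelling, nesting and required attributes should be checked against the published schema [24,25] rather than taken from this example.

import xml.etree.ElementTree as ET

pixels = ET.Element("Pixels", SizeX="512", SizeY="512", SizeZ="10",
                    SizeC="2", SizeT="3")

# One logical channel; in fluorescence-lifetime imaging it would instead own
# many ChannelComponents, one per lifetime bin, as described above.
channel = ET.Element("ChannelInfo", Name="GFP", Mode="Wide-field",
                     ContrastMethod="Fluorescence",
                     IlluminationType="Epifluorescence")
ET.SubElement(channel, "ChannelComponent", Index="0")  # index into the c dimension

image = ET.Element("Image", Name="example")
image.append(channel)
image.append(pixels)
print(ET.tostring(image, encoding="unicode"))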
Updating the OME file specification

The OME XML file has been developed with input from the OME consortium and a number of commercial partners (see Figure 3 legend). However, the specification for this format is incomplete and doubtless will be updated to accommodate unanticipated requirements. Moreover, as new data-acquisition methods develop, new data semantics and elements will be required. Modifications to the specification for this file must, however, occur in stages, preceded by announcements, if it is to be used as an export format. The OME file allows modifications to the schema to be implemented and tested through the Custom Attributes type. Proposed new types and elements can be tested and modified there and then, when fully worked out and agreed upon by the OME community, merged into the main schema.

The OME database

It is formally possible to use a library of OME XML files as a data warehouse. A true image informatics system, however, must also maintain a record of all transactions with the data warehouse, including all data transformations and analyses. Storing and recording image data is a first step; a defined set of interfaces and access methods to the data must also be provided. For this reason, we have developed a second implementation of the OME Data Model as a relational database that is accessed using a series of services and interfaces. All of these tools are open source and licensed under the GNU Lesser General Public License (LGPL) [30]. The initial design has been described previously [11] and a description of more recent updates is available [15]. Image metadata are captured by the OME database when it imports a recognized file format, and are then available either by accessing the database directly or through a variety of interfaces into the OME database. These will be the subject of a future publication, but source code and documentation are available [31]. An important consequence is that all commonly available types of metadata are stored in common tables. It is not necessary to know the format of the underlying file in order to access this information. For example, to find the exposure time for a particular image, one would look in the same table regardless of the commercial imaging system used to record the data.

Figure 4. The Image element in the OME file schema. The data elements that define an image in the OME file are shown. These include the image itself (Pixels), and a variety of characteristics of the image data and display parameters. Symbols are as in Figure 2.

The use of an OME database as a record of all data transformations contrasts with the standard approach to image processing. In a stand-alone analysis program, data relationships are specified by the programmer and are therefore 'hard-coded'. The results, while useful, do not usually link to the original data or other analyses. In an OME database, an identical algorithm can be used, but the resulting data are returned to the database and are linked to the algorithm that produced them. A subsequent analysis can gather its inputs from the database as well, without having to link directly to the previous algorithm. The links between measurements, results and the image data can be incorporated into other analyses defined by the user, and trends and relationships between these can easily be tested. Most important, the complete transactional record of data elements is known and available, in effect creating a transfer function for data analysis. This kind of data provenance for biological microscopy has sometimes resided in lab notebooks, sometimes been coded in filenames, and sometimes simply been retained only in experimenters' memories. With OME, it is finally stored, managed, and available in a generally accessible form.
To function as planned, OME must ensure that the requirements of different processing and analysis tools are satisfied before execution. To accomplish this, STs are used to govern what kinds of information can flow between analysis modules. In OME, analysis modules can exchange information only if the output of one has the same ST as the input of the next. This principle means that information can flow only between logically and semantically similar data types, not simply between numerically similar data types. This ensures that users employ algorithms in a logically consistent manner without necessarily having an intimate knowledge of the algorithm itself. We have used this concept to implement a user tool called 'Chain Builder' (Figure 5a). This Java tool accesses the STs in an OME database and allows a user to 'chain' analysis modules together, linking separate modules by matching the output STs of one module with the input STs of the next. Thus OME uses 'strong semantic typing', not only to store and maintain data and metadata, but also to define permitted workflows and potential data relationships.
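The matching rule that Chain Builder enforces - an output can feed an input only when the two carry the same semantic type - is easy to state in code. In the sketch below the module and ST names are invented for the illustration; it is not the Chain Builder implementation.

# Each analysis module declares the STs it consumes and produces.
modules = {
    "FindSpots":        {"inputs": {"Pixels"},        "outputs": {"SpotLocations"}},
    "TrackSpots":       {"inputs": {"SpotLocations"}, "outputs": {"Trajectories"}},
    "PlotTrajectories": {"inputs": {"Trajectories"},  "outputs": set()},
}

def can_link(producer: str, consumer: str) -> bool:
    """Two modules may be chained only if an output ST of one is an input ST of the other."""
    return bool(modules[producer]["outputs"] & modules[consumer]["inputs"])

print(can_link("FindSpots", "TrackSpots"))        # True: SpotLocations matches
print(can_link("FindSpots", "PlotTrajectories"))  # False: no common semantic type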
Figure 5b shows a second example of the use of STs. In this example, a data manager (Figure 5b, left) displays the Projects, Datasets, and Images belonging to one OME user. Right-clicking a Dataset opens a Dataset browser (Figure 5b, middle) that displays image thumbnails obtained from the OME database. The browser accesses data associated with specific STs to define how an array of thumbnails should be presented to the user; in this case, the cell-cycle position of the cell in each image is used to define the layout (a more in-depth description of this tool is in preparation). Finally, a 5D-image viewer (Figure 5b, right) allows viewing of the individual images, with display parameters based on data obtained from an OME database and associated with appropriate STs (signal min, max, mean, and so on).

Data migration

Under most circumstances, the contents of a single OME database will be available only to the local lab or facility. However, data sharing and migration is often critical for collaborations or when investigators move to a new site. In OME, database export is achieved using the OME XML file. OME Images can be exported, along with their metadata and analytic results, and exposed to external software tools or imported to a second OME database. This strategy solves the file-format problem that has so far plagued digital microscopy.

OME database extensibility

It is clear that the OME Data Model, and its representation in a specific instance of an OME database, will be adapted to support local experimental requirements. We have implemented this within the OME server code simply by loading an OME XML file containing new STs and updating the existing database on the fly. However, an inherent problem in supporting schema extension is a potential for incompatibility between different schemas. If an OME database exports an OME XML file with a locally modified data model, how can that file be accessed by another OME site? Since OME defines what are considered core STs, all other STs must be defined within the same document that contains data pertaining to them. During import, local STs and imported STs are considered equal if their names, elements and element types are equal. In this way, if the structure of an ST can be agreed upon, the information it describes can be seamlessly integrated across different OME installations. If the structure of an extended ST is not agreed upon beforehand, then the STs are considered incompatible and their data are kept separate. If, however, two STs have the same name but different elements or element types, a name collision will result, and the import will be rejected until the discrepancy is resolved. Because the agreed meaning and structure of STs are essentially a social contract and are not defined more formally, these name collisions must be resolved manually. A common approach to resolving name collisions is the use of namespaces - essentially a prefix that differentiates similar names from different schemas. While namespaces solve the immediate problem of collision, they do not address the underlying problem - that ST names and their meanings have not been agreed on. The disadvantage of using namespaces is that they would not allow the information in these STs to be used interchangeably, and it is this interoperability, rather than mere coexistence, that is the desired result.
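The import rule described above - two STs are treated as the same type only when their names, elements and element types all agree - can be written as a simple structural comparison. The dictionaries below are an illustration of the rule, not the server's internal representation.

def same_st(st_a: dict, st_b: dict) -> bool:
    """Structural equality of two semantic-type definitions."""
    return st_a["name"] == st_b["name"] and st_a["elements"] == st_b["elements"]

local_st    = {"name": "CellSize", "elements": {"area": "float", "perimeter": "float"}}
imported_st = {"name": "CellSize", "elements": {"area": "float", "perimeter": "float"}}
clashing_st = {"name": "CellSize", "elements": {"area": "integer"}}

print(same_st(local_st, imported_st))  # True: the imported data can be merged on import
print(same_st(local_st, clashing_st))  # False: a name collision that must be resolved manually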
Discussion

We have designed and built OME as a data storage, management and analysis system for biological microscopy. The data model used by OME is represented in two distinct ways: a set of open-source software tools that use a relational database for information storage, and an XML-based file format used for transmission of this information and for storage outside of databases. The OME XML file format allows the exchange of highly structured information between independently developed imaging systems, addressing what we believe is a major hurdle in microscopy today. The XML schema provides support for image data, experimental and image metadata, and any generated analytic results. The use of a self-describing XML schema allows this format to satisfy local requirements and enables a strategy for updating schemas to satisfy new, incoming data types. This approach provides the infrastructure to support systematic quantitative image analysis, and satisfies an indispensable need as high-throughput imaging gains wider acceptance as an assay system for functional genomics.

Figure 5. Using STs for visualization in OME. Examples of the use of STs for visualization of data within an OME database are shown. These tools are Java applications that access OME via the OME remote framework [34]. All OME code is available [31]. (a) The Chain Builder, a tool that enables a user to build analysis chains by ensuring that the input requirements of a given module are satisfied by outputs from previous modules. This is achieved by accessing the STs for the inputs and outputs within an OME database. (b) The DataManager, DatasetBrowser and 5DViewer. The DataManager shows the relationships between Projects, Datasets and Images within an OME database. The DatasetBrowser modifies the display method for images within a given dataset depending on the values of data stored as STs within an OME database. The 5DViewer allows visualization of individual images based on STs in an OME database.

Our implementation of a relational database for digital microscopy satisfies the absolute requirement for local extensibility of data models. We acknowledge the impossibility of defining a single standard that encompasses all biological microscope image data. However, using the self-describing OME XML file, we can mediate between different data models and, when necessary, update a local model so that it can send or receive data from a different model. In this way, OME treats data dialects as a compromise between a universal data language and a universe of separate languages. In general, although the current OME system is focused on biological microscopy, its concepts, and much of its architecture, can be adapted to any data-intensive activity.

Acknowledgements

We gratefully acknowledge helpful discussions with our academic and commercial partners [32]. Research in the authors' laboratories is supported by grants from the Wellcome Trust (068046 to J.R.S.), the National Institutes of Health (I.G.G.), the Harvard Institute of Chemistry and Cell Biology (P.K.S.), and NIH grant GM068762 (P.K.S.). J.R.S. is a Wellcome Trust Senior Research Fellow.
References

1. Phair RD, Misteli T: Kinetic modelling approaches to in vivo imaging. Nat Rev Mol Cell Biol 2001, 2:898-907.
2. Eils R, Athale C: Computational imaging in cell biology. J Cell Biol 2003, 161:477-481.
3. Lippincott-Schwartz J, Snapp E, Kenworthy A: Studying protein dynamics in living cells. Nat Rev Mol Cell Biol 2001, 2:444-456.
4. Wouters FS, Verveer PJ, Bastiaens PI: Imaging biochemistry inside cells. Trends Cell Biol 2001, 11:203-211.
5. Ponti A, Machacek M, Gupton SL, Waterman-Storer CM, Danuser G: Two distinct actin networks drive the protrusion of migrating cells. Science 2004, 305:1782-1786.
6. Huang K, Murphy RF: Boosting accuracy of automated classification of fluorescence microscope images for location proteomics. BMC Bioinformatics 2004, 5:78.
7. Hu Y, Murphy RF: Automated interpretation of subcellular patterns from immunofluorescence microscopy. J Immunol Methods 2004, 290:93-105.
8. Yarrow JC, Feng Y, Perlman ZE, Kirchhausen T, Mitchison TJ: Phenotypic screening of small molecule libraries by high throughput cell imaging. Comb Chem High Throughput Screen 2003, 6:279-286.
9. Simpson JC, Wellenreuther R, Poustka A, Pepperkok R, Wiemann S: Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Rep 2000, 1:287-292.
10. Conrad C, Erfle H, Warnat P, Daigle N, Lorch T, Ellenberg J, Pepperkok R, Eils R: Automatic identification of subcellular phenotypes on human cell arrays. Genome Res 2004, 14:1130-1136.
11. Swedlow JR, Goldberg I, Brauner E, Sorger PK: Informatics and quantitative analysis in biological imaging. Science 2003, 300:100-102.
12. Huang K, Lin J, Gajnak JA, Murphy RF: Image content-based retrieval and automated interpretation of fluorescence microscope images via the Protein Subcellular Location Image Database. Proc IEEE Symp Biomed Imaging 2002:325-328.
13. Carazo JM, Stelzer EH, Engel A, Fita I, Henn C, Machtynger J, McNeil P, Shotton DM, Chagoyen M, de Alarcon PA, et al.: Organising multi-dimensional biological image information: the BioImage Database. Nucleic Acids Res 1999, 27:280-283.
14. Schuldt A: Images to reveal all? Nat Cell Biol 2004, 6:909.
15. Open Microscopy Environment [http://openmicroscopy.org]
16. MGED Network: MGED Ontology [http://mged.sourceforge.net/ontologies/MGEDontology.php]
17. Gene Ontology [http://www.geneontology.org]
18. MGED Network: MISFISHIE Standard Working Group [http://mged.sourceforge.net/misfishie]
19. OBO Main [http://obo.sourceforge.net]
20. EAMNET [http://www.embl-heidelberg.de/eamnet/html/downloads.html]
21. Murphy RF: Automated interpretation of protein subcellular location patterns: implications for early cancer detection and assessment. Ann NY Acad Sci 2004, 1020:124-131.
22. SourceForge.net: Project Info - LSID [http://sourceforge.net/projects/lsid]
23. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, et al.: Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nat Genet 2001, 29:365-371.
24. Open Microscopy Environment OME: XML Schema 1.0 [http://openmicroscopy.org/XMLschemas/OME/FC/ome.xsd]
25. Schema Doc: ome.xsd [http://openmicroscopy.org/XMLschemas/OME/FC/ome_xsd/index.html]
26. XML Schemata: OME XML Overview [http://openmicroscopy.org.uk/api/xml/OME]
27. Extensible Markup Language (XML) [http://www.w3.org/XML]
28. OWL Web Ontology Reference Language [http://www.w3.org/TR/owl-ref]
29. Hunter J, Drennan J, Little S: Realizing the hydrogen economy through semantic web technologies. IEEE Intell Syst 2004, 19:40-47.
30. GNU Lesser General Public License [http://www.gnu.org/copyleft/lesser.html]
31. Open Microscopy Environment: CVS (UK) [http://cvs.openmicroscopy.org.uk]
32. About OME - Commercial Partners [http://www.openmicroscopy.org/about/partners.html]
33. Andrews PD, Harper IS, Swedlow JR: To 5D and beyond: quantitative fluorescence microscopy in the postgenomic era. Traffic 2002, 3:29-36.
34. Remote Framework - Introduction [http://openmicroscopy.org.uk/api/remote]

Journal

Genome Biology (Springer Journals)

Published: May 1, 2005

Keywords: Animal Genetics and Genomics; Human Genetics; Plant Genetics and Genomics; Microbial Genetics and Genomics; Bioinformatics; Evolutionary Biology
