Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

The Rényi divergence enables accurate and precise cluster analysis for localization microscopy

The Rényi divergence enables accurate and precise cluster analysis for localization microscopy Motivation: Clustering analysis is a key technique for quantitatively characterizing structures in lo- calization microscopy images. To build up accurate information about biological structures, it is critical that the quantification is both accurate (close to the ground truth) and precise (has small scatter and is reproducible). Results: Here, we describe how the Re´ nyi divergence can be used for cluster radius measurements in localization microscopy data. We demonstrate that the Re´nyi divergence can operate with high levels of background and provides results which are more accurate than Ripley’s functions, Voronoi tesselation or DBSCAN. Availability and implementation: The data supporting this research and the software described are accessible at the following site: https://dx.doi.org/10.18742/RDM01-316. Correspondence and requests for materials should be addressed to the corresponding author. Contact: adela.staszowska@gmail.com or susan.cox@kcl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. 1 Introduction Localization microscopy is a super-resolution imaging method based cells (Sengupta et al., 2011)] and their influence on the structural on detecting randomly activated, single molecules in a sequence of changes [sphingolipid ceramides in Jurkat, U2OS and HBMEC images. A super-resolution image is then reconstructed using this list (Burgert et al., 2017)]. However, analysing this type of data is often of single molecule localizations (Betzig et al., 2006; Rust et al., challenging, as the data often contains background localizations or 2006). A common application of localization microscopy is to multiple fluorophore re-appearances (Deschout et al., 2014), which study proteins clustered on the cell membrane. These clusters can make it difficult to identify clusters and/or lead to inaccurate [approximated as Gaussian (Sengupta et al., 2011; Shivanandan size measurements. et al., 2015) or Lorentzian (Pertsinidis et al., 2013)] can be analysed The methods used for cluster analysis of localization microscopy with quantitative methods to provide measurement of their proper- images can be roughly divided into spatial statistical methods [such ties or functions. For example, analysis of protein clusters present in as Ripley’s functions (Ripley, 1976)] and density based algorithms the cell plasma membrane revealed their role in cell signalling [LAT, [such as DBSCAN (Ester et al., 1996) or SR-Tesseler (Levet et al., LFA-1 and Src-family proteins clusters in T-cells (Owen et al., 2010; 2015)]. DBSCAN uses a distance measure to aggregate data points Shannon et al., 2015)], dynamics [GPI-anchored proteins in COS-7 into separate categories: clustered points and noise. SR-Tesseler V The Author(s) 2018. Published by Oxford University Press. 4102 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. The Re´nyi divergence enables accurate and precise cluster analysis 4103 " ! # a1 N a1 N method is based on creating the Voronoı ¨ diagrams to segment data X X 1 1 1 D / ln Id < r ; (2) a ij points into separate ‘cells’. Both of these methods use a global defin- a  1 N N i¼1 j¼1 ition of density to filter areas with higher point aggregation (Ester et al., 1996; Levet et al., 2015). However, it has been argued that where N is the number of points in the dataset, a is the scaling coef- obtaining the density parameter for localization microscopy data ficient, d is the distance between the ith central point and the jth ij may be difficult (Ankers et al., 1999; Steinbach et al., 2003). point, and Id < r is an indicator function, which has value 1 for ij Generally, the spatial statistical analysis methods are more universal d < r and 0 otherwise. The scaling coefficient a controls the ij and require less prior knowledge about the system than the density Re ´ nyi divergence function value, as it promotes areas with higher based algorithms (Deschout et al., 2014). density (clusters) over the lower intensity of background noise Ripley’s K function is a widely used method which provides a (Supplementary Notes S3 and S2). It should be also noted that in the measure of the difference between the data distribution and a uni- case of clustering analysis all points which belong to any cluster are form distribution. There are three versions of Ripley’s functions treated as signal. The not clustered points we have called noise, these called K, L and H (Supplementary Note S1). The K and L functions points originated from localizations of background fluorescence, are usually used to detect the presence of clustering in the data and free floating dye or non-specific staining. In this study we compare the H function is used to calculate cluster size using the maximum the performance of our method, based on the Re ´ nyi divergence, of H(r) (Deschout et al.,2014; Sengupta et al.,2011; Shannon with Ripley’s H function, using simulated and experimental data. et al.,2015). Recently, it has been noted that measuring the cluster For experimental data we have also compared the performance of radius by detecting the radius for which the Ripley’s H function is the Re ´ nyi divergence with SR-Tesseler and DBSCAN, which are the maximal can be biased (Kiskowski et al., 2009; Lagache et al., density based clustering methods. We have imaged three types of 2013) and that the origin of this bias has been reported to be structures to assess the precision and accuracy of the Re ´ nyi diver- related to density of the clusters in the dataset (Kiskowski et al., gence as a measure of cluster size: DNA-origami plates with known 2009). This can lead to different bias in the measured cluster radius size (Rothemund, 2006), clathrin-coated pits in HeLa cells, the size for different cluster densities. Approaches have been proposed to of which has been confirmed via electron microscopy (Saffarian remove the bias from the cluster radius measurements, for example et al., 2009; Sochacki et al., 2017), vaccinia virus particles in HeLa by measuring the cluster size as a half of the radius for which the cells, which were imaged with correlative localization and electron first derivative of the Ripley’s H function has the value of –1 microscopy (Brama et al., 2015; Peddie et al., 2017) and podosome (Kiskowski et al., 2009)orbyusingaconstantcorrectioncoeffi- cores in THP-1 cells which size varies across the cell and were previ- cient (Lagache et al., 2013; Supplementary Note S1 and ously imaged using electron microscopy (Tanaka et al., 2015) and Supplementary Fig. S1a and b). atomic force microscopy (Labernadie et al., 2010; Luxenburg et al., However, these methods do not remove the cause of the bias 2007). These biological structures were selected because they can be which is intrinsically related to the method itself, as this method observed in different cell types. Clathrin coated pits are responsible weights signal (clustered points) and noise (not clustered points) for active intra-cellular transport (endocytosis), the mechanism of equally (Supplementary Note S2). Additionally bias removal, as pro- which is similar across different cell types. The vaccinia virus is a posed by these approaches, can lead to new inconsistencies in the poxvirus previously used in smallpox vaccination. Currently, the cluster radius measurements. For example, the first derivative of vaccinia virus is used for study of gene expression and for creating Ripley’s H function is calculated numerically, thus the frequency new vaccines [using a vaccinia virus mutant (Jacobs et al., 2009)]. with which it is calculated is going to affect the first derivative value and detecting a specific value will be either impossible or provide worse results than a traditional method (Supplementary Fig. S1a). 2 Materials and methods The other approach using a constant coefficient to decrease the 2.1 Sample preparation measured cluster radius can influence the performance of Ripley’s H DNA origami is structures formed by artificial folding of DNA. function and its resistance to noise (Supplementary Fig. S1b). It Here, we used flat DNA origami plates with known size of has also been suggested that these methods for bias removal may be 60  90 nm (GATTAquant). The DNA plates were uniformly cov- better suited for analysis of simulated datasets than real data ered with Alexa Fluor 647 molecules (each plate with approximately (Kiskowski et al., 2009). 20 molecules). Localization microscopy samples were prepared Here, we propose use of the Re ´ nyi divergence (Re ´ nyi, 1965)asa according to the protocol supplied by GATTAquant. The DNA ori- measure for cluster analysis. The Re ´ nyi divergence quantifies the dif- gami was immobilized on BSA-biotin-neutravidin surface. The ference between two distributions: dishes (35 mm dishes with #1.5 glass coverslip bottom, Ibidi, "# a1 Germany) were washed with PBS (three times) and incubated with 1 pxðÞ DðÞ px ð ÞjjqxðÞ¼ ln pxðÞ dx ; (1) 200 ll of BSA-biotin solution (0.5 g/l in PBS) for 5 min, followed by a  1 qxðÞ biotin solution removal and further washes in PBS (three times). where p(x) is the data distribution and q(x) is a clustered reference Dishes were then incubated with neutravidin solution (0.5 g/l in distribution (if different than clustered reference distribution has PBS) for 5 min (neutravidin solution was removed and washed with been used it has been indexed, e.g. ‘U’ for the uniform distribution, PBS, three times). The diluted solution of DNA origami (around 100 Supplementary Note S3). We have used a clustered reference distri- times: 0.5 ll of DNA origami solution þ 50 ml of PBS) was placed in the dish and incubated for 5 min followed by a washing step (three bution as more closely related to the sample distribution and is therefore relevant for analysis clustered biological samples. Ripley’s times in PBS). The optimal dilution factor was selected using guide- H function can be derived from the Re ´ nyi divergence (for a ¼ 2), lines provided in the sample preparation protocol, leading to an U 1 2 and a uniform reference distribution q ðÞ x ¼ (Supplementary average density of DNA-origami plates 1 plate/lm . Note S4). For the purpose of clustering analysis, we used a sampling Samples with labelled clathrin coated pits were prepared using approximation of the Re ´ nyi divergence: HeLa cells (ATCC, CCL-2) seeded on 35 mm dishes with #1.5 glass 4104 A.D.Staszowska et al. coverslip bottom (Ibidi, Germany) 510 cells/dish. After leaving 2.3 Monte Carlo data generation the cells to adhere for 16 h, the cells were fixed for 20 min in 3.6% The simulated datasets for cluster analysis were created using Cþþ and Matlab (2016b). For each dataset the cluster positions formaldehyde, permeabilized for 5 min in 1% saponin and blocked were selected randomly. Clusters were simulated with desired in 3% BSA for 30 min. The anti-clathrin anti-body (BD Biosciences, 610499) was added diluted 1:200 or 1:500 in PBS with 3% BSA and parameters varying the number of clusters in the dataset, density of points in the cluster, cluster size, distribution of points in the cluster 0.5% saponin and incubated for 1 h. An Alexa Fluor 647 conjugated (uniform or Gaussian). Uniform background noise was added at the secondary anti-body (Invitrogen, A21235) was diluted to 1:500 in desired level. Both the noise and clustered points had the same inten- PBS with 3% BSA and incubated for 30 min. sity. Signal-to-noise ratio was calculated for each noise level as a Vaccinia virus samples were prepared according to the protocol ratio of number of points in the same area, in areas classed as signal published in (Peddie et al., 2017). and background. See Supplementary Figure S2 for more information Podosome samples were prepared using THP-1 cell line (ATCC, about signal-to-noise ratio and an example of the data. TIB-202). The cell culture, fixation and staining were performed according to the protocol presented in Staszowska et al. (2017); 2.4 Localization microscopy data simulation Vijayakumar et al. (2015). For the staining of actin present in the Thelocalizationmicroscopydataweresimulated usingaMatlab podosome core we have used Alexa Fluor 647 conjugated to phal- (2016b) software. The positions of fluorophores in the data were defined loidin (Invitrogen, A22287). To identify podosomes in fixed cells, using 10 datasets simulated for Monte Carlo testing (10 clusters with 80 adhesive rings were confirmed using vinculin conjugated Alexa and 160 nm radius). As the localization precision and quality of data Fluor 488. depends in a great deal on the density of data we have simulated single molecules with varying density (by setting a different number of frames 2.2 Imaging for which a fluorophore was in the dark state). For each density, 3000 The localization microscopy imaging was performed using the frames of single molecule data were simulated with pixel size 100 nm. STORM system at Nikon Imaging Center at King’s College We have also set the simulated fluorophores not to bleach permanently. London. This system is equipped with an Eclipse Ti-E Inverted Then the single molecule images were analysed with ThunderSTORM Nikon Microscope, Andor Ixon EMCCD back-illuminated camera (similarly to experimental data) and clustering analysis. (DU-897U-CSO-#BU), laser and LED light sources (laser wave- lengths and powers are: 405 nm, 30 mW; 488 nm, 90 mW; 2.5 Clustering analysis software 514 nm, 50 mW; 561 nm, 90 mW and 647 nm, 170 mW) and is The Re ´ nyi divergence analysis of clustering datasets was performed operated with NIS Elements software with N-STORM module. using Equation (14) in Supplementary Note S3. The Re ´ nyi diver- The imaging was performed with TIRF, using a 100x, N.A. 1.49 gence was calculated for a range of a values (usually 10–120 with objective. increments of 10). Additionally, the same software also computed The 647 nm laser power was adjusted during the acquisition the Ripley’s H function. The first and second derivatives of the to acquire similar number of counts in every frame (as far as Re ´ nyi divergence and Ripley’s H function were calculated using the possible). To improve the signal-to-noise (S/N) of the images fxðÞ þhfxðÞ h fðÞ xþhfxðÞ h 0 00 numerical derivative: f ðÞ x ¼ and f ðÞ x ¼ 2h 2h the excitation light angle was adjusted to include only the for second derivative calculation. The step size h was set to be equal fluorophores at a given optical plane and exclude background to the sampling for the Re ´ nyi divergence and Ripley’s function cal- fluorescence. The imaging angle was adjusted for every imaged culation. It should be noted that the derivative calculated for the section and selected so that the image has the highest possible Re ´ nyi divergence clustering analysis was only used to detect gradient contrast. changes in the Re ´ nyi divergence and its first derivative curves. Prior to imaging, the samples (DNA origami, clathrin coated pits Prior to the Re ´ nyi divergence and Ripley’s H function calculation and podosomes) were immersed in the imaging buffer. The base buf- for the clathrin coated pit and vaccinia virus particle datasets, we fer was made with b-Mercaptoethylamine (MEA, Sigma Aldrich, have selected regions of interest for the localization microscopy 30070-50G) according to the recipe presented in (Nikon, 2015). For images. This was performed to include only areas where the ana- better stability of the Alexa Fluor 647, Cyclooctatetraene (COT, lysed structures were present in the images, and exclude areas where 98%, Sigma Aldrich, 138924-1G) dissolved in DMSO (Sigma clusters were not present or false clustering occurred. For clathrin Aldrich, 472301-1L-D) was added to the base buffer to a final con- coated pits we have created mask images to cover the area of the centration of 2 mM (Dave et al., 2009). HeLa cell as the clathrin pits were present throughout the cell. In each imaging series about 6000 frames for DNA origami sam- Region of interest was selected similarly also for podosomes, to in- ples and 10 000 for clathrin-coated pits and podosomes were clude only the areas where the podosomes were present in the acquired at a rate of 30–50 frames per second. images. For vaccinia virus particles we have selected rectangular Imaging of vaccinia virus samples, using scanning electron mi- regions of interest containing a high density of vaccinia virus par- croscopy and localization microscopy was performed as discussed in ticles. The masks and notes on the size of each region of interest can (Brama et al., 2015; Peddie et al., 2014). be found in the data bundle. It should be noted that no region of Super-resolution images were reconstructed with ThunderSTORM interest was selected for analysis of DNA-origami samples as the (Ovesn y et al., 2014) using single-emitter and multi-emitter settings. DNA-plates were randomly distributed across the whole image. Molecule localizations were filtered to remove false localizations The density based clustering was performed using the SR- with FWHM sizes much smaller than the predicted PSF (110 nm for Tesseler software (Levet et al., 2015), which also performed the DNA-origami, clathrin coated pit and podosome samples and 90 nm DBSCAN analysis. The cluster radius was also calculated using SR- for vaccinia virus particles). Additionally, drift was removed from the Tesseler software using the Voronoı ¨ diagrams, which segmented the reconstructed images using the cross correlation option with binning 5 data into single point domains. Then the segmented image was and magnification 5.0. thresholded to exclude regions with density which was two times The Re´nyi divergence enables accurate and precise cluster analysis 4105 smaller than the average density [as suggested in (Levet et al., 2015), investigate size of DNA origami in reconstructed localization mi- we have also tested different values of the density threshold param- croscopy images. eter, Supplementary Fig. S3]. The last step was to spatially filter the diagram image to exclude structures smaller or bigger than the size 2.8 Detection and characterization of vaccinia virus of the analysed structure. Both the SR-Tesseler and DBSCAN meas- particles in electron microscopy images ured the cluster radius using an ellipse fitted to a segmented cluster HeLa cells infected with vaccinia virus were prepared using an in- patch. The cluster radius is the calculated as a half of an average of resin fluorescence protocol which introduces electron contrast into the longest and shortest axis of the fitted ellipse. It should be also the sample whilst preserving fluorescence (Peddie et al., 2014). noted that the fitted ellipses were smaller than the segmented clus- Thus, the virus particles appear in the electron microscopy images as ters, leading to underestimation of cluster size. For DNA origami we dark circular features, often surrounded by a bright ring. In order to set the desired cluster radius to be between 15 nm and 100 nm, for match the area where the fluorescent proteins are present in the sam- clathrin coated pits––50 nm and 170 nm and Vaccinia virus particles ple, we applied a thresholding method to include only electron dense 60 nm and 170 nm. For the DBSCAN analysis we have adjusted the core of each particle. The radius of each particle was measured as an distance parameter, which is used by the software to estimate cluster average of its shortest and longest dimension. It should be also noted radius: 54 nm for DNA origami (Supplementary Note S5), 100 nm that in this method each particle radius is measured separately, for clathrin coated pits [estimated based on the electron microscopy while the Re ´ nyi divergence and Ripley’s H function provide an aver- images presented in (Sochacki et al., 2017)] and 108 nm for vaccinia age measurement for a dataset. virus particles (measured for scanning electron microscopy images). The analysis of vaccinia virus particles was performed using Matlab (2016b). Each electron microscopy image was thresholded 2.6 Cluster radius calculation using a range of thresholds to account for the variation of brightness The cluster radius was calculated using the Re ´ nyi divergence and on the image (this was usually from 75% to 100% of the mean in- Ripley’s H function. The Re ´ nyi divergence for a specific a was calcu- tensity value). Thresholded images were binarized using im2bn() lated as a function of radius. The function counts the number of function and inverted using function imcomplement() to ensure that points inside a given radius. Because of the weighting effect of a, the the white areas in the binary images correspond to the areas with main component of the function value comes from high density low intensity on the original image. Some particles had holes and areas (clusters). Thus, when the radius is equal to the actual cluster empty edges due to variation in brightness in the original image, size the function value does not change or changes by a very small these were filled using imfill() and bwmorph(). Then, particles were amount for several radius values. This means that a plateau can be filtered using a spatial filter (excluding particles smaller or bigger detected on the function and the cluster radius can be found by look- than a set range of area values, here it was 40 and 600 pixels, with ing for points with gradient equal to 0 (first and second derivatives pixel size 16.5 nm), eccentricity and solidity filter, which used a stat- of function need to be equal to 0). In practice, we did not detect the istic provided by the regionprops() function. The filtering to remove 0 gradient but set a tolerance values for the first and second deriva- elliptical and empty (non-solid) particles was performed using ex- tives. We have used a tolerance of 1% change for the first and 0.5% perimentally set ranges, 0–0.8 for elliptical filtering and 0.2–1 for for the second derivatives. These values were set experimentally, solidity filtering. The particles identified by the software were con- using simulated data with 10 clusters, 80 nm radius with and with- firmed by the user, by selecting the correct particle identifications. out any noise points. We have used the same tolerance values for all Lastly, the software saved an image with identified particles and a analysed data. In practice, we have used a set of tolerance values for file containing information about the radius of each virus particle the plateau detection. Our algorithm searched for the first point in a (see Supplementary Fig. S4 for a region of a particle identification plateau as an estimate for the radius value (at least three points image). needed to be equal to 0 within a set tolerance). The radius measurements using Ripley’s H function were calcu- 2.9 Statistical analysis lated by detecting the maximum of the Ripley’s H function. We used Monte Carlo simulations to establish the stability of our analysis method compared with Ripley’s H function. The Monte 2.7 Estimation of size of clathrin coated pits Carlo testing was performed for simulated data similarly to Clathrin coated pits have been extensively studied using electron mi- (Kiskowski et al., 2009). Random datasets with the same character- croscopy (Saffarian et al., 2009; Sochacki et al., 2017), and their istics (as an original dataset––number and size of clusters, noise size in HeLa cells depends on the size of their cargo (Saffarian et al., level) were generated and for each new dataset cluster radius meas- 2009). A visual inspection of electron microscopy images suggests a urements with the Re ´ nyi divergence and Ripley’s H function were size of around 100 nm (Saffarian et al., 2009; Sochacki et al., 2017). performed. The number of the Monte Carlo repetitions was selected Additionally, we also performed a visual check of the cluster radius by testing the changes in the value of measured mean and SD. The measurement for the clathrin coated pits. We have used a local- simulations stopped when the variation in both of these values was ization microscopy image of the clathrin coated pits (prepared with around 0.5% (Supplementary Fig. S5). 1:200 anti-clathrin anti-body and pre-processed with single-emitter The histograms of the results of cluster radius measurements for fitting) to draw circles which enclosed the clustered points. We have simulated data suggested that the results do not belong to the same selected this dataset at random (it was dataset six from the first sam- distributions (Fig. 1c). Thus to check if these results could originate ple). Each circle drawn was numbered (Supplementary Fig. S4c) and from the same distribution we have performed the two-sample its radius measured (by saving the size of the drawn circle). We have Kolmogorov–Smirnov test. Our null hypothesis was that they do be- collected results from 203 clusters and these are presented in long to the same distribution. The test suggested that it is extremely Supplementary Figure S4d. The mean radius measured was unlikely that the results from the Re ´ nyi divergence and Ripley’s H 746 13 nm, close to the expected value and radius measured with function originate from the same distribution. The P-values for 5% the Re ´ nyi divergence. We have also used the same methodology to confidence level are shown in Supplementary Table S1. 4106 A.D.Staszowska et al. Fig. 1. Variation of measured cluster radius with a and noise level for 100 000 simulated datasets. Clusters of radius 80 nm were simulated with a uniform distribu- tion of fluorophores across the cluster. (a) Results of cluster radius measurements with different a values for datasets with increasing levels of noise (measured as mean). (b) Comparison of cluster radius values measured with the Re´ nyi divergence (marked with black) and Ripley’s H function (red) for datasets with increas- ing noise levels. The Ripley’s H function radius measurements were only possible for datasets with S/N higher than 16 [for lower S/N H(r) function does not have a maximum]. The Re´ nyi divergence provided stable radius measurements S/N as low as 6. (c) The histogram of results of cluster radius measurements for simu- lated clusters with S/N 29. The mean and median are shown in dashed lines for each method (in green and blue, respectively). (d) Visualization of the radius measured with the Re´nyi divergence (red circles) and Ripley’s H function (cyan circles) for simulated clusters with radius 80 nm and S/N 29 3 Results same size cluster (Supplementary Fig. S1), different number of clusters in a dataset (Supplementary Fig. S7). These datasets were simulated 3.1 Selection of the scaling coefficient a with uniform noise. We have also simulated single molecule data with The scaling coefficient a tunes the Re ´ nyi divergence response to dif- different densities to test an impact of artefacts and localization preci- ferent properties of data such as the cluster size/shape or the back- sion on fidelity of cluster radius measurement (Fox-Roberts et al., ground noise level. The Re ´ nyi divergence calculation is performed 2017)(Supplementary Figs S8 and S9). The results for different by raising the number of points surrounding a central point (in a cluster types were similar so here we discuss one example, the uniform given radius) to a powerðÞ a  1 . Thus, the Re ´ nyi divergence value clusters with 80 nm radius (for results for other cluster types see from the regions with higher density (clusters) is greater than from Supplementary Figs S1 and S6–S9). The results for simulated data sug- those with smaller density of points (noise) for a > 2 gest that the Re ´ nyi divergence provides a more accurate and reliable (Supplementary Note S3). This also means that the signal is pro- cluster radius measurements than Ripley’s H function. The cluster ra- moted over the noise. The optimal value of a should be selected dius measurements for data with different noise levels suggest that the according to the data properties and usually higher noise levels re- Re ´ nyi divergence provides similar values of radius for different levels quire a higher value of a. In our investigations, we selected a by of noise caused by background localizations. For the same data looking for no or small change in the measured cluster radius value Ripley’s H function can struggle to provide similar measurements or (less than a 2% change as a is increased further, Fig. 1a). In this be unable to measure the cluster radius for data with high noise or work, we used a single value of a ¼ 70, as it provided stable radius small number of clusters (because the function does not have a max- measurements for all analysed datasets (for example, Fig. 1a). imum, required for the cluster measurement, Fig. 1b). The comparison Additionally, we investigated the noise resistance of the Re ´ nyi diver- of the SDs suggest that while the Re ´nyi divergence scatter is higher gence and Ripley’s H function on simulated datasets with different than for Ripley’s H function, it remains similar only slightly increasing noise levels (Fig. 1b and Supplementary Fig. S1c–e). with noise (15 nm, Fig. 1b and c and Supplementary Fig. S1c–e). The results for simulated data suggest that the Re ´ nyi divergence It should be also noted that the cluster measured with the Re ´ nyi provides a more accurate and reliable cluster radius measurements divergence has a small bias up to around 15%. We think that this is than Ripley’s H function. We have analysed data with different size connected to the method used to measure clustering by detecting the of clusters (80 nm and 160 nm radius, Fig. 1 and Supplementary Fig. gradient changes in the Re ´ nyi divergence curve. The formula used S1c), different point distribution across the clusters (uniform and for the numerical derivative calculation increases the measured ra- Gaussian, Supplementary Fig. S1), different number of points in the dius (offset by a sampling step value). Other cause of the biased The Re´nyi divergence enables accurate and precise cluster analysis 4107 Fig. 2. Measurement of the average size of the DNA-origami plates in localization microscopy data. (a) Localization microscopy image of DNA-origami plates, ana- lysis with single-emitter fitting. Scale bar 1 lm. (b) Magnified images of single DNA-origami plates. Scale bar 500 nm. (c) The average expected radius based on the simulations (50 000 repetitions) and the radius measured in the experimental data with the Re´nyi divergence (526 8 nm for simulated and 526 16 nm for ex- perimental data), Ripley’s H function (696 5 nm and 1516 30 nm), SR-Tesseler (226 8 nm) and DBSCAN (376 11 nm). The error bars are the SD cluster radius measurement is reporting of the mean value of the ra- cluster radius measured with the Ripley’s H function was almost dius as it is more widely used statistical measure. As the Re ´ nyi diver- three times bigger (1516 32 nm). Moreover, the comparison of scat- gence results display a sharp peak and a long tail the median ter for the Re ´ nyi divergence (SD 16 nm) and Ripley’s H function (SD indicates much better the most measure value (Fig. 1). For Ripley’s 32 nm) suggests that the Re ´ nyi divergence is more precise. The ex- H function the SD was higher for datasets with no or low noise perimental DNA origami data were also analysed with density based (5 nm for uniform and 30 nm for Gaussian clusters) and lower clustering methods: SR-Tesseler and DBSCAN. DBSCAN measured for higher noise levels (3–5 nm). This suggests that the accuracy of cluster radius which was exactly equal to an average of the plate the measurement is less influenced by noise for the Re ´ nyi divergence size, when SR-Tesseler measured smaller cluster radius (226 8nm than for Ripley’s H function. for SR-Tesseler and 376 11 nm for DBSCAN, Supplementary Note S5). We have also measured sizes of the clathrin coated pits present 3.2 Cluster size measurement for localization in HeLa cells. Clathrin-coated pits have been extensively studied microscopy images using electron microscopy (Saffarian et al., 2009; Sochacki et al., In addition to investigating performance of the Re ´ nyi divergence 2017), and their size in HeLa cells depends on the size of their cargo and Ripley’s H function with simulated data, we also used local- (Saffarian et al., 2009). A visual inspection of electron microscopy ization microscopy data of DNA origami, clathrin coated pits, vac- images suggests a size of around 100 nm (Saffarian et al., 2009; cinia virus particles and podosomes. These structures were selected Sochacki et al., 2017). We have imaged samples with two concentra- because we could confirm their size through techniques other than tions of anti-clathrin anti-body: 1:200 and 1:500 (three samples for localization microscopy. For each structure, we have collected a ser- each primary anti-body concentration, see Methods Section and ies of images and localized single molecules using ThunderSTORM Supplementary Fig. S10b and c), to establish if different amount of (Ovesny et al., 2014) with single- and multi-emitter fitting (for signal from the investigated structure affects the measured radius. multi-emitter data results see Supplementary Fig. S10). For each labelling density, we have imaged 3 samples and selected at The DNA origami constructs were designed to have a specific least 10 regions for imaging. shape and size (Rothemund, 2006), here 60  90 nm rectangular Thus, the accuracy of the radius measurements could be com- plates (with around 20 Alexa Fluor 647 molecules per plate). As the pared both with regard to how close the measured value is to the Re ´ nyi divergence and Ripley’s H function are designed to detect predicted value and how large the scatter in the measurements is, for circular clusters, the measured cluster size is not directly related to the two labelling densities and different fluorophore localization the plate size. We estimated that the measured cluster radius should methods. Our results suggest that the Re ´ nyi divergence provided a be smaller than the radius of a circle passing through all corners (ra- measurement which was both more closely related to the expected dius 54 nm) of the DNA-origami plate and bigger than a circle radius of the clathrin pits and more stable. The average radius meas- enclosed on the plate (radius 30 nm, Supplementary Fig. S4a and ured with the Ripley’s H function, 1566 23 nm, was almost two Supplementary Note S5). To validate this, we simulated rectangular times bigger than the radius measured with the Re ´ nyi divergence, plates with size and shape of the actual DNA plates. We have 866 19 nm. SR-Tesseler and DBSCAN measured cluster radii which observed that the mean radius measured for the simulated rectangu- were smaller than the expected cluster radius, 576 22 nm and lar plates was 526 8 nm (around 4% smaller than the maximum 656 20 nm, respectively. We have also observed some variation in expected radius) for the Re ´ nyi divergence measurement, and the values measured with regard to the labelling density for Ripley’s 696 5 nm (22% bigger than the maximum expected radius) for Ripley’s H function. We expected to see this behaviour preserved in H function (see Results Section in Fig. 3b and Supplementary Fig. S10d–f). The Re ´ nyi divergence, SR-Tesseler, and DBSCAN meas- the experimental results. ured radius values were not affected by the different labelling den- We analysed localization microscopy datasets collected from five sities. Additionally, to confirm our cluster radius measurements for DNA origami samples analysed using single-emitter fitting (for multi-emitter fitting results see Supplementary Fig. S10a). The the clathrin coated pits we have performed a visual estimation of the results suggest that the Re ´ nyi divergence provides cluster radius pit size using a localization microscopy image (Methods Section and measurements which are more accurate than Ripley’s H function Supplementary Fig. S4c). The mean of the estimated radius was (Fig. 2c). While the average cluster radius measured with the Re ´ nyi 746 13 nm (Supplementary Fig. S4d), close to the expected value divergence was equal to the expected radius (526 16 nm), the and radius measured with the Re ´ nyi divergence. 4108 A.D.Staszowska et al. Fig. 3. Measurement of the average size of the clathrin coated pits in localization microscopy data. (a) Localization microscopy image of clathrin coated pits in HeLa cells (labelling 1:200). (b) Magnified images of clathrin coated pits. Scale bar 500 nm. (c) The average expected (746 13 nm) and measured radius, 866 19 nm for the Re´ nyi divergence and 1566 23 nm for Ripley’s H function. SR-Tesseler and DBSCAN measured cluster radius to be smaller than the expected value, 576 22 nm and 656 20 nm, respectively. Localization position fitting was performed with single-emitter fitting. The error bars are the SD Fig. 4. Vaccinia virus particles in HeLa cells imaging and radius measurements. (a) Localization microscopy image, and (b) electron microscopy image. (c) Overlay with electron microscopy image presented in grey-scale, and localization microscopy image in green. Scale bars 1 lm. (d) The average particle radius measured for electron microscopy (EM) images, 1086 29 nm and localization microscopy with the Re´ nyi divergence (1176 48 nm) Ripley’s H function (1906 81 nm), SR-Tesseler (566 15 nm) and DBSCAN (746 57 nm). The error bars are the SD (Color version of this figure is available at Bioinformatics online.) However, due to the limitations of any one imaging technique it localization microscopy data by imaging at 200 Pa (Peddie et al., is highly desirable to cross-validate results against a second imaging 2017). Images of the matching areas in the sample were collected with technique. Recent advances have made possible imaging of the same localization and electron microscopy (Fig. 4a and b). Thus, the sizes sample with simultaneous correlative light and electron microscopy measured for both of these imaging techniques could be directly com- (CLEM; Liv et al., 2013; Zonnevylle et al., 2013). Thus we could pared for exactly the same structures. To estimate the true size of the use the electron microscopy image as an estimation of ground truth vaccinia virus particles we analysed the electron microscopy images and compare it to the quantitative results obtained from localization using an algorithm, which identified and characterized individual virus microscopy data. particles (Methods Section and Supplementary Fig. S4e–g). The mean Vaccinia virus particles in HeLa cells were imaged using correlative radius of the virus particles in the electron microscopy images was localization and scanning electron microscopy. The samples were 1086 29 nm. We compared this value with the average radius meas- transfected with YFP-A3, which localizes to the core of viral particles. ured for localization microscopy data pre-analysed with single-emitter The fixed and frozen samples were then embedded in resin and cut (Fig. 4d) and multi-emitter fitting (Supplementary Fig. S10g). into 200 nm thick sections (Brama et al.,2015; Peddie et al., 2017). Analysis of vaccinia virus particle localization microscopy sug- YFP blinks in a partially hydrated state and so it was possible to obtain gests that the Re ´ nyi divergence provides more accurate measurement The Re´nyi divergence enables accurate and precise cluster analysis 4109 Fig. 5. Podosome cores in THP-1 cells and their size measurement. (a) Localization microscopy image reconstructed with single emitter-fitting. F-actin podosome cores were stained with Alexa Fluor 647 conjugated to phalloidin (reconstructed with single emitter-fitting). Scale bar 1 lm. (b) Magnified single podosome cores with varying sizes. Scale bar 250 nm. (c) Line profiles of four podosome cores from (b). The podosome core has diameter of 500 nm at the base (indicated with black arrow) and between 200 to 300 nm in the most dense region of the core (indicated with two black arrows). (d) Cluster radius measurement of the podosome data. The Re´nyi divergence measured cluster radius of 1326 19 nm, Ripley’s H function 2316 82 nm, SR-Tesseler 1256 46 nm and DBSCAN 2056 85 nm. The error bars are the SD than Ripley’s H function, SR-Tesseler or DBSCAN. For both single independent imaging techniques in (Labernadie et al., 2010; Rafiq and multi-emitter fitting, the Re ´ nyi divergence provided cluster ra- et al., 2016). Cluster radius measurements for podosome cores are less conclu- dius measurements, 1176 48 nm and 1376 62 nm respectively, sive than cluster radius measurements performed for structures pre- which were very close to the particle radius measured in the electron viously discussed in this paper. The podosome core size had been microscopy images. However, the radius measured by Ripley’s H reported to differ across the cell and to be around 500 nm in diam- function, 1906 81 nm for single- and 2216 135 nm for multi- eter at the widest point (Labernadie et al., 2010; Rafiq et al., 2016). emitter fitting, was much bigger than the expected radius of virus The podosome core has a cone shape, meaning that the density of particle. Values smaller than expected radius were measured for localizations changes across the core with higher density in the SR-Tesseler 566 15 nm for single- and 586 17 nm for multi-emitter centre and smaller density on the edge (Fig. 4c). The Re ´ nyi diver- fitting and for DBSCAN657 nm for single- and661 nm for multi- gence and SR-Tesseler detected the radius the area of the core emitter fitting. It should also be noted that the SD values (error bars which has the highest density of localizations, 1326 19 nm and in Fig. 4 and Supplementary Fig. S10) for the Re ´ nyi divergence were 1256 46 nm respectively. Both Ripley’s H function and DBSCAN smaller (48 nm for single-emitter and 62 nm for multi-emitter fitting) reported a larger cluster radius, 2316 82 nm and 2056 85 nm. than for the Ripley’s H function (81 nm for single- and 135 nm for However, it should be noted that Ripley’s H function measured clus- multi-emitter fitting) which suggests that our method provided a ter radius only for 17 datasets out of 30, and for the rest it did not more consistent cluster radius measurement. DBSCAN provided detect any clustering. The remaining methods did not have any similar level of precision to the Re ´ nyi divergence (57 nm for single- problem with detecting clustering. The Re ´ nyi divergence provided a and 61 nm for multi-emitter fitting), where SR-Tesseler had the more stable cluster radius measurement with the smallest SD, than smallest SD for these datasets (15 nm for single- and 17 nm for other methods. It also had similar SD for the measurement for data multi-emitter fitting), however it also underestimated the size of the with clusters with the same size. While the SD for the remaining clusters by 42%. methods, Ripley’s H function, SR-Tesseler and DBSCAN, was Biological structures analysed above have very similar sizes higher than for the datasets with the same cluster size. Here, we dis- through the cell, but this is not always the case in many biological cussed results for single-fitting, the results for multi-emitter fitting structures. An example of such size variability is observed for podo- are presented in Supplementary Table S2. The results for multi- somes. Podosomes are adhesive, actin rich structures formed by the emitter fitting are similar to single-emitter results presented here, see cells to facilitate cell adhesion and migration (Schachtner et al., Supplementary Table S2. We think that the results for podosome 2013). They consist of two major components, an f-actin rich core cores clearly highlight the property of the Re ´ nyi divergence to reli- and the ring complex (build with integrins and integrin associated ably detect areas with high point density. proteins; Foxall et al., 2016; Labernadie et al., 2010; Tanaka et al., The Re ´ nyi divergence has an additional advantage of requiring 2015). The ring has diameter between 0.5 lmto2 lm in diameter no user input. In particular, the values obtained with Ripley’s H and 0.6 lmto1 lm in depth (Meddens et al., 2014; Rafiq et al., function can depend strongly on the region selected for analysis (be- 2016; Walde et al., 2014). The core is smaller, usually around cause of its equal weighting of signal and noise), whereas the Re ´ nyi 0.5 lm in diameter (at the widest point) and 0.5 lm in height divergence remains stable. This is important because it is common (Meddens et al., 2014; Rafiq et al., 2016). The size of podosomes to restrict analysis to structures of interest to decrease the data vol- varies across the cell, depending on their function or lifetime [for ex- ume. For example, for analysis of clathrin coated pits we have ample, in migrating cells, the newly formed podosomes are larger selected regions of interest to include the whole cell and exclude the than older ones (Meddens et al., 2014)]. In reconstructed local- almost empty area around it. We calculated the mean cluster radius ization microscopy images (Fig. 5a and b), we have indeed seen a for clathrin coated pits data with a selected region of interest variation in the size of the podosome cores. It should be also noted (termed ROI) and for the whole field of view (termed Image). that the podosome core has a shape of a cone, meaning that the The mean radius measured for the data changes visibly for the localization density will be vary across the structure (the highest in Ripley’s H function (Supplementary Fig. S11). For the Re ´ nyi diver- the centre and lower on the edges). This has been confirmed using gence it either changes by a small amount (Supplementary Fig. S11c) 4110 A.D.Staszowska et al. or remains the same (Supplementary Fig. S11a, b and d). This suggests References that the Re ´ nyi divergence results are more universal and can be com- Ankers,M. et al. (1999) OPTICS: ordering points to identify the clustering pared between different regions of interest (Supplementary Table S2). structure. Proc. ACM SIGMOD’99 Int. Conf. on Management of Data, 28, The major drawback of the current form of the Re ´ nyi divergence 49–60. is that it is set up to mainly analyse circular clusters. It is however Betzig,E. et al. (2006) Imaging intracellular fluorescent proteins at nanometer possible to change the cluster shape detected by this method using resolution. Science, 313, 1642–1645. similar methodology as presented for Ripley’s functions (Peters Brama,E. et al. (2015) Standard fluorescent proteins as dual-modality probes et al., 2017). However, it should be also noted that the cluster size for correlative experiments in an integrated light and electron microscope. J. Chem. Biol., 8, 179–188. measurement can be still performed for small clusters with non- Burgert,A. et al. (2017) Characterization of plasma membrane ceramides by circular shape, as for example rectangular DNA origami plates. As super-resolution microscopy. Angew. Chem., 129, 6227–6231. the DNA origami plates used in this work were seen in reconstructed Dave,R. et al. (2009) Mitigating unwanted photophysical processes localization microscopy images as circular clusters due to limited for improved single-molecule fluorescence imaging. Biophys. J., 96, localization precision. 2371–2381. Deschout,H. et al. (2014) Progress in quantitative single-molecule localization microscopy. Histochem. Cell Biol., 142, 5–17. 4 Discussion Ester,M. et al. (1996) A density-based algorithm for discovering clusters in Localization microscopy data are more and more commonly used for large spatial databases with noise. In: Simoudis,E. et al. (eds) Proceedings of quantitative analysis. However, measuring the exact size of the struc- the Second International Conference on Knowledge Discovery and Data tures in the images/data can be problematic using the popular Ripley’s Mining (KDD-96), AAAI press, Paolo Alto, CA, pp. 226–231. Fox-Roberts,P. et al. (2017) Local dimensionality determines imaging speed in H function. Here, we have presented a method based on the Re ´ nyi di- localization microscopy. Nat. Commun., 8, 13558. vergence which can be tuned in response to noise and/or data proper- Foxall,E. et al. (2016) Significance of kinase activity in the dynamic invado- ties. Our method provides more accurate cluster radius measurements some. Eur. J. Cell Biol., 95, 483–492. than widely used Ripley’s H function. It also helps to overcome the Jacobs,B.L. et al. (2009) Vaccinia virus vaccines: past, present and future. bias present in the radius measurements provided by the Ripley’s H Antiviral Res., 84, 1–13. function. Similarly, when compared with the density based clustering Kiskowski,M.A. et al. (2009) On the use of Ripley’s K-function and its deriva- methods, the Re ´nyi divergence provides cluster radius measurement tives to analyze domain size. Biophys. J., 97, 1095–1103. which is closer to the expected cluster radius. We think that the radius Labernadie,A. et al. (2010) Dynamics of podosome stiffness revealed by atom- measurement was lowered for the density based methods, as they per- ic force microscopy. Proc. Natl. Acad. Sci. USA, 107, 21016–21021. form cluster radius measurement based on an ellipse fitted to a seg- Lagache,T. et al. (2013) Analysis of the spatial organization of molecules with mented cluster, which is smaller than the segmented cluster. robust statistics. PLoS One, 8, e80914–e80917. Levet,F. et al. (2015) SR-Tesseler: a method to segment and quantify The Re ´ nyi divergence provides an accurate and precise method localization-based super-resolution microscopy data. Nat. Methods, 12, of cluster radius measurement. We have shown that the commonly 1065–1071. used Ripley’s H function provides cluster radius measurements big- Liv,N. et al. (2013) Simultaneous correlative scanning electron and high-NA ger (by around 60%) than the actual cluster radius, similarly to the fluorescence microscopy. PLoS One, 8, e55707. results presented in (Kiskowski et al., 2009; Lagache et al., 2013). Luxenburg,C. et al. (2007) The architecture of the adhesive apparatus of cul- The Re ´ nyi divergence does not require user input or selection of tured osteoclasts: from podosome formation to sealing zone assembly. PLoS region of interest (contrary to Ripley’s H function, Supplementary One, 2, e179. Fig. S11, SR-Tesseler or DBSCAN) and the selection of a can be Meddens,M.B.M. et al. (2014) Podosomes revealed by advanced bioimaging: done after the analysis was performed. Apart from these benefits the what did we learn? Eur. J. Cell Biol., 93, 380–387. Re ´ nyi divergence calculation can be performed very fast, taking Nikon (2015) Super Resolution Microscope N-STORM. STORM around 20 s to analyse a dataset containing 30 000 localized points. Protocol-Sample Preparation. (10 November 2015, date last accessed). Ovesn y,M. et al. (2014) ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Acknowledgements Bioinformatics, 30, 2389. Owen,D.M. et al. (2010) PALM imaging and cluster analysis of protein het- The authors thank R. Marsh, A. Radenovic, R. Szepietowski and E. Rosten erogeneity at the cell surface. J. Biophotonics, 3, 446–454. for early discussions. They also acknowledge help and suggestions from P. Peddie,C.J. et al. (2014) Correlative and integrated light and electron micros- Annibale, H. Deschout and A. Radenovic. They also thank M. Way for the copy of in-resin GFP fluorescence, used to localise diacylglycerol in mamma- vaccinia virus samples. lian cells. Ultramicroscopy, 143, 3–14. Peddie,C.J. et al. (2017) Correlative super-resolution fluorescence and electron microscopy using conventional fluorescent proteins in vacuo. J. Struct. Funding Biol., 199, 120–131. This work was supported by the Medical Research Council (Next Generation Pertsinidis,A. et al. (2013) Ultrahigh-resolution imaging reveals formation of Optical Microscopy Initiative Program grant MR/K015664/1), the neuronal SNARE/Munc18 complexes in situ. Proc. Natl. Acad. Sci. USA, Biotechnology and Biological Sciences Research Council (grants BB/K01563X/1 110, 2. E2812–E2820. and BB/N022696/1) and the Francis Crick Institute which receives its core fund- Peters,R. et al. (2017) Quantification of fibrous spatial point patterns from ing from Cancer Research UK (FC001999), the UK Medical Research Council single-molecule localization microscopy (SMLM) data. Bioinformatics, 33, (FC001999), the Human Frontier Science Program (RGP0035/2016), and the 1703–1711. Wellcome Trust (FC001999). S.C. was supported by a Royal Society University Rafiq,N.B.M. et al. (2016) Podosome assembly is controlled by the GTPase Research Fellowship. A.S. was funded by an EPSRC studentship. The authors ARF1 and its nucleotide exchange factor ARNO. J. Cell Biol., 216, thank the Nikon Imaging Centre at King’s College London, where localization 181–197. microscopy data were acquired. Re ´ nyi,A. (1965) On the foundations of information theory. Revue de l’institut Conflict of Interest: none declared. International de Statistique/Rev. Int. Stat. Inst., 33, 1–14. The Re´nyi divergence enables accurate and precise cluster analysis 4111 Ripley,B.D. (1976) The second-order analysis of stationary point processes. Sochacki,K.A. et al. (2017) Endocytic proteins are partitioned at the J. Appl. Prob., 13, 255–266. edge of the clathrin lattice in mammalian cells. Nat. Cell Biol., 19, Rothemund,P.W.K. (2006) Folding DNA to create nanoscale shapes and pat- 352–361. terns. Nature, 440, 297–302. Staszowska,A.D. et al. (2017) Investigation of podosome ring protein arrange- Rust,M.J. et al. (2006) Sub-diffraction-limit imaging by stochastic optical re- ment using localization microscopy images. Image Processing for Biologists. construction microscopy (STORM). Nat. Methods, 3, 793–795. Methods, 115, 9–16. Saffarian,S. et al. (2009) Distinct dynamics of endocytic Clathrin-coated pits Steinbach,M. et al. (2003) Introduction to Data Mining. Springer, New York. and coated plaques. PLoS Biol., 7, e1000191–e1000118. Tanaka,H. et al. (2015) Electron microscopic examination of podosomes Schachtner,H. et al. (2013) Podosomes in adhesion, migration, mechanosens- induced by phorbol 12, 13 dibutyrate on the surface of A7r5 cells. ing and matrix remodeling. Cytoskeleton, 70, 572–589. J. Pharmacol. Sci., 128, 78–82. Sengupta,P. et al. (2011) Probing protein heterogeneity in the plasma membrane Vijayakumar,V. et al. (2015) Tyrosine phosphorylation of WIP releases bound using PALM and pair correlation analysis. Nat. Methods, 8, 969–975. WASP and impairs podosome assembly in macrophages. J. Cell Sci., 128, Shannon,M.J. et al. (2015) Protein clustering and spatial organization in 251–265. T-cells. Biochem. Soc. Trans., 43, 315–321. Walde,M. et al. (2014) Vinculin binding angle in podosomes revealed by high Shivanandan,A. et al. (2015) Accounting for limited detection efficiency and resolution microscopy. PLoS One, 9, e88251. localization precision in cluster analysis in single molecule localization mi- Zonnevylle,A.C. et al. (2013) Integration of a high-NA light microscope in a croscopy. PLoS One, 10, e0118767. scanning electron microscope. J. Microscopy, 252, 58–70. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Bioinformatics Oxford University Press

The Rényi divergence enables accurate and precise cluster analysis for localization microscopy

Loading next page...
 
/lp/ou_press/the-r-nyi-divergence-enables-accurate-and-precise-cluster-analysis-for-40oNVN09Ou

References (68)

Publisher
Oxford University Press
Copyright
© The Author(s) 2018. Published by Oxford University Press.
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/bty403
Publisher site
See Article on Publisher Site

Abstract

Motivation: Clustering analysis is a key technique for quantitatively characterizing structures in lo- calization microscopy images. To build up accurate information about biological structures, it is critical that the quantification is both accurate (close to the ground truth) and precise (has small scatter and is reproducible). Results: Here, we describe how the Re´ nyi divergence can be used for cluster radius measurements in localization microscopy data. We demonstrate that the Re´nyi divergence can operate with high levels of background and provides results which are more accurate than Ripley’s functions, Voronoi tesselation or DBSCAN. Availability and implementation: The data supporting this research and the software described are accessible at the following site: https://dx.doi.org/10.18742/RDM01-316. Correspondence and requests for materials should be addressed to the corresponding author. Contact: adela.staszowska@gmail.com or susan.cox@kcl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. 1 Introduction Localization microscopy is a super-resolution imaging method based cells (Sengupta et al., 2011)] and their influence on the structural on detecting randomly activated, single molecules in a sequence of changes [sphingolipid ceramides in Jurkat, U2OS and HBMEC images. A super-resolution image is then reconstructed using this list (Burgert et al., 2017)]. However, analysing this type of data is often of single molecule localizations (Betzig et al., 2006; Rust et al., challenging, as the data often contains background localizations or 2006). A common application of localization microscopy is to multiple fluorophore re-appearances (Deschout et al., 2014), which study proteins clustered on the cell membrane. These clusters can make it difficult to identify clusters and/or lead to inaccurate [approximated as Gaussian (Sengupta et al., 2011; Shivanandan size measurements. et al., 2015) or Lorentzian (Pertsinidis et al., 2013)] can be analysed The methods used for cluster analysis of localization microscopy with quantitative methods to provide measurement of their proper- images can be roughly divided into spatial statistical methods [such ties or functions. For example, analysis of protein clusters present in as Ripley’s functions (Ripley, 1976)] and density based algorithms the cell plasma membrane revealed their role in cell signalling [LAT, [such as DBSCAN (Ester et al., 1996) or SR-Tesseler (Levet et al., LFA-1 and Src-family proteins clusters in T-cells (Owen et al., 2010; 2015)]. DBSCAN uses a distance measure to aggregate data points Shannon et al., 2015)], dynamics [GPI-anchored proteins in COS-7 into separate categories: clustered points and noise. SR-Tesseler V The Author(s) 2018. Published by Oxford University Press. 4102 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. The Re´nyi divergence enables accurate and precise cluster analysis 4103 " ! # a1 N a1 N method is based on creating the Voronoı ¨ diagrams to segment data X X 1 1 1 D / ln Id < r ; (2) a ij points into separate ‘cells’. Both of these methods use a global defin- a  1 N N i¼1 j¼1 ition of density to filter areas with higher point aggregation (Ester et al., 1996; Levet et al., 2015). However, it has been argued that where N is the number of points in the dataset, a is the scaling coef- obtaining the density parameter for localization microscopy data ficient, d is the distance between the ith central point and the jth ij may be difficult (Ankers et al., 1999; Steinbach et al., 2003). point, and Id < r is an indicator function, which has value 1 for ij Generally, the spatial statistical analysis methods are more universal d < r and 0 otherwise. The scaling coefficient a controls the ij and require less prior knowledge about the system than the density Re ´ nyi divergence function value, as it promotes areas with higher based algorithms (Deschout et al., 2014). density (clusters) over the lower intensity of background noise Ripley’s K function is a widely used method which provides a (Supplementary Notes S3 and S2). It should be also noted that in the measure of the difference between the data distribution and a uni- case of clustering analysis all points which belong to any cluster are form distribution. There are three versions of Ripley’s functions treated as signal. The not clustered points we have called noise, these called K, L and H (Supplementary Note S1). The K and L functions points originated from localizations of background fluorescence, are usually used to detect the presence of clustering in the data and free floating dye or non-specific staining. In this study we compare the H function is used to calculate cluster size using the maximum the performance of our method, based on the Re ´ nyi divergence, of H(r) (Deschout et al.,2014; Sengupta et al.,2011; Shannon with Ripley’s H function, using simulated and experimental data. et al.,2015). Recently, it has been noted that measuring the cluster For experimental data we have also compared the performance of radius by detecting the radius for which the Ripley’s H function is the Re ´ nyi divergence with SR-Tesseler and DBSCAN, which are the maximal can be biased (Kiskowski et al., 2009; Lagache et al., density based clustering methods. We have imaged three types of 2013) and that the origin of this bias has been reported to be structures to assess the precision and accuracy of the Re ´ nyi diver- related to density of the clusters in the dataset (Kiskowski et al., gence as a measure of cluster size: DNA-origami plates with known 2009). This can lead to different bias in the measured cluster radius size (Rothemund, 2006), clathrin-coated pits in HeLa cells, the size for different cluster densities. Approaches have been proposed to of which has been confirmed via electron microscopy (Saffarian remove the bias from the cluster radius measurements, for example et al., 2009; Sochacki et al., 2017), vaccinia virus particles in HeLa by measuring the cluster size as a half of the radius for which the cells, which were imaged with correlative localization and electron first derivative of the Ripley’s H function has the value of –1 microscopy (Brama et al., 2015; Peddie et al., 2017) and podosome (Kiskowski et al., 2009)orbyusingaconstantcorrectioncoeffi- cores in THP-1 cells which size varies across the cell and were previ- cient (Lagache et al., 2013; Supplementary Note S1 and ously imaged using electron microscopy (Tanaka et al., 2015) and Supplementary Fig. S1a and b). atomic force microscopy (Labernadie et al., 2010; Luxenburg et al., However, these methods do not remove the cause of the bias 2007). These biological structures were selected because they can be which is intrinsically related to the method itself, as this method observed in different cell types. Clathrin coated pits are responsible weights signal (clustered points) and noise (not clustered points) for active intra-cellular transport (endocytosis), the mechanism of equally (Supplementary Note S2). Additionally bias removal, as pro- which is similar across different cell types. The vaccinia virus is a posed by these approaches, can lead to new inconsistencies in the poxvirus previously used in smallpox vaccination. Currently, the cluster radius measurements. For example, the first derivative of vaccinia virus is used for study of gene expression and for creating Ripley’s H function is calculated numerically, thus the frequency new vaccines [using a vaccinia virus mutant (Jacobs et al., 2009)]. with which it is calculated is going to affect the first derivative value and detecting a specific value will be either impossible or provide worse results than a traditional method (Supplementary Fig. S1a). 2 Materials and methods The other approach using a constant coefficient to decrease the 2.1 Sample preparation measured cluster radius can influence the performance of Ripley’s H DNA origami is structures formed by artificial folding of DNA. function and its resistance to noise (Supplementary Fig. S1b). It Here, we used flat DNA origami plates with known size of has also been suggested that these methods for bias removal may be 60  90 nm (GATTAquant). The DNA plates were uniformly cov- better suited for analysis of simulated datasets than real data ered with Alexa Fluor 647 molecules (each plate with approximately (Kiskowski et al., 2009). 20 molecules). Localization microscopy samples were prepared Here, we propose use of the Re ´ nyi divergence (Re ´ nyi, 1965)asa according to the protocol supplied by GATTAquant. The DNA ori- measure for cluster analysis. The Re ´ nyi divergence quantifies the dif- gami was immobilized on BSA-biotin-neutravidin surface. The ference between two distributions: dishes (35 mm dishes with #1.5 glass coverslip bottom, Ibidi, "# a1 Germany) were washed with PBS (three times) and incubated with 1 pxðÞ DðÞ px ð ÞjjqxðÞ¼ ln pxðÞ dx ; (1) 200 ll of BSA-biotin solution (0.5 g/l in PBS) for 5 min, followed by a  1 qxðÞ biotin solution removal and further washes in PBS (three times). where p(x) is the data distribution and q(x) is a clustered reference Dishes were then incubated with neutravidin solution (0.5 g/l in distribution (if different than clustered reference distribution has PBS) for 5 min (neutravidin solution was removed and washed with been used it has been indexed, e.g. ‘U’ for the uniform distribution, PBS, three times). The diluted solution of DNA origami (around 100 Supplementary Note S3). We have used a clustered reference distri- times: 0.5 ll of DNA origami solution þ 50 ml of PBS) was placed in the dish and incubated for 5 min followed by a washing step (three bution as more closely related to the sample distribution and is therefore relevant for analysis clustered biological samples. Ripley’s times in PBS). The optimal dilution factor was selected using guide- H function can be derived from the Re ´ nyi divergence (for a ¼ 2), lines provided in the sample preparation protocol, leading to an U 1 2 and a uniform reference distribution q ðÞ x ¼ (Supplementary average density of DNA-origami plates 1 plate/lm . Note S4). For the purpose of clustering analysis, we used a sampling Samples with labelled clathrin coated pits were prepared using approximation of the Re ´ nyi divergence: HeLa cells (ATCC, CCL-2) seeded on 35 mm dishes with #1.5 glass 4104 A.D.Staszowska et al. coverslip bottom (Ibidi, Germany) 510 cells/dish. After leaving 2.3 Monte Carlo data generation the cells to adhere for 16 h, the cells were fixed for 20 min in 3.6% The simulated datasets for cluster analysis were created using Cþþ and Matlab (2016b). For each dataset the cluster positions formaldehyde, permeabilized for 5 min in 1% saponin and blocked were selected randomly. Clusters were simulated with desired in 3% BSA for 30 min. The anti-clathrin anti-body (BD Biosciences, 610499) was added diluted 1:200 or 1:500 in PBS with 3% BSA and parameters varying the number of clusters in the dataset, density of points in the cluster, cluster size, distribution of points in the cluster 0.5% saponin and incubated for 1 h. An Alexa Fluor 647 conjugated (uniform or Gaussian). Uniform background noise was added at the secondary anti-body (Invitrogen, A21235) was diluted to 1:500 in desired level. Both the noise and clustered points had the same inten- PBS with 3% BSA and incubated for 30 min. sity. Signal-to-noise ratio was calculated for each noise level as a Vaccinia virus samples were prepared according to the protocol ratio of number of points in the same area, in areas classed as signal published in (Peddie et al., 2017). and background. See Supplementary Figure S2 for more information Podosome samples were prepared using THP-1 cell line (ATCC, about signal-to-noise ratio and an example of the data. TIB-202). The cell culture, fixation and staining were performed according to the protocol presented in Staszowska et al. (2017); 2.4 Localization microscopy data simulation Vijayakumar et al. (2015). For the staining of actin present in the Thelocalizationmicroscopydataweresimulated usingaMatlab podosome core we have used Alexa Fluor 647 conjugated to phal- (2016b) software. The positions of fluorophores in the data were defined loidin (Invitrogen, A22287). To identify podosomes in fixed cells, using 10 datasets simulated for Monte Carlo testing (10 clusters with 80 adhesive rings were confirmed using vinculin conjugated Alexa and 160 nm radius). As the localization precision and quality of data Fluor 488. depends in a great deal on the density of data we have simulated single molecules with varying density (by setting a different number of frames 2.2 Imaging for which a fluorophore was in the dark state). For each density, 3000 The localization microscopy imaging was performed using the frames of single molecule data were simulated with pixel size 100 nm. STORM system at Nikon Imaging Center at King’s College We have also set the simulated fluorophores not to bleach permanently. London. This system is equipped with an Eclipse Ti-E Inverted Then the single molecule images were analysed with ThunderSTORM Nikon Microscope, Andor Ixon EMCCD back-illuminated camera (similarly to experimental data) and clustering analysis. (DU-897U-CSO-#BU), laser and LED light sources (laser wave- lengths and powers are: 405 nm, 30 mW; 488 nm, 90 mW; 2.5 Clustering analysis software 514 nm, 50 mW; 561 nm, 90 mW and 647 nm, 170 mW) and is The Re ´ nyi divergence analysis of clustering datasets was performed operated with NIS Elements software with N-STORM module. using Equation (14) in Supplementary Note S3. The Re ´ nyi diver- The imaging was performed with TIRF, using a 100x, N.A. 1.49 gence was calculated for a range of a values (usually 10–120 with objective. increments of 10). Additionally, the same software also computed The 647 nm laser power was adjusted during the acquisition the Ripley’s H function. The first and second derivatives of the to acquire similar number of counts in every frame (as far as Re ´ nyi divergence and Ripley’s H function were calculated using the possible). To improve the signal-to-noise (S/N) of the images fxðÞ þhfxðÞ h fðÞ xþhfxðÞ h 0 00 numerical derivative: f ðÞ x ¼ and f ðÞ x ¼ 2h 2h the excitation light angle was adjusted to include only the for second derivative calculation. The step size h was set to be equal fluorophores at a given optical plane and exclude background to the sampling for the Re ´ nyi divergence and Ripley’s function cal- fluorescence. The imaging angle was adjusted for every imaged culation. It should be noted that the derivative calculated for the section and selected so that the image has the highest possible Re ´ nyi divergence clustering analysis was only used to detect gradient contrast. changes in the Re ´ nyi divergence and its first derivative curves. Prior to imaging, the samples (DNA origami, clathrin coated pits Prior to the Re ´ nyi divergence and Ripley’s H function calculation and podosomes) were immersed in the imaging buffer. The base buf- for the clathrin coated pit and vaccinia virus particle datasets, we fer was made with b-Mercaptoethylamine (MEA, Sigma Aldrich, have selected regions of interest for the localization microscopy 30070-50G) according to the recipe presented in (Nikon, 2015). For images. This was performed to include only areas where the ana- better stability of the Alexa Fluor 647, Cyclooctatetraene (COT, lysed structures were present in the images, and exclude areas where 98%, Sigma Aldrich, 138924-1G) dissolved in DMSO (Sigma clusters were not present or false clustering occurred. For clathrin Aldrich, 472301-1L-D) was added to the base buffer to a final con- coated pits we have created mask images to cover the area of the centration of 2 mM (Dave et al., 2009). HeLa cell as the clathrin pits were present throughout the cell. In each imaging series about 6000 frames for DNA origami sam- Region of interest was selected similarly also for podosomes, to in- ples and 10 000 for clathrin-coated pits and podosomes were clude only the areas where the podosomes were present in the acquired at a rate of 30–50 frames per second. images. For vaccinia virus particles we have selected rectangular Imaging of vaccinia virus samples, using scanning electron mi- regions of interest containing a high density of vaccinia virus par- croscopy and localization microscopy was performed as discussed in ticles. The masks and notes on the size of each region of interest can (Brama et al., 2015; Peddie et al., 2014). be found in the data bundle. It should be noted that no region of Super-resolution images were reconstructed with ThunderSTORM interest was selected for analysis of DNA-origami samples as the (Ovesn y et al., 2014) using single-emitter and multi-emitter settings. DNA-plates were randomly distributed across the whole image. Molecule localizations were filtered to remove false localizations The density based clustering was performed using the SR- with FWHM sizes much smaller than the predicted PSF (110 nm for Tesseler software (Levet et al., 2015), which also performed the DNA-origami, clathrin coated pit and podosome samples and 90 nm DBSCAN analysis. The cluster radius was also calculated using SR- for vaccinia virus particles). Additionally, drift was removed from the Tesseler software using the Voronoı ¨ diagrams, which segmented the reconstructed images using the cross correlation option with binning 5 data into single point domains. Then the segmented image was and magnification 5.0. thresholded to exclude regions with density which was two times The Re´nyi divergence enables accurate and precise cluster analysis 4105 smaller than the average density [as suggested in (Levet et al., 2015), investigate size of DNA origami in reconstructed localization mi- we have also tested different values of the density threshold param- croscopy images. eter, Supplementary Fig. S3]. The last step was to spatially filter the diagram image to exclude structures smaller or bigger than the size 2.8 Detection and characterization of vaccinia virus of the analysed structure. Both the SR-Tesseler and DBSCAN meas- particles in electron microscopy images ured the cluster radius using an ellipse fitted to a segmented cluster HeLa cells infected with vaccinia virus were prepared using an in- patch. The cluster radius is the calculated as a half of an average of resin fluorescence protocol which introduces electron contrast into the longest and shortest axis of the fitted ellipse. It should be also the sample whilst preserving fluorescence (Peddie et al., 2014). noted that the fitted ellipses were smaller than the segmented clus- Thus, the virus particles appear in the electron microscopy images as ters, leading to underestimation of cluster size. For DNA origami we dark circular features, often surrounded by a bright ring. In order to set the desired cluster radius to be between 15 nm and 100 nm, for match the area where the fluorescent proteins are present in the sam- clathrin coated pits––50 nm and 170 nm and Vaccinia virus particles ple, we applied a thresholding method to include only electron dense 60 nm and 170 nm. For the DBSCAN analysis we have adjusted the core of each particle. The radius of each particle was measured as an distance parameter, which is used by the software to estimate cluster average of its shortest and longest dimension. It should be also noted radius: 54 nm for DNA origami (Supplementary Note S5), 100 nm that in this method each particle radius is measured separately, for clathrin coated pits [estimated based on the electron microscopy while the Re ´ nyi divergence and Ripley’s H function provide an aver- images presented in (Sochacki et al., 2017)] and 108 nm for vaccinia age measurement for a dataset. virus particles (measured for scanning electron microscopy images). The analysis of vaccinia virus particles was performed using Matlab (2016b). Each electron microscopy image was thresholded 2.6 Cluster radius calculation using a range of thresholds to account for the variation of brightness The cluster radius was calculated using the Re ´ nyi divergence and on the image (this was usually from 75% to 100% of the mean in- Ripley’s H function. The Re ´ nyi divergence for a specific a was calcu- tensity value). Thresholded images were binarized using im2bn() lated as a function of radius. The function counts the number of function and inverted using function imcomplement() to ensure that points inside a given radius. Because of the weighting effect of a, the the white areas in the binary images correspond to the areas with main component of the function value comes from high density low intensity on the original image. Some particles had holes and areas (clusters). Thus, when the radius is equal to the actual cluster empty edges due to variation in brightness in the original image, size the function value does not change or changes by a very small these were filled using imfill() and bwmorph(). Then, particles were amount for several radius values. This means that a plateau can be filtered using a spatial filter (excluding particles smaller or bigger detected on the function and the cluster radius can be found by look- than a set range of area values, here it was 40 and 600 pixels, with ing for points with gradient equal to 0 (first and second derivatives pixel size 16.5 nm), eccentricity and solidity filter, which used a stat- of function need to be equal to 0). In practice, we did not detect the istic provided by the regionprops() function. The filtering to remove 0 gradient but set a tolerance values for the first and second deriva- elliptical and empty (non-solid) particles was performed using ex- tives. We have used a tolerance of 1% change for the first and 0.5% perimentally set ranges, 0–0.8 for elliptical filtering and 0.2–1 for for the second derivatives. These values were set experimentally, solidity filtering. The particles identified by the software were con- using simulated data with 10 clusters, 80 nm radius with and with- firmed by the user, by selecting the correct particle identifications. out any noise points. We have used the same tolerance values for all Lastly, the software saved an image with identified particles and a analysed data. In practice, we have used a set of tolerance values for file containing information about the radius of each virus particle the plateau detection. Our algorithm searched for the first point in a (see Supplementary Fig. S4 for a region of a particle identification plateau as an estimate for the radius value (at least three points image). needed to be equal to 0 within a set tolerance). The radius measurements using Ripley’s H function were calcu- 2.9 Statistical analysis lated by detecting the maximum of the Ripley’s H function. We used Monte Carlo simulations to establish the stability of our analysis method compared with Ripley’s H function. The Monte 2.7 Estimation of size of clathrin coated pits Carlo testing was performed for simulated data similarly to Clathrin coated pits have been extensively studied using electron mi- (Kiskowski et al., 2009). Random datasets with the same character- croscopy (Saffarian et al., 2009; Sochacki et al., 2017), and their istics (as an original dataset––number and size of clusters, noise size in HeLa cells depends on the size of their cargo (Saffarian et al., level) were generated and for each new dataset cluster radius meas- 2009). A visual inspection of electron microscopy images suggests a urements with the Re ´ nyi divergence and Ripley’s H function were size of around 100 nm (Saffarian et al., 2009; Sochacki et al., 2017). performed. The number of the Monte Carlo repetitions was selected Additionally, we also performed a visual check of the cluster radius by testing the changes in the value of measured mean and SD. The measurement for the clathrin coated pits. We have used a local- simulations stopped when the variation in both of these values was ization microscopy image of the clathrin coated pits (prepared with around 0.5% (Supplementary Fig. S5). 1:200 anti-clathrin anti-body and pre-processed with single-emitter The histograms of the results of cluster radius measurements for fitting) to draw circles which enclosed the clustered points. We have simulated data suggested that the results do not belong to the same selected this dataset at random (it was dataset six from the first sam- distributions (Fig. 1c). Thus to check if these results could originate ple). Each circle drawn was numbered (Supplementary Fig. S4c) and from the same distribution we have performed the two-sample its radius measured (by saving the size of the drawn circle). We have Kolmogorov–Smirnov test. Our null hypothesis was that they do be- collected results from 203 clusters and these are presented in long to the same distribution. The test suggested that it is extremely Supplementary Figure S4d. The mean radius measured was unlikely that the results from the Re ´ nyi divergence and Ripley’s H 746 13 nm, close to the expected value and radius measured with function originate from the same distribution. The P-values for 5% the Re ´ nyi divergence. We have also used the same methodology to confidence level are shown in Supplementary Table S1. 4106 A.D.Staszowska et al. Fig. 1. Variation of measured cluster radius with a and noise level for 100 000 simulated datasets. Clusters of radius 80 nm were simulated with a uniform distribu- tion of fluorophores across the cluster. (a) Results of cluster radius measurements with different a values for datasets with increasing levels of noise (measured as mean). (b) Comparison of cluster radius values measured with the Re´ nyi divergence (marked with black) and Ripley’s H function (red) for datasets with increas- ing noise levels. The Ripley’s H function radius measurements were only possible for datasets with S/N higher than 16 [for lower S/N H(r) function does not have a maximum]. The Re´ nyi divergence provided stable radius measurements S/N as low as 6. (c) The histogram of results of cluster radius measurements for simu- lated clusters with S/N 29. The mean and median are shown in dashed lines for each method (in green and blue, respectively). (d) Visualization of the radius measured with the Re´nyi divergence (red circles) and Ripley’s H function (cyan circles) for simulated clusters with radius 80 nm and S/N 29 3 Results same size cluster (Supplementary Fig. S1), different number of clusters in a dataset (Supplementary Fig. S7). These datasets were simulated 3.1 Selection of the scaling coefficient a with uniform noise. We have also simulated single molecule data with The scaling coefficient a tunes the Re ´ nyi divergence response to dif- different densities to test an impact of artefacts and localization preci- ferent properties of data such as the cluster size/shape or the back- sion on fidelity of cluster radius measurement (Fox-Roberts et al., ground noise level. The Re ´ nyi divergence calculation is performed 2017)(Supplementary Figs S8 and S9). The results for different by raising the number of points surrounding a central point (in a cluster types were similar so here we discuss one example, the uniform given radius) to a powerðÞ a  1 . Thus, the Re ´ nyi divergence value clusters with 80 nm radius (for results for other cluster types see from the regions with higher density (clusters) is greater than from Supplementary Figs S1 and S6–S9). The results for simulated data sug- those with smaller density of points (noise) for a > 2 gest that the Re ´ nyi divergence provides a more accurate and reliable (Supplementary Note S3). This also means that the signal is pro- cluster radius measurements than Ripley’s H function. The cluster ra- moted over the noise. The optimal value of a should be selected dius measurements for data with different noise levels suggest that the according to the data properties and usually higher noise levels re- Re ´ nyi divergence provides similar values of radius for different levels quire a higher value of a. In our investigations, we selected a by of noise caused by background localizations. For the same data looking for no or small change in the measured cluster radius value Ripley’s H function can struggle to provide similar measurements or (less than a 2% change as a is increased further, Fig. 1a). In this be unable to measure the cluster radius for data with high noise or work, we used a single value of a ¼ 70, as it provided stable radius small number of clusters (because the function does not have a max- measurements for all analysed datasets (for example, Fig. 1a). imum, required for the cluster measurement, Fig. 1b). The comparison Additionally, we investigated the noise resistance of the Re ´ nyi diver- of the SDs suggest that while the Re ´nyi divergence scatter is higher gence and Ripley’s H function on simulated datasets with different than for Ripley’s H function, it remains similar only slightly increasing noise levels (Fig. 1b and Supplementary Fig. S1c–e). with noise (15 nm, Fig. 1b and c and Supplementary Fig. S1c–e). The results for simulated data suggest that the Re ´ nyi divergence It should be also noted that the cluster measured with the Re ´ nyi provides a more accurate and reliable cluster radius measurements divergence has a small bias up to around 15%. We think that this is than Ripley’s H function. We have analysed data with different size connected to the method used to measure clustering by detecting the of clusters (80 nm and 160 nm radius, Fig. 1 and Supplementary Fig. gradient changes in the Re ´ nyi divergence curve. The formula used S1c), different point distribution across the clusters (uniform and for the numerical derivative calculation increases the measured ra- Gaussian, Supplementary Fig. S1), different number of points in the dius (offset by a sampling step value). Other cause of the biased The Re´nyi divergence enables accurate and precise cluster analysis 4107 Fig. 2. Measurement of the average size of the DNA-origami plates in localization microscopy data. (a) Localization microscopy image of DNA-origami plates, ana- lysis with single-emitter fitting. Scale bar 1 lm. (b) Magnified images of single DNA-origami plates. Scale bar 500 nm. (c) The average expected radius based on the simulations (50 000 repetitions) and the radius measured in the experimental data with the Re´nyi divergence (526 8 nm for simulated and 526 16 nm for ex- perimental data), Ripley’s H function (696 5 nm and 1516 30 nm), SR-Tesseler (226 8 nm) and DBSCAN (376 11 nm). The error bars are the SD cluster radius measurement is reporting of the mean value of the ra- cluster radius measured with the Ripley’s H function was almost dius as it is more widely used statistical measure. As the Re ´ nyi diver- three times bigger (1516 32 nm). Moreover, the comparison of scat- gence results display a sharp peak and a long tail the median ter for the Re ´ nyi divergence (SD 16 nm) and Ripley’s H function (SD indicates much better the most measure value (Fig. 1). For Ripley’s 32 nm) suggests that the Re ´ nyi divergence is more precise. The ex- H function the SD was higher for datasets with no or low noise perimental DNA origami data were also analysed with density based (5 nm for uniform and 30 nm for Gaussian clusters) and lower clustering methods: SR-Tesseler and DBSCAN. DBSCAN measured for higher noise levels (3–5 nm). This suggests that the accuracy of cluster radius which was exactly equal to an average of the plate the measurement is less influenced by noise for the Re ´ nyi divergence size, when SR-Tesseler measured smaller cluster radius (226 8nm than for Ripley’s H function. for SR-Tesseler and 376 11 nm for DBSCAN, Supplementary Note S5). We have also measured sizes of the clathrin coated pits present 3.2 Cluster size measurement for localization in HeLa cells. Clathrin-coated pits have been extensively studied microscopy images using electron microscopy (Saffarian et al., 2009; Sochacki et al., In addition to investigating performance of the Re ´ nyi divergence 2017), and their size in HeLa cells depends on the size of their cargo and Ripley’s H function with simulated data, we also used local- (Saffarian et al., 2009). A visual inspection of electron microscopy ization microscopy data of DNA origami, clathrin coated pits, vac- images suggests a size of around 100 nm (Saffarian et al., 2009; cinia virus particles and podosomes. These structures were selected Sochacki et al., 2017). We have imaged samples with two concentra- because we could confirm their size through techniques other than tions of anti-clathrin anti-body: 1:200 and 1:500 (three samples for localization microscopy. For each structure, we have collected a ser- each primary anti-body concentration, see Methods Section and ies of images and localized single molecules using ThunderSTORM Supplementary Fig. S10b and c), to establish if different amount of (Ovesny et al., 2014) with single- and multi-emitter fitting (for signal from the investigated structure affects the measured radius. multi-emitter data results see Supplementary Fig. S10). For each labelling density, we have imaged 3 samples and selected at The DNA origami constructs were designed to have a specific least 10 regions for imaging. shape and size (Rothemund, 2006), here 60  90 nm rectangular Thus, the accuracy of the radius measurements could be com- plates (with around 20 Alexa Fluor 647 molecules per plate). As the pared both with regard to how close the measured value is to the Re ´ nyi divergence and Ripley’s H function are designed to detect predicted value and how large the scatter in the measurements is, for circular clusters, the measured cluster size is not directly related to the two labelling densities and different fluorophore localization the plate size. We estimated that the measured cluster radius should methods. Our results suggest that the Re ´ nyi divergence provided a be smaller than the radius of a circle passing through all corners (ra- measurement which was both more closely related to the expected dius 54 nm) of the DNA-origami plate and bigger than a circle radius of the clathrin pits and more stable. The average radius meas- enclosed on the plate (radius 30 nm, Supplementary Fig. S4a and ured with the Ripley’s H function, 1566 23 nm, was almost two Supplementary Note S5). To validate this, we simulated rectangular times bigger than the radius measured with the Re ´ nyi divergence, plates with size and shape of the actual DNA plates. We have 866 19 nm. SR-Tesseler and DBSCAN measured cluster radii which observed that the mean radius measured for the simulated rectangu- were smaller than the expected cluster radius, 576 22 nm and lar plates was 526 8 nm (around 4% smaller than the maximum 656 20 nm, respectively. We have also observed some variation in expected radius) for the Re ´ nyi divergence measurement, and the values measured with regard to the labelling density for Ripley’s 696 5 nm (22% bigger than the maximum expected radius) for Ripley’s H function. We expected to see this behaviour preserved in H function (see Results Section in Fig. 3b and Supplementary Fig. S10d–f). The Re ´ nyi divergence, SR-Tesseler, and DBSCAN meas- the experimental results. ured radius values were not affected by the different labelling den- We analysed localization microscopy datasets collected from five sities. Additionally, to confirm our cluster radius measurements for DNA origami samples analysed using single-emitter fitting (for multi-emitter fitting results see Supplementary Fig. S10a). The the clathrin coated pits we have performed a visual estimation of the results suggest that the Re ´ nyi divergence provides cluster radius pit size using a localization microscopy image (Methods Section and measurements which are more accurate than Ripley’s H function Supplementary Fig. S4c). The mean of the estimated radius was (Fig. 2c). While the average cluster radius measured with the Re ´ nyi 746 13 nm (Supplementary Fig. S4d), close to the expected value divergence was equal to the expected radius (526 16 nm), the and radius measured with the Re ´ nyi divergence. 4108 A.D.Staszowska et al. Fig. 3. Measurement of the average size of the clathrin coated pits in localization microscopy data. (a) Localization microscopy image of clathrin coated pits in HeLa cells (labelling 1:200). (b) Magnified images of clathrin coated pits. Scale bar 500 nm. (c) The average expected (746 13 nm) and measured radius, 866 19 nm for the Re´ nyi divergence and 1566 23 nm for Ripley’s H function. SR-Tesseler and DBSCAN measured cluster radius to be smaller than the expected value, 576 22 nm and 656 20 nm, respectively. Localization position fitting was performed with single-emitter fitting. The error bars are the SD Fig. 4. Vaccinia virus particles in HeLa cells imaging and radius measurements. (a) Localization microscopy image, and (b) electron microscopy image. (c) Overlay with electron microscopy image presented in grey-scale, and localization microscopy image in green. Scale bars 1 lm. (d) The average particle radius measured for electron microscopy (EM) images, 1086 29 nm and localization microscopy with the Re´ nyi divergence (1176 48 nm) Ripley’s H function (1906 81 nm), SR-Tesseler (566 15 nm) and DBSCAN (746 57 nm). The error bars are the SD (Color version of this figure is available at Bioinformatics online.) However, due to the limitations of any one imaging technique it localization microscopy data by imaging at 200 Pa (Peddie et al., is highly desirable to cross-validate results against a second imaging 2017). Images of the matching areas in the sample were collected with technique. Recent advances have made possible imaging of the same localization and electron microscopy (Fig. 4a and b). Thus, the sizes sample with simultaneous correlative light and electron microscopy measured for both of these imaging techniques could be directly com- (CLEM; Liv et al., 2013; Zonnevylle et al., 2013). Thus we could pared for exactly the same structures. To estimate the true size of the use the electron microscopy image as an estimation of ground truth vaccinia virus particles we analysed the electron microscopy images and compare it to the quantitative results obtained from localization using an algorithm, which identified and characterized individual virus microscopy data. particles (Methods Section and Supplementary Fig. S4e–g). The mean Vaccinia virus particles in HeLa cells were imaged using correlative radius of the virus particles in the electron microscopy images was localization and scanning electron microscopy. The samples were 1086 29 nm. We compared this value with the average radius meas- transfected with YFP-A3, which localizes to the core of viral particles. ured for localization microscopy data pre-analysed with single-emitter The fixed and frozen samples were then embedded in resin and cut (Fig. 4d) and multi-emitter fitting (Supplementary Fig. S10g). into 200 nm thick sections (Brama et al.,2015; Peddie et al., 2017). Analysis of vaccinia virus particle localization microscopy sug- YFP blinks in a partially hydrated state and so it was possible to obtain gests that the Re ´ nyi divergence provides more accurate measurement The Re´nyi divergence enables accurate and precise cluster analysis 4109 Fig. 5. Podosome cores in THP-1 cells and their size measurement. (a) Localization microscopy image reconstructed with single emitter-fitting. F-actin podosome cores were stained with Alexa Fluor 647 conjugated to phalloidin (reconstructed with single emitter-fitting). Scale bar 1 lm. (b) Magnified single podosome cores with varying sizes. Scale bar 250 nm. (c) Line profiles of four podosome cores from (b). The podosome core has diameter of 500 nm at the base (indicated with black arrow) and between 200 to 300 nm in the most dense region of the core (indicated with two black arrows). (d) Cluster radius measurement of the podosome data. The Re´nyi divergence measured cluster radius of 1326 19 nm, Ripley’s H function 2316 82 nm, SR-Tesseler 1256 46 nm and DBSCAN 2056 85 nm. The error bars are the SD than Ripley’s H function, SR-Tesseler or DBSCAN. For both single independent imaging techniques in (Labernadie et al., 2010; Rafiq and multi-emitter fitting, the Re ´ nyi divergence provided cluster ra- et al., 2016). Cluster radius measurements for podosome cores are less conclu- dius measurements, 1176 48 nm and 1376 62 nm respectively, sive than cluster radius measurements performed for structures pre- which were very close to the particle radius measured in the electron viously discussed in this paper. The podosome core size had been microscopy images. However, the radius measured by Ripley’s H reported to differ across the cell and to be around 500 nm in diam- function, 1906 81 nm for single- and 2216 135 nm for multi- eter at the widest point (Labernadie et al., 2010; Rafiq et al., 2016). emitter fitting, was much bigger than the expected radius of virus The podosome core has a cone shape, meaning that the density of particle. Values smaller than expected radius were measured for localizations changes across the core with higher density in the SR-Tesseler 566 15 nm for single- and 586 17 nm for multi-emitter centre and smaller density on the edge (Fig. 4c). The Re ´ nyi diver- fitting and for DBSCAN657 nm for single- and661 nm for multi- gence and SR-Tesseler detected the radius the area of the core emitter fitting. It should also be noted that the SD values (error bars which has the highest density of localizations, 1326 19 nm and in Fig. 4 and Supplementary Fig. S10) for the Re ´ nyi divergence were 1256 46 nm respectively. Both Ripley’s H function and DBSCAN smaller (48 nm for single-emitter and 62 nm for multi-emitter fitting) reported a larger cluster radius, 2316 82 nm and 2056 85 nm. than for the Ripley’s H function (81 nm for single- and 135 nm for However, it should be noted that Ripley’s H function measured clus- multi-emitter fitting) which suggests that our method provided a ter radius only for 17 datasets out of 30, and for the rest it did not more consistent cluster radius measurement. DBSCAN provided detect any clustering. The remaining methods did not have any similar level of precision to the Re ´ nyi divergence (57 nm for single- problem with detecting clustering. The Re ´ nyi divergence provided a and 61 nm for multi-emitter fitting), where SR-Tesseler had the more stable cluster radius measurement with the smallest SD, than smallest SD for these datasets (15 nm for single- and 17 nm for other methods. It also had similar SD for the measurement for data multi-emitter fitting), however it also underestimated the size of the with clusters with the same size. While the SD for the remaining clusters by 42%. methods, Ripley’s H function, SR-Tesseler and DBSCAN, was Biological structures analysed above have very similar sizes higher than for the datasets with the same cluster size. Here, we dis- through the cell, but this is not always the case in many biological cussed results for single-fitting, the results for multi-emitter fitting structures. An example of such size variability is observed for podo- are presented in Supplementary Table S2. The results for multi- somes. Podosomes are adhesive, actin rich structures formed by the emitter fitting are similar to single-emitter results presented here, see cells to facilitate cell adhesion and migration (Schachtner et al., Supplementary Table S2. We think that the results for podosome 2013). They consist of two major components, an f-actin rich core cores clearly highlight the property of the Re ´ nyi divergence to reli- and the ring complex (build with integrins and integrin associated ably detect areas with high point density. proteins; Foxall et al., 2016; Labernadie et al., 2010; Tanaka et al., The Re ´ nyi divergence has an additional advantage of requiring 2015). The ring has diameter between 0.5 lmto2 lm in diameter no user input. In particular, the values obtained with Ripley’s H and 0.6 lmto1 lm in depth (Meddens et al., 2014; Rafiq et al., function can depend strongly on the region selected for analysis (be- 2016; Walde et al., 2014). The core is smaller, usually around cause of its equal weighting of signal and noise), whereas the Re ´ nyi 0.5 lm in diameter (at the widest point) and 0.5 lm in height divergence remains stable. This is important because it is common (Meddens et al., 2014; Rafiq et al., 2016). The size of podosomes to restrict analysis to structures of interest to decrease the data vol- varies across the cell, depending on their function or lifetime [for ex- ume. For example, for analysis of clathrin coated pits we have ample, in migrating cells, the newly formed podosomes are larger selected regions of interest to include the whole cell and exclude the than older ones (Meddens et al., 2014)]. In reconstructed local- almost empty area around it. We calculated the mean cluster radius ization microscopy images (Fig. 5a and b), we have indeed seen a for clathrin coated pits data with a selected region of interest variation in the size of the podosome cores. It should be also noted (termed ROI) and for the whole field of view (termed Image). that the podosome core has a shape of a cone, meaning that the The mean radius measured for the data changes visibly for the localization density will be vary across the structure (the highest in Ripley’s H function (Supplementary Fig. S11). For the Re ´ nyi diver- the centre and lower on the edges). This has been confirmed using gence it either changes by a small amount (Supplementary Fig. S11c) 4110 A.D.Staszowska et al. or remains the same (Supplementary Fig. S11a, b and d). This suggests References that the Re ´ nyi divergence results are more universal and can be com- Ankers,M. et al. (1999) OPTICS: ordering points to identify the clustering pared between different regions of interest (Supplementary Table S2). structure. Proc. ACM SIGMOD’99 Int. Conf. on Management of Data, 28, The major drawback of the current form of the Re ´ nyi divergence 49–60. is that it is set up to mainly analyse circular clusters. It is however Betzig,E. et al. (2006) Imaging intracellular fluorescent proteins at nanometer possible to change the cluster shape detected by this method using resolution. Science, 313, 1642–1645. similar methodology as presented for Ripley’s functions (Peters Brama,E. et al. (2015) Standard fluorescent proteins as dual-modality probes et al., 2017). However, it should be also noted that the cluster size for correlative experiments in an integrated light and electron microscope. J. Chem. Biol., 8, 179–188. measurement can be still performed for small clusters with non- Burgert,A. et al. (2017) Characterization of plasma membrane ceramides by circular shape, as for example rectangular DNA origami plates. As super-resolution microscopy. Angew. Chem., 129, 6227–6231. the DNA origami plates used in this work were seen in reconstructed Dave,R. et al. (2009) Mitigating unwanted photophysical processes localization microscopy images as circular clusters due to limited for improved single-molecule fluorescence imaging. Biophys. J., 96, localization precision. 2371–2381. Deschout,H. et al. (2014) Progress in quantitative single-molecule localization microscopy. Histochem. Cell Biol., 142, 5–17. 4 Discussion Ester,M. et al. (1996) A density-based algorithm for discovering clusters in Localization microscopy data are more and more commonly used for large spatial databases with noise. In: Simoudis,E. et al. (eds) Proceedings of quantitative analysis. However, measuring the exact size of the struc- the Second International Conference on Knowledge Discovery and Data tures in the images/data can be problematic using the popular Ripley’s Mining (KDD-96), AAAI press, Paolo Alto, CA, pp. 226–231. Fox-Roberts,P. et al. (2017) Local dimensionality determines imaging speed in H function. Here, we have presented a method based on the Re ´ nyi di- localization microscopy. Nat. Commun., 8, 13558. vergence which can be tuned in response to noise and/or data proper- Foxall,E. et al. (2016) Significance of kinase activity in the dynamic invado- ties. Our method provides more accurate cluster radius measurements some. Eur. J. Cell Biol., 95, 483–492. than widely used Ripley’s H function. It also helps to overcome the Jacobs,B.L. et al. (2009) Vaccinia virus vaccines: past, present and future. bias present in the radius measurements provided by the Ripley’s H Antiviral Res., 84, 1–13. function. Similarly, when compared with the density based clustering Kiskowski,M.A. et al. (2009) On the use of Ripley’s K-function and its deriva- methods, the Re ´nyi divergence provides cluster radius measurement tives to analyze domain size. Biophys. J., 97, 1095–1103. which is closer to the expected cluster radius. We think that the radius Labernadie,A. et al. (2010) Dynamics of podosome stiffness revealed by atom- measurement was lowered for the density based methods, as they per- ic force microscopy. Proc. Natl. Acad. Sci. USA, 107, 21016–21021. form cluster radius measurement based on an ellipse fitted to a seg- Lagache,T. et al. (2013) Analysis of the spatial organization of molecules with mented cluster, which is smaller than the segmented cluster. robust statistics. PLoS One, 8, e80914–e80917. Levet,F. et al. (2015) SR-Tesseler: a method to segment and quantify The Re ´ nyi divergence provides an accurate and precise method localization-based super-resolution microscopy data. Nat. Methods, 12, of cluster radius measurement. We have shown that the commonly 1065–1071. used Ripley’s H function provides cluster radius measurements big- Liv,N. et al. (2013) Simultaneous correlative scanning electron and high-NA ger (by around 60%) than the actual cluster radius, similarly to the fluorescence microscopy. PLoS One, 8, e55707. results presented in (Kiskowski et al., 2009; Lagache et al., 2013). Luxenburg,C. et al. (2007) The architecture of the adhesive apparatus of cul- The Re ´ nyi divergence does not require user input or selection of tured osteoclasts: from podosome formation to sealing zone assembly. PLoS region of interest (contrary to Ripley’s H function, Supplementary One, 2, e179. Fig. S11, SR-Tesseler or DBSCAN) and the selection of a can be Meddens,M.B.M. et al. (2014) Podosomes revealed by advanced bioimaging: done after the analysis was performed. Apart from these benefits the what did we learn? Eur. J. Cell Biol., 93, 380–387. Re ´ nyi divergence calculation can be performed very fast, taking Nikon (2015) Super Resolution Microscope N-STORM. STORM around 20 s to analyse a dataset containing 30 000 localized points. Protocol-Sample Preparation. (10 November 2015, date last accessed). Ovesn y,M. et al. (2014) ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Acknowledgements Bioinformatics, 30, 2389. Owen,D.M. et al. (2010) PALM imaging and cluster analysis of protein het- The authors thank R. Marsh, A. Radenovic, R. Szepietowski and E. Rosten erogeneity at the cell surface. J. Biophotonics, 3, 446–454. for early discussions. They also acknowledge help and suggestions from P. Peddie,C.J. et al. (2014) Correlative and integrated light and electron micros- Annibale, H. Deschout and A. Radenovic. They also thank M. Way for the copy of in-resin GFP fluorescence, used to localise diacylglycerol in mamma- vaccinia virus samples. lian cells. Ultramicroscopy, 143, 3–14. Peddie,C.J. et al. (2017) Correlative super-resolution fluorescence and electron microscopy using conventional fluorescent proteins in vacuo. J. Struct. Funding Biol., 199, 120–131. This work was supported by the Medical Research Council (Next Generation Pertsinidis,A. et al. (2013) Ultrahigh-resolution imaging reveals formation of Optical Microscopy Initiative Program grant MR/K015664/1), the neuronal SNARE/Munc18 complexes in situ. Proc. Natl. Acad. Sci. USA, Biotechnology and Biological Sciences Research Council (grants BB/K01563X/1 110, 2. E2812–E2820. and BB/N022696/1) and the Francis Crick Institute which receives its core fund- Peters,R. et al. (2017) Quantification of fibrous spatial point patterns from ing from Cancer Research UK (FC001999), the UK Medical Research Council single-molecule localization microscopy (SMLM) data. Bioinformatics, 33, (FC001999), the Human Frontier Science Program (RGP0035/2016), and the 1703–1711. Wellcome Trust (FC001999). S.C. was supported by a Royal Society University Rafiq,N.B.M. et al. (2016) Podosome assembly is controlled by the GTPase Research Fellowship. A.S. was funded by an EPSRC studentship. The authors ARF1 and its nucleotide exchange factor ARNO. J. Cell Biol., 216, thank the Nikon Imaging Centre at King’s College London, where localization 181–197. microscopy data were acquired. Re ´ nyi,A. (1965) On the foundations of information theory. Revue de l’institut Conflict of Interest: none declared. International de Statistique/Rev. Int. Stat. Inst., 33, 1–14. The Re´nyi divergence enables accurate and precise cluster analysis 4111 Ripley,B.D. (1976) The second-order analysis of stationary point processes. Sochacki,K.A. et al. (2017) Endocytic proteins are partitioned at the J. Appl. Prob., 13, 255–266. edge of the clathrin lattice in mammalian cells. Nat. Cell Biol., 19, Rothemund,P.W.K. (2006) Folding DNA to create nanoscale shapes and pat- 352–361. terns. Nature, 440, 297–302. Staszowska,A.D. et al. (2017) Investigation of podosome ring protein arrange- Rust,M.J. et al. (2006) Sub-diffraction-limit imaging by stochastic optical re- ment using localization microscopy images. Image Processing for Biologists. construction microscopy (STORM). Nat. Methods, 3, 793–795. Methods, 115, 9–16. Saffarian,S. et al. (2009) Distinct dynamics of endocytic Clathrin-coated pits Steinbach,M. et al. (2003) Introduction to Data Mining. Springer, New York. and coated plaques. PLoS Biol., 7, e1000191–e1000118. Tanaka,H. et al. (2015) Electron microscopic examination of podosomes Schachtner,H. et al. (2013) Podosomes in adhesion, migration, mechanosens- induced by phorbol 12, 13 dibutyrate on the surface of A7r5 cells. ing and matrix remodeling. Cytoskeleton, 70, 572–589. J. Pharmacol. Sci., 128, 78–82. Sengupta,P. et al. (2011) Probing protein heterogeneity in the plasma membrane Vijayakumar,V. et al. (2015) Tyrosine phosphorylation of WIP releases bound using PALM and pair correlation analysis. Nat. Methods, 8, 969–975. WASP and impairs podosome assembly in macrophages. J. Cell Sci., 128, Shannon,M.J. et al. (2015) Protein clustering and spatial organization in 251–265. T-cells. Biochem. Soc. Trans., 43, 315–321. Walde,M. et al. (2014) Vinculin binding angle in podosomes revealed by high Shivanandan,A. et al. (2015) Accounting for limited detection efficiency and resolution microscopy. PLoS One, 9, e88251. localization precision in cluster analysis in single molecule localization mi- Zonnevylle,A.C. et al. (2013) Integration of a high-NA light microscope in a croscopy. PLoS One, 10, e0118767. scanning electron microscope. J. Microscopy, 252, 58–70.

Journal

BioinformaticsOxford University Press

Published: Jun 1, 2018

There are no references for this article.