Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction

Abstract

Purpose: Probe-based confocal laser endomicroscopy (pCLE) is a recent imaging modality that allows performing in vivo optical biopsies. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits the image quality: a few tens of thousands of fibres, each acting as the equivalent of a single-pixel detector, are assembled into a single fibre bundle. Video registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts.

Methods: In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by models trained on pairs of estimated HR images (generated by the video registration algorithm) and realistic synthetic LR images. The performance of three different state-of-the-art DNN techniques was analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive image quality assessment that takes into account different quality scores, including a Mean Opinion Score (MOS).

Results: The results indicate that the proposed solution produces an effective improvement in the quality of the reconstructed images.

Conclusion: The proposed training strategy and the associated DNNs allow us to perform convincing super-resolution of pCLE images.

Keywords: Example-based super-resolution · Deep learning · Probe-based confocal laser endomicroscopy · Mosaicking

Introduction

Probe-based confocal laser endomicroscopy (pCLE) is a state-of-the-art imaging system used in clinical practice for in situ and real-time in vivo optical biopsy. In particular, recent works using Cellvizio (Mauna Kea Technologies, France) have demonstrated the impact of introducing pCLE as a new imaging modality for diagnostic procedures in conditions such as pancreatic cystic tumours and the surveillance of Barrett's oesophagus [4]. pCLE is a recent imaging modality in gastrointestinal and pancreaticobiliary diseases [4].

The authors of [4] have shown that, despite the clear clinical benefits of pCLE, improving its specificity and sensitivity would help it become a routine diagnostic tool. Specificity and sensitivity are directly dependent on the quality of the pCLE images.
Therefore, increasing the resolution of these images might bring a more reliable source of information and improve pCLE diagnosis.

Certainly, the key point of pCLE is its suitability for real-time and intraoperative usage. Having high-quality images in real time potentially allows for better pCLE interpretability. Thus, offline processing would not fit in the standard clinical workflow required in this context.

The trend for image sensor manufacturers is to increase the resolution, as is apparent in the current move to high-definition endoscopic detectors. Recently introduced 4K endoscopes provide 8M pixels, a difference to pCLE of 2-to-3 orders of magnitude. In pCLE, reliance on an imaging guide (an optical fibre bundle composed of a few tens of thousands of optical fibres, each acting as the equivalent of a single-pixel detector) fundamentally limits the image quality. These fibres are irregularly positioned in the bundle, which implies that the tissue signal is a collection of pixels sampled on an irregular grid. Hence, a reconstruction procedure is needed for mapping the irregular samples to a Cartesian image. Other factors that reduce pCLE image quality are cross-talk among neighbouring fibres and a limited signal-to-noise ratio. All these factors lead to the generation of images with artefacts, noise, relatively low contrast and resolution. This work proposes a software-based resolution augmentation method, which is more agile and simpler to implement than hardware engineering solutions.

Building on the idea that high-resolution (HR) images are desired, this study explores advanced single-image super-resolution (SISR) techniques which can contribute to an effective improvement in image quality. Although SISR for natural images is a relatively mature field, this work is the first attempt to translate these solutions into the pCLE context.
Beyond SISR, video registration techniques [13] have been proposed to increase the resolution of pCLE. Such methods provide a baseline super-resolution technique, but they suffer from artefacts and are computationally too expensive to be applied in real time. Because of the recent success of deep learning for SISR on natural images [1], this work focuses on exemplar-based super-resolution (EBSR) deep learning techniques. However, the translation of these methods to the pCLE domain is not straightforward, notably due to the lack of ground-truth HR images required for the training. There is indeed no equivalent imaging device capable of producing higher-resolution endomicroscopic imaging, nor any robust and highly accurate means of spatially matching microscopic images acquired across scales with different devices. Furthermore, in comparison with natural images, currently available pCLE images suffer from specific artefacts introduced by the reconstruction procedure that maps the tissue signal from the irregular fibre grid to the Cartesian grid.

The contribution of this work is threefold. First, three deep learning models for SISR are examined on the pCLE data. Second, to overcome the lack of ground-truth low-resolution (LR)/HR image pairs for training purposes, a novel pipeline to generate pseudo-ground-truth data by leveraging an existing video registration technique [13] is proposed. Third, in the absence of a reference HR ground truth, and to assess the clinical validity of our approach, a Mean Opinion Score (MOS) study was conducted with nine experts (1–10 years of experience), each assessing 46 images according to three different criteria. To our knowledge, this is the first research work to address the challenge of SISR reconstruction for pCLE images based on deep learning, to generate pCLE pseudo-ground-truth data for training EBSR models, and to demonstrate that models trained on pseudo-ground-truth data provide convincing SR reconstruction.

The rest of the paper is organised as follows. The "Related work" section presents the state of the art for SISR with natural images. The "Materials and methods" section presents the proposed training methodology based on realistic pseudo-ground-truth generation and details the implementation of the SISR models. The "Results" section reports the evaluation of our approach using a quantitative image quality assessment (IQA) and a MOS study. The "Discussion and conclusions" section summarises the contribution of this research to pCLE SISR.

Related work

Super-resolution (SR) has received a lot of interest from the computer vision community in recent decades [10]. Initial SR approaches were based on single-image super-resolution (SISR) and exploited signal processing techniques applied to the input image. An alternative to SISR is multi-frame image super-resolution, based on the idea that an HR image can be reconstructed by fusing many LR images together. Ideally, the combination of several LR image sources enriches the information content of the reconstructed HR image and contributes to improving its quality. Registration can be used to merge LR images acquired at slightly shifted fields of view into a unified HR image.

In the specific context of pCLE, the work proposed by Vercauteren et al. [13] presents a video registration algorithm that, in some cases, can improve the spatial information of the reconstructed pCLE image and reveal details which were not visible initially. The quality of the registration is a key step for the success of the SR reconstruction, but the alignment of images captured at different times is not trivial. Misalignment leads to incorrect fusion and generates artefacts such as ghosting. Moreover, registration is a computationally expensive technique, making this approach unsuitable for real-time purposes.

Another interesting approach to SISR is exemplar-based super-resolution (EBSR), which learns the correspondence between low- and high-resolution images. Thanks to the recent success of deep learning and Convolutional Neural Networks (CNNs), EBSR methods currently represent the state of the art for the SR task [1]. Although many research groups have worked on deep-learning-based SR for natural images, and although CNNs are currently widely used in various medical imaging problems [11], only recently have CNNs been used for SR in medical imaging. Noteworthy is the work proposed in [12], which attempts to improve the quality of magnetic resonance images.
The behaviour of CNNs, especially in the context of SR, is strongly driven by the choice of loss function, and the most popular one is the mean squared error (MSE) [16]. Although MSE as a loss function steers the SR models towards the reconstitution of HR images with high peak signal-to-noise ratios, this does not necessarily mean that the final images will provide a good quality perception. A model trained with a selective loss function involving a Generative Adversarial Network for Image Super-Resolution (SRGAN) was proposed by Ledig et al. [7]. The authors designed an adversarial loss to classify HR images into SR images and ground-truth HR images. Based on a MOS study, the authors showed that the participants perceived the quality of the restored HR images as higher than the image quality measured only by PSNR.

Another critical issue with deep CNNs is the convergence speed. Several solutions, such as using a very high learning rate for network training [5] and removing batch-normalisation modules [8], have been proposed to tackle this issue.

Materials and methods

The Smart Atlas database [2], a collection of 238 anonymised pCLE video sequences of the colon and oesophagus, is used in this study. The database was split into three subsets: training set (70%), validation set (15%) and test set (15%). Each subset was created ensuring that colon and oesophagus tissue were equally represented. Data were acquired with 23 unique probes of the same pCLE probe type. The SR models are specific to the type of probe but generic to the exact probe being used; thus, the models do not need to be retrained for probes of the same type. Another type of probe, such as needle-CLE (nCLE), would require a specifically trained model, as nCLE and pCLE differ by the number of optical fibres and the design of the distal optics.

The "Pseudo-ground-truth image estimation based on video registration" section explains how the pseudo-ground-truth HR images were generated. The "Generation of realistic synthetic pCLE data" section describes our proposed simulation framework to generate synthetic LR (LRsyn) images from original LR (LRorg) images. The "Implementation details" section presents the pre-processing steps needed for standardising the input images and details the implementation of the super-resolution CNNs used in this study.

Pseudo-ground-truth image estimation based on video registration

To compensate for the lack of ground-truth HR pCLE data, a registration-based mosaicking technique [13] was used to estimate HR images. Mosaicking acts as a classical SR technique and fuses several registered input frames by averaging the temporal information. The mosaics were generated for the entire Smart Atlas database and used as a source of HR frames.

Since mosaicking generates a single large field-of-view mosaic image from a collection of input LR images, it does not directly provide a matched HR image for each LR input. To circumvent this, we used the mosaic-to-image diffeomorphic spatial transformation resulting from the mosaicking process to propagate and crop the fused information from the mosaic back into each input LR image space. The image sequences resulting from this method are regarded as estimates of HR frames. These estimates will be referred to as HR in the text.
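The propagate-and-crop step can be illustrated with a short sketch. The fragment below is a minimal, hypothetical example of resampling a fused mosaic back into the space of one LR frame; it assumes the frame-to-mosaic transformation is available as a dense coordinate map, which is only one possible representation of the diffeomorphic transformation estimated in [13].

```python
import numpy as np
from scipy.ndimage import map_coordinates

def propagate_mosaic_to_frame(mosaic, frame_to_mosaic_coords):
    """Resample the fused mosaic back into one LR frame's space.

    mosaic: 2D array holding the fused (temporally averaged) signal.
    frame_to_mosaic_coords: array of shape (2, H, W) giving, for every
        pixel of the LR frame, its (row, col) position in the mosaic.
        This dense map stands in for the mosaic-to-image spatial
        transformation estimated during mosaicking (assumed format).
    Returns an (H, W) pseudo-ground-truth HR estimate for that frame.
    """
    # Bilinear interpolation of the mosaic at the mapped coordinates;
    # pixels falling outside the mosaic are set to zero.
    return map_coordinates(mosaic, frame_to_mosaic_coords, order=1,
                           mode="constant", cval=0.0)
```

Applying this to every frame of a sequence yields one HR estimate per LR input, which is what allows the mosaicking output to be used as paired training data.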
The image quality of the mosaic heavily depends on the accuracy of the underpinning registration, which is a difficult task. The corresponding pairs of LR and HR images generated by the proposed registration-based method suffer from artefacts, which can hinder the training of the EBSR models (Fig. 1). Specifically, it can be observed that alignment inaccuracies occurring during mosaicking were a source of ghosting artefacts which, in combination with residual misalignments between the LR and HR images, create data unsuitable for training. Sequences with obvious artefacts were manually discarded. However, even on this selected dataset, training issues were observed. To address these, we simulated LR-HR image pairs for training the EBSR algorithms while leveraging the registration-based HR images as realistic HR images.

Fig. 1 Pipeline used to generate LRsyn synthetic images (diagram blocks: LR images, registration, mosaicking, cropping, fibre positions, fibre bundle, LR reconstruction, LRsyn).

Generation of realistic synthetic pCLE data

Currently available pCLE images are reconstructed from the scattered fibre signal. Every fibre in the bundle acts as a single-pixel detector. To reconstruct pCLE images on a Cartesian grid, Delaunay triangulation and piecewise linear interpolation are used. The simulation framework developed in this study mimics the standard pCLE reconstruction algorithm and starts by assigning to each fibre the average of the signal from seven neighbouring pixels [6]. In the standard reconstruction algorithm, the fibre signal, which includes noise, is then interpolated. Similarly, noise was added to the simulated data to produce realistic images and avoid creating a wide domain gap between real and simulated pCLE images.

Despite some misalignment artefacts, the registration-based generation of HR presented in the "Pseudo-ground-truth image estimation based on video registration" section produces images with fine details and a high signal-to-noise ratio. Our simulation framework uses these HR images and produces simulated LR images with a perfect alignment.

The proposed simulation framework relies on observed irregular fibre arrangements and the corresponding Voronoi diagrams. Each fibre signal was extracted from an HR image by averaging the HR pixel values within the corresponding Voronoi cell. To replicate realistic noise patterns on the simulated LR images, additive and multiplicative Gaussian noise (a and m, respectively) is added to the extracted fibre signal fs to obtain a noisy fibre signal nfs as nfs = (1 + m) · fs + a. The standard deviation of the noise distributions was tuned based on the visual similarity between LRorg and LRsyn and between their histograms. The sigma values were 0.05 and 0.01 · (max fs − min fs) for the multiplicative and additive Gaussian distributions, respectively. In the last step, Delaunay-based linear interpolation was performed, thereby leading to our final simulated LR images.
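As an illustration of the simulation steps above, the following is a minimal sketch of generating one synthetic LR image from a pseudo-ground-truth HR image. It assumes the fibre centre positions and a precomputed assignment of each Cartesian pixel to its nearest fibre (i.e. its Voronoi cell) are available; variable names are illustrative and not taken from the authors' implementation.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def simulate_lr(hr_image, fibre_xy, voronoi_labels, rng=None):
    """Simulate a synthetic LR pCLE image from an HR frame.

    hr_image: 2D float array (pseudo-ground-truth HR frame).
    fibre_xy: (N, 2) array of fibre centre coordinates (x, y) on the HR grid.
    voronoi_labels: 2D int array, same shape as hr_image, giving for each
        pixel the index of the fibre whose Voronoi cell contains it.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n_fibres = fibre_xy.shape[0]

    # 1. Fibre signal: average HR intensity within each Voronoi cell
    sums = np.bincount(voronoi_labels.ravel(), weights=hr_image.ravel(),
                       minlength=n_fibres)
    counts = np.bincount(voronoi_labels.ravel(), minlength=n_fibres)
    fs = sums / np.maximum(counts, 1)

    # 2. Noise model from the paper: nfs = (1 + m) * fs + a, with
    #    sigma_m = 0.05 and sigma_a = 0.01 * (max(fs) - min(fs))
    m = rng.normal(0.0, 0.05, size=n_fibres)
    a = rng.normal(0.0, 0.01 * (fs.max() - fs.min()), size=n_fibres)
    nfs = (1.0 + m) * fs + a

    # 3. Delaunay-based piecewise linear interpolation back onto the grid
    interp = LinearNDInterpolator(fibre_xy, nfs, fill_value=0.0)
    h, w = hr_image.shape
    xx, yy = np.meshgrid(np.arange(w), np.arange(h))
    return interp(np.column_stack([xx.ravel(), yy.ravel()])).reshape(h, w)
```

Because the LRsyn frame is synthesised directly from the HR frame, the resulting pair is perfectly aligned by construction, which is the property the training set relies on.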
The LR and HR images were combined into two datasets: (1) original pCLE (pCLEorg), built with pairs of LRorg taken from the sequences of the Smart Atlas database and the corresponding HR images, and (2) synthetic pCLE (pCLEsyn), built by replacing the LRorg images with LRsyn images.

Implementation details

The datasets were pre-processed in three steps. First, the intensity values were normalised using the LR statistics: LR = (LR − mean_LR)/std_LR and HR = (HR − mean_LR)/std_LR. Second, the pixel values of every frame were individually scaled to the range [0, 1]. Third, non-overlapping patches of 64×64 pixels were extracted for the training phase, considering only pixels in the pCLE field of view (FoV). A stochastic patch-based training was used for training the networks, with a minibatch of 54 patches to fit into the GPU memory (12 GB).

Models were trained with patches from the training set. The patches from the validation set were used to monitor the loss during training in order to avoid overfitting. Since all the considered networks are fully convolutional, the test images were processed at full size and no patch processing is required during the inference phase.

Three CNNs for SR were used: the sparse-coding-based FSRCNN [3], the residual-based EDSR [8] and the generative adversarial network SRGAN [7]. Every model was trained with the two datasets presented in the "Generation of realistic synthetic pCLE data" section.

MSE is the most commonly used loss function for SR. Zhao et al. [16] showed that MSE has two limitations: it does not converge to the global minimum and it produces blocky artefacts. In addition to demonstrating that the L1 loss outperforms L2, the authors also introduced a new loss function, SSIM + L1, by incorporating the Structural Similarity (SSIM) index [15]. FSRCNN and EDSR were trained independently with both L1 and SSIM + L1 to investigate their applicability to our data based on a quantitative comparison.
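To make the SSIM + L1 objective concrete, the snippet below is a minimal numpy/scikit-image sketch of the quantity being minimised for a single patch pair. The equal weighting of the two terms is an assumption made for illustration (the SSIM + L1 formulation in [16] uses a tuned mixing weight), and an actual training run would need a differentiable SSIM implementation inside the chosen deep learning framework.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_l1_objective(sr_patch, hr_patch, alpha=0.5):
    """Illustrative SSIM + L1 objective for one (SR, HR) patch pair.

    Both patches are assumed to be 2D float arrays already scaled to [0, 1].
    alpha balances the SSIM term against the L1 term (assumed value).
    """
    l1 = np.mean(np.abs(sr_patch - hr_patch))
    ssim = structural_similarity(sr_patch, hr_patch, data_range=1.0)
    # SSIM is a similarity (higher is better), so the loss uses 1 - SSIM
    return alpha * (1.0 - ssim) + (1.0 - alpha) * l1
```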
Results

Acknowledging the lack of a proper ground truth for super-resolution of pCLE and the ambiguous nature of established IQA metrics, a three-stage approach was designed for the evaluation of the proposed method using the three architectures considered in the "Materials and methods" section.

The first stage, presented in the "Experiments on synthetic data" section and relying on the quantitative assessment, demonstrates the applicability of EBSR to pCLE in the ideal synthetic case where a ground truth is available. In this quantitative stage, the inadequacy of the existing video-registration-based high-resolution images as a ground truth for EBSR training purposes is demonstrated.

The second stage, presented in the "Experiments on original data" section, focuses on the quantitative assessment of our methods in the context of real input images and on the evaluation of our best model against other state-of-the-art SISR methods.

In the third stage, performed to overcome the limitations of the quantitative assessment, a MOS study was carried out by recruiting nine independent experts having 1–10 years of experience working with pCLE images.

Table 1 Quantitative results obtained on full-size images from the test set for different training and testing strategies (the best results for each section are highlighted in bold).

Quantitative analysis

For the quantitative analysis, the SR images were examined using two complementary metrics: (i) SSIM, to evaluate the similarity between the SR image and the HR image, and (ii) the Global Contrast Factor (GCF) [9], a reference-free metric for measuring image contrast, which is one of the key characteristics of image quality in our context. Analysing SSIM and GCF in combination leads to a more robust evaluation: SSIM alone cannot be depended on when the reference image is unreliable, while improvements in GCF alone can be achieved deceitfully, for example by adding a large amount of noise.

Using these metrics, six scores were extracted for each SR method: the mean and standard deviation of (i) the SSIM between SR and HR, (ii) the GCF difference between SR and LR and (iii) the GCF difference between SR and HR. Finally, to determine which approach performs better, a composite score Totcs was defined by averaging the normalised value of the SSIM with the normalised GCF difference between SR and LR, with both factors re-scaled to the range [0, 1]. In our quantitative assessment, the score obtained by the initial LRorg was considered as the baseline reference.
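The composite score can be sketched as follows. This is a hypothetical reading of the definition above, assuming the per-method mean SSIM and mean GCF gain are min-max normalised across the set of compared methods before averaging; the exact normalisation procedure is not spelled out in the text.

```python
import numpy as np

def composite_scores(mean_ssim, mean_gcf_gain):
    """Compute an illustrative Tot_cs for a set of SR methods.

    mean_ssim: 1D array, mean SSIM(SR, HR) per method.
    mean_gcf_gain: 1D array, mean GCF(SR) - GCF(LR) per method.
    Both arrays are min-max rescaled to [0, 1] across the compared
    methods (assumed normalisation) and then averaged.
    """
    def rescale(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return np.zeros_like(x) if span == 0 else (x - x.min()) / span

    return 0.5 * (rescale(mean_ssim) + rescale(mean_gcf_gain))
```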
Experiments on synthetic data

In the first experiment, synthetic data are used to demonstrate that our models work in the ideal situation where a ground truth is available. The first section of Table 1 shows the scores obtained when the SR models are trained on pCLEsyn and tested on LRsyn. Here, it is evident that EDSR and FSRCNN trained with SSIM + L1 obtain a substantial improvement on the different quality factors with respect to the LR image. More specifically, in comparison with the initial LR image, the SSIM was increased by +0.06 when EDSR is used and by +0.05 when FSRCNN is used. These approaches also yield a GCF value that is very close to the GCF of the HR, and an improvement of +0.32 and +0.36 in the GCF with respect to the LR images. The statistical significance of these improvements was assessed with a paired t test (p value less than 0.0001). From this experiment, it is possible to conclude that the proposed solution is capable of performing SR reconstruction when the models are trained on synthetic data with no domain gap at test time.

Experiments on original data

When real images are considered, the same conclusions cannot be reached. The results obtained by training on pCLEorg and testing on LRorg are reported in the second section of Table 1, and here it is evident that all the quality factors decrease. The best approach is FSRCNN trained using SSIM + L1 as the loss function. With respect to the previous case, this approach loses 0.04 on the SSIM and 0.12 on the GCF difference with LR, leading to a final reduction of 0.14 for the Totcs score. In this scenario, the deterioration of SSIM and GCF compared to the synthetic case can be attributed to the use of inadequate HR images during training (i.e. misalignment during the fusion, lack of compensation for motion deformations, etc.). Better results are instead obtained when the SR models applied to LRorg images are trained using pCLEsyn (last section of Table 1). Here, the quality factors increase when compared to the previous case, although they do not reach the results obtained when the approach is trained and tested on synthetic data. EDSR, in particular, has a Totcs score of 0.65, which is 0.08 better than the best approach trained on pCLEorg (second section of Table 1) and 0.06 worse than the best approach trained and tested on pCLEsyn (first section of Table 1). The GCF values obtained here are in general much better when compared to the previous two cases. An example of the visual results from the different training modalities is shown in Fig. 2. In conclusion, our findings suggest that existing video-registration-based approaches are inadequate to serve as a ground truth for HR images, while EBSR approaches, such as EDSR and FSRCNN, when trained on synthetic data, can produce SR images that enhance the quality of the LR images.

Fig. 2 Example of SR images obtained when pCLEsyn and pCLEorg are used for training and testing. From top to bottom, the images in the middle represent the SR image obtained when: (i) pCLEsyn is used for training and testing, (ii) pCLEsyn is used for training and pCLEorg for testing, and (iii) pCLEorg is used for training and testing.

Given these conclusions, the MOS study was performed using images obtained from the models trained only with synthetic data.

To further validate our methodology, in Table 2 the results obtained by the best model of our approach (EDSR trained on synthetic data with SSIM + L1 as loss function) are compared against other state-of-the-art SISR methodologies. Specifically, in this experiment a Wiener deconvolution, a variational Bayesian inference approach with sparse and non-sparse priors [14], and the SRGAN and EDSR networks pretrained on natural images were considered. The Wiener deconvolution was assumed to have a Gaussian point-spread function with the parameter σ = 2 estimated experimentally from the training set. Finally, the last column of Table 2 includes the results of a contrast-enhancement approach obtained by sharpening the input, with parameters similarly tuned on the training set. Although our approach does not consistently outperform the others on each individual quality score, when the combined score Totcs is considered, our method outperforms the others by a large margin.

Table 2 Results of the proposed approach against state-of-the-art methods (the best results for each section are highlighted in bold).
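For reference, the two non-learned baselines in Table 2 can be sketched as below, assuming the standard scikit-image implementations are acceptable stand-ins for the exact setups used in the comparison. The Gaussian PSF with σ = 2 follows the text, while the Wiener regularisation weight and the unsharp-mask parameters are placeholders for the values tuned on the training set.

```python
import numpy as np
from skimage.restoration import wiener
from skimage.filters import unsharp_mask

def gaussian_psf(size=15, sigma=2.0):
    """Normalised 2D Gaussian point-spread function (sigma = 2 as in the text)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def wiener_baseline(lr_image, balance=0.1):
    """Wiener deconvolution baseline; `balance` is a placeholder weight."""
    return wiener(lr_image, gaussian_psf(), balance)

def sharpening_baseline(lr_image, radius=2.0, amount=1.0):
    """Contrast-enhancement baseline via unsharp masking (placeholder parameters)."""
    return unsharp_mask(lr_image, radius=radius, amount=amount)
```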
Semi-quantitative analysis (MOS)

To perform the MOS, nine independent experts were asked to evaluate 46 images each. Full-size LRorg images were selected randomly from the test set of pCLEorg and used to generate the SR reconstructions. At each step, the SR images obtained by the three different methods (SRGAN, FSRCNN and EDSR) trained on synthetic data, together with a contrast-enhancement result obtained by sharpening the input (used as a baseline), are shown to the user in a randomly shuffled order. The input and the HR are also displayed on the screen as references for the participants. For each of the four images, the user assigns a score between 1 (strongly disagree) and 5 (strongly agree) for three different questions:

– Q1: Are there any artefacts/noise in the image?
– Q2: Can you see an improvement in contrast with respect to the input?
– Q3: Can you see an improvement in the details with respect to the input?

To make sure that the questions were correctly interpreted, each participant received a short training before starting the study. The results of the MOS are shown in Fig. 3. EDSR is the approach that achieves the best performance on Q2 and Q3, while, based on Q1, both FSRCNN and EDSR do not introduce a significant amount of artefacts or noise. The results of the MOS give us one more indication that our training methodology allows improvements in the quality of the pCLE images. Fig. 4 shows a few examples of the SR images obtained using our proposed methodology.

Fig. 3 Results of the MOS using a contrast-enhancement approach, FSRCNN, EDSR and SRGAN. The plots report the results on the three questions (Q1: noise/artefacts, Q2: contrast, Q3: details), on a scale from strongly disagree to strongly agree.

Fig. 4 Example of visual results from the proposed approaches: input (left), SRGAN (middle left), EDSR (middle), FSRCNN (middle right) and HR (right).

Discussion and conclusions

This work addresses the challenge of super-resolution for pCLE images. It is the first work to evaluate the potential of deep learning and exemplar-based super-resolution in the pCLE context.

The main contribution of this work is to overcome the challenge of the lack of ground-truth data. A novel methodology is proposed to produce pseudo-ground-truth data by exploiting an existing video registration method and to simulate realistic LR images based on a physical model of pCLE acquisition. The conclusion is that synthetic pCLE data can be used to train CNNs that are then applied to real-scenario data, because the physically inspired simulation process reduces the domain gap between real and simulated images.

The robust IQA test based on the Structural Similarity (SSIM) and the Global Contrast Factor (GCF) confirmed the improvement of the obtained results with respect to the input image. An analysis of the perceptual quality of the images with a Mean Opinion Score (MOS) study, recruiting nine independent pCLE experts, showed that the SR models give clinically interesting results. The experts perceived an improvement in the quality of the reconstructed images with respect to the input image without noting a significant increase in the amount of noise and artefacts. The quantitative and semi-quantitative user perception analyses provided consistent conclusions.

Providing a better quality of pCLE images might improve the decision process during the endoscopic examination. Further evaluation will focus on the temporal consistency of the super-resolution and will rely on histopathological confirmation to validate the authenticity of the generated details.

Acknowledgements The authors would like to thank the High Dimensional Neurology group, Institute of Neurology, UCL for providing computational support, and the independent experts at Mauna Kea Technologies for participating in the MOS survey.

Funding This work was supported by Wellcome/EPSRC [203145Z/16/Z; NS/A000050/1; WT101957; NS/A000027/1; EP/N027078/1]. This work was undertaken at UCL and UCLH, which receive a proportion of funding from the DoH NIHR UCLH BRC funding scheme. The PhD studentship of Agnieszka Barbara Szczotka is funded by Mauna Kea Technologies, Paris, France.

Compliance with ethical standards

Conflict of interest The PhD studentship of Agnieszka Barbara Szczotka is funded by Mauna Kea Technologies, Paris, France. Tom Vercauteren owns stock in Mauna Kea Technologies, Paris, France. The other authors declare no conflict of interest.

Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent For this type of study, formal consent is not required. This article does not contain patient data.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References

1. Agustsson E, Timofte R (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 1122–1131. https://doi.org/10.1109/CVPRW.2017.
2. André B, Vercauteren T, Buchner AM, Wallace MB, Ayache N (2011) A smart atlas for endomicroscopy using automated video retrieval. Med Image Anal 15(4):460–476
3. Dong C, Loy CC, Tang X (2016) Accelerating the super-resolution convolutional neural network. In: European conference on computer vision, Springer, pp 391–407
4. Fugazza A, Gaiani F, Carra MC, Brunetti F, Lévy M, Sobhani I, Azoulay D, Catena F, de'Angelis GL, de'Angelis N (2016) Confocal laser endomicroscopy in gastrointestinal and pancreatobiliary diseases: a systematic review and meta-analysis. BioMed Res Int 2016:1–31. https://doi.org/10.1155/2016/4638683
5. Kim J, Kwon Lee J, Mu Lee K (2016) Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1646–1654
6. Le Goualher G, Perchant A, Genet M, Cavé C, Viellerobe B, Berier F, Abrat B, Ayache N (2004) Towards optical biopsies with an integrated fibered confocal fluorescence microscope. Med Image Comput Comput Assist Interv MICCAI 2004:761–768
7. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
8. Lim B, Son S, Kim H, Nah S, Lee KM (2017) Enhanced deep residual networks for single image super-resolution. In: The IEEE conference on computer vision and pattern recognition (CVPR) workshops
9. Matkovic K, Neumann L, Neumann A, Psik T, Purgathofer W (2005) Global contrast factor – a new approach to image contrast. Comput Aesthet 2005:159–168
10. Park SC, Park MK, Kang MG (2003) Super-resolution image reconstruction: a technical overview. IEEE Signal Process Mag 20(3):21–36
11. Ravì D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, Yang GZ (2017) Deep learning for health informatics. IEEE J Biomed Health Inf 21(1):4–21
12. Tanno R, Ghosh A, Grussu F, Kaden E, Criminisi A, Alexander DC (2017) Bayesian image quality transfer. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 265–273
13. Vercauteren T, Perchant A, Malandain G, Pennec X, Ayache N (2006) Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy. Med Image Anal 10(5):673–692
14. Villena S, Vega M, Babacan SD, Molina R, Katsaggelos AK (2013) Bayesian combination of sparse and non-sparse priors in image super resolution. Digit Signal Process 23(2):530–541
15. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
16. Zhao H, Gallo O, Frosio I, Kautz J (2015) Loss functions for image restoration with neural networks. IEEE Trans Comput Imag 3(1):47–57. https://doi.org/10.1109/TCI.2016.2644865

Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstruction

Free
8 pages

Loading next page...
 
/lp/springer_journal/effective-deep-learning-training-for-single-image-super-resolution-in-KUb2txEzHR
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Medicine & Public Health; Imaging / Radiology; Surgery; Health Informatics; Computer Imaging, Vision, Pattern Recognition and Graphics; Computer Science, general
ISSN
1861-6410
eISSN
1861-6429
D.O.I.
10.1007/s11548-018-1764-0
Publisher site
See Article on Publisher Site

Abstract

Purpose Probe-based confocal laser endomicroscopy (pCLE) is a recent imaging modality that allows performing in vivo optical biopsies. The design of pCLE hardware, and its reliance on an optical fibre bundle, fundamentally limits the image quality with a few tens of thousands fibres, each acting as the equivalent of a single-pixel detector, assembled into a single fibre bundle. Video registration techniques can be used to estimate high-resolution (HR) images by exploiting the temporal information contained in a sequence of low-resolution (LR) images. However, the alignment of LR frames, required for the fusion, is computationally demanding and prone to artefacts. Methods In this work, we propose a novel synthetic data generation approach to train exemplar-based Deep Neural Networks (DNNs). HR pCLE images with enhanced quality are recovered by the models trained on pairs of estimated HR images (generated by the video registration algorithm) and realistic synthetic LR images. Performance of three different state-of-the- art DNNs techniques were analysed on a Smart Atlas database of 8806 images from 238 pCLE video sequences. The results were validated through an extensive image quality assessment that takes into account different quality scores, including a Mean Opinion Score (MOS). Results Results indicate that the proposed solution produces an effective improvement in the quality of the obtained recon- structed image. Conclusion The proposed training strategy and associated DNNs allows us to perform convincing super-resolution of pCLE images. Keywords Example-based super-resolution · Deep learning · Probe-based confocal laser endomicroscopy · Mosaicking Introduction Probe-based confocal laser endomicroscopy (pCLE) is a Daniele Ravì and Agnieszka Barbara Szczotka have contributed equally state-of-the-art imaging system used in clinical practice for to this work. in situ and real time in vivo optical biopsy. In particular, Electronic supplementary material The online version of this article recent works using Cellvizio (Mauna Kea Technologies, (https://doi.org/10.1007/s11548-018-1764-0) contains supplementary France) have demonstrated the impact of introducing pCLE material, which is available to authorized users. as a new imaging modality for the diagnostics procedures B Agnieszka Barbara Szczotka of conditions such as pancreatic cystic tumours and the agnieszka.szczotka.15@ucl.ac.uk surveillance of Barrett’s oesophagus [4]. pCLE is a recent Daniele Ravì imaging modality in gastrointestinal and pancreaticobiliary d.ravi@ucl.ac.uk diseases [4]. Dzhoshkun Ismail Shakir d.shakir@ucl.ac.uk Wellcome/EPSRC Centre for Interventional and Surgical Stephen P. Pereira Sciences, University College London, London, UK stephen.pereira@ucl.ac.uk UCL Institute for Liver and Digestive Health, University Tom Vercauteren College London, London, UK t.vercauteren@ucl.ac.uk 123 918 International Journal of Computer Assisted Radiology and Surgery (2018) 13:917–924 The authors of [4] have shown that despite clear clinical reconstruction procedure that maps the tissue signal from the benefits of pCLE, improving its specificity and sensitivity irregular fibre grid to the Cartesian grid. would help it become a routine diagnostic tool. Specificity The contribution of this work is threefold. First, three deep and sensitivity are directly dependent on the quality of the learning models for SISR are examined on the pCLE data. pCLE images. 
Therefore, increasing the resolution of these Second, to overcome the problem of the lack of ground-truth images might bring a more reliable source of information and low-resolution (LR)/HR image pairs for training purposes, improve pCLE diagnosis. a novel pipeline to generate pseudo-ground-truth data by Certainly, the key point of pCLE is its suitability for real- leveraging an existing video registration technique [13]is time and intraoperative usage. Having high-quality images in proposed. Third, in the absence of a reference HR ground real time potentially allows for better pCLE interpretability. truth, to assess the clinical validity of our approach, a Mean Thus, offline processing would not fit in the standard clinical Opinion Score (MOS) study was conducted with nine experts work-flow required in this context. (1–10 years of experience) each assessing 46 images accord- The trend for image sensor manufacturers is to increase the ing to three different criteria. To our knowledge, this is the resolution, as apparent in the current move to high-definition first research work to address the challenge of SISR recon- endoscopic detectors. Recently introduced 4K endoscopes struction for pCLE images based on deep learning, generate provide 8M pixels, a difference to pCLE of 2-to-3 orders pCLE pseudo-ground-truth data for training of EBSR mod- of magnitude. In pCLE, reliance on an imaging guide—an els and demonstrate that pseudo-ground-truth trained models optical fibre bundle, composed of a few tens of thousands of provide convincing SR reconstruction. optical fibres, each acting as the equivalent of a single-pixel The rest of the paper is organised as follows. “Related detector—fundamentally limits the image quality. These work” section presents the state of the art for SISR with nat- fibres are irregularly positioned in the bundle which implies ural images. “Materials and methods” section presents the that tissue signal is a collection of pixels sampled on an proposed training methodology based on realistic pseudo- irregular grid. Hence, a reconstruction procedure is needed ground-truth generation and detail the implementation of for mapping the irregular samples to a Cartesian image. the SISR models. “Results” section gives information on the Other factors that reduce pCLE image quality are cross-talk evaluation of our approach using a quantitative image quality among neighbouring fibres and limited signal-to-noise ratio. assessment (IQA) and a MOS study. “Discussion and conclu- All these factors lead to the generation of images with arte- sions” section summarises the contribution of this research facts, noise, relatively low contrast and resolution. This work to pCLE SISR. proposes a software-based resolution augmentation method which is more agile and simpler to implement than hardware engineering solutions. Related work Building on from the idea that high-resolution (HR) images are desired, this study explores advanced single- Super-resolution (SR) has received a lot of interest from the image super-resolution (SISR) techniques which can con- computer vision community in the recent decades [10]. Initial tribute to effective improvement in image quality. Although SR approaches were based on single-image super-resolution SISR for natural images is a relatively mature field, this work (SISR) and exploited signal processing techniques applied is the first attempt to translate these solutions into the pCLE to the input image. An alternative to SISR is multi-frame context. 
Beyond SISR, video registration technique [13]have image super-resolution based on the idea that HR image can been proposed to increase the resolution of pCLE. Such meth- be reconstructed by fusing many LR images together. Ideally, ods provide a baseline super-resolution technique, but suffers the combination of several LR image sources enriches the from artefact and are computationally too expensive to be information content of the reconstructed HR image and con- applied in real time. Because of the recent success of deep tributes to improving its quality. Registration can be used to learning for SISR on natural images [1], this work focuses merge LR images acquired at slightly shifted field-of-views on exemplar-based super-resolution (EBSR) deep learning into a unified HR image. techniques. However, the translation of these methods to the In the specific context of pCLE, the work proposed by pCLE domain is not straightforward, notably due to the lack Vercauteren et al. [13] presents a video registration algorithm of ground-truth HR images required for the training. There that, in some cases, can improve spatial information of the is indeed no equivalent imaging device capable of producing reconstructed pCLE image, and reveals details which were higher-resolution endomicroscopic imaging, nor any robust not visible initially. The quality of the registration is a key and highly accurate means of spatially matching microscopic step to the success of the SR reconstruction, but the alignment images acquired across scales with different devices. Further- of images captured at different times is not trivial. Misalign- more, in comparison with natural images, currently available ment leads to incorrect fusion and generates artefacts such as pCLE images suffer from specific artefacts introduced by the ghosting. Moreover, registration is a computationally expen- 123 International Journal of Computer Assisted Radiology and Surgery (2018) 13:917–924 919 sive technique, making this approach unsuitable for real-time ing set (70%), validation set (15%), and test set (15%). Each purposes. subset was created ensuring that colon and oesophagus tissue Another interesting approach to SISR is exemplar-based were equally represented. Data were acquired with 23 unique super-resolution (EBSR), which learns the correspondence probes of the same pCLE probe type. The SR models are spe- between low- and the high-resolution images. Thanks to the cific to the type of the probe but generic to the exact probe recent success of deep learning and Convolutional Neural being used. Thus, the models do not need to be retrained Networks (CNNs), EBSR methods currently represent the for probes of the same type. Another type of probe, such state-of-the-art for the SR task [1]. Although many research as needle-CLE (nCLE), would require a specifically trained groups have worked on deep-learning-based SR for natural model. nCLE and pCLE differ by the number of optical fibres images, and although CNNs are currently widely used in and the design of the distal optics. various medical imaging problems [11], only recently have “Pseudo-ground-truth image estimation based on video CNNs been used for SR in medical imaging. Noteworthy is registration” section explains how the pseudo-ground-truth the work proposed in [12] that attempt to improve the quality HR images were generated. “Generation of realistic syn- of magnetic resonance images. 
thetic pCLE data” section describes our proposed simulation The behaviour of CNNs, especially in the context of SR, is framework to generate synthetic LR (LR ) images from syn original LR (LR ) images. “Implementation details” sec- strongly driven by the choice of a loss function, and the most org popular one is mean squared error (MSE) [16]. Although tion presents the pre-processing steps needed for standard- MSE as a loss function steers the SR models towards the ising the input images and details the implementation of the reconstitution of HR images with high peak signal-to-noise super-resolution CNNs used in this study. ratios, this does not necessarily mean that the final images will provide a good quality perception. A model trained with Pseudo-ground-truth image estimation based on a selective loss function involving a Generative Adversarial video registration Network for Image Super-Resolution (SRGAN) was pro- posed by Ledig et al. [7]. The authors designed an adversarial To compensate for the lack of ground-truth HR pCLE data, loss to classify HR images into SR images and ground-truth a registration-based mosaicking technique [13] was used to HR images. Based on a MOS study, the authors showed that estimate HR images. Mosaicking acts as a classical SR tech- the participants perceived the quality of the restored HR nique and fuses several registered input frames by averaging images as higher compared to the image quality measured the temporal information. The mosaics were generated for only by a PSNR. the entire Smart Atlas database and used as a source of HR Another critical issue with deep CNNs is the conver- frames. gence speed. Several solutions, such as using a very high Since mosaicking generates a single large field-of-view learning rate for network training [5], and removing batch- mosaic image from a collection of input LR images, it does normalisation modules [8] were proposed to tackle this issue. not directly provide a matched HR image for each LR input. To circumvent this, we used the mosaic-to-image diffeomor- Materials and methods phic spatial transformation resulting from the mosaicking The Smart Atlas database [2], a collection of 238 anonymised process to propagate and crop the fused information from pCLE video sequences of the colon and oesophagus, is used the mosaic back into each input LR image space. The image in this study. The database was split into three subsets: train- sequences resulting from this method are regarded as esti- Fibre Bundle LR reconstrucon Cropping Registraon Fibre Posions LRSyn Mosaicking LR Images Fig. 1 Pipeline used to generate LR synthetic images 123 920 International Journal of Computer Assisted Radiology and Surgery (2018) 13:917–924 mates of HR frames. These estimates will be referred to as Delaunay-based linear interpolation was performed thereby HR in the text. leading to our final simulated LR images. The image quality of the mosaic image heavily depends LR and HR images were combined into two datasets: 1. on the accuracy of the underpinning registration which is a Original pCLE (pCLE ) built with pairs of LR taken org org difficult task. The corresponding pairs of LR and HR images from sequences of Smart Atlas database and HR images, and generated by the proposed registration-based method suffer 2. synthetic pCLE (pCLE ) built by replacing the LR org syn from artefacts, which can hinder the training of the EBSR images with LR images. syn models (Fig. 1). 
Specifically, it can be observed that alignment inaccura- Implementation details cies occurring during mosaicking were a source of ghosting artefacts which in combination with residual misalignments The datasets were pre-processed in three steps. First, intensity between the LR and HR images, creates unsuitable data for values were normalised: LR = LR − mean /std and HR = lr lr the training. Sequences with obvious artefacts were manually HR−mean /std . Second, pixels values were scaled of every lr lr discarded. However, even on this selected dataset, training frame individually in the range [0–1]. Third, non-overlapping issues were observed. To address these, we simulated LR-HR patches of 64×64 pixels were extracted for the training phase, image pairs for training EBSR algorithms while leveraging considering only pixels in the pCLE Field of View (FoV). the registration-based HR images as realistic HR images. A stochastic patch-based training was used for training the networks, with a minibatch of size 54 patches to fit into the GPU memory (12 GB). Generation of realistic synthetic pCLE data Models were trained with patches from the training set. The patches from the validation set were used to monitor Currently available pCLE images are reconstructed from the loss during training with the purpose to avoid overfitting. scattered fibre signal. Every fibre in the bundle acts as Since all the considered networks are fully convolutional, the a single-pixel detector. To reconstruct pCLE images on a test images were processed full size and no patch processing Cartesian grid, Delaunay triangulation and piecewise linear is required during the inference phase. interpolation are used. The simulation framework developed Three CNNs networks for SR were used: sparse-coding- in this study mimics the standard pCLE reconstruction algo- based FSRCNN [3], residual-based EDSR [8], and generative rithm and starts by assigning to each fibre the average of the adversarial network SRGAN [7]. Every model was trained signal from seven neighbouring pixels [6]. In the standard with the two datasets presented in “Generation of realistic reconstruction algorithm, the fibre signal, which includes synthetic pCLE data” section. noise, is then interpolated. Similarly, noise was added to the MSE is the most commonly used loss function for SR. simulated data to produce realistic images and avoid creating Zhao et al. [16] showed that MSE has two limitations: a wide domain gap between real and simulated pCLE images. it does not converge to the global minimum and produces Despite some misalignment artefacts, the registration- blocky artefacts. In addition to demonstrating that L1 loss based generation of HR presented in “Pseudo-ground-truth outperforms L2, the authors also introduced a new loss func- image estimation based on video registration” section pro- tion SSIM + L1 by incorporating the Structural Similarity duces images with fine details and a high signal-to-noise (SSIM) [15]. FSRCNN and EDSR were trained considering ratio. Our simulation framework uses these HR and produces independently both L1 and SSIM + L1 to investigate their simulated LR images with a perfect alignment. applicability for our data based on a quantitative compari- The proposed simulation framework relies on observed son. irregular fibre arrangements and corresponding Voronoi dia- grams. Each fibre signal was extracted from an HR image, by averaging the HR pixel values within the corresponding Voronoi cell. 
Results To replicate realistic noise patterns on the simulated LR images, additive and multiplicative Gaussian noise (a and Acknowledging the lack of proper ground truth for super- m respectively) is added to the extracted fibre signals fs resolution of pCLE and the ambiguous nature of estab- to obtain a noisy fibre signal nfs as: nfs = (1 + m). ∗ lished IQA metrics, a three-stage approach was designed fs + a. The standard deviation of the noise distributions for the evaluation of the proposed method using the three was tuned based on visual similarity between LR and SR architectures considered in “Materials and methods” org LR and between their histograms. Sigma values were section. syn 0.05 and 0.01 * (max fs − min fs) for multiplicative and The first stage, presented in “Experiments on synthetic additive Gaussian distribution, respectively. In the last step, data” section and relying on the quantitative assessment, 123 International Journal of Computer Assisted Radiology and Surgery (2018) 13:917–924 921 Table 1 Quantitative results obtained on full-size images from the test set for different training and testing strategies The best results for each section are highlighted in bold demonstrates the applicability of EBSR for pCLE in the Experiments on synthetic data ideal synthetic case where ground-truth is available. In this quantitative stage, the inadequacy of the existing video- In the first experiment, synthetic data are used to demon- registration-based high-resolution images as a ground truth strate that our models work in the ideal situation where for EBSR training purpose is demonstrated. ground truth is available. The first section of Table 1 shows The second stage, presented in “Experiments on original the scores obtained when the SR models are trained on data” section, focuses on the quantitative assessment of our pCLE and tested on LR . Here, it is evident that the syn syn methods in the context of real input images and on the eval- EDSR and FSRCNN trained with SSIM + L1 obtain a sub- uation of our best model against other state-of-the-art SISR stantial improvement on the different quality factors with methods. respect to the LR image. More specifically, in comparison In the third stage, performed to overcome the limitations with the initial LR image, the SSIM was increased by + 0.06 of the quantitative assessment, a MOS study was carried out when EDSR is used and by + 0.05 when FSRCNN is used. by recruiting nine independent experts, having 1–10 years of These approaches also yield a GCF value that is very close to experience working with pCLE images. the GCF in HR and an improvement of + 0.32 and + 0.36 in the GCF with respect to LR images. Statistical significance of these improvements was assessed with a paired t test (p value less than 0.0001). From this experiment, it is possible to Quantitative analysis conclude that the proposed solution is capable of performing SR reconstruction when the models are trained on synthetic For the quantitative analysis, the SR images were examined data with no domain gap at test time. exploiting two complementary metrics: (i) SSIM to evaluate the similarity between the SR image and the HR, and (ii) Global Contrast Factor (GCF) [9] as a reference-free metric Experiments on original data for measuring image contrast which is one of the key char- acteristic of image quality in our context. Analysing both When real images are considered, the same conclusions can- SSIM and GCF in combination leads to a more robust evalu- not be reached. 
The results obtained by training on pCLE org ation. SSIM alone cannot be depended on when the reference and testing on LR are reported in the second section of org image is unreliable, while improvements in GCF alone can Table 1, and here it is evident that all the different quality be achieved deceitfully for example by adding a large amount factors decrease. The best approach is the FSRCNN trained of noise. using SSIM + L1 as loss function. With respect to the previ- Using these metrics, six scores for each SR method were ous case this approach loses 0.04 on the SSIM, and 0.12 on extracted: mean and standard deviation of (i) SSIM between the  GCF with LR. This leads to a final reduction of 0.14 for SR and HR, (ii) GCF differences between SR and LR and the Tot score. In this scenario, the deterioration of SSIM and cs (iii) GCF differences between SR and the HR. Finally, to GCF compared to the previous synthetic case can be due to determine which approach performs better, a composite score the use of inadequate HR images during the training (i.e. mis- Tot obtained by averaging the normalised value of SSIM alignment during the fusion, lack of compensation for motion cs with the normalised GCF difference between SR and LR deformations, etc.). Better results are instead obtained when was defined. Both factors are re-scaled to the range [0–1]. In the SR models performed on LR images are trained using org our quantitative assessment, the score obtained by the initial the pCLE (last section of Table 1). Here, the quality fac- syn LR was considered as baseline reference. tors increased when compared to the previous case, although org 123 922 International Journal of Computer Assisted Radiology and Surgery (2018) 13:917–924 SR Specifically, in this experiment a Wiener deconvolution, a variational Bayesian inference approach with sparse and pCLEsyn non-sparse priors [14], the SRGAN and EDSR networks pretrained on natural images were considered. The Wiener deconvolution was assumed to have a Gaussian point-spread function with the parameter σ = 2 estimated experimen- tally from the training set. Finally, the last column of Table 2 includes the results of a contrast-enhancement approach obtained by sharpening the input with parameters similarly pCLEorg tuned on the trained set. Although our approach is not con- sistently outperforming the other on each individual quality score, when the combined score Tot is considered, our cs method outperforms the others by a large margin. Semi-quantitative analysis (MOS) To perform the MOS, nine independent experts were asked to evaluate 46 images each. Full-size LR were selected org Fig. 2 Example of SR images obtained when pCLE and pCLE syn org randomly from test set of pCLE , and used to generate org are used for train and test. From top to the bottom, the images in the SR reconstructions. At each step, the SR images obtained middle represent the SR image obtained when: (i) pCLE are used for syn by the three different methods (SRGAN, FSRCNN and train and test, (ii) pCLE are used for train, and the pCLE are used syn org EDSR) trained on synthetic data and a contrast-enhancement for test, and (iii) pCLE are used for train and test org obtained by sharpening the input (used as a baseline) are shown to the user, in a randomly shuffled order. The input they do not overcome the results obtained when the approach and the HR are also displayed on the screen as references for the participants. 
Table 1 Quantitative results obtained on full-size images from the test set for different training and testing strategies (the best results for each section are highlighted in bold)

Experiments on synthetic data

In the first experiment, synthetic data are used to demonstrate that our models work in the ideal situation where ground truth is available. The first section of Table 1 shows the scores obtained when the SR models are trained on pCLE_syn and tested on LR_syn. Here, it is evident that EDSR and FSRCNN trained with SSIM + L1 obtain a substantial improvement in the different quality factors with respect to the LR image. More specifically, in comparison with the initial LR image, the SSIM increased by +0.06 when EDSR is used and by +0.05 when FSRCNN is used. These approaches also yield a GCF value that is very close to the GCF of the HR images and an improvement of +0.32 and +0.36 in GCF with respect to the LR images. Statistical significance of these improvements was assessed with a paired t test (p value less than 0.0001). From this experiment, it is possible to conclude that the proposed solution is capable of performing SR reconstruction when the models are trained on synthetic data with no domain gap at test time.

Experiments on original data

When real images are considered, the same conclusions cannot be reached. The results obtained by training on pCLE_org and testing on LR_org are reported in the second section of Table 1, and here it is evident that all the quality factors decrease. The best approach is FSRCNN trained using SSIM + L1 as loss function. With respect to the previous case, this approach loses 0.04 on the SSIM and 0.12 on the GCF with respect to LR, which leads to a final reduction of 0.14 for the Tot_cs score. In this scenario, the deterioration of SSIM and GCF compared to the previous synthetic case can be attributed to the use of inadequate HR images during training (i.e. misalignment during the fusion, lack of compensation for motion deformations, etc.). Better results are instead obtained when the SR models applied to LR_org images are trained using pCLE_syn (last section of Table 1). Here, the quality factors increase when compared to the previous case, although they do not overcome the results obtained when the approach is trained and tested on synthetic data. EDSR, in particular, has a Tot_cs score of 0.65, which is 0.08 better than the best approach trained on pCLE_org (second section of Table 1) and 0.06 worse than the best approach trained and tested on pCLE_syn (first section of Table 1). The GCF values obtained here are in general much better than in the previous two cases. An example of the visual results from the different training modalities is shown in Fig. 2. In conclusion, our findings suggest that existing video-registration-based approaches are inadequate to serve as a ground truth for HR images, while EBSR approaches such as EDSR and FSRCNN, when trained on synthetic data, can produce SR images that enhance the quality of the LR images.

Fig. 2 Example of SR images obtained when pCLE_syn and pCLE_org are used for training and testing. From top to bottom, the images in the middle represent the SR image obtained when: (i) pCLE_syn is used for training and testing, (ii) pCLE_syn is used for training and pCLE_org for testing, and (iii) pCLE_org is used for training and testing

Given these conclusions, the MOS study was performed using images obtained from the models trained only with synthetic data.

To further validate our methodology, Table 2 compares the results obtained by the best model of our approach (EDSR trained on synthetic data with SSIM + L1 as loss function) against other state-of-the-art SISR methodologies. Specifically, in this experiment a Wiener deconvolution, a variational Bayesian inference approach with sparse and non-sparse priors [14], and the SRGAN and EDSR networks pretrained on natural images were considered. The Wiener deconvolution was assumed to have a Gaussian point-spread function with the parameter σ = 2 estimated experimentally from the training set. Finally, the last column of Table 2 includes the results of a contrast-enhancement approach obtained by sharpening the input, with parameters similarly tuned on the training set. Although our approach does not consistently outperform the others on each individual quality score, when the combined score Tot_cs is considered, our method outperforms the others by a large margin.

Table 2 Results of the proposed approach against state-of-the-art methods (the best results for each section are highlighted in bold)
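As an indication of how the Wiener-deconvolution baseline could be set up, the sketch below uses scikit-image with a normalised Gaussian point-spread function of σ = 2. The kernel size and the regularisation weight (balance) are illustrative values, not parameters reported here.

```python
import numpy as np
from skimage.restoration import wiener

def gaussian_psf(size=9, sigma=2.0):
    """Normalised 2-D Gaussian point-spread function."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return psf / psf.sum()

def wiener_baseline(lr_image, sigma=2.0, balance=0.1):
    """Deconvolve an LR image with a Gaussian PSF (illustrative balance value)."""
    return wiener(lr_image, gaussian_psf(sigma=sigma), balance=balance)

# Example on a random array standing in for a pCLE frame
lr = np.random.rand(256, 256)
deconvolved = wiener_baseline(lr)
```

In practice the balance parameter trades noise amplification against sharpness and would need to be tuned on the training set, as was done for the other baselines.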
Semi-quantitative analysis (MOS)

To perform the MOS, nine independent experts were asked to evaluate 46 images each. Full-size LR_org images were selected randomly from the test set of pCLE_org and used to generate the SR reconstructions. At each step, the SR images obtained by the three different methods (SRGAN, FSRCNN and EDSR) trained on synthetic data, together with a contrast-enhancement baseline obtained by sharpening the input, are shown to the user in a randomly shuffled order. The input and the HR are also displayed on the screen as references for the participants. For each of the four images, the user assigns a score between 1 (strongly disagree) and 5 (strongly agree) on three different questions:

– Q1: Are there any artefacts/noise in the image?
– Q2: Can you see an improvement in contrast with respect to the input?
– Q3: Can you see an improvement in the details with respect to the input?

To make sure that the questions were correctly interpreted, each participant received a short training before starting the study. The results of the MOS are shown in Fig. 3. EDSR is the approach that achieves the best performance on Q2 and Q3, while, based on Q1, both FSRCNN and EDSR do not introduce a significant amount of artefacts or noise. The results of the MOS give one more indication that our training methodology improves the quality of the pCLE images. Fig. 4 shows a few examples of the SR images obtained using the proposed methodology.

Fig. 3 Results of the MOS using a contrast-enhancement approach, FSRCNN, EDSR and SRGAN. The plots report the results for the three different questions

Fig. 4 Example of visual results from the proposed approaches: Input (left), SRGAN (middle left), EDSR (middle), FSRCNN (middle right) and HR (right)
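For completeness, the sketch below shows one way the collected ratings could be aggregated into per-method MOS values; the tabular layout (one row per individual rating) and the toy scores are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical layout: one row per (expert, image, method, question) rating
ratings = pd.DataFrame({
    "method":   ["EDSR", "EDSR", "FSRCNN", "FSRCNN", "SRGAN", "SRGAN"],
    "question": ["Q2",   "Q3",   "Q2",     "Q3",     "Q2",    "Q3"],
    "score":    [4,      4,      4,        3,        3,       3],
})

# Mean Opinion Score: average 1-5 rating per method and question
mos = ratings.groupby(["method", "question"])["score"].mean().unstack()
print(mos)
```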
Discussion and conclusions

This work addresses the challenge of super-resolution for pCLE images. It is the first work to evaluate the potential of deep learning and exemplar-based super-resolution in the pCLE context.

The main contribution of this work is to overcome the challenge posed by the lack of ground-truth data. A novel methodology is proposed to produce pseudo-ground-truth by exploiting an existing video-registration method and to simulate realistic LR images based on a physical model of pCLE acquisition. The conclusion is that synthetic pCLE data can be used to train CNNs that are then applied to real data, because the physically inspired simulation process reduces the domain gap between real and simulated images.

The robust IQA test based on the Structural Similarity (SSIM) and Global Contrast Factor (GCF) scores confirmed the improvement of the obtained results with respect to the input image. An analysis of the perceptual quality of the images through a Mean Opinion Score (MOS) study, recruiting nine independent pCLE experts, showed that the SR models give clinically interesting results. Experts perceived an improvement in the quality of the reconstructed images with respect to the input image without noting a significant increase in the amount of noise and artefacts. The quantitative and semi-quantitative user perception analyses provided consistent conclusions.

Providing better-quality pCLE images might improve the decision process during the endoscopic examination. Further evaluation will focus on the temporal consistency of the super-resolution and will rely on histopathological confirmation to validate the authenticity of the generated details.

Acknowledgements The authors would like to thank the High Dimensional Neurology group, Institute of Neurology, UCL, for providing computational support. The authors would also like to thank the independent experts at Mauna Kea Technologies for participating in the MOS survey.

Funding This work was supported by Wellcome/EPSRC [203145Z/16/Z; NS/A000050/1; WT101957; NS/A000027/1; EP/N027078/1]. This work was undertaken at UCL and UCLH, which receive a proportion of funding from the DoH NIHR UCLH BRC funding scheme. The PhD studentship of Agnieszka Barbara Szczotka is funded by Mauna Kea Technologies, Paris, France.

Compliance with ethical standards

Conflict of interest The PhD studentship of Agnieszka Barbara Szczotka is funded by Mauna Kea Technologies, Paris, France. Tom Vercauteren owns stock in Mauna Kea Technologies, Paris, France. The other authors declare no conflict of interest.

Ethical approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent For this type of study, formal consent is not required. This article does not contain patient data.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Agustsson E, Timofte R (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp 1122–1131. https://doi.org/10.1109/CVPRW.2017.
2. André B, Vercauteren T, Buchner AM, Wallace MB, Ayache N (2011) A smart atlas for endomicroscopy using automated video retrieval. Med Image Anal 15(4):460–476
3. Dong C, Loy CC, Tang X (2016) Accelerating the super-resolution convolutional neural network. In: European conference on computer vision, Springer, pp 391–407
4. Fugazza A, Gaiani F, Carra MC, Brunetti F, Lévy M, Sobhani I, Azoulay D, Catena F, de'Angelis GL, de'Angelis N (2016) Confocal laser endomicroscopy in gastrointestinal and pancreatobiliary diseases: a systematic review and meta-analysis. BioMed Res Int 2016:1–31. https://doi.org/10.1155/2016/4638683
5. Kim J, Kwon Lee J, Mu Lee K (2016) Accurate image super-resolution using very deep convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1646–1654
6. Le Goualher G, Perchant A, Genet M, Cavé C, Viellerobe B, Berier F, Abrat B, Ayache N (2004) Towards optical biopsies with an integrated fibered confocal fluorescence microscope. Med Image Comput Comput Assist Interv MICCAI 2004:761–768
7. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A, Acosta A, Aitken A, Tejani A, Totz J, Wang Z et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
8. Lim B, Son S, Kim H, Nah S, Lee KM (2017) Enhanced deep residual networks for single image super-resolution. In: The IEEE conference on computer vision and pattern recognition (CVPR) workshops
9. Matkovic K, Neumann L, Neumann A, Psik T, Purgathofer W (2005) Global contrast factor—a new approach to image contrast. Comput Aesthet 2005:159–168
10. Park SC, Park MK, Kang MG (2003) Super-resolution image reconstruction: a technical overview. IEEE Signal Process Mag 20(3):21–36
11. Ravì D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, Yang GZ (2017) Deep learning for health informatics. IEEE J Biomed Health Inf 21(1):4–21
12. Tanno R, Ghosh A, Grussu F, Kaden E, Criminisi A, Alexander DC (2017) Bayesian image quality transfer. In: International conference on medical image computing and computer-assisted intervention, Springer, pp 265–273
13. Vercauteren T, Perchant A, Malandain G, Pennec X, Ayache N (2006) Robust mosaicing with correction of motion distortions and tissue deformations for in vivo fibered microscopy. Med Image Anal 10(5):673–692
14. Villena S, Vega M, Babacan SD, Molina R, Katsaggelos AK (2013) Bayesian combination of sparse and non-sparse priors in image super resolution. Digit Signal Process 23(2):530–541
15. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
16. Zhao H, Gallo O, Frosio I, Kautz J (2015) Loss functions for image restoration with neural networks. IEEE Trans Comput Imag 3(1):47–57. https://doi.org/10.1109/TCI.2016.2644865
