GPU based parallel acceleration for fast C-arm cone-beam CT reconstruction

yq.xie@siat.ac.cn
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. Full list of author information is available at the end of the article.

Abstract

Background: With the introduction of flat panel detector technology, cone-beam CT (CBCT) has become a novel imaging modality and is widely applied in clinical practice. C-arm mounted CBCT has proved especially suitable for image guided interventional surgery. In practice, acquiring high resolution, high quality 3D images within the real-time requirements of clinical applications remains challenging.

Methods: In this paper, we propose a GPU based accelerated method for fast C-arm CBCT 3D image reconstruction. A filtered back projection method is optimized and implemented with GPU parallel acceleration. A distributed system is designed so that the time consumed by image acquisition hides the reconstruction delay, which further improves system performance.

Results: With acceleration in both algorithm and system design, we show that our method significantly increases system efficiency. The optimized GPU accelerated FDK algorithm improves the reconstruction efficiency. With the proposed system design, system performance is further improved by 26% and the reconstruction delay is reduced by a factor of 2.1 when 90 projection frames are used. When the number of frames increases to 120, the figures are 39% and 3.3 times. We also show that when the projection acquisition consumption increases, the reconstruction acceleration ratio increases significantly.

Keywords: Image guided therapy, Fast reconstruction, CBCT, GPU

Chen et al. BioMed Eng OnLine (2018) 17:73

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

With the introduction of flat panel detector (FPD) technology, cone-beam computed tomography (CBCT) has become a novel imaging technology. FPD provides several theoretical advantages such as high spatial resolution, wide dynamic range, a square field of view (FOV) and real-time imaging capability with no geometric distortion [1]. These features enable CBCT to generate an entire volumetric data set in a single gantry rotation [2] and allow verification of the delivered dose distribution [3]. The radiation dose is also reported to decrease [1, 4]. Therefore, CBCT has been widely applied in image guided surgery and interventional radiology, such as CBCT guidance of brachytherapy and of spinal, orthopedic, thoracic and abdominal surgery [5–10]. Some groups reported that CBCT can achieve good performance in fenestrated/branched aortic endografting and in percutaneous transthoracic needle biopsy of small lung nodules. Other groups showed that CBCT depicts considerably more small aneurysms and important anatomic details, and can be used as a new gold standard in the detection of intracranial aneurysms [11, 12].

In addition, C-arm mounted CT is especially suitable for image guided interventions. The system is compact, so the patient can stay stationary during image acquisition. Volumetric tomographic images can be combined and co-displayed with conventional 2D angiographic imaging, which makes pre-operative surgical planning, surgical device tracking and navigation, final result assessment and margin verification achievable [13, 14].

To acquire 3D volumetric images, several categories of algorithms have been explored. One major category is the iterative algorithms, such as the algebraic reconstruction technique (ART) combined with compressed sensing theory, using a Total Variation (TV) norm to regularize the cost function as in [15, 16]. The main challenge of such algorithms is the computational cost. The time consuming process and the high hardware requirements may limit their use in clinical applications. Therefore, the FDK algorithm still seems to be a better choice for practical application. Filtered back projection algorithms can be further accelerated using GPU parallel techniques. From [17–19] we can see that some groups have made progress in accelerating the FDK algorithm with GPUs. In [18] the authors reviewed how the GPU can be applied to almost every kind of image reconstruction algorithm. In [19] the authors compared implementations of the FDK method on different platforms and showed a significant performance improvement. Moreover, with a carefully designed distributed system, the algorithm can run on high performance devices dedicated to parallel acceleration, and the system delay can be further reduced with latency hiding techniques.

In this paper, we propose a distributed system for C-arm mounted CBCT imaging and a GPU based acceleration method for fast CBCT reconstruction. As stated above, filtered back projection methods are more suitable for real-time clinical 3D image acquisition than iterative optimization methods, and GPU parallel acceleration can be applied. Although GPU parallel acceleration is not new, the acceleration plan can be further optimized with geometric symmetry and a proper system design.
Therefore, we propose to further optimize the FDK algorithm based on geometric symmetry and to implement it with GPU parallel acceleration. We also propose a delay hiding scheme based on a distributed system layout connected via the TCP/IP protocol, which uses the projection acquisition consumption to hide the reconstruction delay. The rest of the paper is organized as follows: in the "Methods" section we explain the details of the system design, the GPU accelerated FDK algorithm implementation and the latency hiding scheme. In the "Experiment results and discussion" section, we show the reconstruction results and evaluate the system performance.

Methods

System design

To achieve a better acceleration effect, a high performance GPU specially designed for computing tasks may be required, which may make an ordinary hardware system unsuitable. Besides, although it is the main constraint on system efficiency, the reconstruction process is a relatively independent part of the image chain, which implies that further changes or updates will not interfere with other parts of the system. Therefore, a pluggable computing unit with a distributed system architecture is favored. We briefly describe our system design as follows.

The system is composed of three main parts, as shown in Fig. 1. The C-arm control unit controls the gantry rotation and acquires projection images. The computing unit reconstructs 3D volume data from 2D projection images. The main console unit controls the image chain: it sends orders and requests to the corresponding units, manages the data stream, and displays system status and 2D/3D visualization. The three parts are connected and communicate via the TCP/IP protocol to transmit data and orders.

Fig. 1 System architecture. The system is divided into three parts: a C-arm control unit, a computing unit, and a main console. The units are connected through TCP/IP protocol for data and order transmission

GPU accelerated FDK algorithm

The FDK algorithm was originally proposed by Feldkamp et al. [20] for approximate 3D filtered back projection reconstruction with a circular trajectory and flat panel detectors. The algorithm can be briefly represented as follows:

$$f(x, y, z) = \frac{1}{2}\int_{0}^{2\pi} \frac{DSD}{U}\left[\varphi(u, v)\,\frac{DSD}{\sqrt{DSD^{2} + u^{2} + v^{2}}}\right] * h(u)\, d\theta \qquad (1)$$

where

$$U = DSD - z\sin(\theta) + y\cos(\theta) \qquad (2)$$

and we define the two weighting factors as

$$W_{1} = \frac{DSD}{U}, \qquad W_{2} = \frac{DSD}{\sqrt{DSD^{2} + u^{2} + v^{2}}} \qquad (3)$$

Here $DSD$ is the distance from the source to the detector, $\varphi$ is the projection data, $h$ is the filter kernel, and $W_{1}$ and $W_{2}$ are weighting factors that compensate for the different ray lengths. $u$ and $v$ are the coordinates of the projection of the ray with angle $\theta$ on the flat panel detector. The coordinate system is defined in Fig. 2. $O$ is the center of the FOV as well as of the C-arm rotation geometry; $O_1$ is the projection of $O$ on the flat panel detector and is defined as the origin of the projection images.

The nature of the FDK algorithm makes it especially suitable for parallel acceleration. The main idea of GPU parallel acceleration is that the GPU provides far more arithmetic units than general purpose processors, together with a stream processing scheme for highly efficient parallel computing. For each element of an input data stream, a kernel is defined that carries out arbitrary calculations to produce an output data stream. GPU acceleration is therefore especially suitable for pixel-wise operations, turning iterative loops of similar operations into parallel execution.
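As a small illustration of the per-element kernel idea, the sketch below applies the weighting factor W2 from Eq. (3) to every pixel of one projection. It is plain Python rather than GPU code and is not a description of the actual implementation: the function names `w2` and `preweight`, the pixel pitch arguments, and the convention that O1 sits at the center of the pixel grid are assumptions made for this example.

```python
import math


def w2(u, v, dsd):
    """Weighting factor W2 = DSD / sqrt(DSD^2 + u^2 + v^2) for one detector point."""
    return dsd / math.sqrt(dsd * dsd + u * u + v * v)


def preweight(projection, pitch_u, pitch_v, dsd):
    """Multiply every pixel of one projection by W2.

    Assumption: the origin O1 lies at the center of the pixel grid, so pixel
    (row, col) has detector coordinates u = (col - (cols - 1) / 2) * pitch_u
    and v = (row - (rows - 1) / 2) * pitch_v.
    """
    rows, cols = len(projection), len(projection[0])
    weighted = []
    for row in range(rows):
        v = (row - (rows - 1) / 2) * pitch_v
        out_row = []
        for col in range(cols):
            u = (col - (cols - 1) / 2) * pitch_u
            # Each pixel depends only on its own (u, v): on a GPU this loop
            # body would be the kernel, executed by one thread per pixel.
            out_row.append(projection[row][col] * w2(u, v, dsd))
        weighted.append(out_row)
    return weighted


# Toy 3 x 5 projection of ones, 0.18 mm pixels, DSD = 1000 mm.
demo = preweight([[1.0] * 5 for _ in range(3)], 0.18, 0.18, 1000.0)
```

Because no pixel reads the result of another pixel, the two nested loops can be replaced by one thread per pixel without changing the output. The weight is 1 at O1 and falls off with distance from it.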
For the FDK algorithm, the most time consuming steps are the projection position calculation, which determines where each volume voxel projects onto the flat panel detector plane, and the calculation of the weighting factors $W_{1}$ and $W_{2}$ for each volume voxel in the back projection procedure. However, these calculations are highly similar for every volume voxel, and there is no dependency between voxels, so the voxel-wise iterative loop can be parallelized by assigning a kernel to each volume voxel. We also observe that $W_{2}$ depends only on the projection coordinate on the detector plane. Its calculation can therefore be separated from the voxel-wise calculation and treated as a filtering process before the back projection. The stream processing scheme of the reconstruction for an arbitrary frame is shown in Fig. 3.

Fig. 2 FDK reconstruction coordinate system definition. The definition of the volume voxel coordinate system axis X, Y, Z with origin O and the flat panel plane coordinate system U, V with origin O1. Assume voxel x1 and x2 are symmetric with the plane s formed by axis Z and axis V, and the distance to plane s is d, their projections on the flat panel plane px1 and px2 are symmetric with axis V, with the same distance pd to axis V. Therefore we only need to calculate one half of the volume voxels. The calculation for the other half can be accomplished by a mirror action

With the geometric symmetry, the number of kernels needed can be further reduced. For a pair of voxels symmetric about the plane s formed by the z axis and the v axis in Fig. 2, the projection positions on the flat panel detector plane are also symmetric about the v axis, and the two weighting factors are the same. Therefore, we only need to carry out the calculations for one half of the voxels, along axis x+ or x−, and the results for the other half can be obtained with a mirror action. For the horizontal filtering, the convolution can be computed efficiently using the fast Fourier transform routines that CUDA provides.

Fig. 3 Stream processing scheme for FDK algorithm. Each volume voxel is considered as an element of the input data stream. The projection position calculation kernel provides each element a parallel thread to calculate the corresponding projections on the flat panel plane. The results form a new data stream as the input to the back projection kernel, which performs the back projection and calculates the value of the corresponding volume voxel in parallel

Latency hiding implementation

With the distributed system design, the efficiency of the reconstruction process can be further boosted with a latency hiding technique. As stated in the last section, the only dependency of the reconstruction with an arbitrary projection frame is the acquisition of that frame, while the projection acquisition does not depend on the reconstruction result. Therefore, the efficiency can be further improved at system level by designing a parallel control time sequence of the image chain that makes full use of system time consumption such as C-arm rotation, image acquisition and processing, and data transmission, which we define as the projection acquisition consumption. The control is designed as follows.

We design three control time sequences, T1, T2 and T3, all of which take the values −1 and +1, as shown in Fig. 4. T1 represents the state of the projection image acquisition process: when the C-arm has moved to an arbitrary position and the image has been acquired and transmitted to the computing unit via TCP/IP, T1 is inverted. T2 represents the state of the reconstruction process: when an arbitrary frame has been filtered and back projected to the volume data, T2 is inverted. T3 is the control signal that synchronizes T1 and T2. T3 is generated by a timer with a very small time interval. Whenever T3 has a falling edge, the T1 and T2 signals are checked. When T1 has a falling or rising edge, the newly acquired image and the corresponding parameters are pushed into a queue L. When T2 has a falling or rising edge, the first image and its parameters are popped out of the queue, and the corresponding memory is released. When the queue L is empty, the reconstruction is complete, and all the time sequences and the queue are reset. Compared with a linear image chain system, the control sequences allow the usually less time consuming image acquisition process to be carried out while the reconstruction is in progress. In the experiment section we show that the latency hiding scheme can further improve system performance.

Fig. 4 System control time sequences. At every falling edge of T3, the states of T1 and T2 are checked to decide a push or pop action on the projection queue. When the reconstruction of an arbitrary frame is in progress, the acquisitions for the next frames can be carried out at the same time to form a latency hiding scheme. The push and pop action controls the frame queue for reconstruction

Experiment design

We test our proposed method from two aspects. First, we show the reconstruction results of our method. We use a Shepp–Logan numeric phantom for quantitative evaluation of reconstruction accuracy. We also show reconstruction results for phantoms of a blood vessel, a head and a foot. Then we discuss the efficiency of our proposed method. We first consider the reconstruction process alone, comparing our proposed method with other methods that use either a different methodology or a different acceleration technique. We then evaluate the system performance enhancement by introducing two acceleration ratios. The first ratio, the system performance ratio $\beta_{sys}$, represents the system performance boost by comparing the overall system delay of our proposed system with that of a linear image chain system:

$$\beta_{sys} = 1 - \frac{T_{prop}}{T_{recon} + T_{acq}}$$

where $T_{prop}$ is the average measured system delay of our proposed system, and $T_{acq}$ and $T_{recon}$ are the average time consumption of the projection acquisition and of the reconstruction process respectively. The second ratio, the reconstruction acceleration ratio $\beta_{recon}$, evaluates the reconstruction efficiency improvement provided by our proposed system:

$$\beta_{recon} = \frac{T_{recon}}{T_{prop} - T_{acq}}$$

The averages are taken over a test data set of 10 gantry rotations of our C-arm mounted CBCT.

System and environment setup

We test our method on our designed C-arm imaging system. The C-arm DSD is 1000 mm and the SAD is 500 mm. The X-ray source is an imd X-RAY TUBE HEAD E-40R, operated at 65 kV and 2 mA with an exposure time of 15 ms. The projection image has a dimension of 1560 × 1440 pixels with a 0.18 × 0.18 mm resolution, and the acquisition angle covers on average a range of 210°. A Quadro 6000 is used for GPU acceleration, with 256 threads in parallel.

Experiment results and discussion

3D reconstruction evaluation

We first discuss the 3D reconstruction results from our system, to show that our method does not compromise the reconstruction accuracy. We test our method on a numeric phantom for quantitative analysis by evaluating the reconstruction error against the ground truth. We also show reconstruction results of a blood vessel phantom, a head phantom and a foot phantom, to show that our proposed method is capable of correctly reconstructing the structures of interest from actual projection data acquired with a clinically practical C-arm CBCT.

The Shepp–Logan numeric phantom we use to test our method is shown in Fig. 5. We compare the reconstruction result and the ground truth along the line profile shown in Fig. 5, and the result is shown in Fig. 6. We define the relative error between the reconstruction $f_{rec}$ and the ground truth $f_{0}$ as

$$I = \frac{1}{N}\sum \frac{\left| f_{rec} - f_{0} \right|}{f_{0}} \times 100\%$$

where $N$ is the number of pixels counted. The relative error along the profile line is 2%, which suggests that the reconstruction basically preserves the features of the phantom.

Fig. 5 Reconstruction result of a standard Shepp–Logan numeric phantom. a Standard Shepp–Logan phantom; b reconstruction result

Fig. 6 Line profile difference between reconstruction and standard phantom along the dash line shown in Fig. 5. The black line represents the standard Shepp–Logan phantom; the red line represents reconstruction result. Along the line the relative error is 2%

In Fig. 7 we show the reconstruction of an Elatras brain blood vessel phantom with 3 aneurysms. The head and foot phantoms and their reconstruction results are shown in Figs. 8 and 9 respectively.
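The relative error defined above for the line profile can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the experiment: the function name is hypothetical, the profile values are made up, and skipping pixels whose ground truth is zero is an assumption of this sketch, since the ratio is undefined there.

```python
def mean_relative_error_percent(reconstructed, ground_truth):
    """Mean of |f_rec - f_0| / f_0 over the counted pixels, in percent.

    Assumption: pixels whose ground-truth value is zero are not counted,
    because the ratio is undefined there.
    """
    terms = [
        abs(rec - ref) / abs(ref)
        for rec, ref in zip(reconstructed, ground_truth)
        if ref != 0
    ]
    if not terms:
        raise ValueError("no pixel with a non-zero ground-truth value")
    return 100.0 * sum(terms) / len(terms)


# Hypothetical profile values, not data from the experiment.
profile_truth = [0.0, 1.0, 1.0, 0.5]
profile_recon = [0.01, 1.02, 0.98, 0.5]
error_percent = mean_relative_error_percent(profile_recon, profile_truth)
```

In this toy profile three pixels are counted, two of which deviate by 2% of the ground truth, so the mean relative error is about 1.33%.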
Fig. 7 Reconstruction result of a vessel phantom. a The brain blood vessel phantom with 3 cranial aneurysms. b MIP of reconstruction result. c–e Sagittal view, coronal view and transverse view

Fig. 8 Reconstruction result of a head phantom. a The head phantom. b MIP of reconstruction result. c–e Sagittal view, coronal view and transverse view

Fig. 9 Reconstruction result of a foot phantom. a The foot phantom. b MIP of reconstruction result. c–e Sagittal view, coronal view and transverse view

Efficiency analysis

In Table 1, we present the time consumption $T_{recon}$ of the proposed GPU accelerated FDK, non-accelerated FDK, IPP accelerated FDK and an iterative TV norm regularized algebraic reconstruction technique (TV-ART) algorithm, with different numbers of projection frames used for reconstruction. Theoretically the acceleration ratio should equal the number of parallel threads, which in our case is 256. However, due to host–device communication and data transmission delay, only an acceleration ratio of 69 is achieved. Nevertheless, Table 1 still shows that GPU acceleration can significantly improve calculation efficiency and achieves a better performance enhancement than existing acceleration techniques such as IPP, even when tested on a relatively low performance GPU.

Table 1 Comparison of GPU, IPP accelerated FDK and GPU accelerated TV-ART algorithms time consumption $T_{recon}$

| | Non-accelerated FDK | IPP FDK | GPU FDK | GPU FDK | GPU ART |
|---|---|---|---|---|---|
| Frames | 90 | 90 | 90 | 120 | 120 |
| Time | 964 s | 53.40 s | 14.28 s | 25.76 s | 32 min |

In Table 2, we present the total system delay of our proposed image chain, $T_{prop}$, and the time consumption $T_{acq}$ for projection acquisition alone. Combining Tables 1 and 2, we calculate the linear image chain system delay $T_{linear} = T_{acq} + T_{recon}$ and the reconstruction delay of our proposed system, defined as $T_{recon\_prop} = T_{prop} - T_{acq}$; a summary is provided in Table 3. We compare the system delay of our proposed system $T_{prop}$ from Table 2 with the linear image chain system delay $T_{linear}$ from Table 3 to show the system performance improvement under different circumstances. We also compare the reconstruction delay of our proposed system $T_{recon\_prop}$ from Table 3 with the reconstruction time $T_{recon}$ from Table 1 to show the reconstruction efficiency boost. The two acceleration ratios $\beta_{sys}$ and $\beta_{recon}$ are also calculated. When we use 90 frames of projection data, the linear image chain system delay $T_{linear}$ is 29 s, while our proposed method yields a system delay of 21.49 s, which implies a $\beta_{sys}$ of 26%. The reconstruction delay $T_{recon\_prop}$ is 6.78 s, which yields a $\beta_{recon}$ of 2.1. When we use 120 frames in case 1, the figures are 46.29, 28.17 and 7.64 s respectively, yielding a $\beta_{sys}$ of 39% and a $\beta_{recon}$ of 3.3. We can infer that, as the number of 2D projection frames used for 3D reconstruction increases, the system performance benefits more from our proposed method.

Table 2 Summary of system delay $T_{prop}$ and projection acquisition cost $T_{acq}$

| Frames | System delay (s) | Projection acquisition cost (s) |
|---|---|---|
| 90 | 21.49 | 14.72 |
| 120, case 1 | 28.17 | 20.53 |
| 120, case 2 | 50.53 | 50.14 |

Table 3 Summary of linear system delay $T_{linear}$, proposed reconstruction delay $T_{recon\_prop}$, system performance ratio $\beta_{sys}$ and reconstruction acceleration ratio $\beta_{recon}$

| | Linear system delay (s) | Proposed reconstruction delay (s) | System performance ratio (%) | Reconstruction acceleration ratio |
|---|---|---|---|---|
| 90 frames | 29.00 | 6.78 | 26 | 2.1 |
| 120 frames, LAN | 46.29 | 7.64 | 39 | 3.3 |
| 120 frames, WAN | 75.90 | 0.39 | 33 | 66 |

We also simulate a remote connection in case 2 by connecting the system units through a WAN, whereas in case 1 the system units are connected via a LAN with the same router. Although the system performance enhancement ratio $\beta_{sys}$ is 33.4%, the reconstruction delay decreases to 0.39 s, which corresponds to a $\beta_{recon}$ of 66. With the reconstruction optimization techniques, when the reconstruction efficiency is significantly increased and the projection acquisition consumption is dominant, the reconstruction delay can be very low. When the time consumption of reconstruction equals that of projection acquisition, the system performance benefit is maximized, and the reconstruction process can almost be hidden. The upper limit of the system performance enhancement $\beta_{sys}$, which can be approached but not reached, is 50%. A more straightforward comparison is shown in Fig. 10.

Fig. 10 Efficiency comparison. We compare the linear image chain system with our proposed system design under different circumstances (90 frames, 120 frames with LAN connection, 120 frames with WAN connection). Top: the system delay comparison. We can conclude that our proposed design further improves the system performance. Bottom: the reconstruction delay comparison. We can conclude that the latency hiding design significantly reduces reconstruction delay, and when other parts of the image chain are dominant, the reconstruction process can be almost fully hidden, yielding nearly no reconstruction delay

Conclusion

In this paper, we propose a fast CBCT 3D reconstruction method based on GPU parallel acceleration. We describe how the FDK algorithm is parallelized, and a control time sequence designed to further improve efficiency by hiding system latency. Our proposed method significantly improves system performance. GPU parallel acceleration significantly speeds up the FDK reconstruction process, and our latency hiding scheme further improves the system performance. When 90 frames of projections are used for reconstruction, our proposed method improves the system delay by 26% and the reconstruction delay by 2.1 times. When 120 frames are used, the figures are 39% and 3.3 times. We also show that when the projection acquisition delay is dominant in the image chain, the reconstruction process can be almost fully hidden, yielding a significant improvement of the reconstruction delay, which is 66 times in our case.

Although the quality of the reconstructed volume may suffer from the approximate nature of filtered back projection algorithms compared with iterative algorithms such as ART, we show that the features of interest are acceptably preserved. A typical ART type algorithm as described in [16] may take more than 40 min for reconstruction, while our proposed method takes only 20+ seconds under the same circumstances. As a trade-off between image quality and the real-time requirement, our proposed method is more suitable for clinical practice. A distributed system design with the TCP/IP protocol makes the system pluggable and adaptive. With this design, algorithm and hardware updates of the reconstruction techniques can be carried out more easily, and the system is also prepared for further expansion, such as multi-task support and remote network medical applications.

Authors' contributions

KC developed the algorithm, implemented the system and wrote the manuscript. CW participated in the system design and implementation, and helped with the experiment. JX and YX oversaw the project. All authors read and approved the final manuscript.

Author details

1 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. 2 Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China.

Acknowledgements

This work is supported partly by grants of National Key Research Program of China (Grant No. 2016YFC0105102), National Natural Science Foundation of China (No. 61403368), Union of Production, Study and Research Project of Guangdong Province (Grant No. 2015B090901039), Science Foundation of Guangdong (2017B020229002, 2014A030312006), Leading Talent of Special Support Project in Guangdong (2016TX03R139), Technological Breakthrough Project of Shenzhen City (Grant No. JSGG20160229203812944), Shenzhen High-level Oversea Talent Program Grant (KQJSCX20160301144248), Shenzhen Fundamental Research Project, Shenzhen Key Technical Research Project (JSGG20160229203812944) and Beijing Center for Mathematics and Information Interdisciplinary Sciences.

Competing interests

The authors declare that they have no competing interests.

Ethics approval and consent to participate

Not applicable.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 8 March 2018 Accepted: 23 May 2018

References

1. Hatakeyama Y, Kakeda S, Korogi Y, Ohnari N, Moriya J, Oda N, Nishino K, Miyamoto W. Intracranial 2D and 3D DSA with flat panel detector of the direct conversion type initial experience. Eur Radiol. 2006;16:2594–602.
2. Jaffray D, Siewerdsen J. Cone-beam computed tomography with a flat-panel imager: initial performance characterization. Med Phys. 2000;27(6):1311–23.
3. Xing L, Thorndyke B, Schreibmann E, Yang Y, Li TF, Kim GY, Luxton G, Koong A. Overview of image-guided radiation therapy. Med Dosim. 2006;31(2):91–112.
4. Ishikura R, Ando K, Nagami Y, et al. Evaluation of vascular supply with cone-beam computed tomography during intraarterial chemotherapy for a skull base tumor. Radiat Med. 2006;24(5):384–7.
5. Siewerdsen JH, Jaffray DA, Edmundson GK, Sanders WP, Wong JW, Martinez A. Flat-panel cone-beam CT: a novel imaging technology for image-guided procedures. Proc SPIE. 2001;4319:435–44.
6. Jaffray DA, Siewerdsen JH, Edmundson GK, Wong JW, Martinez A. Flat-panel cone-beam CT on a mobile isocentric C-arm for image-guided brachytherapy. Proc SPIE. 2002;4682:209–17.
7. Siewerdsen JH, Moseley DJ, Burch S, Bisland SK, Bogaards A, Wilson BC, Jaffray DA. Volume CT with a flat-panel detector on a mobile, isocentric C-arm: pre-clinical investigation in guidance of minimally invasive surgery. Med Phys. 2005;32(1):241–54.
8. Khoury A, Whyne CM, Daly MJ, Moseley DJ, Bootsma G, Skrinskas T, Siewerdsen JH, Jaffray DA. Intraoperative cone-beam CT for correction of periaxial malrotation of the femoral shaft: a surface-matching approach. Med Phys. 2007;34(4):1380–7.
9. Siewerdsen JH, Chan Y, Rafferty MA, Moseley DJ, Jaffray DA, Irish JC. Cone-beam CT with a flat-panel detector on a mobile C-arm: pre-clinical investigation in image-guided surgery of the head and neck. In: Galloway RL, Cleary KR, editors. Medical imaging. Proceedings of SPIE, SPIE, Bellingham, vol. 5744; 2005. pp. 789–797.
10. Chan Y, Siewerdsen JH, Rafferty MA, Moseley DJ, Jaffray DA, Irish JC. Cone-beam, CT on a mobile C-arm: a novel intraoperative imaging technology for guidance of head and neck surgery. Proc SPIE. 2001;4319:435–44.
11. Karamessini MT, Kagadis GC, Petsas T, Karnabatidis D, Konstantinou D, Sakellaropoulos GC, Nikiforidis GC, Siablis D. CT angiography with three-dimensional techniques for the early diagnosis of intracranial aneurysms. Comparison with intra-arterial DSA and the surgical findings. Eur J Radiol. 2004;49(3):212–23.
12. van Rooij W, Sprengers M, de Gast A, Peluso J, Sluzewski M. 3D rotational angiography: the new gold standard in the detection of additional intracranial aneurysms. Am J Neuroradiol. 2008;29(5):976–9.
13. Orth RC, Wallace MJ, Kuo MD. C-arm cone-beam CT: general principles and technical considerations for use in interventional radiology. J Vasc Interv Radiol. 2008;19(6):814–20.
14. Floridi C, Radaelli A, Abi-Jaoudeh N, Grass M, Lin MD, Chiaradia M, Geschwind J-F, Kobeiter H, Squillaci E, Maleux G, Giovagnoni A, Brunese L, Wood B, Carrafiello G, Rotondo A. C-arm cone-beam computed tomography in interventional oncology technical aspects and clinical applications. Radiol Med. 2014;119(7):521–32.
15. Niu T, Ye X, Fruhauf Q, Petrongolo M, Zhu L. Accelerated barrier optimization compressed sensing (ABOCS) for CT reconstruction with improved convergence. Phys Med Biol. 2017;59(7):1801–14.
16. Park JC, Song B, Kim JS, Park SH, Kim HK, Liu Z, Suh TS, Song WY. Fast compressed sensing-based CBCT reconstruction using Barzilai–Borwein formulation for application to on-line IGRT. Med Phys. 2012;39(3):1207–17.
17. Sharp G, Kandasamy N, Singh H, Folkert M. GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration. Phys Med Biol. 2007;52(19):5771–83.
18. Despres P, Jia X. A review of GPU-based medical image reconstruction. Phys Medica. 2017;42:76–92.
19. Leeser M, Mukherjee S, Brock J. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs. BMC Res Notes. 2014;7:582.
20. Feldkamp L, Davis L, Kress J. Practical cone-beam algorithm. J Opt Soc Am A. 1984;1(6):612–9.

GPU based parallel acceleration for fast C-arm cone-beam CT reconstruction

14 pages
Publisher
BioMed Central
Copyright
Copyright © 2018 by The Author(s)
Subject
Engineering; Biomedical Engineering; Biomaterials; Biotechnology; Biomedical Engineering/Biotechnology
eISSN
1475-925X
D.O.I.
10.1186/s12938-018-0506-4

Abstract

yq.xie@siat.ac.cn, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. Full list of author information is available at the end of the article.

Background: With the introduction of flat panel detector technology, cone-beam CT (CBCT) has become a novel imaging modality and is widely applied in clinical practice. C-arm mounted CBCT has shown particular suitability for image guided interventional surgery. In practice, acquiring high resolution, high quality 3D images within the real-time requirements of clinical applications remains challenging.

Methods: In this paper, we propose a GPU based accelerated method for fast C-arm CBCT 3D image reconstruction. A filtered back projection method is optimized and implemented with GPU parallel acceleration. A distributed system is designed to use the time consumed by image acquisition to hide the reconstruction delay and so further improve system performance.

Results: With acceleration in both algorithm and system design, we show that our method significantly increases system efficiency. The optimized GPU accelerated FDK algorithm improves the reconstruction efficiency. The proposed system design further improves system performance by 26% and accelerates the reconstruction delay by 2.1 times when 90 projection frames are used. When the number of frames increases to 120, the figures are 39% and 3.3 times. We also show that when the projection acquisition consumption increases, the reconstruction acceleration rate increases significantly.

Keywords: Image guided therapy, Fast reconstruction, CBCT, GPU

Background

With the introduction of the flat panel detector (FPD) technique, cone-beam computed tomography (CBCT) has become a novel imaging technology. FPDs provide several theoretical advantages such as high spatial resolution, wide dynamic range, a square FOV and real-time imaging capability with no geometric distortion [1].
Such features enable CBCT to generate an entire volumetric data set in a single gantry rotation [2] and allow for verification of the delivered dose distribution [3]. The radiation dose is also reported to decrease [1, 4]. Therefore, CBCT has been widely applied in image guided surgery and interventional radiology, such as CBCT guidance of brachytherapy and of spinal, orthopedic, thoracic and abdominal surgery [5–10]. Some groups reported that CBCT can achieve good performance in fenestrated/branched aortic endografting and in percutaneous transthoracic needle biopsy of small lung nodules. Some groups showed that CBCT depicts considerably more small aneurysms and important anatomic details, and can be used as a new gold standard in the detection of intracranial aneurysms [11, 12].

Besides, C-arm mounted CT is especially suitable for image guided interventions. The system is compact, so the patient can stay stationary during image acquisition. Volumetric tomographic images can be combined and co-displayed with conventional 2D angiographic imaging, which makes pre-operative surgery planning, surgical device tracking and navigation, final result assessment and margin verification achievable [13, 14].

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Chen et al. BioMed Eng OnLine (2018) 17:73
To acquire 3D volumetric images, several categories of algorithms have been explored. One major category is iterative algorithms, such as ART combined with compressed sensing theory, which use a total variation (TV) norm to regularize the cost function, as in [15, 16]. The main challenge of such algorithms is their computational cost. The time-consuming process and high hardware requirements may limit their use in clinical applications. Therefore, the FDK algorithm still seems to be a better choice for practical application. Filtered back projection algorithms can be further accelerated using GPU parallel techniques. Several groups have made progress in accelerating the FDK algorithm with GPUs [17–19]. In [18] the authors reviewed how the GPU can be applied to almost every kind of image reconstruction algorithm. In [19] the authors compared implementations of the FDK method on different platforms and showed a significant performance improvement. Moreover, with a carefully designed distributed system, the algorithm can run on high performance devices dedicated to parallel acceleration, and the system delay can be further reduced with latency hiding techniques.

In this paper, we propose a distributed system for C-arm mounted CBCT imaging and a GPU based acceleration method for fast CBCT reconstruction. As stated above, filtered back projection methods are more suitable for real-time clinical 3D image acquisition than iterative optimization methods, and GPU parallel acceleration can be applied to them. Although GPU parallel acceleration is not new, the acceleration plan can be further optimized with geometric symmetry and a proper system design. Therefore we propose to further optimize the FDK algorithm based on geometric symmetry and to implement it with GPU parallel acceleration techniques.
We also propose a delay hiding scheme based on a distributed system layout connected via the TCP/IP protocol, making full use of the time consumed by projection acquisition to hide the reconstruction delay. The rest of the paper is organized as follows: in the “Methods” section we explain the details of the system design, the GPU accelerated FDK algorithm implementation and the latency hiding scheme. In the “Experiment results and discussion” section, we show the reconstruction results and evaluate the system performance.

Methods

System design

To achieve better acceleration, a high performance GPU specially designed for computing tasks may be required, which may make an ordinary hardware system unsuitable. Besides, although it is the main constraint on system efficiency, the reconstruction process is a relatively independent part of the image chain, which implies that further changes or updates will not interfere with other parts of the system. Therefore, a pluggable computing unit with a distributed system architecture is favored. We briefly describe our system design as follows.

The system is composed of three main parts, as shown in Fig. 1. The C-arm control unit controls the gantry rotation and acquires projection images. The computing unit reconstructs 3D volume data from the 2D projection images. The main console unit controls the image chain, sends orders and requests to the corresponding units, manages the data stream, and displays system status and 2D/3D visualization. The three parts are connected and communicate via the TCP/IP protocol to transmit data and orders.

GPU accelerated FDK algorithm

The FDK algorithm was originally proposed by Feldkamp et al. [20] for approximate 3D filtered back projection reconstruction with a circular trajectory and flat panel detectors.
The algorithm can be briefly represented as follows:

\[ f(x, y, z) = \frac{1}{2} \int_{0}^{2\pi} \frac{DSD^2}{U^2} \left[ \phi(u, v) \frac{DSD}{\sqrt{DSD^2 + u^2 + v^2}} * h(u) \right] d\theta \quad (1) \]

where

\[ U = DSD - z \sin(\theta) + y \cos(\theta) \quad (2) \]

and we define the two weighting factors as

\[ W_1 = \frac{DSD^2}{U^2}, \qquad W_2 = \frac{DSD}{\sqrt{DSD^2 + u^2 + v^2}}, \quad (3) \]

where DSD is the distance from the source to the detector, $\phi$ is the projection data, $h$ is the filter kernel, and $W_1$ and $W_2$ are weighting factors that compensate for the different ray lengths. $u$, $v$ are the coordinates of the projection of the ray at angle $\theta$ on the flat panel detector. The coordinate system is defined in Fig. 2. O is the center of the FOV as well as of the C-arm rotation geometry; O1 is the projection of O on the flat panel detector and is defined as the origin of the projection images.

Fig. 1 System architecture. The system is divided into three parts: a C-arm control unit, a computing unit, and a main console. The units are connected through the TCP/IP protocol for data and order transmission

The nature of the FDK algorithm makes it especially suitable for parallel acceleration. The main idea of GPU parallel acceleration is that the GPU provides far more arithmetic units than general purpose processors, together with a stream processing scheme for highly efficient parallel computing. For each element of an input data stream, a kernel is defined to carry out arbitrary calculations and produce an output data stream. Therefore GPU acceleration is especially suitable for pixel-wise operations, turning iterative loops of similar operations into parallel execution.

For the FDK algorithm, the most time-consuming steps are the projection position calculation, which determines the projection position of each volume voxel on the flat panel detector plane, and the calculation of the weighting factors $W_1$ and $W_2$ for each volume voxel in the back projection procedure.
However, these calculations are highly similar for each volume voxel, and there is no dependency between voxels, so intuitively the voxel-wise iterative loop can be parallelized by assigning a kernel to each volume voxel to improve efficiency. We also observe that $W_2$ depends only on the projection coordinates on the detector plane; therefore the calculation of $W_2$ can be separated from the voxel-wise calculation and treated as a filtering process before the back projection process. The stream processing scheme of the reconstruction for an arbitrary frame is shown in Fig. 3.

Fig. 2 FDK reconstruction coordinate system definition. The definition of the volume voxel coordinate system axes X, Y, Z with origin O and the flat panel plane coordinate system U, V with origin O1. Assume voxels x1 and x2 are symmetric about the plane s formed by axis Z and axis V, and their distance to plane s is d. Their projections px1 and px2 on the flat panel plane are then symmetric about axis V, with the same distance pd to axis V. Therefore we only need to calculate one half of the volume voxels. The calculation for the other half can be accomplished by a mirror action

With the geometric symmetry, the number of kernels needed can be further reduced. For a pair of voxels symmetric about the plane s formed by the z axis and the v axis in Fig. 2, the projection positions on the flat panel detector plane are also symmetric about axis v, and the two weighting factors are the same. Therefore, we only need calculations for one half of the voxels along axis x+ or x−, and the calculations for the other half can be obtained with a mirror action. For the horizontal filtering, the convolution can be computed efficiently with the fast Fourier transform provided by CUDA.
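The symmetry argument above can be illustrated with a small CPU-side sketch in Python. It is not the authors' GPU kernel. Three things are assumptions made only for this illustration: the form $W_1 = DSD^2 / U^2$ (the usual FDK distance weight), the helper names, and the `toy_project` perspective projector, since the projection formulas for $(u, v)$ are not given in the text. The sketch computes the projection position and both weights once for a voxel at $+x$ and reuses them, with $u$ negated, for the mirrored voxel at $-x$.

```python
import math


def depth_term(y, z, theta, dsd):
    """U = DSD - z*sin(theta) + y*cos(theta), following Eq. (2)."""
    return dsd - z * math.sin(theta) + y * math.cos(theta)


def weight_w1(y, z, theta, dsd):
    """Voxel-dependent weight. The form DSD^2 / U^2 is an assumption;
    note that it does not depend on x."""
    depth = depth_term(y, z, theta, dsd)
    return (dsd / depth) ** 2


def weight_w2(u, v, dsd):
    """Detector-dependent weight DSD / sqrt(DSD^2 + u^2 + v^2)."""
    return dsd / math.sqrt(dsd * dsd + u * u + v * v)


def toy_project(x, y, z, theta, dsd):
    """Hypothetical perspective projector, used only for this demo."""
    depth = depth_term(y, z, theta, dsd)
    u = dsd * x / depth
    v = dsd * (y * math.sin(theta) + z * math.cos(theta)) / depth
    return u, v


def voxel_pair_geometry(x, y, z, theta, dsd):
    """Compute (u, v, W1, W2) once for the voxel at +x and mirror it to -x.

    Only the sign of u changes for the mirrored voxel; v and both
    weights are reused, which halves the per-voxel geometry work.
    """
    u, v = toy_project(x, y, z, theta, dsd)
    w1 = weight_w1(y, z, theta, dsd)
    w2 = weight_w2(u, v, dsd)
    return (u, v, w1, w2), (-u, v, w1, w2)
```

The mirrored tuple only replaces the geometry computation. The back projected value of each voxel still has to be sampled from the filtered projection at its own $(u, v)$ position, so the two voxels of a pair generally receive different values.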
Fig. 3 Stream processing scheme for FDK algorithm. Each volume voxel is considered as an element of the input data stream. The projection position calculation kernel provides each element with a parallel thread to calculate the corresponding projection on the flat panel plane. The results form a new data stream that is the input to the back projection kernel, which performs the back projection and calculates the values of the corresponding volume voxels in parallel

Latency hiding implementation

With the distributed system design, the efficiency of the reconstruction process can be further boosted with a latency hiding technique. As stated in the last section, the only dependency of the reconstruction with an arbitrary projection frame is the acquisition of that frame, while projection acquisition does not depend on the reconstruction result. Therefore the efficiency can be further improved at the system level by designing a parallel control time sequence for the image chain that makes full use of system time consumption such as C-arm rotation, image acquisition and processing, and data transmission, which we define as the projection acquisition consumption. The control is designed as follows.

We design three control time sequences, T1, T2 and T3, each taking values of −1 and +1, as shown in Fig. 4. T1 represents the status of the projection image acquisition process: when the C-arm has moved to a position and the image has been acquired and transmitted to the computing unit via TCP/IP, T1 is inverted. T2 represents the status of the reconstruction process: when a frame has been filtered and back projected into the volume data, T2 is inverted. T3 is the control signal that synchronizes T1 and T2. T3 is generated by a timer with a very small time interval. Whenever T3 has a falling edge, the T1 and T2 signals are checked.
When T1 has a falling or rising edge, the newly acquired image and the corresponding parameters are pushed into a queue L. When T2 has a falling or rising edge, the first image and its parameters are popped from the queue, and the corresponding memory is released. When the queue L is empty, the reconstruction is complete, and all the time sequences and the queue are reset. Compared with a linear image chain system, the control sequences allow the usually less time-consuming image acquisition process to be carried out while the reconstruction is progressing. In the experiment section we show that the latency hiding scheme can further improve system performance.

Experiment design

We test our proposed method from two aspects. First we show the reconstruction results of our method. We use a Shepp–Logan numeric phantom for quantitative evaluation of reconstruction accuracy. We also show reconstruction results for phantoms of a blood vessel, a head and a foot. Then we discuss the efficiency of our proposed method. We first discuss the reconstruction process alone by comparing our proposed method with other methods that use either a different methodology or a different acceleration technique. We then evaluate the system performance enhancement by introducing two acceleration ratios.
The first ratio, the system performance ratio $\beta_{sys}$, represents the system performance boost obtained by comparing the overall system delay of our proposed system with that of a linear image chain system:

\[ \beta_{sys} = 1 - \frac{T_{prop}}{T_{recon} + T_{acq}}, \]

where $T_{prop}$ is the average measured system delay of our proposed system, and $T_{acq}$ and $T_{recon}$ are the average time consumption of the projection acquisition and reconstruction processes respectively. The other ratio, the reconstruction acceleration ratio $\beta_{recon}$, evaluates the reconstruction efficiency improvement provided by our proposed system:

\[ \beta_{recon} = \frac{T_{recon}}{T_{prop} - T_{acq}}. \]

The averages are taken over a test data set of 10 gantry rotations of our C-arm mounted CBCT.

Fig. 4 System control time sequences (T1: acquisition of images 01 to 10; T2: reconstruction with images 01 to 05; T3: push and pop actions). At every falling edge of T3, the states of T1 and T2 are checked to decide a push or pop action on the projection queue. While the reconstruction of an arbitrary frame is in progress, the acquisitions of the next frames can be carried out at the same time to form a latency hiding scheme. The push and pop actions control the frame queue for reconstruction

System and environment setup

We test our method on our designed C-arm imaging system. The C-arm DSD is 1000 mm and the SAD is 500 mm. The X-ray source is an imd X-RAY TUBE HEAD E-40R, operated at 65 kV and 2 mA with an exposure time of 15 ms. The projection images have a dimension of 1560 × 1440 pixels with a 0.18 × 0.18 mm resolution, and the acquisition angle covers a range of 210° on average.
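The latency hiding scheme and the two ratios from the experiment design can be sketched together in Python. This is a simplified model, not the authors' control code: it assumes per-frame acquisition and reconstruction times are known, treats the queue L as a plain first-in first-out pipeline, and ignores the polling interval of T3. The function names are hypothetical. The ratios follow the definitions $\beta_{sys} = 1 - T_{prop} / (T_{recon} + T_{acq})$ and $\beta_{recon} = T_{recon} / (T_{prop} - T_{acq})$.

```python
def pipelined_delay(acq_times, rec_times):
    """Total delay when reconstruction overlaps acquisition.

    Frame i can be reconstructed once it has been acquired and the
    previous frame has been reconstructed (two-stage FIFO pipeline).
    """
    acq_done = 0.0  # time at which the current frame is acquired
    rec_done = 0.0  # time at which the current frame is reconstructed
    for t_acq, t_rec in zip(acq_times, rec_times):
        acq_done += t_acq
        rec_done = max(acq_done, rec_done) + t_rec
    return rec_done


def linear_delay(acq_times, rec_times):
    """Delay of a linear image chain: acquire everything, then reconstruct."""
    return sum(acq_times) + sum(rec_times)


def beta_sys(t_prop, t_recon, t_acq):
    """System performance ratio: 1 - T_prop / (T_recon + T_acq)."""
    return 1.0 - t_prop / (t_recon + t_acq)


def beta_recon(t_prop, t_recon, t_acq):
    """Reconstruction acceleration ratio: T_recon / (T_prop - T_acq)."""
    return t_recon / (t_prop - t_acq)
```

With the 90-frame timings reported in Tables 1 and 2 ($T_{prop}$ = 21.49 s, $T_{recon}$ = 14.28 s, $T_{acq}$ = 14.72 s), `beta_sys` gives about 0.26 and `beta_recon` about 2.1, in line with the reported 26% and 2.1 times. In the pipeline model, when acquisition is slower than reconstruction for every frame, only the reconstruction of the last frame remains visible after acquisition ends.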
A Quadro 6000 is used for GPU acceleration, with 256 threads in parallel.

Experiment results and discussion

3D reconstruction evaluation

We first discuss the 3D reconstruction results from our system, to show that our method does not compromise the reconstruction accuracy. We test our method on a numeric phantom for quantitative analysis by evaluating the reconstruction error against the ground truth. We also show reconstruction results of a blood vessel phantom, a head phantom and a foot phantom, to show that our proposed method is capable of correctly reconstructing the structures of interest from actual projection data acquired with a clinically practical C-arm CBCT.

The Shepp–Logan numeric phantom we use to test our method is shown in Fig. 5. We compare the reconstruction result and the ground truth along the line profile marked in Fig. 5, and the result is shown in Fig. 6. If we define the relative error between the reconstruction $f_{rec}$ and the ground truth $f_0$ as

\[ I = \frac{\sum \left( |f_{rec} - f_0| / f_0 \right)}{N} \times 100\%, \]

where $N$ is the number of pixels counted, the relative error along the profile line is 2%, which suggests that the reconstruction basically preserves the features of the phantom. In Fig. 7 we show the reconstruction of an Elatras brain blood vessel phantom with 3 aneurysms. The head and foot phantoms and their reconstruction results are shown in Figs. 8 and 9 respectively.

Fig. 5 Reconstruction result of a standard Shepp–Logan numeric phantom. a Standard Shepp–Logan phantom; b reconstruction result

Fig. 6 Line profile difference between reconstruction and standard phantom along the dashed line shown in Fig. 5. The black line represents the standard Shepp–Logan phantom; the red line represents the reconstruction result. Along the line the relative error is 2%
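The relative error $I$ used for the Shepp–Logan line profile can be computed as in the following Python sketch. The function name is hypothetical, and skipping pixels whose ground truth value is zero is an assumption made here to avoid division by zero; the text only states that $N$ is the number of pixels counted.

```python
def mean_relative_error(reconstruction, ground_truth):
    """Mean relative error in percent along a line profile.

    Implements I = sum(|f_rec - f_0| / f_0) / N * 100, where N is the
    number of pixels counted. Pixels with f_0 == 0 are skipped
    (an assumption of this sketch, not stated in the paper).
    """
    total = 0.0
    counted = 0
    for f_rec, f_0 in zip(reconstruction, ground_truth):
        if f_0 == 0:
            continue  # relative error is undefined for a zero reference
        # abs(f_0) only guards against negative reference values; for a
        # non-negative phantom it matches the formula exactly.
        total += abs(f_rec - f_0) / abs(f_0)
        counted += 1
    if counted == 0:
        raise ValueError("no pixels with non-zero ground truth")
    return 100.0 * total / counted
```

For example, a profile that deviates by 0.02 from a ground truth of 1.0 at every counted pixel yields an error of about 2%.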
Efficiency analysis

In Table 1, we present the time consumption $T_{recon}$ of the proposed GPU accelerated FDK, non-accelerated FDK, IPP accelerated FDK and an iterative TV-norm regularized algebraic reconstruction technique (TV-ART) algorithm, with different numbers of projection frames used for reconstruction. Theoretically the acceleration ratio should equal the number of parallel threads, which in our case is 256. However, due to host–device communication and data transmission delay, an acceleration ratio of only 69 is achieved. Nevertheless, Table 1 still shows that GPU acceleration significantly improves calculation efficiency, and also achieves a greater performance enhancement than existing acceleration techniques such as IPP, even when tested on a relatively low performance GPU.

In Table 2, we present the total system delay of our proposed image chain, $T_{prop}$, and the time consumption $T_{acq}$ solely for projection acquisition. Combining Tables 1 and 2, we calculate the linear image chain system delay $T_{linear} = T_{acq} + T_{recon}$ and the reconstruction delay of our proposed system, defined as $T_{recon\_prop} = T_{prop} - T_{acq}$; a summary is provided in Table 3. We compare the system delay of our proposed system, $T_{prop}$ from Table 2, with the linear image chain system delay $T_{linear}$ from Table 3 to show the system performance improvement under different circumstances. We also compare the reconstruction delay of our proposed system, $T_{recon\_prop}$ from Table 3, with the reconstruction time $T_{recon}$ from Table 1 to show the reconstruction efficiency boost. The two acceleration ratios $\beta_{sys}$ and $\beta_{recon}$ are also calculated. When we use 90 frames of projection data, the linear image chain system delay $T_{linear}$ is 29 s, while our proposed method yields a system delay of 21.49 s, which implies a $\beta_{sys}$ of 26%. The reconstruction delay $T_{recon\_prop}$ is 6.78 s, which yields a $\beta_{recon}$ of 2.1. When we use 120 frames in case 1, the numbers are 46.29, 28.17 and 7.64 s respectively, yielding a $\beta_{sys}$ of 39% and a $\beta_{recon}$ of 3.3. We can infer that, as the number of 2D projection frames used for 3D reconstruction increases, the system performance benefits more from our proposed method.

We also simulate a remote connection in case 2 by connecting the system units through a WAN, whereas in case 1 the system units are connected via a LAN with the same router. Although the system performance enhancement ratio $\beta_{sys}$ is 33.4%, the reconstruction delay decreases to 0.39 s, which corresponds to a $\beta_{recon}$ of 66. With the reconstruction optimization techniques, when we significantly increase the reconstruction efficiency and the projection acquisition consumption is dominant, the reconstruction delay can be very low. When the time consumption of reconstruction and of projection acquisition are equal, the system performance benefit is maximized, and the reconstruction process can almost be hidden. The upper limit of the system performance enhancement $\beta_{sys}$, which can be approached but not reached, is 50%. A more straightforward comparison is shown in Fig. 10.

Fig. 7 Reconstruction result of a vessel phantom. a The brain blood vessel phantom with 3 cranial aneurysms. b MIP of reconstruction result. c–e Sagittal view, coronal view and transverse view

Fig. 8 Reconstruction result of a head phantom. a The head phantom. b MIP of reconstruction result. c–e Sagittal view, coronal view and transverse view

Conclusion

In this paper, we propose a fast CBCT 3D reconstruction method based on GPU parallel acceleration. We describe how the FDK algorithm is parallelized, and a control time sequence designed to further improve efficiency by hiding system latency. We can see that our proposed method significantly improves system performance.
GPU parallel acceleration significantly improves the FDK reconstruction process. Our designed latency hiding scheme further improves the system performance. When 90 frames of projections are used for reconstruction, our proposed method improves the system delay by 26% and the reconstruction delay by 2.1 times. When 120 frames are used, the numbers are 39% and 3.3 times. We also show that when the projection acquisition delay is dominant in the image chain, the reconstruction process can be almost fully hidden, yielding a significant improvement of the reconstruction delay, which is 66 times in our case.

Although the quality of the reconstructed volume may suffer from the approximate nature of filtered back projection algorithms compared with iterative algorithms such as ART, we show that the features of interest are acceptably preserved. A typical ART-type algorithm as described in [16] may take more than 40 min for reconstruction, while our proposed method takes only a little over 20 s under the same circumstances. As a trade-off between image quality and the real-time requirement, our proposed method is more suitable for clinical practice. A distributed system design with the TCP/IP protocol makes the system pluggable and adaptive. With this design, algorithm and hardware updates of the reconstruction techniques can be carried out more easily, and the system is also prepared for further expansion, such as multi-task support and remote network medical applications.

Fig. 9 Reconstruction result of a foot phantom. a The foot phantom. b MIP of reconstruction result. c–e Sagittal view, coronal view and transverse view

Table 1 Comparison of the time consumption T_recon of GPU and IPP accelerated FDK and GPU accelerated TV-ART algorithms

Algorithm              Frames   Time
Non-accelerated FDK    90       964 s
IPP FDK                90       53.40 s
GPU FDK                90       14.28 s
GPU FDK                120      25.76 s
GPU ART                120      32 min

Table 2 Summary of system delay T_prop and projection acquisition cost T_acq

Frames          System delay (s)   Projection acquisition cost (s)
90              21.49              14.72
120, case 1     28.17              20.53
120, case 2     50.53              50.14

Table 3 Summary of linear system delay T_linear, proposed reconstruction delay T_recon_prop, system performance ratio β_sys and reconstruction acceleration ratio β_recon

Case              Linear system delay (s)   Proposed reconstruction delay (s)   System performance ratio (%)   Reconstruction acceleration ratio
90 frames         29.00                     6.78                                26                             2.1
120 frames, LAN   46.29                     7.64                                39                             3.3
120 frames, WAN   75.90                     0.39                                33                             66

Fig. 10 Efficiency comparison. We compare the linear image chain system with our proposed system design under different circumstances (90 frames, 120 frames with LAN connection, 120 frames with WAN connection). Top: the system delay comparison. We can conclude that our proposed design further improves the system performance. Bottom: the reconstruction delay comparison. We can conclude that the latency hiding design significantly reduces the reconstruction delay, and when another part of the image chain is dominant, the reconstruction process can be almost fully hidden, yielding nearly no reconstruction delay

Authors’ contributions

KC developed the algorithm, implemented the system and wrote the manuscript. CW participated in the system design and implementation and helped with the experiments. JX and YX oversaw the project. All authors read and approved the final manuscript.

Author details

1 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China. 2 Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China.

Acknowledgements

This work is supported partly by grants of National Key Research Program of China (Grant No. 2016YFC0105102), National Natural Science Foundation of China (No.
61403368), Union of Production, Study and Research Project of Guangdong Province (Grant No. 2015B090901039), Science Foundation of Guangdong (2017B020229002, 2014A030312006), Leading Talent of Special Support Project in Guangdong (2016TX03R139), Technological Breakthrough Project of Shenzhen City (Grant No. JSGG20160229203812944), Shenzhen High-level Oversea Talent Program Grant (KQJSCX20160301144248), Shenzhen Fundamental Research Project, Shenzhen Key Technical Research Project (JSGG20160229203812944) and Beijing Center for Mathematics and Information Interdisciplinary Sciences.

Competing interests

The authors declare that they have no competing interests.

Ethics approval and consent to participate

Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 8 March 2018 Accepted: 23 May 2018

References

1. Hatakeyama Y, Kakeda S, Korogi Y, Ohnari N, Moriya J, Oda N, Nishino K, Miyamoto W. Intracranial 2D and 3D DSA with flat panel detector of the direct conversion type: initial experience. Eur Radiol. 2006;16:2594–602.
2. Jaffray D, Siewerdsen J. Cone-beam computed tomography with a flat-panel imager: initial performance characterization. Med Phys. 2000;27(6):1311–23.
3. Xing L, Thorndyke B, Schreibmann E, Yang Y, Li TF, Kim GY, Luxton G, Koong A. Overview of image-guided radiation therapy. Med Dosim. 2006;31(2):91–112.
4. Ishikura R, Ando K, Nagami Y, et al. Evaluation of vascular supply with cone-beam computed tomography during intraarterial chemotherapy for a skull base tumor. Radiat Med. 2006;24(5):384–7.
5. Siewerdsen JH, Jaffray DA, Edmundson GK, Sanders WP, Wong JW, Martinez A. Flat-panel cone-beam CT: a novel imaging technology for image-guided procedures. Proc SPIE. 2001;4319:435–44.
6. Jaffray DA, Siewerdsen JH, Edmundson GK, Wong JW, Martinez A. Flat-panel cone-beam CT on a mobile isocentric C-arm for image-guided brachytherapy. Proc SPIE. 2002;4682:209–17.
7. Siewerdsen JH, Moseley DJ, Burch S, Bisland SK, Bogaards A, Wilson BC, Jaffray DA. Volume CT with a flat-panel detector on a mobile, isocentric C-arm: pre-clinical investigation in guidance of minimally invasive surgery. Med Phys. 2005;32(1):241–54.
8. Khoury A, Whyne CM, Daly MJ, Moseley DJ, Bootsma G, Skrinskas T, Siewerdsen JH, Jaffray DA. Intraoperative cone-beam CT for correction of periaxial malrotation of the femoral shaft: a surface-matching approach. Med Phys. 2007;34(4):1380–7.
9. Siewerdsen JH, Chan Y, Rafferty MA, Moseley DJ, Jaffray DA, Irish JC. Cone-beam CT with a flat-panel detector on a mobile C-arm: pre-clinical investigation in image-guided surgery of the head and neck. In: Galloway RL, Cleary KR, editors. Medical imaging. Proceedings of SPIE, SPIE, Bellingham, vol. 5744; 2005. pp. 789–797.
10. Chan Y, Siewerdsen JH, Rafferty MA, Moseley DJ, Jaffray DA, Irish JC. Cone-beam CT on a mobile C-arm: a novel intraoperative imaging technology for guidance of head and neck surgery. Proc SPIE. 2001;4319:435–44.
11. Karamessini MT, Kagadis GC, Petsas T, Karnabatidis D, Konstantinou D, Sakellaropoulos GC, Nikiforidis GC, Siablis D. CT angiography with three-dimensional techniques for the early diagnosis of intracranial aneurysms. Comparison with intra-arterial DSA and the surgical findings. Eur J Radiol. 2004;49(3):212–23.
12. van Rooij W, Sprengers M, de Gast A, Peluso J, Sluzewski M. 3D rotational angiography: the new gold standard in the detection of additional intracranial aneurysms. Am J Neuroradiol. 2008;29(5):976–9.
13. Orth RC, Wallace MJ, Kuo MD. C-arm cone-beam CT: general principles and technical considerations for use in interventional radiology. J Vasc Interv Radiol. 2008;19(6):814–20.
14. Floridi C, Radaelli A, Abi-Jaoudeh N, Grass M, Lin MD, Chiaradia M, Geschwind J-F, Kobeiter H, Squillaci E, Maleux G, Giovagnoni A, Brunese L, Wood B, Carrafiello G, Rotondo A. C-arm cone-beam computed tomography in interventional oncology technical aspects and clinical applications. Radiol Med. 2014;119(7):521–32.
15. Niu T, Ye X, Fruhauf Q, Petrongolo M, Zhu L. Accelerated barrier optimization compressed sensing (ABOCS) for CT reconstruction with improved convergence. Phys Med Biol. 2017;59(7):1801–14.
16. Park JC, Song B, Kim JS, Park SH, Kim HK, Liu Z, Suh TS, Song WY. Fast compressed sensing-based CBCT reconstruction using Barzilai–Borwein formulation for application to on-line IGRT. Med Phys. 2012;39(3):1207–17.
17. Sharp G, Kandasamy N, Singh H, Folkert M. GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration. Phys Med Biol. 2007;52(19):5771–83.
18. Despres P, Jia X. A review of GPU-based medical image reconstruction. Phys Medica. 2017;42:76–92.
19. Leeser M, Mukherjee S, Brock J. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs. BMC Res Notes. 2014;7:582.
20. Feldkamp L, Davis L, Kress J. Practical cone-beam algorithm. J Opt Soc Am A. 1984;1(6):612–9.

Journal

BioMedical Engineering OnLine, Springer Journals

Published: Jun 5, 2018
