Robust and semantic needle detection in 3D ultrasound using orthogonal-plane convolutional neural networks

Robust and semantic needle detection in 3D ultrasound using orthogonal-plane convolutional neural... Purpose During needle interventions, successful automated detection of the needle immediately after insertion is necessary to allow the physician identify and correct any misalignment of the needle and the target at early stages, which reduces needle passes and improves health outcomes. Methods We present a novel approach to localize partially inserted needles in 3D ultrasound volume with high precision using convolutional neural networks. We propose two methods based on patch classification and semantic segmentation of the needle from orthogonal 2D cross-sections extracted from the volume. For patch classification, each voxel is classified from locally extracted raw data of three orthogonal planes centered on it. We propose a bootstrap resampling approach to enhance the training in our highly imbalanced data. For semantic segmentation, parts of a needle are detected in cross-sections perpendicular to the lateral and elevational axes. We propose to exploit the structural information in the data with a novel thick-slice processing approach for efficient modeling of the context. Results Our introduced methods successfully detect 17 and 22 G needles with a single trained network, showing a robust generalized approach. Extensive ex-vivo evaluations on datasets of chicken breast and porcine leg show 80 and 84% F1- scores, respectively. Furthermore, very short needles are detected with tip localization errors of less than 0.7 mm for lengths of only 5 and 10 mm at 0.2 and 0.36 mm voxel sizes, respectively. Conclusion Our method is able to accurately detect even very short needles, ensuring that the needle and its tip are maximally visible in the visualized plane during the entire intervention, thereby eliminating the need for advanced bi-manual coordination of the needle and transducer. Keywords Needle detection · 3D ultrasound · Convolutional neural networks Introduction needle and US transducer is challenging, as the limited US field of view obscures the visualization of the complete nee- Ultrasound (US) imaging is broadly used to visualize and dle and an inadequate view leads to an erroneous placement guide the interventions that involve percutaneous advancing of the needle tip. Therefore, while advancing the needle, con- of a needle to a target inside the patients’ body. However, tinuous manipulation of the transducer is necessary to search for a typical 2D US system, bi-manual coordination of the for the needle in the imaging data for the best needle plane visualization. As an alternative, 3D US transducers with an image-based needle-tracking system can overcome these B Arash Pourtaherian a.pourtaherian@tue.nl limitations and minimize the manual coordination, while pre- serving the use of a conventional needle, signal generation Eindhoven University of Technology, 5612 AJ Eindhoven, and transducers [12]. In such a system, the needle is conve- The Netherlands niently placed in the larger 3D US field of view and the Philips Research Eindhoven, 5656 AE Eindhoven, The processing unit automatically localizes and visualizes the Netherlands entire needle. Therefore, the manual skills are significantly Philips Healthcare, Bothell, WA 98021, USA simplified when the entire needle is visible in the visualized Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The plane, after the needle is advanced or the transducer is moved. Netherlands 123 International Journal of Computer Assisted Radiology and Surgery Several image-based needle localization techniques have accuracy, while small patches allow the network to infer from been proposed based on maximizing the intensity over par- only parts of the needle. allel projections [1]. Due to the complexity of realistic data, As an alternative to patch training, semantic segmentation methods that solely rely on the brightness of the needle are not methods can generate dense prediction maps by omitting robust for localizing thin objects in a cluttered background. the use of fully connected layers. Examples of such net- Therefore, information regarding the line-like structure of a works are fully convolutional networks (FCN) [7,16], and needle is used by Hessian-based line filtering methods [21]. context modeling by employing atrous convolutions [3,24]. Although shown to be limited in localization accuracy [12], Although integrating atrous (or dilated) convolutions in the they can be beneficial for reducing the imaging artifacts. deep layers of the network increases the field of view while Other techniques involve exploiting the intensity changes preserving the spatial dimensions, applying convolutions in caused by needle movement to track the needle in the US a large number of high-resolution feature maps is computa- data [2]. Nevertheless, large movements of the transducer or tionally expensive. However, original FCN architectures can the patient will increase the difficulty of motion-based track- simultaneously exploit the global and local information in ing and therefore, we aim at repeated detection in static 3D the data and remain more memory efficient by introducing volumes. When realizing the real-time operation, tracking of skip connections from higher-resolution feature maps to the the needle is implemented by repeated detection with suffi- deconvolutional layers. Initial attempts of applying these net- cient time resolution. This will result in detection per volume works on US data are presented for fetal heart segmentation in in a 4D US sequence, which allows for arbitrary inter-volume 2D US [19]. Further improvement is shown for segmentation movements. of fetus, gestational sac, and placenta in 3D US volumes by More recently, attenuation of the US signal due to energy integrating the sequential information [23]. The drawback of loss beyond the US beam incident with the needle, is used using such 3D+time models are, however, the exponentially to detect the position of the needle [8,11]. However, signal increased computational complexity of the 3D convolution loss due to the presence of other attenuating structures may operations, a very large dataset is required for training the degrade the accuracy of estimation and must be explicitly increased number of network parameters, and the sequential handled. Alternatively, supervised needle-voxel classifiers modeling will be suboptimal in the early timesteps after large that employ the needle shape and its brightness have shown movements of the transducer or subject. to be superior to the traditional methods [12]. Nevertheless, In this paper, we build upon our recent contribution using as the needle is assumed to be already inserted in the volume CNN [10] and extend it to create semantic FCN models. We up to a considerable length, they typically do not achieve high modify and extend this architecture such that it models 3D detection precision and therefore cannot localize the needle needle context information and achieves high needle seg- when it is partially inserted in the volume. Moreover, when mentation precision at a low false negative rate. We propose the target structure is deep, the degraded resolution and pos- a novel multi-view thick-sliced FCN model for an efficient sible needle deflections further complicate the interpretation modeling of 3D context information. Therefore, the system of data and reduce voxel classification performance, which will successfully perform needle localization both in cases should be addressed by better modeling of both local and of invisible needle parts and when only a short part of the contextual information. needle is inserted into the patient’s body, yielding an early In our recently published work in training convolu- correction of inaccurate insertions. The needle is then visual- tional neural networks (CNN), substantial improvement has ized in an intuitive manner, eliminating the need for advanced been shown to the detection accuracy of needle voxels in manual coordination of the transducer. The main contribu- 3D US data [10]. Although this method was shown to tions of this paper are: (1) a novel approach for segmentation achieve high-performance results for ex vivo data acquired and localization of partially inserted and partly invisible from linear-array transducers, the choice of patch classifi- needles in 3D US data using CNN models, (2) an original cation in this framework can be further improved for US update strategy for CNN parameters using most-aggressive data segmentation. In US imaging using sector, curved and non-needle samples, which significantly improves the per- phased-array transducers, the insonification angle of the US formance on highly imbalanced datasets, (3) a novel method beams is changed throughout the volume, which creates vary- for modeling of 3D US context information using 2.5D (thick ing angles with different parts of the needle. Therefore, the slice) data, which enables an accurate training of the network needle can be partially invisible, due to the lack of received using limited training samples, and (4) extensive evaluation US reflections from parts of the needle. The missed data of the proposed methods on two types of ex-vivo data giv- enforces a trade-off between patch sizes for richer needle con- ing a very high average of 81.4% precision at 88.8% recall text information and localization accuracy. Larger patches rate. require more max-pooling layers that reduce the localization 123 International Journal of Computer Assisted Radiology and Surgery Patch Classification Needle-Axis Estimation and Visualization Semantic Segmentation Fig. 1 Block diagram of our proposed framework for needle detection in 3D US data Methods CNN architecture For our experiments, we evaluate two CNN architectures based on shared convolutional (ShareCNN) and The block diagram of our proposed framework consists of independent convolutional (IndepCNN) filters. In ShareCNN, three main stages as depicted in Fig. 1. In this study, we a single convolutional filter bank is trained for the three input introduce two different approaches for segmentation of nee- planes to have the same set of filters for all the planes. In Inde- dle voxels in 3D US volumes, i.e., classification of extracted pCNN, three sets of filter banks are trained independently, patches in the data using their triplanar orthogonal views each to be convolved with one of the three planes. As depicted (“Patch classification” subsection), and end-to-end dense in Fig. 2, both architectures consist of four convolutional lay- segmentation of needle voxels in multi-view thick slices ers having 32, 48, 64 and 96 filters of 3 × 3 kernel size, three (“Semantic segmentation” subsection). The segmentation fully connected layers having 128, 64 and 2 neurons, and results are then used to fit a predefined model of the needle one softmax layer. According to the given number of fil- and extract the cross-section containing the entire needle and ters, ShareCNN and IndepCNN architectures have 2160 and the tip (“Needle axis estimation and visualization” subsec- 6480 parameters in their convolutional layers, respectively. tion). For clarity, we emphasize here that we detect the plane In both architectures, extracted feature maps after the last where the needle and its tip are maximally visible, but do not convolutional layer are concatenated prior to the fully con- explicitly detect the needle tip. This localization processing nected layers [14]. is done for every data volume individually. For a 3D+time CNN training Our dataset is significantly imbalanced due to US sequence, this would effectively mean repeated detection the small size of a needle compared to the full volume, i.e., for every volume or image. approximately only 1 voxel out of 3000 voxels in a volume belongs to the needle. This is common in representation of an instrument in 3D US volumes. Therefore, in order to avoid Patch classification a prediction bias toward the majority class, we downsample the negative training data to match the number of needle sam- The block diagram of the proposed patch classification tech- ples. For an informed sampling of the negative (non-needle) nique is shown in Fig. 2. A CNN model is trained to robustly set, we propose an iterative scheme based on bootstrapping classify the needle voxels in the 3D US volumes from other [15] to achieve the maximum precision. In the first step, we echogenic structures, such as bones and muscular tissues. train our network with uniformly sampled needle and non- Our voxel classification network predicts the label of each needle patches. Training patches are rotated arbitrarily by 90 voxel from the raw voxel values of local proximity. In a 3D steps around the axial axis to improve the orientation invari- volume, this local neighborhood can simply be a 2D cross- ance. The trained network then classifies the same training section in any orientation, multiple cross-sections, or a 3D set for validation. Finally, misclassified false positives are patch. Here, we use three orthogonal cross-sections centered harvested as the most-aggressive non-needle voxels, which at the reference voxel, which is a compromise with respect are used to update the network. Figure 3 shows how the itera- to the complexity of the network. The size of triplanar cross- tive and informed sampling can increase the precision of the sections is chosen based on the diameter of a typical needle network. It is worth mentioning that commonly used meth- (0.7–1.5 mm), voxel size, and spatial resolution of the trans- ods for imbalanced data, like weighted loss function, do not ducer, to contain sufficient context information. We extract necessarily improve precision. For example, the majority of triplanar cross-sections of 21 × 21 pixels (4.2 × 4.2 mm), our negative set consists of “easy” samples that can be clas- which provides sufficient global shape information and still sified beyond the model’s margin and will influence the loss remains spatially accurate. For low-frequency transducers, function in their favor. more context is required for a discriminative modeling, as The CNN parameters are trained using stochastic gradi- the structure details of a needle will be distorted at low spa- ent descent (SGD) and the categorical cross-entropy cost tial resolutions. 123 International Journal of Computer Assisted Radiology and Surgery Fig. 2 Block diagram of the patch classification approach using CNN Latent variable 1 Latent variable 1 Latent variable 1 Feature-space of an imbalanced dataset Step 1. Random negative sampling Step 2. Most-aggressive negative sampling Fig. 3 An example of the iterative sampling strategy to increase the precision of the network. The red circles represent the positive data points, gray and blue triangles are the negative and sampled data points, respectively and the dashed line represents the decision boundary of a classifier function. All activation functions are chosen to be rectified thermore, in contrast to the patch-based methods, where linear units (ReLU) [5]. Furthermore, for optimization of the redundant processing of voxels is inevitable, FCN models network weights, we divide the learning rate by the exponen- are more computationally efficient as they exploit the one- tially weighted average of recent gradients (RMSProp) [20]. time extracted features to simultaneously label all the data −4 −5 Initial learning rates are chosen to be 10 and 10 for points using deconvolutional networks. train and update iterations, respectively. In order to prevent Figure 4 shows the architecture of the proposed seman- overfitting, we implement the dropout approach [18] with tic needle segmentation technique in 3D US volumes. Our a probability of 0.5 in the first two fully connected layers. method is based on decomposing the 3D volume into 2D The trained network computes a label per voxel indicating cross-sections for labeling the needle parts and reconstruct- whether it belongs to the needle or not. ing the 3D needle labels from the multiple views. Therefore in our approach, the number of parameters in the convolution kernels decreases exponentially compared to the 3D kernels, Semantic segmentation and consequently, the network requires fewer training sam- ples and executes faster. We will now present our strategy for As discussed in “Introduction” section, semantic segmenta- selecting the cross-sections to be processed. tion of a needle using FCN architectures are more interest- The 2D cross-sections are selected in multiple directions ing than patch classification as the context information is and perpendicular to the transducer with the step size equal to modeled, while the spatial dimensions are preserved. Fur- Latent variable 2 Latent variable 2 Latent variable 2 International Journal of Computer Assisted Radiology and Surgery the voxel size. Since in a 3D US volume, the needle can enter function. The learning rate is adaptively computed using the the field of view from either the lateral or elevational direc- ADAM optimization method [6] with an initial learning rate tions, we consider cross-sections perpendicular to these axes. equal to 1e−4. Furthermore, dropout layers with a probability The segmentation outcome of each cross-section is mapped of 0.85 are added to the layer numbers 17 and 18 of the onto its corresponding position in 3D. Afterward, the result- convolution network. ing probability volume from the two directions is combined together using multiplicative averaging to create the final Needle axis estimation and visualization labeling outcome in 3D. In order to exploit the 3D structural information in our In order to robustly detect the instrument axis in the presence model, instead of only using 2D planar data, we opt for of outliers, we fit a model of the needle to the detected voxels processing the consecutive cross-sections before and after using the RANSAC algorithm [4]. The needle model can be the processing plane as additional inputs to the network. In represented by a straight cylinder having a fixed diameter. In this study, we add two additional cross-sections and evaluate cases of large instrument deflection, the model can be adapted several spacing gaps d, between them. Therefore, as shown to define a parabolic segment, as shown in [9]. Using the in Fig. 4, a 3-channel input to the network is formed from RANSAC algorithm, the cylindrical model that contains the the 2.5D (thick slice) US data at a specific position, which highest number of voxels is chosen to be the estimated needle. is used to create a 2D segmentation map of the processing As the experimented needle diameters are less than 2 mm, cross-section. we set the cylindrical model diameter to be approximately FCN architecture Figure 4 depicts the FCN architecture 2 mm. used in our system comprising two stages of convolution After successful detection of needle axis, the 2D cross- and deconvolution networks. Inspired by ShareCNN, we use section of the volume is visualized that contains the plane shared convolution filters (ShareFCN) for both lateral and with the entire needle with maximum visibility, which is also elevational planes. The convolution network is identical to perpendicular to coronal (xy) planes. This cross-section is the the design of the VGG very deep 19 layer CNN [17]. The in-plane view of the needle that is very intuitive for physicians deconvolution network consists of three unpooling masks of to interpret. This ensures that while advancing the needle, the 2, 2 and 8 pixels, respective convolution layers having 512, entire instrument is visualized as much as it is visible and any 256 and 2 filters with 3×3 kernel size, and one softmax layer. misalignment of the needle and target is corrected without Therefore, the receptive field of the network is equal to a win- maneuvering the transducer. dow of 96 × 96 pixels, which is equivalent to approximately 19.2×19.2 mm for higher-resolution VL13-5 transducer and Implementation details 34.5×34.5 mm for lower-resolution X6-1 transducer, achiev- ing a large context modeling at the same inference resolution Our Python implementations of the proposed patch classifi- of the input data. Convolution layers are stacked together fol- cation and semantic segmentation methods take on average lowed by an activation function. As discussed, the network 74 and 0.5 µs for each voxel, respectively (1180 and 15 ms takes a 3-channel 2D input and the output layer indicates the for each 2D cross-section) on a standard PC with a GeForce class-assignment probability for each pixel in the processing GTX TITAN X GPU. Therefore, when implementing a full cross-section. scan to process all voxels and cross-sections in the volume, FCN training The training set consists of 3-channel cross- patch classification executes in 4–5 min, whereas semantic sections extracted with a gap of d mm in both elevational segmentation takes only 2–3 s. Nevertheless, further opti- and lateral directions. The training volumes are augmented mization is possible using conventional techniques such as a by 10 arbitrary rotations around the axial (z) axis prior to coarse-fine search strategy with a hierarchical grid to achieve extraction of the cross-sections. Therefore, several views of real-time performance. Furthermore, the execution time of the needle are used to train the network, including in-plane, RANSAC model fitting is negligible, as the expected num- out-of-plane and cross-sections with partial visibility of the ber of outliers is very small. needle. Similar to our approach presented in “Patch classi- The required computational power for realization of our fication” subsection, we downsample the negative training proposed methods is expected to be widely available on high- data, which are the sections that do not contain the nee- end ultrasound devices that benefit from parallel computing dle, to match the number of cross-sections from the needle. platforms, such as a GPU. However, for implementation However, since the initial training samples are not highly in mid-range and portable systems, more efficient and imbalanced, we do not perform bootstrapping for training compressed architectures should be investigated. Still, ever- the FCN parameters. increasing computational capacity of mobile processors, as We trained the network parameters using SGD update with well as fast development and availability of on-board embed- a batch size of one sample and softmax cross-entropy cost ded units with pre-programmed convolutional modules will 123 International Journal of Computer Assisted Radiology and Surgery Fig. 4 Block diagram of the semantic segmentation approach using FCN Table 1 Specifications and experimental settings of 3D US volumes used for evaluation Tissue type / transducer Needle type and diameter Experimental settings Voxel size (mm) # of vols. Maximum length (mm) Steepness angles a ◦ ◦ Chicken breast/ VL13-5 17 G (1.47 mm) 10 30 10 −30 0.20 ◦ ◦ 22 G (0.72 mm) 10 30 5 −50 0.20 a ◦ ◦ Porcine leg/ X6-1 17 G (1.47 mm) 10 45 55 −80 0.36 ◦ ◦ 22 G (0.72 mm) 10 35 20 −65 0.36 Available from Philips Healthcare, Bothell, WA, USA make such computer-aided applications more affordable and 0.36 mm/voxel. Ground-truth data is created by manually readily accessible to the majority of ultrasound devices. annotating the voxels belonging to the needle in each volume. Testing evaluation is performed based on five-fold cross- validation separately for each transducer across its 20 ex-vivo 3D US volumes. For each fold, we use 4 subsets for training Experimental results and 1 subset for testing, to make the training and testing data completely distinct. The evaluation dataset consists of four types of ex-vivo US data acquired from chicken breast and porcine leg using a VL13-5 transducer (motorized linear-array) and a X6-1 Patch classification transducer (phased-array). Our experiments with two types of transducers and tissue types investigate the robustness We use the dataset from chicken breast to evaluate the of the proposed methods in various acquisition settings performance of the proposed patch classification method. and conditions. Properties and specifications of our dataset Capability of the network to transform the input space to are summarized in Table 1. Each volume from VL13-5 meaningful features is visualized using a multi-dimensional transducer contains on average 174 × 189 × 188 voxels scaling that projects the representation of feature space onto a (lat. × ax. × elev.), at 0.2 mm/voxel and from X6-1 trans- 2D image. For this purpose, we applied t-distributed Stochas- ducer contains 452 × 280 × 292 voxels, at approximately tic Neighbor Embedding (t-SNE) [22] to the first fully 123 International Journal of Computer Assisted Radiology and Surgery connected layer of the network. The result of the multi- channels consist of parallel cross-sections having d mm gap dimensional projection of the test set in one of the folds is between them. In this section, we investigate the contribution depicted in Fig. 5, where close points have similar charac- of our multi-slicing approach for increasing the segmentation teristics in the feature space. As shown, the two clusters are accuracy of individual cross-sections and identify the optimal clearly separated based on the features learned by the net- d for each type of data and needle. work. Figure 7 depicts the bar chart of the measured improve- Performance of our proposed methods is evaluated in the ment of the F1-scores for each dataset and choice of d full volumes and the results are shown in Table 2, listing compared to 1-channel single-slice input. The F1-scores are voxel-level recall, precision, specificity and F1-score. Recall calculated after cross-validation of the predictions on parallel is the sensitivity of detection and is defined as the number cross-sections to the lateral and elevational axes. As shown, of correctly detected needle voxels divided by the number adding extra consecutive cross-sections for segmentation of of voxels belonging to the needle. Precision or the positive the needle increases the performance in all the cases. How- predictive value is defined as the number of correctly detected ever, when the distance d is too large, the visible structures needle voxels divided by the total number of detected needle in the extracted cross-sections cannot be co-related to each voxels. Specificity is defined as the number of voxels that other any longer and therefore the performance gain will are correctly detected as non-needle divided by the number decrease. As shown in Fig. 7, the spacing values of 1.3 and of voxels that are not part of the needle. Finally, F1-score 2.0 mm gain the highest improvement in the F1-score, while is calculated as the harmonic mean between the voxel-based the results for 2.0 mm are more stable. Therefore, we choose recall and precision and is used to measure the similarity d = 2.0 mm as the optimal spacing among the consecutive between the system detections and the ground-truth labels. cross-sections and use it in the following experiments. Furthermore, we compare the results with the approach of [13], which is based on supervised classification of vox- els from their responses to hand-crafted Gabor wavelets. As Voxel segmentation performance shown, both shared CNN and independent CNN architectures outperform the Gabor features yielding a 25% improvement Our proposed method based on dense needle segmentation on F1-score. Furthermore, ShareCNN achieves higher pre- in multi-slice thick (2.5D) US planes is evaluated in terms cision than IndepCNN at approximately similar recall rate. of recall, precision and specificity, as defined in “Patch clas- The degraded performance of IndepCNN can be explained sification” section. Table 3 shows the obtained voxel-wise by the large increase in the number of network parameters in performances on both chicken breast and porcine leg datasets. our small-sized data. As shown, the proposed ShareFCN architecture, achieves Figure 6 shows examples of the classification results for 17 very high recall and precision scores in both chicken breast and 22 G needles. As shown in the left column, detected nee- and porcine leg datasets. dle voxels correctly overlap the ground-truth voxels, which To study the performance of our trained networks in seg- results in a good detection accuracy. Furthermore, example menting needle voxels, we visualize the response of the patches from true and false positives are visualized, which intermediate feature layers to needle cross-sections. For this show a very high local similarity. Most of the false negative purpose, the reconstructed patterns from the evaluation set patches belong to the regions with other nearby echogenic that cause high activations in the feature maps are visualized structures, which distorts the appearance of the needle. using the Deconvnet, as proposed by Zeiler et al. [25]. Fig- ure 8 shows the input stimuli that creates largest excitations Semantic segmentation of individual feature maps at several layers in the model, as well as their corresponding input images. As shown, both We evaluate the performance of our proposed semantic seg- networks trained for VL13-5 and X6-1 transducers improve mentation method on both datasets from chicken breast and the discriminating features of the needle and remove the porcine leg. As shown in Table 1, the data of porcine leg are background as the network depth increases. However, it is acquired using a phased-array transducer, in which the nee- interesting to notice the different modeling behavior of the dle appearance will be more inconsistent due to the varied network in convolution layers 12, 20 and 21 for the two trans- reflection angles of the backscattered beams. ducers. In the dataset acquired using the VL13-5 transducer, the higher-frequency range creates more strong shadow cast- Data representation in 2.5D ings below the needle in the data. Therefore, as it can be observed in Figure 8a, the trained network additionally mod- As discussed in “Semantic segmentation” section, we use a els the dark regions in layers 12 and 20 and fuses them to the 3-channel input to the FCN network for better modeling of shape and intensity features extracted in the shallower layers the 3D structures from 2.5D (thick slice) US data. The three of the network. 123 International Journal of Computer Assisted Radiology and Surgery Fig. 5 Multi-dimensional projection of voxels in the test set using the t-SNE algorithm. Red and blue points represent needle and non-needle voxels, respectively Latent variable 1 Table 2 Average voxel Method Recall Precision Specificity F1-score classification performances in the full volumes of chicken a Gabor transformation [13] 47.1 48.2 – 53.7 breast (%) ShareCNN 76.3 ± 5.8 83.2 ± 5.6 99.98 ± 4e−5 78.5 ± 5.3 IndepCNN 78.4 ± 5.3 64.7 ± 4.8 99.97 ± 6e−5 66.1 ± 4.9 Two models trained separately for each needle (averaged) Single model trained directly for both needles True Positives False Positives False Negatives True Positives False Positives False Negatives Fig. 6 Examples of classification results for 17 and 22 G needles. (Left) Detected needle voxels in 3D volumes shown in red and ground-truth voxels in green. (Right) Triplanar orthogonal patches classified as true positive, false positive, and false negative Figure 9 shows examples of the segmentation results in and after combining the results, the needle voxels are recov- cross-sections perpendicular to the lateral and elevational ered and detected in 3D. axes. As shown, the segmentation is very accurate for all the cases of a needle being entirely visible in a cross-section, Axis estimation accuracy partially acquired or being viewed from the out-of-the-plane cross-sections. In particular, Fig. 9d depicts a case of a needle Because of the high detection precision achieved with both with a relatively large horizontal angle with the transducers, ShareCNN and ShareFCN approaches, estimation of the nee- which results in the needle being partially acquired in all the dle axis is possible even for short needle insertions after processed cross-sections. As it can be seen, visible parts of a simple RANSAC fitting. The accuracy of our proposed the needle at each cross-section are successfully segmented ShareCNN and ShareFCN methods in localizing the needle 22G needle 17G needle Latent variable 2 International Journal of Computer Assisted Radiology and Surgery Fig. 7 Improvements of d = 0.7 mm F1-scores for each choice of d, d = 1.3 mm which is the gap between the 0.5 d = 2.0 mm consecutive slices used as input d = 2.8 mm to the three channel FCN d = 3.6 mm 0.4 network 0.3 0.2 0.1 0.0 Chicken breast Chicken breast Porcine leg Porcine leg 17G needle 22G needle 17G needle 22G needle Table 3 Average voxel Method Tissue Recall Precision Specificity F1-score classification performances of semantic segmentation approach a ShareFCN Chicken breast 89.6 ± 4.2 79.8 ± 5.5 99.97 ± 1e−4 80.0 ± 4.7 in the full volumes (%) Porcine leg 87.9 ± 4.2 83.0 ± 3.7 99.99 ± 1e−5 84.1 ± 3.4 Single model trained directly for both 17 and 22 G needles Input Layer 2 Layer 4 Layer 8 Layer 12 Layer 20 Layer 21 Layer 22 (a) Input Layer 2 Layer 4 Layer 8 Layer 12 Layer 20 Layer 21 Layer 22 (b) Fig. 8 Visualization of features projected into the input space in the 2, 4, 8, 12, 20, 21, and 22. Note the skip connections in the layers to fuse trained model a for Linear-array VL13-5 transducer and b for Phased- coarse, semantic and local features. The ground-truth needle is marked array X6-1 transducer. Reconstruction of the input image is shown by with a yellow arrowhead in input images using only the highest activated features after the convolutional layers axis is evaluated as a function of the needle length as por- lated as the point-plane distance of the ground-truth needle trayed by Fig. 10. We use two measurements for defining and tip and the detected needle plane. The orientation error (ε )is evaluating spatial accuracy. The needle tip error (ε ) is calcu- the angle between the detected and the ground-truth needle. Short axis Long axis Short axis Long axis F1-score Improvement International Journal of Computer Assisted Radiology and Surgery 17G needle Linear-array trans. 5–13 MHz (a) 22G needle Linear-array trans. 5–13 MHz (b) 17G needle Phased-array trans. 1–6 MHz (c) 22G needle Phased-array trans. 1–6 MHz (d) Fig. 9 Examples of segmentation results of a 17 G and b 22 G needles images in the bottom row are the segmentation results. Volumes on the in chicken breast acquired with a linear-array transducer, and c 17 G right side show the segmented needle voxels after combining the results and d 22 G needles in porcine leg acquired with a phased-array trans- from both lateral and elevational directions in red and the ground-truth ducer. Images in the top row are input cross-sections to the network and voxels in green As discussed earlier, we do not explicitly detect the needle voxels in the first 2 mm are undetectable, as the minimum tip, but detect the plane where the needle and its tip are max- distance of extracted 3D patches from the volume borders imally visible. This localization processing is done for every corresponds to half of a patch length. individual data point, leading to repeated detection in cases In the datasets of porcine leg, the voxel size is reduced of a 3D+time US sequence. to 0.36 mm due to the lower acquisition frequency of the As shown in Fig. 10a, b, both ShareCNN and ShareFCN phased-array transducer. Therefore, longer lengths of the methods perform accurately in the datasets of chicken breast, needle are required for accurate detections. As shown in reaching ε of less than 0.7 mm for needle lengths of approx- Fig. 10c, for needles of approximately 10 mm or longer, the imately 5 mm or larger. In both approaches, the ε shows ε reduces to 0.7 and 0.6 mm for 17 and 22 G needles, respec- v t more sensitivity to shorter needles and varies more for the tively. In contrast to the datasets of chicken breast, the ε is 22 G needle, which is more difficult to estimate, compared generally larger for short 17 G needles in porcine leg. Most to a thicker needle. Furthermore, for the ShareCNN method, importantly, in all of the experiments, the needle tip error, ε , 123 International Journal of Computer Assisted Radiology and Surgery 4 4 4 17G needle 17G needle 17G needle 3 3 3 22G needle 22G needle 22G needle 2 2 2 1 1 1 0 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 Needle length (mm) Needle length (mm) Needle length (mm) 40 40 30 30 30 20 20 20 10 10 10 0 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 Needle length (mm) Needle length (mm) Needle length (mm) (a) (b) (c) Fig. 10 Needle tip position error (ε ) and orientation error (ε )asa size ≈ 0.20 mm. b ShareFCN results in chicken breast dataset voxel t v function of needle length. Dashed lines represent standard errors of the size ≈ 0.20 mm. c ShareFCN results in porcine leg dataset voxel size measured values. a ShareCNN results in chicken breast dataset voxel ≈ 0.36 mm remains lower than 0.7 mm. This shows that after insertion of that a considerable portion of the needle shaft can be virtu- only 5 mm for higher-resolution linear-array transducers and ally invisible in the data. Therefore, the receptive field of a 10 mm for lower-resolution phased-array transducers, the tip convolutional network need to be large enough to model the will be always visible in the detected plane as their distance contextual information from the visible parts of the needle. In is less than the thickness of US planes. a patch-based classification technique, a larger receptive field can be achieved by, e.g., increasing the patch size, increas- ing the number of convolution and max-pooling layers, or employing normal or atrous convolutions with larger kernel Discussion sizes. In all of these methods, the computational complexity increases exponentially as more redundant calculations have From comparing the results reported in Tables 2 and 3, it can to be computed for adjacent patches, and the spatial accuracy be concluded that the performance of ShareFCN is compara- decreases as small shifts of patches cannot be translated to ble and only slightly better than the patch-based ShareCNN two different classes. on chicken breast data acquired from the higher-frequency range VL13-5 linear-array transducer. However, a major ben- efit of dense segmentation using ShareFCN is related to the data of the lower-frequency range X6-1 phased-array Conclusions transducer. The resulting lower spatial resolution of these transducers distorts the appearance and obscures structure Ultrasound-guided interventions are increasingly used to details of a needle. In these cases, training a discriminant minimize risks to the patient and improve health outcomes. model of the needle requires deeper and more complex However, the procedure of needle and transducer position- convolutional networks, which increases the computational ing is extremely challenging and possible external guidance complexity. Therefore, more computationally efficient net- tools would add to the complexity and costs of the procedure. works such as our proposed ShareFCN are preferred over Instead, an automated localization of the needle in 3D US can patch classification methods. overcome 2D limitations and facilitate the ease of use of such Furthermore, as discussed in “Introduction” section, the transducers, while ensuring an accurate needle guidance. In US beamsteering angle of a phased-array transducer varies this work, we have introduced a novel image processing sys- for each region in the field of view. Consequently, US reflec- tem for detecting needles in 3D US data, which achieves tions from different parts of a needle will vary largely, such very high precision at a low false negative rate. This high Orientation error ε ( ) Tip error ε (mm) Orientation error ε ( ) Tip error ε (mm) v t Orientation error ε () Tip error ε (mm) v t International Journal of Computer Assisted Radiology and Surgery precision is achieved by exploiting dedicated convolutional Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecomm networks for needle segmentation in 3D US volumes. The ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, proposed networks are based on CNN, which is improved by and reproduction in any medium, provided you give appropriate credit proposing a new update strategy to handle highly imbalanced to the original author(s) and the source, provide a link to the Creative datasets by informed resampling of non-needle voxels. Fur- Commons license, and indicate if changes were made. thermore, novel modeling of 3D US context information is introduced using 2.5D data of multi-view thick-sliced FCN. Our proposed patch classification and semantic segmen- References tation systems are evaluated on several ex-vivo datasets and outperform classification of the state-of-the-art handcrafted 1. Barva M, Uhercík ˇ M, Mari JM, Kybic J, Duhamel JR, Liebgott H, features, achieving 78 and 80% F1-scores in the chicken Hlavac V, Cachard C (2008) Parallel integral projection transform breast data, respectively. This shows the capability of CNN in for straight electrode localization in 3-D ultrasound images. IEEE modeling more semantically meaningful information in addi- Trans Ultrason Ferroelectr Freq Control (UFFC) 55(7):1559–69 2. Beigi P, Rohling R, Salcudean SE, Ng GC (2016) Spectral analysis tion to simple shape features, which substantially improves of the tremor motion for needle detection in curvilinear ultrasound needle detection in complex and noisy 3D US data. Fur- via spatiotemporal linear sampling. Int J Comput Assist Radiol thermore, our proposed needle segmentation method based Surg 11(6):1183–1192 on 2.5D US information achieves 84% F1-score in datasets 3. Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking Atrous convolution for semantic image segmentation. ArXiv e- of porcine leg that are acquired with a lower-resolution prints. ArXiv:1706.05587 phased-array transducer. These results show a strong seman- 4. Fischler MA, Bolles RC (1981) Random sample consensus: tic modeling of the needle context in challenging situations, paradigm for model fitting with applications to image analysis. where the intensity of the needle is inconsistent and even Commun ACM 24(6):381–95 5. Glorot X, Bordes A, Bengio Y (2011) In: Proceedings of the partly invisible. 14th international conference on artificial intelligence and statistics Quantitative analysis of localization error with respect to (AISTATS) 2011, vol 15. Fort Lauderdale, FL, USA. pp 315–323. the needle length shows that the tip error is less than 0.7 mm http://proceedings.mlr.press/v15/glorot11a.html for needles of only 5 mm long and 10 mm long at voxel size of 6. Kingma DP, Ba J (2015) ADAM: a method for stochastic opti- mization. In: International conference on learning representations 0.2 and 0.36 mm, respectively. Therefore, the system is able (ICLR). ArXiv:1412.6980 to accurately detect short needles, enabling the physician to 7. Long J, Shelhamer E, Darrell T (2015) Fully convolutional net- correct inaccurate insertions at early stages in both higher- works for semantic segmentation. In: Conference on computer resolution and lower-resolution datasets. Furthermore, the vision pattern recognition (CVPR). ArXiv:1411.4038 8. Mwikirize C, Nosher JL, Hacihaliloglu I (2018) Signal attenuation needle is visualized intuitively by its in-plane view while maps for needle enhancement and localization in 2D ultrasound. ensuring that the tip is always visible, which eliminates the Int J Comput Assist Radiol Surg 13:363–374 need for advanced manual coordination of the transducer. 9. Papalazarou C, de With PHN, Rongen P (2013) Sparse-plus-dense- Future work will evaluate the proposed method in even RANSAC for estimation of multiple complex curvilinear models in 2D and 3D. Pattern Recognit 46(3):925–35 more challenging in-vivo datasets with suboptimal acquisi- 10. Pourtaherian A, Ghazvinian Zanjani F, Zinger S, Mihajlovic N, Ng tion settings. Due to the complexity of data from interven- G, Korsten H, With P (2017) Improving needle detection in 3D tional settings, larger datasets need to be acquired for training ultrasound using orthogonal-plane convolutional networks. Med more sophisticated networks. Moreover, further analysis is Image Comput Comput Assist Interv (MICCAI) 2:610–618 11. Pourtaherian A, Mihajlovic N, Zinger S, Korsten HHM, de With required to limit the complexity of CNN with respect to its PHN, Huang J, Ng GC (2016) Automated in-plane visualization performance for embedding this technology as a real-time of steep needles from 3D ultrasound volumes. In: Proceedings on application. IEEE international ultrasonics symposium (IUS), pp 1–4 12. Pourtaherian A, Scholten H, Kusters L, Zinger S, Mihajlovic N, Kolen A, Zou F, Ng GC, Korsten HHM, de With PHN (2017) Medi- Compliance with ethical standards cal instrument detection in 3-dimensional ultrasound data volumes. IEEE Trans Med Imaging (TMI) 36(8):1664–75 13. Pourtaherian A, Zinger S, de With PHN, Korsten HHM, Mihajlovic N (2015) Benchmarking of State-of-the-Art needle detection algo- Conflict of interest This research was conducted in the framework of rithms in 3D ultrasound data volumes. Proc SPIE Med Imaging “Impulse-2 for the healthcare flagship—topic ultrasound” at Eindhoven 9415: 94152B–1–8 University of Technology in collaboration with Catharina Hospital 14. Prasoon A, Petersen K, Igel C, Lauze F, Dam E, Nielsen M (2013) Eindhoven and Royal Philips. Deep feature learning for knee cartilage segmentation using a tripla- nar convolution network. In: International conference on medical Ethical approval All procedures performed in studies involving ani- image computing and computer-assisted intervention (MICCAI), mals were in accordance with the ethical standards of the institution or pp 599–606 practice at which the studies were conducted. 15. Rowley H, Baluja S, Kanade T (1998) Neural network-based face detection. IEEE Trans Pattern Anal Mach Intell (PAMI) 20(1):23– Informed consent This articles does not contain patient data. 123 International Journal of Computer Assisted Radiology and Surgery 16. Shelhamer E, Long J, Darrell T (2017) Fully convolutional net- 21. Uhercík ˇ M, Kybic J, Zhao Y, Cachard C, Liebgott H (2013) Line works for semantic segmentation. IEEE Trans Pattern Anal Mach filtering for surgical tool localization in 3D ultrasound images. Intell (PAMI) 39(4):640–651 Comput Biol Med 43(12):2036–45 17. Simonyan K, Zisserman A (2015) Very deep convolutional net- 22. van der Maaten L, Hinton G (2008) Visualizing high-dimensional works for large-scale image recognition. In: International confer- data using t-SNE. J Mach Learn Res 9:2579–605 ence learning representations (ICLR). ArXiv:1409.1556 23. Yang X, Yu L, Li S, Wang X, Wang N, Qin J, Ni D, Heng PA 18. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov (2017) Towards automatic semantic segmentation in volumetric R (2014) Dropout: a simple way to prevent neural networks from ultrasound. In: Medical image computing and computer-assisted overfitting. J Mach Learn Res 15:1929–58 intervention (MICCAI), pp 711–719 19. Sundaresan V, Bridge CP, Ioannou C, Noble JA (2017) Automated 24. Yu F, Koltun V (2016) Multi-scale context aggregation by dilated characterization of the fetal heart in ultrasound images using fully convoluions. In: International conference on learning representa- convolutional neural networks. In: IEEE international conference tions (ICLR). ArXiv:1511.07122 on biomedical imaging (ISBI), pp. 671–674 25. Zeiler MD, Fergus R (2014) Visualizing and understanding con- 20. Tieleman T, Hinton G (2012) Lect. 6.5-RmsProp: divide gradient volutional networks. In: European conference on computer vision by running average of its recent magnitude. COURSERA: Neural (ECCV), Springer, New York, pp 818–833 Net. for Machine Learning http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Computer Assisted Radiology and Surgery Springer Journals

Robust and semantic needle detection in 3D ultrasound using orthogonal-plane convolutional neural networks

Free
13 pages

Loading next page...
 
/lp/springer_journal/robust-and-semantic-needle-detection-in-3d-ultrasound-using-orthogonal-1Nz7GD1pKk
Publisher
Springer Journals
Copyright
Copyright © 2018 by The Author(s)
Subject
Medicine & Public Health; Imaging / Radiology; Surgery; Health Informatics; Computer Imaging, Vision, Pattern Recognition and Graphics; Computer Science, general
ISSN
1861-6410
eISSN
1861-6429
D.O.I.
10.1007/s11548-018-1798-3
Publisher site
See Article on Publisher Site

Abstract

Purpose During needle interventions, successful automated detection of the needle immediately after insertion is necessary to allow the physician identify and correct any misalignment of the needle and the target at early stages, which reduces needle passes and improves health outcomes. Methods We present a novel approach to localize partially inserted needles in 3D ultrasound volume with high precision using convolutional neural networks. We propose two methods based on patch classification and semantic segmentation of the needle from orthogonal 2D cross-sections extracted from the volume. For patch classification, each voxel is classified from locally extracted raw data of three orthogonal planes centered on it. We propose a bootstrap resampling approach to enhance the training in our highly imbalanced data. For semantic segmentation, parts of a needle are detected in cross-sections perpendicular to the lateral and elevational axes. We propose to exploit the structural information in the data with a novel thick-slice processing approach for efficient modeling of the context. Results Our introduced methods successfully detect 17 and 22 G needles with a single trained network, showing a robust generalized approach. Extensive ex-vivo evaluations on datasets of chicken breast and porcine leg show 80 and 84% F1- scores, respectively. Furthermore, very short needles are detected with tip localization errors of less than 0.7 mm for lengths of only 5 and 10 mm at 0.2 and 0.36 mm voxel sizes, respectively. Conclusion Our method is able to accurately detect even very short needles, ensuring that the needle and its tip are maximally visible in the visualized plane during the entire intervention, thereby eliminating the need for advanced bi-manual coordination of the needle and transducer. Keywords Needle detection · 3D ultrasound · Convolutional neural networks Introduction needle and US transducer is challenging, as the limited US field of view obscures the visualization of the complete nee- Ultrasound (US) imaging is broadly used to visualize and dle and an inadequate view leads to an erroneous placement guide the interventions that involve percutaneous advancing of the needle tip. Therefore, while advancing the needle, con- of a needle to a target inside the patients’ body. However, tinuous manipulation of the transducer is necessary to search for a typical 2D US system, bi-manual coordination of the for the needle in the imaging data for the best needle plane visualization. As an alternative, 3D US transducers with an image-based needle-tracking system can overcome these B Arash Pourtaherian a.pourtaherian@tue.nl limitations and minimize the manual coordination, while pre- serving the use of a conventional needle, signal generation Eindhoven University of Technology, 5612 AJ Eindhoven, and transducers [12]. In such a system, the needle is conve- The Netherlands niently placed in the larger 3D US field of view and the Philips Research Eindhoven, 5656 AE Eindhoven, The processing unit automatically localizes and visualizes the Netherlands entire needle. Therefore, the manual skills are significantly Philips Healthcare, Bothell, WA 98021, USA simplified when the entire needle is visible in the visualized Catharina Hospital Eindhoven, 5623 EJ Eindhoven, The plane, after the needle is advanced or the transducer is moved. Netherlands 123 International Journal of Computer Assisted Radiology and Surgery Several image-based needle localization techniques have accuracy, while small patches allow the network to infer from been proposed based on maximizing the intensity over par- only parts of the needle. allel projections [1]. Due to the complexity of realistic data, As an alternative to patch training, semantic segmentation methods that solely rely on the brightness of the needle are not methods can generate dense prediction maps by omitting robust for localizing thin objects in a cluttered background. the use of fully connected layers. Examples of such net- Therefore, information regarding the line-like structure of a works are fully convolutional networks (FCN) [7,16], and needle is used by Hessian-based line filtering methods [21]. context modeling by employing atrous convolutions [3,24]. Although shown to be limited in localization accuracy [12], Although integrating atrous (or dilated) convolutions in the they can be beneficial for reducing the imaging artifacts. deep layers of the network increases the field of view while Other techniques involve exploiting the intensity changes preserving the spatial dimensions, applying convolutions in caused by needle movement to track the needle in the US a large number of high-resolution feature maps is computa- data [2]. Nevertheless, large movements of the transducer or tionally expensive. However, original FCN architectures can the patient will increase the difficulty of motion-based track- simultaneously exploit the global and local information in ing and therefore, we aim at repeated detection in static 3D the data and remain more memory efficient by introducing volumes. When realizing the real-time operation, tracking of skip connections from higher-resolution feature maps to the the needle is implemented by repeated detection with suffi- deconvolutional layers. Initial attempts of applying these net- cient time resolution. This will result in detection per volume works on US data are presented for fetal heart segmentation in in a 4D US sequence, which allows for arbitrary inter-volume 2D US [19]. Further improvement is shown for segmentation movements. of fetus, gestational sac, and placenta in 3D US volumes by More recently, attenuation of the US signal due to energy integrating the sequential information [23]. The drawback of loss beyond the US beam incident with the needle, is used using such 3D+time models are, however, the exponentially to detect the position of the needle [8,11]. However, signal increased computational complexity of the 3D convolution loss due to the presence of other attenuating structures may operations, a very large dataset is required for training the degrade the accuracy of estimation and must be explicitly increased number of network parameters, and the sequential handled. Alternatively, supervised needle-voxel classifiers modeling will be suboptimal in the early timesteps after large that employ the needle shape and its brightness have shown movements of the transducer or subject. to be superior to the traditional methods [12]. Nevertheless, In this paper, we build upon our recent contribution using as the needle is assumed to be already inserted in the volume CNN [10] and extend it to create semantic FCN models. We up to a considerable length, they typically do not achieve high modify and extend this architecture such that it models 3D detection precision and therefore cannot localize the needle needle context information and achieves high needle seg- when it is partially inserted in the volume. Moreover, when mentation precision at a low false negative rate. We propose the target structure is deep, the degraded resolution and pos- a novel multi-view thick-sliced FCN model for an efficient sible needle deflections further complicate the interpretation modeling of 3D context information. Therefore, the system of data and reduce voxel classification performance, which will successfully perform needle localization both in cases should be addressed by better modeling of both local and of invisible needle parts and when only a short part of the contextual information. needle is inserted into the patient’s body, yielding an early In our recently published work in training convolu- correction of inaccurate insertions. The needle is then visual- tional neural networks (CNN), substantial improvement has ized in an intuitive manner, eliminating the need for advanced been shown to the detection accuracy of needle voxels in manual coordination of the transducer. The main contribu- 3D US data [10]. Although this method was shown to tions of this paper are: (1) a novel approach for segmentation achieve high-performance results for ex vivo data acquired and localization of partially inserted and partly invisible from linear-array transducers, the choice of patch classifi- needles in 3D US data using CNN models, (2) an original cation in this framework can be further improved for US update strategy for CNN parameters using most-aggressive data segmentation. In US imaging using sector, curved and non-needle samples, which significantly improves the per- phased-array transducers, the insonification angle of the US formance on highly imbalanced datasets, (3) a novel method beams is changed throughout the volume, which creates vary- for modeling of 3D US context information using 2.5D (thick ing angles with different parts of the needle. Therefore, the slice) data, which enables an accurate training of the network needle can be partially invisible, due to the lack of received using limited training samples, and (4) extensive evaluation US reflections from parts of the needle. The missed data of the proposed methods on two types of ex-vivo data giv- enforces a trade-off between patch sizes for richer needle con- ing a very high average of 81.4% precision at 88.8% recall text information and localization accuracy. Larger patches rate. require more max-pooling layers that reduce the localization 123 International Journal of Computer Assisted Radiology and Surgery Patch Classification Needle-Axis Estimation and Visualization Semantic Segmentation Fig. 1 Block diagram of our proposed framework for needle detection in 3D US data Methods CNN architecture For our experiments, we evaluate two CNN architectures based on shared convolutional (ShareCNN) and The block diagram of our proposed framework consists of independent convolutional (IndepCNN) filters. In ShareCNN, three main stages as depicted in Fig. 1. In this study, we a single convolutional filter bank is trained for the three input introduce two different approaches for segmentation of nee- planes to have the same set of filters for all the planes. In Inde- dle voxels in 3D US volumes, i.e., classification of extracted pCNN, three sets of filter banks are trained independently, patches in the data using their triplanar orthogonal views each to be convolved with one of the three planes. As depicted (“Patch classification” subsection), and end-to-end dense in Fig. 2, both architectures consist of four convolutional lay- segmentation of needle voxels in multi-view thick slices ers having 32, 48, 64 and 96 filters of 3 × 3 kernel size, three (“Semantic segmentation” subsection). The segmentation fully connected layers having 128, 64 and 2 neurons, and results are then used to fit a predefined model of the needle one softmax layer. According to the given number of fil- and extract the cross-section containing the entire needle and ters, ShareCNN and IndepCNN architectures have 2160 and the tip (“Needle axis estimation and visualization” subsec- 6480 parameters in their convolutional layers, respectively. tion). For clarity, we emphasize here that we detect the plane In both architectures, extracted feature maps after the last where the needle and its tip are maximally visible, but do not convolutional layer are concatenated prior to the fully con- explicitly detect the needle tip. This localization processing nected layers [14]. is done for every data volume individually. For a 3D+time CNN training Our dataset is significantly imbalanced due to US sequence, this would effectively mean repeated detection the small size of a needle compared to the full volume, i.e., for every volume or image. approximately only 1 voxel out of 3000 voxels in a volume belongs to the needle. This is common in representation of an instrument in 3D US volumes. Therefore, in order to avoid Patch classification a prediction bias toward the majority class, we downsample the negative training data to match the number of needle sam- The block diagram of the proposed patch classification tech- ples. For an informed sampling of the negative (non-needle) nique is shown in Fig. 2. A CNN model is trained to robustly set, we propose an iterative scheme based on bootstrapping classify the needle voxels in the 3D US volumes from other [15] to achieve the maximum precision. In the first step, we echogenic structures, such as bones and muscular tissues. train our network with uniformly sampled needle and non- Our voxel classification network predicts the label of each needle patches. Training patches are rotated arbitrarily by 90 voxel from the raw voxel values of local proximity. In a 3D steps around the axial axis to improve the orientation invari- volume, this local neighborhood can simply be a 2D cross- ance. The trained network then classifies the same training section in any orientation, multiple cross-sections, or a 3D set for validation. Finally, misclassified false positives are patch. Here, we use three orthogonal cross-sections centered harvested as the most-aggressive non-needle voxels, which at the reference voxel, which is a compromise with respect are used to update the network. Figure 3 shows how the itera- to the complexity of the network. The size of triplanar cross- tive and informed sampling can increase the precision of the sections is chosen based on the diameter of a typical needle network. It is worth mentioning that commonly used meth- (0.7–1.5 mm), voxel size, and spatial resolution of the trans- ods for imbalanced data, like weighted loss function, do not ducer, to contain sufficient context information. We extract necessarily improve precision. For example, the majority of triplanar cross-sections of 21 × 21 pixels (4.2 × 4.2 mm), our negative set consists of “easy” samples that can be clas- which provides sufficient global shape information and still sified beyond the model’s margin and will influence the loss remains spatially accurate. For low-frequency transducers, function in their favor. more context is required for a discriminative modeling, as The CNN parameters are trained using stochastic gradi- the structure details of a needle will be distorted at low spa- ent descent (SGD) and the categorical cross-entropy cost tial resolutions. 123 International Journal of Computer Assisted Radiology and Surgery Fig. 2 Block diagram of the patch classification approach using CNN Latent variable 1 Latent variable 1 Latent variable 1 Feature-space of an imbalanced dataset Step 1. Random negative sampling Step 2. Most-aggressive negative sampling Fig. 3 An example of the iterative sampling strategy to increase the precision of the network. The red circles represent the positive data points, gray and blue triangles are the negative and sampled data points, respectively and the dashed line represents the decision boundary of a classifier function. All activation functions are chosen to be rectified thermore, in contrast to the patch-based methods, where linear units (ReLU) [5]. Furthermore, for optimization of the redundant processing of voxels is inevitable, FCN models network weights, we divide the learning rate by the exponen- are more computationally efficient as they exploit the one- tially weighted average of recent gradients (RMSProp) [20]. time extracted features to simultaneously label all the data −4 −5 Initial learning rates are chosen to be 10 and 10 for points using deconvolutional networks. train and update iterations, respectively. In order to prevent Figure 4 shows the architecture of the proposed seman- overfitting, we implement the dropout approach [18] with tic needle segmentation technique in 3D US volumes. Our a probability of 0.5 in the first two fully connected layers. method is based on decomposing the 3D volume into 2D The trained network computes a label per voxel indicating cross-sections for labeling the needle parts and reconstruct- whether it belongs to the needle or not. ing the 3D needle labels from the multiple views. Therefore in our approach, the number of parameters in the convolution kernels decreases exponentially compared to the 3D kernels, Semantic segmentation and consequently, the network requires fewer training sam- ples and executes faster. We will now present our strategy for As discussed in “Introduction” section, semantic segmenta- selecting the cross-sections to be processed. tion of a needle using FCN architectures are more interest- The 2D cross-sections are selected in multiple directions ing than patch classification as the context information is and perpendicular to the transducer with the step size equal to modeled, while the spatial dimensions are preserved. Fur- Latent variable 2 Latent variable 2 Latent variable 2 International Journal of Computer Assisted Radiology and Surgery the voxel size. Since in a 3D US volume, the needle can enter function. The learning rate is adaptively computed using the the field of view from either the lateral or elevational direc- ADAM optimization method [6] with an initial learning rate tions, we consider cross-sections perpendicular to these axes. equal to 1e−4. Furthermore, dropout layers with a probability The segmentation outcome of each cross-section is mapped of 0.85 are added to the layer numbers 17 and 18 of the onto its corresponding position in 3D. Afterward, the result- convolution network. ing probability volume from the two directions is combined together using multiplicative averaging to create the final Needle axis estimation and visualization labeling outcome in 3D. In order to exploit the 3D structural information in our In order to robustly detect the instrument axis in the presence model, instead of only using 2D planar data, we opt for of outliers, we fit a model of the needle to the detected voxels processing the consecutive cross-sections before and after using the RANSAC algorithm [4]. The needle model can be the processing plane as additional inputs to the network. In represented by a straight cylinder having a fixed diameter. In this study, we add two additional cross-sections and evaluate cases of large instrument deflection, the model can be adapted several spacing gaps d, between them. Therefore, as shown to define a parabolic segment, as shown in [9]. Using the in Fig. 4, a 3-channel input to the network is formed from RANSAC algorithm, the cylindrical model that contains the the 2.5D (thick slice) US data at a specific position, which highest number of voxels is chosen to be the estimated needle. is used to create a 2D segmentation map of the processing As the experimented needle diameters are less than 2 mm, cross-section. we set the cylindrical model diameter to be approximately FCN architecture Figure 4 depicts the FCN architecture 2 mm. used in our system comprising two stages of convolution After successful detection of needle axis, the 2D cross- and deconvolution networks. Inspired by ShareCNN, we use section of the volume is visualized that contains the plane shared convolution filters (ShareFCN) for both lateral and with the entire needle with maximum visibility, which is also elevational planes. The convolution network is identical to perpendicular to coronal (xy) planes. This cross-section is the the design of the VGG very deep 19 layer CNN [17]. The in-plane view of the needle that is very intuitive for physicians deconvolution network consists of three unpooling masks of to interpret. This ensures that while advancing the needle, the 2, 2 and 8 pixels, respective convolution layers having 512, entire instrument is visualized as much as it is visible and any 256 and 2 filters with 3×3 kernel size, and one softmax layer. misalignment of the needle and target is corrected without Therefore, the receptive field of the network is equal to a win- maneuvering the transducer. dow of 96 × 96 pixels, which is equivalent to approximately 19.2×19.2 mm for higher-resolution VL13-5 transducer and Implementation details 34.5×34.5 mm for lower-resolution X6-1 transducer, achiev- ing a large context modeling at the same inference resolution Our Python implementations of the proposed patch classifi- of the input data. Convolution layers are stacked together fol- cation and semantic segmentation methods take on average lowed by an activation function. As discussed, the network 74 and 0.5 µs for each voxel, respectively (1180 and 15 ms takes a 3-channel 2D input and the output layer indicates the for each 2D cross-section) on a standard PC with a GeForce class-assignment probability for each pixel in the processing GTX TITAN X GPU. Therefore, when implementing a full cross-section. scan to process all voxels and cross-sections in the volume, FCN training The training set consists of 3-channel cross- patch classification executes in 4–5 min, whereas semantic sections extracted with a gap of d mm in both elevational segmentation takes only 2–3 s. Nevertheless, further opti- and lateral directions. The training volumes are augmented mization is possible using conventional techniques such as a by 10 arbitrary rotations around the axial (z) axis prior to coarse-fine search strategy with a hierarchical grid to achieve extraction of the cross-sections. Therefore, several views of real-time performance. Furthermore, the execution time of the needle are used to train the network, including in-plane, RANSAC model fitting is negligible, as the expected num- out-of-plane and cross-sections with partial visibility of the ber of outliers is very small. needle. Similar to our approach presented in “Patch classi- The required computational power for realization of our fication” subsection, we downsample the negative training proposed methods is expected to be widely available on high- data, which are the sections that do not contain the nee- end ultrasound devices that benefit from parallel computing dle, to match the number of cross-sections from the needle. platforms, such as a GPU. However, for implementation However, since the initial training samples are not highly in mid-range and portable systems, more efficient and imbalanced, we do not perform bootstrapping for training compressed architectures should be investigated. Still, ever- the FCN parameters. increasing computational capacity of mobile processors, as We trained the network parameters using SGD update with well as fast development and availability of on-board embed- a batch size of one sample and softmax cross-entropy cost ded units with pre-programmed convolutional modules will 123 International Journal of Computer Assisted Radiology and Surgery Fig. 4 Block diagram of the semantic segmentation approach using FCN Table 1 Specifications and experimental settings of 3D US volumes used for evaluation Tissue type / transducer Needle type and diameter Experimental settings Voxel size (mm) # of vols. Maximum length (mm) Steepness angles a ◦ ◦ Chicken breast/ VL13-5 17 G (1.47 mm) 10 30 10 −30 0.20 ◦ ◦ 22 G (0.72 mm) 10 30 5 −50 0.20 a ◦ ◦ Porcine leg/ X6-1 17 G (1.47 mm) 10 45 55 −80 0.36 ◦ ◦ 22 G (0.72 mm) 10 35 20 −65 0.36 Available from Philips Healthcare, Bothell, WA, USA make such computer-aided applications more affordable and 0.36 mm/voxel. Ground-truth data is created by manually readily accessible to the majority of ultrasound devices. annotating the voxels belonging to the needle in each volume. Testing evaluation is performed based on five-fold cross- validation separately for each transducer across its 20 ex-vivo 3D US volumes. For each fold, we use 4 subsets for training Experimental results and 1 subset for testing, to make the training and testing data completely distinct. The evaluation dataset consists of four types of ex-vivo US data acquired from chicken breast and porcine leg using a VL13-5 transducer (motorized linear-array) and a X6-1 Patch classification transducer (phased-array). Our experiments with two types of transducers and tissue types investigate the robustness We use the dataset from chicken breast to evaluate the of the proposed methods in various acquisition settings performance of the proposed patch classification method. and conditions. Properties and specifications of our dataset Capability of the network to transform the input space to are summarized in Table 1. Each volume from VL13-5 meaningful features is visualized using a multi-dimensional transducer contains on average 174 × 189 × 188 voxels scaling that projects the representation of feature space onto a (lat. × ax. × elev.), at 0.2 mm/voxel and from X6-1 trans- 2D image. For this purpose, we applied t-distributed Stochas- ducer contains 452 × 280 × 292 voxels, at approximately tic Neighbor Embedding (t-SNE) [22] to the first fully 123 International Journal of Computer Assisted Radiology and Surgery connected layer of the network. The result of the multi- channels consist of parallel cross-sections having d mm gap dimensional projection of the test set in one of the folds is between them. In this section, we investigate the contribution depicted in Fig. 5, where close points have similar charac- of our multi-slicing approach for increasing the segmentation teristics in the feature space. As shown, the two clusters are accuracy of individual cross-sections and identify the optimal clearly separated based on the features learned by the net- d for each type of data and needle. work. Figure 7 depicts the bar chart of the measured improve- Performance of our proposed methods is evaluated in the ment of the F1-scores for each dataset and choice of d full volumes and the results are shown in Table 2, listing compared to 1-channel single-slice input. The F1-scores are voxel-level recall, precision, specificity and F1-score. Recall calculated after cross-validation of the predictions on parallel is the sensitivity of detection and is defined as the number cross-sections to the lateral and elevational axes. As shown, of correctly detected needle voxels divided by the number adding extra consecutive cross-sections for segmentation of of voxels belonging to the needle. Precision or the positive the needle increases the performance in all the cases. How- predictive value is defined as the number of correctly detected ever, when the distance d is too large, the visible structures needle voxels divided by the total number of detected needle in the extracted cross-sections cannot be co-related to each voxels. Specificity is defined as the number of voxels that other any longer and therefore the performance gain will are correctly detected as non-needle divided by the number decrease. As shown in Fig. 7, the spacing values of 1.3 and of voxels that are not part of the needle. Finally, F1-score 2.0 mm gain the highest improvement in the F1-score, while is calculated as the harmonic mean between the voxel-based the results for 2.0 mm are more stable. Therefore, we choose recall and precision and is used to measure the similarity d = 2.0 mm as the optimal spacing among the consecutive between the system detections and the ground-truth labels. cross-sections and use it in the following experiments. Furthermore, we compare the results with the approach of [13], which is based on supervised classification of vox- els from their responses to hand-crafted Gabor wavelets. As Voxel segmentation performance shown, both shared CNN and independent CNN architectures outperform the Gabor features yielding a 25% improvement Our proposed method based on dense needle segmentation on F1-score. Furthermore, ShareCNN achieves higher pre- in multi-slice thick (2.5D) US planes is evaluated in terms cision than IndepCNN at approximately similar recall rate. of recall, precision and specificity, as defined in “Patch clas- The degraded performance of IndepCNN can be explained sification” section. Table 3 shows the obtained voxel-wise by the large increase in the number of network parameters in performances on both chicken breast and porcine leg datasets. our small-sized data. As shown, the proposed ShareFCN architecture, achieves Figure 6 shows examples of the classification results for 17 very high recall and precision scores in both chicken breast and 22 G needles. As shown in the left column, detected nee- and porcine leg datasets. dle voxels correctly overlap the ground-truth voxels, which To study the performance of our trained networks in seg- results in a good detection accuracy. Furthermore, example menting needle voxels, we visualize the response of the patches from true and false positives are visualized, which intermediate feature layers to needle cross-sections. For this show a very high local similarity. Most of the false negative purpose, the reconstructed patterns from the evaluation set patches belong to the regions with other nearby echogenic that cause high activations in the feature maps are visualized structures, which distorts the appearance of the needle. using the Deconvnet, as proposed by Zeiler et al. [25]. Fig- ure 8 shows the input stimuli that creates largest excitations Semantic segmentation of individual feature maps at several layers in the model, as well as their corresponding input images. As shown, both We evaluate the performance of our proposed semantic seg- networks trained for VL13-5 and X6-1 transducers improve mentation method on both datasets from chicken breast and the discriminating features of the needle and remove the porcine leg. As shown in Table 1, the data of porcine leg are background as the network depth increases. However, it is acquired using a phased-array transducer, in which the nee- interesting to notice the different modeling behavior of the dle appearance will be more inconsistent due to the varied network in convolution layers 12, 20 and 21 for the two trans- reflection angles of the backscattered beams. ducers. In the dataset acquired using the VL13-5 transducer, the higher-frequency range creates more strong shadow cast- Data representation in 2.5D ings below the needle in the data. Therefore, as it can be observed in Figure 8a, the trained network additionally mod- As discussed in “Semantic segmentation” section, we use a els the dark regions in layers 12 and 20 and fuses them to the 3-channel input to the FCN network for better modeling of shape and intensity features extracted in the shallower layers the 3D structures from 2.5D (thick slice) US data. The three of the network. 123 International Journal of Computer Assisted Radiology and Surgery Fig. 5 Multi-dimensional projection of voxels in the test set using the t-SNE algorithm. Red and blue points represent needle and non-needle voxels, respectively Latent variable 1 Table 2 Average voxel Method Recall Precision Specificity F1-score classification performances in the full volumes of chicken a Gabor transformation [13] 47.1 48.2 – 53.7 breast (%) ShareCNN 76.3 ± 5.8 83.2 ± 5.6 99.98 ± 4e−5 78.5 ± 5.3 IndepCNN 78.4 ± 5.3 64.7 ± 4.8 99.97 ± 6e−5 66.1 ± 4.9 Two models trained separately for each needle (averaged) Single model trained directly for both needles True Positives False Positives False Negatives True Positives False Positives False Negatives Fig. 6 Examples of classification results for 17 and 22 G needles. (Left) Detected needle voxels in 3D volumes shown in red and ground-truth voxels in green. (Right) Triplanar orthogonal patches classified as true positive, false positive, and false negative Figure 9 shows examples of the segmentation results in and after combining the results, the needle voxels are recov- cross-sections perpendicular to the lateral and elevational ered and detected in 3D. axes. As shown, the segmentation is very accurate for all the cases of a needle being entirely visible in a cross-section, Axis estimation accuracy partially acquired or being viewed from the out-of-the-plane cross-sections. In particular, Fig. 9d depicts a case of a needle Because of the high detection precision achieved with both with a relatively large horizontal angle with the transducers, ShareCNN and ShareFCN approaches, estimation of the nee- which results in the needle being partially acquired in all the dle axis is possible even for short needle insertions after processed cross-sections. As it can be seen, visible parts of a simple RANSAC fitting. The accuracy of our proposed the needle at each cross-section are successfully segmented ShareCNN and ShareFCN methods in localizing the needle 22G needle 17G needle Latent variable 2 International Journal of Computer Assisted Radiology and Surgery Fig. 7 Improvements of d = 0.7 mm F1-scores for each choice of d, d = 1.3 mm which is the gap between the 0.5 d = 2.0 mm consecutive slices used as input d = 2.8 mm to the three channel FCN d = 3.6 mm 0.4 network 0.3 0.2 0.1 0.0 Chicken breast Chicken breast Porcine leg Porcine leg 17G needle 22G needle 17G needle 22G needle Table 3 Average voxel Method Tissue Recall Precision Specificity F1-score classification performances of semantic segmentation approach a ShareFCN Chicken breast 89.6 ± 4.2 79.8 ± 5.5 99.97 ± 1e−4 80.0 ± 4.7 in the full volumes (%) Porcine leg 87.9 ± 4.2 83.0 ± 3.7 99.99 ± 1e−5 84.1 ± 3.4 Single model trained directly for both 17 and 22 G needles Input Layer 2 Layer 4 Layer 8 Layer 12 Layer 20 Layer 21 Layer 22 (a) Input Layer 2 Layer 4 Layer 8 Layer 12 Layer 20 Layer 21 Layer 22 (b) Fig. 8 Visualization of features projected into the input space in the 2, 4, 8, 12, 20, 21, and 22. Note the skip connections in the layers to fuse trained model a for Linear-array VL13-5 transducer and b for Phased- coarse, semantic and local features. The ground-truth needle is marked array X6-1 transducer. Reconstruction of the input image is shown by with a yellow arrowhead in input images using only the highest activated features after the convolutional layers axis is evaluated as a function of the needle length as por- lated as the point-plane distance of the ground-truth needle trayed by Fig. 10. We use two measurements for defining and tip and the detected needle plane. The orientation error (ε )is evaluating spatial accuracy. The needle tip error (ε ) is calcu- the angle between the detected and the ground-truth needle. Short axis Long axis Short axis Long axis F1-score Improvement International Journal of Computer Assisted Radiology and Surgery 17G needle Linear-array trans. 5–13 MHz (a) 22G needle Linear-array trans. 5–13 MHz (b) 17G needle Phased-array trans. 1–6 MHz (c) 22G needle Phased-array trans. 1–6 MHz (d) Fig. 9 Examples of segmentation results of a 17 G and b 22 G needles images in the bottom row are the segmentation results. Volumes on the in chicken breast acquired with a linear-array transducer, and c 17 G right side show the segmented needle voxels after combining the results and d 22 G needles in porcine leg acquired with a phased-array trans- from both lateral and elevational directions in red and the ground-truth ducer. Images in the top row are input cross-sections to the network and voxels in green As discussed earlier, we do not explicitly detect the needle voxels in the first 2 mm are undetectable, as the minimum tip, but detect the plane where the needle and its tip are max- distance of extracted 3D patches from the volume borders imally visible. This localization processing is done for every corresponds to half of a patch length. individual data point, leading to repeated detection in cases In the datasets of porcine leg, the voxel size is reduced of a 3D+time US sequence. to 0.36 mm due to the lower acquisition frequency of the As shown in Fig. 10a, b, both ShareCNN and ShareFCN phased-array transducer. Therefore, longer lengths of the methods perform accurately in the datasets of chicken breast, needle are required for accurate detections. As shown in reaching ε of less than 0.7 mm for needle lengths of approx- Fig. 10c, for needles of approximately 10 mm or longer, the imately 5 mm or larger. In both approaches, the ε shows ε reduces to 0.7 and 0.6 mm for 17 and 22 G needles, respec- v t more sensitivity to shorter needles and varies more for the tively. In contrast to the datasets of chicken breast, the ε is 22 G needle, which is more difficult to estimate, compared generally larger for short 17 G needles in porcine leg. Most to a thicker needle. Furthermore, for the ShareCNN method, importantly, in all of the experiments, the needle tip error, ε , 123 International Journal of Computer Assisted Radiology and Surgery 4 4 4 17G needle 17G needle 17G needle 3 3 3 22G needle 22G needle 22G needle 2 2 2 1 1 1 0 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 Needle length (mm) Needle length (mm) Needle length (mm) 40 40 30 30 30 20 20 20 10 10 10 0 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 Needle length (mm) Needle length (mm) Needle length (mm) (a) (b) (c) Fig. 10 Needle tip position error (ε ) and orientation error (ε )asa size ≈ 0.20 mm. b ShareFCN results in chicken breast dataset voxel t v function of needle length. Dashed lines represent standard errors of the size ≈ 0.20 mm. c ShareFCN results in porcine leg dataset voxel size measured values. a ShareCNN results in chicken breast dataset voxel ≈ 0.36 mm remains lower than 0.7 mm. This shows that after insertion of that a considerable portion of the needle shaft can be virtu- only 5 mm for higher-resolution linear-array transducers and ally invisible in the data. Therefore, the receptive field of a 10 mm for lower-resolution phased-array transducers, the tip convolutional network need to be large enough to model the will be always visible in the detected plane as their distance contextual information from the visible parts of the needle. In is less than the thickness of US planes. a patch-based classification technique, a larger receptive field can be achieved by, e.g., increasing the patch size, increas- ing the number of convolution and max-pooling layers, or employing normal or atrous convolutions with larger kernel Discussion sizes. In all of these methods, the computational complexity increases exponentially as more redundant calculations have From comparing the results reported in Tables 2 and 3, it can to be computed for adjacent patches, and the spatial accuracy be concluded that the performance of ShareFCN is compara- decreases as small shifts of patches cannot be translated to ble and only slightly better than the patch-based ShareCNN two different classes. on chicken breast data acquired from the higher-frequency range VL13-5 linear-array transducer. However, a major ben- efit of dense segmentation using ShareFCN is related to the data of the lower-frequency range X6-1 phased-array Conclusions transducer. The resulting lower spatial resolution of these transducers distorts the appearance and obscures structure Ultrasound-guided interventions are increasingly used to details of a needle. In these cases, training a discriminant minimize risks to the patient and improve health outcomes. model of the needle requires deeper and more complex However, the procedure of needle and transducer position- convolutional networks, which increases the computational ing is extremely challenging and possible external guidance complexity. Therefore, more computationally efficient net- tools would add to the complexity and costs of the procedure. works such as our proposed ShareFCN are preferred over Instead, an automated localization of the needle in 3D US can patch classification methods. overcome 2D limitations and facilitate the ease of use of such Furthermore, as discussed in “Introduction” section, the transducers, while ensuring an accurate needle guidance. In US beamsteering angle of a phased-array transducer varies this work, we have introduced a novel image processing sys- for each region in the field of view. Consequently, US reflec- tem for detecting needles in 3D US data, which achieves tions from different parts of a needle will vary largely, such very high precision at a low false negative rate. This high Orientation error ε ( ) Tip error ε (mm) Orientation error ε ( ) Tip error ε (mm) v t Orientation error ε () Tip error ε (mm) v t International Journal of Computer Assisted Radiology and Surgery precision is achieved by exploiting dedicated convolutional Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecomm networks for needle segmentation in 3D US volumes. The ons.org/licenses/by/4.0/), which permits unrestricted use, distribution, proposed networks are based on CNN, which is improved by and reproduction in any medium, provided you give appropriate credit proposing a new update strategy to handle highly imbalanced to the original author(s) and the source, provide a link to the Creative datasets by informed resampling of non-needle voxels. Fur- Commons license, and indicate if changes were made. thermore, novel modeling of 3D US context information is introduced using 2.5D data of multi-view thick-sliced FCN. Our proposed patch classification and semantic segmen- References tation systems are evaluated on several ex-vivo datasets and outperform classification of the state-of-the-art handcrafted 1. Barva M, Uhercík ˇ M, Mari JM, Kybic J, Duhamel JR, Liebgott H, features, achieving 78 and 80% F1-scores in the chicken Hlavac V, Cachard C (2008) Parallel integral projection transform breast data, respectively. This shows the capability of CNN in for straight electrode localization in 3-D ultrasound images. IEEE modeling more semantically meaningful information in addi- Trans Ultrason Ferroelectr Freq Control (UFFC) 55(7):1559–69 2. Beigi P, Rohling R, Salcudean SE, Ng GC (2016) Spectral analysis tion to simple shape features, which substantially improves of the tremor motion for needle detection in curvilinear ultrasound needle detection in complex and noisy 3D US data. Fur- via spatiotemporal linear sampling. Int J Comput Assist Radiol thermore, our proposed needle segmentation method based Surg 11(6):1183–1192 on 2.5D US information achieves 84% F1-score in datasets 3. Chen LC, Papandreou G, Schroff F, Adam H (2017) Rethinking Atrous convolution for semantic image segmentation. ArXiv e- of porcine leg that are acquired with a lower-resolution prints. ArXiv:1706.05587 phased-array transducer. These results show a strong seman- 4. Fischler MA, Bolles RC (1981) Random sample consensus: tic modeling of the needle context in challenging situations, paradigm for model fitting with applications to image analysis. where the intensity of the needle is inconsistent and even Commun ACM 24(6):381–95 5. Glorot X, Bordes A, Bengio Y (2011) In: Proceedings of the partly invisible. 14th international conference on artificial intelligence and statistics Quantitative analysis of localization error with respect to (AISTATS) 2011, vol 15. Fort Lauderdale, FL, USA. pp 315–323. the needle length shows that the tip error is less than 0.7 mm http://proceedings.mlr.press/v15/glorot11a.html for needles of only 5 mm long and 10 mm long at voxel size of 6. Kingma DP, Ba J (2015) ADAM: a method for stochastic opti- mization. In: International conference on learning representations 0.2 and 0.36 mm, respectively. Therefore, the system is able (ICLR). ArXiv:1412.6980 to accurately detect short needles, enabling the physician to 7. Long J, Shelhamer E, Darrell T (2015) Fully convolutional net- correct inaccurate insertions at early stages in both higher- works for semantic segmentation. In: Conference on computer resolution and lower-resolution datasets. Furthermore, the vision pattern recognition (CVPR). ArXiv:1411.4038 8. Mwikirize C, Nosher JL, Hacihaliloglu I (2018) Signal attenuation needle is visualized intuitively by its in-plane view while maps for needle enhancement and localization in 2D ultrasound. ensuring that the tip is always visible, which eliminates the Int J Comput Assist Radiol Surg 13:363–374 need for advanced manual coordination of the transducer. 9. Papalazarou C, de With PHN, Rongen P (2013) Sparse-plus-dense- Future work will evaluate the proposed method in even RANSAC for estimation of multiple complex curvilinear models in 2D and 3D. Pattern Recognit 46(3):925–35 more challenging in-vivo datasets with suboptimal acquisi- 10. Pourtaherian A, Ghazvinian Zanjani F, Zinger S, Mihajlovic N, Ng tion settings. Due to the complexity of data from interven- G, Korsten H, With P (2017) Improving needle detection in 3D tional settings, larger datasets need to be acquired for training ultrasound using orthogonal-plane convolutional networks. Med more sophisticated networks. Moreover, further analysis is Image Comput Comput Assist Interv (MICCAI) 2:610–618 11. Pourtaherian A, Mihajlovic N, Zinger S, Korsten HHM, de With required to limit the complexity of CNN with respect to its PHN, Huang J, Ng GC (2016) Automated in-plane visualization performance for embedding this technology as a real-time of steep needles from 3D ultrasound volumes. In: Proceedings on application. IEEE international ultrasonics symposium (IUS), pp 1–4 12. Pourtaherian A, Scholten H, Kusters L, Zinger S, Mihajlovic N, Kolen A, Zou F, Ng GC, Korsten HHM, de With PHN (2017) Medi- Compliance with ethical standards cal instrument detection in 3-dimensional ultrasound data volumes. IEEE Trans Med Imaging (TMI) 36(8):1664–75 13. Pourtaherian A, Zinger S, de With PHN, Korsten HHM, Mihajlovic N (2015) Benchmarking of State-of-the-Art needle detection algo- Conflict of interest This research was conducted in the framework of rithms in 3D ultrasound data volumes. Proc SPIE Med Imaging “Impulse-2 for the healthcare flagship—topic ultrasound” at Eindhoven 9415: 94152B–1–8 University of Technology in collaboration with Catharina Hospital 14. Prasoon A, Petersen K, Igel C, Lauze F, Dam E, Nielsen M (2013) Eindhoven and Royal Philips. Deep feature learning for knee cartilage segmentation using a tripla- nar convolution network. In: International conference on medical Ethical approval All procedures performed in studies involving ani- image computing and computer-assisted intervention (MICCAI), mals were in accordance with the ethical standards of the institution or pp 599–606 practice at which the studies were conducted. 15. Rowley H, Baluja S, Kanade T (1998) Neural network-based face detection. IEEE Trans Pattern Anal Mach Intell (PAMI) 20(1):23– Informed consent This articles does not contain patient data. 123 International Journal of Computer Assisted Radiology and Surgery 16. Shelhamer E, Long J, Darrell T (2017) Fully convolutional net- 21. Uhercík ˇ M, Kybic J, Zhao Y, Cachard C, Liebgott H (2013) Line works for semantic segmentation. IEEE Trans Pattern Anal Mach filtering for surgical tool localization in 3D ultrasound images. Intell (PAMI) 39(4):640–651 Comput Biol Med 43(12):2036–45 17. Simonyan K, Zisserman A (2015) Very deep convolutional net- 22. van der Maaten L, Hinton G (2008) Visualizing high-dimensional works for large-scale image recognition. In: International confer- data using t-SNE. J Mach Learn Res 9:2579–605 ence learning representations (ICLR). ArXiv:1409.1556 23. Yang X, Yu L, Li S, Wang X, Wang N, Qin J, Ni D, Heng PA 18. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov (2017) Towards automatic semantic segmentation in volumetric R (2014) Dropout: a simple way to prevent neural networks from ultrasound. In: Medical image computing and computer-assisted overfitting. J Mach Learn Res 15:1929–58 intervention (MICCAI), pp 711–719 19. Sundaresan V, Bridge CP, Ioannou C, Noble JA (2017) Automated 24. Yu F, Koltun V (2016) Multi-scale context aggregation by dilated characterization of the fetal heart in ultrasound images using fully convoluions. In: International conference on learning representa- convolutional neural networks. In: IEEE international conference tions (ICLR). ArXiv:1511.07122 on biomedical imaging (ISBI), pp. 671–674 25. Zeiler MD, Fergus R (2014) Visualizing and understanding con- 20. Tieleman T, Hinton G (2012) Lect. 6.5-RmsProp: divide gradient volutional networks. In: European conference on computer vision by running average of its recent magnitude. COURSERA: Neural (ECCV), Springer, New York, pp 818–833 Net. for Machine Learning

Journal

International Journal of Computer Assisted Radiology and SurgerySpringer Journals

Published: May 31, 2018

References

You’re reading a free preview. Subscribe to read the entire article.


DeepDyve is your
personal research library

It’s your single place to instantly
discover and read the research
that matters to you.

Enjoy affordable access to
over 18 million articles from more than
15,000 peer-reviewed journals.

All for just $49/month

Explore the DeepDyve Library

Search

Query the DeepDyve database, plus search all of PubMed and Google Scholar seamlessly

Organize

Save any article or search result from DeepDyve, PubMed, and Google Scholar... all in one place.

Access

Get unlimited, online access to over 18 million full-text articles from more than 15,000 scientific journals.

Your journals are on DeepDyve

Read from thousands of the leading scholarly journals from SpringerNature, Elsevier, Wiley-Blackwell, Oxford University Press and more.

All the latest content is available, no embargo periods.

See the journals in your area

DeepDyve

Freelancer

DeepDyve

Pro

Price

FREE

$49/month
$360/year

Save searches from
Google Scholar,
PubMed

Create lists to
organize your research

Export lists, citations

Read DeepDyve articles

Abstract access only

Unlimited access to over
18 million full-text articles

Print

20 pages / month

PDF Discount

20% off