Abstract

The dynamics of fish length distribution is a key input for understanding fish population dynamics and taking informed management decisions on exploited stocks. Nevertheless, in most fisheries, the length of landed fish is still measured by hand. As a result, length estimation is precise at the fish level but, owing to the inherent high cost of manual sampling, the sample size tends to be small. Accordingly, the precision of population-level estimates is often suboptimal and prone to bias when properly stratified sampling programmes are not affordable. Recent applications of artificial intelligence to fisheries science are opening a promising opportunity for the massive sampling of fish catches. Here, we present the results obtained using a deep convolutional network (Mask R-CNN) for unsupervised (i.e. fully automatic) European hake length estimation from images of fish boxes automatically collected at the auction centre. The estimated mean fish length at the box level is accurate: for average lengths ranging from 20 to 40 cm, the root-mean-square deviation was 1.9 cm, and the maximum deviation between the estimated and the measured mean body length was 4.0 cm. We discuss the challenges and opportunities that arise with the use of this technology to improve data acquisition in fisheries.

Introduction

Ensuring fish stock sustainability while maximizing fishers’ profitability is an elusive and still unsolved problem (Hilborn, 2007; Iudicello et al., 2012). Solving this puzzle is especially urgent in the case of the Mediterranean fleets because they have been going through a deep crisis for decades, a crisis attributed to the continuous decrease in the sale price of fish, which translates into a continuous decrease in the number of boats and a very low recruitment rate of young fishermen (Palmer et al., 2017).
At least in the case of the Balearic Islands, the root of the problem seems to be related more to the commercialization of the product than to the state of conservation of the stocks (Reglero and Morales-Nin, 2008; Morales-Nin et al., 2010; Maynou et al., 2013). Comanagement is one of the strategies aimed at reconciling stock sustainability with fishers’ profits (d’Armengol et al., 2018). Several small-scale fisheries are currently comanaged in the Balearic Islands (e.g. Aphia minuta and Coryphaena hippurus), suggesting that fishers are willing to adopt this type of strategy. Comanagement delivers both ecological and social benefits (d’Armengol et al., 2018), but periodically updated reviews of the results are mandatory in order to adopt short-term, operational management decisions. Moreover, those decisions must be informed by accurate and precise data. Similarly, conventional fishery models may inform on the mid- and long-term trends of the exploited stocks but must also be fed with accurate and precise data. Fish length is one of the key variables needed both for taking short-term management decisions and for modelling stock trends. Nevertheless, in almost all fisheries, the estimation of the length of landed fish is still done by hand. Length measures are precise enough for those purposes, but since observer costs are high, the sample size used for estimating length at the population level tends to be relatively small. Accordingly, the estimation of the length distribution at any given time may be imprecise and prone to bias when properly stratified sampling programmes are not affordable. In contrast with the relatively low efficiency of observers, a massive number of images can be processed by computer vision. Hake (Merluccius merluccius) is considered overfished in the Mediterranean, with an alarming 20% reduction in catches in the last 20 years.
Overfishing in the Balearic Islands has been considered moderate (FAO, 2016), but the overall status in the Mediterranean is considered critical (FAO, 2018). Moreover, this species represents an economically relevant fraction of the landings in the Balearic Islands (Palmer et al., 2009, 2017). Accordingly, in this study, we propose the hake as a case study species for implementing computer vision techniques for massively estimating fish length from images. In particular, we propose to adapt an existing convolutional neural network (CNN, Mask R-CNN; He et al., 2017) to the problem at hand. This strategy is technically feasible in Mallorca (Balearic Islands) because images of fish boxes are routinely obtained at the conveyor belt, just at the bidding moment, in the auction centre. Therefore, at any port with similar facilities (fish on a conveyor belt is common practice elsewhere), length estimates for all the fish boxes sold in a day, every day of the year, could be obtained at an affordable cost, thus fulfilling the data requirements for taking informed operational decisions at the short-term scale needed for comanagement and, at the same time, for monitoring the mid- and long-term trends of the stocks. Some of the earliest attempts at using computer vision techniques for length measurement of fish were reported by Arnarson et al. (1991) and Strachan (1993). In both cases, a camera was placed on top of a conveyor belt where fish passed by, one at a time. The illumination conditions were controlled in such a way that fish were much darker than the background; therefore, a simple illumination threshold was used to detect them. Once detected, their orientation was determined and normalized, and the (possibly curved) line from nose to tail fork was computed. The length of this line was used to estimate the actual length of the fish.
More complex versions included edge detection, colour calibration, and the distinction between roundfish and flatfish (White et al., 2006). However, the system setting (conveyor belt with controlled lighting) remained similar. Similarly, in Abdullah et al. (2009), pictures of individual fish were used as input, and edge-and-corner detection methods were applied to estimate the positions of head and tail, from which the length was computed. Detection and measurement of live fish in underwater images is more challenging. Concerning detection, body silhouettes have been extracted using edge detection techniques under controlled illumination conditions (Hardin, 2006; Zion et al., 2007; Miranda and Romero, 2017). Stereo methods and 3D models have been proposed to concurrently estimate the fish length and the distance of the free-swimming fish from the camera (Petrell et al., 1997; Tillett et al., 2000; Díaz-Gil et al., 2017). Image enhancement techniques for the correction of colour and illumination have also been implemented (Martinez-de Dios et al., 2003; Costa et al., 2006; Al-Jubouri et al., 2017). A common characteristic of these methods is that, once the distance to the camera and the illumination have been normalized, they use classical image processing techniques (segmentation by thresholding or edge/corner detection) to extract the fish features of interest. However, those conventional image processing techniques have been progressively replaced by methods based on machine learning for the tasks of detection and classification. One of the first applications of machine learning was the detection of human faces (Viola and Jones, 2004). Subsequently, techniques based on learning algorithms called support vector machines, together with local image descriptors (Dalal and Triggs, 2005), obtained notable results in the detection of various types of objects (faces, vehicles, pedestrians, etc.).
The use of CNNs for pattern recognition received a strong boost in the 1990s with the use of new optimization techniques for network training (LeCun et al., 1998). In 2012, the use of graphical processing units (GPUs) allowed the implementation of CNNs with many layers (deep networks), trained on large amounts of data, that exceeded human performance in image classification tasks (Krizhevsky et al., 2012), giving rise to the current boom of deep learning (GPUs are hardware devices that speed up the computations needed to train a neural network). Since then, increasingly deep networks have been proposed and applied to detection, classification, and segmentation. Some of the most popular deep learning models for detection are YOLO (Redmon et al., 2016) and Mask R-CNN (He et al., 2017). In the case of fish detection, the use of deep learning techniques is incipient and faces the additional problem that fish are not rigid objects, so networks must learn how to adapt to changes in posture, position, and scale. Nevertheless, fish recognition has been achieved using a binary classifier (Marini et al., 2018) or a neural network with only two convolutional layers (Qin et al., 2016). In French et al. (2015, 2019), a CNN was designed for counting fish in video. In addition, existing network architectures (e.g. LeNet, AlexNet, GoogLeNet, and YOLO) have been used for fish classification (Chen et al., 2017; Meng et al., 2018; Villon et al., 2018). Monkman et al. (2019) describe a system for the measurement of fish detected using R-CNNs. The goal of the present study is to automatically obtain the fish length from fish box images obtained in ports. In our case study, hakes are arranged inside a fish box in such a way that in most cases the tails are occluded and a complete view is available for only a few fish (see Figure 2). Accordingly, in the case of hake, the target object to be detected cannot be the whole fish but only a part of it.
Fortunately, many complete heads are visible in the images; thus, fish heads have been the target object with which the network has been trained. In our implementation, we use a network architecture similar to that of recently published papers (French et al., 2019; Monkman et al., 2019), the main difference being the set of pictures used for training. This set of pictures must necessarily be different for each application, since the network must be fine-tuned for each specific task. Another important difference is the final goal of each system. In our case, we want to measure the fish, even when they are partially occluded. Monkman et al. (2019) seek a similar goal, but they use pictures from online sources for training (not pictures from the auction centre, as we do), their CNN does not provide a segmentation of the images, and, more importantly, their system cannot cope with occlusions. In French et al. (2019), the goal is to classify fish in video from CCTV cameras installed on fishing trawlers. Since their goal is identification and not measurement, they can use either partial or full detections, as long as the detected parts permit identifying the fish; their system does not need to deal with the problem of inferring the whole length of the fish from partial detections. In contrast to previously published works on fish length estimation, the system proposed in the current paper is able to deal with partial occlusions and develops statistical models that permit estimating the total fish length from the length of the detected heads.

Material and methods

Images

Three sets of images of hake boxes were used for the study. The photos were obtained with the same webcam (pixel resolution of 1280 × 760). The first set (562 images) of hake boxes was obtained at the conveyor belt in the auction centre of Palma.
The camera was placed top-down, just over the fish boxes, and the images were taken at the bidding moment, when the conveyor belt stops for a while. The second set (56 images) was obtained at the laboratory with the same camera setting. For the network implementation, 163 randomly selected images from the first set and 14 randomly selected images from the second set were used. From these 177 images, 2112 heads for the training steps and 490 heads for the validation steps of the network (a total of 2602 heads) were manually annotated using the LABELBOX software (https://labelbox.com/). The head has been defined here as the area from the mouth to the pelvic fin (Figure 1). Figure 1. Head definition: from the mouth to the pelvic fin (modified from European Commission data; https://mare.istc.cnr.it/fisheriesv2/javax.faces.resource/images/species/HKE_l.jpg.xhtml). The detection performance of the trained network was assessed using 42 images from the second set, containing 200 visible heads that had not been used for training. These 200 fish were also used to implement a statistical model relating the total fish length (in centimetres) to the head length (in pixels) measured from the output of the network. Finally, the third set (10 images) was also obtained at the conveyor belt in the auction centre. These images were used neither for training the network nor for building the statistical model. The model-based estimates of fish length (centimetres) obtained for the fish heads detected by the network in these images were compared with the actually measured fish lengths (centimetres) to assess the accuracy and precision of the whole system.
Figure 4 gives an overview of the sampling protocol used to train the neural network, build the statistical model, and measure the performance of the proposed system. Since fishermen sort hake, and most of the landings, into boxes by species, there is no need for any preliminary classification task. Some examples of the images used as input of the system are displayed in Figure 2. Figure 2. Examples of input images. Note that whereas the head is visible from the ventral side, the body tends to be partly hidden.

Network implementation

The network used (Mask R-CNN; He et al., 2017) is a simple, flexible, and general-purpose network for object instance segmentation, which implies not only recognizing all the objects (or instances) of the target category in a given image but also accurately segmenting them. Mask R-CNN is based on Faster R-CNN (Ren et al., 2015), which focuses on object detection (i.e. each target object is enclosed in a rectangular bounding box), with an extension for creating a segmentation mask of the target object within the bounding box. Mask R-CNN consists of two CNNs that work in parallel: a “backbone architecture” for the extraction of features over the entire image and a “head architecture” for recognizing regions of interest and producing a mask over them. A scheme of the network architecture is displayed in Figure 3. The developers of Mask R-CNN have demonstrated that the proposed architecture outperforms more complex networks and that the best results are obtained with a ResNet-FPN (Lin et al., 2017) backbone with 101 layers and a fully convolutional network head consisting of six convolutional layers. Figure 3. Mask R-CNN architecture [modified from He et al. (2017)]. The implementation of Mask R-CNN used here is available on GitHub (https://github.com/matterport/Mask_RCNN). The network uses pre-trained weights from the COCO dataset (a public dataset of images available at http://cocodataset.org/); however, these weights must be fine-tuned for each case-specific target using a user-defined dataset. Here, the network has been fine-tuned using 2602 heads (Figure 4). Concerning the training process, the setting was 100 epochs, with 200 steps per epoch and 50 validation steps. The learning rate was 0.002, and only the weights of the head branch of the network were learned using the training set. Figure 4. General overview of sampling design and analyses workflow. The figures in the top row describe the set of samples used for training and validation of the neural network (a) and for computing the parameters of the statistical model (b). The figure at the bottom (c) depicts the process for computing the system performance both at fish level and at box level (see text for details).

Evaluation metrics

The performance of the system is assessed in two ways. First, the detection performance of the network is evaluated in terms of the percentage of false positives among all the detected objects. A detection is deemed a false positive if the detected object is not a fish head; conversely, a false negative is a fish head that goes undetected.
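As an illustration, these two quantities reduce to simple ratios; the sketch below plugs in the counts reported later in the Results section (200 visible heads, 175 correct detections, 2 false positives):

```python
def detection_rates(n_ground_truth, n_true_positives, n_false_positives):
    """Detection rate (share of visible heads that are found) and
    false-positive share (share of detections that are not heads)."""
    n_detections = n_true_positives + n_false_positives
    return (n_true_positives / n_ground_truth,
            n_false_positives / n_detections)

# Counts from the Results section of this study.
rate, fp_share = detection_rates(200, 175, 2)
print(f"detection rate = {rate:.1%}, false-positive share = {fp_share:.1%}")
```

Note that the false-positive share is computed over the 177 detections, not over the 200 ground-truth heads, matching the definition above.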
Note that in the specific context of our research, false positives are more relevant than false negatives since we have at our disposal a huge amount of data (a large number of boxes may be photographed each day, each box containing several fish) and, even if we miss some correct fish heads, provided that most of the detections are correct, we shall be able to build an accurate statistical model of the fish length distribution. Second, the measurement performance of the system is assessed by comparing the obtained length estimates with the actual values of length, after manual measurement of the fish. The system computes the length of a fish as follows: (i) the length (in pixels) of the detected head is computed (let us denote this value HLpix), (ii) the length (in centimetres) of the detected head is computed from HLpix using a statistical model (let us denote this value HLcm), and (iii) the total length (TL, in centimetres) of the fish is inferred from HLcm using a second statistical model. The sampling protocol and the statistical models have been designed to deal with three sources of variability, namely (i) variability between repeated estimates of HLpix from the same fish (i.e. repeated measures of HLpix for the same fish from different images after changing fish posture), (ii) variability related to the relationship between HLpix and HLcm, and (iii) variability related to the relationship between HLcm and TL. The statistical model is structured into three submodels. The first submodel assumes that the jth repeated measurement of HLpix for the same fish i from different images after changing fish posture is normally distributed around a mean value HLpix̄_i, with a standard deviation σ_HLpix (hereafter, overlined quantities refer to expected values):

HLpix_i,j ~ N(HLpix̄_i, σ_HLpix). (1)

To estimate the parameters of this submodel, 135 measures of HLpix corresponding to 55 fish (i = 1–55) were obtained.
Specifically, fish were visually labelled with an ID and placed in fish boxes in groups of three to five fish. Several images of each group were taken after changing the posture of all fish in the box. Those images were submitted to the unsupervised routine for head detection described in the previous section. The median number of repeated measures of HLpix per fish was 3. Concerning the relationship between HLpix and HLcm (second submodel), a linear model with zero intercept was considered:

HLcm̄_i = β_head HLpix̄_i
HLcm_i ~ N(HLcm̄_i, σ_HLcm), (2)

where β_head is the slope of the linear relationship. To estimate the parameters of this submodel, HLcm values of the same 55 fish were measured with a ruler by the same observer. In a preliminary analysis, four repeated measures from 70 fish showed that the standard deviation of the observer’s measurement error for either HLcm or TL was 0.2 cm and that this measurement error was independent of fish size. Accordingly, this uncertainty source was considered negligible and hereafter ignored. Concerning the relationship between HLcm and TL (third submodel), the four linear models resulting from log-transforming or not these variables were compared. The model finally selected (see results in the next section) was as follows:

log TL̄_i = α_body + β_body log(HLcm_i)
log TL_i ~ N(log TL̄_i, σ_TL), (3)

where α_body and β_body are the intercept and the slope of the linear relationship, respectively. To estimate the parameters of this submodel, TL values of the same 55 fish were measured with a ruler by the same observer (Figure 4). However, since the uncertainty at this level was larger than expected (see Figure 6), TL and HLcm were measured for 143 additional fish; therefore, the sample size for the third submodel was 198 fish. The parameters of the integrated model (i.e. combining the three submodels into a single analysis) were estimated using a Bayesian approach and Markov chain Monte Carlo (MCMC) methods (Kruschke, 2010).
Three independent chains were run. The convergence of the MCMC chains was assessed by visual inspection of the chains and evaluated using the Gelman–Rubin statistic (Plummer et al., 2006). Virtually flat priors were used: normal distributions with zero mean and a very large variance were assumed for HLpix̄, β_head, α_body, and β_body. Gamma distributions (rate = 0.01, scale = 0.01) were assumed for the precisions corresponding to the three standard deviations (σ_HLpix, σ_HLcm, and σ_TL). The posterior distribution was estimated from at least 30 000 valid iterations after an appropriate burn-in (the first 10 000 iterations were discarded) and thinning (only one of every ten iterations was kept because, at this thinning level, the MCMC did not show autocorrelation). Additional technical details are available in the R script provided in the Supplementary material, which, along with the input data, allows reproducing the results reported here. The accuracy of the TL predictions obtained from new HLpix measures was assessed in two ways. First, a randomly selected measure of HLpix for each of the 55 fish available for submodels 1 and 2 was used to predict TL after properly propagating uncertainty at the three considered levels. The predicted value of TL was then compared with the TL actually measured by the observer (fish-level performance in Figure 4). Second, ten new images of hake boxes were obtained at the auction centre and submitted to the unsupervised routine for fish head segmentation described in the previous section. Moreover, a random sample of the fish in each box was measured (TL, centimetres) by an observer. Given that the fish for which HLpix was available may be different from the fish for which TL was available, accuracy of the mean fish size at the box level was assessed instead of fish-level accuracy (box-level performance in Figure 4).
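The uncertainty-propagation step can be sketched with a simple Monte Carlo simulation. This is only a sketch: it plugs in the posterior medians reported in Table 1 as fixed values, whereas the actual analysis draws the parameters jointly from the MCMC posterior; the 52-pixel head length used as input is a hypothetical value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Posterior medians from Table 1 (a full treatment would sample these
# jointly from the MCMC posterior rather than fixing them).
BETA_HEAD, SIGMA_HLPIX = 0.110, 3.153
ALPHA_BODY, BETA_BODY = 1.468, 0.998
SIGMA_HLCM, SIGMA_TL = 0.168, 0.088

def predict_TL(hlpix_new, n_draws=100_000):
    """Propagate the three uncertainty levels of Equations (1)-(3)
    to obtain a predictive sample of total length (cm)."""
    # Level 1: posture-related noise on the pixel measurement.
    hlpix = rng.normal(hlpix_new, SIGMA_HLPIX, n_draws)
    # Level 2: pixel-to-centimetre conversion with residual noise.
    hlcm = rng.normal(BETA_HEAD * hlpix, SIGMA_HLCM)
    # Level 3: log-linear head-length-to-total-length relationship.
    log_tl = rng.normal(ALPHA_BODY + BETA_BODY * np.log(hlcm), SIGMA_TL)
    return np.exp(log_tl)

tl = predict_TL(52.0)  # hypothetical head length of 52 pixels
print(np.round(np.percentile(tl, [2.5, 50, 97.5]), 1))  # predictive interval
```

The width of the resulting predictive interval reflects all three variability sources at once, which is what the fish-level precision figures reported in the Results quantify.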
Results

The Mask R-CNN was successfully implemented according to the developers’ specifications and fine-tuned with a dataset composed of 2602 manually segmented heads. Concerning the detection performance of the implemented system, in the 42 photos used as input, a total of 200 visible hake heads were identified by an observer. Taking this figure as ground truth, the network correctly identified 175 hake heads, which represents a success rate of 87%. Concerning false positives, two cases were detected (1%). Some examples of the output of the network are displayed in Figure 5. Figure 5. Some segmentation results. Regarding the measurement performance of the system (the accuracy and precision attained when estimating the fish length itself), the relationship between HLcm and TL showed that the four linear models considered in the previous section (using log-transformed values or not) had an excellent explanatory power, with r (Pearson correlation coefficient) larger than 0.9 (note that throughout the article the terms bias and (in)accuracy, and the terms variability and (im)precision, are used interchangeably). However, log(TL) vs. log(HLcm) was finally selected because it showed the smallest deviance information criterion and normally distributed residuals. All the parameters of the model were successfully estimated (Table 1) using the Bayesian approach described in the previous section.

Table 1. Median and 95% Bayesian credibility interval of the posterior distribution for all the model parameters.

Parameter   2.5%    Median  97.5%   R̂       Neff
α_body      1.387   1.468   1.548   1.001   30 000
β_body      0.955   0.998   1.041   1.001   30 000
β_head      0.109   0.110   0.112   1.001   30 000
σ_TL        0.080   0.088   0.098   1.001   30 000
σ_HLcm      0.082   0.168   0.266   1.001   10 000
σ_HLpix     2.724   3.153   3.674   1.001   16 000

Values of R̂ close to 1 denote convergence of the MCMC chains. Neff is a measure of the effective sample size of the posterior distribution. Note that the σ values are on different scales and are not directly comparable.

For assessing the accuracy of the measures at the fish level (the degree of closeness of estimates of a quantity to that quantity’s true value), TL_true of 55 fish ranging from 20 to 27.5 cm (i.e. the actually measured length) was compared with TL_est, the value estimated from HLpix by the model. The obtained results are displayed in Figure 6.
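Reading Table 1, a plug-in point prediction simply chains the posterior medians through submodels 2 and 3. Note that because β_body is close to 1, the fitted relationship is nearly proportional: TL ≈ exp(α_body) × HLcm, i.e. roughly 4.3 times the head length in centimetres. The 52-pixel input below is a hypothetical value for illustration.

```python
import math

# Posterior medians from Table 1, treated as plug-in point estimates.
ALPHA_BODY, BETA_BODY, BETA_HEAD = 1.468, 0.998, 0.110

def tl_point_estimate(hlpix):
    """Plug-in total-length estimate (cm) from a head length in pixels."""
    hlcm = BETA_HEAD * hlpix  # submodel 2: pixels -> centimetres
    return math.exp(ALPHA_BODY + BETA_BODY * math.log(hlcm))  # submodel 3

# Because beta_body is ~1, the model is nearly proportional:
# TL ~ exp(alpha_body) * HLcm, i.e. ~4.3 x head length in cm.
print(round(math.exp(ALPHA_BODY), 2))
print(round(tl_point_estimate(52.0), 1))  # hypothetical 52-px head
```

A point estimate like this ignores the three variability sources quantified by the σ parameters, which is why the validation below reports both accuracy and precision.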
The root-mean-square deviation (RMSD) was 1.7 cm, and the median of the unsigned deviations was 1.1 cm, suggesting that the system is accurate. However, precision (the dispersion of predicted values for a given observed value) should be improved because the average interquartile range was 10.0 cm. Note that one random repeated measure of HLpix was used for assessing the precision and that the uncertainty at the three levels considered (posture-related error when measuring HLpix, imperfect relationship between HLpix and HLcm, and imperfect relationship between HLcm and TL) has been properly propagated. Thus, this precision estimate is what can be expected when a new value of HLpix is used for estimating TL. Figure 6. Variability at the three levels of uncertainty considered: (a) posture-related variability of the repeated measures of HLpix, with horizontal lines connecting the repeated measures of the same fish head; (b) relationship between HLpix and HLcm; and (c) relationship between HLcm and TL. Finally, the system performance for estimating TL from HLpix at the box level was assessed using ten new fish boxes sampled at the auction centre (Figure 4). Two independent samples of fish from each box were used: one for estimating HLpix and one manually measured by an observer to obtain TL_true. Again, the total fish length was estimated from HLpix using the model described in the previous section. The observed vs. estimated box-level mean fish lengths are shown in Figure 7.
In that case, the RMSD was 1.9 cm, the median of the unsigned deviations was 0.5 cm, and the maximum deviation reported was 4.0 cm, suggesting that the system is accurate at the box level too. Figure 7. Observed vs. estimated box-level mean fish length. Each point represents the mean fish length of a sample of fish in that box. The dashed lines around a point denote the between-fish standard deviation in each box. The numbers denote the sample size (number of fish) used for estimating the mean and standard deviation. Note that the fish measured are not necessarily the same fish detected by the network; thus, sample sizes may differ. The thick line denotes perfect agreement between observed and estimated fish length.

Discussion

In line with many other successful applications of deep learning in a wide range of domains, in this study we implemented an automatic system that uses images of captured fish at landing to identify fish heads in those images and estimate fish length from head length. The core component of the system is a deep neural network that detects and delineates the contours of the objects of interest (or instances, in the deep learning jargon). Here, instead of developing a new network from scratch, a pre-trained Mask R-CNN network was successfully implemented for identifying hake heads.
This strategy implies that the network must be fine-tuned with a relatively large database of examples of hake heads; in this case, the contours of 2602 hake heads were manually segmented from images. The performance of the Mask R-CNN network implemented in this way for detecting hake heads is noteworthy. The majority of the heads in an image are properly detected (87%) but, more interestingly for the specific case studied here, the rate of false positives is negligible (1%). The specificities of the case prevent whole fish contours from being efficiently detected in the images, as fish tails are systematically occluded when fishermen prepare the fish boxes. Certainly, other species are not sorted in this careful way, but even when the contour of the whole fish is visible in an image, the flexible nature of fish would complicate the detection step for the Mask R-CNN because the neural network would have to be trained with examples of differently bent fish. Conversely, the rigid nature of fish heads alleviates this problem but introduces a new handicap because the final objective here is not to detect fish heads but to estimate fish size. The raw output of Mask R-CNN (the segmentation mask of the pixels belonging to a given head) was first used to extract the head length in pixels. The second step was to transform this head measure from pixels to centimetres, and the final step was to infer fish length from head length. Given that each of these steps may introduce some uncertainty, a validation protocol was implemented for assessing the overall performance of the system in terms of accuracy and precision, as well as for providing a reliable confidence interval for fish length estimates when new measures of head length are provided by the network. A sample of fish was measured by an observer with a ruler, and these empirical measurements were compared with the estimates provided by the system developed here.
The median of the unsigned measured-estimated differences was 1.1 cm at the fish level and 1.9 cm at the box level (i.e. mean fish length), suggesting that the system should be considered accurate, at least at the mid-range of the considered sizes. However, our results show that there is room for improving the precision of the system. Individual-level precision (measured as the interquartile range) for a newly measured fish length was around ±10.0 cm for fish in the range of 20–27.5 cm. The estimates of the standard deviation related to the three variability sources considered suggest that they contribute similarly to this suboptimal precision at the fish level. Uncertainty related to head posture may be especially relevant. It is plausible that precise delineation of the fish head contour depends on fish posture; thus, increasing the number of examples of heads in different postures when training the network may alleviate the problem. The morphometric relationship between head size and fish length reported in this study shows larger uncertainty than those reported elsewhere for the same species and a similar length range [Pearson product-moment correlation coefficient, r = 0.92 in our study vs. r = 0.95 and 0.97 in Šantić et al. (2011) and Philips (2014), respectively]. Notably, those contributions suggest some sex-related effects, which have not been accounted for in the current context. As stated earlier, conventional, observer-based assessment of fish length is precise at the fish level but, due to the inherent high costs of manual sampling, the sample size will be far smaller than the massive sample size that can potentially be processed using deep learning. A proper comparison of the effects of using observer-based data vs. deep learning data when assessing fish stock dynamics is beyond the scope of this contribution but certainly deserves further attention.
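The correlation coefficients compared above follow the standard Pearson product-moment formula; a small self-contained sketch (real head-length/total-length pairs would replace the toy vectors in the test):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length
    sequences (e.g. head length vs. total fish length)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```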
The hierarchical Bayesian framework proposed here is not only appropriate for providing reliable confidence intervals at the fish level; it can also be expanded to properly propagate such fish-level uncertainty to the fish box level, the boat level, the day level, or any other scale that might be of interest in other case studies. Specifically, in the context of understanding fish population dynamics and taking informed management decisions on exploited stocks, a relatively low precision at the fish level may be largely compensated for by a massive amount of data at upper scales. It is in this context that advancement in marine science is foreseen to be boosted in the next few years, thanks to the capacity for generating massive amounts of data from automatic sensors coupled with high-power computation capabilities (Danovaro et al., 2017; Lowerre-Barbieri et al., 2019). Many techniques associated with artificial intelligence are not new in marine science (e.g. simple neural networks, decision trees, and Bayesian networks). Many of these techniques are used for ecosystem modelling purposes, spatial planning, decision-making, etc. (e.g. Fernandes et al., 2010). However, the field of deep learning is advancing at a faster pace, particularly in applications related to image processing. Until now, most applications of image classification in marine ecology were semi-supervised or supervised (e.g. Marini et al., 2016, 2018; Díaz-Gil et al., 2017). Through deep learning, we exploit the structural characteristics of the data and make use of computation capabilities (Hu et al., 2014), which, in our case study, may offer better performance than other, more conventional ways of data extraction.
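How fish-level uncertainty shrinks when aggregated to the box level can be illustrated with a simple Monte Carlo scheme. This is a deliberate simplification of the hierarchical Bayesian model described in the text; the normal likelihood and all numbers are assumptions for illustration only:

```python
import random

def box_mean_interval(fish_means, fish_sd, n_draws=10000, seed=1):
    """Propagate per-fish uncertainty to the box-level mean:
    draw every fish length from Normal(mean, sd), average each
    draw, and report the central 95% interval of simulated box
    means. fish_sd is a single common fish-level sd (assumption)."""
    rng = random.Random(seed)
    sims = sorted(
        sum(rng.gauss(m, fish_sd) for m in fish_means) / len(fish_means)
        for _ in range(n_draws)
    )
    return sims[int(0.025 * n_draws)], sims[int(0.975 * n_draws)]
```

For instance, with 25 fish estimated at 30 cm each and a fish-level sd of 2 cm, the simulated box-mean interval is roughly ±0.8 cm wide on each side, an order of magnitude tighter than the fish-level uncertainty, which is the sense in which massive sampling compensates for low individual precision.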
We clearly demonstrate that this method, even when only a percentage of the fish in each box can be correctly identified, opens the opportunity for massive fish length sampling of many commercially valuable species without interfering with wharf or fishing operations and activity. Provided that an image of each fish box can be easily obtained and stored when the conveyor belt stops for bidding, the estimated number of pictures (one per box) per day currently arriving to our system is in the order of thousands. This knowledge may enable improving the current size-based biological evaluation models, exploring short-term effects of the environment on the species, controlling undersized individuals, or even analysing price dynamics within the season in relation to size. To this end, we have detected a very positive attitude in the fishery sector. Both the fishermen associations and the wharf owners have facilitated and supported the initiative for extracting lengths automatically from boxes. This suggests that further development of these techniques in the near future is guaranteed. Accordingly, several near-future improvements are envisaged, including (i) the detection of different species in the same image (some boxes contain a mixture of species), (ii) the automatic calibration of cameras for the conversion from pixel lengths to centimetres, and (iii) an improvement in the precision of the estimation of total fish lengths from pelvic lengths.
Supplementary data
Supplementary material is available at the ICESJMS online version of the manuscript.
Acknowledgements
This work has been funded by the projects FOTOPEIX and FOTOPEX2 (2017/2279 and 2018/2002) from Fundación Biodiversidad, through the Pleamar Program. We especially thank OPMALLORCAMAR and the Direcció General de Pesca del Govern de les Illes Balears for supporting these projects.
The work of J-LL was partially supported by grants TIN2017-85572-P and DPI2017-86372-C3-3-R (MINECO/AEI/FEDER, UE). This is a contribution of the Unitat Associada IMEDEA-LIMIA.
References
Abdullah N., Shafry M., Rahim M., Amin I. M. 2009. Measuring fish length from digital images (FileDi). In Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, pp. 38–43. ACM.
Al-Jubouri Q., Al-Nuaimy W., Al-Taee M., Young I. 2017. An automated vision system for measurement of zebrafish length using low-cost orthogonal web cameras. Aquacultural Engineering, 78: 155–162.
Arnarson H., Bengoetxea K., Pau L. 1991. Vision applications in the fishing and fish product industries. International Journal of Pattern Recognition and Artificial Intelligence, 2: 657–671.
Chen G., Sun P., Shang Y. 2017. Automatic fish classification system using deep learning. In 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 24–29.
Costa C., Loy A., Cataudella S., Davis D., Scardi M. 2006. Extracting fish size using dual underwater cameras. Aquacultural Engineering, 35: 218–227.
Dalal N., Triggs B. 2005. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), 1, pp. 886–893.
Danovaro R., Aguzzi J., Fanelli E., Billett D., Gjerde K., Jamieson A., Ramirez-Llodra E., et al. 2017. An ecosystem-based deep-ocean strategy. Science, 355: 452–454.
d'Armengol L., Castillo M. P., Ruiz-Mallén I., Corbera E. 2018. A systematic review of co-managed small-scale fisheries: social diversity and adaptive management improve outcomes. Global Environmental Change, 52: 212–225.
Díaz-Gil C., Smee S. L., Cotgrove L., Follana-Berná G., Hinz H., Marti-Puig P., Grau A., et al. 2017. Using stereoscopic video cameras to evaluate seagrass meadows nursery function in the Mediterranean. Marine Biology, 164: 137.
FAO. 2016. The State of Mediterranean and Black Sea Fisheries. Technical Report, FAO.
FAO. 2018. The State of World Fisheries and Aquaculture (SOFIA). Technical Report, FAO.
Fernandes J. A., Irigoien X., Goikoetxea N., Lozano J. A., Inza I., Pérez A., Bode A. 2010. Fish recruitment prediction, using robust supervised classification methods. Ecological Modelling, 221: 338–352.
French G., Fisher M., Mackiewicz M., Needle C. 2015. Convolutional neural networks for counting fish in fisheries surveillance video. In Proceedings of the Machine Vision of Animals and Their Behaviour (MVAB), pp. 7.1–7.10. BMVA Press.
French G., Mackiewicz M., Fisher M., Holah H., Kilburn R., Campbell N., Needle C. 2019. Deep neural networks for analysis of fisheries surveillance video and automated monitoring of fish discards. ICES Journal of Marine Science, doi: 10.1093/icesjms/fsz149.
Hardin R. W. 2006. Vision system monitors fish populations. Vision Systems Design, 11: 43–45.
He K., Gkioxari G., Dollár P., Girshick R. 2017. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988.
Hilborn R. 2007. Managing fisheries is managing people: what has been learned? Fish and Fisheries, 8: 285–296.
Hu H., Wen Y., Chua T., Li X. 2014. Toward scalable systems for big data analytics: a technology tutorial. IEEE Access, 2: 652–687.
Iudicello S., Weber M. L., Wieland R. 2012. Fish, Markets, and Fishermen: The Economics of Overfishing. Island Press, Washington, DC.
Krizhevsky A., Sutskever I., Hinton G. E. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1, NIPS'12, pp. 1097–1105. Curran Associates Inc., USA.
Kruschke J. 2010. Doing Bayesian Data Analysis: A Tutorial Introduction with R. Academic Press, Cambridge, MA, USA.
Lecun Y., Bottou L., Bengio Y., Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86: 2278–2324.
Lin T., Dollár P., Girshick R., He K., Hariharan B., Belongie S. 2017. Feature pyramid networks for object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944.
Lowerre-Barbieri S. K., Catalán I. A., Frugård Opdal A., Jørgensen C. 2019. Preparing for the future: integrating spatial ecology into ecosystem-based management. ICES Journal of Marine Science, 76: 467–476.
Marini S., Azzurro E., Coco S., Del Rio J., Enguídanos S., Fanelli E., Nogueras M., et al. 2016. Automatic fish counting from underwater video images: performance estimation and evaluation. In 7th International Workshop on Marine Technology.
Marini S., Fanelli E., Sbragaglia V., Azzurro E., Del Rio Fernandez J., Aguzzi J. 2018. Tracking fish abundance by underwater image recognition. Scientific Reports, 8: 13748.
Martinez-de Dios J. R., Serna C., Ollero A. 2003. Computer vision and robotics techniques in fish farms. Robotica, 21: 233–243.
Maynou F., Morales-Nin B., Cabanellas-Reboredo M., Palmer M., García E., Grau A. 2013. Small-scale fishery in the Balearic Islands (W Mediterranean): a socio-economic approach. Fisheries Research, 139: 11–17.
Meng L., Hirayama T., Oyanagi S. 2018. Underwater-drone with panoramic camera for automatic fish recognition based on deep learning. IEEE Access, 6: 17880–17886.
Miranda J. M., Romero M. 2017. A prototype to measure rainbow trout's length using image processing. Aquacultural Engineering, 76: 41–49.
Monkman G. G., Hyder K., Kaiser M. J., Vidal F. P. 2019. Using machine vision to estimate fish length from images using regional convolutional neural networks. Methods in Ecology and Evolution, doi: 10.1111/2041-210X.13282.
Morales-Nin B., Grau A., Palmer M. 2010. Managing coastal zone fisheries: a Mediterranean case study. Ocean & Coastal Management, 53: 99–106.
Palmer M., Quetglas A., Guijarro B., Moranta J., Ordines F., Massutí E. 2009. Performance of artificial neural networks and discriminant analysis in predicting fishing tactics from multispecific fisheries. Canadian Journal of Fisheries and Aquatic Sciences, 66: 224–237.
Palmer M., Tolosa B., Grau A., Mar Gil C. d., Obregón M., Morales-Nin B. 2017. Combining sale records of landings and fishers knowledge for predicting métiers in a small-scale, multi-gear, multispecies fishery. Fisheries Research, 195: 59–70.
Petrell R., Shi X., Ward R., Naiberg A., Savage C. 1997. Determining fish size and swimming speed in cages and tanks using simple video techniques. Aquacultural Engineering, 16: 63–84.
Philips A. E. 2014. Comparison of some biological aspects between the two sexes of the European hake Merluccius merluccius from the Egyptian Mediterranean waters. The Egyptian Journal of Aquatic Research, 40: 309–315.
Plummer M., Best N., Cowles K., Vines K. 2006. CODA: convergence diagnosis and output analysis for MCMC. R News, 6: 7–11.
Qin H., Li X., Liang J., Peng Y., Zhang C. 2016. DeepFish: accurate underwater live fish recognition with a deep architecture. Neurocomputing, 187: 49–58.
Redmon J., Divvala S., Girshick R., Farhadi A. 2016. You only look once: unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788.
Reglero P., Morales-Nin B. 2008. Relationship between first sale price, body size and total catch of trammel net target species in Majorca (NW Mediterranean). Fisheries Research, 92: 102–106.
Ren S., He K., Girshick R., Sun J. 2015. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39.
Šantić M., Rađa B., Paladin A., Čurić A. 2011. Biometric properties of the European hake, Merluccius merluccius (Osteichthyes: Merlucciidae), from the central Adriatic Sea. Archives of Biological Sciences, 3: 259–267.
Strachan N. 1993. Length measurement of fish by computer vision. Computers and Electronics in Agriculture, 8: 93–104.
Tillett R., McFarlane N., Lines J. 2000. Estimating dimensions of free-swimming fish using 3D point distribution models. Computer Vision and Image Understanding, 79: 123–141.
Villon S., Mouillot D., Chaumont M., Darling E. S., Subsol G., Claverie T., Villéger S. 2018. A deep learning method for accurate and fast identification of coral reef fishes in underwater images. Ecological Informatics, 48: 238–244.
Viola P., Jones M. J. 2004. Robust real-time face detection. International Journal of Computer Vision, 57: 137–154.
White D., Svellingen C., Strachan N. 2006. Automated measurement of species and length of fish by computer vision. Fisheries Research, 80: 203–210.
Zion B., Alchanatis V., Ostrovsky V., Barki A., Karplus I. 2007. Real-time underwater sorting of edible fish species. Computers and Electronics in Agriculture, 56: 34–45.
© International Council for the Exploration of the Sea 2019. All rights reserved. For permissions, please email: journals.permissions@oup.com
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Image-based, unsupervised estimation of fish size from commercial landings using deep learning. ICES Journal of Marine Science. doi: 10.1093/icesjms/fsz216.