Abstract
Swift, non-destructive detection approaches should address the problem of insufficient sensitivity when attempting to obtain and perceive live crab information in low-light environments caused by the crab's phototaxis. We propose a learning-based low-illumination image enhancer (LigED) for effective lighting enhancement and the elimination of darkness in images. The camera response function was combined with the reflectance ground-truth mechanism of image decomposition. Self-attention units were then introduced in the reflectance restoration network to adjust the illumination and avoid visual defects, thus jointly strengthening the adaptability of dark-light enhancement and the ability to perceive crab information. Convolutional neural network (CNN)-based detection methods can further enhance the algorithm's robustness to light and adaptability to different environments, which motivated the development of a scalable lightweight live crab detector (EfficientNet-Det0) utilizing the two-stage compound scaling CNN approach. The lightness order error and natural image quality evaluator based on the proposed methods were 251.26 and 11.60, respectively, and the detection average precision increased by 13.84% to 95.40%. The fastest detection speed for a single image was 91.74/28.41 f·s−1 on a common GPU/CPU, requiring only 15.1 MB of storage, which advocates for the utilization of LigED and EfficientNet-Det0 for the efficient detection of underwater live crabs.
Introduction
Eriocheir sinensis is the most productive freshwater crab owing to its edibility, medicinal properties, and industrial value (Zhao et al., 2019). However, the uneven distribution of river crabs (underwater ecological environment, strong territorial awareness) leads to significant differences in the required feeding density across the pond (Li et al., 2017; Shi et al., 2018; Cao et al., 2021). If this variation is not considered, the disadvantages of insufficient feeding management, low bait utilization, and low automation cannot be addressed. Thus, it is necessary to detect the density distribution of river crabs under natural living conditions and scientifically determine the feeding amount and feeding strategy. However, it is difficult to estimate the river crab biomass and density distribution without human intervention because river crabs are sensitive and move freely in an unconstrained underwater environment in which the visibility and lighting are uncontrollable. So far, river crab biomass estimation has been mainly based on manual experience and manual sampling, which are generally invasive, time-consuming, and labour-intensive (Li et al., 2020a). Therefore, it is imperative and highly desirable to develop a non-invasive, rapid, cost-effective, and light-robust method that is suitable for low-illumination environments.
Related literature
Machine vision can be used to develop non-intrusive, faster, and cheaper methods for the in situ estimation of the river crab biomass and provides a solution for the scientific analysis of the growth status of underwater organisms (Terayama et al., 2019). In machine vision, cameras are used to automatically acquire videos/images and to analyse and extract key information (Li et al., 2020b). Most studies involve a two-dimensional (2D) approach for application to fishery cultures and marine biological observations (Li et al., 2020c).
The most common red/green/blue (RGB) camera sensors are utilized to analyse 2D images based on the colour, geometry, texture, and other visual characteristics of fish to identify, distinguish, and locate them (Gunnam and Shin, 2016; Costa et al., 2019). However, RGB cameras have two notable disadvantages: they are sensitive to lighting conditions and only provide 2D information (without expensive stereo technology; Gené-Mola et al., 2019). Other more expensive cameras, including thermal imagers, multispectral cameras, and hyperspectral cameras, recognize and locate fish by utilizing the temperature or the reflectivity at different wavelengths; however, they cannot provide three-dimensional (3D) information (Pettersen et al., 2019). Based on triangulation techniques, laser rangefinders, light detection and ranging (LiDAR)-based systems, and RGB-Depth (RGB-D) cameras, more abundant features can be extracted from acquired 3D data to overcome various difficulties of 2D imaging (Saberioon and Cisar, 2016). Although such methods have many potential advantages for fish classification and marine life monitoring, they have not yet been popularized due to their high cost, complex computations, and slow operation speed (compared with 2D methods; Kawahara et al., 2016; Zhang and Gao, 2020). Furthermore, machine vision-based crab biomass detection tasks largely depend on a good image quality. However, restricted by the sensitivity of 2D image acquisition equipment to light (low illumination) and the limitations of the culture pond environment (poor water quality) during image acquisition, the acquired images (Figure 1) have a low contrast, low brightness, noise, colour shift, and artefacts (Ji et al., 2018; Cao et al., 2020), which leads to challenges in subsequent recognition tasks (Atoum et al., 2020). In particular, the auxiliary light utilized for the collection of river crab images must be of low illumination because crabs prefer to gather in low-light environments, whereas they fear bright light and will flee. Thus, the crab images captured by underwater 2D cameras, which reflect the natural living state, are low-illumination images, which aggravates the difficulties of crab recognition and detection (Ruan et al., 2016; Zhao et al., 2019). Consequently, a simple, fast, and effective low-illumination enhancement algorithm must be developed to provide sufficient and accurate image feature information.
Figure 1. Example images acquired in the low-light environment, illustrating the various challenges associated with detecting crabs in these conditions: (a) different postures, (b) various occlusions, and (c) complex background.
Various techniques have been developed to improve the quality of low-illumination images. Histogram equalization (HE) is an approach that is widely utilized for less distorted images due to its simplicity; however, it is not ideal for seriously distorted low-illumination images. Other examples are Retinex-based methods, which assume that an image is composed of reflectance and illumination and eliminate the effect of illumination on imaging by performing prior parameter estimation for the ambient illumination.
However, colour distortion and excessive enhancement often occur, which do not reflect the real situation (Zhang et al., 2017). An alternative to address issues related to non-uniform illumination is to analyse the physical lighting of low-light images. Multiple exposure ratio maps obtained using illumination estimation techniques and a camera response function (CRF) model are incorporated to address the light distortion of ordinary RGB cameras. High-quality images can be obtained with professional cameras, but such approaches are also limited by the intermediate fusion sources (Ying et al., 2017b; Hao et al., 2018). Most recently, supervised learning-based methods have been developed that can directly restore low-light images to normal-light images in a data-driven way. In contrast to traditional low-light enhancement algorithms, these methods do not heavily rely on prior parameter estimation and tuning but instead learn the illumination and noise distribution of low-light images. Lore et al. (2017) and Shen et al. (2017) directly learned an end-to-end mapping from low- to normal-light images. Wei et al. (2018) and Wang et al. (2019b) learned an illumination estimation model that contains specialized low-level vision tasks between low- and normal-light image pairs. However, the above-mentioned learning-based methods require a real normal-light image corresponding to each low-light image for supervised learning, which is difficult to obtain. Jiang et al. (2019) proposed an effective unsupervised generative adversarial network that can be trained without low-/normal-light image pairs. Although it generalizes to various real-world images, its complexity and network scale are unsuitable for highly integrated applications on mobile devices. In addition, the small sizes of crabs, large variations in limb postures, and the similar joints and irregular shapes of crab claws and crab legs (Figure 1) make it difficult for traditional target recognition methods to meet the demands of underwater detection of free live crabs (Zhao et al., 2020a). There has been a gradual shift from traditional methods, such as the histogram of oriented gradients (HOG), to CNNs, which can extract abstract high-dimensional features and are light-robust and adaptable to different environments. Therefore, CNNs are used for more complex object detection (Duan et al., 2019; Guo et al., 2020), such as classification, positioning, and detection applications for fish (Álvarez-Ellacuría et al., 2020; Salman et al., 2020), sea cucumber (Qiao et al., 2019), fruit (Ji et al., 2019; Liu et al., 2019), livestock (Chen et al., 2020), and fishing nets (Zhao et al., 2020b). Most of the above-mentioned detection targets are relatively regular in shape and do not involve the detection of live crabs with irregular shapes. However, the storage and computation of the above-mentioned neural network models on mobile devices remain great challenges due to limitations of the storage space and power consumption. Thus, an industrial-grade lightweight neural network model must be designed that ensures (or only slightly sacrifices) the accuracy of crab detection. The design of small lightweight neural network models for mobile devices must consider four aspects (Tan and Le, 2019; Xiong et al., 2020): manual design of network computation methods, neural architecture search, deep compression of the CNN, and automatic model compression based on automated machine learning.
Contributions
To address the above-mentioned factors and deficiencies of crab farming, a systematic approach was established in this study, which is based on a simple yet practical image enhancement network (called LigED) and a small-sized efficient detection network (called EfficientNet-Det0), which are used to eliminate the darkness in low-illumination images and to detect underwater free live crabs, respectively. The approach is mainly based on the convolutional low-light enhancer KinD (Zhang et al., 2019) and the compound scaling neural network EfficientNet (Tan and Le, 2019). In addition, a dataset consisting of 4120 low-illumination underwater live crab samples was constructed by shooting in the pond using 2D cameras. The main contributions of this study can be summarized as follows: (i) LigED trained the dark-light model utilizing a low-light image and a fused strong-light image, which was generated from the low-light image through exposure simulation based on the CRF. The reflectance obtained from the coupled decomposition of the fused normal-illumination image was then utilized as the ground truth (GT), which completely differs from using a real normal-light image as the GT of the low-light image. This jointly overcomes the limitation whereby learning-based low-illumination enhancement algorithms must use distinct image pairs (real low-light/normal-light images) for supervised learning. (ii) Multi-scale illumination self-attention units were introduced into the reflectance restoration network to avoid visual defects caused by the blind removal of dark light. (iii) A series of more efficient and accurate scalable, industrial-grade, lightweight crab detectors were built utilizing the local three-dimensional scaling coefficient and the global composite scaling factor within the constraints of various resources.
Material and methods
Dataset acquisition
Underwater live crab videos/images reflecting the natural living state of pond crabs were collected in three ponds of the crab-breeding demonstration base of Changzhou City, Jiangsu Province, China (Figure 2a and b). Each pond has a water depth of 0.5–2.0 m, covers an area of 16.7 km2, and hosts ∼1430 crabs per km2. Based on the foraging habits of crabs in the evening and early morning, videos were typically acquired by an automatic feeding boat equipped with an underwater 2D camera system for turbid water and auxiliary low lighting (Figure 2c and d), following the internal-spiral baiting trajectory (Figure 2e). In addition, a handheld underwater 2D monitoring system was used to conduct a secondary manual supplementary collection, from artificial pole-vaulting boats, in areas that may not have been photographed (Figure 2f and g).
Figure 2. Video acquisition system for underwater live crabs. (a) Experimental site; (b) experimental ponds; (c, d) video acquisition and related equipment on the automatic feeding boat; (e) internal-spiral motion trajectory of the automatic feeding boat; (f, g) video acquisition and related equipment when using artificial pole-vaulting boats.
To ensure that the images were rich and diverse, the river crab videos were sampled every ten frames.
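As a minimal illustration of this ten-frame sampling step (not the authors' acquisition script), the sketch below uses OpenCV to keep every tenth frame of a video; the file paths and output naming are hypothetical.

```python
import cv2

def sample_frames(video_path: str, out_dir: str, step: int = 10) -> int:
    """Save every `step`-th frame of a video as a PNG image."""
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()          # a 1080p BGR frame in this setting
        if not ok:
            break                       # end of video
        if index % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{index:06d}.png", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Example (hypothetical paths): keep every 10th frame of one crab video.
# n = sample_frames("crab_pond.mp4", "./frames", step=10)
```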
Each colour image had a resolution of 1920 × 1080 pixels. For image filtering, it is not necessary to build a dataset with a very balanced sample size because only one target class (river crabs) must be detected. Only images containing targets were selected, but differences in the crab sample pixels and representativeness were considered. Thus, a total of 4120 images were selected as the original sample dataset (Figure 3a). To obtain enhanced images with the desired visual characteristics corresponding to the low-light images, 4120 low-light/fused-exposure image pairs were generated for the training and testing of the dark-light enhancer LigED. Finally, the image annotation tool LabelImg (MIT) was utilized to label the crabs with rectangular bounding-box annotations. The labelled dataset was used for the training and testing of the object detector EfficientNet-Det0 (Figure 3a and b). Challenges in crab detection arise from the distinctive crab claws and the similar-looking crab legs. If all of them are framed, the area marked by the rectangular box will contain a large number of complex and useless background features, which will interfere with the training and affect the performance of the model. Therefore, in this study, mainly the crab body and the pair of distinctive crab claws were framed during manual labelling, with part of the crab legs included as appropriate. The detailed labelling procedure is shown in Figure 3.
Figure 3. Datasets used for the study. (a) Individual frames were extracted from videos, low-light enhanced, and labelled before being applied to train and test detection models; (b) datasets for low-light enhancement (paired) and object detection (labelled).
Overview of the methodology
To effectively detect the distribution of live crabs in ponds under low-illumination conditions in real time, and thereby enable the scientific and accurate feeding of an automatic feeding boat, we established a novel method. As schematically illustrated in Figure 4, the method consists of two functions, handling low-illumination conditions (Figure 4a) and detecting underwater free live crabs (Figure 4b), respectively. In the following two sections, the functionality and architecture of the method are explained in detail.
Figure 4. Systematic overview and purpose of the proposed method. (a) Handling the influence of low illumination (LigED); (b) real-time object detector for underwater free live crabs (EfficientNet-Det0); (c) potential industrial application: guiding the remote monitoring and feeding of automatic feeding boats utilizing LigED and EfficientNet-Det0.
Handling the low-illumination influence
Because of the different underwater lighting scenes and various imaging factors in the crab pond, it is difficult to guarantee real-time, flexible camera parameters that conform to the photographing rules. Therefore, the crab foraging videos obtained under low-light pond conditions are characterized by low visibility and poor quality, which is partly due to the need to collect video at the times of crab foraging and feeding (evening and early morning) and the need for proper low-light supplementation of the underwater scene using illuminating equipment. Accordingly, unnatural light is introduced during video shooting, which inevitably leads to low brightness and low contrast caused by various unnatural effects. In addition, cameras must use an appropriate sensitivity and a short exposure (a long exposure requires the scene to be stationary) considering the water quality of the turbid pond and the speed of the automatic feeding boat (∼1 m/s), which inevitably increases the noise and generates artefacts, hiding visual information. Although it is challenging, the image contrast can be enhanced by tone mapping and multi-exposure fusion. Unfortunately, this requires a camera with an advanced imaging mechanism that allows the timely capture of multiple images with different exposures (Hao et al., 2018). Notably, ordinary RGB cameras, which are selected for cost-effectiveness and practical reasons, do not meet these requirements. Overall, these practical problems aggravate the difficulties in the processing and enhancement of low-light crab images. Consequently, based on the CRF fusion mechanism (Ying et al., 2017b; Hao et al., 2018) and kindling the darkness (KinD; Zhang et al., 2019), we established a simple yet practical image enhancement network (LigED) to ensure lighting as well as the elimination of darkness in crab images. The CRF and its luminance transfer function were used to simulate exposure information sources and to obtain a high-quality normal-light image (comparable to images shot with professional cameras) via coordination and fusion. This solves the problem posed by the absence of a corresponding real normal-light image for each low-light image during KinD-style supervised learning. However, even when such high-quality images were used as the GT for training, the results obtained were often characterized by a low contrast. In reality, based on the analysis of the scene reflectance as well as the mutual consistency of the illumination structure, the degradation of low-light images is more serious than that of normal-light images, and this degradation is transferred to the reflectance component:

$I_l = R_l \times L_l + E_l = R_l^p \times L_l = (R_l + E_l^p) \times L_l = R_l \times L_l + E_l^p \times L_l$, (1)

where $I_l$ represents the degraded low-light image; $L_l$ represents the lighting (illumination) image; $R_l$ and $R_l^p$ represent the reflectance and the polluted reflectance, respectively; and $E_l$ and $E_l^p$ represent the degradation component and the degradation component remaining after decoupling the illumination, respectively. Based on this rule, the reflectance obtained from the coupled decomposition of the fused normal-illumination image could be used as the true reflectance GT ($R_{GT}$) of the degraded low-light image (Figure 5b). The usage of this type of reflectance completely differs from using the normal-light image as a false GT for the low-light image (Figure 5b). Therefore, the network training only requires a pair of images with different illumination degrees.
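The exposure simulation described above can be sketched with the brightness transfer function of the camera response model of Ying et al. (2017b). The snippet below is a minimal NumPy version; the parameters a = −0.3293 and b = 1.1258 are the generic values reported for that model, and the exposure ratios and luminance-based fusion weights are illustrative assumptions rather than the exact LigED fusion.

```python
import numpy as np

def brightness_transfer(img: np.ndarray, k: float,
                        a: float = -0.3293, b: float = 1.1258) -> np.ndarray:
    """Simulate an exposure-ratio-k version of `img` (float RGB in [0, 1])
    with the beta-gamma camera response model: g(P, k) = exp(b*(1-k^a)) * P^(k^a)."""
    gamma = k ** a
    beta = np.exp(b * (1.0 - gamma))
    return np.clip(beta * np.power(img, gamma), 0.0, 1.0)

def fuse_exposures(low: np.ndarray, ratios=(2.0, 4.0, 8.0)) -> np.ndarray:
    """Fuse several simulated exposures into a pseudo normal-light image.
    A simple luminance-based weighting stands in for the coordination/fusion step."""
    exposures = [low] + [brightness_transfer(low, k) for k in ratios]
    weights = []
    for e in exposures:
        lum = e.mean(axis=2, keepdims=True)
        # favour well-exposed pixels (luminance close to 0.5)
        weights.append(np.exp(-((lum - 0.5) ** 2) / (2 * 0.2 ** 2)))
    w = np.stack(weights)                              # (n, H, W, 1)
    w /= w.sum(axis=0, keepdims=True) + 1e-6
    return np.clip((w * np.stack(exposures)).sum(axis=0), 0.0, 1.0)

# Example: turn one dark frame into a fused "false GT" training partner.
# dark = cv2.imread("frame_000010.png")[:, :, ::-1].astype(np.float32) / 255.0
# false_gt = fuse_exposures(dark)
```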
In addition, the reflectance recovery cannot be processed uniformly throughout the image owing to the imbalance of the illumination change during network training. However, it can be guided by introducing an illumination map and a self-attention mechanism so as to effectively address the influence of low light. More importantly, various visual defects (e.g. amplified noise) caused by blindly removing dark light can be avoided.
Figure 5. Systematic structure of the lightening and eliminating darkness method (LigED). (a) A single dark-light image generates a pair of training samples using the Paired Images Fused Net; (b) the false GT and the sample are decomposed into reflectance and illumination maps using the Double-Layer Decomposition Net; (c) the Reflectance Restoration Net (c1), including four multi-scale illumination self-attention units (c2), is utilized to obtain a high-quality reflectance map; (d) the low-lighting conditions are adjusted to target-lighting conditions using the Illumination Adjustment Net.
Given that the LigED method performs supervised learning without true GT image guidance, the network structure (Figure 5) and the loss function $L_{ED}$ (Equation 2) must be carefully designed. First, the low-light image (input) is passed through the CRF model to simulate multiple exposures, which are coordinately fused to realize a relatively high-quality image (false GT), thereby generating image pairs for training (Figure 5a). Second, using the double-layer decomposition network based on the Retinex theory (Wei et al., 2018), the image pairs are decoupled into reflectance and illumination maps, realized using the classic five-layer U-Net (Ronneberger et al., 2015) and a two-layer cascaded feature map (Figure 5b), respectively. In this process, the loss function of the double-layer decomposition network was set to $L_{PLD}$ (Equation 3) so as to adjust/smoothen the reflectance/illumination of the image pair after double-layer decomposition to be (ideally) consistent with each other. Third, the illumination information and the illumination-decoupled degraded reflectance are jointly introduced into the reflectance restoration network (Figure 5c). Its loss function $L_{PRR}$ is mainly composed of structural similarity, recovered reflectance, and texture tightness terms, realized using the improved 15-layer U-Net (Figure 5c1) and the multi-scale illumination self-attention unit (Shaw et al., 2018; Figure 5c2). Fourth, a lighting adjustment network is constructed to flexibly realize the conversion of low-lighting to target-lighting conditions (Figure 5d). The loss function that constrains the conversion mechanism of this illumination image pair is $L_{PIA}$, and the degree of conversion is adjustable.
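To make the data flow of Figure 5 concrete, the sketch below chains the four stages with placeholder callables. The function names (fuse, decompose, restore_reflectance, adjust_illumination) are illustrative stand-ins rather than the released LigED code, and the ratio α computed here anticipates the definition given in the next paragraph.

```python
import numpy as np
from typing import Callable, Tuple

def enhance_lowlight(
    low: np.ndarray,                                   # low-light RGB image in [0, 1]
    fuse: Callable[[np.ndarray], np.ndarray],          # Figure 5a: CRF exposure fusion
    decompose: Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]],     # Figure 5b
    restore_reflectance: Callable[[np.ndarray, np.ndarray], np.ndarray],  # Figure 5c
    adjust_illumination: Callable[[np.ndarray, float], np.ndarray],       # Figure 5d
) -> np.ndarray:
    false_gt = fuse(low)                               # pseudo normal-light partner
    r_low, l_low = decompose(low)                      # reflectance + illumination of the input
    r_gt, l_gt = decompose(false_gt)                   # r_gt supervises training; l_gt sets the target light
    r_restored = restore_reflectance(r_low, l_low)     # attention-guided restoration
    alpha = float(np.mean(l_gt / (l_low + 1e-6)))      # source-to-target intensity ratio
    l_adjusted = adjust_illumination(l_low, alpha)     # brighten (alpha > 1) or darken
    return np.clip(r_restored * l_adjusted, 0.0, 1.0)  # recompose the output image
```

During training, the reflectance decomposed from the fused image supervises the restoration network, which is the reflectance ground-truth mechanism described above; at deployment, α could instead be set directly by the user.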
The exact relationship between the paired illumination images is determined by roughly computing the intensity ratio between the target illumination image and the original illumination image, i.e. α is the average of the target illumination image divided (element-wise) by the original illumination image. The ratio α serves as an indicator to train the adjustment function from the source light to the target light. When α > 1, the lower-level light is adjusted to the higher-level light; in contrast, when α ≤ 1, the reverse adjustment is performed. Ultimately, the adjusted illumination map and the restored reflectance are multiplied to realize output images with the desired visual features. This facilitates the mobile processor's understanding, analysis, perception, and subsequent recognition of the realized images.

$L_{ED} = L_{PLD} + L_{PIA} + L_{PRR}$ (2)

$L_{PLD} = L_{PLD}^{ref} + 0.01 L_{PLD}^{prs} + 0.08 L_{PLD}^{pis} + 0.1 L_{PLD}^{pmic}$, (3)

where $L_{PLD}^{ref}$ is used to constrain the error of the reconstructed reflectance, $L_{PLD}^{prs}$ is used to adjust the similarity between the reflectances of the image pair (its weight is small because the reflectances of the strong-light and low-light images are approximately the same), $L_{PLD}^{pis}$ is used to constrain the illumination smoothness of the image pair, and $L_{PLD}^{pmic}$ is used to penalize the relative consistency of the image pair illumination ($L_{PLD}^{pis}$ and $L_{PLD}^{pmic}$ do not need large constraint weights and penalties because they can be guided by the strong-light image).
Live crab detector with two-stage compound scaling
Current advanced detection networks are often unable to achieve real-time detection on mobile embedded platforms with limited computing capacities. Only a very few Tiny you-only-look-once (YOLO)-like networks can meet the real-time requirements, but the detection accuracy of such networks is low and does not meet the demands of practical applications. Under various resource constraints, the detector in this study locally scales the lightweight EfficientNet network by balancing the dimensional information and globally scales the detector's backbone, feature, and class/bounding box networks to more efficiently and accurately explore the scalable detector EfficientNet-Det (Zhao et al., 2020a). Figure 6 presents the specific implementation process of EfficientNet-Det0. The low-light-enhanced image and corresponding labels are first mapped onto the EfficientNet-B0 backbone network, which uses local compound scaling of the network depth, width, and resolution to achieve accurate and efficient coordination, and feature maps of different scales are extracted. Subsequently, five feature map layers are selected as inputs for the stacked bidirectional feature pyramid network (BiFPN; Tan and Le, 2019) to obtain higher-level fusion feature maps. These feature maps are then fed into the class/bounding box prediction network of the stacked three-layer convolution module to generate various anchor boxes. Simultaneously, the coordinate position offsets of the anchor boxes with various sizes and aspect ratios and their associated confidences are predicted using Softmax to obtain bounding box predictions. All generated bounding box predictions are then assembled, and redundant predictions are filtered using the non-maximum suppression (NMS) method by setting an overlap-rate threshold.
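A minimal NumPy sketch of this confidence filtering and NMS step is given below; the thresholds are illustrative defaults rather than the values used in the study.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray,
        score_thr: float = 0.5, iou_thr: float = 0.5) -> list:
    """Keep high-confidence boxes and suppress overlapping duplicates.
    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences."""
    keep_mask = scores >= score_thr            # confidence threshold
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = scores.argsort()[::-1]             # highest confidence first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # IoU of the best remaining box with the other candidates
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
        order = order[1:][iou <= iou_thr]      # drop redundant overlaps
    return kept
```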
Finally, the bounding box predictions with confidence higher than the threshold are selected and retained to obtain, accurately and in real time, the detection results, including the crab class, confidence probability, and bounding box coordinates.
Figure 6. Systematic structure of the live crab detector EfficientNet-Det0.
EfficientNet overcomes the shortcomings of scaling models along a single dimension by introducing local scaling coefficients that link the depth, width, and resolution (Equation 4) of the uniformly scaled network (the accuracy gain becomes smaller as the model size increases):

$d = \alpha^\theta, \quad w = \beta^\theta, \quad r = \gamma^\theta, \quad \text{s.t. } \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \ \alpha \ge 1, \ \beta \ge 1, \ \gamma \ge 1$, (4)

where d, w, and r are the network depth, width, and resolution, respectively; α, β, and γ are the depth, width, and resolution coefficients of the network resources, respectively, which are the optimal solutions obtained from a brute-force (exhaustive enumeration) network search; and θ is the user-specified model resource scaling coefficient, which also indicates the available range of resources. Thus, under the local compound scaling operation, the floating-point operations (FLOPs) of the model grow by $(\alpha \cdot \beta^2 \cdot \gamma^2)^\theta \approx 2^\theta$ times the original. The basic module is MBConv, optimally built with the inverted residual structure of MobileNetV2 (Sandler et al., 2018) and squeeze-and-excitation (Hu et al., 2018). By utilizing the neural network architecture search approach (Tan et al., 2019), the search yields a series of EfficientNet B0–B6 networks while limiting the target FLOPs (a measure of algorithm/model complexity) and storage space. In this study, the EfficientNet-B0 network is used as the backbone network of the model on an as-needed basis. The network parameters are shown in Table 1.
Table 1. Network parameters of the lightweight EfficientNet-B0 backbone.
Stage type | EfficientNet-B0 | Resolution (pixel × pixel)
Stem Stage | Conv3*3, channel = 32, stride = 2 | 224 × 224
MBConv Stage1 | MBConv1, kernel = 3, channel = 16 | 112 × 112
MBConv Stage2 | [MBConv6, kernel = 3, channel = 24] × 2 | 112 × 112
MBConv Stage3 | [MBConv6, kernel = 5, channel = 40] × 2 | 56 × 56
MBConv Stage4 | [MBConv6, kernel = 3, channel = 80] × 3 | 28 × 28
MBConv Stage5 | [MBConv6, kernel = 5, channel = 112] × 3 | 28 × 28
MBConv Stage7 | [MBConv6, kernel = 5, channel = 192] × 4 | 14 × 14
MBConv Stage8 | MBConv6, kernel = 3, channel = 320 | 7 × 7
Pooling Stage | Conv1*1 and pooling and FC | 7 × 7
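As a quick numerical check of Equation (4), the snippet below evaluates the scaled depth, width, and resolution multipliers for several values of θ, assuming the coefficients α = 1.2, β = 1.1, and γ = 1.15 published for EfficientNet; the coefficients searched for the crab detector may differ.

```python
# Local compound scaling of Equation (4): d = alpha**theta, w = beta**theta, r = gamma**theta.
alpha, beta, gamma = 1.2, 1.1, 1.15       # assumed EfficientNet paper coefficients

for theta in range(4):                    # theta = 0 corresponds to EfficientNet-B0
    d, w, r = alpha ** theta, beta ** theta, gamma ** theta
    flops_factor = (alpha * beta ** 2 * gamma ** 2) ** theta   # ≈ 2**theta
    print(f"theta={theta}: depth×{d:.2f}, width×{w:.2f}, "
          f"resolution×{r:.2f}, FLOPs×{flops_factor:.2f}")
```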
Because features at different resolutions contribute unequally to the output when fused by direct addition, the BiFPN introduces feature weights for learning the importance of the different input features. Therefore, fast normalized fusion (Equation 5) and depthwise separable convolution (DWConv) are used for the fusion, and batch normalization and activation functions are added after each convolution layer to ensure an improved fusion efficiency (Sandler et al., 2018).

$P_7^{out} = \mathrm{DWConv}\left(\dfrac{W_7^{1} \cdot P_7^{in} + W_7^{2} \cdot \mathrm{Resize}(P_6^{out})}{W_7^{1} + W_7^{2} + \tau}\right)$
$P_6^{tr} = \mathrm{DWConv}\left(\dfrac{W_6^{11} \cdot P_6^{in} + W_6^{12} \cdot \mathrm{Resize}(P_7^{in})}{W_6^{11} + W_6^{12} + \tau}\right)$
$P_6^{out} = \mathrm{DWConv}\left(\dfrac{W_6^{21} \cdot P_6^{in} + W_6^{22} \cdot P_6^{tr} + W_6^{23} \cdot \mathrm{Resize}(P_5^{out})}{W_6^{21} + W_6^{22} + W_6^{23} + \tau}\right)$
$\cdots$
$P_3^{out} = \mathrm{DWConv}\left(\dfrac{W_3^{1} \cdot P_3^{in} + W_3^{2} \cdot \mathrm{Resize}(P_4^{tr})}{W_3^{1} + W_3^{2} + \tau}\right)$, (5)

where τ is a very small value (0.0001) used to avoid numerical instability and W corresponds to the detailed feature weights in Figure 6. Note that the BiFPN is not a fixed feature structure but rather a feature network layer that can be iterated to obtain higher-level fusion features. This way, the class/bounding box prediction network can share all levels of features; thus, even irregularly shaped crab targets can be detected. To achieve a higher detection accuracy, the scaling of most general detectors exclusively focuses on the backbone network and often neglects the scaling of the feature and class/bounding box prediction networks. Thus, EfficientNet-Det introduces a simple global composite factor ψ to secondarily unify the compound scaling of the three detector components (Tan et al., 2020). The smallest EfficientNet-B0 is directly adopted as the backbone network after local compound scaling. Second, the depth of the BiFPN-structured network is linearly increased (Equation 6) and its width is exponentially increased (Equation 7). Finally, the width of the class/bounding box prediction network is kept consistent with that of the BiFPN-structured network (Equation 8) and only its depth is linearly increased (Equation 9).
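Returning to the feature fusion of Equation (5), the sketch below shows the fast normalized (weighted) fusion applied to already-resized feature maps; the feature shapes and weight values are illustrative assumptions, and in the detector the fused map would subsequently pass through DWConv, batch normalization, and an activation.

```python
import numpy as np

def fast_normalized_fusion(features, weights, tau: float = 1e-4) -> np.ndarray:
    """Weighted fusion of already-resized feature maps, as in Equation (5).
    features: list of arrays with identical shape; weights: learnable non-negative scalars."""
    w = np.maximum(np.asarray(weights, dtype=np.float32), 0.0)  # keep weights >= 0
    fused = sum(wi * fi for wi, fi in zip(w, features)) / (w.sum() + tau)
    return fused

# Example: fuse P6_in, the intermediate P6_tr, and the upsampled P5_out (shapes are illustrative).
p6_in, p6_tr, p5_up = (np.random.rand(14, 14, 64).astype(np.float32) for _ in range(3))
p6_out = fast_normalized_fusion([p6_in, p6_tr, p5_up], weights=[0.6, 0.3, 0.8])
```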
In this study, a global composite factor of zero was used, that is, the BiFPN layers and the convolutional modules in the class/bounding box prediction network are stacked twice and three times, respectively, based on the need for efficient real-time detection:

$d_{bi} = 2 + \psi$ (6)
$w_{bi} = 64 \times 1.35^{\psi}$ (7)
$w_{c/b} = w_{bi}$ (8)
$d_{class} = d_{box} = 3 + \psi/3$, (9)

where $d_{bi}$ and $w_{bi}$ represent the depth and width of the BiFPN; $w_{c/b}$ represents the width of the class/bounding box prediction network; and $d_{class}$ and $d_{box}$ represent the depths of the class and bounding box prediction networks, respectively.
Evaluation protocol and foci
Commonly used evaluation metrics (Table 2) were utilized to comprehensively evaluate the denoising and enhancement effect of the dark-light enhancement algorithm. For all metrics, a larger value is better, except for the average brightness (AB; absolute value), lightness order error (LOE), and natural image quality evaluator (NIQE; Lv and Lu, 2019; Zhang et al., 2019).
Table 2. List of objective evaluation metrics for the denoising and enhancement of dark-light images.
Metric abbreviation | Metric name | Description
PSNR | Peak signal-to-noise ratio | Measures the image noise intensity by calculating the similarity between the two images.
SSIM | Structural similarity | Measures the image structure distortion.
VIF | Visual information fidelity | Measures the fidelity of visual image information by determining the mutual information of the two images.
AB | Average brightness | Average brightness of the image itself.
LOE | Lightness order error | Objectively measures the brightness distortion of the enhanced result.
NIQE | Natural image quality evaluator | Totally blind, distortion-free, and reference-free image quality evaluation indicator.
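For reference, two of the simpler metrics in Table 2 can be computed as sketched below for 8-bit images; treating AB as a signed mean-brightness difference from a reference image is our reading and may differ from the exact convention used in the paper.

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images of the same shape."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def average_brightness_shift(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean brightness of `test` minus that of `reference` (smaller |AB| is better)."""
    return float(test.mean() - reference.mean())

# Example with random stand-in images:
ref = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
out = np.clip(ref.astype(int) + 5, 0, 255).astype(np.uint8)
print(psnr(ref, out), average_brightness_shift(ref, out))
```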
To objectively analyse the performance of the detector and the effect of dark-light enhancement on the detection, the precision, recall, average precision (AP), F1-score, area under the curve (AUC), and frames per second (FPS) were used to evaluate the detection model. Table 3 presents these measures with the related equations and evaluation foci (Yang et al., 2020).
Table 3. Statistical measures with equations and evaluation foci.
Measure | Equation | Evaluation focus
IoU | $O_i/O_u$ | Used to measure the accuracy of the detection. If the value is >0.5, the detection result is considered to be credible.
Precision | $TP_n/(TP_n + FP_n)$ | Assesses the accuracy of the detector's predictions.
Recall | $TP_n/(TP_n + FN_n)$ | Assesses the completeness of the detector's target search.
AP | $AP_C = \sum \mathrm{Precision}/N(\mathrm{images})_C$ | The mean of the precision values corresponding to the 11 selected confidence thresholds (recall values); the overall performance of the detector.
F1 | $2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}/(\mathrm{Precision} + \mathrm{Recall})$ | Harmonic mean of precision and recall for the standardized measurement of both the completeness and accuracy of the detected targets.
AUC | $\frac{1}{2}\left(\frac{TP_n}{TP_n + FN_n} + \frac{TN_n}{TN_n + FP_n}\right)$ | Ability of the detector to distinguish different object classes.
FPS | $1/N$ | Evaluates the operation efficiency, that is, the number of images (N) processed per second.
Note: $O_i$ and $O_u$ represent the intersection and union between the marked GT box and the detection bounding box, respectively; $TP_n$, $FP_n$, $FN_n$, and $TN_n$ represent the number of correctly detected positive crabs, misdetected negative crabs, misdetected positive crabs (within the specified range), and correctly detected negative labels, respectively. For a specific class C, the formula for $AP_C$ is given above. However, only one class (crabs) needs to be detected, such that C can be considered as 1; $AP_C$ is therefore denoted as AP.
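The box-level measures in Table 3 can be sketched as follows, assuming detections have already been matched to GT boxes; the 0.5 IoU threshold follows the table.

```python
import numpy as np

def iou(box_a, box_b) -> float:
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1

# Example: a detection counts as TP when its IoU with the matched GT box is > 0.5.
print(iou([10, 10, 60, 60], [15, 15, 60, 60]) > 0.5)
print(precision_recall_f1(tp=96, fp=4, fn=5))
```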
Experiment and discussion
Implementation and training details
The LigED training process (utilizing 90% of the fused image pairs) was optimized using a stochastic gradient descent (SGD) method, and a few hyperparameters were set. The batch and patch sizes of the double-layer decomposition network were set to 8 and 48 × 48, respectively. The batch and patch sizes of the reflectance restoration and illumination adjustment networks were set to 4 and 384 × 384, respectively. The LigED network was trained using four NVIDIA GPUs and an Intel Core i5 CPU with the PyTorch framework. Regarding EfficientNet-Det0, the tenfold cross-validation method (Yadav and Shukla, 2016) was adopted to train and evaluate the model to ensure generalization, because the composition of the training set and various random uncertainties in the training process may lead to different results. The dataset was divided evenly into ten folds. Each time, nine folds were used for training while one fold was used for testing; within the nine training folds, 20% of the data were picked as the validation set and used to adjust the hyper-parameters. The mean of the results over the ten folds was used to estimate the algorithm performance (Cao et al., 2021). For other detailed training hyperparameter settings, please refer to Table 4.
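A minimal sketch of this tenfold protocol is given below, assuming scikit-learn is available; the training and evaluation calls are hypothetical placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

image_ids = np.arange(4120)                        # indices of the labelled images
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
fold_scores = []

for train_idx, test_idx in kfold.split(image_ids):
    # 20% of the nine training folds is held out for hyper-parameter tuning.
    train_idx, val_idx = train_test_split(train_idx, test_size=0.2, random_state=0)
    # model = train_detector(image_ids[train_idx], image_ids[val_idx])   # hypothetical
    # fold_scores.append(evaluate_ap(model, image_ids[test_idx]))        # hypothetical
    fold_scores.append(len(test_idx))              # placeholder so the loop runs

print(np.mean(fold_scores))                         # mean over the ten folds
```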
Furthermore, considering the load of abundant parameters, the risk of network overfitting is inevitable. Thus, image processing methods implementing slight alterations (e.g. position augmentation: flipping, angle rotation, cropping, scaling, and channel transformation; colour-jittering augmentation: brightness, contrast, hue, and saturation) were used to train on images with the same annotation (Zoph et al., 2019). Because of the large number of epochs and the large total number of sample batches per training iteration, new images were generated from each input image using data augmentation, making the training image dataset dozens of times larger than the original dataset. Thus, the network strength and generalizability significantly increased. Note that it is not recommended to train LigED and EfficientNet-Det0 from scratch when the amount of data is low. CNN models pretrained on the LOL and ImageNet datasets can be used to fine-tune LigED and EfficientNet-Det0, respectively (Wei et al., 2018; He et al., 2019). This way, the knowledge learned from LOL and ImageNet can be transferred to the problem at hand.
Table 4. Training hyper-parameters of EfficientNet-Det.
Parameter | Description | Value
SGD with momentum | Minimizes the loss function by taking small steps along the negative gradient of the loss to update the network parameters; momentum is used to accelerate the gradient descent. | 0.9
Maximum Epoch | Specifies the maximum number of complete passes through the complete dataset. | 10
Number of iterations per Epoch | Specifies the number of parameter updates per epoch. | 1290
Batch size | Specifies the number of observations per iteration. | 16
Weight decay coefficient | L2 regularization coefficient that effectively avoids over-fitting. | 1e−4
Transfer learning rate | The global learning rate of the fine-tuning process. | 1e−5
Anchor ratio | Aspect ratios of the predicted rectangular bounding boxes. | {1/2, 1, 2}
Initial learning rate | Sets the speed of weight updates: learning rate warm-up in 5% steps. | 0–0.08
— | Cosine learning rate decay. | 0.997
Focal loss | Novel classification loss: proportional equilibrium coefficient of positive and negative samples. | 0.25
— | Weight coefficient of easily separated samples. | 1.5
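The optimizer-related entries of Table 4 can be sketched in PyTorch as below. The focal-loss form, the linear warm-up over the first 5% of steps, and the cosine decay are our reading of the table (the listed 0.997 decay coefficient is not modelled explicitly), and the one-layer model is only a stand-in for EfficientNet-Det0.

```python
import math
import torch

def focal_loss(pred_logits, targets, alpha: float = 0.25, gamma: float = 1.5):
    """Binary focal loss with the balance coefficient (0.25) and the easy-sample
    weight (1.5) listed in Table 4."""
    p = torch.sigmoid(pred_logits)
    ce = torch.nn.functional.binary_cross_entropy_with_logits(
        pred_logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

model = torch.nn.Conv2d(3, 9, 3, padding=1)          # stand-in for EfficientNet-Det0
optimizer = torch.optim.SGD(model.parameters(), lr=0.0,  # warmed up below
                            momentum=0.9, weight_decay=1e-4)

def lr_at_step(step, total_steps=10 * 1290, peak_lr=0.08, warmup_frac=0.05):
    """Linear warm-up to the peak rate, then cosine decay, following Table 4."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))

# In the training loop: set the rate, compute the focal loss, and step the optimizer.
for group in optimizer.param_groups:
    group["lr"] = lr_at_step(step=100)
```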
Image quality assessment of low-illumination enhancement
The quality of underwater dark-light images is seriously degraded. Compared with normally exposed images, many details are lost, colours are distorted, and there is considerable noise, seriously affecting the performance of advanced visual tasks such as object detection and instance segmentation. Aiming at the difficulty of underwater low-light crab image enhancement, the improved dark-light enhancement algorithm LigED utilizes the illumination attention guidance scheme to restore the reflectance and enhance the contrast while suppressing noise. In addition, the different exposure images obtained via the brightness transfer function of the CRF model are weighted and fused as the false GT, which meets the quality requirements of GT training images. To verify the processing performance of the algorithm in terms of colour, brightness, contrast, and noise, HE (Kim, 1997), multi-scale Retinex with colour restoration (MSRCR; Jobson et al., 1997), bioinspired multi-exposure fusion (BIMEF; Ying et al., 2017a), and the learning-based LLNet, which does not consider noise (Lore et al., 2017), were used for comparison. The dark-light enhancement effect was analysed in detail based on the subjective visual effect (Figure 7) and objective metrics, respectively. Table 5 presents the metric results of the various dark-image enhancement methods on the test set. HE simply expands the dynamic range of the image, the measured AB is relatively large, and the LOE is large.
Focusing on the contrast of the images without considering the illumination factors leads to excessive enhancement, and the overall processing effect is very poor (Figure 7b). MSRCR relies on the manual adjustment of parameters; its LOE is the largest, and it is difficult to avoid excessive and insufficient local enhancement. The values of the peak signal-to-noise ratio (PSNR) and visual information fidelity (VIF) are small, the noise processing is unsatisfactory, and the result is accompanied by colour distortion (Figure 7c). Although the target appears to be clearer, this is an illusion (the entire image is distorted) caused by excessive enhancement, which notably destroys the fidelity of the visual information. This is not conducive to the overall understanding of the image in advanced visual tasks, causes abundant false and missed detections, and reduces the accuracy of target detection. BIMEF automatically coordinates different image sources; its VIF and NIQE are appropriate, effectively avoiding unnatural effects (Figure 7d). However, note that the weight map may introduce too much noise in the pixel-by-pixel fusion process. The index values of LLNet and LigED are ideal, and the overall and local contrasts of the images were effectively improved, with a good visual effect (Figure 7e and f). Although the visual differences between the LLNet and LigED algorithms are not noticeable, LigED has notable advantages: it overcomes the limitation whereby learning-based low-illumination enhancement algorithms must use distinct image pairs (real low-light/normal-light images) for supervised learning. Moreover, compared with LLNet, LigED still shows an improvement of at least 10% in the various indicators. In particular, LigED has notable advantages over LLNet in terms of VIF and LOE. This is mainly because LigED uses the reflectance as the low-light image GT and multi-scale illumination attention units are introduced during the reflectance restoration process. Therefore, the amplification of visual defects (e.g. noise and artefacts) can be avoided and the illumination can be flexibly adjusted, such that the result is more suitable for the global understanding of the image to classify and locate the target. Taken together, LigED easily achieves better visual effects than the other methods under any illuminance and is suitable for single low-light image enhancement. In addition, LigED also has good real-time performance. The time required to process an image using a 1080Ti GPU is <75 ms, and the model size is only ∼1.64 MB. The use of technologies such as quantization or MobileNet can further reduce the number of parameters, speed up the computation, and reduce the run time.
Figure 7. Visual effects of different dark-light enhancement methods under arbitrary illumination: (a) Input, (b) HE, (c) MSRCR, (d) BIMEF, (e) LLNet, and (f) LigED.
Figure 8. Rectangular bounding-box detection results of underwater live crabs (a) before and (b) after LigED low-light enhancement.
Table 5. Quantitative comparison of different methods in terms of six metrics.
Methods | PSNR↑ | SSIM↑ | VIF↑ | AB↓ | LOE↓ | NIQE↓
Input | 6.18 | 0.26 | 0.19 | −82.97 | 457.49 | 14.17
HE | 9.72 | 0.45 | 0.22 | 25.83 | 780.22 | 13.01
MSRCR | 9.22 | 0.39 | 0.23 | −9.61 | 1074.80 | 13.21
BIMEF | 12.98 | 0.58 | 0.29 | −41.13 | 473.60 | 12.79
LLNet | 14.87 | 0.62 | 0.24 | −6.56 | 547.92 | 12.73
LigED | 16.13 | 0.68 | 0.31 | 5.41 | 251.26 | 11.60
Note: the optimal result of each indicator is achieved by LigED.
Effects of low-illumination enhancement on efficient live crab detection
After the dark-light enhancement process, the image quality of the crab images improved significantly, especially in terms of PSNR (noise), VIF (visual defects), and NIQE (domain shift). However, we were concerned about the impact of the dark-light enhancement on the performance of object detection. Thus, we trained and tested the EfficientNet-Det0 model with data before and after LigED processing, respectively. The F1 and AP of the EfficientNet-Det0 trained with enhanced images improved by 12.5 and 13.84% (Table 6), respectively, compared with the model trained with low-light images. This is because the model trained using low-light images produces many missed and false detections (Figure 8a), in contrast to the model trained using enhanced images (Figure 8b). This demonstrates that LigED can indeed provide relatively sufficient and accurate feature information for advanced object detection tasks and is an enhancement algorithm suitable for underwater low-illumination crab images. It can effectively reduce the domain shift of the enhanced image distribution (CNNs exhibit domain selectivity) and improve the accuracy of object detection.
Table 6. Comparison of the performance parameters before and after LigED low-light enhancement.
Detector | LigED | Precision | Recall | F1 | AUC | AP
EfficientNet-Det0 | – | 84.92 | 79.93 | 83.35 | 84.60 | 81.56
EfficientNet-Det0 | ✓ | 96.33 | 95.49 | 95.91 | 97.50 | 95.40
In addition, score-weighted class activation mapping (Score-CAM) was employed as a visual method for debugging the prediction process, creating a coarse localization map that highlights the image regions important for predicting the image class (Wang et al., 2019a). In Score-CAM, the saliency map is obtained as a weighted combination of activation maps: each activation map represents features learned by the model, while its weight indicates the importance of those features for the target category. In this study, Score-CAM was simultaneously utilized to judge whether the feature information provided by dark-light enhancement can be easily understood and analysed by a microcomputer and whether those features improve the accuracy of crab recognition. Figure 9 displays the results of applying the method to randomly selected images of underwater live crabs after LigED dark-light enhancement. The Score-CAM map and Score-CAM heatmap (Figure 9b and c) show that the class activation intensities of the crab claws and back are high, while those of the other parts of the river crab remain essentially constant. This indicates that the model classifies the crab target mainly based on the pixels of the crab claws and back and further verifies that focusing on the crab claws and back when creating the rectangular-bounding box labels is a reasonable approach. The Score-CAM heatmap overlain on the image (Figure 9d) illustrates that the EfficientNet-Det0 model focuses on the details of more targets and on image regions of more relevant targets, even though the number of parameters and FLOPs is an order of magnitude smaller after two-stage compound scaling. Even the visually similar crab legs receive significant attention, making the features easy for a microcomputer to understand, analyse, and perceive, which effectively alleviates the difficulty of extracting the irregularly shaped features of live crabs. However, when the crabs overlap each other excessively, the results present different degrees of false detection (Figure 10a; multiple overlapping live crabs are mistakenly detected as one crab, and parts of overlapping live crabs are also mistakenly detected as whole crabs). This is caused by the offset anchor-box centres of the predicted overlapping targets, which can be avoided by increasing the IoU threshold, although this may result in misdetections in other scenarios. In addition, missed detections occur when the targets are seriously occluded (Figure 10b; crabs at the extreme edge of the image or seriously blocked by water plants are not recalled). This is attributed to the incompleteness of the extracted crab contour, which makes it difficult to identify the crab attributes; introducing pixel-level segmentation in such cases can effectively mitigate these errors.

Figure 9. Score-CAM visualizations for the original class of river crabs: (a) Input, (b) Score-CAM, (c) Score-CAM heatmap, and (d) Score-CAM heatmap on image.

Figure 10. Examples of false detections and missed detections for (a) mutual overlap and (b) severe occlusion.
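For reference, a compact sketch of the Score-CAM computation described above is given below, assuming PyTorch and a classification-style model that returns class logits; applying it to the class branch of a detector, as done in this study, additionally requires choosing which detection score to use as the target, which this sketch does not cover.

```python
import torch
import torch.nn.functional as F

def score_cam(model, feature_layer, image, target_class):
    """Score-CAM sketch: weight each activation map by the softmax score that the
    input, masked by that (normalised, upsampled) map, obtains for the target
    class, then sum the weighted maps and apply ReLU (Wang et al., 2019a)."""
    acts = {}
    hook = feature_layer.register_forward_hook(
        lambda module, inp, out: acts.update(a=out.detach()))
    with torch.no_grad():
        model(image)                                  # image: (1, 3, H, W)
    hook.remove()
    maps = acts["a"][0]                               # (C, h, w) activation maps
    H, W = image.shape[-2:]
    weights = []
    with torch.no_grad():
        for k in range(maps.shape[0]):
            m = F.interpolate(maps[k][None, None], size=(H, W),
                              mode="bilinear", align_corners=False)[0, 0]
            m = (m - m.min()) / (m.max() - m.min() + 1e-8)   # normalise mask to [0, 1]
            logits = model(image * m)                        # forward pass on masked input
            weights.append(F.softmax(logits, dim=1)[0, target_class])
    cam = torch.relu((torch.stack(weights)[:, None, None] * maps).sum(dim=0))
    cam = F.interpolate(cam[None, None], size=(H, W),
                        mode="bilinear", align_corners=False)[0, 0]
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```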
Practical analysis of the dark-light enhancer and live crab detector

To test the practicability of the dark-light enhancer LigED and the live crab detector EfficientNet-Det0 for the detection of underwater live crabs, the YOLOv3 (Mahmood et al., 2020), Faster RCNN (Zhao et al., 2019), and SSD (Cao et al., 2020) detectors, as well as the traditional HOG+SVM (Radman et al., 2017) detection algorithm, were used for comparison (both training and testing were performed on LigED-enhanced image data). The comparison is presented in Table 7. Although there is a gap between the detection accuracies of EfficientNet-Det0 and Faster RCNN, the accuracy and model performance of EfficientNet-Det0 can be effectively optimized by utilizing the local scaling coefficient of the EfficientNet backbone network and the global compound scaling factor of the detector. The specific adjustments are as follows: (i) adjusting the scaling coefficient of the EfficientNet backbone network to comprehensively balance all dimensional information; and (ii) adjusting the global compound scaling factor of the detector to integrate and balance the detector backbone, feature, and class/bounding box networks. By co-adjusting these two parameters, a series of accurate and efficient scalable detectors can be constructed. Their detection performance is shown in Table 8. EfficientNet-Det3–Det6 run at <10 f·s−1 on a normally configured CPU once the compound scaling factor ψ reaches 3, while their detection accuracy exceeds that of Faster RCNN (from EfficientNet-Det3 onwards). Although EfficientNet-Det3 is approximately ten times faster than Faster RCNN, these models still do not meet the real-time demand of aquaculture systems for a 10 Hz processing frequency. Only EfficientNet-Det0–Det2 are capable of meeting the real-time processing frequency requirements of modern aquaculture systems, and the detection accuracy of EfficientNet-Det2 is comparable to that of Faster RCNN. This demonstrates that EfficientNet-Det can be efficiently tuned to trade off detection accuracy against efficiency by adjusting the scaling parameters, providing flexibility for mobile devices with different resource limits. EfficientNet-Det0 was chosen as the final model in this study because a river crab detection accuracy of 95% already meets the processing accuracy demand of the crab feeding system, whereas further increasing the accuracy would greatly reduce the detection speed, increase the number of parameters, and expand the model storage memory. Moreover, considering that the automatic feeding boat uses float16/int8 arithmetic that may not accurately represent a wide range of values, the lower the number of parameters, computation volume, storage space, and power consumption of the model, the better.

Table 7. Comparison of the detection performances of different methods.

Detector            F1 (%)   AUC (%)   AP (%)   Parameters (million)   FLOPs (billion)   Model (MB)   GPU speed (f·s−1)   CPU speed (f·s−1)
EfficientNet-Det0   95.91    97.50     95.40      3.78                   2.53             15.1         91.74               28.41
YOLOv3              90.37    91.69     90.85     59.25                  36.97            237.0         48.83                9.52
SSD                 92.28    93.74     91.76     34.25                  15.50            137.0          9.35                2.35
Faster RCNN         96.77    98.49     96.35    136.5                  202.10            546.0          5.67                1.07
HOG+SVM             72.26    73.34     69.48        –                      –                –           3.10                0.63

In the original table, bold values mark the optimal result for each indicator parameter.
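The Speed columns in Table 7 report single-image throughput (f·s−1). The paper does not restate its timing protocol here, so the snippet below shows one common way to estimate such a figure for a PyTorch model, with warm-up iterations and CUDA synchronization; the 512-pixel input size and the iteration counts are assumptions.

```python
import time
import torch

def frames_per_second(model, image_size=512, device="cuda", n_warmup=10, n_runs=100):
    """Rough single-image throughput (frames per second) of a detector."""
    model = model.eval().to(device)
    x = torch.randn(1, 3, image_size, image_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):                 # warm-up: allocations, cuDNN autotuning
            model(x)
        if "cuda" in str(device):
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        if "cuda" in str(device):
            torch.cuda.synchronize()
    return n_runs / (time.perf_counter() - start)
```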
Table 8. Comparison of the detection performances of different compound scaling methods for EfficientNet-Det.

EfficientNet-Det    ψ   θ        BiFPN channels   BiFPN layers   Class/box layers   F1 (%)   Parameters (million)   FLOPs (billion)   Model (MB)   CPU speed (f·s−1)
EfficientNet-Det0   0   0 (B0)    64              2              3                  95.91     3.8                     2.53             15.1        28.60
EfficientNet-Det1   1   1 (B1)    88              3              3                  96.36     6.6                     6.08             25.7        20.40
EfficientNet-Det2   2   2 (B2)   112              4              3                  96.70     8.1                    10.96             31.4        15.70
EfficientNet-Det3   3   3 (B3)   160              5              4                  97.05    12.0                    24.80             85.1         9.60
EfficientNet-Det6   6   6 (B6)   388              8              5                  98.44    51.9                   324.00            199.0         2.26

The depth, width, and resolution of the backbone networks EfficientNet-B0–B6 in the table are obtained from the local scaling coefficient θ and Equation (4); the number of channels and stacking layers of the BiFPN feature network and the number of convolutional stacking layers of the class/bounding box network in the EfficientNet-Det0–Det6 detectors are calculated from the compound scaling factor ψ and Equations (6) (even-integer approximation), (7), and (9), respectively. In the original table, bold values mark the optimal result for each indicator parameter.
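As a rough illustration of how a configuration in Table 8 follows from the compound scaling factor ψ, the sketch below reconstructs the layer counts with simple rules that reproduce the table, while the BiFPN channel widths are looked up from the table because the paper's even-integer rounding of the geometric rule (Equation 6) is only approximated by the fallback formula used here; the exact Equations (4), (6), (7), and (9) appear earlier in the paper and are not restated.

```python
# Illustrative reconstruction of the EfficientNet-Det compound scaling in Table 8.
# The layer-count rules reproduce the table; channel widths are looked up because
# the paper snaps 64 * 1.35**psi to hardware-friendly even values.

BIFPN_CHANNELS = {0: 64, 1: 88, 2: 112, 3: 160, 6: 388}   # taken from Table 8

def efficientnet_det_config(psi: int) -> dict:
    """Scalable-detector configuration for the global compound scaling factor psi."""
    fallback = round(64 * 1.35 ** psi / 2) * 2              # even-integer approximation
    return {
        "backbone": f"EfficientNet-B{psi}",                 # local coefficient theta = psi
        "bifpn_channels": BIFPN_CHANNELS.get(psi, fallback),
        "bifpn_layers": 2 + psi,                            # 2, 3, 4, 5, ..., 8
        "class_box_layers": 3 + psi // 3,                   # 3, 3, 3, 4, ..., 5
    }

if __name__ == "__main__":
    for psi in (0, 1, 2, 3, 6):
        print(efficientnet_det_config(psi))
```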
Compared with the YOLOv3 algorithm, EfficientNet-Det0 achieves 95.40% AP and 97.50% AUC while its FLOPs are reduced to approximately 1/15 and its CPU detection speed is roughly tripled. Compared with the SSD algorithm, EfficientNet-Det0 requires only ∼1/9 of the parameters and ∼1/6 of the FLOPs, because SSD improves the network only through multi-scale feature maps on top of a manually designed backbone. In contrast, EfficientNet-Det0 jointly improves the three main constitutive components of the model and adopts three types of lightweight design ideas: manually designed efficient computation units (MBConv), neural architecture search (MnasNet), and deep compression of the convolutional neural network (pruning). EfficientNet-Det0 also has significant advantages over the HOG+SVM method, because the histogram-of-oriented-gradients features extracted by HOG+SVM are only suitable for regular, directional targets, whereas river crabs have varied postures and irregular shapes; such targets further exacerbate the difficulty of high-dimensional SVM classification. EfficientNet-Det uses CNNs, which learn and extract local features of the input data such as colour, texture, shape, and other low-dimensional features. More importantly, CNNs learn and extract deep abstract features such as attributes, contours, positions, and other high-dimensional information about the target in the image. Such deep features are insensitive to the shape, position, size, pose, and orientation of the target, ensuring a high recognition and detection rate for independent and irregular complex crab targets and demonstrating the robustness of EfficientNet-Det0. Therefore, in terms of the detection accuracy, computational speed, model memory, and robustness of the detector trained with LigED dark-light-enhanced images, EfficientNet-Det0 is suitable for mobile-device detection with limited computational resources. It is a reliable algorithm for automatic feeding boats that can be used to detect underwater live crabs and guide the precise feeding of river crabs. The AP of the detector is 95.40%, and it requires a storage memory of only 15.1 MB.
The fastest detection speed for a single image is 91.74 and 28.41 f·s−1 on a commonly configured GPU (an inexpensive GPU with tensor cores capable of 16-bit floating-point computation) and CPU (an inexpensive multi-core, two or more, Intel CPU), respectively, and thus meets the real-time requirement of aquaculture systems (e.g. a remotely monitored crab expert feeding system) for a 10 Hz processing frequency on modern equipment platforms.

Potential industrial application utilizing LigED and EfficientNet-Det0

Potential industrial applications based on LigED and EfficientNet-Det0 may guide the remote monitoring and feeding of automatic feeding boats (Figure 4c). The cloud platform-based remote monitoring expert feeding system for river crab aquaculture utilizes automatic feeding boats for bait feeding. Users can remotely monitor, instruct, and control the feeding operation of the automatic feeding boats from mobile phones, tablets, and personal computers with the help of cloud platform technology (Figure 11). This system overcomes the shortcomings of feeding by manually poled boats and fixed base stations and improves the efficiency, accuracy, and flexibility of feeding. The implementation flow is as follows (a simplified sketch of this control loop is given after Figure 11): (i) the processor of the automatic feeding boat terminal uses the LigED and EfficientNet-Det0 algorithms to detect live crab biomass information in the underwater video/images captured below the boat in real time and transmits the crab distribution information to the cloud server via the network (Figure 11b, c, and e); the boat terminal also uses high-precision sensors to collect the aquaculture water quality parameters (Wei et al., 2019) and the bait boat's operating parameters, which are likewise transmitted to the cloud server; (ii) real-time position information of the automatic feeding boat is obtained with the GPS devices installed on the boat and synchronously fed back to the cloud server (Figure 11a and b); (iii) the expert system on the cloud server deduces a scientific and reasonable feeding amount and feeding recommendations online, based on the crab culture information, farming environment parameters, and GPS location fed back from the feeding boat terminal (Hassan et al., 2016; Figure 11e); (iv) the data are interactively transmitted to the feeding boat and mobile terminals via the GPRS communication device (Figure 11d, f, and g); and (v) the processor of the feeding boat terminal (Figure 11b) analyses and processes the received feedback data to achieve real-time remote control of the feeding track (Figure 11f), navigation speed, throwing amplitude, feeding speed (Figure 11g), and other actions for automatic and precise feeding.

Figure 11. The potential industrial application for remote monitoring and feeding: (a) GPS, (b) automatic feeding boat, (c) communication base station, (d) Internet, (e) cloud server, (f) PC terminal, and (g) mobile terminal.
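To summarize steps (i)–(v), the following hypothetical sketch runs enhancement, detection, cloud feedback, and actuation as a single loop paced at the 10 Hz platform rate. Every object and callable here (camera, gps, water_sensors, cloud, boat, enhance, detect) is an illustrative placeholder supplied by the caller, not an API defined in the paper or in any specific library.

```python
import time

def boat_terminal_loop(camera, gps, water_sensors, cloud, boat,
                       enhance, detect, period_s=0.1):
    """Hypothetical boat-terminal control loop for steps (i)-(v), paced at ~10 Hz."""
    while True:
        t0 = time.perf_counter()
        frame = camera.read()              # underwater image captured below the boat
        crabs = detect(enhance(frame))     # (i) dark-light enhancement + crab detection
        cloud.upload({                     # (i)-(ii) feedback to the cloud server
            "crab_density": len(crabs),
            "water_quality": water_sensors.read(),
            "position": gps.read(),
        })
        plan = cloud.fetch_feeding_plan()  # (iii)-(iv) expert-system output
        boat.apply(plan)                   # (v) track, speed, throwing amplitude, feed rate
        # keep the loop close to the required 10 Hz processing frequency
        time.sleep(max(0.0, period_s - (time.perf_counter() - t0)))
```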
In addition, the proposed method is equally applicable to providing sufficient, correct, and easily perceptible feature information for advanced tasks such as the classification, localization, behavioural analysis, and size or biomass measurement of marine organisms. The proposed method can thereby facilitate the application of machine vision technology to more practical problems in aquaculture and mariculture.

Conclusion and future work

The results of this study show that the low-illumination enhancer LigED and the small-sized efficient detector EfficientNet-Det0 can jointly and efficiently detect the distribution of live crabs in ponds under low-illumination conditions in real time. Thus, the automatic feeding boat can realize scientific and accurate feeding, overcoming the limitations of pond culture and improving the efficiency, accuracy, and flexibility of bait feeding. The significant contributions of this work can be summarized as follows. First, LigED solves the problem that learning-based dark-light image enhancement must otherwise be equipped with real GTs, which are extremely difficult to obtain. Based on image pairs with different exposures, it readily trains a model that can flexibly adjust the illumination and eliminate dark areas without introducing visual defects. The model significantly improves the image quality, with LOE and NIQE values reaching 251.26 and 11.60, respectively, and ensures that the images provided for object detection contain sufficient and correct feature information, resulting in improvements in F1, AP, and AUC of more than 12.5, 13.84, and 14%, respectively. Second, the three components of the EfficientNet-Det0 detector, that is, the EfficientNet-B0 backbone network, the BiFPN feature network for fast feature fusion, and the anchor-based class/bounding box prediction network, are scaled in a local and global manner to build a series of efficient detectors suitable for limited computational resources. These detectors provide a swift, non-destructive, and accurate algorithm for detecting underwater freely moving live crabs, which benefits the management of crab aquaculture and marine organisms. The superiority of learning-based low-light image enhancement in addressing the insufficient sensitivity to perceive (crab) information in low-light environments encourages us to continue this research. However, the single-light-intensity modelling in LigED remains a challenge with respect to imbalanced visual photosensitivity and weak adaptivity. In future work, the light source should be decomposed into light intensity and spatial distribution to describe the perception process of the vision system and to refine the illumination and reflectance for crab low-light image enhancement. Furthermore, the practicability of the learning-based low-light enhancer should be improved such that it can be applied to aquaculture/mariculture image processing and underwater object detection.

Data availability

Data cannot be shared for ethical/privacy reasons. The data will be shared on reasonable request to the corresponding author.

Acknowledgements

This study was partly funded by the National Natural Science Foundation of China (Grant Nos. 61903288 and 61973141); the Guangdong Province Key Field R&D Project of China (Grant No.
2020B0202010009); the Jiangsu Province Natural Science Fund Project of China (Grant No. BK20170536); the Fujian Province Natural Science Fund Project of China (Grant No. 2018J01471); the Changzhou Modern Agricultural Science and Technology Project of China (Grant No. CE20192006); and the Priority Academic Program Development of Jiangsu Higher Education Institutions of China (Grant No. PAPD-2018-87). We would also like to thank Editage (www.editage.com) for English language editing.

References

Álvarez-Ellacuría A., Palmer M., Catalán I. A., Lisani J. L. 2020. Image-based, unsupervised estimation of fish size from commercial landings using deep learning. ICES Journal of Marine Science, 77: 1330–1339.
Atoum Y., Ye M., Ren L., Tai Y., Liu X. 2020. Color-wise attention network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 506–507.
Cao S., Zhao D., Liu X., Sun Y. 2020. Real-time robust detector for underwater live crabs based on deep learning. Computers and Electronics in Agriculture, 172: 105339.
Cao S., Zhao D., Sun Y., Liu X., Ruan C. 2021. Automatic coarse-to-fine joint detection and segmentation of underwater non-structural live crabs for precise feeding. Computers and Electronics in Agriculture, 180: 105905.
Chen C., Zhu W., Oczak M., Maschat K., Baumgartner J., Larsen M. L. V., Norton T. 2020. A computer vision approach for recognition of the engagement of pigs with different enrichment objects. Computers and Electronics in Agriculture, 175: 105580.
Costa C. S., Tetila E. C., Astolfi G., Sant'Ana D. A., Brito Pache M. C., Gonçalves A. B., Garcia Zanoni V. A., et al. 2019. A computer vision system for oocyte counting using images captured by smartphone. Aquacultural Engineering, 87: 102017.
Duan Y., Li D., Stien L. H., Fu Z., Wright D. W., Gao Y. 2019. Automatic segmentation method for live fish eggs microscopic image analysis. Aquacultural Engineering, 85: 49–55.
Gené-Mola J., Gregorio E., Guevara J., Auat F., Sanz-Cortiella R., Escolà A., Llorens J., et al. 2019. Fruit detection in an apple orchard using a mobile terrestrial laser scanner. Biosystems Engineering, 187: 171–184.
Gunnam L. C., Shin K. J. 2016. Realization of fish robot position recognition object using the color segment. In Proceedings of KIIT Conference, pp. 141–144.
Guo P., Zeng D., Tian Y., Liu S., Liu H., Li D. 2020. Multi-scale enhancement fusion for underwater sea cucumber images based on human visual system modelling. Computers and Electronics in Agriculture, 175: 105608.
Hao S., Feng Z., Guo Y. 2018. Low-light image enhancement with a refined illumination map. Multimedia Tools and Applications, 77: 29639–29650.
Hassan S. G., Hasan M., Li D. 2016. Information fusion in aquaculture: a state-of-the-art review. Frontiers of Agricultural Science and Engineering, 3: 206–221.
He K., Girshick R., Dollár P. 2019. Rethinking ImageNet pre-training. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4918–4927.
Hu J., Shen L., Sun G. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141.
Ji W., Chen G., Xu B., Meng X., Zhao D. 2019. Recognition method of green pepper in greenhouse based on least-squares support vector machine optimized by the improved particle swarm optimization. IEEE Access, 7: 119742–119754.
Ji W., Qian Z., Xu B., Tang W., Zhao D. 2018. A nighttime image enhancement method based on Retinex and guided filter for object recognition of apple harvesting robot. International Journal of Advanced Robotic Systems, 15: 1–9.
Jiang Y., Gong X., Liu D., Cheng Y., Fang C., Shen X., Wang Z., et al. 2019. EnlightenGAN: deep light enhancement without paired supervision. arXiv preprint arXiv:1906.06972.
Jobson D. J., Rahman Z. U., Woodell G. A. 1997. A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Transactions on Image Processing, 6: 965–976.
Kawahara R., Nobuhara S., Matsuyama T. 2016. Dynamic 3D capture of swimming fish by underwater active stereo. Methods in Oceanography, 17: 118–137.
Kim Y. T. 1997. Contrast enhancement using brightness preserving bi-histogram equalization. IEEE Transactions on Consumer Electronics, 43: 1–8.
Li D., Hao Y., Duan Y. 2020a. Nonintrusive methods for biomass estimation in aquaculture with emphasis on fish: a review. Reviews in Aquaculture, 12: 1390–1411.
Li D., Wang Z., Wu S., Miao Z., Du L., Duan Y. 2020b. Automatic recognition methods of fish feeding behavior in aquaculture: a review. Aquaculture, 528: 735508.
Li D., Xu L., Liu H. 2017. Detection of uneaten fish food pellets in underwater images for aquaculture. Aquacultural Engineering, 78: 85–94.
Li Q., Sun X., Dong J., Song S., Zhang T., Liu D., Zhang H., et al. 2020c. Developing a microscopic image dataset in support of intelligent phytoplankton detection using deep learning. ICES Journal of Marine Science, 77: 1427–1439.
Liu X., Zhao D., Jia W., Ji W., Ruan C., Sun Y. 2019. Cucumber fruits detection in greenhouses based on instance segmentation. IEEE Access, 7: 139635–139642.
Lore K. G., Akintayo A., Sarkar S. 2017. LLNet: a deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61: 650–662.
Lv F., Lu F. 2019. Attention-guided low-light image enhancement. arXiv preprint arXiv:1908.00682.
Mahmood A., Bennamoun M., An S., Sohel F., Boussaid F., Hovey R., Kendrick G. 2020. Automatic detection of Western rock lobster using synthetic data. ICES Journal of Marine Science, 77: 1308–1317.
Pettersen R., Braa H. L., Gawel B. A., Letnes P. A., Sæther K., Aas L. M. S. 2019. Detection and classification of Lepeophtheirus salmonis (Krøyer, 1837) using underwater hyperspectral imaging. Aquacultural Engineering, 87: 102025.
Qiao X., Bao J., Zhang H., Wan F., Li D. 2019. Underwater sea cucumber identification based on principal component analysis and support vector machine. Measurement, 133: 444–455.
Radman A., Zainal N., Suandi S. A. 2017. Automated segmentation of iris images acquired in an unconstrained environment using HOG-SVM and GrowCut. Digital Signal Processing, 64: 60–70.
Ronneberger O., Fischer P., Brox T. 2015. U-Net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham.
Ruan C., Zhao D., Chen X., Jia W., Liu X. 2016. Aquatic image segmentation method based on hs-PCNN for automatic operation boat in crab farming. Journal of Computational and Theoretical Nanoscience, 13: 7366–7374.
Saberioon M. M., Cisar P. 2016. Automated multiple fish tracking in three-dimension using a structured light sensor. Computers and Electronics in Agriculture, 121: 215–221.
Salman A., Siddiqui S. A., Shafait F., Mian A., Shortis M. R., Khurshid K., Ulges A., et al. 2020. Automatic fish detection in underwater videos by a deep neural network-based hybrid motion learning system. ICES Journal of Marine Science, 77: 1295–1307.
Sandler M., Howard A., Zhu M., Zhmoginov A., Chen L. C. 2018. MobileNetV2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520.
Shaw P., Uszkoreit J., Vaswani A. 2018. Self-attention with relative position representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 2, pp. 464–468.
Shen L., Yue Z., Feng F., Chen Q., Liu S., Ma J. 2017. MSR-net: low-light image enhancement using deep convolutional network. arXiv preprint arXiv:1711.02488.
Shi B., Sreeram V., Zhao D., Duan S., Jiang J. 2018. A wireless sensor network-based monitoring system for freshwater fishpond aquaculture. Biosystems Engineering, 172: 57–66.
Tan M., Chen B., Pang R., Vasudevan V., Sandler M., Howard A., Le Q. V. 2019. MnasNet: platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820–2828.
Tan M., Le Q. V. 2019. EfficientNet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pp. 6105–6114.
Tan M., Pang R., Le Q. V. 2020. EfficientDet: scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790.
Terayama K., Shin K., Mizuno K., Tsuda K. 2019. Integration of sonar and optical camera images using deep neural network for fish monitoring. Aquacultural Engineering, 86: 102000.
Wang H., Du M., Yang F., Zhang Z. 2019a. Score-CAM: improved visual explanations via score-weighted class activation mapping. arXiv preprint arXiv:1910.01279.
Wang R., Zhang Q., Fu C. W., Shen X., Zheng W. S., Jia J. 2019b. Underexposed photo enhancement using deep illumination estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6849–6857.
Wei C., Wang W., Yang W., Liu J. 2018. Deep Retinex decomposition for low-light enhancement. In British Machine Vision Conference, p. 155.
Wei Y., Jiao Y., An D., Li D., Li W., Wei Q. 2019. Review of dissolved oxygen detection technology: from laboratory analysis to online intelligent detection. Sensors, 19: 3995.
Xiong Y., Liu H., Gupta S., Akin B., Bender G., Kindermans P. J., Chen B., et al. 2020. MobileDets: searching for object detection architectures for mobile accelerators. arXiv preprint arXiv:2004.14525.
Yadav S., Shukla S. 2016. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In 2016 IEEE 6th International Conference on Advanced Computing (IACC), pp. 78–83. IEEE.
Yang X., Zhang S., Liu J., Gao Q., Dong S., Zhou C. 2020. Deep learning for smart fish farming: applications, opportunities and challenges. Reviews in Aquaculture, 13: 66–90.
Ying Z., Li G., Gao W. 2017a. A bio-inspired multi-exposure fusion framework for low-light image enhancement. arXiv preprint arXiv:1711.00591.
Ying Z., Li G., Ren Y., Wang R., Wang W. 2017b. A new low-light image enhancement algorithm using camera response model. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pp. 3015–3022.
Zhang Q., Gao G. 2020. Prioritizing robotic grasping of stacked fruit clusters based on stalk location in RGB-D images. Computers and Electronics in Agriculture, 172: 105359.
Zhang S., Wang T., Dong J., Yu H. 2017. Underwater image enhancement via extended multi-scale Retinex. Neurocomputing, 245: 1–9.
Zhang Y., Zhang J., Guo X. 2019. Kindling the darkness: a practical low-light image enhancer. In Proceedings of the 27th ACM International Conference on Multimedia, pp. 1632–1640.
Zhao D., Cao S., Sun Y., Qi H., Ruan C. 2020a. Small-sized efficient detector for underwater freely live crabs based on compound scaling neural network. Transactions of the Chinese Society for Agricultural Machinery, 51: 163–174.
Zhao D., Liu X., Sun Y., Wu R., Hong J., Ruan C. 2019. Detection of underwater crabs based on machine vision. Transactions of the Chinese Society for Agricultural Machinery, 50: 151–158.
Zhao Y. P., Niu L. J., Du H., Bi C. W. 2020b. An adaptive method of damage detection for fishing nets based on image processing technology. Aquacultural Engineering, 91: 102071.
Zoph B., Cubuk E. D., Ghiasi G., Lin T. Y., Shlens J., Le Q. V. 2019. Learning data augmentation strategies for object detection. arXiv preprint arXiv:1906.11172.

Cao, S., Zhao, D., Sun, Y., and Ruan, C. 2021. Learning-based low-illumination image enhancer for underwater live crab detection. ICES Journal of Marine Science, doi:10.1093/icesjms/fsaa250.

© International Council for the Exploration of the Sea 2021. All rights reserved.