doi: 10.1186/s41074-020-00066-8
We address 3D human pose estimation for equirectangular images taken by a wearable omnidirectional camera. The equirectangular image is distorted because the omnidirectional camera is attached close to the front of the person’s neck. Furthermore, some parts of the body are disconnected in the image; for instance, when a hand moves past one edge of the image, it reappears at the opposite edge. This distortion and disconnection make 3D pose estimation challenging. To overcome this difficulty, we adopt the location-maps method proposed by Mehta et al., which has so far been used to estimate 3D human poses only for regular images without distortion and disconnection. We focus on a characteristic of location-maps: they can extend 2D joint locations to 3D positions with respect to 2D-3D consistency, without considering kinematic model restrictions or optical properties. In addition, we collect a new dataset composed of equirectangular images and synchronized 3D joint positions for training and evaluation. We validate the capability of location-maps to estimate 3D human poses for distorted and disconnected images. We propose a new location-maps-based model by replacing the backbone network with a state-of-the-art 2D human pose estimation model (HRNet). Our model has a simpler architecture than the reference model proposed by Mehta et al.; nevertheless, it performs better in terms of accuracy and computational complexity. Finally, we analyze the location-maps method from two perspectives: the map variance and the map scale. The analysis reveals two characteristics of location-maps: (1) the map variance affects how robustly 2D joint locations are extended to 3D positions in the presence of 2D estimation error, and (2) the 3D position accuracy is related to the accuracy of the 2D locations relative to the map scale.
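As a rough illustration of how location-maps can extend 2D joint locations to 3D positions, the sketch below reads each joint's X, Y, and Z maps at the peak of that joint's 2D heatmap. It is a minimal sketch and not the authors' implementation: the function name `read_location_maps`, the array layouts, and the use of a plain argmax are all assumptions made for illustration.

```python
import numpy as np

def read_location_maps(heatmaps, loc_maps):
    """Read 3D joint positions from location-maps at the 2D heatmap peaks.

    heatmaps: array of shape (J, H, W), one 2D heatmap per joint (assumed layout).
    loc_maps: array of shape (J, 3, H, W), X/Y/Z maps per joint (assumed layout).
    Returns (joints_2d, joints_3d) with shapes (J, 2) as (col, row) and (J, 3).
    """
    num_joints, height, width = heatmaps.shape
    # 2D joint location: pixel with the highest heatmap response.
    flat_idx = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    joints_2d = np.stack([cols, rows], axis=1)
    # 3D position: value of each joint's X, Y, Z map at that same pixel.
    joints_3d = loc_maps[np.arange(num_joints), :, rows, cols]
    return joints_2d, joints_3d
```

Because the 3D value is looked up at the estimated 2D location, an error in that location shifts the lookup to a neighbouring pixel of the map, which is one way to read the abstract's remark about map variance and robustness.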
Nakane, Takumi; Bold, Naranchimeg; Sun, Haitian; Lu, Xuequan; Akashi, Takuya; Zhang, Chao
doi: 10.1186/s41074-020-00065-9
Evolutionary algorithms (EAs) and swarm algorithms (SAs) have shown their usefulness in solving combinatorial and NP-hard optimization problems in various research fields. However, in the field of computer vision, related surveys have not been updated during the last decade. In this study, inspired by the recent development of deep neural networks in computer vision, which embed large-scale optimization problems, we first describe a literature survey conducted to compensate for the lack of relevant research in this area. Specifically, applications related to the genetic algorithm and differential evolution from EAs, as well as particle swarm optimization and ant colony optimization from SAs and their variants, are mainly considered in this survey.
Kushida, Takahiro; Tanaka, Kenichiro; Aoto, Takahito; Funatomi, Takuya; Mukaigawa, Yasuhiro
doi: 10.1186/s41074-020-00063-x
Phase ambiguity is a major problem in depth measurement by either time-of-flight or phase shifting. Resolving the ambiguity with a low-frequency pattern sacrifices depth resolution, and using multiple frequencies requires a number of observations. In this paper, we propose a phase disambiguation method that combines temporal and spatial modulation so that high depth resolution is preserved while the number of observations is kept small. A key observation is that the phase ambiguities of the temporal and spatial domains appear differently with respect to depth. Using this difference, the phase can be disambiguated over a wider range of interest. We develop a prototype to show the effectiveness of our method through real-world experiments.
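To make the phase ambiguity concrete, the sketch below uses the standard continuous-wave time-of-flight relation, in which the measured phase is 4πfd/c wrapped to [0, 2π), so depths that differ by c/(2f) are indistinguishable. This only illustrates the problem and is not the proposed disambiguation method; the function names and the 100 MHz modulation frequency are assumptions chosen for the example.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def wrapped_phase(depth_m, freq_hz):
    """Phase of a continuous-wave ToF signal, wrapped to [0, 2*pi)."""
    return (4.0 * math.pi * freq_hz * depth_m / SPEED_OF_LIGHT) % (2.0 * math.pi)

def unambiguous_range(freq_hz):
    """Depth interval after which the wrapped phase repeats."""
    return SPEED_OF_LIGHT / (2.0 * freq_hz)

freq = 100e6  # assumed modulation frequency for illustration
near = 0.5
far = near + unambiguous_range(freq)
# Both depths yield (numerically) the same wrapped phase.
same_phase = math.isclose(wrapped_phase(near, freq), wrapped_phase(far, freq), abs_tol=1e-9)
```

Lowering the frequency lengthens the unambiguous range but makes the phase change more slowly with depth, which is the resolution trade-off the abstract describes.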
Yao, Yasuhiro; Xu, Katie; Murasaki, Kazuhiko; Ando, Shingo; Sagata, Atsushi
doi: 10.1186/s41074-020-00064-w
Manually labelling point cloud scenes for use as training data in machine learning applications is a time- and labour-intensive task. In this paper, we aim to reduce the effort associated with learning semantic segmentation tasks by introducing a semi-supervised method that operates on scenes with only a small number of labelled points. For this task, we advocate the use of pseudo-labelling in combination with PointNet, a neural network architecture for point cloud classification and segmentation. We also introduce a method for incorporating information derived from spatial relationships to aid in the pseudo-labelling process. This approach has practical advantages over current methods by working directly on point clouds and not being reliant on predefined features. Moreover, we demonstrate competitive performance on scenes from three publicly available datasets and provide studies on parameter sensitivity.