Paper Group ANR 1122
Software Engineering Practices for Machine Learning
Title | Software Engineering Practices for Machine Learning |
Authors | Peter Kriens, Tim Verbelen |
Abstract | In the last couple of years we have witnessed an enormous increase in machine learning (ML) applications. More and more program functions are no longer written in code, but learnt from huge numbers of data samples using an ML algorithm. However, what is often overlooked is the complexity of managing the resulting ML models and bringing them into a real production system. In software engineering, we have spent decades developing tools and methodologies to create, manage and assemble complex software modules. We present an overview of current techniques for managing complex software, and discuss how they apply to ML models. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10366v1 |
https://arxiv.org/pdf/1906.10366v1.pdf | |
PWC | https://paperswithcode.com/paper/software-engineering-practices-for-machine |
Repo | |
Framework | |
Towards Making the Most of BERT in Neural Machine Translation
Title | Towards Making the Most of BERT in Neural Machine Translation |
Authors | Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Yong Yu, Weinan Zhang, Lei Li |
Abstract | GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (Cnmt) that is key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed Cnmt consists of three techniques: a) asymptotic distillation to ensure that the NMT model retains the pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show Cnmt gains of up to 3 BLEU on the WMT14 English-German language pair, surpassing the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU. (A sketch of the switching gate follows this entry.) |
Tasks | Machine Translation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05672v4 |
https://arxiv.org/pdf/1908.05672v4.pdf | |
PWC | https://paperswithcode.com/paper/towards-making-the-most-of-bert-in-neural |
Repo | |
Framework | |
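
The dynamic switching gate named in the abstract can be pictured as a learned sigmoid gate that blends pre-trained LM states with NMT encoder states. Below is a minimal PyTorch sketch under our own assumptions; the class name `SwitchingGate`, the layer sizes, and the single-linear-layer gate are illustrative, not the authors' code.

```python
# Hypothetical sketch of a dynamic switching gate blending pre-trained LM
# features with NMT encoder features; names and sizes are illustrative.
import torch
import torch.nn as nn

class SwitchingGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate computed from the concatenated LM and NMT states.
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, h_lm: torch.Tensor, h_nmt: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per position and channel, how much
        # pre-trained knowledge to keep versus task-specific signal.
        g = torch.sigmoid(self.proj(torch.cat([h_lm, h_nmt], dim=-1)))
        return g * h_lm + (1.0 - g) * h_nmt

if __name__ == "__main__":
    gate = SwitchingGate(dim=512)
    h_lm = torch.randn(2, 10, 512)   # (batch, seq_len, dim) from the LM
    h_nmt = torch.randn(2, 10, 512)  # same shape from the NMT encoder
    fused = gate(h_lm, h_nmt)
    print(fused.shape)  # torch.Size([2, 10, 512])
```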
Asymmetric GAN for Unpaired Image-to-image Translation
Title | Asymmetric GAN for Unpaired Image-to-image Translation |
Authors | Yu Li, Sheng Tang, Rui Zhang, Yongdong Zhang, Jintao Li, Shuicheng Yan |
Abstract | The unpaired image-to-image translation problem aims to model the mapping from one domain to another with unpaired training data. Current works, like the well-acknowledged CycleGAN, provide a general solution for any two domains by modeling injective mappings with a symmetric structure. However, in situations where the two domains are asymmetric in complexity, i.e., where the amount of information differs between the domains, these approaches suffer from poor generation quality, mapping ambiguity, and model sensitivity. To address these issues, we propose Asymmetric GAN (AsymGAN), which adapts to asymmetric domains by introducing an auxiliary variable (aux) that learns the extra information needed when transferring from the information-poor domain to the information-rich domain. This improves on state-of-the-art approaches in the following ways. First, aux better balances the information between the two domains, which benefits generation quality. Second, the imbalance of information commonly leads to mapping ambiguity; we are able to model one-to-many mappings by tuning aux, and furthermore, our aux is controllable. Third, training CycleGAN easily makes the generator pair sensitive to small disturbances and variations, while our model decouples the ill-conditioned dependence between the generators by injecting aux during training. We verify the effectiveness of the proposed method both qualitatively and quantitatively in an asymmetric setting, the label-photo task, on the Cityscapes and Helen datasets, and show many applications of asymmetric image translation. In conclusion, AsymGAN provides a better solution for unpaired image-to-image translation between asymmetric domains. (A sketch of the aux-augmented cycle follows this entry.) |
Tasks | Image-to-Image Translation |
Published | 2019-12-25 |
URL | https://arxiv.org/abs/1912.11660v1 |
https://arxiv.org/pdf/1912.11660v1.pdf | |
PWC | https://paperswithcode.com/paper/asymmetric-gan-for-unpaired-image-to-image |
Repo | |
Framework | |
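
The role of the auxiliary variable can be illustrated on the cycle itself: the rich-to-poor direction emits an aux code alongside the translation, and the poor-to-rich direction consumes it, making the cycle reconstruction well-posed. The following toy sketch uses flat vectors and linear maps as stand-ins for the authors' image generators; all dimensions and names are assumptions.

```python
# Hypothetical sketch of AsymGAN-style translation with an auxiliary
# variable; architectures are toy stand-ins, not the authors' networks.
import torch
import torch.nn as nn

DIM_RICH, DIM_POOR, DIM_AUX = 64, 16, 8

# rich -> (poor, aux): the aux captures information the poor domain drops.
to_poor = nn.Linear(DIM_RICH, DIM_POOR + DIM_AUX)
# (poor, aux) -> rich: the aux restores that information on the way back.
to_rich = nn.Linear(DIM_POOR + DIM_AUX, DIM_RICH)

x_rich = torch.randn(4, DIM_RICH)
out = to_poor(x_rich)
x_poor, aux = out[:, :DIM_POOR], out[:, DIM_POOR:]

# Cycle reconstruction now has enough information to be well-posed.
x_back = to_rich(torch.cat([x_poor, aux], dim=-1))
cycle_loss = nn.functional.l1_loss(x_back, x_rich)

# One-to-many generation: vary aux to get different rich-domain outputs
# for the same poor-domain input.
x_variant = to_rich(torch.cat([x_poor, torch.randn_like(aux)], dim=-1))
print(cycle_loss.item(), x_variant.shape)
```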
Improving a Quality of 3D Object Detection by Spatial Transformation Mechanism
Title | Improving a Quality of 3D Object Detection by Spatial Transformation Mechanism |
Authors | Kiwoo Shin, Masayoshi Tomizuka |
Abstract | We present an endpoint box regression module (epBRM) designed to predict precise 3D bounding boxes from raw LiDAR 3D point clouds. The proposed epBRM is built from a sequence of small networks and is computationally lightweight. Our approach improves 3D object detection performance by predicting more precise 3D bounding box coordinates; it requires only 40 minutes of training to improve detection performance, and adds less than 12 ms to network inference time for up to 20 objects. The approach utilizes a spatial transformation mechanism to simplify the box regression task; adopting this mechanism in epBRM makes it possible to improve detection quality with a small network. We conduct an in-depth analysis of the effect of various spatial transformation mechanisms applied to raw LiDAR 3D point clouds, and evaluate the proposed epBRM by applying it to several state-of-the-art 3D object detection systems on the KITTI dataset, a standard 3D object detection benchmark for autonomous vehicles. The proposed epBRM enhances the overlap between ground-truth and detected bounding boxes and improves 3D object detection; evaluated on the KITTI test server, our method outperforms current state-of-the-art approaches. (A sketch of the spatial transformation step follows this entry.) |
Tasks | 3D Object Detection, Autonomous Vehicles, Object Detection |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1910.04853v1 |
https://arxiv.org/pdf/1910.04853v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-a-quality-of-3d-object-detection-by |
Repo | |
Framework | |
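
The core idea of the spatial transformation step is to move the points around a coarse detection into a canonical object-centred frame, so the refinement network sees a normalized regression problem. A minimal numpy illustration, with made-up numbers and a function name (`canonicalize`) of our own:

```python
# Hypothetical illustration of the spatial transformation idea: move the
# points around a coarse detection into a canonical object-centred frame
# before regressing a refined box. Numbers and names are illustrative.
import numpy as np

def canonicalize(points: np.ndarray, coarse_center: np.ndarray,
                 coarse_yaw: float) -> np.ndarray:
    """Translate and rotate LiDAR points so the coarse box sits at the
    origin with zero yaw; the refinement network then solves a much
    simpler, normalized regression problem."""
    c, s = np.cos(-coarse_yaw), np.sin(-coarse_yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (points - coarse_center) @ rot.T

points = np.random.randn(100, 3) + np.array([10.0, 5.0, -1.0])
local = canonicalize(points, coarse_center=np.array([10.0, 5.0, -1.0]),
                     coarse_yaw=0.3)
print(local.mean(axis=0))  # roughly the origin
```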
Fully Automated Pancreas Segmentation with Two-stage 3D Convolutional Neural Networks
Title | Fully Automated Pancreas Segmentation with Two-stage 3D Convolutional Neural Networks |
Authors | Ningning Zhao, Nuo Tong, Dan Ruan, Ke Sheng |
Abstract | Because the pancreas is an abdominal organ with very large variations in shape and size, automatic and accurate pancreas segmentation is challenging for medical image analysis. In this work, we propose a fully automated two-stage framework for pancreas segmentation based on convolutional neural networks (CNNs). In the first stage, a U-Net is trained to segment the down-sampled 3D volume, and a candidate region covering the pancreas is extracted from the estimated labels. Motivated by the superior performance reported for region-based CNNs, in the second stage another 3D U-Net is trained on the candidate region generated in the first stage. We evaluated the proposed method on the NIH computed tomography (CT) dataset and verified its superiority over other state-of-the-art 2D and 3D approaches for pancreas segmentation in terms of Dice-Sorensen coefficient (DSC) at test time. The mean DSC of the proposed method is 85.99%. (A sketch of the stage-1-to-stage-2 hand-off follows this entry.) |
Tasks | Automated Pancreas Segmentation, Computed Tomography (CT), Pancreas Segmentation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01795v2 |
https://arxiv.org/pdf/1906.01795v2.pdf | |
PWC | https://paperswithcode.com/paper/fully-automated-pancreas-segmentation-with |
Repo | |
Framework | |
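
The hand-off between the two stages is a generic pattern: take the coarse stage-1 mask, find the region it covers, pad it, and crop the full-resolution volume for stage 2. A short numpy sketch under our own assumptions (the margin and array sizes are illustrative):

```python
# Hypothetical sketch of the hand-off between the two stages: take the
# coarse stage-1 mask, find the region it covers, pad it, and crop the
# full-resolution volume for stage 2. Margins are illustrative.
import numpy as np

def candidate_region(coarse_mask: np.ndarray, margin: int = 8):
    """Return the padded bounding slices of the foreground voxels."""
    idx = np.argwhere(coarse_mask > 0)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, coarse_mask.shape)
    return tuple(slice(l, h) for l, h in zip(lo, hi))

volume = np.random.rand(128, 128, 64)          # full-resolution CT
coarse = np.zeros_like(volume, dtype=np.uint8)  # stage-1 prediction
coarse[40:80, 50:90, 20:40] = 1
crop = volume[candidate_region(coarse)]
print(crop.shape)  # stage-2 U-Net trains on this crop
```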
Object-Centric Stereo Matching for 3D Object Detection
Title | Object-Centric Stereo Matching for 3D Object Detection |
Authors | Alex D. Pon, Jason Ku, Chengyao Li, Steven L. Waslander |
Abstract | Safe autonomous driving requires reliable 3D object detection: determining the 6 DoF pose and dimensions of objects of interest. Using stereo cameras to solve this task is a cost-effective alternative to the widely used LiDAR sensor. The current state of the art for stereo 3D object detection takes the existing PSMNet stereo matching network unmodified, converts the estimated disparities into a 3D point cloud, and feeds this point cloud into a LiDAR-based 3D object detector. The issue with existing stereo matching networks is that they are designed for disparity estimation, not 3D object detection; the shape and accuracy of object point clouds are not their focus. Stereo matching networks commonly suffer from inaccurate depth estimates at object boundaries, which we define as streaking, because background and foreground points are estimated jointly. Existing networks also penalize disparity rather than the estimated positions of object point clouds in their loss functions. To address these two issues, we propose a novel 2D box association and object-centric stereo matching method that estimates the disparities of the objects of interest only. Our method achieves state-of-the-art results on the KITTI 3D and BEV benchmarks. (A sketch of the disparity-versus-depth loss distinction follows this entry.) |
Tasks | 3D Object Detection, Autonomous Driving, Disparity Estimation, Object Detection, Stereo Matching |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07566v2 |
https://arxiv.org/pdf/1909.07566v2.pdf | |
PWC | https://paperswithcode.com/paper/object-centric-stereo-matching-for-3d-object |
Repo | |
Framework | |
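
The loss-function point in the abstract is easy to make concrete: the same disparity error displaces far-away points much further in 3D than nearby ones, so penalizing point positions weights errors differently than penalizing raw disparity. A toy PyTorch comparison, with made-up camera parameters:

```python
# Hypothetical sketch of penalizing 3D point positions instead of raw
# disparity. Camera parameters (fx, baseline) are made-up values.
import torch

fx, baseline = 721.5, 0.54  # focal length (px) and stereo baseline (m)

def disparity_to_depth(disp: torch.Tensor) -> torch.Tensor:
    return fx * baseline / disp.clamp(min=1e-3)

disp_pred = torch.rand(1000) * 50 + 1.0
disp_gt = disp_pred + torch.randn(1000) * 0.5

# A disparity loss weights all pixels equally...
loss_disp = torch.nn.functional.smooth_l1_loss(disp_pred, disp_gt)
# ...while a depth/point loss reflects that the same disparity error
# displaces far-away object points much further in 3D.
loss_depth = torch.nn.functional.smooth_l1_loss(
    disparity_to_depth(disp_pred), disparity_to_depth(disp_gt))
print(loss_disp.item(), loss_depth.item())
```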
MLOD: A multi-view 3D object detection based on robust feature fusion method
Title | MLOD: A multi-view 3D object detection based on robust feature fusion method |
Authors | Jian Deng, Krzysztof Czarnecki |
Abstract | This paper presents the Multi-view Labelling Object Detector (MLOD). The detector takes an RGB image and a LiDAR point cloud as input and follows the two-stage object detection framework. A Region Proposal Network (RPN) generates 3D proposals in a Bird's Eye View (BEV) projection of the point cloud. The second stage projects the 3D proposal bounding boxes to the image and BEV feature maps and sends the corresponding map crops to a detection header for classification and bounding-box regression. Unlike other multi-view methods, the cropped image features are not fed directly to the detection header but are masked by depth information to filter out parts outside the 3D bounding boxes. Fusing image and BEV features is challenging, as they are derived from different perspectives. We introduce a novel detection header that provides detection results not just from the fusion layer but also from each sensor channel, so the object detector can be trained on data labelled in different views and the feature extractors do not degenerate. MLOD achieves state-of-the-art performance on the KITTI 3D object detection benchmark. Most importantly, the evaluation shows that the new header architecture is effective in preventing image feature extractor degeneration. (A sketch of the multi-output header follows this entry.) |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04163v1 |
https://arxiv.org/pdf/1909.04163v1.pdf | |
PWC | https://paperswithcode.com/paper/mlod-a-multi-view-3d-object-detection-based |
Repo | |
Framework | |
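
The multi-output header can be sketched as three branches sharing the crop features: one per sensor channel plus one for the fusion, each receiving its own loss so no single extractor can free-ride. The PyTorch sketch below is an assumption-laden toy (linear heads, `MultiViewHeader` name, and sizes are ours):

```python
# Hypothetical sketch of a detection header that emits predictions from
# the fusion layer and from each sensor channel, so every feature
# extractor receives its own supervision. Sizes are illustrative.
import torch
import torch.nn as nn

class MultiViewHeader(nn.Module):
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.img_head = nn.Linear(dim, n_classes)   # image-only branch
        self.bev_head = nn.Linear(dim, n_classes)   # BEV-only branch
        self.fusion_head = nn.Linear(2 * dim, n_classes)

    def forward(self, f_img, f_bev):
        fused = torch.cat([f_img, f_bev], dim=-1)
        return (self.img_head(f_img), self.bev_head(f_bev),
                self.fusion_head(fused))

header = MultiViewHeader(dim=256, n_classes=4)
f_img, f_bev = torch.randn(8, 256), torch.randn(8, 256)
labels = torch.randint(0, 4, (8,))
# Supervising all three outputs keeps the image extractor from degenerating
# even when fusion alone could solve the training set.
loss = sum(nn.functional.cross_entropy(o, labels)
           for o in header(f_img, f_bev))
print(loss.item())
```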
3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results
Title | 3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results |
Authors | Jiaojiao Fang, Lingtao Zhou, Guizhong Liu |
Abstract | 3D object detection is one of the most important tasks in the 3D visual perception system of autonomous vehicles. In this paper, we propose a novel two-stage 3D object detection method aimed at finding the optimal object location in 3D space, based on regressing two additional 3D object properties with a deep convolutional neural network and combining them with cascaded geometric constraints between the 2D and 3D boxes. First, we modify an existing 3D-property regression network by adding two components: viewpoint classification and the center projection of the 3D bounding box's bottom face. Second, we use the predicted center projection combined with a similar-triangle constraint to acquire an initial 3D bounding box in closed form. The location predicted in this step then serves as the initial value for the over-determined equations constructed from the 2D-3D box-fitting constraint, with the configuration determined by the classified viewpoint. Finally, we use the physical-world information recovered by the 3D detections to filter out false detections and false alarms among the 2D detections. Comparisons with the state of the art on the KITTI dataset show that, although conceptually simple, our method outperforms more complex and computationally expensive methods, not only improving the overall precision of 3D detections but also increasing the precision of orientation estimation. Furthermore, our method can deal with truncated objects to some extent and removes false alarms and false detections in both 2D and 3D detections. (A sketch of the similar-triangle initialization follows this entry.) |
Tasks | 3D Object Detection, Autonomous Vehicles, Object Detection |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.01867v1 |
https://arxiv.org/pdf/1909.01867v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-bounding-box-estimation-for-autonomous |
Repo | |
Framework | |
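
The similar-triangle initialization is standard pinhole geometry: an object of physical height H appearing h pixels tall suggests depth z = f·H/h, and the predicted 2D center projection can then be back-projected to a 3D centre. A numpy illustration with made-up intrinsics and dimensions:

```python
# Hypothetical illustration of the similar-triangle step: recover an
# initial 3D box centre from its predicted 2D projection, assuming a
# pinhole camera. Intrinsics and dimensions are made-up values.
import numpy as np

fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9  # pinhole intrinsics (px)

def backproject_center(u: float, v: float, depth: float) -> np.ndarray:
    """Closed-form 3D centre from its image projection and a depth guess."""
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

# Similar triangles: object height H (m) appearing h pixels tall gives
# depth z = fy * H / h, which seeds the over-determined 2D-3D fit.
H_object, h_pixels = 1.5, 60.0
z0 = fy * H_object / h_pixels
print(backproject_center(650.0, 190.0, z0))
```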
E2-Capsule Neural Networks for Facial Expression Recognition Using AU-Aware Attention
Title | E2-Capsule Neural Networks for Facial Expression Recognition Using AU-Aware Attention |
Authors | Shan Cao, Yuqian Yao, Gaoyun An |
Abstract | Capsule neural networks are a new and popular technique in deep learning. However, the traditional capsule neural network does not extract features sufficiently before the dynamic routing between capsules. In this paper, we propose a Double Enhanced Capsule Neural Network (E2-Capsnet) that uses AU-aware attention for facial expression recognition (FER). E2-Capsnet takes advantage of dynamic routing between capsules and has two enhancement modules that benefit FER. The first enhancement module is a convolutional neural network with AU-aware attention, which helps focus on the active regions of the expression. The second enhancement module is a capsule neural network with multiple convolutional layers, which enhances the capacity of the feature representation. Finally, the squashing function is used to classify the facial expression. We demonstrate the effectiveness of E2-Capsnet on two public benchmark datasets, RAF-DB and EmotioNet. The experimental results show that our E2-Capsnet is superior to state-of-the-art methods. Our implementation will be publicly available online. (A sketch of the squashing function follows this entry.) |
Tasks | Facial Expression Recognition |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02491v1 |
https://arxiv.org/pdf/1912.02491v1.pdf | |
PWC | https://paperswithcode.com/paper/e2-capsule-neural-networks-for-facial |
Repo | |
Framework | |
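
The squashing function mentioned for the final classification is the standard capsule-network nonlinearity from Sabour et al. (2017): it shrinks short vectors toward zero and caps long vectors near unit length, so a capsule's length reads as a probability. A self-contained PyTorch sketch (the capsule shapes are illustrative):

```python
# The standard capsule squashing nonlinearity: v = ||s||^2/(1+||s||^2) * s/||s||.
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)

capsules = torch.randn(2, 7, 16)  # (batch, expression classes, capsule dim)
v = squash(capsules)
print(v.norm(dim=-1))  # every length now lies in (0, 1)
```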
IoU Loss for 2D/3D Object Detection
Title | IoU Loss for 2D/3D Object Detection |
Authors | Dingfu Zhou, Jin Fang, Xibin Song, Chenye Guan, Junbo Yin, Yuchao Dai, Ruigang Yang |
Abstract | In 2D/3D object detection, Intersection-over-Union (IoU) is widely employed as an evaluation metric for comparing detectors at test time. During training, however, a common distance loss (e.g., $L_1$ or $L_2$) is usually adopted to minimize the discrepancy between the predicted and ground-truth bounding boxes (Bboxes). To eliminate this gap between training and testing, the IoU loss was introduced for 2D object detection in UnitBox (Yu et al., 2016) and GIoU (Rezatofighi et al., 2019). Unfortunately, these approaches only work for axis-aligned 2D Bboxes and cannot be applied to the more general detection task with rotated Bboxes. To resolve this issue, we first investigate IoU computation between two rotated Bboxes and then implement a unified framework: an IoU loss layer for both 2D and 3D object detection. By integrating this IoU loss into several state-of-the-art 3D object detectors, we achieve consistent improvements for both bird's-eye-view 2D detection and point cloud 3D detection on the public KITTI benchmark. (A sketch of the axis-aligned IoU loss follows this entry.) |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-08-11 |
URL | https://arxiv.org/abs/1908.03851v1 |
https://arxiv.org/pdf/1908.03851v1.pdf | |
PWC | https://paperswithcode.com/paper/iou-loss-for-2d3d-object-detection |
Repo | |
Framework | |
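
For the axis-aligned case that the abstract builds on, the IoU loss is simply one minus the IoU of predicted and target boxes; the paper's contribution extends this to rotated 2D/3D boxes, whose intersection requires polygon clipping and is omitted here. A minimal PyTorch sketch:

```python
# A minimal axis-aligned IoU loss for 2D boxes (x1, y1, x2, y2).
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    lt = torch.max(pred[:, :2], target[:, :2])   # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    return (1.0 - iou).mean()  # train on the same quantity we test on

pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(iou_loss(pred, target))  # IoU = 1/7, loss ≈ 0.857
```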
Auto-Precision Scaling for Distributed Deep Learning
Title | Auto-Precision Scaling for Distributed Deep Learning |
Authors | Ruobing Han, Yang You, James Demmel |
Abstract | In recent years, large-batch optimization has become key to distributed deep learning. However, large-batch optimization is hard: straightforwardly porting the code often leads to a significant loss in test accuracy. Some researchers have suggested that large-batch optimization leads to low generalization performance, and further conjectured that large-batch training needs higher floating-point precision to generalize well. To examine this, we conduct an open study in this paper: our target is to find the number of bits that large-batch training needs. This requires a system for customized-precision study, but state-of-the-art systems have limitations that lower the efficiency of developers and researchers, so we design and implement our own system, CPD: a high-performance system for customized-precision distributed DL. In our experiments, the application often loses accuracy at very low precision (e.g., 8 bits or 4 bits). To solve this problem, we propose the Auto-Precision-Scaling (APS) algorithm, a layer-wise adaptive scheme for gradient shifting. With APS, we are able to make large-batch training converge with only 4 bits. (A sketch of layer-wise gradient shifting follows this entry.) |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08907v1 |
https://arxiv.org/pdf/1911.08907v1.pdf | |
PWC | https://paperswithcode.com/paper/auto-precision-scaling-for-distributed-deep |
Repo | |
Framework | |
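
The layer-wise gradient-shifting idea can be sketched as: pick a per-layer power-of-two exponent so the layer's largest gradient fills the low-bit grid, quantize, communicate, then undo the shift. The numpy sketch below is our own reading of the scheme, not the authors' implementation; the bit width and rounding policy are assumptions.

```python
# Hypothetical sketch of layer-wise gradient shifting in the spirit of
# APS: scale each layer's gradients by a power of two so their range
# fits a low-bit representation, then undo the shift after transport.
import numpy as np

def shift_quantize(grad: np.ndarray, bits: int = 4):
    """Pick a per-layer exponent so the max |gradient| fills the grid."""
    max_abs = np.abs(grad).max() + 1e-12
    exp = np.floor(np.log2((2 ** (bits - 1) - 1) / max_abs))
    q = np.round(grad * 2.0 ** exp).astype(np.int8)  # what gets communicated
    return q, exp

def unshift(q: np.ndarray, exp: float) -> np.ndarray:
    return q.astype(np.float64) / 2.0 ** exp

grad = np.random.randn(1000) * 1e-3          # gradients of one layer
q, exp = shift_quantize(grad, bits=4)
err = np.abs(unshift(q, exp) - grad).max()
print(exp, err)  # small error despite 4-bit transport
```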
Dual-FOFE-net Neural Models for Entity Linking with PageRank
Title | Dual-FOFE-net Neural Models for Entity Linking with PageRank |
Authors | Feng Wei, Uyen Trang Nguyen, Hui Jiang |
Abstract | This paper presents a simple and computationally efficient approach to entity linking (EL) that, instead of recurrent neural networks (RNNs) or convolutional neural networks (CNNs), uses feedforward neural networks (FFNNs) and the recent dual fixed-size ordinally forgetting encoding (dual-FOFE) method to fully encode a sentence fragment and its left/right contexts into a fixed-size representation. Furthermore, we propose to incorporate PageRank-based distillation in our candidate generation module. Our neural linking models consist of three parts: a PageRank-based candidate generation module, a dual-FOFE-net neural ranking model, and a simple NIL entity clustering system. Experimental results show that our proposed neural linking models achieve higher EL accuracy than the baseline system and state-of-the-art models on the TAC2016 task dataset, without requiring any in-house data or complicated handcrafted features, and achieve competitive accuracy on the TAC2017 task dataset. (A sketch of the FOFE encoding follows this entry.) |
Tasks | Entity Linking |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.12697v1 |
https://arxiv.org/pdf/1907.12697v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-fofe-net-neural-models-for-entity |
Repo | |
Framework | |
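
FOFE itself is compact enough to show in full: a sequence of one-hot word vectors is folded into a single fixed-size vector via z_t = α·z_{t-1} + e_t, so earlier words decay geometrically and word order is preserved. The dual variant pairs two forgetting factors; the α values and vocabulary size below are illustrative.

```python
# Fixed-size ordinally forgetting encoding (FOFE): z_t = alpha*z_{t-1} + e_t.
import numpy as np

def fofe(token_ids, vocab_size: int, alpha: float) -> np.ndarray:
    z = np.zeros(vocab_size)
    for t in token_ids:            # earlier words decay geometrically
        z = alpha * z              # forget a little...
        z[t] += 1.0                # ...then add the current one-hot
    return z

ids = [3, 1, 4]                    # a left context, as word indices
# Dual-FOFE: concatenate encodings at two forgetting factors so both
# short-range and long-range order information survive.
code = np.concatenate([fofe(ids, 10, 0.5), fofe(ids, 10, 0.9)])
print(code.shape, code[:10])       # (20,) ; the alpha=0.5 half
```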
Latent Representations of Dynamical Systems: When Two is Better Than One
Title | Latent Representations of Dynamical Systems: When Two is Better Than One |
Authors | Max Tegmark |
Abstract | A popular approach to predicting the future of dynamical systems involves mapping them into a lower-dimensional “latent space” where prediction is easier. We show that the information-theoretically optimal approach uses different mappings for present and future, in contrast to state-of-the-art machine-learning approaches where both mappings are the same. We illustrate this dichotomy by predicting the time evolution of coupled harmonic oscillators with dissipation and thermal noise, showing how the optimal 2-mapping method significantly outperforms principal component analysis and all other approaches that use a single latent representation, and discuss the intuitive reason why two representations are better than one. We conjecture that a single latent representation is optimal only for time-reversible processes, not for, e.g., text, speech, music, or out-of-equilibrium physical systems. (A toy linear comparison follows this entry.) |
Tasks | |
Published | 2019-02-09 |
URL | http://arxiv.org/abs/1902.03364v2 |
http://arxiv.org/pdf/1902.03364v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-representations-of-dynamical-systems |
Repo | |
Framework | |
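
In the linear setting, the one-mapping-versus-two contrast can be made concrete: PCA compresses the present with a single projection chosen by variance, whereas reduced-rank regression (used here purely as an illustration of the point, not as the paper's estimator) chooses different projections for present and future. A toy numpy comparison where the loudest direction of the present is useless for prediction:

```python
# Toy contrast between a single shared latent (PCA on the present) and
# two distinct mappings (reduced-rank regression). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
X[:, 0] *= 5.0                       # high-variance but useless direction
A = np.array([[0.0, 1.0, 1.0]])      # the future only depends on x2 + x3
Y = X @ A.T + 0.1 * rng.normal(size=(5000, 1))

def mse_through_latent(w: np.ndarray) -> float:
    z = X @ w                        # 1-D latent of the present
    coef = (z @ Y[:, 0]) / (z @ z)   # best linear readout of the future
    return np.mean((Y[:, 0] - coef * z) ** 2)

# One mapping: PCA keeps the loudest direction of the present (x1).
w_pca = np.linalg.eigh(np.cov(X.T))[1][:, -1]
# Two mappings: choose the input direction that best predicts the future
# (top right-singular vector of the cross-covariance).
w_rrr = np.linalg.svd(Y.T @ X)[2][0]

print("PCA latent MSE:", mse_through_latent(w_pca))
print("Predictive latent MSE:", mse_through_latent(w_rrr))
```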
Disease Labeling via Machine Learning is NOT quite the same as Medical Diagnosis
Title | Disease Labeling via Machine Learning is NOT quite the same as Medical Diagnosis |
Authors | Moshe BenBassat |
Abstract | A key step in medical diagnosis is giving the patient a universally recognized label (e.g., appendicitis), which essentially assigns the patient to a class of patients with similar body failures. However, two patients who very probably share the same disease label may still differ in their feature manifestation patterns, implying differences in the required treatments. Additionally, in many cases, the labels of the primary diagnoses leave some findings unexplained. Medical diagnosis is only partially about probability calculations for label X or Y; diagnosis is not complete until the patient's overall situation is clinically understood well enough to enable the best therapeutic decisions. Most machine learning models are data-centric, and evidence so far suggests they can reach expert-level performance in the disease-labeling phase. Nonetheless, like any other mathematical technique, they have their limitations and applicability scope. Primarily, data-centric algorithms are knowledge-blind: they lack the anatomy and physiology knowledge that physicians leverage to achieve complete diagnosis. This article advocates complementing them with such intelligence to overcome their inherent limitations as knowledge-blind algorithms. Machines can learn many things from data, but data is not the only source machines can learn from. Historic patient data only tells us what the possible manifestations of a certain body failure are; anatomy and physiology knowledge tell us how the body works and fails. Both are needed for complete diagnosis. The proposed Double Deep Learning approach, along with the initiative for a Medical Wikipedia for Smart Machines, leads to AI diagnostic-support solutions for complete diagnosis, beyond the limited data-only labeling solutions we see today. AI for medicine will remain limited until its intelligence also integrates anatomy and physiology. |
Tasks | Medical Diagnosis |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03470v1 |
https://arxiv.org/pdf/1909.03470v1.pdf | |
PWC | https://paperswithcode.com/paper/disease-labeling-via-machine-learning-is-not |
Repo | |
Framework | |
Probabilistic framework for solving Visual Dialog
Title | Probabilistic framework for solving Visual Dialog |
Authors | Badri N. Patro, Anupriy, Vinay P. Namboodiri |
Abstract | In this paper, we propose a probabilistic framework for solving the task of Visual Dialog. Solving this task requires reasoning about, and understanding of, the visual modality, the language modality, and common-sense knowledge. Various architectures have been proposed that combine visual and language representations through variants of multi-modal deep learning. However, we believe it is crucial to understand and analyze the sources of uncertainty in this task. Our approach allows uncertainty to be estimated and also aids diverse generation of answers. The proposed approach comprises a probabilistic representation module that provides representations for the image, question, and conversation history; a module that ensures diverse latent representations for candidate answers given the probabilistic representations; and an uncertainty representation module that chooses the answer minimizing uncertainty. We thoroughly evaluate the model with a detailed ablation analysis, comparison with the state of the art, and visualizations of the uncertainty that aid understanding of the method. Using the proposed probabilistic framework, we thus obtain an improved visual dialog system that is also more explainable. (A sketch of uncertainty-based answer selection follows this entry.) |
Tasks | Common Sense Reasoning, Visual Dialog |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04800v2 |
https://arxiv.org/pdf/1909.04800v2.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-framework-for-solving-visual |
Repo | |
Framework | |
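
One loose reading of "choosing the answer that minimizes uncertainty" is to sample several stochastic context representations, score every candidate answer under each sample, and prefer answers that score high with low disagreement across samples. The PyTorch sketch below is entirely our own illustration: the Gaussian representation, the dot-product scorer, and the lambda trade-off are assumptions, not the authors' model.

```python
# Hypothetical sketch of uncertainty-driven answer choice: sample
# stochastic representations, score each candidate under every sample,
# and prefer answers whose scores are both high and stable.
import torch

torch.manual_seed(0)
n_samples, n_candidates, dim = 16, 100, 256

mu, sigma = torch.randn(dim), torch.rand(dim) * 0.1
context = mu + sigma * torch.randn(n_samples, dim)   # probabilistic rep.
answers = torch.randn(n_candidates, dim)             # candidate embeddings

scores = context @ answers.T                         # (samples, candidates)
mean, var = scores.mean(0), scores.var(0)
# Pick the answer that scores high on average with low disagreement
# across samples; lambda trades relevance against uncertainty.
lam = 0.5
best = (mean - lam * var).argmax()
print(int(best), float(mean[best]), float(var[best]))
```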