Paper Group ANR 1122
Software Engineering Practices for Machine Learning
Title | Software Engineering Practices for Machine Learning |
Authors | Peter Kriens, Tim Verbelen |
Abstract | In the last couple of years we have witnessed an enormous increase in machine learning (ML) applications. More and more program functions are no longer written in code, but learnt from huge numbers of data samples using an ML algorithm. However, what is often overlooked is the complexity of managing the resulting ML models and bringing them into a real production system. In software engineering, we have spent decades developing tools and methodologies to create, manage and assemble complex software modules. We present an overview of current techniques for managing complex software, and discuss how they apply to ML models. |
Tasks | |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10366v1 |
https://arxiv.org/pdf/1906.10366v1.pdf | |
PWC | https://paperswithcode.com/paper/software-engineering-practices-for-machine |
Repo | |
Framework | |
Towards Making the Most of BERT in Neural Machine Translation
Title | Towards Making the Most of BERT in Neural Machine Translation |
Authors | Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Yong Yu, Weinan Zhang, Lei Li |
Abstract | GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (Cnmt) that is key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed Cnmt consists of three techniques: a) asymptotic distillation to ensure that the NMT model retains the pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation show Cnmt gains of up to 3 BLEU on the WMT14 English-German language pair, surpassing the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU. (A sketch of the switching gate follows this entry.) |
Tasks | Machine Translation |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05672v4 |
https://arxiv.org/pdf/1908.05672v4.pdf | |
PWC | https://paperswithcode.com/paper/towards-making-the-most-of-bert-in-neural |
Repo | |
Framework | |
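
The dynamic switching gate named in the abstract can be pictured as a learned sigmoid gate that blends pre-trained LM states with NMT encoder states. Below is a minimal PyTorch sketch under our own assumptions; the class name `SwitchingGate`, the layer sizes, and the single-linear-layer gate are illustrative, not the authors' code.

```python
# Hypothetical sketch of a dynamic switching gate blending pre-trained LM
# features with NMT encoder features; names and sizes are illustrative.
import torch
import torch.nn as nn

class SwitchingGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate computed from the concatenated LM and NMT states.
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, h_lm: torch.Tensor, h_nmt: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per position and channel, how much
        # pre-trained knowledge to keep versus task-specific signal.
        g = torch.sigmoid(self.proj(torch.cat([h_lm, h_nmt], dim=-1)))
        return g * h_lm + (1.0 - g) * h_nmt

if __name__ == "__main__":
    gate = SwitchingGate(dim=512)
    h_lm = torch.randn(2, 10, 512)   # (batch, seq_len, dim) from the LM
    h_nmt = torch.randn(2, 10, 512)  # same shape from the NMT encoder
    fused = gate(h_lm, h_nmt)
    print(fused.shape)  # torch.Size([2, 10, 512])
```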
Asymmetric GAN for Unpaired Image-to-image Translation
Title | Asymmetric GAN for Unpaired Image-to-image Translation |
Authors | Yu Li, Sheng Tang, Rui Zhang, Yongdong Zhang, Jintao Li, Shuicheng Yan |
Abstract | The unpaired image-to-image translation problem aims to model the mapping from one domain to another with unpaired training data. Current works, like the well-acknowledged CycleGAN, provide a general solution for any two domains by modeling injective mappings with a symmetric structure. However, in situations where the two domains are asymmetric in complexity, i.e., where the amount of information differs between the domains, these approaches suffer from poor generation quality, mapping ambiguity, and model sensitivity. To address these issues, we propose Asymmetric GAN (AsymGAN), which adapts to asymmetric domains by introducing an auxiliary variable (aux) that learns the extra information needed when transferring from the information-poor domain to the information-rich domain. This improves on state-of-the-art approaches in the following ways. First, aux better balances the information between the two domains, which benefits generation quality. Second, the imbalance of information commonly leads to mapping ambiguity; we are able to model one-to-many mappings by tuning aux, and furthermore, our aux is controllable. Third, training CycleGAN easily makes the generator pair sensitive to small disturbances and variations, while our model decouples the ill-conditioned dependence between the generators by injecting aux during training. We verify the effectiveness of the proposed method both qualitatively and quantitatively in an asymmetric setting, the label-photo task, on the Cityscapes and Helen datasets, and show many applications of asymmetric image translation. In conclusion, AsymGAN provides a better solution for unpaired image-to-image translation between asymmetric domains. (A sketch of the aux-augmented cycle follows this entry.) |
Tasks | Image-to-Image Translation |
Published | 2019-12-25 |
URL | https://arxiv.org/abs/1912.11660v1 |
https://arxiv.org/pdf/1912.11660v1.pdf | |
PWC | https://paperswithcode.com/paper/asymmetric-gan-for-unpaired-image-to-image |
Repo | |
Framework | |
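
The role of the auxiliary variable can be illustrated on the cycle itself: the rich-to-poor direction emits an aux code alongside the translation, and the poor-to-rich direction consumes it, making the cycle reconstruction well-posed. The following toy sketch uses flat vectors and linear maps as stand-ins for the authors' image generators; all dimensions and names are assumptions.

```python
# Hypothetical sketch of AsymGAN-style translation with an auxiliary
# variable; architectures are toy stand-ins, not the authors' networks.
import torch
import torch.nn as nn

DIM_RICH, DIM_POOR, DIM_AUX = 64, 16, 8

# rich -> (poor, aux): the aux captures information the poor domain drops.
to_poor = nn.Linear(DIM_RICH, DIM_POOR + DIM_AUX)
# (poor, aux) -> rich: the aux restores that information on the way back.
to_rich = nn.Linear(DIM_POOR + DIM_AUX, DIM_RICH)

x_rich = torch.randn(4, DIM_RICH)
out = to_poor(x_rich)
x_poor, aux = out[:, :DIM_POOR], out[:, DIM_POOR:]

# Cycle reconstruction now has enough information to be well-posed.
x_back = to_rich(torch.cat([x_poor, aux], dim=-1))
cycle_loss = nn.functional.l1_loss(x_back, x_rich)

# One-to-many generation: vary aux to get different rich-domain outputs
# for the same poor-domain input.
x_variant = to_rich(torch.cat([x_poor, torch.randn_like(aux)], dim=-1))
print(cycle_loss.item(), x_variant.shape)
```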
Improving a Quality of 3D Object Detection by Spatial Transformation Mechanism
Title | Improving a Quality of 3D Object Detection by Spatial Transformation Mechanism |
Authors | Kiwoo Shin, Masayoshi Tomizuka |
Abstract | We present an endpoint box regression module (epBRM) designed to predict precise 3D bounding boxes from raw LiDAR 3D point clouds. The proposed epBRM is built from a sequence of small networks and is computationally lightweight. Our approach improves 3D object detection performance by predicting more precise 3D bounding box coordinates; it requires only 40 minutes of training to improve detection performance, and adds less than 12 ms to network inference time for up to 20 objects. The approach utilizes a spatial transformation mechanism to simplify the box regression task; adopting this mechanism in epBRM makes it possible to improve detection quality with a small network. We conduct an in-depth analysis of the effect of various spatial transformation mechanisms applied to raw LiDAR 3D point clouds, and evaluate the proposed epBRM by applying it to several state-of-the-art 3D object detection systems on the KITTI dataset, a standard 3D object detection benchmark for autonomous vehicles. The proposed epBRM enhances the overlap between ground-truth and detected bounding boxes and improves 3D object detection; evaluated on the KITTI test server, our method outperforms current state-of-the-art approaches. (A sketch of the spatial transformation step follows this entry.) |
Tasks | 3D Object Detection, Autonomous Vehicles, Object Detection |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1910.04853v1 |
https://arxiv.org/pdf/1910.04853v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-a-quality-of-3d-object-detection-by |
Repo | |
Framework | |
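
The core idea of the spatial transformation step is to move the points around a coarse detection into a canonical object-centred frame, so the refinement network sees a normalized regression problem. A minimal numpy illustration, with made-up numbers and a function name (`canonicalize`) of our own:

```python
# Hypothetical illustration of the spatial transformation idea: move the
# points around a coarse detection into a canonical object-centred frame
# before regressing a refined box. Numbers and names are illustrative.
import numpy as np

def canonicalize(points: np.ndarray, coarse_center: np.ndarray,
                 coarse_yaw: float) -> np.ndarray:
    """Translate and rotate LiDAR points so the coarse box sits at the
    origin with zero yaw; the refinement network then solves a much
    simpler, normalized regression problem."""
    c, s = np.cos(-coarse_yaw), np.sin(-coarse_yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (points - coarse_center) @ rot.T

points = np.random.randn(100, 3) + np.array([10.0, 5.0, -1.0])
local = canonicalize(points, coarse_center=np.array([10.0, 5.0, -1.0]),
                     coarse_yaw=0.3)
print(local.mean(axis=0))  # roughly the origin
```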
Fully Automated Pancreas Segmentation with Two-stage 3D Convolutional Neural Networks
Title | Fully Automated Pancreas Segmentation with Two-stage 3D Convolutional Neural Networks |
Authors | Ningning Zhao, Nuo Tong, Dan Ruan, Ke Sheng |
Abstract | Because the pancreas is an abdominal organ with very large variations in shape and size, automatic and accurate pancreas segmentation is challenging for medical image analysis. In this work, we propose a fully automated two-stage framework for pancreas segmentation based on convolutional neural networks (CNNs). In the first stage, a U-Net is trained to segment the down-sampled 3D volume, and a candidate region covering the pancreas is extracted from the estimated labels. Motivated by the superior performance reported for region-based CNNs, in the second stage another 3D U-Net is trained on the candidate region generated in the first stage. We evaluated the proposed method on the NIH computed tomography (CT) dataset and verified its superiority over other state-of-the-art 2D and 3D approaches for pancreas segmentation in terms of Dice-Sorensen coefficient (DSC) at test time. The mean DSC of the proposed method is 85.99%. (A sketch of the stage-1-to-stage-2 hand-off follows this entry.) |
Tasks | Automated Pancreas Segmentation, Computed Tomography (CT), Pancreas Segmentation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01795v2 |
https://arxiv.org/pdf/1906.01795v2.pdf | |
PWC | https://paperswithcode.com/paper/fully-automated-pancreas-segmentation-with |
Repo | |
Framework | |
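
The hand-off between the two stages is a generic pattern: take the coarse stage-1 mask, find the region it covers, pad it, and crop the full-resolution volume for stage 2. A short numpy sketch under our own assumptions (the margin and array sizes are illustrative):

```python
# Hypothetical sketch of the hand-off between the two stages: take the
# coarse stage-1 mask, find the region it covers, pad it, and crop the
# full-resolution volume for stage 2. Margins are illustrative.
import numpy as np

def candidate_region(coarse_mask: np.ndarray, margin: int = 8):
    """Return the padded bounding slices of the foreground voxels."""
    idx = np.argwhere(coarse_mask > 0)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, coarse_mask.shape)
    return tuple(slice(l, h) for l, h in zip(lo, hi))

volume = np.random.rand(128, 128, 64)          # full-resolution CT
coarse = np.zeros_like(volume, dtype=np.uint8)  # stage-1 prediction
coarse[40:80, 50:90, 20:40] = 1
crop = volume[candidate_region(coarse)]
print(crop.shape)  # stage-2 U-Net trains on this crop
```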
Object-Centric Stereo Matching for 3D Object Detection
Title | Object-Centric Stereo Matching for 3D Object Detection |
Authors | Alex D. Pon, Jason Ku, Chengyao Li, Steven L. Waslander |
Abstract | Safe autonomous driving requires reliable 3D object detection: determining the 6 DoF pose and dimensions of objects of interest. Using stereo cameras to solve this task is a cost-effective alternative to the widely used LiDAR sensor. The current state of the art for stereo 3D object detection takes the existing PSMNet stereo matching network unmodified, converts the estimated disparities into a 3D point cloud, and feeds this point cloud into a LiDAR-based 3D object detector. The issue with existing stereo matching networks is that they are designed for disparity estimation, not 3D object detection; the shape and accuracy of object point clouds are not their focus. Stereo matching networks commonly suffer from inaccurate depth estimates at object boundaries, which we define as streaking, because background and foreground points are estimated jointly. Existing networks also penalize disparity rather than the estimated positions of object point clouds in their loss functions. To address these two issues, we propose a novel 2D box association and object-centric stereo matching method that estimates the disparities of the objects of interest only. Our method achieves state-of-the-art results on the KITTI 3D and BEV benchmarks. (A sketch of the disparity-versus-depth loss distinction follows this entry.) |
Tasks | 3D Object Detection, Autonomous Driving, Disparity Estimation, Object Detection, Stereo Matching |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07566v2 |
https://arxiv.org/pdf/1909.07566v2.pdf | |
PWC | https://paperswithcode.com/paper/object-centric-stereo-matching-for-3d-object |
Repo | |
Framework | |
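
The loss-function point in the abstract is easy to make concrete: the same disparity error displaces far-away points much further in 3D than nearby ones, so penalizing point positions weights errors differently than penalizing raw disparity. A toy PyTorch comparison, with made-up camera parameters:

```python
# Hypothetical sketch of penalizing 3D point positions instead of raw
# disparity. Camera parameters (fx, baseline) are made-up values.
import torch

fx, baseline = 721.5, 0.54  # focal length (px) and stereo baseline (m)

def disparity_to_depth(disp: torch.Tensor) -> torch.Tensor:
    return fx * baseline / disp.clamp(min=1e-3)

disp_pred = torch.rand(1000) * 50 + 1.0
disp_gt = disp_pred + torch.randn(1000) * 0.5

# A disparity loss weights all pixels equally...
loss_disp = torch.nn.functional.smooth_l1_loss(disp_pred, disp_gt)
# ...while a depth/point loss reflects that the same disparity error
# displaces far-away object points much further in 3D.
loss_depth = torch.nn.functional.smooth_l1_loss(
    disparity_to_depth(disp_pred), disparity_to_depth(disp_gt))
print(loss_disp.item(), loss_depth.item())
```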
MLOD: A multi-view 3D object detection based on robust feature fusion method
Title | MLOD: A multi-view 3D object detection based on robust feature fusion method |
Authors | Jian Deng, Krzysztof Czarnecki |
Abstract | This paper presents the Multi-view Labelling Object Detector (MLOD). The detector takes an RGB image and a LiDAR point cloud as input and follows the two-stage object detection framework. A Region Proposal Network (RPN) generates 3D proposals in a Bird's Eye View (BEV) projection of the point cloud. The second stage projects the 3D proposal bounding boxes to the image and BEV feature maps and sends the corresponding map crops to a detection header for classification and bounding-box regression. Unlike other multi-view methods, the cropped image features are not fed directly to the detection header but are masked by depth information to filter out parts outside the 3D bounding boxes. Fusing image and BEV features is challenging, as they are derived from different perspectives. We introduce a novel detection header that provides detection results not just from the fusion layer but also from each sensor channel, so the object detector can be trained on data labelled in different views and the feature extractors do not degenerate. MLOD achieves state-of-the-art performance on the KITTI 3D object detection benchmark. Most importantly, the evaluation shows that the new header architecture is effective in preventing image feature extractor degeneration. (A sketch of the multi-output header follows this entry.) |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04163v1 |
https://arxiv.org/pdf/1909.04163v1.pdf | |
PWC | https://paperswithcode.com/paper/mlod-a-multi-view-3d-object-detection-based |
Repo | |
Framework | |
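
The multi-output header can be sketched as three branches sharing the crop features: one per sensor channel plus one for the fusion, each receiving its own loss so no single extractor can free-ride. The PyTorch sketch below is an assumption-laden toy (linear heads, `MultiViewHeader` name, and sizes are ours):

```python
# Hypothetical sketch of a detection header that emits predictions from
# the fusion layer and from each sensor channel, so every feature
# extractor receives its own supervision. Sizes are illustrative.
import torch
import torch.nn as nn

class MultiViewHeader(nn.Module):
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.img_head = nn.Linear(dim, n_classes)   # image-only branch
        self.bev_head = nn.Linear(dim, n_classes)   # BEV-only branch
        self.fusion_head = nn.Linear(2 * dim, n_classes)

    def forward(self, f_img, f_bev):
        fused = torch.cat([f_img, f_bev], dim=-1)
        return (self.img_head(f_img), self.bev_head(f_bev),
                self.fusion_head(fused))

header = MultiViewHeader(dim=256, n_classes=4)
f_img, f_bev = torch.randn(8, 256), torch.randn(8, 256)
labels = torch.randint(0, 4, (8,))
# Supervising all three outputs keeps the image extractor from degenerating
# even when fusion alone could solve the training set.
loss = sum(nn.functional.cross_entropy(o, labels)
           for o in header(f_img, f_bev))
print(loss.item())
```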
3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results
Title | 3D Bounding Box Estimation for Autonomous Vehicles by Cascaded Geometric Constraints and Depurated 2D Detections Using 3D Results |
Authors | Jiaojiao Fang, Lingtao Zhou, Guizhong Liu |
Abstract | 3D object detection is one of the most important tasks in the 3D visual perception system of autonomous vehicles. In this paper, we propose a novel two-stage 3D object detection method aimed at finding the optimal object location in 3D space, based on regressing two additional 3D object properties with a deep convolutional neural network and combining them with cascaded geometric constraints between the 2D and 3D boxes. First, we modify an existing 3D-property regression network by adding two components: viewpoint classification and the center projection of the 3D bounding box's bottom face. Second, we use the predicted center projection combined with a similar-triangle constraint to acquire an initial 3D bounding box in closed form. The location predicted in this step then serves as the initial value for the over-determined equations constructed from the 2D-3D box-fitting constraint, with the configuration determined by the classified viewpoint. Finally, we use the physical-world information recovered by the 3D detections to filter out false detections and false alarms among the 2D detections. Comparisons with the state of the art on the KITTI dataset show that, although conceptually simple, our method outperforms more complex and computationally expensive methods, not only improving the overall precision of 3D detections but also increasing the precision of orientation estimation. Furthermore, our method can deal with truncated objects to some extent and removes false alarms and false detections in both 2D and 3D detections. (A sketch of the similar-triangle initialization follows this entry.) |
Tasks | 3D Object Detection, Autonomous Vehicles, Object Detection |
Published | 2019-09-01 |
URL | https://arxiv.org/abs/1909.01867v1 |
https://arxiv.org/pdf/1909.01867v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-bounding-box-estimation-for-autonomous |
Repo | |
Framework | |
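
The similar-triangle initialization is standard pinhole geometry: an object of physical height H appearing h pixels tall suggests depth z = f·H/h, and the predicted 2D center projection can then be back-projected to a 3D centre. A numpy illustration with made-up intrinsics and dimensions:

```python
# Hypothetical illustration of the similar-triangle step: recover an
# initial 3D box centre from its predicted 2D projection, assuming a
# pinhole camera. Intrinsics and dimensions are made-up values.
import numpy as np

fx, fy, cx, cy = 721.5, 721.5, 609.6, 172.9  # pinhole intrinsics (px)

def backproject_center(u: float, v: float, depth: float) -> np.ndarray:
    """Closed-form 3D centre from its image projection and a depth guess."""
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

# Similar triangles: object height H (m) appearing h pixels tall gives
# depth z = fy * H / h, which seeds the over-determined 2D-3D fit.
H_object, h_pixels = 1.5, 60.0
z0 = fy * H_object / h_pixels
print(backproject_center(650.0, 190.0, z0))
```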
E2-Capsule Neural Networks for Facial Expression Recognition Using AU-Aware Attention
Title | E2-Capsule Neural Networks for Facial Expression Recognition Using AU-Aware Attention |
Authors | Shan Cao, Yuqian Yao, Gaoyun An |
Abstract | Capsule neural networks are a new and popular technique in deep learning. However, the traditional capsule neural network does not extract features sufficiently before the dynamic routing between capsules. In this paper, we propose a Double Enhanced Capsule Neural Network (E2-Capsnet) that uses AU-aware attention for facial expression recognition (FER). E2-Capsnet takes advantage of dynamic routing between capsules and has two enhancement modules that benefit FER. The first enhancement module is a convolutional neural network with AU-aware attention, which helps focus on the active regions of the expression. The second enhancement module is a capsule neural network with multiple convolutional layers, which enhances the capacity of the feature representation. Finally, the squashing function is used to classify the facial expression. We demonstrate the effectiveness of E2-Capsnet on two public benchmark datasets, RAF-DB and EmotioNet. The experimental results show that our E2-Capsnet is superior to state-of-the-art methods. Our implementation will be publicly available online. (A sketch of the squashing function follows this entry.) |
Tasks | Facial Expression Recognition |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02491v1 |
https://arxiv.org/pdf/1912.02491v1.pdf | |
PWC | https://paperswithcode.com/paper/e2-capsule-neural-networks-for-facial |
Repo | |
Framework | |
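
The squashing function mentioned for the final classification is the standard capsule-network nonlinearity from Sabour et al. (2017): it shrinks short vectors toward zero and caps long vectors near unit length, so a capsule's length reads as a probability. A self-contained PyTorch sketch (the capsule shapes are illustrative):

```python
# The standard capsule squashing nonlinearity: v = ||s||^2/(1+||s||^2) * s/||s||.
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)

capsules = torch.randn(2, 7, 16)  # (batch, expression classes, capsule dim)
v = squash(capsules)
print(v.norm(dim=-1))  # every length now lies in (0, 1)
```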
IoU Loss for 2D/3D Object Detection
Title | IoU Loss for 2D/3D Object Detection |
Authors | Dingfu Zhou, Jin Fang, Xibin Song, Chenye Guan, Junbo Yin, Yuchao Dai, Ruigang Yang |
Abstract | In 2D/3D object detection, Intersection-over-Union (IoU) is widely employed as an evaluation metric for comparing detectors at test time. During training, however, a common distance loss (e.g., $L_1$ or $L_2$) is usually adopted to minimize the discrepancy between the predicted and ground-truth bounding boxes (Bboxes). To eliminate this gap between training and testing, the IoU loss was introduced for 2D object detection in UnitBox (Yu et al., 2016) and GIoU (Rezatofighi et al., 2019). Unfortunately, these approaches only work for axis-aligned 2D Bboxes and cannot be applied to the more general detection task with rotated Bboxes. To resolve this issue, we first investigate IoU computation between two rotated Bboxes and then implement a unified framework: an IoU loss layer for both 2D and 3D object detection. By integrating this IoU loss into several state-of-the-art 3D object detectors, we achieve consistent improvements for both bird's-eye-view 2D detection and point cloud 3D detection on the public KITTI benchmark. (A sketch of the axis-aligned IoU loss follows this entry.) |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-08-11 |
URL | https://arxiv.org/abs/1908.03851v1 |
https://arxiv.org/pdf/1908.03851v1.pdf | |
PWC | https://paperswithcode.com/paper/iou-loss-for-2d3d-object-detection |
Repo | |
Framework | |
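
For the axis-aligned case that the abstract builds on, the IoU loss is simply one minus the IoU of predicted and target boxes; the paper's contribution extends this to rotated 2D/3D boxes, whose intersection requires polygon clipping and is omitted here. A minimal PyTorch sketch:

```python
# A minimal axis-aligned IoU loss for 2D boxes (x1, y1, x2, y2).
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    lt = torch.max(pred[:, :2], target[:, :2])   # intersection top-left
    rb = torch.min(pred[:, 2:], target[:, 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    return (1.0 - iou).mean()  # train on the same quantity we test on

pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(iou_loss(pred, target))  # IoU = 1/7, loss ≈ 0.857
```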
Auto-Precision Scaling for Distributed Deep Learning
Title | Auto-Precision Scaling for Distributed Deep Learning |
Authors | Ruobing Han, Yang You, James Demmel |
Abstract | In recent years, large-batch optimization has become key to distributed deep learning. However, large-batch optimization is hard: straightforwardly porting the code often leads to a significant loss in test accuracy. Some researchers have suggested that large-batch optimization leads to low generalization performance, and further conjectured that large-batch training needs higher floating-point precision to generalize well. To examine this, we conduct an open study in this paper: our target is to find the number of bits that large-batch training needs. This requires a system for customized-precision study, but state-of-the-art systems have limitations that lower the efficiency of developers and researchers, so we design and implement our own system, CPD: a high-performance system for customized-precision distributed DL. In our experiments, the application often loses accuracy at very low precision (e.g., 8 bits or 4 bits). To solve this problem, we propose the Auto-Precision-Scaling (APS) algorithm, a layer-wise adaptive scheme for gradient shifting. With APS, we are able to make large-batch training converge with only 4 bits. (A sketch of layer-wise gradient shifting follows this entry.) |
Tasks | |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08907v1 |
https://arxiv.org/pdf/1911.08907v1.pdf | |
PWC | https://paperswithcode.com/paper/auto-precision-scaling-for-distributed-deep |
Repo | |
Framework | |
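
The layer-wise gradient-shifting idea can be sketched as: pick a per-layer power-of-two exponent so the layer's largest gradient fills the low-bit grid, quantize, communicate, then undo the shift. The numpy sketch below is our own reading of the scheme, not the authors' implementation; the bit width and rounding policy are assumptions.

```python
# Hypothetical sketch of layer-wise gradient shifting in the spirit of
# APS: scale each layer's gradients by a power of two so their range
# fits a low-bit representation, then undo the shift after transport.
import numpy as np

def shift_quantize(grad: np.ndarray, bits: int = 4):
    """Pick a per-layer exponent so the max |gradient| fills the grid."""
    max_abs = np.abs(grad).max() + 1e-12
    exp = np.floor(np.log2((2 ** (bits - 1) - 1) / max_abs))
    q = np.round(grad * 2.0 ** exp).astype(np.int8)  # what gets communicated
    return q, exp

def unshift(q: np.ndarray, exp: float) -> np.ndarray:
    return q.astype(np.float64) / 2.0 ** exp

grad = np.random.randn(1000) * 1e-3          # gradients of one layer
q, exp = shift_quantize(grad, bits=4)
err = np.abs(unshift(q, exp) - grad).max()
print(exp, err)  # small error despite 4-bit transport
```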
Dual-FOFE-net Neural Models for Entity Linking with PageRank
Title | Dual-FOFE-net Neural Models for Entity Linking with PageRank |
Authors | Feng Wei, Uyen Trang Nguyen, Hui Jiang |
Abstract | This paper presents a simple and computationally efficient approach to entity linking (EL) that, instead of recurrent neural networks (RNNs) or convolutional neural networks (CNNs), uses feedforward neural networks (FFNNs) and the recent dual fixed-size ordinally forgetting encoding (dual-FOFE) method to fully encode a sentence fragment and its left/right contexts into a fixed-size representation. Furthermore, we propose to incorporate PageRank-based distillation in our candidate generation module. Our neural linking models consist of three parts: a PageRank-based candidate generation module, a dual-FOFE-net neural ranking model, and a simple NIL entity clustering system. Experimental results show that our proposed neural linking models achieve higher EL accuracy than the baseline system and state-of-the-art models on the TAC2016 task dataset, without requiring any in-house data or complicated handcrafted features, and achieve competitive accuracy on the TAC2017 task dataset. (A sketch of the FOFE encoding follows this entry.) |
Tasks | Entity Linking |
Published | 2019-07-30 |
URL | https://arxiv.org/abs/1907.12697v1 |
https://arxiv.org/pdf/1907.12697v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-fofe-net-neural-models-for-entity |
Repo | |
Framework | |
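
FOFE itself is compact enough to show in full: a sequence of one-hot word vectors is folded into a single fixed-size vector via z_t = α·z_{t-1} + e_t, so earlier words decay geometrically and word order is preserved. The dual variant pairs two forgetting factors; the α values and vocabulary size below are illustrative.

```python
# Fixed-size ordinally forgetting encoding (FOFE): z_t = alpha*z_{t-1} + e_t.
import numpy as np

def fofe(token_ids, vocab_size: int, alpha: float) -> np.ndarray:
    z = np.zeros(vocab_size)
    for t in token_ids:            # earlier words decay geometrically
        z = alpha * z              # forget a little...
        z[t] += 1.0                # ...then add the current one-hot
    return z

ids = [3, 1, 4]                    # a left context, as word indices
# Dual-FOFE: concatenate encodings at two forgetting factors so both
# short-range and long-range order information survive.
code = np.concatenate([fofe(ids, 10, 0.5), fofe(ids, 10, 0.9)])
print(code.shape, code[:10])       # (20,) ; the alpha=0.5 half
```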
Latent Representations of Dynamical Systems: When Two is Better Than One
Title | Latent Representations of Dynamical Systems: When Two is Better Than One |
Authors | Max Tegmark |
Abstract | A popular approach to predicting the future of dynamical systems involves mapping them into a lower-dimensional “latent space” where prediction is easier. We show that the information-theoretically optimal approach uses different mappings for present and future, in contrast to state-of-the-art machine-learning approaches where both mappings are the same. We illustrate this dichotomy by predicting the time evolution of coupled harmonic oscillators with dissipation and thermal noise, showing how the optimal 2-mapping method significantly outperforms principal component analysis and all other approaches that use a single latent representation, and discuss the intuitive reason why two representations are better than one. We conjecture that a single latent representation is optimal only for time-reversible processes, not for, e.g., text, speech, music, or out-of-equilibrium physical systems. (A toy linear comparison follows this entry.) |
Tasks | |
Published | 2019-02-09 |
URL | http://arxiv.org/abs/1902.03364v2 |
http://arxiv.org/pdf/1902.03364v2.pdf | |
PWC | https://paperswithcode.com/paper/latent-representations-of-dynamical-systems |
Repo | |
Framework | |
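
In the linear setting, the one-mapping-versus-two contrast can be made concrete: PCA compresses the present with a single projection chosen by variance, whereas reduced-rank regression (used here purely as an illustration of the point, not as the paper's estimator) chooses different projections for present and future. A toy numpy comparison where the loudest direction of the present is useless for prediction:

```python
# Toy contrast between a single shared latent (PCA on the present) and
# two distinct mappings (reduced-rank regression). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
X[:, 0] *= 5.0                       # high-variance but useless direction
A = np.array([[0.0, 1.0, 1.0]])      # the future only depends on x2 + x3
Y = X @ A.T + 0.1 * rng.normal(size=(5000, 1))

def mse_through_latent(w: np.ndarray) -> float:
    z = X @ w                        # 1-D latent of the present
    coef = (z @ Y[:, 0]) / (z @ z)   # best linear readout of the future
    return np.mean((Y[:, 0] - coef * z) ** 2)

# One mapping: PCA keeps the loudest direction of the present (x1).
w_pca = np.linalg.eigh(np.cov(X.T))[1][:, -1]
# Two mappings: choose the input direction that best predicts the future
# (top right-singular vector of the cross-covariance).
w_rrr = np.linalg.svd(Y.T @ X)[2][0]

print("PCA latent MSE:", mse_through_latent(w_pca))
print("Predictive latent MSE:", mse_through_latent(w_rrr))
```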
Disease Labeling via Machine Learning is NOT quite the same as Medical Diagnosis
Title | Disease Labeling via Machine Learning is NOT quite the same as Medical Diagnosis |
Authors | Moshe BenBassat |
Abstract | A key step in medical diagnosis is giving the patient a universally recognized label (e.g., appendicitis), which essentially assigns the patient to a class of patients with similar body failures. However, two patients who very probably share the same disease label may still differ in their feature manifestation patterns, implying differences in the required treatments. Additionally, in many cases, the labels of the primary diagnoses leave some findings unexplained. Medical diagnosis is only partially about probability calculations for label X or Y; diagnosis is not complete until the patient's overall situation is clinically understood well enough to enable the best therapeutic decisions. Most machine learning models are data-centric, and evidence so far suggests they can reach expert-level performance in the disease-labeling phase. Nonetheless, like any other mathematical technique, they have their limitations and applicability scope. Primarily, data-centric algorithms are knowledge-blind: they lack the anatomy and physiology knowledge that physicians leverage to achieve complete diagnosis. This article advocates complementing them with such intelligence to overcome their inherent limitations as knowledge-blind algorithms. Machines can learn many things from data, but data is not the only source machines can learn from. Historic patient data only tells us what the possible manifestations of a certain body failure are; anatomy and physiology knowledge tell us how the body works and fails. Both are needed for complete diagnosis. The proposed Double Deep Learning approach, along with the initiative for a Medical Wikipedia for Smart Machines, leads to AI diagnostic-support solutions for complete diagnosis, beyond the limited data-only labeling solutions we see today. AI for medicine will remain limited until its intelligence also integrates anatomy and physiology. |
Tasks | Medical Diagnosis |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03470v1 |
https://arxiv.org/pdf/1909.03470v1.pdf | |
PWC | https://paperswithcode.com/paper/disease-labeling-via-machine-learning-is-not |
Repo | |
Framework | |
Probabilistic framework for solving Visual Dialog
Title | Probabilistic framework for solving Visual Dialog |
Authors | Badri N. Patro, Anupriy, Vinay P. Namboodiri |
Abstract | In this paper, we propose a probabilistic framework for solving the task of Visual Dialog. Solving this task requires reasoning about, and understanding of, the visual modality, the language modality, and common-sense knowledge. Various architectures have been proposed that combine visual and language representations through variants of multi-modal deep learning. However, we believe it is crucial to understand and analyze the sources of uncertainty in this task. Our approach allows uncertainty to be estimated and also aids diverse generation of answers. The proposed approach comprises a probabilistic representation module that provides representations for the image, question, and conversation history; a module that ensures diverse latent representations for candidate answers given the probabilistic representations; and an uncertainty representation module that chooses the answer minimizing uncertainty. We thoroughly evaluate the model with a detailed ablation analysis, comparison with the state of the art, and visualizations of the uncertainty that aid understanding of the method. Using the proposed probabilistic framework, we thus obtain an improved visual dialog system that is also more explainable. (A sketch of uncertainty-based answer selection follows this entry.) |
Tasks | Common Sense Reasoning, Visual Dialog |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04800v2 |
https://arxiv.org/pdf/1909.04800v2.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-framework-for-solving-visual |
Repo | |
Framework | |
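
One loose reading of "choosing the answer that minimizes uncertainty" is to sample several stochastic context representations, score every candidate answer under each sample, and prefer answers that score high with low disagreement across samples. The PyTorch sketch below is entirely our own illustration: the Gaussian representation, the dot-product scorer, and the lambda trade-off are assumptions, not the authors' model.

```python
# Hypothetical sketch of uncertainty-driven answer choice: sample
# stochastic representations, score each candidate under every sample,
# and prefer answers whose scores are both high and stable.
import torch

torch.manual_seed(0)
n_samples, n_candidates, dim = 16, 100, 256

mu, sigma = torch.randn(dim), torch.rand(dim) * 0.1
context = mu + sigma * torch.randn(n_samples, dim)   # probabilistic rep.
answers = torch.randn(n_candidates, dim)             # candidate embeddings

scores = context @ answers.T                         # (samples, candidates)
mean, var = scores.mean(0), scores.var(0)
# Pick the answer that scores high on average with low disagreement
# across samples; lambda trades relevance against uncertainty.
lam = 0.5
best = (mean - lam * var).argmax()
print(int(best), float(mean[best]), float(var[best]))
```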