May 7, 2019

2974 words 14 mins read

Paper Group AWR 55

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

Title Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Authors Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, Jeffrey Dean
Abstract We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English$\rightarrow$French and surpasses state-of-the-art results for English$\rightarrow$German. Similarly, a single multilingual model surpasses state-of-the-art results for French$\rightarrow$English and German$\rightarrow$English on the WMT’14 and WMT’15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation. Finally, we present analyses that hint at a universal interlingua representation in our models and show some interesting examples when mixing languages.
Tasks Machine Translation, Transfer Learning
Published 2016-11-14
URL http://arxiv.org/abs/1611.04558v2
PDF http://arxiv.org/pdf/1611.04558v2.pdf
PWC https://paperswithcode.com/paper/googles-multilingual-neural-machine
Repo https://github.com/tilde-nlp/multilingual-nmt-data-prep
Framework none
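
The core mechanism is simple enough to reproduce in data preparation. Below is a minimal Python sketch of the paper's artificial-token idea: prepend a token naming the target language to each source sentence. The "<2xx>" token format follows the paper; the toy sentences are invented.

```python
# Sketch of the paper's data-preparation trick: an artificial token at the
# start of the source sentence tells the shared model which language to
# produce. The "<2xx>" token format follows the paper; the corpus is a toy.

def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prefix the source with an artificial target-language token."""
    return f"<2{target_lang}> {source_sentence}"

pairs = [
    ("Hello, how are you?", "es"),  # English -> Spanish
    ("Hello, how are you?", "ja"),  # English -> Japanese
]
for src, tgt in pairs:
    print(add_target_token(src, tgt))
# <2es> Hello, how are you?
# <2ja> Hello, how are you?
```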

Ordinal Common-sense Inference

Title Ordinal Common-sense Inference
Authors Sheng Zhang, Rachel Rudinger, Kevin Duh, Benjamin Van Durme
Abstract Humans have the capacity to draw common-sense inferences from natural language: various things that are likely but not certain to hold based on established discourse, and are rarely stated explicitly. We propose an evaluation of automated common-sense inference based on an extension of recognizing textual entailment: predicting ordinal human responses on the subjective likelihood of an inference holding in a given context. We describe a framework for extracting common-sense knowledge from corpora, which is then used to construct a dataset for this ordinal entailment task. We train a neural sequence-to-sequence model on this dataset, which we use to score and generate possible inferences. Further, we annotate subsets of previously established datasets via our ordinal annotation protocol in order to then analyze the distinctions between these and what we have constructed.
Tasks Common Sense Reasoning, Natural Language Inference
Published 2016-11-02
URL http://arxiv.org/abs/1611.00601v3
PDF http://arxiv.org/pdf/1611.00601v3.pdf
PWC https://paperswithcode.com/paper/ordinal-common-sense-inference
Repo https://github.com/felipessalvatore/NLI_datasets
Framework none
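
For intuition, here is a small Python sketch of the ordinal idea: a scalar plausibility score is binned into a five-point likelihood scale like the paper's. The bin edges are invented for illustration.

```python
# Hypothetical sketch: map a scalar plausibility score in [0, 1] onto a
# five-point ordinal likelihood scale. The thresholds are assumptions.
import bisect

LABELS = ["impossible", "technically possible", "plausible", "likely", "very likely"]
EDGES = [0.1, 0.35, 0.65, 0.9]  # assumed bin edges, not from the paper

def ordinal_label(score: float) -> str:
    return LABELS[bisect.bisect_right(EDGES, score)]

print(ordinal_label(0.05))  # impossible
print(ordinal_label(0.70))  # likely
```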

Adversarial Deep Structural Networks for Mammographic Mass Segmentation

Title Adversarial Deep Structural Networks for Mammographic Mass Segmentation
Authors Wentao Zhu, Xiang Xiang, Trac D. Tran, Xiaohui Xie
Abstract Mass segmentation is an important task in mammogram analysis, providing effective morphological features and regions of interest (ROI) for mass detection and classification. Inspired by the success of using deep convolutional features for natural image analysis and conditional random fields (CRF) for structural learning, we propose an end-to-end network for mammographic mass segmentation. The network employs a fully convolutional network (FCN) to model the potential function, followed by a CRF to perform structural learning. Because the mass distribution varies greatly with pixel position, the FCN is combined with a position prior for the task. Due to the small size of mammogram datasets, we use adversarial training to control over-fitting. Four models with different convolutional kernels are further fused to improve the segmentation results. Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.
Tasks
Published 2016-12-18
URL http://arxiv.org/abs/1612.05970v2
PDF http://arxiv.org/pdf/1612.05970v2.pdf
PWC https://paperswithcode.com/paper/adversarial-deep-structural-networks-for
Repo https://github.com/wentaozhu/adversarial-deep-structural-networks
Framework none
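
The adversarial-training regularizer the abstract mentions can be sketched in a few lines of PyTorch. The tiny FCN, the FGSM-style perturbation, and the epsilon below are illustrative assumptions, not the paper's exact setup.

```python
# A minimal FGSM-style adversarial-training step, as a stand-in for the
# regularizer the abstract describes. The toy FCN, data shapes, and eps
# are assumptions, not the paper's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

fcn = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(8, 2, 1))            # toy 2-class segmenter
opt = torch.optim.Adam(fcn.parameters(), lr=1e-3)
eps = 0.05                                         # perturbation size (assumed)

x = torch.rand(4, 1, 40, 40)                       # fake image patches
y = torch.randint(0, 2, (4, 40, 40))               # fake binary masks

# Build an adversarial copy of the batch along the loss gradient.
x.requires_grad_(True)
grad, = torch.autograd.grad(F.cross_entropy(fcn(x), y), x)
x_adv = (x + eps * grad.sign()).detach()

# Train on clean and perturbed inputs together to control over-fitting.
opt.zero_grad()
loss = F.cross_entropy(fcn(x), y) + F.cross_entropy(fcn(x_adv), y)
loss.backward()
opt.step()
```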

Unsupervised Learning from Continuous Video in a Scalable Predictive Recurrent Network

Title Unsupervised Learning from Continuous Video in a Scalable Predictive Recurrent Network
Authors Filip Piekniewski, Patryk Laurent, Csaba Petre, Micah Richert, Dimitry Fisher, Todd Hylton
Abstract Understanding visual reality involves acquiring common-sense knowledge about countless regularities in the visual world, e.g., how illumination alters the appearance of objects in a scene, and how motion changes their apparent spatial relationship. These regularities are hard to label for training supervised machine learning algorithms; consequently, algorithms need to learn these regularities from the real world in an unsupervised way. We present a novel network meta-architecture that can learn world dynamics from raw, continuous video. The components of this network can be implemented using any algorithm that possesses three key capabilities: prediction of a signal over time, reduction of signal dimensionality (compression), and the ability to use supplementary contextual information to inform the prediction. The presented architecture is highly parallelized and scalable, and is implemented using localized connectivity, processing, and learning. We demonstrate an implementation of this architecture where the components are built from multi-layer perceptrons. We apply the implementation to create a system capable of stable and robust visual tracking of objects as seen by a moving camera. Results show performance on par with or exceeding state-of-the-art tracking algorithms. The tracker can be trained in either fully supervised or unsupervised-then-briefly-supervised regimes. Success of the briefly-supervised regime suggests that the unsupervised portion of the model extracts useful information about visual reality. The results suggest a new class of AI algorithms that uniquely combine prediction and scalability in a way that makes them suitable for learning from, and eventually acting within, the real world.
Tasks Common Sense Reasoning, Visual Tracking
Published 2016-07-22
URL http://arxiv.org/abs/1607.06854v3
PDF http://arxiv.org/pdf/1607.06854v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-from-continuous-video
Repo https://github.com/mhazoglou/PVM_PyCUDA
Framework none
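
A toy PyTorch sketch of one predictive unit with the three capabilities the abstract lists: compress the incoming signal, then predict the next signal from the compressed code plus contextual input. All sizes and the wiring are assumptions.

```python
# A toy version of one predictive unit: compress the incoming signal, then
# predict the next signal from the code plus contextual input. Sizes and
# wiring are assumptions; real PVM units tile a frame and exchange codes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictiveUnit(nn.Module):
    def __init__(self, signal_dim=16, code_dim=6, context_dim=4):
        super().__init__()
        self.compress = nn.Sequential(nn.Linear(signal_dim, code_dim), nn.Sigmoid())
        self.predict = nn.Linear(code_dim + context_dim, signal_dim)

    def forward(self, signal, context):
        code = self.compress(signal)                 # dimensionality reduction
        pred = self.predict(torch.cat([code, context], dim=-1))
        return pred, code                            # code is shared as context

unit = PredictiveUnit()
frame_t, frame_next = torch.rand(1, 16), torch.rand(1, 16)
pred, code = unit(frame_t, torch.zeros(1, 4))
loss = F.mse_loss(pred, frame_next)                  # unsupervised: predict t+1
loss.backward()
```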

Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks

Title Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks
Authors Michele Mancini, Gabriele Costante, Paolo Valigi, Thomas A. Ciarfuglia
Abstract Obstacle detection is a central problem for any robotic system, and critical for autonomous systems that travel at high speeds in unpredictable environments. This is often achieved through scene depth estimation, by various means. When fast motion is considered, the detection range must be long enough to allow for safe avoidance and path planning. Current solutions often make assumptions about the motion of the vehicle that limit their applicability, or work at very limited ranges due to intrinsic constraints. We propose a novel appearance-based obstacle detection system that is able to detect obstacles at very long range and at a very high speed (~300Hz), without making assumptions on the type of motion. We achieve these results using a Deep Neural Network approach trained on real and synthetic images, trading some depth accuracy for fast, robust and consistent operation. We show how photo-realistic synthetic images are able to solve the problem of training set size and variety typical of machine learning approaches, and how our system is robust to massive blurring of test images.
Tasks Depth Estimation, Monocular Depth Estimation, Object Detection
Published 2016-07-21
URL http://arxiv.org/abs/1607.06349v1
PDF http://arxiv.org/pdf/1607.06349v1.pdf
PWC https://paperswithcode.com/paper/fast-robust-monocular-depth-estimation-for
Repo https://github.com/fangchangma/sparse-to-dense.pytorch
Framework pytorch
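
Downstream of such a network, obstacle detection from a predicted depth map can be as simple as thresholding at a safety distance. This numpy fragment is an invented illustration, not the paper's pipeline.

```python
# Invented illustration (not the paper's pipeline): flag obstacles by
# thresholding a predicted depth map at a safety distance.
import numpy as np

depth = np.random.uniform(1.0, 60.0, size=(48, 64))  # fake depth map, meters
safety_m = 15.0                                       # assumed safety distance
obstacles = depth < safety_m                          # True where too close
print(f"{obstacles.mean():.0%} of pixels flagged as obstacles")
```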

Complex Embeddings for Simple Link Prediction

Title Complex Embeddings for Simple Link Prediction
Authors Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard
Abstract In statistical relational learning, the link prediction problem is key to automatically understanding the structure of large knowledge bases. As in previous studies, we propose to solve this problem through latent factorization. However, here we make use of complex-valued embeddings. The composition of complex embeddings can handle a large variety of binary relations, among them symmetric and antisymmetric relations. Compared to state-of-the-art models such as Neural Tensor Network and Holographic Embeddings, our approach based on complex embeddings is arguably simpler, as it only uses the Hermitian dot product, the complex counterpart of the standard dot product between real vectors. Our approach is scalable to large datasets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.
Tasks Link Prediction, Relational Reasoning
Published 2016-06-20
URL http://arxiv.org/abs/1606.06357v1
PDF http://arxiv.org/pdf/1606.06357v1.pdf
PWC https://paperswithcode.com/paper/complex-embeddings-for-simple-link-prediction
Repo https://github.com/stellargraph/stellargraph
Framework tf
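
The scoring function the abstract describes is compact enough to state directly: the score of a triple (subject, relation, object) is the real part of a trilinear Hermitian product of complex embeddings. A numpy sketch with random embeddings:

```python
# The ComplEx score of a triple (s, r, o): Re(<e_s, w_r, conj(e_o)>), the
# real part of a trilinear Hermitian product. Embeddings here are random.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
e_s = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # subject
w_r = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # relation
e_o = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # object

score = np.real(np.sum(e_s * w_r * np.conj(e_o)))
print(f"plausibility score: {score:.3f}")
```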

LipNet: End-to-End Sentence-level Lipreading

Title LipNet: End-to-End Sentence-level Lipreading
Authors Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
Abstract Lipreading is the task of decoding text from the movement of a speaker’s mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). However, existing end-to-end trained models perform only word classification, rather than sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, a recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy on the sentence-level, overlapped speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy (Gergen et al., 2016).
Tasks Lipreading
Published 2016-11-05
URL http://arxiv.org/abs/1611.01599v2
PDF http://arxiv.org/pdf/1611.01599v2.pdf
PWC https://paperswithcode.com/paper/lipnet-end-to-end-sentence-level-lipreading
Repo https://github.com/ms8909/LipONet
Framework tf
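
A condensed PyTorch sketch of the three ingredients the abstract names: spatiotemporal (3D) convolutions, a recurrent layer, and the CTC loss. Layer sizes and input shapes are illustrative, not LipNet's actual configuration.

```python
# Toy lipreader with the abstract's three ingredients: a spatiotemporal
# (3D) convolution, a recurrent layer, and CTC loss. Shapes are invented.
import torch
import torch.nn as nn

class TinyLipReader(nn.Module):
    def __init__(self, vocab=28):                    # 27 characters + CTC blank
        super().__init__()
        self.stcnn = nn.Conv3d(3, 16, kernel_size=(3, 5, 5), padding=(1, 2, 2))
        self.pool = nn.MaxPool3d((1, 4, 4))
        self.gru = nn.GRU(16 * 16 * 25, 64, batch_first=True)
        self.fc = nn.Linear(64, vocab)

    def forward(self, video):                        # (B, 3, T, 64, 100)
        x = self.pool(torch.relu(self.stcnn(video))) # (B, 16, T, 16, 25)
        b, c, t, h, w = x.shape
        x = x.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)
        out, _ = self.gru(x)                         # per-frame features
        return self.fc(out).log_softmax(-1)          # per-frame char log-probs

model = TinyLipReader()
video = torch.rand(2, 3, 20, 64, 100)                # two 20-frame clips
log_probs = model(video).permute(1, 0, 2)            # (T, B, vocab) for CTC
targets = torch.randint(1, 28, (2, 10))              # fake character labels
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((2,), 20, dtype=torch.long),
                           torch.full((2,), 10, dtype=torch.long))
```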

Low-shot Visual Recognition by Shrinking and Hallucinating Features

Title Low-shot Visual Recognition by Shrinking and Hallucinating Features
Authors Bharath Hariharan, Ross Girshick
Abstract Low-shot visual learning—the ability to recognize novel object categories from very few examples—is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose a) representation regularization techniques, and b) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3x on the challenging ImageNet dataset.
Tasks
Published 2016-06-09
URL http://arxiv.org/abs/1606.02819v4
PDF http://arxiv.org/pdf/1606.02819v4.pdf
PWC https://paperswithcode.com/paper/low-shot-visual-recognition-by-shrinking-and
Repo https://github.com/facebookresearch/low-shot-shrink-hallucinate
Framework pytorch
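
The hallucination idea can be caricatured in numpy: transplant intra-class variation observed in a data-rich base class onto the few examples of a novel class. The additive transfer below is a simplification of the paper's learned generator.

```python
# Caricature of feature hallucination: borrow a direction of intra-class
# variation from a base class and apply it to a novel-class seed feature.
# The additive transfer simplifies the paper's learned generator.
import numpy as np

rng = np.random.default_rng(1)
base_a, base_b = rng.normal(size=(2, 128))  # two examples of one base class
novel_seed = rng.normal(size=128)           # the lone novel-class example

variation = base_b - base_a                 # observed intra-class change
hallucinated = novel_seed + 0.5 * variation # extra synthetic training example
print(hallucinated.shape)
```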

GPU-based Pedestrian Detection for Autonomous Driving

Title GPU-based Pedestrian Detection for Autonomous Driving
Authors Victor Campmany, Sergio Silva, Antonio Espinosa, Juan Carlos Moure, David Vázquez, Antonio M. López
Abstract We propose a real-time pedestrian detection system for the embedded Nvidia Tegra X1 GPU-CPU hybrid platform. The pipeline is composed of the following state-of-the-art algorithms: Histogram of Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG) features extracted from the input image; Pyramidal Sliding Window technique for candidate generation; and Support Vector Machine (SVM) for classification. Results show an 8x speedup on the target Tegra X1 platform and a better performance/watt ratio than the desktop CUDA platforms studied.
Tasks Autonomous Driving, Pedestrian Detection
Published 2016-11-05
URL http://arxiv.org/abs/1611.01642v1
PDF http://arxiv.org/pdf/1611.01642v1.pdf
PWC https://paperswithcode.com/paper/gpu-based-pedestrian-detection-for-autonomous
Repo https://github.com/vcampmany/CudaVisionSysDeploy
Framework none
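
A plain-Python outline of the pipeline stages the abstract lists: per-window features, a pyramidal sliding window for candidate generation, and a linear SVM score. The feature function, weights, and naive downsampling are stand-ins, not the paper's implementation.

```python
# Outline of the detection pipeline with stand-ins: a crude per-window
# feature, a naive image pyramid, and a random linear "SVM". Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
w, b = rng.normal(size=36), 0.0                # pretend linear-SVM parameters

def features(patch):                           # stand-in for HOG/LBP features
    return np.histogram(patch, bins=36, range=(0, 1))[0].astype(float)

def detect(image, win=32, stride=16, scales=(1.0, 0.5)):
    hits = []
    for s in scales:                           # pyramidal sliding window
        step = int(round(1 / s))
        img = image[::step, ::step]            # naive downsample as a stand-in
        for y in range(0, img.shape[0] - win + 1, stride):
            for x in range(0, img.shape[1] - win + 1, stride):
                score = w @ features(img[y:y + win, x:x + win]) + b
                if score > 0:                  # SVM decision threshold
                    hits.append((x / s, y / s, score))
    return hits

print(len(detect(rng.uniform(size=(96, 128)))), "candidate detections")
```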

Using Semantic Similarity for Input Topic Identification in Crawling-based Web Application Testing

Title Using Semantic Similarity for Input Topic Identification in Crawling-based Web Application Testing
Authors Jun-Wei Lin, Farn Wang
Abstract To automatically test web applications, crawling-based techniques are usually adopted to mine the behavior models, explore the state spaces, or detect the violated invariants of the applications. However, in existing crawlers, rules for identifying the topics of input text fields, such as login ids, passwords, emails, dates and phone numbers, have to be manually configured. Moreover, the rules for one application are very often not suitable for another. In addition, when several rules conflict and match an input text field to more than one topic, it can be difficult to determine which rule suggests a better match. This paper presents a natural-language approach to automatically identify the topics of input fields encountered during crawling by semantically comparing them with the input fields in a labeled corpus. In our evaluation with 100 real-world forms, the proposed approach demonstrated performance comparable to the rule-based one. Our experiments also show that the accuracy of the rule-based approach can be improved by up to 19% when integrated with our approach.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2016-08-23
URL http://arxiv.org/abs/1608.06549v1
PDF http://arxiv.org/pdf/1608.06549v1.pdf
PWC https://paperswithcode.com/paper/using-semantic-similarity-for-input-topic
Repo https://github.com/jwlin/arxiv-160430
Framework none
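
A hedged sketch of the matching step: represent the text around an input field as a vector and assign the topic of its nearest neighbor in a labeled corpus by cosine similarity. Bag-of-words vectors stand in for whatever semantic representation the paper uses; the corpus is a toy.

```python
# Sketch of the matching step: assign an input field the topic of its
# nearest labeled neighbor by cosine similarity. Bag-of-words vectors are
# a stand-in for the paper's semantic representation; the corpus is a toy.
import numpy as np
from collections import Counter

corpus = {"email": "email e-mail address",
          "phone": "phone telephone number",
          "date": "date birthday day month year"}

def vec(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[t] for t in vocab], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def topic(field_text):
    vocab = sorted({t for d in corpus.values() for t in d.split()}
                   | set(field_text.lower().split()))
    q = vec(field_text, vocab)
    return max(corpus, key=lambda k: cosine(vec(corpus[k], vocab), q))

print(topic("enter your e-mail address"))  # -> email
```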

Higher-Order Factorization Machines

Title Higher-Order Factorization Machines
Authors Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata
Abstract Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional. Unfortunately, despite increasing interest in FMs, there exists to date no efficient training algorithm for higher-order FMs (HOFMs). In this paper, we present the first generic yet efficient algorithms for training arbitrary-order HOFMs. We also present new variants of HOFMs with shared parameters, which greatly reduce model size and prediction times while maintaining similar accuracy. We demonstrate the proposed approaches on four different link prediction tasks.
Tasks Link Prediction
Published 2016-07-25
URL http://arxiv.org/abs/1607.07195v2
PDF http://arxiv.org/pdf/1607.07195v2.pdf
PWC https://paperswithcode.com/paper/higher-order-factorization-machines
Repo https://github.com/taohu88/recommendations
Framework tf
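
The model class is easy to write down naively; the paper's contribution is evaluating it efficiently. This brute-force numpy sketch only shows the form of a third-order FM score, with all parameters random.

```python
# Brute-force form of a third-order factorization machine score. The paper
# shows how to evaluate these interactions in linear time; this direct loop
# only makes the model class concrete. All parameters are random.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
d, k = 6, 3                                    # features, factor rank
x = rng.uniform(size=d)
w = rng.normal(size=d)                         # first-order weights
P2 = rng.normal(size=(d, k))                   # second-order factors
P3 = rng.normal(size=(d, k))                   # third-order factors

score = w @ x
for i, j in combinations(range(d), 2):
    score += (P2[i] @ P2[j]) * x[i] * x[j]
for i, j, l in combinations(range(d), 3):
    score += np.sum(P3[i] * P3[j] * P3[l]) * x[i] * x[j] * x[l]
print(f"HOFM score: {score:.3f}")
```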

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Title ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Authors Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello
Abstract The ability to perform pixel-wise semantic segmentation in real-time is of paramount importance in mobile applications. Recent deep neural networks aimed at this task have the disadvantage of requiring a large number of floating point operations and have long run-times that hinder their usability. In this paper, we propose a novel deep neural network architecture named ENet (efficient neural network), created specifically for tasks requiring low latency operation. ENet is up to 18$\times$ faster, requires 75$\times$ fewer FLOPs, has 79$\times$ fewer parameters, and provides similar or better accuracy than existing models. We have tested it on the CamVid, Cityscapes and SUN datasets, report comparisons with existing state-of-the-art methods, and examine the trade-offs between accuracy and processing time of a network. We present performance measurements of the proposed architecture on embedded systems and suggest possible software improvements that could make ENet even faster.
Tasks Real-Time Semantic Segmentation, Semantic Segmentation
Published 2016-06-07
URL http://arxiv.org/abs/1606.02147v1
PDF http://arxiv.org/pdf/1606.02147v1.pdf
PWC https://paperswithcode.com/paper/enet-a-deep-neural-network-architecture-for
Repo https://github.com/fregu856/segmentation
Framework tf
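
A simplified PyTorch sketch of the kind of residual bottleneck block an efficiency-oriented encoder like ENet stacks; ENet specifics (PReLU, asymmetric and dilated convolutions, downsampling variants, regularizers) are omitted.

```python
# A generic residual bottleneck of the sort ENet stacks: 1x1 projection,
# cheap 3x3 main convolution, 1x1 expansion, residual sum. ENet specifics
# (PReLU, asymmetric/dilated convs, downsampling variants) are omitted.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, ch=64, proj=16):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(ch, proj, 1, bias=False), nn.BatchNorm2d(proj), nn.ReLU(),
            nn.Conv2d(proj, proj, 3, padding=1, bias=False),
            nn.BatchNorm2d(proj), nn.ReLU(),
            nn.Conv2d(proj, ch, 1, bias=False), nn.BatchNorm2d(ch))
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.branch(x))    # residual keeps gradients flowing

print(Bottleneck()(torch.rand(1, 64, 32, 32)).shape)  # -> [1, 64, 32, 32]
```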

An efficient iterative thresholding method for image segmentation

Title An efficient iterative thresholding method for image segmentation
Authors Dong Wang, Haohan Li, Xiaoyu Wei, Xiaoping Wang
Abstract We propose an efficient iterative thresholding method for multi-phase image segmentation. The algorithm is based on minimizing the piecewise constant Mumford-Shah functional, in which the contour length (or perimeter) is approximated by a non-local multi-phase energy. The minimization problem is solved by an iterative method. Each iteration consists of computing simple convolutions followed by a thresholding step. The algorithm is easy to implement and has the optimal complexity $O(N \log N)$ per iteration. We also show that the total energy decays monotonically over the iterations. We present some numerical results to show the efficiency of our method.
Tasks Semantic Segmentation
Published 2016-08-04
URL http://arxiv.org/abs/1608.01431v2
PDF http://arxiv.org/pdf/1608.01431v2.pdf
PWC https://paperswithcode.com/paper/an-efficient-iterative-thresholding-method
Repo https://github.com/xywei/threshseg
Framework none
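
The iteration itself is two steps: convolve the current phase indicator with a Gaussian kernel, then re-threshold. This numpy/scipy sketch drops the fidelity term of the actual Mumford-Shah scheme to keep the skeleton visible.

```python
# Skeleton of the iteration: convolve the current phase indicator with a
# Gaussian kernel, then re-threshold. The fidelity term of the full
# Mumford-Shah scheme is dropped here to keep the structure visible.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(4)
u = (rng.uniform(size=(64, 64)) > 0.5).astype(float)  # random initial phase

for _ in range(20):
    u = (gaussian_filter(u, sigma=2.0) > 0.5).astype(float)
print(f"foreground fraction: {u.mean():.2f}")
```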

Learning from the Hindsight Plan – Episodic MPC Improvement

Title Learning from the Hindsight Plan – Episodic MPC Improvement
Authors Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel
Abstract Model predictive control (MPC) is a popular control method that has proved effective for robotics, among other fields. MPC performs re-planning at every time step. Re-planning is done with a limited horizon due to computational and real-time constraints, and often also for robustness to potential model errors. However, the limited horizon leads to suboptimal performance. In this work, we consider the iterative learning setting, where the same task can be repeated several times, and propose a policy improvement scheme for MPC. The main idea is that between executions we can, offline, run MPC with a longer horizon, resulting in a hindsight plan. To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short-horizon planning (as is realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan. This effectively consolidates long-term reasoning into the short-horizon planning. We empirically evaluate our approach on contact-rich manipulation tasks in both simulated and real environments, such as peg insertion by a real PR2 robot.
Tasks
Published 2016-09-28
URL http://arxiv.org/abs/1609.09001v2
PDF http://arxiv.org/pdf/1609.09001v2.pdf
PWC https://paperswithcode.com/paper/learning-from-the-hindsight-plan-episodic-mpc
Repo https://github.com/zuoxingdong/VIN_PyTorch_Visdom
Framework pytorch
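
A toy sketch of the episodic loop: execute with short-horizon MPC, re-plan each visited state offline with a long horizon, and nudge a cost-shaping term toward the hindsight actions. The 1-D chain task, the "hill" penalty, and the additive shaping update are all invented for illustration; the paper learns the shaped cost differently.

```python
# Toy episodic loop on an invented 1-D chain: act with short-horizon MPC,
# re-plan offline with a long horizon, and shift per-action shaping costs
# toward the hindsight actions. The shaping update is a crude surrogate
# for the cost-learning step in the paper.
import numpy as np
from itertools import product

ACTIONS = (-1, 0, 1)

def step_cost(s):                          # distance to goal plus a local "hill"
    return abs(10 - s) + (4 if 3 <= s <= 5 else 0)

def plan(state, horizon, shaping):
    """Brute-force MPC: first action of the cheapest action sequence."""
    best_a, best_c = ACTIONS[0], np.inf
    for seq in product(range(3), repeat=horizon):
        s, c = state, 0.0
        for i in seq:
            s += ACTIONS[i]
            c += step_cost(s) + shaping[i]
        if c < best_c:
            best_a, best_c = ACTIONS[seq[0]], c
    return best_a

shaping = np.zeros(3)                      # learned cost re-shaping term
for ep in range(4):
    s, visited = 0, []
    for _ in range(14):                    # real execution, short horizon
        a = plan(s, 2, shaping)
        visited.append((s, a)); s += a
    for s_t, a_t in visited:               # offline hindsight pass, long horizon
        if a_t != plan(s_t, 6, np.zeros(3)):
            shaping[ACTIONS.index(a_t)] += 0.1
    print(f"episode {ep}: ended at s={s}")
```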

Training Region-based Object Detectors with Online Hard Example Mining

Title Training Region-based Object Detectors with Online Hard Example Mining
Authors Abhinav Shrivastava, Abhinav Gupta, Ross Girshick
Abstract The field of object detection has made significant advances riding on the wave of region-based ConvNets, but their training procedure still includes many heuristics and hyperparameters that are costly to tune. We present a simple yet surprisingly effective online hard example mining (OHEM) algorithm for training region-based ConvNet detectors. Our motivation is the same as it has always been – detection datasets contain an overwhelming number of easy examples and a small number of hard examples. Automatic selection of these hard examples can make training more effective and efficient. OHEM is a simple and intuitive algorithm that eliminates several heuristics and hyperparameters in common use. But more importantly, it yields consistent and significant boosts in detection performance on benchmarks like PASCAL VOC 2007 and 2012. Its effectiveness increases as datasets become larger and more difficult, as demonstrated by the results on the MS COCO dataset. Moreover, combined with complementary advances in the field, OHEM leads to state-of-the-art results of 78.9% and 76.3% mAP on PASCAL VOC 2007 and 2012 respectively.
Tasks Object Detection
Published 2016-04-12
URL http://arxiv.org/abs/1604.03540v1
PDF http://arxiv.org/pdf/1604.03540v1.pdf
PWC https://paperswithcode.com/paper/training-region-based-object-detectors-with
Repo https://github.com/tkuanlun350/Kaggle_Ship_Detection_2018
Framework tf
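
The heart of OHEM fits in a few lines of PyTorch: score every candidate example in the forward pass, rank by loss, and backpropagate only through the hardest ones. The full algorithm also deduplicates overlapping RoIs with NMS before selection; the classifier and sizes below are toys.

```python
# The heart of OHEM: score every candidate, keep the highest-loss ones,
# and update only on those. Real OHEM also removes overlapping RoIs with
# NMS before selection; the classifier and sizes here are toys.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(256, 21)                  # stand-in RoI classifier
opt = torch.optim.SGD(model.parameters(), lr=0.01)

feats = torch.randn(512, 256)               # features of 512 candidate RoIs
labels = torch.randint(0, 21, (512,))
B = 128                                     # hard examples kept per batch

losses = F.cross_entropy(model(feats), labels, reduction="none")
hard = losses.topk(B).indices               # the B highest-loss candidates

opt.zero_grad()
losses[hard].mean().backward()              # gradients from hard examples only
opt.step()
```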