May 7, 2019

2850 words 14 mins read

Paper Group AWR 15

Lens Distortion Rectification using Triangulation based Interpolation. MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving. Optimization Methods for Large-Scale Machine Learning. Unsupervised Cross-Domain Image Generation. Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes. Recurrent Neural Network Grammars. J …

Lens Distortion Rectification using Triangulation based Interpolation

Title Lens Distortion Rectification using Triangulation based Interpolation
Authors Burak Benligiray, Cihan Topal
Abstract Nonlinear lens distortion rectification is a common first step in image processing applications where the assumption of a linear camera model is essential. To rectify the lens distortion, the forward distortion model needs to be known. However, many self-calibration methods estimate the inverse distortion model. In the literature, the inverse of the estimated model is approximated for image rectification, which introduces additional error into the system. We propose a novel distortion rectification method that uses the inverse distortion model directly. The method starts by mapping the distorted pixels to the rectified image using the inverse distortion model. The resulting set of points with subpixel locations is triangulated. The pixel values of the rectified image are linearly interpolated based on this triangulation. The method is applicable to all camera calibration methods that estimate the inverse distortion model and performs well across a large range of parameters.
Tasks Calibration
Published 2016-11-29
URL http://arxiv.org/abs/1611.09559v2
PDF http://arxiv.org/pdf/1611.09559v2.pdf
PWC https://paperswithcode.com/paper/lens-distortion-rectification-using
Repo https://github.com/bbenligiray/lens-distortion-triangulation
Framework none
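
The pipeline in the abstract maps cleanly onto SciPy primitives. Below is a minimal sketch, assuming a simple one-parameter radial function as the inverse distortion model (the method itself works with any estimated inverse model); `k1` and the coordinate normalization are illustrative, not the paper's.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

def rectify(distorted, k1=-0.2):
    """Rectify an image by triangulating inverse-mapped pixel locations."""
    h, w = distorted.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = w / 2.0, h / 2.0
    xn, yn = (xs - cx) / w, (ys - cy) / w          # normalized coordinates
    scale = 1.0 + k1 * (xn**2 + yn**2)             # toy radial inverse model
    # Map each distorted pixel to its subpixel location in the rectified image.
    xu, yu = xn * scale * w + cx, yn * scale * w + cy
    pts = np.column_stack([xu.ravel(), yu.ravel()])
    tri = Delaunay(pts)                            # triangulate scattered points
    interp = LinearNDInterpolator(tri, distorted.reshape(h * w, -1))
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    out = interp(grid)                             # linear barycentric interpolation
    return np.nan_to_num(out).reshape(distorted.shape)
```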

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

Title MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
Authors Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun
Abstract While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. Towards this goal, we present an approach to joint classification, detection and semantic segmentation via a unified architecture where the encoder is shared amongst the three tasks. Our approach is very simple, can be trained end-to-end and performs extremely well on the challenging KITTI dataset, outperforming the state-of-the-art in the road segmentation task. Our approach is also very efficient, taking less than 100 ms to perform all tasks.
Tasks Autonomous Driving, Semantic Segmentation
Published 2016-12-22
URL http://arxiv.org/abs/1612.07695v2
PDF http://arxiv.org/pdf/1612.07695v2.pdf
PWC https://paperswithcode.com/paper/multinet-real-time-joint-semantic-reasoning
Repo https://github.com/kinglintianxia/MultiNet
Framework tf
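
The released code is TensorFlow, but the shared-encoder idea is framework-agnostic. A toy PyTorch sketch (not the authors' VGG-based architecture; layer sizes are placeholders) showing one backbone feeding classification, detection, and segmentation heads:

```python
import torch
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self, n_classes=2, n_det=6, n_seg=2):
        super().__init__()
        self.encoder = nn.Sequential(               # shared, computed once
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(64, n_classes))
        self.det_head = nn.Conv2d(64, n_det, 1)     # per-cell box + confidence
        self.seg_head = nn.Sequential(              # decode back to input size
            nn.Conv2d(64, n_seg, 1),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False))

    def forward(self, x):
        f = self.encoder(x)                         # one encoder pass, three tasks
        return self.cls_head(f), self.det_head(f), self.seg_head(f)

cls_out, det_out, seg_out = JointNet()(torch.randn(1, 3, 64, 64))
```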

Optimization Methods for Large-Scale Machine Learning

Title Optimization Methods for Large-Scale Machine Learning
Authors Léon Bottou, Frank E. Curtis, Jorge Nocedal
Abstract This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
Tasks Text Classification
Published 2016-06-15
URL http://arxiv.org/abs/1606.04838v3
PDF http://arxiv.org/pdf/1606.04838v3.pdf
PWC https://paperswithcode.com/paper/optimization-methods-for-large-scale-machine
Repo https://github.com/stephenbeckr/AIMS
Framework none
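
For reference alongside the paper's text-classification case study, here is a textbook instantiation of the SG method on logistic regression, taking one sample per step; the constant step size and loop structure are illustrative, not a recommendation from the paper.

```python
import numpy as np

def sgd_logistic(X, y, alpha=0.1, epochs=5, seed=0):
    """Stochastic gradient descent on the logistic loss, y in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):               # one random sample per step
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))    # sigmoid prediction
            w -= alpha * (p - y[i]) * X[i]         # stochastic gradient step
    return w
```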

Unsupervised Cross-Domain Image Generation

Title Unsupervised Cross-Domain Image Generation
Authors Yaniv Taigman, Adam Polyak, Lior Wolf
Abstract We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.
Tasks Domain Adaptation, Image Generation, Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published 2016-11-07
URL http://arxiv.org/abs/1611.02200v1
PDF http://arxiv.org/pdf/1611.02200v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-cross-domain-image-generation
Repo https://github.com/kaonashi-tyc/zi2zi
Framework tf
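
The compound loss decomposes naturally into its three terms. A hedged PyTorch sketch, assuming `G`, `f`, and a three-way discriminator `D` are given modules; the weights and the convention that class index 2 means "real target sample" are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def dtn_generator_loss(G, D, f, x_s, x_t, alpha=15.0, beta=15.0):
    g_s, g_t = G(f(x_s)), G(f(x_t))      # G consumes f-features (G = g o f)
    real_t = lambda n: torch.full((n,), 2, dtype=torch.long)
    # Multiclass GAN term: fool D into calling both outputs real target data.
    adv = (F.cross_entropy(D(g_s), real_t(x_s.size(0))) +
           F.cross_entropy(D(g_t), real_t(x_t.size(0))))
    const = F.mse_loss(f(g_s), f(x_s))   # f-constancy on source transfers
    ident = F.mse_loss(g_t, x_t)         # regularizer: G maps T to itself
    return adv + alpha * const + beta * ident
```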

Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes

Title Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes
Authors Xu Xu, Sinisa Todorovic
Abstract This paper addresses 3D shape recognition. Recent work typically represents a 3D shape as a set of binary variables corresponding to 3D voxels of a uniform 3D grid centered on the shape, and resorts to deep convolutional neural networks (CNNs) for modeling these binary variables. Robust learning of such CNNs is currently limited by the small datasets of 3D shapes available, an order of magnitude smaller than other common datasets in computer vision. Related work typically deals with the small training datasets using a number of ad hoc, hand-tuning strategies. To address this issue, we formulate CNN learning as a beam search aimed at identifying an optimal CNN architecture, namely, the number of layers, nodes, and their connectivity in the network, as well as estimating parameters of such an optimal CNN. Each state of the beam search corresponds to a candidate CNN. Two types of actions are defined to add new convolutional filters or new convolutional layers to a parent CNN, and thus transition to child states. The utility function of each action is efficiently computed by transferring parameter values of the parent CNN to its children, thereby enabling an efficient beam search. Our experimental evaluation on the 3D ModelNet dataset demonstrates that this model pursuit using beam search yields a CNN that outperforms the state of the art on 3D shape classification.
Tasks 3D Shape Recognition
Published 2016-12-14
URL http://arxiv.org/abs/1612.04774v1
PDF http://arxiv.org/pdf/1612.04774v1.pdf
PWC https://paperswithcode.com/paper/beam-search-for-learning-a-deep-convolutional
Repo https://github.com/xuxucmkox/3D-shape-Classification-Beam-Search
Framework none
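
Stripped of the CNN specifics, the search itself is a standard beam search. A schematic sketch where `expand` (adds filters or layers to a parent, warm-starting children from its weights) and `score` (e.g., validation accuracy) are assumed to be supplied by the caller:

```python
def beam_search(initial_net, expand, score, beam_width=3, depth=4):
    """Keep the beam_width best candidate networks at each search depth."""
    beam = [(score(initial_net), initial_net)]
    for _ in range(depth):
        candidates = []
        for s, net in beam:
            for child in expand(net):        # children inherit parent weights
                candidates.append((score(child), child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]       # prune to the top-scoring states
    return max(beam, key=lambda c: c[0])[1]
```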

Recurrent Neural Network Grammars

Title Recurrent Neural Network Grammars
Authors Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith
Abstract We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English and Chinese.
Tasks Constituency Parsing, Language Modelling
Published 2016-02-25
URL http://arxiv.org/abs/1602.07776v4
PDF http://arxiv.org/pdf/1602.07776v4.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-network-grammars
Repo https://github.com/Psarpei/Recognition-of-logical-document-structures
Framework none
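
The generative transition system behind RNNGs (NT(X), GEN(w), REDUCE) is easy to demonstrate without the neural scoring. A toy interpreter that builds the bracketed tree on a stack:

```python
def run_transitions(actions):
    stack = []
    for act in actions:
        if act[0] == 'NT':            # open a nonterminal, e.g. ('NT', 'S')
            stack.append(('(', act[1]))
        elif act[0] == 'GEN':         # generate a terminal word
            stack.append(act[1])
        elif act[0] == 'REDUCE':      # close the most recent open constituent
            children = []
            while not (isinstance(stack[-1], tuple) and stack[-1][0] == '('):
                children.append(stack.pop())
            label = stack.pop()[1]
            stack.append('(%s %s)' % (label, ' '.join(reversed(children))))
    return stack[0]

print(run_transitions([('NT', 'S'), ('NT', 'NP'), ('GEN', 'the'),
                       ('GEN', 'hungry'), ('GEN', 'cat'), ('REDUCE',),
                       ('NT', 'VP'), ('GEN', 'meows'), ('REDUCE',), ('REDUCE',)]))
# -> (S (NP the hungry cat) (VP meows))
```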

Judging a Book By its Cover

Title Judging a Book By its Cover
Authors Brian Kenji Iwana, Syed Tahseen Raza Rizvi, Sheraz Ahmed, Andreas Dengel, Seiichi Uchida
Abstract Book covers communicate information to potential readers, but can that same information be learned by computers? We propose using a deep Convolutional Neural Network (CNN) to predict the genre of a book based on the visual clues provided by its cover. The purpose of this research is to investigate whether relationships between books and their covers can be learned. However, determining the genre of a book is a difficult task because covers can be ambiguous and genres can be overarching. Despite this, we show that a CNN can extract features and learn underlying design rules set by the designer to define a genre. Using machine learning, we can bring a large amount of resources to bear on the book cover design process. In addition, we present a new challenging dataset that can be used for many pattern recognition tasks.
Tasks
Published 2016-10-28
URL http://arxiv.org/abs/1610.09204v3
PDF http://arxiv.org/pdf/1610.09204v3.pdf
PWC https://paperswithcode.com/paper/judging-a-book-by-its-cover
Repo https://github.com/uchidalab/book-dataset
Framework none
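
The setup is ordinary supervised image classification by transfer learning. A two-line torchvision sketch; the AlexNet backbone matches the network family the paper fine-tunes, but the 30 genre classes and warm-start details here are assumptions based on the released book-cover dataset.

```python
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)       # ImageNet warm start
model.classifier[6] = nn.Linear(4096, 30)     # replace final layer: 30 genres
```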

SPICE: Semantic Propositional Image Caption Evaluation

Title SPICE: Semantic Propositional Image Caption Evaluation
Authors Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
Abstract There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as "which caption-generator best understands colors?" and "can caption-generators count?"
Tasks Image Captioning
Published 2016-07-29
URL http://arxiv.org/abs/1607.08822v1
PDF http://arxiv.org/pdf/1607.08822v1.pdf
PWC https://paperswithcode.com/paper/spice-semantic-propositional-image-caption
Repo https://github.com/mtanti/coco-caption
Framework none
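
At its core, SPICE is an F-score over propositional tuples extracted from scene graphs. A minimal sketch over already-extracted tuples; the real metric parses captions into scene graphs and matches tuples with WordNet synonyms, both omitted here.

```python
def spice_f1(cand_tuples, ref_tuples):
    """F1 between candidate and reference proposition tuple sets."""
    cand, ref = set(cand_tuples), set(ref_tuples)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    p, r = matched / len(cand), matched / len(ref)
    return 2 * p * r / (p + r)

ref = {('girl',), ('ball',), ('girl', 'young'), ('girl', 'holding', 'ball')}
cand = {('girl',), ('ball',), ('girl', 'holding', 'ball')}
print(spice_f1(cand, ref))  # 0.857...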

Self-critical Sequence Training for Image Captioning

Title Self-critical Sequence Training for Image Captioning
Authors Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel
Abstract Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a “baseline” to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) is avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7.
Tasks Image Captioning, Policy Gradient Methods
Published 2016-12-02
URL http://arxiv.org/abs/1612.00563v2
PDF http://arxiv.org/pdf/1612.00563v2.pdf
PWC https://paperswithcode.com/paper/self-critical-sequence-training-for-image
Repo https://github.com/ruotianluo/self-critical.pytorch
Framework pytorch
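
The self-critical baseline is a one-line change to REINFORCE. A hedged sketch: the sampled caption's per-token log-probabilities and the CIDEr rewards of the sampled and greedy decodes are assumed to be computed elsewhere by the captioner and scorer.

```python
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    # Advantage = sampled-caption reward minus greedy-decode reward.
    advantage = sample_reward - greedy_reward
    # REINFORCE: push up log-probs of samples that beat the greedy baseline.
    return -(advantage * sample_logprobs.sum(dim=-1)).mean()

logp = torch.randn(4, 16)                        # (batch, caption length)
loss = scst_loss(logp, torch.tensor([1.1, 0.8, 1.3, 0.9]),
                 torch.tensor([1.0, 1.0, 1.0, 1.0]))
```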

Algebraic multigrid support vector machines

Title Algebraic multigrid support vector machines
Authors Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro
Abstract The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets because of such reasons as the complexity and number of iterations in parameter fitting methods, underlying optimization solvers, and nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by the algebraic multigrid. A significant improvement in running time has been achieved without any loss in quality. The proposed technique is highly beneficial on imbalanced sets. We demonstrate computational results on publicly available and industrial data sets.
Tasks
Published 2016-11-16
URL http://arxiv.org/abs/1611.05487v2
PDF http://arxiv.org/pdf/1611.05487v2.pdf
PWC https://paperswithcode.com/paper/algebraic-multigrid-support-vector-machines
Repo https://github.com/esadr/mlsvm
Framework none
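
As a loose two-level illustration of the multilevel idea: train on a coarsened problem, then refine near the coarse decision boundary. The paper's AMG-inspired coarsening is considerably more sophisticated than the KMeans stand-in used here; the margin threshold is also just a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def two_level_svm(X, y, n_coarse=100):
    km = KMeans(n_clusters=n_coarse, n_init=10).fit(X)
    # Coarse level: one representative per cluster, majority label.
    yc = np.array([np.bincount(y[km.labels_ == c]).argmax()
                   for c in range(n_coarse)])
    coarse = SVC().fit(km.cluster_centers_, yc)
    # Refinement: retrain only on points whose clusters sit near the margin.
    near = np.abs(coarse.decision_function(km.cluster_centers_)) < 1.0
    mask = near[km.labels_]
    return SVC().fit(X[mask], y[mask])
```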

ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

Title ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
Authors Robyn Speer, Joshua Chin, Catherine Havasi
Abstract Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be used with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected from many sources that include expert-created resources, crowd-sourcing, and games with a purpose. It is designed to represent the general knowledge involved in understanding language, improving natural language applications by allowing the application to better understand the meanings behind the words people use. When ConceptNet is combined with word embeddings acquired from distributional semantics (such as word2vec), it provides applications with understanding that they would not acquire from distributional semantics alone, nor from narrower resources such as WordNet or DBPedia. We demonstrate this with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
Tasks Word Embeddings
Published 2016-12-12
URL http://arxiv.org/abs/1612.03975v2
PDF http://arxiv.org/pdf/1612.03975v2.pdf
PWC https://paperswithcode.com/paper/conceptnet-55-an-open-multilingual-graph-of
Repo https://github.com/shayanray/ApplyingCommonSense
Framework none
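
The combination with distributional word embeddings is done with retrofitting-style updates in ConceptNet Numberbatch. A minimal sketch of the basic retrofitting step (after Faruqui et al. 2015, which the paper extends); `edges` is assumed to map each term to its ConceptNet neighbours.

```python
import numpy as np

def retrofit(vectors, edges, iters=10):
    """Pull each vector toward the mean of its graph neighbours while
    staying anchored to the original distributional embedding."""
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for w, nbrs in edges.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            new[w] = (vectors[w] + sum(new[n] for n in nbrs)) / (len(nbrs) + 1)
    return new
```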

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation

Title 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
Authors Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, Olaf Ronneberger
Abstract This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases of this method: (1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. (2) In a fully-automated setup, we assume that a representative, sparsely annotated training set exists. Trained on this data set, the network densely segments new volumetric images. The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. The implementation performs on-the-fly elastic deformations for efficient data augmentation during training. It is trained end-to-end from scratch, i.e., no pre-trained network is required. We test the performance of the proposed method on a complex, highly variable 3D structure, the Xenopus kidney, and achieve good results for both use cases.
Tasks Data Augmentation
Published 2016-06-21
URL http://arxiv.org/abs/1606.06650v1
PDF http://arxiv.org/pdf/1606.06650v1.pdf
PWC https://paperswithcode.com/paper/3d-u-net-learning-dense-volumetric
Repo https://github.com/gvtulder/elasticdeform
Framework tf
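
The architectural change is mechanical: every 2D operation becomes its 3D counterpart. One encoder block as a sketch (channel counts are illustrative, not the paper's):

```python
import torch.nn as nn

def conv_block_3d(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),  # was Conv2d
        nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))

encoder_stage = nn.Sequential(conv_block_3d(1, 32),
                              nn.MaxPool3d(2))                # was MaxPool2d
```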

Distribution-Free Predictive Inference For Regression

Title Distribution-Free Predictive Inference For Regression
Authors Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman
Abstract We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package, conformalInference, that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
Tasks
Published 2016-04-14
URL http://arxiv.org/abs/1604.04173v2
PDF http://arxiv.org/pdf/1604.04173v2.pdf
PWC https://paperswithcode.com/paper/distribution-free-predictive-inference-for
Repo https://github.com/DEck13/conformal.glm
Framework none
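
Split conformal inference fits on one half of the data and calibrates on the other. A compact sketch with scikit-learn (the paper's own implementation is the R package conformalInference; the linear base estimator here is just one choice, since any regression estimator works):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal(X, y, X_new, alpha=0.1, seed=0):
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    fit, cal = idx[:n // 2], idx[n // 2:]
    model = LinearRegression().fit(X[fit], y[fit])
    resid = np.abs(y[cal] - model.predict(X[cal]))
    # Finite-sample-valid quantile of the calibration residuals.
    k = int(np.ceil((1 - alpha) * (len(cal) + 1))) - 1
    q = np.sort(resid)[min(k, len(cal) - 1)]
    pred = model.predict(X_new)
    return pred - q, pred + q        # marginal coverage >= 1 - alpha
```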

Towards Deep Symbolic Reinforcement Learning

Title Towards Deep Symbolic Reinforcement Learning
Authors Marta Garnelo, Kai Arulkumaran, Murray Shanahan
Abstract Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system – though just a prototype – learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
Tasks Game of Go, Transfer Learning
Published 2016-09-18
URL http://arxiv.org/abs/1609.05518v2
PDF http://arxiv.org/pdf/1609.05518v2.pdf
PWC https://paperswithcode.com/paper/towards-deep-symbolic-reinforcement-learning
Repo https://github.com/epignatelli/literature
Framework none

OpenAI Gym

Title OpenAI Gym
Authors Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba
Abstract OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
Tasks
Published 2016-06-05
URL http://arxiv.org/abs/1606.01540v1
PDF http://arxiv.org/pdf/1606.01540v1.pdf
PWC https://paperswithcode.com/paper/openai-gym
Repo https://github.com/nohboogy/gym
Framework tf
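
The common interface the whitepaper describes is a handful of methods on an environment object. The canonical interaction loop, using the classic pre-0.26 API that matches this paper (`step` returns a 4-tuple):

```python
import gym

env = gym.make("CartPole-v0")
obs = env.reset()
done, total = False, 0.0
while not done:
    action = env.action_space.sample()           # random policy for illustration
    obs, reward, done, info = env.step(action)   # one timestep of interaction
    total += reward
print("episode return:", total)
env.close()
```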