May 7, 2019


Paper Group AWR 15


Lens Distortion Rectification using Triangulation based Interpolation


Title	Lens Distortion Rectification using Triangulation based Interpolation
Authors	Burak Benligiray, Cihan Topal
Abstract	Nonlinear lens distortion rectification is a common first step in image processing applications where the assumption of a linear camera model is essential. To rectify the lens distortion, the forward distortion model needs to be known. However, many self-calibration methods estimate the inverse distortion model. In the literature, the inverse of the estimated model is approximated for image rectification, which introduces additional error to the system. We propose a novel distortion rectification method that uses the inverse distortion model directly. The method starts by mapping the distorted pixels to the rectified image using the inverse distortion model. The resulting set of points with subpixel locations is triangulated. The pixel values of the rectified image are linearly interpolated based on this triangulation. The method is applicable to all camera calibration methods that estimate the inverse distortion model and performs well across a large range of parameters.
Tasks	Calibration
Published	2016-11-29
URL	http://arxiv.org/abs/1611.09559v2
PDF	http://arxiv.org/pdf/1611.09559v2.pdf
PWC	https://paperswithcode.com/paper/lens-distortion-rectification-using
Repo	https://github.com/bbenligiray/lens-distortion-triangulation
Framework	none
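
The three steps in the abstract (map distorted pixels with the inverse model, triangulate the resulting subpixel points, interpolate linearly) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the one-parameter radial model in `undistort_point`, the coefficient `k`, and the choice to triangulate by splitting each source grid cell into two triangles are all assumptions made here; the abstract does not say which triangulation the paper uses.

```python
import math


def undistort_point(x, y, cx, cy, k):
    """Hypothetical one-parameter radial model: distorted pixel -> rectified location."""
    dx, dy = x - cx, y - cy
    scale = 1.0 + k * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale


def barycentric(p, a, b, c):
    """Barycentric weights of point p in triangle abc, or None if abc is degenerate."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        return None
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1


def fill_triangle(out, verts, vals):
    """Write linearly interpolated values to every output pixel inside one triangle."""
    h, w = len(out), len(out[0])
    xs = [v[0] for v in verts]
    ys = [v[1] for v in verts]
    for py in range(max(0, math.ceil(min(ys))), min(h - 1, math.floor(max(ys))) + 1):
        for px in range(max(0, math.ceil(min(xs))), min(w - 1, math.floor(max(xs))) + 1):
            weights = barycentric((px, py), *verts)
            if weights is not None and min(weights) >= -1e-9:
                out[py][px] = sum(wt * v for wt, v in zip(weights, vals))


def rectify(image, k):
    """Rectify a grayscale image (list of rows); uncovered pixels stay None."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # Step 1: map every distorted pixel to a subpixel location in the rectified image.
    pts = [[undistort_point(x, y, cx, cy, k) for x in range(w)] for y in range(h)]
    out = [[None] * w for _ in range(h)]
    # Step 2: triangulate the mapped points (assumption: reuse the source grid connectivity).
    for y in range(h - 1):
        for x in range(w - 1):
            a, b, c, d = (x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)
            for tri in ((a, b, d), (b, c, d)):
                verts = [pts[j][i] for i, j in tri]
                vals = [image[j][i] for i, j in tri]
                # Step 3: interpolate pixel values linearly inside each triangle.
                fill_triangle(out, verts, vals)
    return out
```

With `k = 0` the mapping is the identity and `rectify` returns the input values unchanged; with a negative `k` the mapped points contract toward the center and the border pixels of the output remain `None`.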

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving


Title	MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
Authors	Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun
Abstract	While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. Towards this goal, we present an approach to joint classification, detection and semantic segmentation via a unified architecture where the encoder is shared amongst the three tasks. Our approach is very simple, can be trained end-to-end and performs extremely well in the challenging KITTI dataset, outperforming the state-of-the-art in the road segmentation task. Our approach is also very efficient, taking less than 100 ms to perform all tasks.
Tasks	Autonomous Driving, Semantic Segmentation
Published	2016-12-22
URL	http://arxiv.org/abs/1612.07695v2
PDF	http://arxiv.org/pdf/1612.07695v2.pdf
PWC	https://paperswithcode.com/paper/multinet-real-time-joint-semantic-reasoning
Repo	https://github.com/kinglintianxia/MultiNet
Framework	tf

Optimization Methods for Large-Scale Machine Learning


Title	Optimization Methods for Large-Scale Machine Learning
Authors	Léon Bottou, Frank E. Curtis, Jorge Nocedal
Abstract	This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
Tasks	Text Classification
Published	2016-06-15
URL	http://arxiv.org/abs/1606.04838v3
PDF	http://arxiv.org/pdf/1606.04838v3.pdf
PWC	https://paperswithcode.com/paper/optimization-methods-for-large-scale-machine
Repo	https://github.com/stephenbeckr/AIMS
Framework	none
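
As a rough illustration of the stochastic gradient (SG) method the abstract centers on, the sketch below fits a line by least squares, using one randomly chosen example per update. The function name, the toy problem, and the diminishing step-size schedule `alpha0 / (1 + 0.01 * k)` are assumptions chosen for the example, not values taken from the paper.

```python
import random


def sgd_least_squares(data, steps=5000, alpha0=0.1, seed=0):
    """Fit y ~ w * x + b with one-sample stochastic gradient steps."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for k in range(1, steps + 1):
        x, y = rng.choice(data)            # sample a single training example
        err = (w * x + b) - y              # residual of the current model
        alpha = alpha0 / (1.0 + 0.01 * k)  # assumed diminishing step size
        w -= alpha * err * x               # gradient of 0.5 * err**2 w.r.t. w
        b -= alpha * err                   # gradient of 0.5 * err**2 w.r.t. b
    return w, b


# Usage: noiseless samples of y = 2x + 1 on [-1, 1].
data = [(i / 10, 2 * i / 10 + 1) for i in range(-10, 11)]
w, b = sgd_least_squares(data)
```

Each update touches one example, so the per-step cost does not depend on the dataset size, which is the property that makes SG attractive in the large-scale setting the abstract describes.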

Unsupervised Cross-Domain Image Generation


Title	Unsupervised Cross-Domain Image Generation
Authors	Yaniv Taigman, Adam Polyak, Lior Wolf
Abstract	We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity.
Tasks	Domain Adaptation, Image Generation, Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published	2016-11-07
URL	http://arxiv.org/abs/1611.02200v1
PDF	http://arxiv.org/pdf/1611.02200v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-cross-domain-image-generation
Repo	https://github.com/kaonashi-tyc/zi2zi
Framework	tf
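
Two of the three loss components named in the abstract can be written down without a neural network: the f-constancy term, which asks that f(x) and f(G(x)) agree for source samples, and the regularizer that asks G to leave target samples unchanged. The sketch below is a toy under stated assumptions: samples are plain lists of floats, mean squared error stands in for the distance measures, and the weights `alpha` and `beta` are hypothetical. The multiclass GAN term is omitted.

```python
def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def f_constancy_loss(f, G, source_samples):
    """Penalize changes in f's output when a source sample is passed through G."""
    return sum(mse(f(x), f(G(x))) for x in source_samples) / len(source_samples)


def target_identity_loss(G, target_samples):
    """Penalize G for altering samples that already lie in the target domain."""
    return sum(mse(x, G(x)) for x in target_samples) / len(target_samples)


def generator_regularizers(f, G, source_samples, target_samples, alpha=1.0, beta=1.0):
    """Weighted sum of the two non-adversarial terms (weights are assumptions)."""
    return (alpha * f_constancy_loss(f, G, source_samples)
            + beta * target_identity_loss(G, target_samples))
```

If G is the identity map, both terms are zero; any G that changes what f sees, or that distorts target samples, is penalized.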

Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes


Title	Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes
Authors	Xu Xu, Sinisa Todorovic
Abstract	This paper addresses 3D shape recognition. Recent work typically represents a 3D shape as a set of binary variables corresponding to 3D voxels of a uniform 3D grid centered on the shape, and resorts to deep convolutional neural networks (CNNs) for modeling these binary variables. Robust learning of such CNNs is currently limited by the small datasets of 3D shapes available, an order of magnitude smaller than other common datasets in computer vision. Related work typically deals with the small training datasets using a number of ad hoc, hand-tuning strategies. To address this issue, we formulate CNN learning as a beam search aimed at identifying an optimal CNN architecture, namely, the number of layers, nodes, and their connectivity in the network, as well as estimating parameters of such an optimal CNN. Each state of the beam search corresponds to a candidate CNN. Two types of actions are defined to add new convolutional filters or new convolutional layers to a parent CNN, and thus transition to children states. The utility function of each action is efficiently computed by transferring parameter values of the parent CNN to its children, thereby enabling an efficient beam search. Our experimental evaluation on the 3D ModelNet dataset demonstrates that our model pursuit using the beam search yields a CNN with superior performance on 3D shape classification than the state of the art.
Tasks	3D Shape Recognition
Published	2016-12-14
URL	http://arxiv.org/abs/1612.04774v1
PDF	http://arxiv.org/pdf/1612.04774v1.pdf
PWC	https://paperswithcode.com/paper/beam-search-for-learning-a-deep-convolutional
Repo	https://github.com/xuxucmkox/3D-shape-Classification-Beam-Search
Framework	none
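
The search procedure in the abstract can be sketched as a generic beam search in which each state is a candidate architecture and the two action types either add filters to an existing layer or append a new layer. Everything concrete below is an assumption for illustration: an architecture is reduced to a tuple of filter counts, the step sizes are arbitrary, and `toy_utility` is a stand-in score. The paper's utility, which relies on transferring parent parameters to children, is not modeled.

```python
def children(arch, filter_step=8, new_layer_width=8):
    """Apply the two action types: widen one existing layer, or append a layer."""
    kids = []
    for i in range(len(arch)):
        kids.append(arch[:i] + (arch[i] + filter_step,) + arch[i + 1:])
    kids.append(arch + (new_layer_width,))
    return kids


def beam_search(start, utility, beam_width=3, iterations=5):
    """Keep the beam_width best candidates at each step; return the best state seen."""
    beam = [start]
    best = start
    for _ in range(iterations):
        candidates = {kid for arch in beam for kid in children(arch)}
        beam = sorted(candidates, key=utility, reverse=True)[:beam_width]
        if utility(beam[0]) > utility(best):
            best = beam[0]
    return best


def toy_utility(arch):
    """Stand-in score that peaks at three layers of 16 filters (not a real metric)."""
    target = (16, 16, 16)
    padded = arch + (0,) * (len(target) - len(arch))
    size_penalty = sum(abs(a - t) for a, t in zip(padded, target))
    depth_penalty = 50 * max(0, len(arch) - len(target))
    return -(size_penalty + depth_penalty)


best_arch = beam_search((8,), toy_utility)
```

In a real setting the utility would be an estimate of validation performance, which is why cheap evaluation of children matters for the search to be practical.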

Recurrent Neural Network Grammars


Title	Recurrent Neural Network Grammars
Authors	Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith
Abstract	We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English and Chinese.
Tasks	Constituency Parsing, Language Modelling
Published	2016-02-25
URL	http://arxiv.org/abs/1602.07776v4
PDF	http://arxiv.org/pdf/1602.07776v4.pdf
PWC	https://paperswithcode.com/paper/recurrent-neural-network-grammars
Repo	https://github.com/Psarpei/Recognition-of-logical-document-structures
Framework	none

Judging a Book By its Cover


Title	Judging a Book By its Cover
Authors	Brian Kenji Iwana, Syed Tahseen Raza Rizvi, Sheraz Ahmed, Andreas Dengel, Seiichi Uchida
Abstract	Book covers communicate information to potential readers, but can that same information be learned by computers? We propose using a deep Convolutional Neural Network (CNN) to predict the genre of a book based on the visual clues provided by its cover. The purpose of this research is to investigate whether relationships between books and their covers can be learned. However, determining the genre of a book is a difficult task because covers can be ambiguous and genres can be overarching. Despite this, we show that a CNN can extract features and learn underlying design rules set by the designer to define a genre. Using machine learning, we can bring the large amount of resources available to the book cover design process. In addition, we present a new challenging dataset that can be used for many pattern recognition tasks.
Tasks
Published	2016-10-28
URL	http://arxiv.org/abs/1610.09204v3
PDF	http://arxiv.org/pdf/1610.09204v3.pdf
PWC	https://paperswithcode.com/paper/judging-a-book-by-its-cover
Repo	https://github.com/uchidalab/book-dataset
Framework	none

SPICE: Semantic Propositional Image Caption Evaluation


Title	SPICE: Semantic Propositional Image Caption Evaluation
Authors	Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
Abstract	There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as 'which caption-generator best understands colors?' and 'can caption-generators count?'
Tasks	Image Captioning
Published	2016-07-29
URL	http://arxiv.org/abs/1607.08822v1
PDF	http://arxiv.org/pdf/1607.08822v1.pdf
PWC	https://paperswithcode.com/paper/spice-semantic-propositional-image-caption
Repo	https://github.com/mtanti/coco-caption
Framework	none
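
The abstract only says that SPICE is defined over scene graphs; in the paper the score is an F-score over the semantic tuples (objects, attributes, relations) extracted from the candidate and reference captions. The sketch below shows only that final scoring step, under simplifying assumptions: tuples are given by hand rather than parsed from captions, and matching is exact string equality, whereas the actual metric also accepts synonym matches. The function name `tuple_f1` is hypothetical.

```python
def tuple_f1(candidate_tuples, reference_tuples):
    """F1 score between two collections of semantic tuples (exact matching only)."""
    cand, ref = set(candidate_tuples), set(reference_tuples)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)  # share of candidate tuples found in the reference
    recall = matched / len(ref)      # share of reference tuples found in the candidate
    return 2 * precision * recall / (precision + recall)


# Hand-written tuples for illustration: objects, (object, attribute),
# and (subject, relation, object).
candidate = [("girl",), ("girl", "young"), ("girl", "standing-on", "court")]
reference = [("girl",), ("court",), ("girl", "young"), ("girl", "holding", "racket")]
score = tuple_f1(candidate, reference)
```

Because each tuple carries its type implicitly (length 1, 2, or 3), the same function can be restricted to, say, color attributes or counts, which is how questions like those at the end of the abstract could be answered.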

Self-critical Sequence Training for Image Captioning


Title	Self-critical Sequence Training for Image Captioning
Authors	Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel
Abstract	Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a “baseline” to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) is avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7.
Tasks	Image Captioning, Policy Gradient Methods
Published	2016-12-02
URL	http://arxiv.org/abs/1612.00563v2
PDF	http://arxiv.org/pdf/1612.00563v2.pdf
PWC	https://paperswithcode.com/paper/self-critical-sequence-training-for-image
Repo	https://github.com/ruotianluo/self-critical.pytorch
Framework	pytorch
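
The core idea of SCST, using the reward of the model's own greedy output as the REINFORCE baseline, can be illustrated on a one-token "caption". The sketch below is a toy under stated assumptions: the policy is a softmax over three tokens, `reward_fn` is a hypothetical stand-in for a metric such as CIDEr, and a real captioner would sum log-probabilities over a whole sequence.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scst_logit_gradient(logits, sampled, reward_fn):
    """Gradient of the loss -(r(sampled) - r(greedy)) * log p(sampled) w.r.t. the logits."""
    probs = softmax(logits)
    greedy = max(range(len(probs)), key=probs.__getitem__)  # test-time (greedy) output
    advantage = reward_fn(sampled) - reward_fn(greedy)      # self-critical baseline
    # d log p(sampled) / d logit_i = 1[i == sampled] - p_i
    return [-advantage * ((1.0 if i == sampled else 0.0) - p)
            for i, p in enumerate(probs)]


# Usage: token 1 scores higher than the greedy token 0, so a gradient
# descent step would raise the logit of token 1.
grad = scst_logit_gradient([2.0, 1.0, 0.0], sampled=1,
                           reward_fn=lambda tok: [0.2, 0.9, 0.1][tok])
```

Samples that beat the greedy output are reinforced and samples that fall short are suppressed; when the sample equals the greedy output the gradient vanishes, so no separate baseline has to be learned.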

Algebraic multigrid support vector machines


Title	Algebraic multigrid support vector machines
Authors	Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro
Abstract	The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets because of such reasons as the complexity and number of iterations in parameter fitting methods, underlying optimization solvers, and nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by the algebraic multigrid. Significant improvement in the running time has been achieved without any loss in the quality. The proposed technique is highly beneficial on imbalanced sets. We demonstrate computational results on publicly available and industrial data sets.
Tasks
Published	2016-11-16
URL	http://arxiv.org/abs/1611.05487v2
PDF	http://arxiv.org/pdf/1611.05487v2.pdf
PWC	https://paperswithcode.com/paper/algebraic-multigrid-support-vector-machines
Repo	https://github.com/esadr/mlsvm
Framework	none

ConceptNet 5.5: An Open Multilingual Graph of General Knowledge


Title	ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
Authors	Robyn Speer, Joshua Chin, Catherine Havasi
Abstract	Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be used with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected from many sources that include expert-created resources, crowd-sourcing, and games with a purpose. It is designed to represent the general knowledge involved in understanding language, improving natural language applications by allowing the application to better understand the meanings behind the words people use. When ConceptNet is combined with word embeddings acquired from distributional semantics (such as word2vec), it provides applications with understanding that they would not acquire from distributional semantics alone, nor from narrower resources such as WordNet or DBPedia. We demonstrate this with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
Tasks	Word Embeddings
Published	2016-12-12
URL	http://arxiv.org/abs/1612.03975v2
PDF	http://arxiv.org/pdf/1612.03975v2.pdf
PWC	https://paperswithcode.com/paper/conceptnet-55-an-open-multilingual-graph-of
Repo	https://github.com/shayanray/ApplyingCommonSense
Framework	none

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation


Title	3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
Authors	Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, Olaf Ronneberger
Abstract	This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases of this method: (1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. (2) In a fully-automated setup, we assume that a representative, sparsely annotated training set exists. Trained on this data set, the network densely segments new volumetric images. The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. The implementation performs on-the-fly elastic deformations for efficient data augmentation during training. It is trained end-to-end from scratch, i.e., no pre-trained network is required. We test the performance of the proposed method on a complex, highly variable 3D structure, the Xenopus kidney, and achieve good results for both use cases.
Tasks	Data Augmentation
Published	2016-06-21
URL	http://arxiv.org/abs/1606.06650v1
PDF	http://arxiv.org/pdf/1606.06650v1.pdf
PWC	https://paperswithcode.com/paper/3d-u-net-learning-dense-volumetric
Repo	https://github.com/gvtulder/elasticdeform
Framework	tf

Distribution-Free Predictive Inference For Regression


Title	Distribution-Free Predictive Inference For Regression
Authors	Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman
Abstract	We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package.
Tasks
Published	2016-04-14
URL	http://arxiv.org/abs/1604.04173v2
PDF	http://arxiv.org/pdf/1604.04173v2.pdf
PWC	https://paperswithcode.com/paper/distribution-free-predictive-inference-for
Repo	https://github.com/DEck13/conformal.glm
Framework	none
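
Of the variants named in the abstract, split conformal inference is the simplest to sketch: fit any estimator on one half of the data, compute absolute residuals on the other half, and use a residual quantile as the half-width of the interval. The code below is a minimal sketch in Python rather than the accompanying R package; `fit_line` is a hypothetical least-squares estimator chosen for the example, and any function that returns a predictor could be passed instead.

```python
import math
import random


def fit_line(xs, ys):
    """Hypothetical base estimator: one-variable least squares, returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def split_conformal_interval(xs, ys, x_new, fit, alpha=0.1, seed=0):
    """Prediction interval at x_new with target marginal coverage 1 - alpha."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, calib = idx[:half], idx[half:]
    # Fit the estimator on the first split only.
    predict = fit([xs[i] for i in train], [ys[i] for i in train])
    # Absolute residuals on the held-out split.
    residuals = sorted(abs(ys[i] - predict(xs[i])) for i in calib)
    # Take the k-th smallest residual, k = ceil((n_calib + 1) * (1 - alpha)).
    k = math.ceil((len(calib) + 1) * (1 - alpha))
    if k > len(residuals):
        return float("-inf"), float("inf")  # too few calibration points
    d = residuals[k - 1]
    center = predict(x_new)
    return center - d, center + d
```

The interval has the same width at every `x_new`, which is the limitation that the locally varying extension mentioned in the abstract is meant to address.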

Towards Deep Symbolic Reinforcement Learning


Title	Towards Deep Symbolic Reinforcement Learning
Authors	Marta Garnelo, Kai Arulkumaran, Murray Shanahan
Abstract	Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system – though just a prototype – learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
Tasks	Game of Go, Transfer Learning
Published	2016-09-18
URL	http://arxiv.org/abs/1609.05518v2
PDF	http://arxiv.org/pdf/1609.05518v2.pdf
PWC	https://paperswithcode.com/paper/towards-deep-symbolic-reinforcement-learning
Repo	https://github.com/epignatelli/literature
Framework	none

OpenAI Gym


Title	OpenAI Gym
Authors	Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba
Abstract	OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
Tasks
Published	2016-06-05
URL	http://arxiv.org/abs/1606.01540v1
PDF	http://arxiv.org/pdf/1606.01540v1.pdf
PWC	https://paperswithcode.com/paper/openai-gym
Repo	https://github.com/nohboogy/gym
Framework	tf
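
The "common interface" mentioned in the abstract is, in the 2016 whitepaper, an environment object with `reset()` and `step(action)` methods, where `step` returns an observation, a reward, a done flag, and an info dictionary. The sketch below mimics that convention without importing the library: `CountdownEnv` and `run_episode` are hypothetical names invented for this example, and later releases of the library changed the return signature, so treat this as an illustration of the idea rather than current API usage.

```python
class CountdownEnv:
    """Hypothetical toy environment (not part of Gym): count down from start to zero."""

    def __init__(self, start=3):
        self.start = start
        self.state = start

    def reset(self):
        """Begin a new episode and return the initial observation."""
        self.state = self.start
        return self.state

    def step(self, action):
        """Action 1 decrements the counter; any other action leaves it unchanged."""
        if action == 1:
            self.state -= 1
        done = self.state == 0
        reward = 1.0 if done else -0.1  # small per-step cost, bonus on reaching zero
        return self.state, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Agent loop that relies only on reset() and step(), so any such env fits."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


episode_return = run_episode(CountdownEnv(3), policy=lambda obs: 1)
```

Because `run_episode` makes no assumption about the task itself, the same loop can drive any environment that follows the convention, which is what makes shared benchmarks and comparable results possible.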
