Paper Group AWR 15
Lens Distortion Rectification using Triangulation based Interpolation
Title | Lens Distortion Rectification using Triangulation based Interpolation |
Authors | Burak Benligiray, Cihan Topal |
Abstract | Nonlinear lens distortion rectification is a common first step in image processing applications where the assumption of a linear camera model is essential. To rectify the lens distortion, the forward distortion model needs to be known. However, many self-calibration methods estimate the inverse distortion model. In the literature, the inverse of the estimated model is approximated for image rectification, which introduces additional error to the system. We propose a novel distortion rectification method that uses the inverse distortion model directly. The method starts by mapping the distorted pixels to the rectified image using the inverse distortion model. The resulting set of points with subpixel locations is triangulated. The pixel values of the rectified image are linearly interpolated based on this triangulation. The method is applicable to all camera calibration methods that estimate the inverse distortion model and performs well across a large range of parameters. |
Tasks | Calibration |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09559v2 |
http://arxiv.org/pdf/1611.09559v2.pdf | |
PWC | https://paperswithcode.com/paper/lens-distortion-rectification-using |
Repo | https://github.com/bbenligiray/lens-distortion-triangulation |
Framework | none |
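The pipeline described in the abstract above (map distorted pixels forward, triangulate the mapped points, interpolate linearly) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the one-parameter radial model in `undistort_point` is a hypothetical stand-in for whatever inverse distortion model a calibration method estimates, and the triangulation is simply induced by splitting each source grid cell into two triangles rather than computed with a general triangulation algorithm.

```python
import math

def undistort_point(x, y, cx, cy, k):
    """Hypothetical inverse distortion model: distorted pixel -> rectified location."""
    dx, dy = x - cx, y - cy
    scale = 1.0 + k * (dx * dx + dy * dy)
    return cx + dx * scale, cy + dy * scale

def barycentric(p, a, b, c):
    """Barycentric weights of point p in triangle (a, b, c), or None if degenerate."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        return None
    w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w0, w1, 1.0 - w0 - w1

def rectify(image, k):
    """Rectify a grayscale image given as a list of rows; uncovered pixels stay None."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # Step 1: map every distorted pixel to a subpixel location in the rectified image.
    pts = [[undistort_point(x, y, cx, cy, k) for x in range(w)] for y in range(h)]
    out = [[None] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            # Step 2: triangulate (assumption: two triangles per source grid cell).
            cell = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
            for tri in ((cell[0], cell[1], cell[2]), (cell[1], cell[3], cell[2])):
                verts = [pts[j][i] for i, j in tri]
                vals = [image[j][i] for i, j in tri]
                xs = [v[0] for v in verts]
                ys = [v[1] for v in verts]
                y_lo, y_hi = max(0, math.ceil(min(ys))), min(h - 1, math.floor(max(ys)))
                x_lo, x_hi = max(0, math.ceil(min(xs))), min(w - 1, math.floor(max(xs)))
                # Step 3: linearly interpolate each output pixel inside the triangle.
                for py in range(y_lo, y_hi + 1):
                    for px in range(x_lo, x_hi + 1):
                        wts = barycentric((px, py), *verts)
                        if wts is not None and min(wts) >= -1e-9:
                            out[py][px] = sum(wt * v for wt, v in zip(wts, vals))
    return out
```

With `k = 0` the mapping is the identity and `rectify` returns the input values unchanged, which is a convenient sanity check before trying a real distortion model.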
MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving
Title | MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving |
Authors | Marvin Teichmann, Michael Weber, Marius Zoellner, Roberto Cipolla, Raquel Urtasun |
Abstract | While most approaches to semantic reasoning have focused on improving performance, in this paper we argue that computational times are very important in order to enable real time applications such as autonomous driving. Towards this goal, we present an approach to joint classification, detection and semantic segmentation via a unified architecture where the encoder is shared amongst the three tasks. Our approach is very simple, can be trained end-to-end and performs extremely well in the challenging KITTI dataset, outperforming the state-of-the-art in the road segmentation task. Our approach is also very efficient, taking less than 100 ms to perform all tasks. |
Tasks | Autonomous Driving, Semantic Segmentation |
Published | 2016-12-22 |
URL | http://arxiv.org/abs/1612.07695v2 |
http://arxiv.org/pdf/1612.07695v2.pdf | |
PWC | https://paperswithcode.com/paper/multinet-real-time-joint-semantic-reasoning |
Repo | https://github.com/kinglintianxia/MultiNet |
Framework | tf |
Optimization Methods for Large-Scale Machine Learning
Title | Optimization Methods for Large-Scale Machine Learning |
Authors | Léon Bottou, Frank E. Curtis, Jorge Nocedal |
Abstract | This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations. |
Tasks | Text Classification |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04838v3 |
http://arxiv.org/pdf/1606.04838v3.pdf | |
PWC | https://paperswithcode.com/paper/optimization-methods-for-large-scale-machine |
Repo | https://github.com/stephenbeckr/AIMS |
Framework | none |
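For readers unfamiliar with the stochastic gradient (SG) method that the abstract above places at the center of large-scale learning, here is a minimal sketch on a one-dimensional least-squares problem. The function name, the constant step size, and the toy data are assumptions chosen for illustration; the paper itself analyzes a more general algorithm and several step-size regimes.

```python
import random

def sgd_linear_regression(data, step_size=0.1, iterations=5000, seed=0):
    """Fit y ~ w * x + b by sampling one example per iteration (toy SG sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        x, y = rng.choice(data)           # one randomly drawn training example
        err = (w * x + b) - y             # residual of the current model on it
        # Gradient of 0.5 * err**2 with respect to (w, b) is (err * x, err).
        w -= step_size * err * x
        b -= step_size * err
    return w, b

# Noise-free toy data generated from y = 3x + 1 (an assumption for the demo).
data = [(i / 10.0, 3.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
w, b = sgd_linear_regression(data)
```

Each update touches a single example, so the per-iteration cost does not grow with the size of the dataset, which is the property the abstract highlights.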
Unsupervised Cross-Domain Image Generation
Title | Unsupervised Cross-Domain Image Generation |
Authors | Yaniv Taigman, Adam Polyak, Lior Wolf |
Abstract | We study the problem of transferring a sample in one domain to an analog sample in another domain. Given two related domains, S and T, we would like to learn a generative function G that maps an input sample from S to the domain T, such that the output of a given function f, which accepts inputs in either domain, would remain unchanged. Other than the function f, the training data is unsupervised and consists of a set of samples from each domain. The Domain Transfer Network (DTN) we present employs a compound loss function that includes a multiclass GAN loss, an f-constancy component, and a regularizing component that encourages G to map samples from T to themselves. We apply our method to visual domains including digits and face images and demonstrate its ability to generate convincing novel images of previously unseen entities, while preserving their identity. |
Tasks | Domain Adaptation, Image Generation, Image-to-Image Translation, Unsupervised Image-To-Image Translation |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02200v1 |
http://arxiv.org/pdf/1611.02200v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-cross-domain-image-generation |
Repo | https://github.com/kaonashi-tyc/zi2zi |
Framework | tf |
Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes
Title | Beam Search for Learning a Deep Convolutional Neural Network of 3D Shapes |
Authors | Xu Xu, Sinisa Todorovic |
Abstract | This paper addresses 3D shape recognition. Recent work typically represents a 3D shape as a set of binary variables corresponding to 3D voxels of a uniform 3D grid centered on the shape, and resorts to deep convolutional neural networks (CNNs) for modeling these binary variables. Robust learning of such CNNs is currently limited by the small datasets of 3D shapes available, an order of magnitude smaller than other common datasets in computer vision. Related work typically deals with the small training datasets using a number of ad hoc, hand-tuning strategies. To address this issue, we formulate CNN learning as a beam search aimed at identifying an optimal CNN architecture, namely, the number of layers, nodes, and their connectivity in the network, as well as estimating parameters of such an optimal CNN. Each state of the beam search corresponds to a candidate CNN. Two types of actions are defined to add new convolutional filters or new convolutional layers to a parent CNN, and thus transition to children states. The utility function of each action is efficiently computed by transferring parameter values of the parent CNN to its children, thereby enabling an efficient beam search. Our experimental evaluation on the 3D ModelNet dataset demonstrates that our model pursuit using the beam search yields a CNN that outperforms the state of the art on 3D shape classification. |
Tasks | 3D Shape Recognition |
Published | 2016-12-14 |
URL | http://arxiv.org/abs/1612.04774v1 |
http://arxiv.org/pdf/1612.04774v1.pdf | |
PWC | https://paperswithcode.com/paper/beam-search-for-learning-a-deep-convolutional |
Repo | https://github.com/xuxucmkox/3D-shape-Classification-Beam-Search |
Framework | none |
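The search procedure in the abstract above (states are candidate CNNs, actions add filters or layers, a utility ranks children) follows the usual beam search pattern, sketched below. Everything concrete here is hypothetical: a state is just a tuple of layer widths, `expand` adds two filters or one two-filter layer, and `toy_utility` is a hand-made score standing in for the paper's utility, which is computed from trained networks.

```python
def expand(state):
    """Two hypothetical actions: widen the last layer, or append a new layer."""
    widened = state[:-1] + (state[-1] + 2,)
    deepened = state + (2,)
    return [widened, deepened]

def toy_utility(state, target=(4, 8)):
    """Stand-in utility: negative L1 distance to an arbitrary target architecture."""
    n = max(len(state), len(target))
    padded_state = state + (0,) * (n - len(state))
    padded_target = target + (0,) * (n - len(target))
    return -sum(abs(a - b) for a, b in zip(padded_state, padded_target))

def beam_search(start, utility, beam_width=2, steps=6):
    """Keep the beam_width best children at every step; return the best state seen."""
    beam = [start]
    best = start
    for _ in range(steps):
        candidates = [child for state in beam for child in expand(state)]
        candidates.sort(key=utility, reverse=True)
        beam = candidates[:beam_width]
        if utility(beam[0]) > utility(best):
            best = beam[0]
    return best

best_architecture = beam_search((2,), toy_utility)
```

Keeping more than one candidate per step lets the search recover from an action that looks good locally but leads to a worse architecture later, at a cost linear in the beam width.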
Recurrent Neural Network Grammars
Title | Recurrent Neural Network Grammars |
Authors | Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith |
Abstract | We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English and Chinese. |
Tasks | Constituency Parsing, Language Modelling |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07776v4 |
http://arxiv.org/pdf/1602.07776v4.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-network-grammars |
Repo | https://github.com/Psarpei/Recognition-of-logical-document-structures |
Framework | none |
Judging a Book By its Cover
Title | Judging a Book By its Cover |
Authors | Brian Kenji Iwana, Syed Tahseen Raza Rizvi, Sheraz Ahmed, Andreas Dengel, Seiichi Uchida |
Abstract | Book covers communicate information to potential readers, but can that same information be learned by computers? We propose using a deep Convolutional Neural Network (CNN) to predict the genre of a book based on the visual clues provided by its cover. The purpose of this research is to investigate whether relationships between books and their covers can be learned. However, determining the genre of a book is a difficult task because covers can be ambiguous and genres can be overarching. Despite this, we show that a CNN can extract features and learn underlying design rules set by the designer to define a genre. Using machine learning, we can bring the large amount of resources available to the book cover design process. In addition, we present a new challenging dataset that can be used for many pattern recognition tasks. |
Tasks | |
Published | 2016-10-28 |
URL | http://arxiv.org/abs/1610.09204v3 |
http://arxiv.org/pdf/1610.09204v3.pdf | |
PWC | https://paperswithcode.com/paper/judging-a-book-by-its-cover |
Repo | https://github.com/uchidalab/book-dataset |
Framework | none |
SPICE: Semantic Propositional Image Caption Evaluation
Title | SPICE: Semantic Propositional Image Caption Evaluation |
Authors | Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould |
Abstract | There is considerable interest in the task of automatically generating image captions. However, evaluation is challenging. Existing automatic evaluation metrics are primarily sensitive to n-gram overlap, which is neither necessary nor sufficient for the task of simulating human judgment. We hypothesize that semantic propositional content is an important component of human caption evaluation, and propose a new automated caption evaluation metric defined over scene graphs coined SPICE. Extensive evaluations across a range of models and datasets indicate that SPICE captures human judgments over model-generated captions better than other automatic metrics (e.g., system-level correlation of 0.88 with human judgments on the MS COCO dataset, versus 0.43 for CIDEr and 0.53 for METEOR). Furthermore, SPICE can answer questions such as 'which caption-generator best understands colors?' and 'can caption-generators count?' |
Tasks | Image Captioning |
Published | 2016-07-29 |
URL | http://arxiv.org/abs/1607.08822v1 |
http://arxiv.org/pdf/1607.08822v1.pdf | |
PWC | https://paperswithcode.com/paper/spice-semantic-propositional-image-caption |
Repo | https://github.com/mtanti/coco-caption |
Framework | none |
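To make the contrast with n-gram overlap concrete, the sketch below scores a caption by an F-score over semantic tuples (objects, attributes, relations) of the kind a scene graph contains. It is only an illustration of the idea: the tuples are written by hand, matching is exact string equality, and `tuple_f_score` is a hypothetical helper, whereas the actual metric derives tuples with a parser and uses more lenient matching.

```python
def tuple_f_score(candidate_tuples, reference_tuples):
    """F1 between two collections of semantic tuples (exact-match assumption)."""
    candidate = set(candidate_tuples)
    reference = set(reference_tuples)
    matched = len(candidate & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(candidate)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)

# Hand-written tuples for a made-up caption pair.
candidate = [("girl",), ("girl", "standing"), ("court",)]
reference = [("girl",), ("girl", "young"), ("court",), ("girl", "on", "court")]
score = tuple_f_score(candidate, reference)  # precision 2/3, recall 2/4
```

Because the score is built from typed tuples, it can be restricted to a subset such as color attributes or counts, which is how per-category questions about a captioner could be asked.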
Self-critical Sequence Training for Image Captioning
Title | Self-critical Sequence Training for Image Captioning |
Authors | Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel |
Abstract | Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a “baseline” to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) is avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7. |
Tasks | Image Captioning, Policy Gradient Methods |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00563v2 |
http://arxiv.org/pdf/1612.00563v2.pdf | |
PWC | https://paperswithcode.com/paper/self-critical-sequence-training-for-image |
Repo | https://github.com/ruotianluo/self-critical.pytorch |
Framework | pytorch |
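The core of SCST as described in the abstract above is that the reward of the greedily decoded caption serves as the baseline for a sampled caption. A minimal sketch of the resulting surrogate loss is shown below; the function names and the plain-float inputs are assumptions for illustration, and in a real system the log-probabilities would be differentiable tensors while both rewards are treated as constants.

```python
def scst_loss(sample_logprobs, sample_reward, greedy_reward):
    """Surrogate loss for one caption: -(r(sample) - r(greedy)) * log p(sample)."""
    advantage = sample_reward - greedy_reward   # baseline is the greedy caption's reward
    return -advantage * sum(sample_logprobs)    # sum of per-token log-probabilities

def batch_scst_loss(batch):
    """Average the per-caption losses; batch items are (logprobs, r_sample, r_greedy)."""
    return sum(scst_loss(*item) for item in batch) / len(batch)

# A sample that beats greedy decoding (1.2 vs 0.9) yields a positive loss here,
# so gradient descent raises the log-probability of that sample.
loss = scst_loss([-0.5, -1.0], sample_reward=1.2, greedy_reward=0.9)
```

Samples that score below the greedy output get a negative advantage and are pushed down, so no separate baseline network has to be learned.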
Algebraic multigrid support vector machines
Title | Algebraic multigrid support vector machines |
Authors | Ehsan Sadrfaridpour, Sandeep Jeereddy, Ken Kennedy, Andre Luckow, Talayeh Razzaghi, Ilya Safro |
Abstract | The support vector machine is a flexible optimization-based technique widely used for classification problems. In practice, its training part becomes computationally expensive on large-scale data sets because of such reasons as the complexity and number of iterations in parameter fitting methods, underlying optimization solvers, and nonlinearity of kernels. We introduce a fast multilevel framework for solving support vector machine models that is inspired by the algebraic multigrid. Significant improvement in the running time has been achieved without any loss in the quality. The proposed technique is highly beneficial on imbalanced sets. We demonstrate computational results on publicly available and industrial data sets. |
Tasks | |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05487v2 |
http://arxiv.org/pdf/1611.05487v2.pdf | |
PWC | https://paperswithcode.com/paper/algebraic-multigrid-support-vector-machines |
Repo | https://github.com/esadr/mlsvm |
Framework | none |
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
Title | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge |
Authors | Robyn Speer, Joshua Chin, Catherine Havasi |
Abstract | Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be used with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected from many sources that include expert-created resources, crowd-sourcing, and games with a purpose. It is designed to represent the general knowledge involved in understanding language, improving natural language applications by allowing the application to better understand the meanings behind the words people use. When ConceptNet is combined with word embeddings acquired from distributional semantics (such as word2vec), it provides applications with understanding that they would not acquire from distributional semantics alone, nor from narrower resources such as WordNet or DBPedia. We demonstrate this with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies. |
Tasks | Word Embeddings |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03975v2 |
http://arxiv.org/pdf/1612.03975v2.pdf | |
PWC | https://paperswithcode.com/paper/conceptnet-55-an-open-multilingual-graph-of |
Repo | https://github.com/shayanray/ApplyingCommonSense |
Framework | none |
3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation
Title | 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation |
Authors | Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, Olaf Ronneberger |
Abstract | This paper introduces a network for volumetric segmentation that learns from sparsely annotated volumetric images. We outline two attractive use cases of this method: (1) In a semi-automated setup, the user annotates some slices in the volume to be segmented. The network learns from these sparse annotations and provides a dense 3D segmentation. (2) In a fully-automated setup, we assume that a representative, sparsely annotated training set exists. Trained on this data set, the network densely segments new volumetric images. The proposed network extends the previous u-net architecture from Ronneberger et al. by replacing all 2D operations with their 3D counterparts. The implementation performs on-the-fly elastic deformations for efficient data augmentation during training. It is trained end-to-end from scratch, i.e., no pre-trained network is required. We test the performance of the proposed method on a complex, highly variable 3D structure, the Xenopus kidney, and achieve good results for both use cases. |
Tasks | Data Augmentation |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06650v1 |
http://arxiv.org/pdf/1606.06650v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-u-net-learning-dense-volumetric |
Repo | https://github.com/gvtulder/elasticdeform |
Framework | tf |
Distribution-Free Predictive Inference For Regression
Title | Distribution-Free Predictive Inference For Regression |
Authors | Jing Lei, Max G’Sell, Alessandro Rinaldo, Ryan J. Tibshirani, Larry Wasserman |
Abstract | We develop a general framework for distribution-free predictive inference in regression, using conformal inference. The proposed methodology allows for the construction of a prediction band for the response variable using any estimator of the regression function. The resulting prediction band preserves the consistency properties of the original estimator under standard assumptions, while guaranteeing finite-sample marginal coverage even when these assumptions do not hold. We analyze and compare, both empirically and theoretically, the two major variants of our conformal framework: full conformal inference and split conformal inference, along with a related jackknife method. These methods offer different tradeoffs between statistical accuracy (length of resulting prediction intervals) and computational efficiency. As extensions, we develop a method for constructing valid in-sample prediction intervals called rank-one-out conformal inference, which has essentially the same computational efficiency as split conformal inference. We also describe an extension of our procedures for producing prediction bands with locally varying length, in order to adapt to heteroskedasticity in the data. Finally, we propose a model-free notion of variable importance, called leave-one-covariate-out or LOCO inference. Accompanying this paper is an R package conformalInference that implements all of the proposals we have introduced. In the spirit of reproducibility, all of our empirical results can also be easily (re)generated using this package. |
Tasks | |
Published | 2016-04-14 |
URL | http://arxiv.org/abs/1604.04173v2 |
http://arxiv.org/pdf/1604.04173v2.pdf | |
PWC | https://paperswithcode.com/paper/distribution-free-predictive-inference-for |
Repo | https://github.com/DEck13/conformal.glm |
Framework | none |
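Of the variants named in the abstract above, split conformal inference is the simplest to write down: fit any estimator on one half of the data, rank the absolute residuals on the other half, and use a residual quantile as the half-width of the band. The sketch below is an illustration in Python, not the accompanying R package; `fit_line` is a hypothetical least-squares helper standing in for an arbitrary estimator, and the data are assumed to be already shuffled so that a first-half/second-half split is acceptable.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept (stand-in estimator)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def split_conformal(xs, ys, x_new, alpha=0.1):
    """Prediction interval at x_new with marginal coverage of at least 1 - alpha."""
    half = len(xs) // 2
    slope, intercept = fit_line(xs[:half], ys[:half])          # fit on the first half
    residuals = sorted(abs(y - (slope * x + intercept))
                       for x, y in zip(xs[half:], ys[half:]))  # rank on the second half
    k = math.ceil((len(residuals) + 1) * (1 - alpha))          # rank of the quantile
    if k > len(residuals):
        return -math.inf, math.inf    # too few calibration points for this alpha
    prediction = slope * x_new + intercept
    return prediction - residuals[k - 1], prediction + residuals[k - 1]

xs = list(range(20))
ys = [2 * x + 1 if x < 10 else 2 * x + 2 for x in xs]   # toy data with a shift
interval = split_conformal(xs, ys, x_new=25)
```

The estimator is fitted only once, which is why the split variant is much cheaper than full conformal inference; the price is that only half of the data is used for fitting.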
Towards Deep Symbolic Reinforcement Learning
Title | Towards Deep Symbolic Reinforcement Learning |
Authors | Marta Garnelo, Kai Arulkumaran, Murray Shanahan |
Abstract | Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system – though just a prototype – learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game. |
Tasks | Game of Go, Transfer Learning |
Published | 2016-09-18 |
URL | http://arxiv.org/abs/1609.05518v2 |
http://arxiv.org/pdf/1609.05518v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-deep-symbolic-reinforcement-learning |
Repo | https://github.com/epignatelli/literature |
Framework | none |
OpenAI Gym
Title | OpenAI Gym |
Authors | Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba |
Abstract | OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software. |
Tasks | |
Published | 2016-06-05 |
URL | http://arxiv.org/abs/1606.01540v1 |
http://arxiv.org/pdf/1606.01540v1.pdf | |
PWC | https://paperswithcode.com/paper/openai-gym |
Repo | https://github.com/nohboogy/gym |
Framework | tf |
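The "common interface" mentioned in the abstract above is essentially an environment object with `reset` and `step` methods. The sketch below imitates that shape with a self-contained toy class so that it runs without the library installed; `CorridorEnv` and `run_episode` are hypothetical names, and the four-value return of `step` (observation, reward, done flag, info dictionary) follows the convention of the original 2016 release, which later versions of the library changed.

```python
class CorridorEnv:
    """Toy environment: walk right along a corridor until the end is reached."""

    def __init__(self, length=3):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = stay, 1 = move right); return (obs, reward, done, info)."""
        if action == 1:
            self.position += 1
        done = self.position >= self.length
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}

def run_episode(env, policy, max_steps=100):
    """Generic agent loop that works with any object exposing reset() and step()."""
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for steps in range(1, max_steps + 1):
        observation, reward, done, _ = env.step(policy(observation))
        total_reward += reward
        if done:
            break
    return total_reward, steps

total, steps = run_episode(CorridorEnv(length=3), policy=lambda obs: 1)
```

Because `run_episode` depends only on the two methods, the same loop can drive any environment that follows the interface, which is what makes results comparable across benchmark problems.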