May 7, 2019


Paper Group AWR 41


An overview of gradient descent optimization algorithms. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. Convolutional Sketch Inversion. GuessWhat?! Visual object discovery through multi-modal dialogue. VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering. Dialogue Learni …

An overview of gradient descent optimization algorithms

Title An overview of gradient descent optimization algorithms
Authors Sebastian Ruder
Abstract Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.
Tasks
Published 2016-09-15
URL http://arxiv.org/abs/1609.04747v2
PDF http://arxiv.org/pdf/1609.04747v2.pdf
PWC https://paperswithcode.com/paper/an-overview-of-gradient-descent-optimization
Repo https://github.com/congcui2007/AWS
Framework tf
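
The algorithms in such an overview differ mainly in their update rules. Adam, for example, combines a momentum-like running mean of gradients with per-parameter step sizes. Below is a minimal sketch, assuming plain Python lists for parameters and the commonly quoted defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8). The learning rate, step count and the test function `quad_grad` are illustrative choices, not taken from the article.

```python
import math
from typing import Callable, List


def adam(grad: Callable[[List[float]], List[float]], theta: List[float],
         lr: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
         eps: float = 1e-8, steps: int = 500) -> List[float]:
    """Minimise a differentiable function, given its gradient, with Adam."""
    theta = list(theta)
    m = [0.0] * len(theta)  # running mean of gradients (first moment)
    v = [0.0] * len(theta)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i
            # Bias correction: m and v start at zero, so early estimates are too small.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Per-parameter step, normalised by the root of the second-moment estimate.
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quad_grad(p: List[float]) -> List[float]:
    """Gradient of f(x, y) = (x - 3)^2 + 10 * (y + 1)^2, which is minimised at (3, -1)."""
    return [2 * (p[0] - 3), 20 * (p[1] + 1)]


print(adam(quad_grad, [0.0, 0.0]))
```

Replacing the inner update with `theta[i] -= lr * g_i` gives plain gradient descent. That makes it easy to compare the variants on the same badly scaled quadratic.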

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

Title Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
Authors Guillaume Lemaitre, Fernando Nogueira, Christos K. Aridas
Abstract Imbalanced-learn is an open-source Python toolbox that provides a wide range of methods for coping with the imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods fall into four groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported projects. Documentation, unit tests, and integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn.
Tasks
Published 2016-09-21
URL http://arxiv.org/abs/1609.06570v1
PDF http://arxiv.org/pdf/1609.06570v1.pdf
PWC https://paperswithcode.com/paper/imbalanced-learn-a-python-toolbox-to-tackle
Repo https://github.com/scikit-learn-contrib/imbalanced-learn
Framework tf
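
Random over-sampling is the simplest member of the over-sampling group. It duplicates randomly chosen minority-class samples until the classes are balanced. The toolbox exposes this as `RandomOverSampler` in `imblearn.over_sampling`. The standalone function `random_oversample` below is only a pure-Python sketch of the idea, not the library's implementation. Balancing every class to the size of the largest one is an assumption about the target ratio.

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence, y: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Duplicate randomly chosen samples of the smaller classes until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(samples) for samples in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, samples in by_class.items():
        # Draw with replacement from this class to make up the shortfall.
        for _ in range(target - len(samples)):
            X_out.append(rng.choice(samples))
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
```

Duplicating samples adds no new information and can encourage overfitting to the repeated points. Methods that synthesise new minority samples, such as SMOTE, address this drawback.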

Convolutional Sketch Inversion

Title Convolutional Sketch Inversion
Authors Yağmur Güçlütürk, Umut Güçlü, Rob van Lier, Marcel A. J. van Gerven
Abstract In this paper, we use deep neural networks for inverting face sketches to synthesize photorealistic face images. We first construct a semi-simulated dataset containing a very large number of computer-generated face sketches with different styles and corresponding face images by expanding existing unconstrained face data sets. We then train models achieving state-of-the-art results on both computer-generated sketches and hand-drawn sketches by leveraging recent advances in deep learning such as batch normalization, deep residual learning, perceptual losses and stochastic optimization in combination with our new dataset. We finally demonstrate potential applications of our models in fine arts and forensic arts. In contrast to existing patch-based approaches, our deep-neural-network-based approach can be used for synthesizing photorealistic face images by inverting face sketches in the wild.
Tasks Stochastic Optimization
Published 2016-06-09
URL http://arxiv.org/abs/1606.03073v1
PDF http://arxiv.org/pdf/1606.03073v1.pdf
PWC https://paperswithcode.com/paper/convolutional-sketch-inversion
Repo https://github.com/saifvazir/Convolutional-Sketch-Inversion
Framework pytorch

GuessWhat?! Visual object discovery through multi-modal dialogue

Title GuessWhat?! Visual object discovery through multi-modal dialogue
Authors Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Abstract We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.
Tasks
Published 2016-11-23
URL http://arxiv.org/abs/1611.08481v2
PDF http://arxiv.org/pdf/1611.08481v2.pdf
PWC https://paperswithcode.com/paper/guesswhat-visual-object-discovery-through
Repo https://github.com/ibrahimSouleiman/GuessWhat
Framework tf

VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering

Title VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
Authors Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva
Abstract In this paper, we address the problem of visual question answering by proposing a novel model called VIBIKNet. Our model integrates Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We show that VIBIKNet offers an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top-performing methods in order to illustrate its performance and speed.
Tasks Question Answering, Visual Question Answering
Published 2016-12-12
URL http://arxiv.org/abs/1612.03628v1
PDF http://arxiv.org/pdf/1612.03628v1.pdf
PWC https://paperswithcode.com/paper/vibiknet-visual-bidirectional-kernelized
Repo https://github.com/MarcBS/VIBIKNet
Framework none

Dialogue Learning With Human-In-The-Loop

Title Dialogue Learning With Human-In-The-Loop
Authors Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc’Aurelio Ranzato, Jason Weston
Abstract An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach.
Tasks Question Answering
Published 2016-11-29
URL http://arxiv.org/abs/1611.09823v3
PDF http://arxiv.org/pdf/1611.09823v3.pdf
PWC https://paperswithcode.com/paper/dialogue-learning-with-human-in-the-loop
Repo https://github.com/rohit129/Movie_KnowledgeGraph_QA
Framework none

Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks

Title Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
Authors Alex J. Champandard
Abstract Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms, whether for portraits, landscapes, or other subjects. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!
Tasks Image Generation, Semantic Segmentation, Style Transfer
Published 2016-03-05
URL http://arxiv.org/abs/1603.01768v1
PDF http://arxiv.org/pdf/1603.01768v1.pdf
PWC https://paperswithcode.com/paper/semantic-style-transfer-and-turning-two-bit
Repo https://github.com/paulwarkentin/pytorch-neural-doodle
Framework pytorch

Learning Thermodynamics with Boltzmann Machines

Title Learning Thermodynamics with Boltzmann Machines
Authors Giacomo Torlai, Roger G. Melko
Abstract A Boltzmann machine is a stochastic neural network that has been extensively used in the layers of deep architectures for modern machine learning applications. In this paper, we develop a Boltzmann machine that is capable of modelling thermodynamic observables for physical systems in thermal equilibrium. Through unsupervised learning, we train the Boltzmann machine on data sets constructed with spin configurations importance-sampled from the partition function of an Ising Hamiltonian at different temperatures using Monte Carlo (MC) methods. The trained Boltzmann machine is then used to generate spin states, for which we compare thermodynamic observables to those computed by direct MC sampling. We demonstrate that the Boltzmann machine can faithfully reproduce the observables of the physical system. Further, we observe that the number of neurons required to obtain accurate results increases as the system is brought close to criticality.
Tasks
Published 2016-06-08
URL http://arxiv.org/abs/1606.02718v1
PDF http://arxiv.org/pdf/1606.02718v1.pdf
PWC https://paperswithcode.com/paper/learning-thermodynamics-with-boltzmann
Repo https://github.com/loppy1243/IsingBoltzmann
Framework none
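
The abstract does not give the lattice size, temperatures or Monte Carlo schedule. What follows is therefore only a generic sketch of the data-generation step, assuming a 2D ferromagnetic Ising model H = -J Σ⟨ij⟩ s_i s_j on a periodic L × L lattice with k_B = 1. Single-spin-flip Metropolis updates draw configurations with probability proportional to exp(-H/T). The names `metropolis_ising` and `magnetization` are hypothetical.

```python
import math
import random
from typing import List, Optional

Lattice = List[List[int]]


def metropolis_ising(L: int, T: float, sweeps: int, J: float = 1.0, seed: int = 0,
                     init: Optional[Lattice] = None) -> Lattice:
    """Sample an L x L spin configuration of the 2D Ising model H = -J * sum_<ij> s_i s_j
    (periodic boundaries, k_B = 1) with single-spin-flip Metropolis updates."""
    rng = random.Random(seed)
    if init is not None:
        s = [row[:] for row in init]
    else:
        s = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        nb = s[(i + 1) % L][j] + s[(i - 1) % L][j] + s[i][(j + 1) % L] + s[i][(j - 1) % L]
        dE = 2.0 * J * s[i][j] * nb  # energy change if spin (i, j) were flipped
        # Accept downhill moves always, uphill moves with probability exp(-dE / T).
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            s[i][j] = -s[i][j]
    return s


def magnetization(s: Lattice) -> float:
    """Mean spin per site, a standard thermodynamic observable."""
    return sum(map(sum, s)) / (len(s) * len(s[0]))


cold = metropolis_ising(8, T=0.1, sweeps=10, init=[[1] * 8 for _ in range(8)])
print(magnetization(cold))  # 1.0 (each uphill flip is accepted with probability ~exp(-80))
hot = metropolis_ising(8, T=5.0, sweeps=10, seed=1)
print(magnetization(hot))
```

To build a training set, one would keep configurations spaced several sweeps apart at each temperature so that successive samples are less correlated. Observables from the trained model's samples would then be compared with these direct Monte Carlo estimates.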

Multi-label Methods for Prediction with Sequential Data

Title Multi-label Methods for Prediction with Sequential Data
Authors Jesse Read, Luca Martino, Jaakko Hollmén
Abstract The number of methods available for classification of multi-label data has increased rapidly over recent years, yet relatively few links have been made with the related task of classification of sequential data. If label indices are considered as time indices, the problems can often be seen as equivalent. In this paper we detect and elaborate on connections between multi-label methods and Markovian models, and study the suitability of multi-label methods for prediction in sequential data. From this study we draw upon the most suitable techniques from the area and develop two novel competitive approaches which can be applied to either kind of data. We carry out an empirical evaluation investigating performance on real-world sequential-prediction tasks: electricity demand and route prediction. As well as showing that several popular multi-label algorithms are in fact easily applicable to sequencing tasks, our novel approaches, which benefit from a unified view of these areas, prove very competitive against established methods.
Tasks
Published 2016-09-27
URL http://arxiv.org/abs/1609.08349v2
PDF http://arxiv.org/pdf/1609.08349v2.pdf
PWC https://paperswithcode.com/paper/multi-label-methods-for-prediction-with
Repo https://github.com/abimur-123/Canvass_codingchallenge
Framework none
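
The key observation in this abstract is that label indices can be read as time indices. The sketch below illustrates that reading, assuming a discrete-valued sequence and hypothetical `window` and `horizon` parameters. It turns one sequence into multi-label examples and then fits a deliberately tiny classifier chain whose base learner sees only the previous element. With that restriction the chain reduces to a first-order Markov model. This is not one of the paper's two proposed methods.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def to_multilabel(seq: Sequence[Hashable], window: int,
                  horizon: int) -> Tuple[List[list], List[list]]:
    """Turn one sequence into (input window, label vector) pairs,
    where label j of a pair is the value at time t + j."""
    X, Y = [], []
    for t in range(window, len(seq) - horizon + 1):
        X.append(list(seq[t - window:t]))
        Y.append(list(seq[t:t + horizon]))
    return X, Y


class MarkovChainClassifier:
    """Toy classifier chain whose base learner only looks at the previous element
    (the last input for label 0, otherwise label j - 1) and predicts its most
    frequent successor seen in training."""

    def fit(self, X: List[list], Y: List[list]) -> "MarkovChainClassifier":
        counts = defaultdict(Counter)
        for x, y in zip(X, Y):
            prev = x[-1]
            for label in y:  # train on the true previous label, as a classifier chain does
                counts[prev][label] += 1
                prev = label
        self.table = {p: c.most_common(1)[0][0] for p, c in counts.items()}
        return self

    def predict(self, x: list, horizon: int) -> list:
        out, prev = [], x[-1]
        for _ in range(horizon):
            prev = self.table[prev]  # raises KeyError for a state never seen in training
            out.append(prev)
        return out


X, Y = to_multilabel(list("ABCABCABCABC"), window=2, horizon=3)
model = MarkovChainClassifier().fit(X, Y)
print(model.predict(["A", "B"], horizon=3))  # ['C', 'A', 'B']
```

Giving the base learner the full input window plus all earlier labels recovers an ordinary classifier chain. This is where the multi-label view becomes richer than a first-order Markov model.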

Predicting Ground-Level Scene Layout from Aerial Imagery

Title Predicting Ground-Level Scene Layout from Aerial Imagery
Authors Menghua Zhai, Zachary Bessinger, Scott Workman, Nathan Jacobs
Abstract We introduce a novel strategy for learning to extract semantically meaningful features from aerial imagery. Instead of manually labeling the aerial imagery, we propose to predict (noisy) semantic features automatically extracted from co-located ground imagery. Our network architecture takes an aerial image as input, extracts features using a convolutional neural network, and then applies an adaptive transformation to map these features into the ground-level perspective. We use an end-to-end learning approach to minimize the difference between the semantic segmentation extracted directly from the ground image and the semantic segmentation predicted solely based on the aerial image. We show that a model learned using this strategy, with no additional training, is already capable of rough semantic labeling of aerial imagery. Furthermore, we demonstrate that by finetuning this model we can achieve more accurate semantic segmentation than two baseline initialization strategies. We use our network to address the task of estimating the geolocation and geoorientation of a ground image. Finally, we show how features extracted from an aerial image can be used to hallucinate a plausible ground-level panorama.
Tasks Cross-View Image-to-Image Translation, Semantic Segmentation
Published 2016-12-08
URL http://arxiv.org/abs/1612.02709v1
PDF http://arxiv.org/pdf/1612.02709v1.pdf
PWC https://paperswithcode.com/paper/predicting-ground-level-scene-layout-from
Repo https://github.com/viibridges/crossnet
Framework tf

Revisiting Visual Question Answering Baselines

Title Revisiting Visual Question Answering Baselines
Authors Allan Jabri, Armand Joulin, Laurens van der Maaten
Abstract Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art performance on the Visual7W Telling task and compares surprisingly well with the most complex systems proposed for the VQA Real Multiple Choice task. We explore variants of the model and study its transferability between both datasets. We also present an error analysis of our model that suggests a key problem of current VQA systems lies in the lack of visual grounding of concepts that occur in the questions and answers. Overall, our results suggest that the performance of current VQA systems is not significantly better than that of systems designed to exploit dataset biases.
Tasks Question Answering, Visual Question Answering
Published 2016-06-27
URL http://arxiv.org/abs/1606.08390v2
PDF http://arxiv.org/pdf/1606.08390v2.pdf
PWC https://paperswithcode.com/paper/revisiting-visual-question-answering
Repo https://github.com/Cold-Winter/vqs
Framework caffe2
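
The binary formulation scores each image-question-answer triplet on its own and picks the candidate with the highest probability of being correct. It does not use a softmax over a fixed answer vocabulary. This minimal sketch assumes a one-hidden-layer perceptron over the concatenated features with hand-set, hypothetical weights and 1-D toy features. Real image, question and answer embeddings and trained weights would replace them.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def triplet_prob(params, img: Sequence[float], q: Sequence[float],
                 ans: Sequence[float]) -> float:
    """P(image-question-answer triplet is correct) from a one-hidden-layer MLP
    applied to the concatenated feature vector."""
    W1, b1, w2, b2 = params
    x = list(img) + list(q) + list(ans)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU units
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def pick_answer(params, img: Sequence[float], q: Sequence[float],
                candidates: List[Sequence[float]]) -> int:
    """Score every candidate independently and return the index of the best one."""
    probs = [triplet_prob(params, img, q, a) for a in candidates]
    return max(range(len(candidates)), key=lambda i: probs[i])


# Hypothetical hand-set weights on features [image, question, answer]:
# hidden unit 1 fires when image and answer are both +1, unit 2 when both are -1.
params = ([[1.0, 0.0, 1.0], [-1.0, 0.0, -1.0]], [-1.0, -1.0], [4.0, 4.0], -2.0)
print(pick_answer(params, [1.0], [0.0], [[1.0], [-1.0]]))   # 0
print(pick_answer(params, [-1.0], [0.0], [[1.0], [-1.0]]))  # 1
```

The hidden layer matters here. A purely linear model on concatenated features would add the same image and question term to every candidate. It would therefore rank the answers identically whatever the image shows.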

Text Matching as Image Recognition

Title Text Matching as Image Recognition
Authors Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi Cheng
Abstract Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority over the baselines.
Tasks Ad-Hoc Information Retrieval, Text Matching
Published 2016-02-20
URL http://arxiv.org/abs/1602.06359v1
PDF http://arxiv.org/pdf/1602.06359v1.pdf
PWC https://paperswithcode.com/paper/text-matching-as-image-recognition
Repo https://github.com/super-zhangchao/learning-to-match
Framework tf
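
The abstract does not fix the word-similarity function. Exact-match indicators and cosine similarity over word embeddings are assumed here as two plausible choices, with embeddings supplied by the caller. The sketch builds the matching matrix and applies a single hand-set 2 × 2 diagonal kernel. A matching bigram shows up as a short diagonal of ones, which that kernel detects. A trained network would learn many such kernels instead of using one fixed filter.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def indicator_matrix(s1: Sequence[str], s2: Sequence[str]) -> Matrix:
    """M[i][j] = 1.0 if word i of s1 equals word j of s2, else 0.0."""
    return [[1.0 if w1 == w2 else 0.0 for w2 in s2] for w1 in s1]


def cosine_matrix(s1: Sequence[str], s2: Sequence[str],
                  emb: Dict[str, Sequence[float]]) -> Matrix:
    """M[i][j] = cosine similarity between the embeddings of word i and word j."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0
    return [[cos(emb[w1], emb[w2]) for w2 in s2] for w1 in s1]


def conv2d_valid(M: Matrix, K: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation of the matching matrix M with kernel K."""
    kh, kw = len(K), len(K[0])
    return [[sum(K[a][b] * M[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(len(M[0]) - kw + 1)]
            for i in range(len(M) - kh + 1)]


s1 = "the cat sat on the mat".split()
s2 = "a cat sat on a mat".split()
R = conv2d_valid(indicator_matrix(s1, s2), [[1.0, 0.0], [0.0, 1.0]])
# The diagonal kernel scores 2.0 exactly where a bigram of s1 matches a bigram of s2.
print([(i, j) for i, row in enumerate(R) for j, v in enumerate(row) if v == 2.0])  # [(1, 1), (2, 2)]
```

The two hits correspond to the shared bigrams "cat sat" and "sat on". The isolated match on "mat" only reaches 1.0.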

SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series

Title SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series
Authors Chen Luo, Anshumali Shrivastava
Abstract Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as the series are usually misaligned. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch-and-bound pruning strategies are adopted. However, branch-and-bound pruning is only useful for very short queries (low-dimensional time series), and the bounds are quite weak for longer queries. Because of these loose bounds, the branch-and-bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch-and-bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequences and prunes around 95% of candidates, leading to a massive speedup in search with DTW. Empirical results on two large-scale benchmark time series datasets show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
Tasks Time Series
Published 2016-10-24
URL https://arxiv.org/abs/1610.07328v1
PDF https://arxiv.org/pdf/1610.07328v1.pdf
PWC https://paperswithcode.com/paper/ssh-sketch-shingle-hash-for-indexing-massive
Repo https://github.com/ktatarnikov/time-series
Framework none
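
The three stages named in the abstract can be sketched as follows, assuming a random Gaussian filter slid over the series with a fixed step (sketch), sets of length-n bit substrings (shingle), and MinHash over those sets (hash). The filter length, step and shingle length are hypothetical. The paper's hashing stage may weight shingles by frequency, so plain MinHash here is a simplification.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def sketch(series: Sequence[float], filt: Sequence[float], step: int) -> List[int]:
    """Slide a random filter over the series and keep only the sign of each dot product."""
    w = len(filt)
    return [1 if sum(f * x for f, x in zip(filt, series[i:i + w])) >= 0 else 0
            for i in range(0, len(series) - w + 1, step)]


def shingles(bits: Sequence[int], n: int) -> Set[Tuple[int, ...]]:
    """Set of all length-n windows (n-grams) of the bit sketch."""
    return {tuple(bits[i:i + n]) for i in range(len(bits) - n + 1)}


def minhash(items: Set[Tuple[int, ...]], num_hashes: int, seed: int = 0) -> List[int]:
    """Plain MinHash: the fraction of positions where two signatures agree
    estimates the Jaccard similarity of the two shingle sets (items must be non-empty)."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    # hash() of a tuple of ints is stable across runs (PYTHONHASHSEED only affects str/bytes).
    return [min(hash((salt, item)) for item in items) for salt in salts]


rng = random.Random(42)
filt = [rng.gauss(0.0, 1.0) for _ in range(8)]  # hypothetical filter length of 8
ts = [math.sin(0.1 * t) for t in range(200)]
bits = sketch(ts, filt, step=2)
sig = minhash(shingles(bits, n=5), num_hashes=16)
print(len(bits), len(sig))  # 97 16
```

Because only signs are kept, the sketch is unchanged by positive rescaling of the series. In a typical locality-sensitive hashing setup, signatures or bands of them would serve as bucket keys, and only series sharing a bucket would be compared with exact DTW.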

Limbo: A Fast and Flexible Library for Bayesian Optimization

Title Limbo: A Fast and Flexible Library for Bayesian Optimization
Authors Antoine Cully, Konstantinos Chatzilygeroudis, Federico Allocati, Jean-Baptiste Mouret
Abstract Limbo is an open-source C++11 library for Bayesian optimization which is designed to be both highly flexible and very fast. It can be used to optimize functions for which the gradient is unknown, evaluations are expensive, and runtime cost matters (e.g., on embedded systems or robots). Benchmarks on standard functions show that Limbo is about 2 times faster than BayesOpt (another C++ library) for a similar accuracy.
Tasks
Published 2016-11-22
URL http://arxiv.org/abs/1611.07343v1
PDF http://arxiv.org/pdf/1611.07343v1.pdf
PWC https://paperswithcode.com/paper/limbo-a-fast-and-flexible-library-for
Repo https://github.com/resibots/limbo
Framework none

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

Title Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Authors Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
Abstract Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge base. In this work, we introduce a Neural Symbolic Machine, which contains (a) a neural “programmer”, i.e., a sequence-to-sequence model that maps language utterances to programs and utilizes a key-variable memory to handle compositionality, and (b) a symbolic “computer”, i.e., a Lisp interpreter that performs program execution and helps find good programs by pruning the search space. We apply REINFORCE to directly optimize the task reward of this structured prediction problem. To train with weak supervision and improve the stability of REINFORCE, we augment it with an iterative maximum-likelihood training process. NSM outperforms the state of the art on the WebQuestionsSP dataset when trained from question-answer pairs only, without requiring any feature engineering or domain-specific knowledge.
Tasks Feature Engineering, Structured Prediction
Published 2016-10-31
URL http://arxiv.org/abs/1611.00020v4
PDF http://arxiv.org/pdf/1611.00020v4.pdf
PWC https://paperswithcode.com/paper/neural-symbolic-machines-learning-semantic-1
Repo https://github.com/theSparta/neural-symbolic-machines
Framework tf
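
With weak supervision, the only training signal is whether executing a program yields the gold answer. REINFORCE turns that reward into a gradient on the policy's log-probabilities. The sketch below is deliberately tiny. It assumes a softmax policy over a fixed list of hypothetical candidate programs, represented as Python callables instead of Lisp expressions, against a toy knowledge base, and it uses a running-average baseline. There is no sequence-to-sequence model, key-variable memory or iterative maximum-likelihood step.

```python
import math
import random
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(reward: Callable[[int], float], num_programs: int, steps: int = 500,
              lr: float = 0.5, seed: int = 0) -> List[float]:
    """REINFORCE with a running-average baseline for a softmax policy over a
    fixed list of candidate programs; returns the final program probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * num_programs
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(num_programs), weights=probs)[0]  # sample a program
        r = reward(a)  # execute it and compare with the gold answer
        advantage = r - baseline
        for k in range(num_programs):
            # Gradient of log pi(a) with respect to logit k is 1[k == a] - pi(k).
            logits[k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
        baseline = 0.9 * baseline + 0.1 * r
    return softmax(logits)


# Hypothetical toy knowledge base and candidate "programs" for "What is the capital of France?"
kb = {"France": {"capital": "Paris", "currency": "Euro"}}
programs = [lambda: kb["France"]["currency"], lambda: "France", lambda: kb["France"]["capital"]]
gold = "Paris"
probs = reinforce(lambda a: 1.0 if programs[a]() == gold else 0.0, len(programs))
print([round(p, 3) for p in probs])
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance. Stability is a problem for REINFORCE with sparse rewards, which is why the abstract adds maximum-likelihood training on good programs found so far.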