May 7, 2019

Paper Group AWR 41

An overview of gradient descent optimization algorithms


Title	An overview of gradient descent optimization algorithms
Authors	Sebastian Ruder
Abstract	Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent.
Tasks
Published	2016-09-15
URL	http://arxiv.org/abs/1609.04747v2
PDF	http://arxiv.org/pdf/1609.04747v2.pdf
PWC	https://paperswithcode.com/paper/an-overview-of-gradient-descent-optimization
Repo	https://github.com/congcui2007/AWS
Framework	tf
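
The update rules that the overview surveys can be illustrated with a minimal sketch. The code below compares plain gradient descent with a momentum variant on a one-dimensional quadratic. The objective, learning rate, momentum coefficient, and step counts are illustrative assumptions, not values taken from the paper.

```python
def gradient_descent(grad, theta, lr=0.1, steps=100):
    """Plain gradient descent: theta <- theta - lr * grad(theta)."""
    for _ in range(steps):
        theta = theta - lr * grad(theta)
    return theta


def momentum_descent(grad, theta, lr=0.1, gamma=0.9, steps=300):
    """Gradient descent with momentum: a decaying sum of past gradients."""
    velocity = 0.0
    for _ in range(steps):
        velocity = gamma * velocity + lr * grad(theta)
        theta = theta - velocity
    return theta


def grad_f(theta):
    # Gradient of the assumed toy objective f(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)


theta_gd = gradient_descent(grad_f, theta=0.0)
theta_mom = momentum_descent(grad_f, theta=0.0)
print(theta_gd, theta_mom)  # both should end up close to the minimizer at 3
```

On this toy problem both variants approach the minimizer; the differences the overview discusses show up mainly on ill-conditioned or noisy objectives.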

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning


Title	Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
Authors	Guillaume Lemaitre, Fernando Nogueira, Christos K. Aridas
Abstract	Imbalanced-learn is an open-source Python toolbox that aims to provide a wide range of methods to cope with the problem of imbalanced datasets, which is frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods can be categorized into 4 groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The proposed toolbox only depends on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported project. Documentation, unit tests as well as integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn.
Tasks
Published	2016-09-21
URL	http://arxiv.org/abs/1609.06570v1
PDF	http://arxiv.org/pdf/1609.06570v1.pdf
PWC	https://paperswithcode.com/paper/imbalanced-learn-a-python-toolbox-to-tackle
Repo	https://github.com/scikit-learn-contrib/imbalanced-learn
Framework	tf
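
To make category (ii), over-sampling, concrete, here is a minimal standard-library sketch of random over-sampling: minority-class samples are duplicated at random until every class matches the majority count. This is not the imbalanced-learn API. The function name `random_oversample` and the toy data are hypothetical and serve only as illustration.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Candidate samples of this class to draw duplicates from.
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy dataset: four samples of class 0, one of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

The toolbox itself offers this and more elaborate strategies behind a scikit-learn compatible interface; the sketch only shows the underlying idea.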

Convolutional Sketch Inversion


Title	Convolutional Sketch Inversion
Authors	Yağmur Güçlütürk, Umut Güçlü, Rob van Lier, Marcel A. J. van Gerven
Abstract	In this paper, we use deep neural networks for inverting face sketches to synthesize photorealistic face images. We first construct a semi-simulated dataset containing a very large number of computer-generated face sketches with different styles and corresponding face images by expanding existing unconstrained face data sets. We then train models achieving state-of-the-art results on both computer-generated sketches and hand-drawn sketches by leveraging recent advances in deep learning such as batch normalization, deep residual learning, perceptual losses and stochastic optimization in combination with our new dataset. We finally demonstrate potential applications of our models in fine arts and forensic arts. In contrast to existing patch-based approaches, our deep-neural-network-based approach can be used for synthesizing photorealistic face images by inverting face sketches in the wild.
Tasks	Stochastic Optimization
Published	2016-06-09
URL	http://arxiv.org/abs/1606.03073v1
PDF	http://arxiv.org/pdf/1606.03073v1.pdf
PWC	https://paperswithcode.com/paper/convolutional-sketch-inversion
Repo	https://github.com/saifvazir/Convolutional-Sketch-Inversion
Framework	pytorch

GuessWhat?! Visual object discovery through multi-modal dialogue


Title	GuessWhat?! Visual object discovery through multi-modal dialogue
Authors	Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Abstract	We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.
Tasks
Published	2016-11-23
URL	http://arxiv.org/abs/1611.08481v2
PDF	http://arxiv.org/pdf/1611.08481v2.pdf
PWC	https://paperswithcode.com/paper/guesswhat-visual-object-discovery-through
Repo	https://github.com/ibrahimSouleiman/GuessWhat
Framework	tf

VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering


Title	VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
Authors	Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva
Abstract	In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long Short-Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top performing methods in order to illustrate its performance and speed.
Tasks	Question Answering, Visual Question Answering
Published	2016-12-12
URL	http://arxiv.org/abs/1612.03628v1
PDF	http://arxiv.org/pdf/1612.03628v1.pdf
PWC	https://paperswithcode.com/paper/vibiknet-visual-bidirectional-kernelized
Repo	https://github.com/MarcBS/VIBIKNet
Framework	none

Dialogue Learning With Human-In-The-Loop


Title	Dialogue Learning With Human-In-The-Loop
Authors	Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc’Aurelio Ranzato, Jason Weston
Abstract	An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach.
Tasks	Question Answering
Published	2016-11-29
URL	http://arxiv.org/abs/1611.09823v3
PDF	http://arxiv.org/pdf/1611.09823v3.pdf
PWC	https://paperswithcode.com/paper/dialogue-learning-with-human-in-the-loop
Repo	https://github.com/rohit129/Movie_KnowledgeGraph_QA
Framework	none

Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks


Title	Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
Authors	Alex J. Champandard
Abstract	Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms—whether for portraits or landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!
Tasks	Image Generation, Semantic Segmentation, Style Transfer
Published	2016-03-05
URL	http://arxiv.org/abs/1603.01768v1
PDF	http://arxiv.org/pdf/1603.01768v1.pdf
PWC	https://paperswithcode.com/paper/semantic-style-transfer-and-turning-two-bit
Repo	https://github.com/paulwarkentin/pytorch-neural-doodle
Framework	pytorch

Learning Thermodynamics with Boltzmann Machines


Title	Learning Thermodynamics with Boltzmann Machines
Authors	Giacomo Torlai, Roger G. Melko
Abstract	A Boltzmann machine is a stochastic neural network that has been extensively used in the layers of deep architectures for modern machine learning applications. In this paper, we develop a Boltzmann machine that is capable of modelling thermodynamic observables for physical systems in thermal equilibrium. Through unsupervised learning, we train the Boltzmann machine on data sets constructed with spin configurations importance-sampled from the partition function of an Ising Hamiltonian at different temperatures using Monte Carlo (MC) methods. The trained Boltzmann machine is then used to generate spin states, for which we compare thermodynamic observables to those computed by direct MC sampling. We demonstrate that the Boltzmann machine can faithfully reproduce the observables of the physical system. Further, we observe that the number of neurons required to obtain accurate results increases as the system is brought close to criticality.
Tasks
Published	2016-06-08
URL	http://arxiv.org/abs/1606.02718v1
PDF	http://arxiv.org/pdf/1606.02718v1.pdf
PWC	https://paperswithcode.com/paper/learning-thermodynamics-with-boltzmann
Repo	https://github.com/loppy1243/IsingBoltzmann
Framework	none
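
The training data described in the abstract consists of spin configurations drawn by Monte Carlo sampling of an Ising model at a given temperature. A minimal sketch of single-spin-flip Metropolis sampling follows. The two-dimensional square lattice, periodic boundaries, coupling J = 1, lattice size, and temperature are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def metropolis_ising(L=8, T=2.0, sweeps=200, seed=0):
    """Sample an L x L Ising configuration (J = 1, periodic boundaries)."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        neighbours = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        delta_e = 2 * spins[i][j] * neighbours  # energy change if spin (i, j) flips
        # Metropolis rule: always accept downhill moves, uphill with Boltzmann probability.
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / T):
            spins[i][j] = -spins[i][j]
    return spins


def magnetization(spins):
    """Mean spin value, a simple thermodynamic observable."""
    n = len(spins) * len(spins[0])
    return sum(sum(row) for row in spins) / n


config = metropolis_ising()
print(magnetization(config))
```

Configurations like `config`, collected at several temperatures, are the kind of data set on which a Boltzmann machine could then be trained.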

Multi-label Methods for Prediction with Sequential Data


Title	Multi-label Methods for Prediction with Sequential Data
Authors	Jesse Read, Luca Martino, Jaakko Hollmén
Abstract	The number of methods available for classification of multi-label data has increased rapidly over recent years, yet relatively few links have been made with the related task of classification of sequential data. If label indices are considered as time indices, the problems can often be seen as equivalent. In this paper we detect and elaborate on connections between multi-label methods and Markovian models, and study the suitability of multi-label methods for prediction in sequential data. From this study we draw upon the most suitable techniques from the area and develop two novel competitive approaches which can be applied to either kind of data. We carry out an empirical evaluation investigating performance on real-world sequential-prediction tasks: electricity demand, and route prediction. As well as showing that several popular multi-label algorithms are in fact easily applicable to sequencing tasks, our novel approaches, which benefit from a unified view of these areas, prove very competitive against established methods.
Tasks
Published	2016-09-27
URL	http://arxiv.org/abs/1609.08349v2
PDF	http://arxiv.org/pdf/1609.08349v2.pdf
PWC	https://paperswithcode.com/paper/multi-label-methods-for-prediction-with
Repo	https://github.com/abimur-123/Canvass_codingchallenge
Framework	none

Predicting Ground-Level Scene Layout from Aerial Imagery


Title	Predicting Ground-Level Scene Layout from Aerial Imagery
Authors	Menghua Zhai, Zachary Bessinger, Scott Workman, Nathan Jacobs
Abstract	We introduce a novel strategy for learning to extract semantically meaningful features from aerial imagery. Instead of manually labeling the aerial imagery, we propose to predict (noisy) semantic features automatically extracted from co-located ground imagery. Our network architecture takes an aerial image as input, extracts features using a convolutional neural network, and then applies an adaptive transformation to map these features into the ground-level perspective. We use an end-to-end learning approach to minimize the difference between the semantic segmentation extracted directly from the ground image and the semantic segmentation predicted solely based on the aerial image. We show that a model learned using this strategy, with no additional training, is already capable of rough semantic labeling of aerial imagery. Furthermore, we demonstrate that by finetuning this model we can achieve more accurate semantic segmentation than two baseline initialization strategies. We use our network to address the task of estimating the geolocation and geoorientation of a ground image. Finally, we show how features extracted from an aerial image can be used to hallucinate a plausible ground-level panorama.
Tasks	Cross-View Image-to-Image Translation, Semantic Segmentation
Published	2016-12-08
URL	http://arxiv.org/abs/1612.02709v1
PDF	http://arxiv.org/pdf/1612.02709v1.pdf
PWC	https://paperswithcode.com/paper/predicting-ground-level-scene-layout-from
Repo	https://github.com/viibridges/crossnet
Framework	tf

Revisiting Visual Question Answering Baselines


Title	Revisiting Visual Question Answering Baselines
Authors	Allan Jabri, Armand Joulin, Laurens van der Maaten
Abstract	Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art performance on the Visual7W Telling task and compares surprisingly well with the most complex systems proposed for the VQA Real Multiple Choice task. We explore variants of the model and study its transferability between both datasets. We also present an error analysis of our model that suggests a key problem of current VQA systems lies in the lack of visual grounding of concepts that occur in the questions and answers. Overall, our results suggest that the performance of current VQA systems is not significantly better than that of systems designed to exploit dataset biases.
Tasks	Question Answering, Visual Question Answering
Published	2016-06-27
URL	http://arxiv.org/abs/1606.08390v2
PDF	http://arxiv.org/pdf/1606.08390v2.pdf
PWC	https://paperswithcode.com/paper/revisiting-visual-question-answering
Repo	https://github.com/Cold-Winter/vqs
Framework	caffe2

Text Matching as Image Recognition


Title	Text Matching as Image Recognition
Authors	Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi Cheng
Abstract	Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority against the baselines.
Tasks	Ad-Hoc Information Retrieval, Text Matching
Published	2016-02-20
URL	http://arxiv.org/abs/1602.06359v1
PDF	http://arxiv.org/pdf/1602.06359v1.pdf
PWC	https://paperswithcode.com/paper/text-matching-as-image-recognition
Repo	https://github.com/super-zhangchao/learning-to-match
Framework	tf
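
The first step in the abstract, building a matching matrix whose entries are word-to-word similarities, can be sketched as follows. Cosine similarity over word vectors is an assumed choice of similarity function, and the three-dimensional `embed` table is a hypothetical toy embedding rather than anything from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def matching_matrix(words_a, words_b, embed):
    """Entry [i][j] is the similarity between words_a[i] and words_b[j]."""
    return [[cosine(embed[a], embed[b]) for b in words_b] for a in words_a]


# Hypothetical toy embeddings, for illustration only.
embed = {
    "the": [1.0, 0.0, 0.0],
    "a": [0.9, 0.1, 0.0],
    "cat": [0.0, 1.0, 0.2],
    "sat": [0.0, 0.2, 1.0],
    "down": [0.1, 0.0, 0.8],
}

M = matching_matrix(["the", "cat", "sat"], ["a", "cat", "sat", "down"], embed)
for row in M:
    print([round(value, 2) for value in row])
```

The resulting matrix can be treated like a single-channel image, so that a convolutional network can pick up patterns such as diagonal runs, which correspond to n-gram matches.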

SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series


Title	SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series
Authors	Chen Luo, Anshumali Shrivastava
Abstract	Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning is only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds, the branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequences and prunes around 95% of candidates, leading to a massive speedup in search with DTW. Empirical results on two large-scale benchmark time series datasets show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
Tasks	Time Series
Published	2016-10-24
URL	https://arxiv.org/abs/1610.07328v1
PDF	https://arxiv.org/pdf/1610.07328v1.pdf
PWC	https://paperswithcode.com/paper/ssh-sketch-shingle-hash-for-indexing-massive
Repo	https://github.com/ktatarnikov/time-series
Framework	none
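
The cost that SSH is designed to avoid comes from the quadratic dynamic program behind DTW. The sketch below is the textbook DTW recurrence with an absolute-difference local cost. It is a baseline illustration only, not the SSH hashing scheme and not the UCR suite.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the series differ only by a repeated value
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

Every query-candidate pair needs the full table, which is why pruning or hashing candidates before running DTW matters for long series.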

Limbo: A Fast and Flexible Library for Bayesian Optimization


Title	Limbo: A Fast and Flexible Library for Bayesian Optimization
Authors	Antoine Cully, Konstantinos Chatzilygeroudis, Federico Allocati, Jean-Baptiste Mouret
Abstract	Limbo is an open-source C++11 library for Bayesian optimization which is designed to be both highly flexible and very fast. It can be used to optimize functions for which the gradient is unknown, evaluations are expensive, and runtime cost matters (e.g., on embedded systems or robots). Benchmarks on standard functions show that Limbo is about 2 times faster than BayesOpt (another C++ library) for a similar accuracy.
Tasks
Published	2016-11-22
URL	http://arxiv.org/abs/1611.07343v1
PDF	http://arxiv.org/pdf/1611.07343v1.pdf
PWC	https://paperswithcode.com/paper/limbo-a-fast-and-flexible-library-for
Repo	https://github.com/resibots/limbo
Framework	none

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision


Title	Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Authors	Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao
Abstract	Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge-base. In this work, we introduce a Neural Symbolic Machine, which contains (a) a neural “programmer”, i.e., a sequence-to-sequence model that maps language utterances to programs and utilizes a key-variable memory to handle compositionality, and (b) a symbolic “computer”, i.e., a Lisp interpreter that performs program execution, and helps find good programs by pruning the search space. We apply REINFORCE to directly optimize the task reward of this structured prediction problem. To train with weak supervision and improve the stability of REINFORCE, we augment it with an iterative maximum-likelihood training process. NSM outperforms the state-of-the-art on the WebQuestionsSP dataset when trained from question-answer pairs only, without requiring any feature engineering or domain-specific knowledge.
Tasks	Feature Engineering, Structured Prediction
Published	2016-10-31
URL	http://arxiv.org/abs/1611.00020v4
PDF	http://arxiv.org/pdf/1611.00020v4.pdf
PWC	https://paperswithcode.com/paper/neural-symbolic-machines-learning-semantic-1
Repo	https://github.com/theSparta/neural-symbolic-machines
Framework	tf