Paper Group AWR 41
An overview of gradient descent optimization algorithms
Title | An overview of gradient descent optimization algorithms |
Authors | Sebastian Ruder |
Abstract | Gradient descent optimization algorithms, while increasingly popular, are often used as black-box optimizers, as practical explanations of their strengths and weaknesses are hard to come by. This article aims to provide the reader with intuitions with regard to the behaviour of different algorithms that will allow her to put them to use. In the course of this overview, we look at different variants of gradient descent, summarize challenges, introduce the most common optimization algorithms, review architectures in a parallel and distributed setting, and investigate additional strategies for optimizing gradient descent. |
Tasks | |
Published | 2016-09-15 |
URL | http://arxiv.org/abs/1609.04747v2 |
PDF | http://arxiv.org/pdf/1609.04747v2.pdf |
PWC | https://paperswithcode.com/paper/an-overview-of-gradient-descent-optimization |
Repo | https://github.com/congcui2007/AWS |
Framework | tf |
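
As a rough illustration of the kind of update rule this overview covers, the sketch below runs plain gradient descent and a momentum variant on a toy quadratic. The objective, learning rate, momentum coefficient, and the function name `minimize` are all assumptions chosen for the example; the abstract itself does not name specific algorithms.

```python
def minimize(grad, x0, lr=0.1, momentum=0.0, steps=200):
    """Gradient descent on a scalar parameter, with optional momentum."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # The velocity is a decaying sum of past gradient steps;
        # with momentum=0.0 this reduces to plain gradient descent.
        velocity = momentum * velocity + lr * grad(x)
        x -= velocity
    return x


def grad_f(x):
    """Gradient of the toy objective f(x) = (x - 3)^2 (an assumed example)."""
    return 2.0 * (x - 3.0)


x_plain = minimize(grad_f, x0=0.0)
x_momentum = minimize(grad_f, x0=0.0, momentum=0.9)
print(x_plain, x_momentum)
```

Both runs should approach the minimizer at x = 3; on this toy problem the momentum run oscillates around it before settling, while the plain run approaches it monotonically.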
Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning
Title | Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning |
Authors | Guillaume Lemaitre, Fernando Nogueira, Christos K. Aridas |
Abstract | Imbalanced-learn is an open-source Python toolbox aiming at providing a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods can be categorized into 4 groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The proposed toolbox only depends on numpy, scipy, and scikit-learn and is distributed under the MIT license. Furthermore, it is fully compatible with scikit-learn and is part of the scikit-learn-contrib supported project. Documentation, unit tests as well as integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn. |
Tasks | |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06570v1 |
PDF | http://arxiv.org/pdf/1609.06570v1.pdf |
PWC | https://paperswithcode.com/paper/imbalanced-learn-a-python-toolbox-to-tackle |
Repo | https://github.com/scikit-learn-contrib/imbalanced-learn |
Framework | tf |
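
To make the over-sampling group (ii) concrete, here is a minimal standard-library sketch of random over-sampling: minority-class samples are duplicated at random until every class matches the majority count. It is not the imbalanced-learn API; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy data (assumed): four samples of class 0, one of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
counts = Counter(y_res)
print(counts)
```

Because the extra samples are exact duplicates, this simple scheme changes class proportions without adding new information, which is one reason a toolbox offers more elaborate alternatives.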
Convolutional Sketch Inversion
Title | Convolutional Sketch Inversion |
Authors | Yağmur Güçlütürk, Umut Güçlü, Rob van Lier, Marcel A. J. van Gerven |
Abstract | In this paper, we use deep neural networks for inverting face sketches to synthesize photorealistic face images. We first construct a semi-simulated dataset containing a very large number of computer-generated face sketches with different styles and corresponding face images by expanding existing unconstrained face data sets. We then train models achieving state-of-the-art results on both computer-generated sketches and hand-drawn sketches by leveraging recent advances in deep learning such as batch normalization, deep residual learning, perceptual losses and stochastic optimization in combination with our new dataset. We finally demonstrate potential applications of our models in fine arts and forensic arts. In contrast to existing patch-based approaches, our deep-neural-network-based approach can be used for synthesizing photorealistic face images by inverting face sketches in the wild. |
Tasks | Stochastic Optimization |
Published | 2016-06-09 |
URL | http://arxiv.org/abs/1606.03073v1 |
PDF | http://arxiv.org/pdf/1606.03073v1.pdf |
PWC | https://paperswithcode.com/paper/convolutional-sketch-inversion |
Repo | https://github.com/saifvazir/Convolutional-Sketch-Inversion |
Framework | pytorch |
GuessWhat?! Visual object discovery through multi-modal dialogue
Title | GuessWhat?! Visual object discovery through multi-modal dialogue |
Authors | Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville |
Abstract | We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks. |
Tasks | |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.08481v2 |
PDF | http://arxiv.org/pdf/1611.08481v2.pdf |
PWC | https://paperswithcode.com/paper/guesswhat-visual-object-discovery-through |
Repo | https://github.com/ibrahimSouleiman/GuessWhat |
Framework | tf |
VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
Title | VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering |
Authors | Marc Bolaños, Álvaro Peris, Francisco Casacuberta, Petia Radeva |
Abstract | In this paper, we address the problem of visual question answering by proposing a novel model, called VIBIKNet. Our model is based on integrating Kernelized Convolutional Neural Networks and Long-Short Term Memory units to generate an answer given a question about an image. We prove that VIBIKNet is an optimal trade-off between accuracy and computational load, in terms of memory and time consumption. We validate our method on the VQA challenge dataset and compare it to the top performing methods in order to illustrate its performance and speed. |
Tasks | Question Answering, Visual Question Answering |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03628v1 |
PDF | http://arxiv.org/pdf/1612.03628v1.pdf |
PWC | https://paperswithcode.com/paper/vibiknet-visual-bidirectional-kernelized |
Repo | https://github.com/MarcBS/VIBIKNet |
Framework | none |
Dialogue Learning With Human-In-The-Loop
Title | Dialogue Learning With Human-In-The-Loop |
Authors | Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc’Aurelio Ranzato, Jason Weston |
Abstract | An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach. |
Tasks | Question Answering |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09823v3 |
PDF | http://arxiv.org/pdf/1611.09823v3.pdf |
PWC | https://paperswithcode.com/paper/dialogue-learning-with-human-in-the-loop |
Repo | https://github.com/rohit129/Movie_KnowledgeGraph_QA |
Framework | none |
Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
Title | Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks |
Authors | Alex J. Champandard |
Abstract | Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms—whether for portraits or landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings! |
Tasks | Image Generation, Semantic Segmentation, Style Transfer |
Published | 2016-03-05 |
URL | http://arxiv.org/abs/1603.01768v1 |
PDF | http://arxiv.org/pdf/1603.01768v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-style-transfer-and-turning-two-bit |
Repo | https://github.com/paulwarkentin/pytorch-neural-doodle |
Framework | pytorch |
Learning Thermodynamics with Boltzmann Machines
Title | Learning Thermodynamics with Boltzmann Machines |
Authors | Giacomo Torlai, Roger G. Melko |
Abstract | A Boltzmann machine is a stochastic neural network that has been extensively used in the layers of deep architectures for modern machine learning applications. In this paper, we develop a Boltzmann machine that is capable of modelling thermodynamic observables for physical systems in thermal equilibrium. Through unsupervised learning, we train the Boltzmann machine on data sets constructed with spin configurations importance-sampled from the partition function of an Ising Hamiltonian at different temperatures using Monte Carlo (MC) methods. The trained Boltzmann machine is then used to generate spin states, for which we compare thermodynamic observables to those computed by direct MC sampling. We demonstrate that the Boltzmann machine can faithfully reproduce the observables of the physical system. Further, we observe that the number of neurons required to obtain accurate results increases as the system is brought close to criticality. |
Tasks | |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02718v1 |
PDF | http://arxiv.org/pdf/1606.02718v1.pdf |
PWC | https://paperswithcode.com/paper/learning-thermodynamics-with-boltzmann |
Repo | https://github.com/loppy1243/IsingBoltzmann |
Framework | none |
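
The training data described in this abstract come from Monte Carlo sampling of an Ising model. The sketch below shows one common way to draw such samples, Metropolis single-spin flips, under assumptions the abstract does not state: a one-dimensional chain, nearest-neighbour coupling, periodic boundaries, and units where the Boltzmann constant is 1. The function names are hypothetical, and this is not the authors' code.

```python
import math
import random


def ising_energy(spins, coupling=1.0):
    """Energy of a periodic 1D Ising chain: -J * sum_i s_i * s_{i+1}."""
    n = len(spins)
    return -coupling * sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def metropolis_sweep(spins, temperature, rng, coupling=1.0):
    """Attempt len(spins) single-spin flips with the Metropolis acceptance rule."""
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        # Energy change if spin i is flipped (spins[i - 1] wraps around at i = 0).
        delta = 2.0 * coupling * spins[i] * (spins[i - 1] + spins[(i + 1) % n])
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            spins[i] = -spins[i]
    return spins


rng = random.Random(0)
spins = [rng.choice([-1, 1]) for _ in range(16)]
for _ in range(100):
    metropolis_sweep(spins, temperature=2.0, rng=rng)
print(spins, ising_energy(spins))
```

Configurations collected this way at several temperatures would play the role of the spin data sets on which a Boltzmann machine is trained.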
Multi-label Methods for Prediction with Sequential Data
Title | Multi-label Methods for Prediction with Sequential Data |
Authors | Jesse Read, Luca Martino, Jaakko Hollmén |
Abstract | The number of methods available for classification of multi-label data has increased rapidly over recent years, yet relatively few links have been made with the related task of classification of sequential data. If label indices are considered as time indices, the problems can often be seen as equivalent. In this paper we detect and elaborate on connections between multi-label methods and Markovian models, and study the suitability of multi-label methods for prediction in sequential data. From this study we draw upon the most suitable techniques from the area and develop two novel competitive approaches which can be applied to either kind of data. We carry out an empirical evaluation investigating performance on real-world sequential-prediction tasks: electricity demand, and route prediction. As well as showing that several popular multi-label algorithms are in fact easily applicable to sequencing tasks, our novel approaches, which benefit from a unified view of these areas, prove very competitive against established methods. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08349v2 |
PDF | http://arxiv.org/pdf/1609.08349v2.pdf |
PWC | https://paperswithcode.com/paper/multi-label-methods-for-prediction-with |
Repo | https://github.com/abimur-123/Canvass_codingchallenge |
Framework | none |
Predicting Ground-Level Scene Layout from Aerial Imagery
Title | Predicting Ground-Level Scene Layout from Aerial Imagery |
Authors | Menghua Zhai, Zachary Bessinger, Scott Workman, Nathan Jacobs |
Abstract | We introduce a novel strategy for learning to extract semantically meaningful features from aerial imagery. Instead of manually labeling the aerial imagery, we propose to predict (noisy) semantic features automatically extracted from co-located ground imagery. Our network architecture takes an aerial image as input, extracts features using a convolutional neural network, and then applies an adaptive transformation to map these features into the ground-level perspective. We use an end-to-end learning approach to minimize the difference between the semantic segmentation extracted directly from the ground image and the semantic segmentation predicted solely based on the aerial image. We show that a model learned using this strategy, with no additional training, is already capable of rough semantic labeling of aerial imagery. Furthermore, we demonstrate that by finetuning this model we can achieve more accurate semantic segmentation than two baseline initialization strategies. We use our network to address the task of estimating the geolocation and geoorientation of a ground image. Finally, we show how features extracted from an aerial image can be used to hallucinate a plausible ground-level panorama. |
Tasks | Cross-View Image-to-Image Translation, Semantic Segmentation |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02709v1 |
PDF | http://arxiv.org/pdf/1612.02709v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-ground-level-scene-layout-from |
Repo | https://github.com/viibridges/crossnet |
Framework | tf |
Revisiting Visual Question Answering Baselines
Title | Revisiting Visual Question Answering Baselines |
Authors | Allan Jabri, Armand Joulin, Laurens van der Maaten |
Abstract | Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art performance on the Visual7W Telling task and compares surprisingly well with the most complex systems proposed for the VQA Real Multiple Choice task. We explore variants of the model and study its transferability between both datasets. We also present an error analysis of our model that suggests a key problem of current VQA systems lies in the lack of visual grounding of concepts that occur in the questions and answers. Overall, our results suggest that the performance of current VQA systems is not significantly better than that of systems designed to exploit dataset biases. |
Tasks | Question Answering, Visual Question Answering |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08390v2 |
PDF | http://arxiv.org/pdf/1606.08390v2.pdf |
PWC | https://paperswithcode.com/paper/revisiting-visual-question-answering |
Repo | https://github.com/Cold-Winter/vqs |
Framework | caffe2 |
Text Matching as Image Recognition
Title | Text Matching as Image Recognition |
Authors | Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, Shengxian Wan, Xueqi Cheng |
Abstract | Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural networks in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority against the baselines. |
Tasks | Ad-Hoc Information Retrieval, Text Matching |
Published | 2016-02-20 |
URL | http://arxiv.org/abs/1602.06359v1 |
PDF | http://arxiv.org/pdf/1602.06359v1.pdf |
PWC | https://paperswithcode.com/paper/text-matching-as-image-recognition |
Repo | https://github.com/super-zhangchao/learning-to-match |
Framework | tf |
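
The matching matrix mentioned in this abstract can be sketched in a few lines. The version below uses an exact-match indicator as the word similarity, which is an assumption made for simplicity (the abstract only says entries "represent the similarities between words"); the function name and the example sentences are hypothetical.

```python
def matching_matrix(text_a, text_b):
    """Build a word-by-word matrix: entry [i][j] is 1.0 if word i of text_a equals word j of text_b."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return [[1.0 if wa == wb else 0.0 for wb in words_b] for wa in words_a]


m = matching_matrix("the cat sat down", "a cat sat on the mat")
for row in m:
    print(row)
```

In this toy case the shared bigram "cat sat" shows up as a short diagonal run of ones at positions (1, 1) and (2, 2), which is the sort of local pattern a convolutional filter over the matrix could pick up as an n-gram match.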
SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series
Title | SSH (Sketch, Shingle, & Hash) for Indexing Massive-Scale Time Series |
Authors | Chen Luo, Anshumali Shrivastava |
Abstract | Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching, as they are usually misaligned. Dynamic Time Warping or DTW is the most widely used similarity measure for time series because it combines alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch and bound based pruning strategies are adopted. However, branch and bound based pruning is only useful for very short queries (low dimensional time series), and the bounds are quite weak for longer queries. Due to the loose bounds, the branch and bound pruning strategy boils down to a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient and approximate hashing scheme which is much faster than the state-of-the-art branch and bound searching technique: the UCR suite. SSH uses a novel combination of sketching, shingling and hashing techniques to produce (probabilistic) indexes which align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Our results show that SSH is very effective for longer time sequences and prunes around 95% of candidates, leading to a massive speedup in search with DTW. Empirical results on two large-scale benchmark time series data show that our proposed method can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy. |
Tasks | Time Series |
Published | 2016-10-24 |
URL | https://arxiv.org/abs/1610.07328v1 |
PDF | https://arxiv.org/pdf/1610.07328v1.pdf |
PWC | https://paperswithcode.com/paper/ssh-sketch-shingle-hash-for-indexing-massive |
Repo | https://github.com/ktatarnikov/time-series |
Framework | none |
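
For readers unfamiliar with the measure that SSH approximates, the sketch below is the textbook dynamic-programming form of DTW with an absolute-difference local cost. It is not the SSH index and not the UCR suite; the function name and toy series are assumptions. The nested loop over both series is what makes exact DTW quadratic in the series length.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


series = [0.0, 1.0, 2.0, 1.0, 0.0]
shifted = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape, delayed by one step
print(dtw_distance(series, shifted))
```

Because warping lets one point align with several points of the other series, the delayed copy aligns with the original at zero cost, whereas a pointwise comparison would penalize the shift.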
Limbo: A Fast and Flexible Library for Bayesian Optimization
Title | Limbo: A Fast and Flexible Library for Bayesian Optimization |
Authors | Antoine Cully, Konstantinos Chatzilygeroudis, Federico Allocati, Jean-Baptiste Mouret |
Abstract | Limbo is an open-source C++11 library for Bayesian optimization which is designed to be both highly flexible and very fast. It can be used to optimize functions for which the gradient is unknown, evaluations are expensive, and runtime cost matters (e.g., on embedded systems or robots). Benchmarks on standard functions show that Limbo is about 2 times faster than BayesOpt (another C++ library) for a similar accuracy. |
Tasks | |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07343v1 |
PDF | http://arxiv.org/pdf/1611.07343v1.pdf |
PWC | https://paperswithcode.com/paper/limbo-a-fast-and-flexible-library-for |
Repo | https://github.com/resibots/limbo |
Framework | none |
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
Title | Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision |
Authors | Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, Ni Lao |
Abstract | Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult when it requires executing efficient discrete operations against a large knowledge-base. In this work, we introduce a Neural Symbolic Machine, which contains (a) a neural “programmer”, i.e., a sequence-to-sequence model that maps language utterances to programs and utilizes a key-variable memory to handle compositionality, and (b) a symbolic “computer”, i.e., a Lisp interpreter that performs program execution, and helps find good programs by pruning the search space. We apply REINFORCE to directly optimize the task reward of this structured prediction problem. To train with weak supervision and improve the stability of REINFORCE, we augment it with an iterative maximum-likelihood training process. NSM outperforms the state-of-the-art on the WebQuestionsSP dataset when trained from question-answer pairs only, without requiring any feature engineering or domain-specific knowledge. |
Tasks | Feature Engineering, Structured Prediction |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1611.00020v4 |
PDF | http://arxiv.org/pdf/1611.00020v4.pdf |
PWC | https://paperswithcode.com/paper/neural-symbolic-machines-learning-semantic-1 |
Repo | https://github.com/theSparta/neural-symbolic-machines |
Framework | tf |