July 29, 2019

2824 words 14 mins read

Paper Group AWR 190

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge. 3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks. Language Generation with Recurrent Generative Adversarial Networks without Pre-training. Integrating Boundary and Center Correlation Filter …

Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

Title Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Authors Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel
Abstract This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge. VQA is a task of significant importance for research in artificial intelligence, given its multimodal nature, clear evaluation protocol, and potential real-world applications. The performance of deep neural networks for VQA is very dependent on choices of architecture and hyperparameters. To help further research in the area, we describe in detail our high-performing, though relatively simple, model. Through a massive exploration of architectures and hyperparameters representing more than 3,000 GPU-hours, we identified the tips and tricks that lead to its success, namely: sigmoid outputs, soft training targets, image features from bottom-up attention, gated tanh activations, output embeddings initialized using GloVe and Google Images, large mini-batches, and smart shuffling of training data. We provide a detailed analysis of their impact on performance to assist others in making an appropriate selection.
Tasks Visual Question Answering
Published 2017-08-09
URL http://arxiv.org/abs/1708.02711v1
PDF http://arxiv.org/pdf/1708.02711v1.pdf
PWC https://paperswithcode.com/paper/tips-and-tricks-for-visual-question-answering
Repo https://github.com/VincentYing/Attention-on-Attention-for-VQA
Framework pytorch
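Two of the listed tricks are self-contained enough to sketch directly. Below is a minimal PyTorch illustration of the gated tanh activation and of sigmoid outputs trained against soft targets; the layer and answer-vocabulary sizes are hypothetical, and this is not the authors' released code:

```python
import torch
import torch.nn as nn

class GatedTanh(nn.Module):
    """Gated tanh unit: y = tanh(Wx + b) * sigmoid(W'x + b')."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.gate = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.tanh(self.fc(x)) * torch.sigmoid(self.gate(x))

fused = GatedTanh(512, 512)(torch.randn(4, 512))  # e.g. joint question/image features

# Sigmoid outputs with soft targets: binary cross-entropy against per-answer
# scores in [0, 1] (e.g. annotator agreement) instead of softmax over one label.
logits = torch.randn(4, 3000)        # 3000 = hypothetical answer vocabulary size
soft_targets = torch.rand(4, 3000)   # soft accuracy of each candidate answer
loss = nn.functional.binary_cross_entropy_with_logits(logits, soft_targets)
```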

3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks

Title 3D Point Cloud Classification and Segmentation using 3D Modified Fisher Vector Representation for Convolutional Neural Networks
Authors Yizhak Ben-Shabat, Michael Lindenbaum, Anath Fischer
Abstract The point cloud is gaining prominence as a method for representing 3D shapes, but its irregular format poses a challenge for deep learning methods. The common solution of transforming the data into a 3D voxel grid introduces its own challenges, mainly large memory size. In this paper we propose a novel 3D point cloud representation called 3D Modified Fisher Vectors (3DmFV). Our representation is hybrid, as it combines the discrete structure of a grid with the continuous generalization of Fisher vectors, in a compact and computationally efficient way. Using the grid enables us to design a new CNN architecture for point cloud classification and part segmentation. In a series of experiments we demonstrate performance that is competitive with, or better than, the state of the art on challenging benchmark datasets.
Tasks
Published 2017-11-22
URL http://arxiv.org/abs/1711.08241v1
PDF http://arxiv.org/pdf/1711.08241v1.pdf
PWC https://paperswithcode.com/paper/3d-point-cloud-classification-and
Repo https://github.com/fferroni/fisher_vector_classifier
Framework tf
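The representation is concrete enough to sketch: place Gaussians on a uniform grid, soft-assign points to them, and aggregate symmetric per-Gaussian statistics that a 3D CNN can consume. The statistics below are a simplified stand-in for the paper's exact Fisher vector components, and all sizes are illustrative:

```python
import numpy as np

def gaussian_grid(k):
    # k^3 Gaussians with centers on a uniform grid and a shared isotropic sigma
    lin = np.linspace(-1.0, 1.0, k)
    centers = np.stack(np.meshgrid(lin, lin, lin), axis=-1).reshape(-1, 3)
    return centers, 1.0 / k

def soft_assign(points, centers, sigma):
    # posterior responsibility of each grid Gaussian for each point (uniform priors)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-0.5 * d2 / sigma ** 2)
    return w / (w.sum(axis=1, keepdims=True) + 1e-12)

def mfv_features(points, centers, sigma):
    # symmetric (order-invariant) per-Gaussian statistics, grid-ordered so a
    # 3D CNN can consume them; a simplified stand-in for the full 3DmFV terms
    g = soft_assign(points, centers, sigma)                     # (N, K)
    resid = (points[:, None, :] - centers[None, :, :]) / sigma  # (N, K, 3)
    wres = g[..., None] * resid                                 # weighted residuals
    return np.concatenate([g.max(0)[:, None], g.sum(0)[:, None],
                           wres.sum(0), np.abs(wres).max(0)], axis=1)

pts = 0.3 * np.random.randn(1024, 3)            # a toy point cloud
centers, sigma = gaussian_grid(8)
print(mfv_features(pts, centers, sigma).shape)  # (512, 8): one row per Gaussian
```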

Language Generation with Recurrent Generative Adversarial Networks without Pre-training

Title Language Generation with Recurrent Generative Adversarial Networks without Pre-training
Authors Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, Lior Wolf
Abstract Generative Adversarial Networks (GANs) have shown great promise recently in image generation. Training GANs for language generation has proven to be more difficult, because of the non-differentiable nature of generating text with recurrent neural networks. Consequently, past work has either resorted to pre-training with maximum-likelihood or used convolutional networks for generation. In this work, we show that recurrent neural networks can be trained to generate text with GANs from scratch using curriculum learning, by slowly teaching the model to generate sequences of increasing and variable length. We empirically show that our approach vastly improves the quality of generated sequences compared to a convolutional baseline.
Tasks Text Generation
Published 2017-06-05
URL http://arxiv.org/abs/1706.01399v3
PDF http://arxiv.org/pdf/1706.01399v3.pdf
PWC https://paperswithcode.com/paper/language-generation-with-recurrent-generative
Repo https://github.com/amirbar/rnn.wgan
Framework tf
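The curriculum itself is the main trick and is easy to state in code. A hypothetical sketch of the length schedule follows; the recurrent WGAN updates themselves are elided, and the names are ours, not the repo's:

```python
import random

def curriculum_length(stage, max_len=32):
    """Variable-length curriculum: at stage i, train on a random length in
    [1, min(i, max_len)], so short sequences keep being practised while
    longer ones are gradually introduced."""
    return random.randint(1, min(stage, max_len))

# Hypothetical outer loop; at each step the generator and critic would be
# updated on sequences of the sampled length.
for stage in range(1, 6):
    lengths = [curriculum_length(stage) for _ in range(8)]
    print(f"stage {stage}: batch sequence lengths {lengths}")
```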

Integrating Boundary and Center Correlation Filters for Visual Tracking with Aspect Ratio Variation

Title Integrating Boundary and Center Correlation Filters for Visual Tracking with Aspect Ratio Variation
Authors Feng Li, Yingjie Yao, Peihua Li, David Zhang, Wangmeng Zuo, Ming-Hsuan Yang
Abstract Aspect ratio variation frequently appears in visual tracking and has a severe influence on performance. Although many correlation filter (CF)-based trackers have been suggested for scale-adaptive tracking, little attention has been paid to handling aspect ratio variation in CF trackers. In this paper, we make the first attempt to address this issue by introducing a family of 1D boundary CFs to localize the left, right, top, and bottom boundaries in videos. This allows us to cope with aspect ratio variation flexibly during tracking. Specifically, we present a novel tracking model that integrates 1D Boundary and 2D Center CFs (IBCCF), where boundary and center filters are coupled by a near-orthogonality regularization term. To optimize our IBCCF model, we develop an alternating direction method of multipliers. Experiments on several datasets show that IBCCF can effectively handle aspect ratio variation and achieves state-of-the-art performance in terms of accuracy and robustness.
Tasks Visual Tracking
Published 2017-10-05
URL http://arxiv.org/abs/1710.02039v1
PDF http://arxiv.org/pdf/1710.02039v1.pdf
PWC https://paperswithcode.com/paper/integrating-boundary-and-center-correlation
Repo https://github.com/lifeng9472/IBCCF
Framework none
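A single 1D boundary CF can be sketched with the standard closed-form correlation filter in the Fourier domain. The toy below trains a filter on a 1D intensity profile and localizes the shifted boundary; the paper's full IBCCF additionally couples four boundary filters with a 2D center filter via the near-orthogonality term and solves the joint model with ADMM:

```python
import numpy as np

def train_1d_cf(x, y, lam=1e-2):
    # closed-form ridge-regularized correlation filter in the Fourier domain
    X, Y = np.fft.fft(x), np.fft.fft(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def respond(w_hat, z):
    # correlation response; its peak localizes the boundary along this axis
    return np.real(np.fft.ifft(w_hat * np.fft.fft(z)))

n = 128
x = np.zeros(n); x[40:90] = 1.0                      # profile with a boundary at 40
y = np.exp(-0.5 * ((np.arange(n) - 40) / 2.0) ** 2)  # Gaussian label peaked at 40
w_hat = train_1d_cf(x, y)
z = np.roll(x, 7)                                    # boundary shifted to 47
print(int(respond(w_hat, z).argmax()))               # ~47
```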

Switching Convolutional Neural Network for Crowd Counting

Title Switching Convolutional Neural Network for Crowd Counting
Authors Deepak Babu Sam, Shiv Surya, R. Venkatesh Babu
Abstract We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is compounded by a myriad of factors, such as inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera viewpoints. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures, recurrent networks, and late fusion of features from multi-column CNNs with different receptive fields. We propose a switching convolutional neural network that leverages the variation of crowd density within an image to improve the accuracy and localization of the predicted crowd count. Patches from a grid within a crowd scene are relayed to independent CNN regressors based on the crowd count prediction quality of each CNN established during training. The independent CNN regressors are designed to have different receptive fields, and a switch classifier is trained to relay each crowd scene patch to the best CNN regressor. We perform extensive experiments on all major crowd counting datasets and demonstrate better performance compared to current state-of-the-art methods. We provide interpretable representations of the multichotomy of the space of crowd scene patches inferred from the switch. We observe that the switch relays an image patch to a particular CNN column based on the density of the crowd.
Tasks Crowd Counting
Published 2017-08-01
URL http://arxiv.org/abs/1708.00199v2
PDF http://arxiv.org/pdf/1708.00199v2.pdf
PWC https://paperswithcode.com/paper/switching-convolutional-neural-network-for
Repo https://github.com/surajdakua/Crowd-Counting-Using-Pytorch
Framework pytorch
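A shape-level PyTorch sketch of the switching idea, with toy columns in place of the paper's regressors: a switch classifier picks one density regressor per patch. During training the paper routes patches to whichever regressor counts them best; at test time the trained switch does the routing, as below:

```python
import torch
import torch.nn as nn

class SwitchCNN(nn.Module):
    """Minimal sketch: a switch classifier routes each patch to one of several
    density regressors with different receptive fields. Layer sizes are
    hypothetical, not the paper's R1/R2/R3 columns."""
    def __init__(self):
        super().__init__()
        def column(k):  # small density regressor with kernel size k
            return nn.Sequential(
                nn.Conv2d(3, 16, k, padding=k // 2), nn.ReLU(),
                nn.Conv2d(16, 1, 1))
        self.regressors = nn.ModuleList([column(k) for k in (3, 5, 9)])
        self.switch = nn.Sequential(  # global classifier over the patch
            nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 3))

    def forward(self, patch):
        choice = self.switch(patch).argmax(1)                        # column per patch
        maps = torch.stack([r(patch) for r in self.regressors], 1)   # (B, 3, 1, H, W)
        idx = choice.view(-1, 1, 1, 1, 1).expand(-1, 1, *maps.shape[2:])
        return maps.gather(1, idx).squeeze(1)                        # routed density map

model = SwitchCNN()
density = model(torch.randn(4, 3, 64, 64))
print(density.shape, density.sum(dim=(1, 2, 3)))  # predicted count = density integral
```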

Arbitrary-Oriented Scene Text Detection via Rotation Proposals

Title Arbitrary-Oriented Scene Text Detection via Rotation Proposals
Authors Jianqi Ma, Weiyuan Shao, Hao Ye, Li Wang, Hong Wang, Yingbin Zheng, Xiangyang Xue
Abstract This paper introduces a novel rotation-based framework for arbitrary-oriented text detection in natural scene images. We present the Rotation Region Proposal Networks (RRPN), which are designed to generate inclined proposals with text orientation angle information. The angle information is then adapted for bounding box regression so that the proposals fit the text region more accurately in terms of orientation. The Rotation Region-of-Interest (RRoI) pooling layer is proposed to project arbitrary-oriented proposals to a feature map for a text region classifier. The whole framework is built upon a region-proposal-based architecture, which ensures the computational efficiency of arbitrary-oriented text detection compared with previous text detection systems. We conduct experiments using the rotation-based framework on three real-world scene text detection datasets and demonstrate its superiority in terms of effectiveness and efficiency over previous approaches.
Tasks Scene Text Detection
Published 2017-03-03
URL http://arxiv.org/abs/1703.01086v3
PDF http://arxiv.org/pdf/1703.01086v3.pdf
PWC https://paperswithcode.com/paper/arbitrary-oriented-scene-text-detection-via
Repo https://github.com/kanuore/RRPN
Framework none
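The 5-parameter rotated-box representation (center, size, angle) underlies both the inclined anchors and RRoI pooling. Below is a small geometry helper, not the authors' code, that recovers the corner points of such a proposal:

```python
import numpy as np

def rbox_to_corners(cx, cy, w, h, theta):
    """Corners of an inclined proposal (cx, cy, w, h, angle in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    dx = np.array([-w, w, w, -w]) / 2.0   # offsets of the axis-aligned corners
    dy = np.array([-h, -h, h, h]) / 2.0
    xs = cx + c * dx - s * dy             # rotate, then translate to the center
    ys = cy + s * dx + c * dy
    return np.stack([xs, ys], axis=1)     # (4, 2) corner coordinates

print(rbox_to_corners(100.0, 50.0, 40.0, 10.0, np.pi / 6).round(1))
```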

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Title SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
Authors Christian Bartz, Haojin Yang, Christoph Meinel
Abstract Detecting and recognizing text in natural scene images is a challenging, yet not completely solved, task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition that can be optimized end-to-end. Most existing works consist of multiple deep neural networks and several pre-processing steps. In contrast, we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way. SEE is a network that integrates and jointly learns a spatial transformer network, which can learn to detect text regions in an image, and a text recognition network that takes the identified text regions and recognizes their textual content. We introduce the idea behind our novel approach and show its feasibility by performing a range of experiments on standard benchmark datasets, where we achieve competitive results.
Tasks Scene Text Detection, Scene Text Recognition
Published 2017-12-14
URL http://arxiv.org/abs/1712.05404v1
PDF http://arxiv.org/pdf/1712.05404v1.pdf
PWC https://paperswithcode.com/paper/see-towards-semi-supervised-end-to-end-scene
Repo https://github.com/Bartzi/see
Framework tf
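The detection half of SEE is a spatial transformer: a localization network regresses an affine matrix per text region, and differentiable grid sampling extracts a fixed-size crop for the recognizer. A minimal PyTorch sketch with a hand-set matrix standing in for the predicted one:

```python
import torch
import torch.nn.functional as F

# The affine matrix theta selects a sub-region of the image, which is
# resampled to a fixed-size crop that the recognition network would read.
# In SEE, theta is regressed per text line by a localization network.
image = torch.rand(1, 3, 64, 256)             # (B, C, H, W) scene image
theta = torch.tensor([[[0.4, 0.0, -0.3],      # x-scale / shear / x-shift
                       [0.0, 0.4,  0.2]]])    # shear / y-scale / y-shift
grid = F.affine_grid(theta, size=(1, 3, 32, 128), align_corners=False)
crop = F.grid_sample(image, grid, align_corners=False)
print(crop.shape)  # torch.Size([1, 3, 32, 128]): fixed-size text-region crop
```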

Finite-dimensional Gaussian approximation with linear inequality constraints

Title Finite-dimensional Gaussian approximation with linear inequality constraints
Authors Andrés F. López-Lopera, François Bachoc, Nicolas Durrande, Olivier Roustant
Abstract Introducing inequality constraints in Gaussian process (GP) models can lead to more realistic uncertainties in learning a great variety of real-world problems. We consider the finite-dimensional Gaussian approach from Maatouk and Bay (2017) which can satisfy inequality conditions everywhere (either boundedness, monotonicity or convexity). Our contributions are threefold. First, we extend their approach in order to deal with general sets of linear inequalities. Second, we explore several Markov Chain Monte Carlo (MCMC) techniques to approximate the posterior distribution. Third, we investigate theoretical and numerical properties of the constrained likelihood for covariance parameter estimation. According to experiments on both artificial and real data, our full framework together with a Hamiltonian Monte Carlo-based sampler provides efficient results on both data fitting and uncertainty quantification.
Tasks
Published 2017-10-20
URL http://arxiv.org/abs/1710.07453v1
PDF http://arxiv.org/pdf/1710.07453v1.pdf
PWC https://paperswithcode.com/paper/finite-dimensional-gaussian-approximation
Repo https://github.com/anfelopera/lineqGPR
Framework none
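The finite-dimensional construction is what makes "satisfying constraints everywhere" tractable: with hat basis functions, boundedness of f(x) = Σ_j ξ_j φ_j(x) everywhere reduces to bounds on the coefficients ξ. A naive rejection-sampling sketch of that reduction follows (the paper's contribution is precisely the better MCMC samplers; covariance and sizes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = sum_j xi_j * phi_j(x) with hat functions phi_j centered at the knots;
# f bounded in [lo, hi] everywhere iff every coefficient xi_j is in [lo, hi].
m = 12                                       # number of knots (hypothetical)
knots = np.linspace(0, 1, m)
K = np.exp(-0.5 * (knots[:, None] - knots[None, :]) ** 2 / 0.2 ** 2)  # prior cov
L = np.linalg.cholesky(K + 1e-8 * np.eye(m))

def sample_bounded(lo=-1.0, hi=1.0, max_tries=100_000):
    for _ in range(max_tries):
        xi = L @ rng.standard_normal(m)      # unconstrained Gaussian draw
        if np.all((xi >= lo) & (xi <= hi)):  # the linear inequalities on xi
            return xi
    raise RuntimeError("rejection sampler failed; use an MCMC sampler instead")

xi = sample_bounded()
print(xi.min(), xi.max())   # all coefficients, hence f, within [-1, 1]
```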

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Title PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Authors Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas
Abstract Few prior works study deep learning on point sets. PointNet by Qi et al. is a pioneer in this direction. However, by design PointNet does not capture local structures induced by the metric space the points live in, limiting its ability to recognize fine-grained patterns and its generalizability to complex scenes. In this work, we introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales. Observing further that point sets are usually sampled with varying densities, which greatly degrades the performance of networks trained on uniform densities, we propose novel set learning layers to adaptively combine features from multiple scales. Experiments show that our network, called PointNet++, is able to learn deep point set features efficiently and robustly. In particular, results significantly better than the state of the art have been obtained on challenging benchmarks of 3D point clouds.
Tasks 3D Part Segmentation, Semantic Segmentation
Published 2017-06-07
URL http://arxiv.org/abs/1706.02413v1
PDF http://arxiv.org/pdf/1706.02413v1.pdf
PWC https://paperswithcode.com/paper/pointnet-deep-hierarchical-feature-learning
Repo https://github.com/houseleo/pointnet
Framework tf
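The set-abstraction layers rest on two primitives that are easy to sketch: farthest point sampling to pick centroids, and radius-based grouping around them (multi-scale grouping simply uses several radii). A plain NumPy version:

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k centroids, each the point farthest from those chosen."""
    n = points.shape[0]
    chosen = [0]                                   # start from an arbitrary point
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        d = np.linalg.norm(points - points[chosen[-1]], axis=1)
        dist = np.minimum(dist, d)                 # distance to nearest chosen point
        chosen.append(int(dist.argmax()))
    return np.asarray(chosen)

def ball_query(points, centers, radius, limit=32):
    """Group up to `limit` neighbours within `radius` of each centroid."""
    groups = []
    for c in centers:
        idx = np.flatnonzero(np.linalg.norm(points - c, axis=1) < radius)
        groups.append(idx[:limit])
    return groups

pts = np.random.rand(2048, 3)
centroids = farthest_point_sampling(pts, 64)
groups = ball_query(pts, pts[centroids], radius=0.2)
print(len(groups), [len(g) for g in groups[:4]])
```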

Value Prediction Network

Title Value Prediction Network
Authors Junhyuk Oh, Satinder Singh, Honglak Lee
Abstract This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network. In contrast to typical model-based RL methods, VPN learns a dynamics model whose abstract states are trained to make option-conditional predictions of future values (discounted sum of rewards) rather than of future observations. Our experimental results show that VPN has several advantages over both model-free and model-based baselines in a stochastic environment where careful planning is required but building an accurate observation-prediction model is difficult. Furthermore, VPN outperforms Deep Q-Network (DQN) on several Atari games even with short-lookahead planning, demonstrating its potential as a new way of learning a good state representation.
Tasks Atari Games
Published 2017-07-11
URL http://arxiv.org/abs/1707.03497v2
PDF http://arxiv.org/pdf/1707.03497v2.pdf
PWC https://paperswithcode.com/paper/value-prediction-network
Repo https://github.com/geohot/twitchcoq
Framework none
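The planning step can be sketched abstractly: the learned modules predict reward, discount, and the next abstract state (never the next observation), and a depth-limited backup combines them. The stand-in functions below are toys with hypothetical names; the paper additionally averages backups over several depths:

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = range(3)
value = lambda s: float(s.sum())                      # toy value head V(s)
outcome = lambda s, a: (0.1 * a, 0.99, s + 0.01 * a)  # toy (reward, gamma, next state)

def q_plan(s, a, depth):
    """Q(s, a) from a depth-limited backup: predicted immediate reward plus
    discounted best successor value (depth 1 uses the value head directly)."""
    r, gamma, s2 = outcome(s, a)
    if depth == 1:
        return r + gamma * value(s2)
    return r + gamma * max(q_plan(s2, b, depth - 1) for b in ACTIONS)

s = rng.standard_normal(4)                            # abstract state from encode(obs)
best = max(ACTIONS, key=lambda a: q_plan(s, a, depth=3))
print("greedy action under 3-step planning:", best)
```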

Submanifold Sparse Convolutional Networks

Title Submanifold Sparse Convolutional Networks
Authors Benjamin Graham, Laurens van der Maaten
Abstract Convolutional networks are the de facto standard for analysing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (for instance, photos), many other data sources are inherently sparse. Examples include pen strokes forming on a piece of paper, or (colored) 3D point clouds obtained using a LiDAR scanner or RGB-D camera. Standard “dense” implementations of convolutional networks are very inefficient when applied to such sparse data. We introduce a sparse convolutional operation tailored to processing sparse data that differs from prior work on sparse convolutional networks in that it operates strictly on submanifolds, rather than “dilating” the observation with every layer in the network. Our empirical analysis of the resulting submanifold sparse convolutional networks shows that they perform on par with state-of-the-art methods whilst requiring substantially less computation.
Tasks 3D Part Segmentation
Published 2017-06-05
URL http://arxiv.org/abs/1706.01307v1
PDF http://arxiv.org/pdf/1706.01307v1.pdf
PWC https://paperswithcode.com/paper/submanifold-sparse-convolutional-networks
Repo https://github.com/uber/sbnet
Framework tf
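The defining property is that outputs are computed only at already-active sites, so sparsity is preserved through depth. A dictionary-based sketch of a 3x3 submanifold convolution follows; it is far from an efficient implementation, but it shows the no-dilation rule:

```python
import numpy as np

def submanifold_conv2d(active, weights):
    """Submanifold sparse convolution on a 2D site set: outputs are computed
    ONLY at already-active sites, so the active set never dilates.
    `active` maps (y, x) -> feature vector; `weights` is (3, 3, Cin, Cout)."""
    out = {}
    for (y, x) in active:                  # iterate over active sites only
        acc = np.zeros(weights.shape[-1])
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                nb = active.get((y + dy, x + dx))
                if nb is not None:         # absent neighbours contribute zero
                    acc += nb @ weights[dy + 1, dx + 1]
        out[(y, x)] = acc
    return out

# A pen-stroke-like sparse input: 5 active pixels on an arbitrarily large grid.
sites = {(i, i): np.ones(2) for i in range(5)}
w = np.random.randn(3, 3, 2, 4)
y = submanifold_conv2d(sites, w)
print(len(y), "active sites after the layer")  # still 5: no dilation
```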

Synthetic Medical Images from Dual Generative Adversarial Networks

Title Synthetic Medical Images from Dual Generative Adversarial Networks
Authors John T. Guibas, Tejpal S. Virdi, Peter S. Li
Abstract Currently there is strong interest in data-driven approaches to medical image classification. However, medical imaging data is scarce, expensive, and fraught with legal concerns regarding patient privacy. Typical consent forms only allow for patient data to be used in medical journals or education, meaning the majority of medical data is inaccessible for general public research. We propose a novel two-stage pipeline for generating synthetic medical images from a pair of generative adversarial networks, tested in practice on retinal fundus images. We develop a hierarchical generation process to divide the complex image generation task into two parts: geometry and photorealism. We hope researchers will use our pipeline to bring private medical data into the public domain, sparking growth in imaging tasks that have previously relied on the hand-tuning of models. We have begun this initiative through the development of SynthMed, an online repository for synthetic medical images.
Tasks Image Classification, Image Generation, Medical Image Generation
Published 2017-09-06
URL http://arxiv.org/abs/1709.01872v3
PDF http://arxiv.org/pdf/1709.01872v3.pdf
PWC https://paperswithcode.com/paper/synthetic-medical-images-from-dual-generative
Repo https://github.com/HarshaVardhanVanama/Synthetic-Medical-Images
Framework tf
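A shape-only sketch of the two-stage division of labor: stage one maps noise to geometry (a segmentation-mask-like image), and stage two translates that geometry into a photorealistic image. Both generators below are placeholders, not the paper's architectures:

```python
import torch
import torch.nn as nn

# Stage 1: noise -> geometry (e.g. a vessel-mask-like image).
stage1 = nn.Sequential(nn.Linear(64, 32 * 32), nn.Sigmoid())
# Stage 2: geometry -> photorealistic image (an image-to-image generator).
stage2 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(8, 3, 3, padding=1), nn.Tanh())

z = torch.randn(4, 64)
mask = stage1(z).view(4, 1, 32, 32)   # stage-1 output: synthetic geometry
image = stage2(mask)                  # stage-2 output: synthetic image
print(mask.shape, image.shape)
```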

Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Title Bandit Structured Prediction for Neural Sequence-to-Sequence Learning
Authors Julia Kreutzer, Artem Sokolov, Stefan Riezler
Abstract Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation of a predicted output structure, without access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-based recurrent neural networks. Furthermore, we show how to incorporate control variates into our learning algorithms for variance reduction and improved generalization. We present an evaluation on a neural machine translation task that shows improvements of up to 5.89 BLEU points for domain adaptation from simulated bandit feedback.
Tasks Domain Adaptation, Machine Translation, Stochastic Optimization, Structured Prediction
Published 2017-04-21
URL http://arxiv.org/abs/1704.06497v2
PDF http://arxiv.org/pdf/1704.06497v2.pdf
PWC https://paperswithcode.com/paper/bandit-structured-prediction-for-neural
Repo https://github.com/juliakreutzer/bandit-neuralmonkey
Framework tf
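The learning signal can be sketched in a few lines: sample an output, observe only its scalar task loss, and apply a score-function update with a running baseline as the control variate. The toy below stands in a bare logits table for the attentional seq2seq model, and a trivial loss for 1 - BLEU:

```python
import torch

torch.manual_seed(0)
vocab, T = 5, 4
logits = torch.zeros(T, vocab, requires_grad=True)    # stand-in for a seq2seq policy
opt = torch.optim.SGD([logits], lr=0.5)
task_loss = lambda seq: sum(t != 2 for t in seq) / T  # black-box loss, e.g. 1 - BLEU
baseline = 0.0                                        # running control variate

for step in range(300):
    probs = torch.softmax(logits, dim=1)
    seq = torch.multinomial(probs, 1).squeeze(1)      # sample one output structure
    loss = task_loss(seq.tolist())                    # the only feedback received
    logp = torch.log(probs[torch.arange(T), seq]).sum()
    ((loss - baseline) * logp).backward()             # variance-reduced REINFORCE step
    opt.step(); opt.zero_grad()
    baseline = 0.9 * baseline + 0.1 * loss            # update the control variate

print(torch.softmax(logits, dim=1).argmax(dim=1))     # typically tensor([2, 2, 2, 2])
```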

Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

Title Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics
Authors Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret
Abstract The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the “pendubot” swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.
Tasks Continuous Control, Legged Robots
Published 2017-09-20
URL http://arxiv.org/abs/1709.06917v2
PDF http://arxiv.org/pdf/1709.06917v2.pdf
PWC https://paperswithcode.com/paper/using-parameterized-black-box-priors-to-scale
Repo https://github.com/resibots/blackdrops
Framework none
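The modelling idea, shrunk to one dimension: the learned dynamics model is the parameterized prior plus a data-driven residual, so a good prior needs little data and a wrong prior gets corrected. For brevity the Gaussian-process residual is replaced here by ridge regression on random Fourier features; all names are ours, not the Black-DROPS API:

```python
import numpy as np

rng = np.random.default_rng(1)
true_dyn = lambda x: np.sin(3 * x) + 0.3 * x   # the real (unknown) dynamics
prior_dyn = lambda x, p: p * x                 # parameterized simulator prior

X = rng.uniform(-2, 2, 40)                     # a few interaction samples
Y = true_dyn(X) + 0.01 * rng.standard_normal(40)

p = np.polyfit(X, Y, 1)[0]                     # 1) fit the prior's parameter
resid = Y - prior_dyn(X, p)                    # 2) what the prior gets wrong

# 3) fit the residual with ridge regression on random Fourier features
W = 2.0 * rng.standard_normal(50)
b = rng.uniform(0, 2 * np.pi, 50)
feats = lambda x: np.cos(np.outer(x, W) + b)
A = feats(X)
alpha = np.linalg.solve(A.T @ A + 1e-3 * np.eye(50), A.T @ resid)

model = lambda x: prior_dyn(x, p) + feats(x) @ alpha   # prior + learned residual
xs = np.linspace(-2, 2, 5)
print(np.round(model(xs) - true_dyn(xs), 3))           # errors should be near zero
```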

Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages

Title Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages
Authors Xiang Yu, Ngoc Thang Vu
Abstract We present a transition-based dependency parser that uses a convolutional neural network to compose word representations from characters. The character composition model shows great improvement over the word-lookup model, especially for parsing agglutinative languages. These improvements are even larger than those obtained with word embeddings pre-trained on extra data. On the SPMRL data sets, our system outperforms the previous best greedy parser (Ballesteros et al., 2015) by a margin of 3% on average.
Tasks Dependency Parsing, Word Embeddings
Published 2017-05-30
URL http://arxiv.org/abs/1705.10814v1
PDF http://arxiv.org/pdf/1705.10814v1.pdf
PWC https://paperswithcode.com/paper/character-composition-model-with
Repo https://github.com/EggplantElf/sclem2017-tagger
Framework none
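The character composition model is a standard convolution-plus-max-pooling composer over character embeddings. A PyTorch sketch with illustrative (not the paper's) dimensions:

```python
import torch
import torch.nn as nn

class CharCNNWord(nn.Module):
    """Compose a word vector from its characters: embed characters, convolve
    over the character sequence, then max-pool over time."""
    def __init__(self, n_chars=100, char_dim=32, word_dim=64, width=3):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, word_dim, width, padding=width // 2)

    def forward(self, char_ids):                 # (batch, max_word_len)
        x = self.emb(char_ids).transpose(1, 2)   # (batch, char_dim, len)
        return torch.relu(self.conv(x)).max(dim=2).values  # (batch, word_dim)

model = CharCNNWord()
words = torch.randint(1, 100, (8, 12))           # 8 words, 12 characters each
print(model(words).shape)                        # torch.Size([8, 64])
```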