May 7, 2019

3306 words 16 mins read

Paper Group AWR 24

Visual Question Answering: A Survey of Methods and Datasets

Title Visual Question Answering: A Survey of Methods and Datasets
Authors Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, Anton van den Hengel
Abstract Visual Question Answering (VQA) is a challenging task that has received increasing attention from both the computer vision and the natural language processing communities. Given an image and a question in natural language, it requires reasoning over visual elements of the image and general knowledge to infer the correct answer. In the first part of this survey, we examine the state of the art by comparing modern approaches to the problem. We classify methods by their mechanism to connect the visual and textual modalities. In particular, we examine the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space. We also discuss memory-augmented and modular architectures that interface with structured knowledge bases. In the second part of this survey, we review the datasets available for training and evaluating VQA systems. The various datasets contain questions at different levels of complexity, which require different capabilities and types of reasoning. We examine in depth the question/answer pairs from the Visual Genome project, and evaluate the relevance of the structured annotations of images with scene graphs for VQA. Finally, we discuss promising future directions for the field, in particular the connection to structured knowledge bases and the use of natural language processing models.
Tasks Visual Question Answering
Published 2016-07-20
URL http://arxiv.org/abs/1607.05910v1
PDF http://arxiv.org/pdf/1607.05910v1.pdf
PWC https://paperswithcode.com/paper/visual-question-answering-a-survey-of-methods
Repo https://github.com/AI-metrics/AI-metrics
Framework none
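
As a rough illustration of the joint-embedding approach the survey highlights (a CNN for the image, an RNN for the question, fused in a common space), here is a minimal PyTorch sketch. All layer sizes, the element-wise fusion, and the assumption of precomputed CNN image features are illustrative choices, not prescriptions from the survey.

```python
import torch
import torch.nn as nn

class JointEmbeddingVQA(nn.Module):
    """Toy CNN+RNN joint-embedding VQA model (all sizes are assumptions)."""
    def __init__(self, vocab_size, num_answers,
                 embed_dim=300, hidden_dim=512, img_feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.question_rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)  # CNN features -> common space
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, img_feats, question_tokens):
        # question_tokens: (batch, seq_len); img_feats: (batch, img_feat_dim)
        _, (h, _) = self.question_rnn(self.embed(question_tokens))
        q = h[-1]                                 # final hidden state encodes the question
        v = torch.tanh(self.img_proj(img_feats))  # image embedded in the same space
        return self.classifier(q * v)             # element-wise fusion, one common choice
```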

Multiframe Motion Coupling for Video Super Resolution

Title Multiframe Motion Coupling for Video Super Resolution
Authors Jonas Geiping, Hendrik Dirks, Daniel Cremers, Michael Moeller
Abstract The idea of video super resolution is to use different view points of a single scene to enhance the overall resolution and quality. Classical energy minimization approaches first establish a correspondence of the current frame to all its neighbors in some radius and then use this temporal information for enhancement. In this paper, we propose the first variational super resolution approach that computes several super resolved frames in one batch optimization procedure by incorporating motion information between the high-resolution image frames themselves. As a consequence, the number of motion estimation problems grows linearly in the number of frames, as opposed to the quadratic growth of classical methods, and temporal consistency is enforced naturally. We use infimal convolution regularization as well as an automatic parameter balancing scheme to automatically determine the reliability of the motion information and reweight the regularization locally. We demonstrate that our approach yields state-of-the-art results and is even competitive with machine learning approaches.
Tasks Motion Estimation, Super-Resolution, Video Super-Resolution
Published 2016-11-23
URL http://arxiv.org/abs/1611.07767v2
PDF http://arxiv.org/pdf/1611.07767v2.pdf
PWC https://paperswithcode.com/paper/multiframe-motion-coupling-for-video-super
Repo https://github.com/HendrikMuenster/superResolution
Framework none
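
A quick back-of-the-envelope check of the complexity claim above: coupling motion between consecutive high-resolution frames needs one flow field per frame pair, whereas matching every frame against every other frame in the batch (the worst case of the classical radius-based strategy) grows quadratically. The counts below are a simplification for illustration only.

```python
def coupled_flows(num_frames):
    # proposed scheme: one motion field between each consecutive HR frame pair
    return num_frames - 1

def classical_flows(num_frames):
    # classical worst case: each frame matched against every other frame
    return num_frames * (num_frames - 1)

for n in (5, 15, 45):
    print(n, coupled_flows(n), classical_flows(n))  # linear vs. quadratic growth
```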

On the Quantitative Analysis of Decoder-Based Generative Models

Title On the Quantitative Analysis of Decoder-Based Generative Models
Authors Yuhuai Wu, Yuri Burda, Ruslan Salakhutdinov, Roger Grosse
Abstract The past several years have seen remarkable progress in generative models which produce convincing samples of images and other modalities. A shared component of many powerful generative models is a decoder network, a parametric deep neural net that defines a generative distribution. Examples include variational autoencoders, generative adversarial networks, and generative moment matching networks. Unfortunately, it can be difficult to quantify the performance of these models because of the intractability of log-likelihood estimation, and inspecting samples can be misleading. We propose to use Annealed Importance Sampling for evaluating log-likelihoods for decoder-based models and validate its accuracy using bidirectional Monte Carlo. The evaluation code is provided at https://github.com/tonywu95/eval_gen. Using this technique, we analyze the performance of decoder-based models, the effectiveness of existing log-likelihood estimators, the degree of overfitting, and the degree to which these models miss important modes of the data distribution.
Tasks
Published 2016-11-14
URL http://arxiv.org/abs/1611.04273v2
PDF http://arxiv.org/pdf/1611.04273v2.pdf
PWC https://paperswithcode.com/paper/on-the-quantitative-analysis-of-decoder-based
Repo https://github.com/tonywu95/eval_gen
Framework tf
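
To make the evaluation technique concrete, here is a minimal NumPy sketch of Annealed Importance Sampling estimating the log normalizer of a 1D unnormalized density, where the true answer is known. The annealing schedule, step size, and Metropolis kernel are arbitrary choices for illustration; the released evaluation code handles the decoder-based case properly.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p0(x):   # tractable initial distribution: standard normal (normalized)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_f(x):    # unnormalized target: N(3, 0.5^2) without its constant
    return -0.5 * ((x - 3.0) / 0.5) ** 2

def ais(num_chains=2000, num_steps=200, step=0.5):
    betas = np.linspace(0.0, 1.0, num_steps + 1)
    x = rng.standard_normal(num_chains)   # exact samples from p0
    log_w = np.zeros(num_chains)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        log_w += (b1 - b0) * (log_f(x) - log_p0(x))   # importance-weight update
        # one Metropolis step targeting the intermediate distribution at b1
        log_pt = lambda z: (1 - b1) * log_p0(z) + b1 * log_f(z)
        prop = x + step * rng.standard_normal(num_chains)
        accept = np.log(rng.random(num_chains)) < log_pt(prop) - log_pt(x)
        x = np.where(accept, prop, x)
    return log_w

log_w = ais()
m = log_w.max()
print("AIS log Z:", m + np.log(np.mean(np.exp(log_w - m))))   # log-mean-exp
print("true log Z:", np.log(np.sqrt(2 * np.pi) * 0.5))
```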

Learning to Poke by Poking: Experiential Learning of Intuitive Physics

Title Learning to Poke by Poking: Experiential Learning of Intuitive Physics
Authors Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine
Abstract We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics. Our model is evaluated on a real-world robotic manipulation task that requires displacing objects to target locations by poking. The robot gathered over 400 hours of experience by executing more than 100K pokes on different objects. We propose a novel approach based on deep neural networks for modeling the dynamics of the robot's interactions directly from images, by jointly estimating forward and inverse models of dynamics. The inverse model objective provides supervision to construct informative visual features, which the forward model can then predict and in turn regularize the feature space for the inverse model. The interplay between these two objectives creates useful, accurate models that can then be used for multi-step decision making. This formulation has the additional benefit that it is possible to learn forward models in an abstract feature space and thus alleviate the need to predict pixels. Our experiments show that this joint modeling approach outperforms alternative methods.
Tasks Decision Making
Published 2016-06-23
URL http://arxiv.org/abs/1606.07419v2
PDF http://arxiv.org/pdf/1606.07419v2.pdf
PWC https://paperswithcode.com/paper/learning-to-poke-by-poking-experiential
Repo https://github.com/mbhenaff/EEN
Framework pytorch
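
The joint objective lends itself to a compact sketch. The following PyTorch skeleton assumes grayscale 64x64 images and a continuous action vector regressed with an MSE loss; the paper actually discretizes poke parameters, so treat this purely as an illustration of how the inverse loss supervises the features that the forward model predicts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDynamics(nn.Module):
    def __init__(self, feat_dim=256, action_dim=4):
        super().__init__()
        # images assumed (batch, 1, 64, 64); a real encoder would be a CNN
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, feat_dim), nn.ReLU())
        self.inverse = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ReLU(),
                                     nn.Linear(256, action_dim))
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + action_dim, 256), nn.ReLU(),
                                           nn.Linear(256, feat_dim))

    def losses(self, img_t, img_t1, action):
        f_t, f_t1 = self.encoder(img_t), self.encoder(img_t1)
        action_hat = self.inverse(torch.cat([f_t, f_t1], dim=1))
        f_t1_hat = self.forward_model(torch.cat([f_t, action], dim=1))
        inv_loss = F.mse_loss(action_hat, action)        # shapes the visual features
        # forward loss lives in feature space, so pixels are never predicted;
        # detaching the target is a common stabilization choice, not from the paper
        fwd_loss = F.mse_loss(f_t1_hat, f_t1.detach())
        return inv_loss, fwd_loss
```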

The GPU-based Parallel Ant Colony System

Title The GPU-based Parallel Ant Colony System
Authors Rafał Skinderowicz
Abstract The Ant Colony System (ACS) is, next to Ant Colony Optimization (ACO) and the MAX-MIN Ant System (MMAS), one of the most efficient metaheuristic algorithms inspired by the behavior of ants. In this article we present three novel parallel versions of the ACS for graphics processing units (GPUs). To the best of our knowledge, this is the first such work on the ACS. The ACS shares many key elements with the ACO and the MMAS, but differences in the process of building solutions and updating the pheromone trails make obtaining an efficient parallel version for GPUs a difficult task. The proposed parallel versions of the ACS differ mainly in their implementations of the pheromone memory. The first two use the standard pheromone matrix, and the third uses a novel selective pheromone memory. Computational experiments conducted on several Travelling Salesman Problem (TSP) instances of sizes ranging from 198 to 2392 cities showed that the parallel ACS on an Nvidia Kepler GK104 GPU (1536 CUDA cores) is able to obtain a speedup of up to 24.29x vs. the sequential ACS running on a single core of an Intel Xeon E5-2670 CPU. The parallel ACS with the selective pheromone memory achieved speedups of up to 16.85x, but in most cases the obtained solutions were of significantly better quality than for the sequential ACS.
Tasks
Published 2016-05-09
URL http://arxiv.org/abs/1605.02669v2
PDF http://arxiv.org/pdf/1605.02669v2.pdf
PWC https://paperswithcode.com/paper/the-gpu-based-parallel-ant-colony-system
Repo https://github.com/RSkinderowicz/GPUBasedACS
Framework none
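
For readers unfamiliar with what sets the ACS apart, the two pheromone updates below are the operations the parallel versions must implement efficiently: a local update applied as each ant crosses an edge, and a global update applied only along the best tour. This is a sequential NumPy sketch with commonly used default parameters, not the paper's GPU code.

```python
import numpy as np

def local_update(pheromone, i, j, tau0, xi=0.1):
    # evaporate the just-used edge toward tau0 to diversify later ants
    pheromone[i, j] = (1 - xi) * pheromone[i, j] + xi * tau0
    pheromone[j, i] = pheromone[i, j]

def global_update(pheromone, best_tour, best_len, rho=0.1):
    # only the best-so-far tour deposits pheromone
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        pheromone[i, j] = (1 - rho) * pheromone[i, j] + rho / best_len
        pheromone[j, i] = pheromone[i, j]

n, tau0 = 5, 0.01
pher = np.full((n, n), tau0)          # symmetric TSP pheromone matrix
local_update(pher, 0, 1, tau0)
global_update(pher, [0, 2, 4, 1, 3], best_len=100.0)
```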

3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study

Title 3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study
Authors J. Dolz, C. Desrosiers, I. Ben Ayed
Abstract This study investigates a 3D and fully convolutional neural network (CNN) for subcortical brain structure segmentation in MRI. 3D CNN architectures have been generally avoided due to their computational and memory requirements during inference. We address the problem via small kernels, allowing deeper architectures. We further model both local and global context by embedding intermediate-layer outputs in the final prediction, which encourages consistency between features extracted at different scales and embeds fine-grained information directly in the segmentation process. Our model is efficiently trained end-to-end on a graphics processing unit (GPU), in a single stage, exploiting the dense inference capabilities of fully convolutional networks. We performed comprehensive experiments over two publicly available datasets. First, we demonstrate state-of-the-art performance on the IBSR dataset. Then, we report a large-scale multi-site evaluation over 1112 unregistered subject datasets acquired from 17 different sites (ABIDE dataset), with ages ranging from 7 to 64 years, showing that our method is robust to various acquisition protocols, demographics and clinical factors. Our method yielded segmentations that are highly consistent with a standard atlas-based approach, while running in a fraction of the time needed by atlas-based methods and avoiding registration/normalization steps. This makes it convenient for massive multi-site neuroanatomical imaging studies. To the best of our knowledge, our work is the first to study subcortical structure segmentation on such large-scale and heterogeneous data.
Tasks 3D Medical Imaging Segmentation, Brain Segmentation, Medical Image Segmentation
Published 2016-12-12
URL http://arxiv.org/abs/1612.03925v2
PDF http://arxiv.org/pdf/1612.03925v2.pdf
PWC https://paperswithcode.com/paper/3d-fully-convolutional-networks-for
Repo https://github.com/josedolz/LiviaNET
Framework pytorch
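
The two architectural ideas called out in the abstract (stacks of small 3x3x3 kernels, and intermediate-layer features concatenated into the final prediction) can be sketched briefly. Channel counts and padding below are illustrative assumptions; the actual LiviaNET architecture differs in depth and detail.

```python
import torch
import torch.nn as nn

class Small3DFCN(nn.Module):
    def __init__(self, in_ch=1, num_classes=9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv3d(in_ch, 25, 3, padding=1), nn.PReLU())
        self.block2 = nn.Sequential(nn.Conv3d(25, 50, 3, padding=1), nn.PReLU())
        self.block3 = nn.Sequential(nn.Conv3d(50, 75, 3, padding=1), nn.PReLU())
        # the classifier sees features from all three depths (multi-scale context)
        self.classifier = nn.Conv3d(25 + 50 + 75, num_classes, kernel_size=1)

    def forward(self, x):                 # x: (batch, in_ch, D, H, W)
        f1 = self.block1(x)
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        return self.classifier(torch.cat([f1, f2, f3], dim=1))
```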

Factored Neural Machine Translation

Title Factored Neural Machine Translation
Authors Mercedes García-Martínez, Loïc Barrault, Fethi Bougares
Abstract We present a new approach for neural machine translation (NMT) using the morphological and grammatical decomposition of the words (factors) in the output side of the neural network. This architecture addresses two main problems occurring in MT, namely dealing with a large target language vocabulary and out-of-vocabulary (OOV) words. By means of factors, we are able to handle a larger vocabulary and reduce the training time (for systems with an equivalent target language vocabulary size). In addition, we can produce new words that are not in the vocabulary. We use a morphological analyser to get a factored representation of each word (lemma, part-of-speech tag, tense, person, gender and number). We have extended the NMT approach with an attention mechanism in order to have two different outputs, one for the lemmas and the other for the rest of the factors. The final translation is built using some a priori linguistic information. We compare our extension with a word-based NMT system. The experiments, performed on the IWSLT'15 dataset translating from English to French, show that while the performance does not always increase, the system can manage a much larger vocabulary and consistently reduce the OOV rate. We observe an improvement of up to 2 BLEU points in a simulated out-of-domain translation setup.
Tasks Machine Translation
Published 2016-09-15
URL http://arxiv.org/abs/1609.04621v1
PDF http://arxiv.org/pdf/1609.04621v1.pdf
PWC https://paperswithcode.com/paper/factored-neural-machine-translation
Repo https://github.com/lium-lst/nmtpy
Framework none

An Adaptive Test of Independence with Analytic Kernel Embeddings

Title An Adaptive Test of Independence with Analytic Kernel Embeddings
Authors Wittawat Jitkrittum, Zoltan Szabo, Arthur Gretton
Abstract A new computationally efficient dependence measure, and an adaptive statistical test of independence, are proposed. The dependence measure is the difference between analytic embeddings of the joint distribution and the product of the marginals, evaluated at a finite set of locations (features). These features are chosen so as to maximize a lower bound on the test power, resulting in a test that is data-efficient, and that runs in linear time (with respect to the sample size n). The optimized features can be interpreted as evidence to reject the null hypothesis, indicating regions in the joint domain where the joint distribution and the product of the marginals differ most. Consistency of the independence test is established, for an appropriate choice of features. In real-world benchmarks, independence tests using the optimized features perform comparably to the state-of-the-art quadratic-time HSIC test, and outperform competing O(n) and O(n log n) tests.
Tasks
Published 2016-10-15
URL http://arxiv.org/abs/1610.04782v1
PDF http://arxiv.org/pdf/1610.04782v1.pdf
PWC https://paperswithcode.com/paper/an-adaptive-test-of-independence-with
Repo https://github.com/wittawatj/fsic-test
Framework none
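
The core of the dependence measure is easy to state in code: evaluate the witness u(v, w) = E[k(X, v) l(Y, w)] - E[k(X, v)] E[l(Y, w)] at a finite set of locations. The sketch below omits the normalization and the test-power optimization of the locations that the paper adds, and uses arbitrary bandwidths and random locations.

```python
import numpy as np

def gauss(a, b, bandwidth):
    # Gaussian kernel between rows of a (n, d) and b (J, d) -> (n, J)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def fsic_witness(x, y, vx, vy, bw=1.0):
    Kx, Ly = gauss(x, vx, bw), gauss(y, vy, bw)
    joint = (Kx * Ly).mean(axis=0)                  # E[k(X,v) l(Y,w)]
    prod = Kx.mean(axis=0) * Ly.mean(axis=0)        # E[k(X,v)] E[l(Y,w)]
    return joint - prod    # large magnitude = evidence against independence

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 1))
y_dep = x + 0.1 * rng.standard_normal((500, 1))     # dependent pair
y_ind = rng.standard_normal((500, 1))               # independent pair
v = rng.standard_normal((5, 1))                     # 5 random test locations
print(np.abs(fsic_witness(x, y_dep, v, v)).max())   # noticeably larger
print(np.abs(fsic_witness(x, y_ind, v, v)).max())   # near zero
```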

Densely Connected Convolutional Networks

Title Densely Connected Convolutional Networks
Authors Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger
Abstract Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections - one between each layer and its subsequent layer - our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet .
Tasks Image Classification, Object Recognition
Published 2016-08-25
URL http://arxiv.org/abs/1608.06993v5
PDF http://arxiv.org/pdf/1608.06993v5.pdf
PWC https://paperswithcode.com/paper/densely-connected-convolutional-networks
Repo https://github.com/andreasveit/densenet-pytorch
Framework pytorch
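
Dense connectivity is compact enough to sketch directly: each layer consumes the concatenation of all preceding feature maps, which is where the L(L+1)/2 connection count comes from. The block below uses the basic composite function only; growth rate and depth are placeholders, and bottleneck/compression variants are omitted.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            ch = in_ch + i * growth_rate      # input channels grow by concatenation
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, growth_rate, kernel_size=3, padding=1)))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)     # all feature maps flow onward

block = DenseBlock(in_ch=16)
out = block(torch.randn(1, 16, 32, 32))       # -> (1, 16 + 4*12, 32, 32)
```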

Boundary-based MWE segmentation with text partitioning

Title Boundary-based MWE segmentation with text partitioning
Authors Jake Ryland Williams
Abstract This work presents a fine-grained, text-chunking algorithm designed for the task of multiword expression (MWE) segmentation. As a lexical class, MWEs include a wide variety of idioms, whose automatic identification is a necessity for the handling of colloquial language. This algorithm's core novelty is its use of non-word tokens, i.e., boundaries, in a bottom-up strategy. Leveraging boundaries refines token-level information, forging high-level performance from relatively basic data. The generality of this model's feature space allows for its application across languages and domains. Experiments spanning 19 different languages exhibit a broadly applicable, state-of-the-art model. Evaluation against recent shared-task data places text partitioning as the overall best-performing MWE segmentation algorithm, covering all MWE classes and multiple English domains (including user-generated text). This performance, coupled with a non-combinatorial, fast-running design, produces an ideal combination for implementations at scale, which are facilitated through the release of open-source software.
Tasks Chunking
Published 2016-08-05
URL http://arxiv.org/abs/1608.02025v3
PDF http://arxiv.org/pdf/1608.02025v3.pdf
PWC https://paperswithcode.com/paper/boundary-based-mwe-segmentation-with-text-1
Repo https://github.com/jakerylandwilliams/partitioner
Framework none

Neural Structural Correspondence Learning for Domain Adaptation

Title Neural Structural Correspondence Learning for Domain Adaptation
Authors Yftah Ziser, Roi Reichart
Abstract Domain adaptation, adapting models from domains rich in labeled training data to domains poor in such data, is a fundamental NLP challenge. We introduce a neural network model that marries ideas from two prominent strands of research on domain adaptation through representation learning: structural correspondence learning (SCL, (Blitzer et al., 2006)) and autoencoder neural networks. In particular, our model is a three-layer neural network that learns to encode the nonpivot features of an input example into a low-dimensional representation, so that the existence of pivot features (features that are prominent in both domains and convey useful information for the NLP task) in the example can be decoded from that representation. The low-dimensional representation is then employed in a learning algorithm for the task. Moreover, we show how to inject pre-trained word embeddings into our model in order to improve generalization across examples with similar pivot features. On the task of cross-domain product sentiment classification (Blitzer et al., 2007), consisting of 12 domain pairs, our model outperforms both the SCL and the marginalized stacked denoising autoencoder (MSDA, (Chen et al., 2012)) methods by 3.77% and 2.17% respectively, on average across domain pairs.
Tasks Denoising, Domain Adaptation, Representation Learning, Sentiment Analysis, Word Embeddings
Published 2016-10-05
URL http://arxiv.org/abs/1610.01588v3
PDF http://arxiv.org/pdf/1610.01588v3.pdf
PWC https://paperswithcode.com/paper/neural-structural-correspondence-learning-for
Repo https://github.com/yftah89/Neural-SCLDomain-Adaptation
Framework none
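
The model itself is small enough to sketch: a low-dimensional encoding of the non-pivot features is trained to predict which pivot features occur in the example (a multi-label objective), and that encoding is then reused as the task representation. Dimensions below are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralSCL(nn.Module):
    def __init__(self, num_nonpivots=50000, num_pivots=500, hidden_dim=100):
        super().__init__()
        self.encode = nn.Linear(num_nonpivots, hidden_dim)
        self.decode = nn.Linear(hidden_dim, num_pivots)

    def forward(self, nonpivot_feats):
        h = torch.sigmoid(self.encode(nonpivot_feats))  # reused downstream
        return self.decode(h), h

model = NeuralSCL()
logits, rep = model(torch.rand(8, 50000))
pivot_labels = torch.randint(0, 2, (8, 500)).float()    # which pivots occur
loss = F.binary_cross_entropy_with_logits(logits, pivot_labels)
```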

An Investigation of Recurrent Neural Architectures for Drug Name Recognition

Title An Investigation of Recurrent Neural Architectures for Drug Name Recognition
Authors Raghavendra Chalapathy, Ehsan Zare Borzeshi, Massimo Piccardi
Abstract Drug name recognition (DNR) is an essential step in the Pharmacovigilance (PV) pipeline. DNR aims to find drug name mentions in unstructured biomedical texts and classify them into predefined categories. State-of-the-art DNR approaches rely heavily on hand-crafted features and domain-specific resources which are difficult to collect and tune. For this reason, this paper investigates the effectiveness of contemporary recurrent neural architectures - the Elman and Jordan networks and the bidirectional LSTM with CRF decoding - at performing DNR straight from the text. The experimental results achieved on the authoritative SemEval-2013 Task 9.1 benchmarks show that the bidirectional LSTM-CRF ranks closely to highly-dedicated, hand-crafted systems.
Tasks
Published 2016-09-24
URL http://arxiv.org/abs/1609.07585v1
PDF http://arxiv.org/pdf/1609.07585v1.pdf
PWC https://paperswithcode.com/paper/an-investigation-of-recurrent-neural
Repo https://github.com/raghavchalapathy/dnr
Framework none
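
As a reference point for the architectures compared, here is the skeleton of a bidirectional LSTM tagger producing per-token emission scores; the CRF decoding layer that the best-performing variant adds on top is omitted for brevity, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=20000, num_labels=9, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden_dim, num_labels)  # a CRF would consume these

    def forward(self, tokens):               # tokens: (batch, seq_len)
        out, _ = self.lstm(self.embed(tokens))
        return self.emissions(out)           # (batch, seq_len, num_labels)
```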

Wide-Slice Residual Networks for Food Recognition

Title Wide-Slice Residual Networks for Food Recognition
Authors Niki Martinel, Gian Luca Foresti, Christian Micheloni
Abstract Food diary applications represent a tantalizing market. Such applications, based on image food recognition, have opened up new challenges for computer vision and pattern recognition algorithms. Recent works in the field focus either on hand-crafted representations or on learning these by exploiting deep neural networks. Despite the success of the latter family of works, these generally exploit off-the-shelf deep architectures to classify food dishes. Thus, the architectures are not cast to the specific problem. We believe that better results can be obtained if the deep architecture is defined with respect to an analysis of the food composition. Following such an intuition, this work introduces a new deep scheme that is designed to handle the food structure. Specifically, inspired by the recent success of deep residual networks, we exploit such a learning scheme and introduce a slice convolution block to capture the vertical food layers. Outputs of the deep residual blocks are combined with the sliced convolution to produce the classification score for specific food categories. To evaluate our proposed architecture we conducted experiments on three benchmark datasets. Results demonstrate that our solution outperforms existing approaches (e.g., a top-1 accuracy of 90.27% on the challenging Food-101 dataset).
Tasks Image Classification
Published 2016-12-20
URL http://arxiv.org/abs/1612.06543v1
PDF http://arxiv.org/pdf/1612.06543v1.pdf
PWC https://paperswithcode.com/paper/wide-slice-residual-networks-for-food
Repo https://github.com/fishba11/food-1010keras
Framework tf
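
The distinctive piece is the slice convolution: a kernel spanning the full image width, so that activations respond to horizontal layers of a dish. The sketch below guesses the kernel height and channel count for illustration; in the full model this branch is combined with a residual branch before classification.

```python
import torch
import torch.nn as nn

class SliceConvBranch(nn.Module):
    def __init__(self, in_ch=3, out_ch=320, img_width=224):
        super().__init__()
        # a few rows tall, the entire image wide: one response per "food layer"
        self.slice_conv = nn.Conv2d(in_ch, out_ch, kernel_size=(5, img_width))
        self.pool = nn.AdaptiveMaxPool2d((1, 1))

    def forward(self, x):                    # x: (batch, 3, 224, 224)
        return self.pool(torch.relu(self.slice_conv(x))).flatten(1)

branch = SliceConvBranch()
print(branch(torch.randn(2, 3, 224, 224)).shape)   # -> torch.Size([2, 320])
```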

Neural-based Noise Filtering from Word Embeddings

Title Neural-based Noise Filtering from Word Embeddings
Authors Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu
Abstract Word embeddings have been demonstrated to benefit NLP tasks impressively. Yet, there is room for improvement in the vector representations, because current word embeddings typically contain unnecessary information, i.e., noise. We propose two novel models to improve word embeddings by unsupervised learning, in order to yield word denoising embeddings. The word denoising embeddings are obtained by strengthening salient information and weakening noise in the original word embeddings, based on a deep feed-forward neural network filter. Results from benchmark tasks show that the filtered word denoising embeddings outperform the original word embeddings.
Tasks Denoising, Word Embeddings
Published 2016-10-06
URL http://arxiv.org/abs/1610.01874v1
PDF http://arxiv.org/pdf/1610.01874v1.pdf
PWC https://paperswithcode.com/paper/neural-based-noise-filtering-from-word
Repo https://github.com/nguyenkh/NeuralDenoising
Framework none
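
Structurally, the filter is a small feed-forward network mapping pretrained embeddings to denoised ones. The sketch below pairs it with a plain reconstruction loss as a stand-in; the paper's actual training objectives for strengthening salient dimensions are more involved and not reproduced here.

```python
import torch
import torch.nn as nn

class DenoisingFilter(nn.Module):
    def __init__(self, dim=300, hidden=900):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),   # overcomplete hidden layer
            nn.Linear(hidden, dim))

    def forward(self, embeddings):
        return self.net(embeddings)              # denoised embeddings

filt = DenoisingFilter()
emb = torch.randn(32, 300)                        # pretrained vectors (stand-in)
loss = nn.functional.mse_loss(filt(emb), emb)     # placeholder objective
```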

A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

Title A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
Authors Chelsea Finn, Paul Christiano, Pieter Abbeel, Sergey Levine
Abstract Generative adversarial networks (GANs) are a recently proposed class of generative models in which a generator is trained to optimize a cost function that is being simultaneously learned by a discriminator. While the idea of learning cost functions is relatively new to the field of generative modeling, learning costs has long been studied in control and reinforcement learning (RL) domains, typically for imitation learning from demonstrations. In these fields, learning the cost function underlying observed behavior is known as inverse reinforcement learning (IRL) or inverse optimal control. While at first the connection between cost learning in RL and cost learning in generative modeling may appear to be a superficial one, we show in this paper that certain IRL methods are in fact mathematically equivalent to GANs. In particular, we demonstrate an equivalence between a sample-based algorithm for maximum entropy IRL and a GAN in which the generator's density can be evaluated and is provided as an additional input to the discriminator. Interestingly, maximum entropy IRL is a special case of an energy-based model. We discuss the interpretation of GANs as an algorithm for training energy-based models, and relate this interpretation to other recent work that seeks to connect GANs and EBMs. By formally highlighting the connection between GANs, IRL, and EBMs, we hope that researchers in all three communities can better identify and apply transferable ideas from one domain to another, particularly for developing more stable and scalable algorithms: a major challenge in all three domains.
Tasks Imitation Learning
Published 2016-11-11
URL http://arxiv.org/abs/1611.03852v3
PDF http://arxiv.org/pdf/1611.03852v3.pdf
PWC https://paperswithcode.com/paper/a-connection-between-generative-adversarial
Repo https://github.com/hsilva664/nstep_airl
Framework tf
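
The hinge of the equivalence is the special discriminator form: when the generator's density q(x) can be evaluated, the discriminator is parameterized by a cost alone, D(x) = p_theta(x) / (p_theta(x) + q(x)) with p_theta(x) proportional to exp(-c_theta(x)). Below is that form computed stably in log-space; the densities passed in are placeholders.

```python
import numpy as np

def log_discriminator(log_p_theta, log_q):
    # log D(x) = log p_theta(x) - logsumexp(log p_theta(x), log q(x))
    m = np.maximum(log_p_theta, log_q)
    return log_p_theta - (m + np.log(np.exp(log_p_theta - m) + np.exp(log_q - m)))

# wherever the generator density matches p_theta, the discriminator outputs 1/2
print(np.exp(log_discriminator(np.log(0.3), np.log(0.3))))   # -> 0.5
```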