July 29, 2019


Paper Group AWR 89

Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering. Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation. Variational Attention for Sequence-to-Sequence Models. MoNoise: Modeling Noise Using a Modular Normalization System. Exploring the structure of a real-time, arbitrary neura …

Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering

Title Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
Authors Vahid Kazemi, Ali Elqursh
Abstract This paper presents a new baseline for the visual question answering task. Given an image and a question in natural language, our model produces accurate answers according to the content of the image. Our model, while being architecturally simple and relatively small in terms of trainable parameters, sets a new state of the art on both unbalanced and balanced VQA benchmarks. On the VQA 1.0 open-ended challenge, our model achieves 64.6% accuracy on the test-standard set without using additional data, an improvement of 0.4% over the state of the art, and on the newly released VQA 2.0, our model scores 59.7% on the validation set, outperforming the best previously reported results by 0.5%. The results presented in this paper are especially interesting because very similar models have been tried before but significantly lower performance was reported. In light of the new results we hope to see more meaningful research on visual question answering in the future.
Tasks Visual Question Answering
Published 2017-04-11
URL http://arxiv.org/abs/1704.03162v2
PDF http://arxiv.org/pdf/1704.03162v2.pdf
PWC https://paperswithcode.com/paper/show-ask-attend-and-answer-a-strong-baseline
Repo https://github.com/mshahbazi72/visual-question-answering
Framework none

Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation

Title Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation
Authors Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, Wangmeng Zuo
Abstract In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore the changes of class prior distributions, i.e., class weight bias across domains. This remains an open problem but ubiquitous for domain adaptation, which can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias and results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD for exploiting the class prior probability on source and target domains, whose challenge lies in the fact that the class label in target domain is unavailable. To account for it, our proposed weighted MMD model is defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested by alternating between assigning the pseudo-labels, estimating auxiliary weights and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2017-05-01
URL http://arxiv.org/abs/1705.00609v1
PDF http://arxiv.org/pdf/1705.00609v1.pdf
PWC https://paperswithcode.com/paper/mind-the-class-weight-bias-weighted-maximum
Repo https://github.com/yhldhit/WMMD-Caffe
Framework none
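
The class-weighted MMD described in the abstract above can be illustrated with a minimal sketch. It assumes a Gaussian kernel, a biased estimator, and target class priors that are simply handed in as `target_priors` (the abstract says the paper estimates them with pseudo-labels, which is not reproduced here). The function and parameter names are hypothetical, not taken from the WMMD-Caffe repository.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def weighted_mmd2(source, source_labels, target, target_priors, sigma=1.0):
    """Biased estimate of squared MMD with class-specific source weights.

    target_priors (class -> prior on the target domain) is an assumed input;
    in the unsupervised setting it would have to be estimated.
    """
    n, m = len(source), len(target)
    counts = {}
    for c in source_labels:
        counts[c] = counts.get(c, 0) + 1
    # Auxiliary weight of each source sample: target prior / source prior.
    raw = [target_priors.get(c, 0.0) / (counts[c] / n) for c in source_labels]
    total = sum(raw)
    w = [r / total for r in raw]  # normalised so the weights sum to 1
    v = [1.0 / m] * m             # target samples stay uniformly weighted

    ss = sum(w[i] * w[j] * gaussian_kernel(source[i], source[j], sigma)
             for i in range(n) for j in range(n))
    tt = sum(v[i] * v[j] * gaussian_kernel(target[i], target[j], sigma)
             for i in range(m) for j in range(m))
    st = sum(w[i] * v[j] * gaussian_kernel(source[i], target[j], sigma)
             for i in range(n) for j in range(m))
    return ss + tt - 2.0 * st


# Toy data: identical class-conditional points, different class proportions.
source = [[0.0], [0.0], [0.0], [5.0]]
labels = [0, 0, 0, 1]
target = [[0.0], [5.0], [5.0], [5.0]]

# Using the source priors as "target priors" reduces to conventional MMD.
plain = weighted_mmd2(source, labels, target, {0: 0.75, 1: 0.25})
# Using the true target priors removes the discrepancy caused by class weights.
reweighted = weighted_mmd2(source, labels, target, {0: 0.25, 1: 0.75})
```

In this toy case the conventional estimate is dominated by the shift in class proportions, while the reweighted estimate is numerically zero because the class-conditional distributions are identical.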

Variational Attention for Sequence-to-Sequence Models

Title Variational Attention for Sequence-to-Sequence Models
Authors Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart
Abstract The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which in turn is decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as Gaussian distributed random variables. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon as it increases the diversity of generated sentences.
Tasks
Published 2017-12-21
URL http://arxiv.org/abs/1712.08207v3
PDF http://arxiv.org/pdf/1712.08207v3.pdf
PWC https://paperswithcode.com/paper/variational-attention-for-sequence-to
Repo https://github.com/keshavvinayak01/Dramatic-Chatbot
Framework none
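
Modeling the attention vector as Gaussian random variables, as the abstract above describes, is usually made trainable with the reparameterisation trick. The following is a minimal sketch under assumptions: a diagonal Gaussian parameterised by `mu` and `log_var`, and a standard normal prior for the KL term, chosen here purely for illustration. The function names are hypothetical.

```python
import math
import random


def sample_attention(mu, log_var, rng):
    """Reparameterised sample a = mu + sigma * eps, with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu = [0.3, -1.2, 0.8]
log_var = [-2.0, -2.0, -2.0]
attention_sample = sample_attention(mu, log_var, rng)
kl_penalty = kl_to_standard_normal(mu, log_var)
```

A deterministic attention vector corresponds to the limit where the variance shrinks to zero, which is the case the abstract says lets the attention path bypass the latent space.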

MoNoise: Modeling Noise Using a Modular Normalization System

Title MoNoise: Modeling Noise Using a Modular Normalization System
Authors Rob van der Goot, Gertjan van Noord
Abstract We propose MoNoise: a normalization model focused on generalizability and efficiency, it aims at being easily reusable and adaptable. Normalization is the task of translating texts from a non-canonical domain to a more canonical domain, in our case: from social media data to standard language. Our proposed model is based on a modular candidate generation in which each module is responsible for a different type of normalization action. The most important generation modules are a spelling correction system and a word embeddings module. Depending on the definition of the normalization task, a static lookup list can be crucial for performance. We train a random forest classifier to rank the candidates, which generalizes well to all different types of normalization actions. Most features for the ranking originate from the generation modules; besides these features, N-gram features prove to be an important source of information. We show that MoNoise beats the state-of-the-art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly different.
Tasks Lexical Normalization, Spelling Correction, Word Embeddings
Published 2017-10-10
URL http://arxiv.org/abs/1710.03476v1
PDF http://arxiv.org/pdf/1710.03476v1.pdf
PWC https://paperswithcode.com/paper/monoise-modeling-noise-using-a-modular
Repo https://github.com/wesselreijngoud/masterthesis2019
Framework none

Exploring the structure of a real-time, arbitrary neural artistic stylization network

Title Exploring the structure of a real-time, arbitrary neural artistic stylization network
Authors Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens
Abstract In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters directly from a style image. The model is successfully trained on a corpus of roughly 80,000 paintings and is able to generalize to paintings previously unobserved. We demonstrate that the learned embedding space is smooth and contains a rich structure and organizes semantic information associated with paintings in an entirely unsupervised manner.
Tasks Style Transfer
Published 2017-05-18
URL http://arxiv.org/abs/1705.06830v2
PDF http://arxiv.org/pdf/1705.06830v2.pdf
PWC https://paperswithcode.com/paper/exploring-the-structure-of-a-real-time
Repo https://github.com/telecombcn-dl/2018-dlai-team5
Framework tf
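
Conditional instance normalization, which the abstract above builds on, normalises each channel over its spatial positions and then applies a style-specific scale and shift. A minimal sketch for a single channel follows. The values of `gamma` and `beta` are hand-picked placeholders; in the approach described above they would be predicted from a style image, which is not modeled here.

```python
import math


def conditional_instance_norm(channel, gamma, beta, eps=1e-5):
    """Normalise one channel, then apply a style-specific scale and shift.

    channel: flat list of activations over all spatial positions.
    gamma, beta: per-style parameters (placeholders in this sketch).
    """
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in channel]


# Two "styles" applied to the same content activations.
content = [1.0, 2.0, 3.0, 4.0]
styled_a = conditional_instance_norm(content, gamma=2.0, beta=0.5)
styled_b = conditional_instance_norm(content, gamma=0.5, beta=-1.0)
```

After the operation, the channel's mean equals `beta` and its standard deviation is approximately `gamma`, so swapping the pair of parameters changes the style statistics without touching the network weights.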

Generative Adversarial Networks: An Overview

Title Generative Adversarial Networks: An Overview
Authors Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, Anil A Bharath
Abstract Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
Tasks Image Generation, Image Super-Resolution, Style Transfer, Super-Resolution
Published 2017-10-19
URL http://arxiv.org/abs/1710.07035v1
PDF http://arxiv.org/pdf/1710.07035v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-networks-an-overview
Repo https://github.com/ShutoAraki/EverybodyDanceNow
Framework none

Understanding Traffic Density from Large-Scale Web Camera Data

Title Understanding Traffic Density from Large-Scale Web Camera Data
Authors Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura
Abstract Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion and large perspective. To deeply understand traffic density, we explore both deep learning based and optimization based methods. To avoid individual vehicle detection and tracking, both methods map the image into a vehicle density map, one based on rank constrained regression and the other based on fully convolutional networks (FCN). The regression based method learns different weights for different blocks in the image to increase the degrees of freedom of the weights and embed perspective information. The FCN based method jointly estimates the vehicle density map and vehicle count with a residual learning framework to perform end-to-end dense prediction, allowing arbitrary image resolution, and adapting to different vehicle scales and perspectives. We analyze and compare both methods, and draw insights from the optimization based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset, containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and datasets. The FCN based method significantly reduces the mean absolute error from 10.99 to 5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline.
Tasks
Published 2017-03-17
URL http://arxiv.org/abs/1703.05868v3
PDF http://arxiv.org/pdf/1703.05868v3.pdf
PWC https://paperswithcode.com/paper/understanding-traffic-density-from-large
Repo https://github.com/polltooh/traffic_video_analysis
Framework tf

Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings

Title Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings
Authors Motonobu Kanagawa, Bharath K. Sriperumbudur, Kenji Fukumizu
Abstract This paper presents a convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand.
Tasks
Published 2017-09-01
URL http://arxiv.org/abs/1709.00147v2
PDF http://arxiv.org/pdf/1709.00147v2.pdf
PWC https://paperswithcode.com/paper/convergence-analysis-of-deterministic-kernel
Repo https://github.com/motonobuk/kernel-quadrature
Framework none

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Title Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Authors Han Xiao, Kashif Rasul, Roland Vollgraf
Abstract We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist
Tasks
Published 2017-08-25
URL http://arxiv.org/abs/1708.07747v2
PDF http://arxiv.org/pdf/1708.07747v2.pdf
PWC https://paperswithcode.com/paper/fashion-mnist-a-novel-image-dataset-for
Repo https://github.com/harshit17chaudhary/SML_assignment_1
Framework tf
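
Because the abstract above states that Fashion-MNIST shares MNIST's data format, a reader written for MNIST's IDX files should work unchanged. The sketch below parses the IDX header for unsigned-byte data and checks it against a small synthetic buffer rather than the real files. The function name is hypothetical, and it assumes the input has already been decompressed if the file was distributed in compressed form.

```python
import struct


def parse_idx(data):
    """Parse an IDX byte string into (dims, payload) for unsigned-byte data."""
    # Header: two zero bytes, one type code (0x08 = unsigned byte), one rank byte.
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    dims = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match header")
    return dims, payload


# Synthetic stand-in for an image file: two blank 28x28 grayscale images.
fake_images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 28, 28) + bytes(2 * 28 * 28)
dims, pixels = parse_idx(fake_images)

# Synthetic stand-in for a label file: two labels out of the 10 categories.
fake_labels = struct.pack(">HBBI", 0, 0x08, 1, 2) + bytes([3, 9])
label_dims, labels = parse_idx(fake_labels)
```

With the real training image file, the header would be expected to report 60,000 images of 28 by 28 pixels, matching the split sizes given in the abstract.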

Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities

Title Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities
Authors Guo-Jun Qi
Abstract In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. Compared with the other GAN models, we will conduct experiments to show both LS-GAN and GLS-GAN exhibit competitive ability in generating new images in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks.
Tasks Image Classification, Image Generation
Published 2017-01-23
URL http://arxiv.org/abs/1701.06264v6
PDF http://arxiv.org/pdf/1701.06264v6.pdf
PWC https://paperswithcode.com/paper/loss-sensitive-generative-adversarial
Repo https://github.com/guojunq/lsgan
Framework torch
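
The idea of training a loss function that separates real from generated samples by designated margins can be sketched as a hinge-style objective. This is an illustration under assumptions, not the paper's code: it assumes the loss values `real_losses` and `fake_losses` and the margins have already been computed for paired samples, and that the objective takes the hinge form shown in the comments. The exact formulation should be checked against the paper.

```python
def loss_function_objective(real_losses, fake_losses, margins, lam=1.0):
    """Objective minimised by the loss function L (assumed hinge form).

    mean L(x) + lam * mean max(0, margin + L(x) - L(G(z)))
    The hinge vanishes once a fake sample's loss exceeds the real sample's
    loss by at least the designated margin.
    """
    n = len(real_losses)
    data_term = sum(real_losses) / n
    hinge_term = sum(max(0.0, d + lr - lf)
                     for lr, lf, d in zip(real_losses, fake_losses, margins)) / n
    return data_term + lam * hinge_term


def generator_objective(fake_losses):
    """The generator is trained to minimise the loss of its own samples."""
    return sum(fake_losses) / len(fake_losses)


# Toy values: the first pair already satisfies the margin, the second does not.
real_losses = [0.1, 0.2]
fake_losses = [2.0, 0.3]
margins = [1.0, 1.0]
objective = loss_function_objective(real_losses, fake_losses, margins)
```

Only the second pair contributes to the hinge term here, which reflects the alternating scheme in the abstract: the loss function is pushed to separate the pairs it does not yet separate, while the generator lowers the loss of its samples.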

Are Emojis Predictable?

Title Are Emojis Predictable?
Authors Francesco Barbieri, Miguel Ballesteros, Horacio Saggion
Abstract Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) in this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis.
Tasks
Published 2017-02-23
URL http://arxiv.org/abs/1702.07285v2
PDF http://arxiv.org/pdf/1702.07285v2.pdf
PWC https://paperswithcode.com/paper/are-emojis-predictable
Repo https://github.com/MMU-TDMLab/EmojiGraph
Framework none

Unmasking the abnormal events in video

Title Unmasking the abnormal events in video
Authors Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu
Abstract We propose a novel framework for abnormal event detection in video that requires no training sequences. Our framework is based on unmasking, a technique previously used for authorship verification in text documents, which we adapt to our task. We iteratively train a binary classifier to distinguish between two consecutive video sequences while removing at each step the most discriminant features. Higher training accuracy rates of the intermediately obtained classifiers represent abnormal events. To the best of our knowledge, this is the first work to apply unmasking for a computer vision task. We compare our method with several state-of-the-art supervised and unsupervised methods on four benchmark data sets. The empirical results indicate that our abnormal event detection framework can achieve state-of-the-art results, while running in real-time at 20 frames per second.
Tasks Abnormal Event Detection In Video
Published 2017-05-23
URL http://arxiv.org/abs/1705.08182v3
PDF http://arxiv.org/pdf/1705.08182v3.pdf
PWC https://paperswithcode.com/paper/unmasking-the-abnormal-events-in-video
Repo https://github.com/MYusha/Video-Anomaly-Detection
Framework none
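
The unmasking procedure in the abstract above (train a classifier on two consecutive sequences, drop the most discriminant features, repeat, and watch the training accuracy) can be sketched on toy feature vectors. To stay dependency-free, this sketch swaps in a simple difference-of-means linear classifier, which is an assumption and not necessarily the classifier used in the paper. All names and the toy data are hypothetical.

```python
def unmasking_curve(seq_a, seq_b, rounds=3, drop=1):
    """Return the training accuracy after each round of feature removal."""
    active = list(range(len(seq_a[0])))
    accuracies = []
    for _ in range(rounds):
        if not active:
            break
        mean_a = [sum(x[f] for x in seq_a) / len(seq_a) for f in active]
        mean_b = [sum(x[f] for x in seq_b) / len(seq_b) for f in active]
        # Linear classifier: weights point from the mean of A to the mean of B.
        w = [b - a for a, b in zip(mean_a, mean_b)]
        mid = [(a + b) / 2.0 for a, b in zip(mean_a, mean_b)]

        def score(x):
            return sum(wi * (x[f] - mi) for wi, f, mi in zip(w, active, mid))

        correct = 0.0
        for x in seq_a:
            s = score(x)
            correct += 1.0 if s < 0 else (0.5 if s == 0 else 0.0)
        for x in seq_b:
            s = score(x)
            correct += 1.0 if s > 0 else (0.5 if s == 0 else 0.0)
        accuracies.append(correct / (len(seq_a) + len(seq_b)))

        # Remove the features with the largest absolute weights.
        ranked = sorted(range(len(active)), key=lambda i: abs(w[i]), reverse=True)
        removed = set(ranked[:drop])
        active = [f for i, f in enumerate(active) if i not in removed]
    return accuracies


# "Abnormal" pair: the sequences differ in every feature.
abnormal = unmasking_curve([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                           [[5.0, 5.0, 5.0], [6.0, 6.0, 6.0]])
# "Normal" pair: the sequences differ in a single feature only.
normal = unmasking_curve([[0.0, 1.0, 2.0], [1.0, 2.0, 1.0]],
                         [[5.0, 1.0, 2.0], [6.0, 2.0, 1.0]])
```

In the abnormal pair the accuracy stays high as features are removed, whereas in the normal pair it falls to chance once the single discriminant feature is gone, which is the signal the abstract describes.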

A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling

Title A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling
Authors Diego Marcheggiani, Anton Frolov, Ivan Titov
Abstract We introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets.
Tasks Semantic Role Labeling
Published 2017-01-10
URL http://arxiv.org/abs/1701.02593v2
PDF http://arxiv.org/pdf/1701.02593v2.pdf
PWC https://paperswithcode.com/paper/a-simple-and-accurate-syntax-agnostic-neural
Repo https://github.com/jungokasai/stagging_srl
Framework tf

Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation

Title Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation
Authors Wei Gao, David Hsu, Wee Sun Lee, Shengmei Shen, Karthikk Subramanian
Abstract How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. The intention-net maps images from a single monocular camera and “intentions” directly to robot controls. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot’s current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals.
Tasks Autonomous Navigation
Published 2017-10-16
URL http://arxiv.org/abs/1710.05627v2
PDF http://arxiv.org/pdf/1710.05627v2.pdf
PWC https://paperswithcode.com/paper/intention-net-integrating-planning-and-deep
Repo https://github.com/ayusefi/Localization-Papers
Framework none

MIDA: Multiple Imputation using Denoising Autoencoders

Title MIDA: Multiple Imputation using Denoising Autoencoders
Authors Lovedeep Gondara, Ke Wang
Abstract Missing data is a significant problem impacting all domains. The state-of-the-art framework for minimizing missing data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions and distributions. Evaluation on several real-life datasets shows that our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics.
Tasks Denoising, Imputation
Published 2017-05-08
URL http://arxiv.org/abs/1705.02737v3
PDF http://arxiv.org/pdf/1705.02737v3.pdf
PWC https://paperswithcode.com/paper/mida-multiple-imputation-using-denoising
Repo https://github.com/HarryK24/MIDA-pytorch
Framework pytorch