Paper Group AWR 89
Papers in this group: Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering; Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation; Variational Attention for Sequence-to-Sequence Models; MoNoise: Modeling Noise Using a Modular Normalization System; Exploring the structure of a real-time, arbitrary neural artistic stylization network; Generative Adversarial Networks: An Overview; Understanding Traffic Density from Large-Scale Web Camera Data; Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings; Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms; Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities; Are Emojis Predictable?; Unmasking the abnormal events in video; A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling; Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation; MIDA: Multiple Imputation using Denoising Autoencoders.
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
Title | Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering |
Authors | Vahid Kazemi, Ali Elqursh |
Abstract | This paper presents a new baseline for the visual question answering task. Given an image and a question in natural language, our model produces accurate answers according to the content of the image. Our model, while being architecturally simple and relatively small in terms of trainable parameters, sets a new state of the art on both the unbalanced and balanced VQA benchmarks. On the VQA 1.0 open-ended challenge, our model achieves 64.6% accuracy on the test-standard set without using additional data, an improvement of 0.4% over the state of the art, and on the newly released VQA 2.0, our model scores 59.7% on the validation set, outperforming the best previously reported results by 0.5%. The results presented in this paper are especially interesting because very similar models have been tried before but significantly lower performance was reported. In light of the new results we hope to see more meaningful research on visual question answering in the future. |
Tasks | Visual Question Answering |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03162v2 |
http://arxiv.org/pdf/1704.03162v2.pdf | |
PWC | https://paperswithcode.com/paper/show-ask-attend-and-answer-a-strong-baseline |
Repo | https://github.com/mshahbazi72/visual-question-answering |
Framework | none |
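The pipeline the abstract describes (pre-extracted CNN image features, an LSTM over the question, soft attention over image regions, and a classifier over a fixed answer set) is compact enough to sketch end to end. A minimal PyTorch sketch with a single attention glimpse (the paper stacks two) and illustrative sizes, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShowAskAttendAnswer(nn.Module):
    """Minimal sketch: attend over a CNN feature map using the question state."""
    def __init__(self, vocab_size, num_answers, feat_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)
        self.lstm = nn.LSTM(300, hidden, batch_first=True)
        self.att = nn.Linear(feat_dim + hidden, 1)
        self.clf = nn.Sequential(
            nn.Linear(feat_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers))

    def forward(self, img_feats, question):
        # img_feats: (B, R, feat_dim) pre-extracted features for R image regions
        _, (h, _) = self.lstm(self.embed(question))
        q = h[-1]                                          # question summary (B, hidden)
        pairs = torch.cat([img_feats,
                           q.unsqueeze(1).expand(-1, img_feats.size(1), -1)], -1)
        att = F.softmax(self.att(pairs), dim=1)            # attention over regions
        ctx = (att * img_feats).sum(1)                     # attended image vector
        return self.clf(torch.cat([ctx, q], -1))           # answer logits
```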
Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation
Title | Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation |
Authors | Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, Wangmeng Zuo |
Abstract | In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in class prior distributions, i.e., class weight bias across domains. This problem is ubiquitous in domain adaptation yet remains open; it can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias, which results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probabilities of the source and target domains; the challenge lies in the fact that class labels in the target domain are unavailable. To account for this, our proposed weighted MMD model is defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested that alternates between assigning pseudo-labels, estimating auxiliary weights, and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00609v1 |
http://arxiv.org/pdf/1705.00609v1.pdf | |
PWC | https://paperswithcode.com/paper/mind-the-class-weight-bias-weighted-maximum |
Repo | https://github.com/yhldhit/WMMD-Caffe |
Framework | none |
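The key change relative to plain MMD is a per-class weight on the source side, with target class priors estimated from pseudo-labels (one step of the paper's classification-EM loop). A NumPy sketch with an RBF kernel; the exact estimator and kernel choice here are illustrative rather than the paper's precise formulation:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def weighted_mmd2(Xs, ys, Xt, yt_pseudo, gamma=1.0):
    """Squared MMD with source samples reweighted by target/source class priors."""
    classes = np.unique(ys)
    ps = np.array([(ys == c).mean() for c in classes])         # source priors
    pt = np.array([(yt_pseudo == c).mean() for c in classes])  # pseudo-label priors
    alpha = pt / np.maximum(ps, 1e-12)                         # class weights
    w = alpha[np.searchsorted(classes, ys)]                    # per-sample weights
    w = w / w.sum()
    v = np.full(len(Xt), 1.0 / len(Xt))
    return (w @ rbf_kernel(Xs, Xs, gamma) @ w
            - 2.0 * w @ rbf_kernel(Xs, Xt, gamma) @ v
            + v @ rbf_kernel(Xt, Xt, gamma) @ v)
```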
Variational Attention for Sequence-to-Sequence Models
Title | Variational Attention for Sequence-to-Sequence Models |
Authors | Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart |
Abstract | The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which is in turn decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as the encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as a Gaussian-distributed random variable. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon and increases the diversity of generated sentences. |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08207v3 |
http://arxiv.org/pdf/1712.08207v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-attention-for-sequence-to |
Repo | https://github.com/keshavvinayak01/Dramatic-Chatbot |
Framework | none |
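The idea is that the deterministic attention context becomes the mean of a Gaussian latent variable, so the decoder stays stochastic and cannot route around the latent space. A PyTorch sketch; the variance input and the standard-normal KL prior are assumptions for illustration (the paper discusses prior choices):

```python
import torch

def variational_attention(scores, values, logvar, training=True):
    """Model the attention context vector as a diagonal Gaussian.

    scores: (B, T) attention logits; values: (B, T, D) encoder states;
    logvar: (B, D), in practice produced by a small learned network.
    """
    attn = torch.softmax(scores, dim=-1)
    mu = torch.bmm(attn.unsqueeze(1), values).squeeze(1)   # deterministic context
    if training:   # reparameterization trick
        ctx = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    else:
        ctx = mu
    # KL divergence to a standard-normal prior, added to the Seq2Seq loss
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return ctx, kl
```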
MoNoise: Modeling Noise Using a Modular Normalization System
Title | MoNoise: Modeling Noise Using a Modular Normalization System |
Authors | Rob van der Goot, Gertjan van Noord |
Abstract | We propose MoNoise: a normalization model focused on generalizability and efficiency that aims to be easily reusable and adaptable. Normalization is the task of translating texts from a non-canonical domain to a more canonical domain, in our case from social media data to standard language. Our proposed model is based on modular candidate generation in which each module is responsible for a different type of normalization action. The most important generation modules are a spelling correction system and a word embeddings module. Depending on the definition of the normalization task, a static lookup list can be crucial for performance. We train a random forest classifier to rank the candidates, which generalizes well to all the different types of normalization actions. Most features for the ranking originate from the generation modules; besides these features, N-gram features prove to be an important source of information. We show that MoNoise beats the state of the art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly differently. |
Tasks | Lexical Normalization, Spelling Correction, Word Embeddings |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03476v1 |
http://arxiv.org/pdf/1710.03476v1.pdf | |
PWC | https://paperswithcode.com/paper/monoise-modeling-noise-using-a-modular |
Repo | https://github.com/wesselreijngoud/masterthesis2019 |
Framework | none |
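The modular design is easy to picture: each module proposes candidate normalizations for a word, and a ranker then scores the pooled candidates. A toy Python sketch of the generation step; the lexicon, lookup table, and `difflib`-based spelling module are stand-ins for the paper's actual modules:

```python
from difflib import get_close_matches

LEXICON = {"you", "are", "going", "to", "tomorrow"}             # toy canonical vocab
LOOKUP = {"u": ["you"], "r": ["are"], "2morrow": ["tomorrow"]}  # static lookup module

def generate_candidates(word):
    """Modular candidate generation: each module contributes repairs."""
    cands = {word}                                   # keeping the word is a candidate
    cands.update(LOOKUP.get(word, []))               # lookup-list module
    cands.update(get_close_matches(word, LEXICON, n=5, cutoff=0.75))  # spelling module
    # a word-embeddings module would add nearest neighbours here
    return sorted(cands)

print(generate_candidates("u"))  # ['u', 'you']
```

A scikit-learn RandomForestClassifier would then rank each (word, candidate) pair using features emitted by the generating modules plus n-gram scores.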
Exploring the structure of a real-time, arbitrary neural artistic stylization network
Title | Exploring the structure of a real-time, arbitrary neural artistic stylization network |
Authors | Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens |
Abstract | In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters directly from a style image. The model is successfully trained on a corpus of roughly 80,000 paintings and is able to generalize to paintings previously unobserved. We demonstrate that the learned embedding space is smooth, contains rich structure, and organizes semantic information associated with paintings in an entirely unsupervised manner. |
Tasks | Style Transfer |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06830v2 |
http://arxiv.org/pdf/1705.06830v2.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-structure-of-a-real-time |
Repo | https://github.com/telecombcn-dl/2018-dlai-team5 |
Framework | tf |
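The central mechanism, predicting conditional instance normalization parameters from a style image, is small. A PyTorch sketch of one such normalization layer; in the paper the style embedding comes from a separate style-prediction network, which is omitted here:

```python
import torch
import torch.nn as nn

class PredictedCIN(nn.Module):
    """Instance norm whose affine parameters are predicted from a style embedding."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma_beta = nn.Linear(style_dim, 2 * channels)

    def forward(self, x, style_embedding):
        # x: (B, C, H, W) content features; style_embedding: (B, style_dim)
        gamma, beta = self.to_gamma_beta(style_embedding).chunk(2, dim=1)
        return gamma[..., None, None] * self.norm(x) + beta[..., None, None]
```

Because only the embedding changes per style, unseen styles work at test time without retraining the transfer network.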
Generative Adversarial Networks: An Overview
Title | Generative Adversarial Networks: An Overview |
Authors | Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, Anil A Bharath |
Abstract | Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this by deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application. |
Tasks | Image Generation, Image Super-Resolution, Style Transfer, Super-Resolution |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07035v1 |
http://arxiv.org/pdf/1710.07035v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-an-overview |
Repo | https://github.com/ShutoAraki/EverybodyDanceNow |
Framework | none |
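For readers new to the "competitive process involving a pair of networks", the core training loop alternates a discriminator update and a generator update. A minimal PyTorch sketch using the common non-saturating heuristic, assuming a discriminator D that maps a batch to one logit per sample:

```python
import torch
import torch.nn.functional as F

def gan_step(D, G, x_real, opt_d, opt_g, z_dim=64):
    """One alternating update of the standard GAN game."""
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_real.size(0), 1)
    z = torch.randn(x_real.size(0), z_dim)
    # discriminator: push D(x_real) toward 1 and D(G(z)) toward 0
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real), ones)
              + F.binary_cross_entropy_with_logits(D(G(z).detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # generator: push D(G(z)) toward 1 (non-saturating heuristic)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```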
Understanding Traffic Density from Large-Scale Web Camera Data
Title | Understanding Traffic Density from Large-Scale Web Camera Data |
Authors | Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura |
Abstract | Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion, and large perspective variation. To deeply understand traffic density, we explore both deep-learning-based and optimization-based methods. To avoid individual vehicle detection and tracking, both methods map the image to a vehicle density map, one based on rank-constrained regression and the other based on fully convolutional networks (FCN). The regression-based method learns different weights for different blocks in the image to increase the degrees of freedom of the weights and embed perspective information. The FCN-based method jointly estimates the vehicle density map and vehicle count with a residual learning framework to perform end-to-end dense prediction, allowing arbitrary image resolution and adapting to different vehicle scales and perspectives. We analyze and compare both methods, and draw insights from the optimization-based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and datasets. The FCN-based method significantly reduces the mean absolute error from 10.99 to 5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline. |
Tasks | |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.05868v3 |
http://arxiv.org/pdf/1703.05868v3.pdf | |
PWC | https://paperswithcode.com/paper/understanding-traffic-density-from-large |
Repo | https://github.com/polltooh/traffic_video_analysis |
Framework | tf |
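A useful property of the density-map formulation is that the count is simply the integral (sum) of the map, which is what allows the FCN to estimate density and count jointly. A PyTorch sketch of such a joint objective; the relative weighting term is an assumption:

```python
import torch.nn.functional as F

def density_count_loss(pred_density, gt_density, lam=0.1):
    """Pixel-wise density regression plus a global count term."""
    pixel = F.mse_loss(pred_density, gt_density)
    count = F.mse_loss(pred_density.sum(dim=(1, 2, 3)),   # count = integral of map
                       gt_density.sum(dim=(1, 2, 3)))
    return pixel + lam * count
```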
Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings
Title | Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings |
Authors | Motonobu Kanagawa, Bharath K. Sriperumbudur, Kenji Fukumizu |
Abstract | This paper presents a convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand. |
Tasks | |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00147v2 |
http://arxiv.org/pdf/1709.00147v2.pdf | |
PWC | https://paperswithcode.com/paper/convergence-analysis-of-deterministic-kernel |
Repo | https://github.com/motonobuk/kernel-quadrature |
Framework | none |
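For concreteness, Bayesian quadrature (one family the analysis covers) chooses weights w = K^{-1} z, where K is the kernel Gram matrix on the design points and z_i is the integral of k(x, x_i) against the integration measure. A NumPy sketch with the Brownian-motion kernel min(x, y) on [0, 1] and the uniform measure, for which z has the closed form z(t) = t - t^2/2:

```python
import numpy as np

def bq_weights(design, kernel, kernel_mean):
    """Bayesian-quadrature weights w = K^{-1} z (small jitter for stability)."""
    K = kernel(design[:, None], design[None, :])
    z = kernel_mean(design)
    return np.linalg.solve(K + 1e-10 * np.eye(len(design)), z)

design = np.linspace(0.1, 1.0, 10)
w = bq_weights(design, np.minimum, lambda t: t - t ** 2 / 2)
print(w @ np.sin(design))   # approximates the integral of sin on [0, 1] (~0.4597)
```

The paper's condition (i) then asks that the sum of absolute weights, sum_i |w_i|, stay bounded (or grow slowly) as design points are added.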
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Title | Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms |
Authors | Han Xiao, Kashif Rasul, Roland Vollgraf |
Abstract | We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist |
Tasks | |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1708.07747v2 |
http://arxiv.org/pdf/1708.07747v2.pdf | |
PWC | https://paperswithcode.com/paper/fashion-mnist-a-novel-image-dataset-for |
Repo | https://github.com/harshit17chaudhary/SML_assignment_1 |
Framework | tf |
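Since the dataset is built as a drop-in MNIST replacement, standard loaders work unchanged; for example, via the Keras datasets API (one of several ways to obtain it):

```python
import tensorflow as tf

# Same shapes and splits as MNIST: 60k/10k grayscale 28x28 images, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
assert x_train.shape == (60000, 28, 28) and x_test.shape == (10000, 28, 28)
```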
Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities
Title | Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities |
Authors | Guo-Jun Qi |
Abstract | In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. Compared with the other GAN models, we will conduct experiments to show both LS-GAN and GLS-GAN exhibit competitive ability in generating new images in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks. |
Tasks | Image Classification, Image Generation |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06264v6 |
http://arxiv.org/pdf/1701.06264v6.pdf | |
PWC | https://paperswithcode.com/paper/loss-sensitive-generative-adversarial |
Repo | https://github.com/guojunq/lsgan |
Framework | torch |
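The loss-sensitive objective trains a loss function L so that real samples receive a smaller loss than generated ones by a data-dependent margin, enforced with a hinge. A PyTorch sketch of the critic-side objective; the L1 margin is one typical choice, and the pairing of real and generated samples is simplified here:

```python
import torch

def ls_critic_loss(loss_real, loss_fake, x_real, x_fake):
    """loss_real / loss_fake: L_theta evaluated on real and generated batches."""
    margin = (x_real - x_fake).abs().flatten(1).sum(-1)   # Delta(x, G(z)), L1
    return loss_real.mean() + torch.relu(margin + loss_real - loss_fake).mean()
```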
Are Emojis Predictable?
Title | Are Emojis Predictable? |
Authors | Francesco Barbieri, Miguel Ballesteros, Horacio Saggion |
Abstract | Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) in this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis. |
Tasks | |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1702.07285v2 |
http://arxiv.org/pdf/1702.07285v2.pdf | |
PWC | https://paperswithcode.com/paper/are-emojis-predictable |
Repo | https://github.com/MMU-TDMLab/EmojiGraph |
Framework | none |
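The models are LSTM classifiers from tweet text to an emoji label. A minimal bidirectional PyTorch variant with illustrative dimensions (the paper also explores character-level BLSTMs):

```python
import torch
import torch.nn as nn

class EmojiLSTM(nn.Module):
    def __init__(self, vocab_size, num_emojis, dim=100, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_emojis)

    def forward(self, tokens):                   # tokens: (B, T) word ids
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.out(torch.cat([h[0], h[1]], dim=-1))   # emoji logits
```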
Unmasking the abnormal events in video
Title | Unmasking the abnormal events in video |
Authors | Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu |
Abstract | We propose a novel framework for abnormal event detection in video that requires no training sequences. Our framework is based on unmasking, a technique previously used for authorship verification in text documents, which we adapt to our task. We iteratively train a binary classifier to distinguish between two consecutive video sequences while removing at each step the most discriminant features. Higher training accuracy rates of the intermediately obtained classifiers represent abnormal events. To the best of our knowledge, this is the first work to apply unmasking for a computer vision task. We compare our method with several state-of-the-art supervised and unsupervised methods on four benchmark data sets. The empirical results indicate that our abnormal event detection framework can achieve state-of-the-art results, while running in real-time at 20 frames per second. |
Tasks | Abnormal Event Detection In Video |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08182v3 |
http://arxiv.org/pdf/1705.08182v3.pdf | |
PWC | https://paperswithcode.com/paper/unmasking-the-abnormal-events-in-video |
Repo | https://github.com/MYusha/Video-Anomaly-Detection |
Framework | none |
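The unmasking loop itself is a few lines: repeatedly fit a linear classifier on two consecutive windows, record the training accuracy, and discard the most discriminant features. A scikit-learn sketch; the classifier and feature counts are illustrative, not the paper's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def unmasking_curve(feats_a, feats_b, iters=10, drop=2):
    """Accuracies that stay high across iterations signal an abnormal event."""
    X = np.vstack([feats_a, feats_b])
    y = np.r_[np.zeros(len(feats_a)), np.ones(len(feats_b))]
    keep = np.arange(X.shape[1])
    accs = []
    for _ in range(iters):
        clf = LogisticRegression(max_iter=1000).fit(X[:, keep], y)
        accs.append(clf.score(X[:, keep], y))       # training accuracy
        order = np.argsort(np.abs(clf.coef_[0]))    # ascending by |weight|
        keep = keep[order[:-drop]]                  # drop most discriminant features
    return accs
```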
A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling
Title | A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling |
Authors | Diego Marcheggiani, Anton Frolov, Ivan Titov |
Abstract | We introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets. |
Tasks | Semantic Role Labeling |
Published | 2017-01-10 |
URL | http://arxiv.org/abs/1701.02593v2 |
http://arxiv.org/pdf/1701.02593v2.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-and-accurate-syntax-agnostic-neural |
Repo | https://github.com/jungokasai/stagging_srl |
Framework | tf |
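The syntax-agnostic recipe is essentially: embed words plus a predicate-indicator flag, run a BiLSTM, and classify each token's role with respect to the predicate state. A PyTorch sketch with illustrative sizes and a simplified role scorer:

```python
import torch
import torch.nn as nn

class SyntaxAgnosticSRL(nn.Module):
    def __init__(self, vocab_size, num_roles, dim=100, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.flag = nn.Embedding(2, 16)              # is-predicate indicator
        self.encoder = nn.LSTM(dim + 16, hidden, batch_first=True,
                               bidirectional=True)
        self.score = nn.Linear(4 * hidden, num_roles)

    def forward(self, words, pred_idx):              # words: (B, T); pred_idx: int
        B, T = words.shape
        is_pred = (torch.arange(T) == pred_idx).long().unsqueeze(0).expand(B, T)
        x = torch.cat([self.embed(words), self.flag(is_pred)], dim=-1)
        h, _ = self.encoder(x)                       # (B, T, 2*hidden)
        p = h[torch.arange(B), pred_idx]             # predicate state (B, 2*hidden)
        return self.score(torch.cat([h, p.unsqueeze(1).expand_as(h)], dim=-1))
```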
Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation
Title | Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation |
Authors | Wei Gao, David Hsu, Wee Sun Lee, Shengmei Shen, Karthikk Subramanian |
Abstract | How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. The intention-net maps images from a single monocular camera and “intentions” directly to robot controls. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot’s current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals. |
Tasks | Autonomous Navigation |
Published | 2017-10-16 |
URL | http://arxiv.org/abs/1710.05627v2 |
http://arxiv.org/pdf/1710.05627v2.pdf | |
PWC | https://paperswithcode.com/paper/intention-net-integrating-planning-and-deep |
Repo | https://github.com/ayusefi/Localization-Papers |
Framework | none |
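The division of labor is: a path planner over a crude map emits an "intention", and the learned controller fuses it with the camera image to produce controls. A PyTorch sketch of the low-level controller, assuming a discretized intention (the paper also studies a richer path-image intention); the CNN trunk and sizes are illustrative:

```python
import torch
import torch.nn as nn

class IntentionNet(nn.Module):
    def __init__(self, num_intentions=4, ctrl_dim=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(32 + num_intentions, 64), nn.ReLU(),
            nn.Linear(64, ctrl_dim))                 # e.g. (speed, steering)

    def forward(self, image, intention_onehot):
        # image: (B, 3, H, W) monocular camera; intention_onehot: (B, num_intentions)
        return self.head(torch.cat([self.cnn(image), intention_onehot], dim=-1))
```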
MIDA: Multiple Imputation using Denoising Autoencoders
Title | MIDA: Multiple Imputation using Denoising Autoencoders |
Authors | Lovedeep Gondara, Ke Wang |
Abstract | Missing data is a significant problem impacting all domains. The state-of-the-art framework for minimizing missing-data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions and distributions. Evaluation on several real-life datasets shows that our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics. |
Tasks | Denoising, Imputation |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02737v3 |
http://arxiv.org/pdf/1705.02737v3.pdf | |
PWC | https://paperswithcode.com/paper/mida-multiple-imputation-using-denoising |
Repo | https://github.com/HarryK24/MIDA-pytorch |
Framework | pytorch |
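An overcomplete denoising autoencoder widens, rather than compresses, the representation. A PyTorch sketch in the spirit of the paper; the fixed widening step per encoder layer and the tanh activations are assumptions here. The network is trained to reconstruct complete rows from corrupted inputs:

```python
import torch
import torch.nn as nn

class MIDA(nn.Module):
    """Overcomplete denoising autoencoder: each encoder layer widens by `step`."""
    def __init__(self, d_in, step=7, depth=3):
        super().__init__()
        dims = [d_in + step * i for i in range(depth + 1)]
        enc = [m for a, b in zip(dims, dims[1:]) for m in (nn.Linear(a, b), nn.Tanh())]
        rev = dims[::-1]
        dec = [m for a, b in zip(rev, rev[1:]) for m in (nn.Linear(a, b), nn.Tanh())]
        self.net = nn.Sequential(*enc, *dec[:-1])    # no activation on the output

    def forward(self, x_corrupted):                  # x_corrupted: (B, d_in)
        return self.net(x_corrupted)

# Multiple imputation: train several such networks (or vary the corruption noise)
# and fill each missing cell with each network's reconstruction.
```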