Paper Group AWR 89
Papers in this group: Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering; Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation; Variational Attention for Sequence-to-Sequence Models; MoNoise: Modeling Noise Using a Modular Normalization System; Exploring the structure of a real-time, arbitrary neural artistic stylization network; Generative Adversarial Networks: An Overview; Understanding Traffic Density from Large-Scale Web Camera Data; Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings; Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms; Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities; Are Emojis Predictable?; Unmasking the abnormal events in video; A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling; Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation; MIDA: Multiple Imputation using Denoising Autoencoders.
Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering
Title | Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering |
Authors | Vahid Kazemi, Ali Elqursh |
Abstract | This paper presents a new baseline for the visual question answering task. Given an image and a question in natural language, our model produces accurate answers according to the content of the image. Our model, while being architecturally simple and relatively small in terms of trainable parameters, sets a new state of the art on both the unbalanced and balanced VQA benchmarks. On the VQA 1.0 open-ended challenge, our model achieves 64.6% accuracy on the test-standard set without using additional data, an improvement of 0.4% over the state of the art, and on the newly released VQA 2.0, our model scores 59.7% on the validation set, outperforming the best previously reported results by 0.5%. The results presented in this paper are especially interesting because very similar models have been tried before but significantly lower performance was reported. In light of the new results we hope to see more meaningful research on visual question answering in the future. |
Tasks | Visual Question Answering |
Published | 2017-04-11 |
URL | http://arxiv.org/abs/1704.03162v2 |
http://arxiv.org/pdf/1704.03162v2.pdf | |
PWC | https://paperswithcode.com/paper/show-ask-attend-and-answer-a-strong-baseline |
Repo | https://github.com/mshahbazi72/visual-question-answering |
Framework | none |
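The pipeline the abstract describes (pre-extracted CNN image features, an LSTM over the question, soft attention over image regions, and a classifier over a fixed answer set) is compact enough to sketch end to end. A minimal PyTorch sketch with a single attention glimpse (the paper stacks two) and illustrative sizes, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShowAskAttendAnswer(nn.Module):
    """Minimal sketch: attend over a CNN feature map using the question state."""
    def __init__(self, vocab_size, num_answers, feat_dim=2048, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)
        self.lstm = nn.LSTM(300, hidden, batch_first=True)
        self.att = nn.Linear(feat_dim + hidden, 1)
        self.clf = nn.Sequential(
            nn.Linear(feat_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_answers))

    def forward(self, img_feats, question):
        # img_feats: (B, R, feat_dim) pre-extracted features for R image regions
        _, (h, _) = self.lstm(self.embed(question))
        q = h[-1]                                          # question summary (B, hidden)
        pairs = torch.cat([img_feats,
                           q.unsqueeze(1).expand(-1, img_feats.size(1), -1)], -1)
        att = F.softmax(self.att(pairs), dim=1)            # attention over regions
        ctx = (att * img_feats).sum(1)                     # attended image vector
        return self.clf(torch.cat([ctx, q], -1))           # answer logits
```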
Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation
Title | Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation |
Authors | Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, Wangmeng Zuo |
Abstract | In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in class prior distributions, i.e., class weight bias across domains. This problem is ubiquitous in domain adaptation yet remains open; it can be caused by changes in sample selection criteria and application scenarios. We show that MMD cannot account for class weight bias, which results in degraded domain adaptation performance. To address this issue, a weighted MMD model is proposed in this paper. Specifically, we introduce class-specific auxiliary weights into the original MMD to exploit the class prior probabilities of the source and target domains; the challenge lies in the fact that class labels in the target domain are unavailable. To account for this, our proposed weighted MMD model is defined by introducing an auxiliary weight for each class in the source domain, and a classification EM algorithm is suggested that alternates between assigning pseudo-labels, estimating auxiliary weights, and updating model parameters. Extensive experiments demonstrate the superiority of our weighted MMD over conventional MMD for domain adaptation. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00609v1 |
http://arxiv.org/pdf/1705.00609v1.pdf | |
PWC | https://paperswithcode.com/paper/mind-the-class-weight-bias-weighted-maximum |
Repo | https://github.com/yhldhit/WMMD-Caffe |
Framework | none |
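The key change relative to plain MMD is a per-class weight on the source side, with target class priors estimated from pseudo-labels (one step of the paper's classification-EM loop). A NumPy sketch with an RBF kernel; the exact estimator and kernel choice here are illustrative rather than the paper's precise formulation:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def weighted_mmd2(Xs, ys, Xt, yt_pseudo, gamma=1.0):
    """Squared MMD with source samples reweighted by target/source class priors."""
    classes = np.unique(ys)
    ps = np.array([(ys == c).mean() for c in classes])         # source priors
    pt = np.array([(yt_pseudo == c).mean() for c in classes])  # pseudo-label priors
    alpha = pt / np.maximum(ps, 1e-12)                         # class weights
    w = alpha[np.searchsorted(classes, ys)]                    # per-sample weights
    w = w / w.sum()
    v = np.full(len(Xt), 1.0 / len(Xt))
    return (w @ rbf_kernel(Xs, Xs, gamma) @ w
            - 2.0 * w @ rbf_kernel(Xs, Xt, gamma) @ v
            + v @ rbf_kernel(Xt, Xt, gamma) @ v)
```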
Variational Attention for Sequence-to-Sequence Models
Title | Variational Attention for Sequence-to-Sequence Models |
Authors | Hareesh Bahuleyan, Lili Mou, Olga Vechtomova, Pascal Poupart |
Abstract | The variational encoder-decoder (VED) encodes source information as a set of random variables using a neural network, which is in turn decoded into target data using another neural network. In natural language processing, sequence-to-sequence (Seq2Seq) models typically serve as the encoder-decoder networks. When combined with a traditional (deterministic) attention mechanism, the variational latent space may be bypassed by the attention model, and thus becomes ineffective. In this paper, we propose a variational attention mechanism for VED, where the attention vector is also modeled as a Gaussian-distributed random variable. Results on two experiments show that, without loss of quality, our proposed method alleviates the bypassing phenomenon and increases the diversity of generated sentences. |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08207v3 |
http://arxiv.org/pdf/1712.08207v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-attention-for-sequence-to |
Repo | https://github.com/keshavvinayak01/Dramatic-Chatbot |
Framework | none |
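The idea is that the deterministic attention context becomes the mean of a Gaussian latent variable, so the decoder stays stochastic and cannot route around the latent space. A PyTorch sketch; the variance input and the standard-normal KL prior are assumptions for illustration (the paper discusses prior choices):

```python
import torch

def variational_attention(scores, values, logvar, training=True):
    """Model the attention context vector as a diagonal Gaussian.

    scores: (B, T) attention logits; values: (B, T, D) encoder states;
    logvar: (B, D), in practice produced by a small learned network.
    """
    attn = torch.softmax(scores, dim=-1)
    mu = torch.bmm(attn.unsqueeze(1), values).squeeze(1)   # deterministic context
    if training:   # reparameterization trick
        ctx = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
    else:
        ctx = mu
    # KL divergence to a standard-normal prior, added to the Seq2Seq loss
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()
    return ctx, kl
```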
MoNoise: Modeling Noise Using a Modular Normalization System
Title | MoNoise: Modeling Noise Using a Modular Normalization System |
Authors | Rob van der Goot, Gertjan van Noord |
Abstract | We propose MoNoise: a normalization model focused on generalizability and efficiency that aims to be easily reusable and adaptable. Normalization is the task of translating texts from a non-canonical domain to a more canonical domain, in our case from social media data to standard language. Our proposed model is based on modular candidate generation in which each module is responsible for a different type of normalization action. The most important generation modules are a spelling correction system and a word embeddings module. Depending on the definition of the normalization task, a static lookup list can be crucial for performance. We train a random forest classifier to rank the candidates, which generalizes well to all the different types of normalization actions. Most features for the ranking originate from the generation modules; besides these features, N-gram features prove to be an important source of information. We show that MoNoise beats the state of the art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly differently. |
Tasks | Lexical Normalization, Spelling Correction, Word Embeddings |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03476v1 |
http://arxiv.org/pdf/1710.03476v1.pdf | |
PWC | https://paperswithcode.com/paper/monoise-modeling-noise-using-a-modular |
Repo | https://github.com/wesselreijngoud/masterthesis2019 |
Framework | none |
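The modular design is easy to picture: each module proposes candidate normalizations for a word, and a ranker then scores the pooled candidates. A toy Python sketch of the generation step; the lexicon, lookup table, and `difflib`-based spelling module are stand-ins for the paper's actual modules:

```python
from difflib import get_close_matches

LEXICON = {"you", "are", "going", "to", "tomorrow"}             # toy canonical vocab
LOOKUP = {"u": ["you"], "r": ["are"], "2morrow": ["tomorrow"]}  # static lookup module

def generate_candidates(word):
    """Modular candidate generation: each module contributes repairs."""
    cands = {word}                                   # keeping the word is a candidate
    cands.update(LOOKUP.get(word, []))               # lookup-list module
    cands.update(get_close_matches(word, LEXICON, n=5, cutoff=0.75))  # spelling module
    # a word-embeddings module would add nearest neighbours here
    return sorted(cands)

print(generate_candidates("u"))  # ['u', 'you']
```

A scikit-learn RandomForestClassifier would then rank each (word, candidate) pair using features emitted by the generating modules plus n-gram scores.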
Exploring the structure of a real-time, arbitrary neural artistic stylization network
Title | Exploring the structure of a real-time, arbitrary neural artistic stylization network |
Authors | Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens |
Abstract | In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters directly from a style image. The model is successfully trained on a corpus of roughly 80,000 paintings and is able to generalize to paintings previously unobserved. We demonstrate that the learned embedding space is smooth, contains rich structure, and organizes semantic information associated with paintings in an entirely unsupervised manner. |
Tasks | Style Transfer |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06830v2 |
http://arxiv.org/pdf/1705.06830v2.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-structure-of-a-real-time |
Repo | https://github.com/telecombcn-dl/2018-dlai-team5 |
Framework | tf |
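The central mechanism, predicting conditional instance normalization parameters from a style image, is small. A PyTorch sketch of one such normalization layer; in the paper the style embedding comes from a separate style-prediction network, which is omitted here:

```python
import torch
import torch.nn as nn

class PredictedCIN(nn.Module):
    """Instance norm whose affine parameters are predicted from a style embedding."""
    def __init__(self, channels, style_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma_beta = nn.Linear(style_dim, 2 * channels)

    def forward(self, x, style_embedding):
        # x: (B, C, H, W) content features; style_embedding: (B, style_dim)
        gamma, beta = self.to_gamma_beta(style_embedding).chunk(2, dim=1)
        return gamma[..., None, None] * self.norm(x) + beta[..., None, None]
```

Because only the embedding changes per style, unseen styles work at test time without retraining the transfer network.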
Generative Adversarial Networks: An Overview
Title | Generative Adversarial Networks: An Overview |
Authors | Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, Anil A Bharath |
Abstract | Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this by deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application. |
Tasks | Image Generation, Image Super-Resolution, Style Transfer, Super-Resolution |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07035v1 |
http://arxiv.org/pdf/1710.07035v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-an-overview |
Repo | https://github.com/ShutoAraki/EverybodyDanceNow |
Framework | none |
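For readers new to the "competitive process involving a pair of networks", the core training loop alternates a discriminator update and a generator update. A minimal PyTorch sketch using the common non-saturating heuristic, assuming a discriminator D that maps a batch to one logit per sample:

```python
import torch
import torch.nn.functional as F

def gan_step(D, G, x_real, opt_d, opt_g, z_dim=64):
    """One alternating update of the standard GAN game."""
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_real.size(0), 1)
    z = torch.randn(x_real.size(0), z_dim)
    # discriminator: push D(x_real) toward 1 and D(G(z)) toward 0
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real), ones)
              + F.binary_cross_entropy_with_logits(D(G(z).detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # generator: push D(G(z)) toward 1 (non-saturating heuristic)
    g_loss = F.binary_cross_entropy_with_logits(D(G(z)), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```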
Understanding Traffic Density from Large-Scale Web Camera Data
Title | Understanding Traffic Density from Large-Scale Web Camera Data |
Authors | Shanghang Zhang, Guanhang Wu, João P. Costeira, José M. F. Moura |
Abstract | Understanding traffic density from large-scale web camera (webcam) videos is a challenging problem because such videos have low spatial and temporal resolution, high occlusion, and large perspective variation. To deeply understand traffic density, we explore both deep-learning-based and optimization-based methods. To avoid individual vehicle detection and tracking, both methods map the image to a vehicle density map, one based on rank-constrained regression and the other based on fully convolutional networks (FCN). The regression-based method learns different weights for different blocks in the image to increase the degrees of freedom of the weights and embed perspective information. The FCN-based method jointly estimates the vehicle density map and vehicle count with a residual learning framework to perform end-to-end dense prediction, allowing arbitrary image resolution and adapting to different vehicle scales and perspectives. We analyze and compare both methods, and draw insights from the optimization-based method to improve the deep model. Since existing datasets do not cover all the challenges in our work, we collected and labelled a large-scale traffic video dataset containing 60 million frames from 212 webcams. Both methods are extensively evaluated and compared on different counting tasks and datasets. The FCN-based method significantly reduces the mean absolute error from 10.99 to 5.31 on the public dataset TRANCOS compared with the state-of-the-art baseline. |
Tasks | |
Published | 2017-03-17 |
URL | http://arxiv.org/abs/1703.05868v3 |
http://arxiv.org/pdf/1703.05868v3.pdf | |
PWC | https://paperswithcode.com/paper/understanding-traffic-density-from-large |
Repo | https://github.com/polltooh/traffic_video_analysis |
Framework | tf |
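A useful property of the density-map formulation is that the count is simply the integral (sum) of the map, which is what allows the FCN to estimate density and count jointly. A PyTorch sketch of such a joint objective; the relative weighting term is an assumption:

```python
import torch.nn.functional as F

def density_count_loss(pred_density, gt_density, lam=0.1):
    """Pixel-wise density regression plus a global count term."""
    pixel = F.mse_loss(pred_density, gt_density)
    count = F.mse_loss(pred_density.sum(dim=(1, 2, 3)),   # count = integral of map
                       gt_density.sum(dim=(1, 2, 3)))
    return pixel + lam * count
```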
Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings
Title | Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings |
Authors | Motonobu Kanagawa, Bharath K. Sriperumbudur, Kenji Fukumizu |
Abstract | This paper presents a convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than a Sobolev RKHS based on which a quadrature rule is constructed. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand. |
Tasks | |
Published | 2017-09-01 |
URL | http://arxiv.org/abs/1709.00147v2 |
http://arxiv.org/pdf/1709.00147v2.pdf | |
PWC | https://paperswithcode.com/paper/convergence-analysis-of-deterministic-kernel |
Repo | https://github.com/motonobuk/kernel-quadrature |
Framework | none |
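For concreteness, Bayesian quadrature (one family the analysis covers) chooses weights w = K^{-1} z, where K is the kernel Gram matrix on the design points and z_i is the integral of k(x, x_i) against the integration measure. A NumPy sketch with the Brownian-motion kernel min(x, y) on [0, 1] and the uniform measure, for which z has the closed form z(t) = t - t^2/2:

```python
import numpy as np

def bq_weights(design, kernel, kernel_mean):
    """Bayesian-quadrature weights w = K^{-1} z (small jitter for stability)."""
    K = kernel(design[:, None], design[None, :])
    z = kernel_mean(design)
    return np.linalg.solve(K + 1e-10 * np.eye(len(design)), z)

design = np.linspace(0.1, 1.0, 10)
w = bq_weights(design, np.minimum, lambda t: t - t ** 2 / 2)
print(w @ np.sin(design))   # approximates the integral of sin on [0, 1] (~0.4597)
```

The paper's condition (i) then asks that the sum of absolute weights, sum_i |w_i|, stay bounded (or grow slowly) as design points are added.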
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Title | Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms |
Authors | Han Xiao, Kashif Rasul, Roland Vollgraf |
Abstract | We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist |
Tasks | |
Published | 2017-08-25 |
URL | http://arxiv.org/abs/1708.07747v2 |
http://arxiv.org/pdf/1708.07747v2.pdf | |
PWC | https://paperswithcode.com/paper/fashion-mnist-a-novel-image-dataset-for |
Repo | https://github.com/harshit17chaudhary/SML_assignment_1 |
Framework | tf |
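Since the dataset is built as a drop-in MNIST replacement, standard loaders work unchanged; for example, via the Keras datasets API (one of several ways to obtain it):

```python
import tensorflow as tf

# Same shapes and splits as MNIST: 60k/10k grayscale 28x28 images, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
assert x_train.shape == (60000, 28, 28) and x_test.shape == (10000, 28, 28)
```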
Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities
Title | Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities |
Authors | Guo-Jun Qi |
Abstract | In this paper, we present the Lipschitz regularization theory and algorithms for a novel Loss-Sensitive Generative Adversarial Network (LS-GAN). Specifically, it trains a loss function to distinguish between real and fake samples by designated margins, while learning a generator alternately to produce realistic samples by minimizing their losses. The LS-GAN further regularizes its loss function with a Lipschitz regularity condition on the density of real data, yielding a regularized model that can better generalize to produce new data from a reasonable number of training examples than the classic GAN. We will further present a Generalized LS-GAN (GLS-GAN) and show it contains a large family of regularized GAN models, including both LS-GAN and Wasserstein GAN, as its special cases. Compared with the other GAN models, we will conduct experiments to show both LS-GAN and GLS-GAN exhibit competitive ability in generating new images in terms of the Minimum Reconstruction Error (MRE) assessed on a separate test set. We further extend the LS-GAN to a conditional form for supervised and semi-supervised learning problems, and demonstrate its outstanding performance on image classification tasks. |
Tasks | Image Classification, Image Generation |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06264v6 |
http://arxiv.org/pdf/1701.06264v6.pdf | |
PWC | https://paperswithcode.com/paper/loss-sensitive-generative-adversarial |
Repo | https://github.com/guojunq/lsgan |
Framework | torch |
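The loss-sensitive objective trains a loss function L so that real samples receive a smaller loss than generated ones by a data-dependent margin, enforced with a hinge. A PyTorch sketch of the critic-side objective; the L1 margin is one typical choice, and the pairing of real and generated samples is simplified here:

```python
import torch

def ls_critic_loss(loss_real, loss_fake, x_real, x_fake):
    """loss_real / loss_fake: L_theta evaluated on real and generated batches."""
    margin = (x_real - x_fake).abs().flatten(1).sum(-1)   # Delta(x, G(z)), L1
    return loss_real.mean() + torch.relu(margin + loss_real - loss_fake).mean()
```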
Are Emojis Predictable?
Title | Are Emojis Predictable? |
Authors | Francesco Barbieri, Miguel Ballesteros, Horacio Saggion |
Abstract | Emojis are ideograms which are naturally combined with plain text to visually complement or condense the meaning of a message. Despite being widely used in social media, their underlying semantics have received little attention from a Natural Language Processing standpoint. In this paper, we investigate the relation between words and emojis, studying the novel task of predicting which emojis are evoked by text-based tweet messages. We train several models based on Long Short-Term Memory networks (LSTMs) in this task. Our experimental results show that our neural model outperforms two baselines as well as humans solving the same task, suggesting that computational models are able to better capture the underlying semantics of emojis. |
Tasks | |
Published | 2017-02-23 |
URL | http://arxiv.org/abs/1702.07285v2 |
http://arxiv.org/pdf/1702.07285v2.pdf | |
PWC | https://paperswithcode.com/paper/are-emojis-predictable |
Repo | https://github.com/MMU-TDMLab/EmojiGraph |
Framework | none |
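The models are LSTM classifiers from tweet text to an emoji label. A minimal bidirectional PyTorch variant with illustrative dimensions (the paper also explores character-level BLSTMs):

```python
import torch
import torch.nn as nn

class EmojiLSTM(nn.Module):
    def __init__(self, vocab_size, num_emojis, dim=100, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_emojis)

    def forward(self, tokens):                   # tokens: (B, T) word ids
        _, (h, _) = self.lstm(self.embed(tokens))
        return self.out(torch.cat([h[0], h[1]], dim=-1))   # emoji logits
```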
Unmasking the abnormal events in video
Title | Unmasking the abnormal events in video |
Authors | Radu Tudor Ionescu, Sorina Smeureanu, Bogdan Alexe, Marius Popescu |
Abstract | We propose a novel framework for abnormal event detection in video that requires no training sequences. Our framework is based on unmasking, a technique previously used for authorship verification in text documents, which we adapt to our task. We iteratively train a binary classifier to distinguish between two consecutive video sequences while removing at each step the most discriminant features. Higher training accuracy rates of the intermediately obtained classifiers represent abnormal events. To the best of our knowledge, this is the first work to apply unmasking for a computer vision task. We compare our method with several state-of-the-art supervised and unsupervised methods on four benchmark data sets. The empirical results indicate that our abnormal event detection framework can achieve state-of-the-art results, while running in real-time at 20 frames per second. |
Tasks | Abnormal Event Detection In Video |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08182v3 |
http://arxiv.org/pdf/1705.08182v3.pdf | |
PWC | https://paperswithcode.com/paper/unmasking-the-abnormal-events-in-video |
Repo | https://github.com/MYusha/Video-Anomaly-Detection |
Framework | none |
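The unmasking loop itself is a few lines: repeatedly fit a linear classifier on two consecutive windows, record the training accuracy, and discard the most discriminant features. A scikit-learn sketch; the classifier and feature counts are illustrative, not the paper's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def unmasking_curve(feats_a, feats_b, iters=10, drop=2):
    """Accuracies that stay high across iterations signal an abnormal event."""
    X = np.vstack([feats_a, feats_b])
    y = np.r_[np.zeros(len(feats_a)), np.ones(len(feats_b))]
    keep = np.arange(X.shape[1])
    accs = []
    for _ in range(iters):
        clf = LogisticRegression(max_iter=1000).fit(X[:, keep], y)
        accs.append(clf.score(X[:, keep], y))       # training accuracy
        order = np.argsort(np.abs(clf.coef_[0]))    # ascending by |weight|
        keep = keep[order[:-drop]]                  # drop most discriminant features
    return accs
```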
A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling
Title | A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling |
Authors | Diego Marcheggiani, Anton Frolov, Ivan Titov |
Abstract | We introduce a simple and accurate neural model for dependency-based semantic role labeling. Our model predicts predicate-argument dependencies relying on states of a bidirectional LSTM encoder. The semantic role labeler achieves competitive performance on English, even without any kind of syntactic information and only using local inference. However, when automatically predicted part-of-speech tags are provided as input, it substantially outperforms all previous local models and approaches the best reported results on the English CoNLL-2009 dataset. We also consider Chinese, Czech and Spanish where our approach also achieves competitive results. Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting. Our syntax-agnostic model appears more robust, resulting in the best reported results on standard out-of-domain test sets. |
Tasks | Semantic Role Labeling |
Published | 2017-01-10 |
URL | http://arxiv.org/abs/1701.02593v2 |
http://arxiv.org/pdf/1701.02593v2.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-and-accurate-syntax-agnostic-neural |
Repo | https://github.com/jungokasai/stagging_srl |
Framework | tf |
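The syntax-agnostic recipe is essentially: embed words plus a predicate-indicator flag, run a BiLSTM, and classify each token's role with respect to the predicate state. A PyTorch sketch with illustrative sizes and a simplified role scorer:

```python
import torch
import torch.nn as nn

class SyntaxAgnosticSRL(nn.Module):
    def __init__(self, vocab_size, num_roles, dim=100, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.flag = nn.Embedding(2, 16)              # is-predicate indicator
        self.encoder = nn.LSTM(dim + 16, hidden, batch_first=True,
                               bidirectional=True)
        self.score = nn.Linear(4 * hidden, num_roles)

    def forward(self, words, pred_idx):              # words: (B, T); pred_idx: int
        B, T = words.shape
        is_pred = (torch.arange(T) == pred_idx).long().unsqueeze(0).expand(B, T)
        x = torch.cat([self.embed(words), self.flag(is_pred)], dim=-1)
        h, _ = self.encoder(x)                       # (B, T, 2*hidden)
        p = h[torch.arange(B), pred_idx]             # predicate state (B, 2*hidden)
        return self.score(torch.cat([h, p.unsqueeze(1).expand_as(h)], dim=-1))
```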
Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation
Title | Intention-Net: Integrating Planning and Deep Learning for Goal-Directed Autonomous Navigation |
Authors | Wei Gao, David Hsu, Wee Sun Lee, Shengmei Shen, Karthikk Subramanian |
Abstract | How can a delivery robot navigate reliably to a destination in a new office building, with minimal prior information? To tackle this challenge, this paper introduces a two-level hierarchical approach, which integrates model-free deep learning and model-based path planning. At the low level, a neural-network motion controller, called the intention-net, is trained end-to-end to provide robust local navigation. The intention-net maps images from a single monocular camera and “intentions” directly to robot controls. At the high level, a path planner uses a crude map, e.g., a 2-D floor plan, to compute a path from the robot’s current location to the goal. The planned path provides intentions to the intention-net. Preliminary experiments suggest that the learned motion controller is robust against perceptual uncertainty and by integrating with a path planner, it generalizes effectively to new environments and goals. |
Tasks | Autonomous Navigation |
Published | 2017-10-16 |
URL | http://arxiv.org/abs/1710.05627v2 |
http://arxiv.org/pdf/1710.05627v2.pdf | |
PWC | https://paperswithcode.com/paper/intention-net-integrating-planning-and-deep |
Repo | https://github.com/ayusefi/Localization-Papers |
Framework | none |
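The division of labor is: a path planner over a crude map emits an "intention", and the learned controller fuses it with the camera image to produce controls. A PyTorch sketch of the low-level controller, assuming a discretized intention (the paper also studies a richer path-image intention); the CNN trunk and sizes are illustrative:

```python
import torch
import torch.nn as nn

class IntentionNet(nn.Module):
    def __init__(self, num_intentions=4, ctrl_dim=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(32 + num_intentions, 64), nn.ReLU(),
            nn.Linear(64, ctrl_dim))                 # e.g. (speed, steering)

    def forward(self, image, intention_onehot):
        # image: (B, 3, H, W) monocular camera; intention_onehot: (B, num_intentions)
        return self.head(torch.cat([self.cnn(image), intention_onehot], dim=-1))
```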
MIDA: Multiple Imputation using Denoising Autoencoders
Title | MIDA: Multiple Imputation using Denoising Autoencoders |
Authors | Lovedeep Gondara, Ke Wang |
Abstract | Missing data is a significant problem impacting all domains. The state-of-the-art framework for minimizing missing-data bias is multiple imputation, for which the choice of an imputation model remains nontrivial. We propose a multiple imputation model based on overcomplete deep denoising autoencoders. Our proposed model is capable of handling different data types, missingness patterns, missingness proportions and distributions. Evaluation on several real-life datasets shows that our proposed model significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics. |
Tasks | Denoising, Imputation |
Published | 2017-05-08 |
URL | http://arxiv.org/abs/1705.02737v3 |
http://arxiv.org/pdf/1705.02737v3.pdf | |
PWC | https://paperswithcode.com/paper/mida-multiple-imputation-using-denoising |
Repo | https://github.com/HarryK24/MIDA-pytorch |
Framework | pytorch |
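An overcomplete denoising autoencoder widens, rather than compresses, the representation. A PyTorch sketch in the spirit of the paper; the fixed widening step per encoder layer and the tanh activations are assumptions here. The network is trained to reconstruct complete rows from corrupted inputs:

```python
import torch
import torch.nn as nn

class MIDA(nn.Module):
    """Overcomplete denoising autoencoder: each encoder layer widens by `step`."""
    def __init__(self, d_in, step=7, depth=3):
        super().__init__()
        dims = [d_in + step * i for i in range(depth + 1)]
        enc = [m for a, b in zip(dims, dims[1:]) for m in (nn.Linear(a, b), nn.Tanh())]
        rev = dims[::-1]
        dec = [m for a, b in zip(rev, rev[1:]) for m in (nn.Linear(a, b), nn.Tanh())]
        self.net = nn.Sequential(*enc, *dec[:-1])    # no activation on the output

    def forward(self, x_corrupted):                  # x_corrupted: (B, d_in)
        return self.net(x_corrupted)

# Multiple imputation: train several such networks (or vary the corruption noise)
# and fill each missing cell with each network's reconstruction.
```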