October 21, 2019

Paper Group AWR 148

ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations. Perceptual Image Quality Assessment through Spectral Analysis of Error Representations. Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks. JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis. Qua …

ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations

Title ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations
Authors Shuai Zheng, Fan Yang, M. Hadi Kiapour, Robinson Piramuthu
Abstract Understanding clothes from a single image has strong commercial and cultural impacts on modern societies. However, this task remains a challenging computer vision problem due to wide variations in the appearance, style, brand and layering of clothing items. We present a new database called ModaNet, a large-scale collection of images based on the Paperdoll dataset. Our dataset provides 55,176 street images, fully annotated with polygons on top of the 1 million weakly annotated street images in Paperdoll. ModaNet aims to provide a technical benchmark to fairly evaluate the progress of applying the latest computer vision techniques that rely on large data for fashion understanding. The rich annotation of the dataset makes it possible to measure the performance of state-of-the-art algorithms for object detection, semantic segmentation and polygon prediction on street fashion images in detail. The polygon-based annotation dataset has been released at https://github.com/eBay/modanet, and we also host the leaderboard at EvalAI: https://evalai.cloudcv.org/featured-challenges/136/overview.
Tasks Object Detection, Semantic Segmentation
Published 2018-07-03
URL http://arxiv.org/abs/1807.01394v4
PDF http://arxiv.org/pdf/1807.01394v4.pdf
PWC https://paperswithcode.com/paper/modanet-a-large-scale-street-fashion-dataset
Repo https://github.com/eBay/modanet
Framework none

Perceptual Image Quality Assessment through Spectral Analysis of Error Representations

Title Perceptual Image Quality Assessment through Spectral Analysis of Error Representations
Authors Dogancan Temel, Ghassan AlRegib
Abstract In this paper, we analyze the statistics of error signals to assess the perceived quality of images. Specifically, we focus on the magnitude spectrum of error images obtained from the difference of reference and distorted images. Analyzing spectral statistics over grayscale images partially models interference in spatial harmonic distortion exhibited by the visual system, but it overlooks color information and the selective and hierarchical nature of the visual system. To overcome these shortcomings, we introduce an image quality assessment algorithm based on the Spectral Understanding of Multi-scale and Multi-channel Error Representations, denoted as SUMMER. We validate the quality assessment performance over 3 databases with around 30 distortion types. These distortion types are grouped into 7 main categories: compression artifact, image noise, color artifact, communication error, blur, global and local distortions. In total, we benchmark the performance of 17 algorithms along with the proposed algorithm using 5 performance metrics that measure linearity, monotonicity, accuracy, and consistency. In addition to experiments with standard performance metrics, we analyze the distribution of objective and subjective scores with histogram difference metrics and scatter plots. Moreover, we analyze the classification performance of quality assessment algorithms along with their statistical significance tests. Based on our experiments, SUMMER significantly outperforms the majority of the compared methods in all benchmark categories.
Tasks Image Quality Assessment
Published 2018-10-14
URL http://arxiv.org/abs/1810.05964v2
PDF http://arxiv.org/pdf/1810.05964v2.pdf
PWC https://paperswithcode.com/paper/perceptual-image-quality-assessment-through
Repo https://github.com/olivesgatech/SUMMER
Framework none
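The first step the abstract describes, taking the magnitude spectrum of the error image between a reference and a distorted image, can be sketched as follows. This is only a minimal illustration in pure Python with a naive discrete Fourier transform. It is not the SUMMER algorithm: the multi-scale and multi-channel processing is omitted, and the function names and the mean-magnitude summary are assumptions made for this example.

```python
import cmath
import math


def error_image(reference, distorted):
    """Pixel-wise difference between two equally sized grayscale images."""
    return [
        [r - d for r, d in zip(ref_row, dist_row)]
        for ref_row, dist_row in zip(reference, distorted)
    ]


def magnitude_spectrum(image):
    """Magnitude of the 2D discrete Fourier transform (naive version)."""
    height, width = len(image), len(image[0])
    spectrum = []
    for u in range(height):
        row = []
        for v in range(width):
            total = 0j
            for y in range(height):
                for x in range(width):
                    angle = -2.0 * math.pi * (u * y / height + v * x / width)
                    total += image[y][x] * cmath.exp(1j * angle)
            row.append(abs(total))
        spectrum.append(row)
    return spectrum


def mean_magnitude(spectrum):
    """Average spectral magnitude, a toy summary statistic (an assumption)."""
    values = [m for row in spectrum for m in row]
    return sum(values) / len(values)


# Hypothetical 2x2 images: the distorted image is darker by 1 at every pixel.
reference = [[10, 20], [30, 40]]
distorted = [[9, 19], [29, 39]]
spectrum = magnitude_spectrum(error_image(reference, distorted))
```

With a uniform error, all of the energy lands in the zero-frequency bin; structured distortions such as blocking or blur would spread it across other frequencies, which is the kind of signal spectral statistics can pick up.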

Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks

Title Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks
Authors Youngjin Kim, Minjung Kim, Gunhee Kim
Abstract We propose an approach to address two issues that commonly occur during training of unsupervised GANs. First, since GANs use only a continuous latent distribution to embed multiple classes or clusters of data, they often do not correctly handle the structural discontinuity between disparate classes in a latent space. Second, discriminators of GANs easily forget about past generated samples by generators, incurring instability during adversarial training. We argue that these two infamous problems of unsupervised GAN training can be largely alleviated by a learnable memory network to which both generators and discriminators can access. Generators can effectively learn representation of training samples to understand underlying cluster distributions of data, which eases the structural discontinuity problem. At the same time, discriminators can better memorize clusters of previously generated samples, which mitigates the forgetting problem. We propose a novel end-to-end GAN model named memoryGAN, which involves a memory network that is unsupervisedly trainable and integrable to many existing GAN models. With evaluations on multiple datasets such as Fashion-MNIST, CelebA, CIFAR10, and Chairs, we show that our model is probabilistically interpretable, and generates realistic image samples of high visual fidelity. The memoryGAN also achieves the state-of-the-art inception scores over unsupervised GAN models on the CIFAR10 dataset, without any optimization tricks or weaker divergences.
Tasks
Published 2018-03-05
URL http://arxiv.org/abs/1803.01500v2
PDF http://arxiv.org/pdf/1803.01500v2.pdf
PWC https://paperswithcode.com/paper/memorization-precedes-generation-learning
Repo https://github.com/whyjay/memoryGAN
Framework tf

JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis

Title JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis
Authors Jaejun Lee, Raphael Tang, Jimmy Lin
Abstract Used for simple command recognition on devices from smart routers to mobile phones, keyword spotting systems are everywhere. Ubiquitous as well are web applications, which have grown in popularity and complexity over the last decade with significant improvements in usability under cross-platform conditions. However, despite their obvious advantage in natural language interaction, voice-enabled web applications are still few and far between. In this work, we attempt to bridge this gap by bringing keyword spotting capabilities directly into the browser. To our knowledge, we are the first to demonstrate a fully-functional implementation of convolutional neural networks in pure JavaScript that runs in any standards-compliant browser. We also apply network slimming, a model compression technique, to explore the accuracy-efficiency tradeoffs, reporting latency measurements on a range of devices and software. Overall, our robust, cross-device implementation for keyword spotting realizes a new paradigm for serving neural network applications, and one of our slim models reduces latency by 66% with a minimal decrease in accuracy of 4% from 94% to 90%.
Tasks Keyword Spotting, Model Compression
Published 2018-10-30
URL http://arxiv.org/abs/1810.12859v1
PDF http://arxiv.org/pdf/1810.12859v1.pdf
PWC https://paperswithcode.com/paper/javascript-convolutional-neural-networks-for
Repo https://github.com/castorini/honkling
Framework pytorch

Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian

Title Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian
Authors Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena
Abstract This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators following this taxonomy. Subsequently, we carried out a statistical analysis which showed that the best-performing system (neural) reduces the errors produced by the worst system (pure phrase-based) by more than half (54%). Moreover, we conducted an additional analysis of agreement errors in which we distinguished between short (phrase-level) and long distance (sentence-level) errors. We discovered that phrase-based MT approaches are of limited use for long distance agreement phenomena, for which neural MT was found to be especially effective.
Tasks Machine Translation
Published 2018-02-02
URL http://arxiv.org/abs/1802.01451v1
PDF http://arxiv.org/pdf/1802.01451v1.pdf
PWC https://paperswithcode.com/paper/quantitative-fine-grained-human-evaluation-of
Repo https://github.com/GreenParachute/mqm-eng-cro
Framework none
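The two quantities the abstract reports, a relative error reduction between systems and a significance test on error counts, can be illustrated with a short sketch. The abstract does not name the test that was used, so the chi-squared test on a 2x2 table below is an assumption, and all counts are hypothetical numbers chosen only so that the reduction comes out at 54%.

```python
import math


def relative_error_reduction(worst_errors, best_errors):
    """Fraction of the worst system's errors that the best system avoids."""
    return (worst_errors - best_errors) / worst_errors


def chi_squared_2x2(errors_a, tokens_a, errors_b, tokens_b):
    """Chi-squared test (1 degree of freedom) on error vs. non-error counts."""
    a, b = errors_a, tokens_a - errors_a
    c, d = errors_b, tokens_b - errors_b
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the survival function reduces to erfc.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, not taken from the paper.
reduction = relative_error_reduction(worst_errors=1000, best_errors=460)
statistic, p_value = chi_squared_2x2(1000, 5000, 460, 5000)
```

In a per-error-type analysis, the same test would be repeated for each MQM category and each pair of systems, which is where a correction for multiple comparisons would normally be considered.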

Parameter sharing between dependency parsers for related languages

Title Parameter sharing between dependency parsers for related languages
Authors Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard
Abstract Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on what parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a monolingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, but we find that, in the case of unrelated languages, sharing too many parameters does not help.
Tasks
Published 2018-08-27
URL http://arxiv.org/abs/1808.09055v2
PDF http://arxiv.org/pdf/1808.09055v2.pdf
PWC https://paperswithcode.com/paper/parameter-sharing-between-dependency-parsers
Repo https://github.com/coastalcph/uuparser
Framework none

Cross-Domain Image Matching with Deep Feature Maps

Title Cross-Domain Image Matching with Deep Feature Maps
Authors Bailey Kong, James Supancic, Deva Ramanan, Charless C. Fowlkes
Abstract We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural nets are surprisingly effective descriptors for this specialized domain. However, the choice of similarity measure for matching exemplars to a query image is essential to good performance. For matching multi-channel deep features, we propose the use of multi-channel normalized cross-correlation and analyze its effectiveness. Our proposed metric significantly improves performance in matching crime scene shoeprints to laboratory test impressions. We also show its effectiveness in other cross-domain image retrieval problems: matching facade images to segmentation labels and aerial photos to map images. Finally, we introduce a discriminatively trained variant and fine-tune our system through our proposed metric, obtaining state-of-the-art performance.
Tasks Image Retrieval
Published 2018-04-06
URL http://arxiv.org/abs/1804.02367v2
PDF http://arxiv.org/pdf/1804.02367v2.pdf
PWC https://paperswithcode.com/paper/cross-domain-image-matching-with-deep-feature
Repo https://github.com/bkong/MCNCC
Framework none
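The idea of multi-channel normalized cross-correlation can be sketched in a few lines: normalize each feature channel separately, correlate matching channels, and average the per-channel scores. The code below is a simplified illustration under stated assumptions, not the authors' implementation. It assumes the two feature maps are already aligned and have the same shape, it does not search over translations, and the function names are hypothetical.

```python
import math


def normalize(channel):
    """Zero-mean, unit-norm version of one flattened feature channel."""
    mean = sum(channel) / len(channel)
    centered = [value - mean for value in channel]
    norm = math.sqrt(sum(value * value for value in centered))
    if norm == 0.0:
        return [0.0] * len(channel)  # a constant channel carries no pattern
    return [value / norm for value in centered]


def mcncc(query, exemplar):
    """Average per-channel normalized cross-correlation of two feature maps.

    Each argument is a list of channels; each channel is a flat list of
    activations at the same spatial positions.
    """
    scores = []
    for query_channel, exemplar_channel in zip(query, exemplar):
        q = normalize(query_channel)
        e = normalize(exemplar_channel)
        scores.append(sum(a * b for a, b in zip(q, e)))
    return sum(scores) / len(scores)


# Hypothetical two-channel feature maps with four spatial positions each.
query = [[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0]]
rescaled = [[2.0 * v + 5.0 for v in channel] for channel in query]
inverted = [[-v for v in channel] for channel in query]
```

Because each channel is normalized on its own, `mcncc(query, rescaled)` stays close to 1 even though every activation changed in gain and offset, which is a useful property when the two domains differ in contrast or texture statistics.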

Polite Dialogue Generation Without Parallel Data

Title Polite Dialogue Generation Without Parallel Data
Authors Tong Niu, Mohit Bansal
Abstract Stylistic dialogue response generation, with valuable applications in personality-based conversational agents, is a challenging task because the response needs to be fluent, contextually-relevant, as well as paralinguistically accurate. Moreover, parallel datasets for regular-to-stylistic pairs are usually unavailable. We present three weakly-supervised models that can generate diverse polite (or rude) dialogue responses without parallel data. Our late fusion model (Fusion) merges the decoder of an encoder-attention-decoder dialogue model with a language model trained on stand-alone polite utterances. Our label-fine-tuning (LFT) model prepends to each source sequence a politeness-score scaled label (predicted by our state-of-the-art politeness classifier) during training, and at test time is able to generate polite, neutral, and rude responses by simply scaling the label embedding by the corresponding score. Our reinforcement learning model (Polite-RL) encourages politeness generation by assigning rewards proportional to the politeness classifier score of the sampled response. We also present two retrieval-based polite dialogue model baselines. Human evaluation validates that while the Fusion and the retrieval-based models achieve politeness with poorer context-relevance, the LFT and Polite-RL models can produce significantly more polite responses without sacrificing dialogue quality.
Tasks Dialogue Generation, Language Modelling
Published 2018-05-08
URL http://arxiv.org/abs/1805.03162v1
PDF http://arxiv.org/pdf/1805.03162v1.pdf
PWC https://paperswithcode.com/paper/polite-dialogue-generation-without-parallel
Repo https://github.com/WolfNiu/polite-dialogue-generation
Framework tf

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

Title The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints
Authors Andrew Hundt, Varun Jain, Chia-Hung Lin, Chris Paxton, Gregory D. Hager
Abstract A robot can now grasp an object more effectively than ever before, but once it has the object what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at https://sites.google.com/site/costardataset.
Tasks Neural Architecture Search, Time Series
Published 2018-10-27
URL http://arxiv.org/abs/1810.11714v2
PDF http://arxiv.org/pdf/1810.11714v2.pdf
PWC https://paperswithcode.com/paper/training-frankensteins-creature-to-stack
Repo https://github.com/ahundt/enas
Framework tf

Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness

Title Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness
Authors Priyadarshini Panda, Kaushik Roy
Abstract We introduce a Noise-based prior Learning (NoL) approach for training neural networks that are intrinsically robust to adversarial attacks. We find that the implicit generative modeling of random noise with the same loss function used during posterior maximization, improves a model’s understanding of the data manifold furthering adversarial robustness. We evaluate our approach’s efficacy and provide a simplistic visualization tool for understanding adversarial data, using Principal Component Analysis. Our analysis reveals that adversarial robustness, in general, manifests in models with higher variance along the high-ranked principal components. We show that models learnt with our approach perform remarkably well against a wide-range of attacks. Furthermore, combining NoL with state-of-the-art adversarial training extends the robustness of a model, even beyond what it is adversarially trained for, in both white-box and black-box attack scenarios.
Tasks
Published 2018-07-05
URL https://arxiv.org/abs/1807.02188v4
PDF https://arxiv.org/pdf/1807.02188v4.pdf
PWC https://paperswithcode.com/paper/implicit-generative-modeling-of-random-noise
Repo https://github.com/panda1230/Adversarial_NoiseLearning_NoL
Framework pytorch

Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization

Title Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization
Authors German I. Parisi, Jun Tani, Cornelius Weber, Stefan Wermter
Abstract Artificial autonomous agents and robots interacting in complex environments are required to continually acquire and fine-tune knowledge over sustained periods of time. The ability to learn from continuous streams of information is referred to as lifelong learning and represents a long-standing challenge for neural network models due to catastrophic forgetting. Computational models of lifelong learning typically alleviate catastrophic forgetting in experimental scenarios with given datasets of static images and limited complexity, thereby differing significantly from the conditions artificial agents are exposed to. In more natural settings, sequential information may become progressively available over time and access to previous experience may be restricted. In this paper, we propose a dual-memory self-organizing architecture for lifelong learning scenarios. The architecture comprises two growing recurrent networks with the complementary tasks of learning object instances (episodic memory) and categories (semantic memory). Both growing networks can expand in response to novel sensory experience: the episodic memory learns fine-grained spatiotemporal representations of object instances in an unsupervised fashion while the semantic memory uses task-relevant signals to regulate structural plasticity levels and develop more compact representations from episodic experience. For the consolidation of knowledge in the absence of external sensory input, the episodic memory periodically replays trajectories of neural reactivations. We evaluate the proposed model on the CORe50 benchmark dataset for continuous object recognition, showing that we significantly outperform current methods of lifelong learning in three different incremental learning scenarios.
Tasks Active Learning, Continuous Object Recognition, Object Recognition
Published 2018-05-28
URL http://arxiv.org/abs/1805.10966v4
PDF http://arxiv.org/pdf/1805.10966v4.pdf
PWC https://paperswithcode.com/paper/lifelong-learning-of-spatiotemporal
Repo https://github.com/giparisi/GDM
Framework none

Computing recommendations via a Knowledge Graph-aware Autoencoder

Title Computing recommendations via a Knowledge Graph-aware Autoencoder
Authors Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio
Abstract In recent years, deep learning has proven to be a game-changing technology in artificial intelligence thanks to the numerous successes it reached in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performance in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items a possible configuration of a deep neural network is that of autoencoders, which are typically used to produce a lower-dimensional representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semantics-aware topology of a knowledge graph thus providing a label for neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing with state of the art recommendation algorithms.
Tasks
Published 2018-07-13
URL https://arxiv.org/abs/1807.05006v1
PDF https://arxiv.org/pdf/1807.05006v1.pdf
PWC https://paperswithcode.com/paper/computing-recommendations-via-a-knowledge
Repo https://github.com/sisinflab/SEMAUTO-2.0
Framework tf

GILE: A Generalized Input-Label Embedding for Text Classification

Title GILE: A Generalized Input-Label Embedding for Text Classification
Authors Nikolaos Pappas, James Henderson
Abstract Neural text classification models typically treat output labels as categorical variables which lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model which generalizes over previous such models, addresses their limitations and does not compromise performance on seen labels. The model consists of a joint non-linear input-label embedding with controllable capacity and a joint-space-dependent classification unit which is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models which do not leverage label semantics and previous joint input-label space models in both scenarios.
Tasks Multi-Task Learning, Text Classification, Zero-Shot Learning
Published 2018-06-16
URL http://arxiv.org/abs/1806.06219v3
PDF http://arxiv.org/pdf/1806.06219v3.pdf
PWC https://paperswithcode.com/paper/gile-a-generalized-input-label-embedding-for
Repo https://github.com/idiap/gile
Framework none

Multi-Hop Knowledge Graph Reasoning with Reward Shaping

Title Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Authors Xi Victoria Lin, Richard Socher, Caiming Xiong
Abstract Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no golden action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and is comparable to or better than embedding-based models.
Tasks Knowledge Graphs
Published 2018-08-31
URL http://arxiv.org/abs/1808.10568v2
PDF http://arxiv.org/pdf/1808.10568v2.pdf
PWC https://paperswithcode.com/paper/multi-hop-knowledge-graph-reasoning-with
Repo https://github.com/salesforce/MultiHopKG
Framework pytorch
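The two ideas in the abstract can be sketched roughly as follows: a shaped reward that falls back to an embedding model's score when the reached entity is not a known answer, and a random edge mask that hides some outgoing edges during training. This is a hedged illustration rather than the code from the linked repository. The names `shaped_reward`, `mask_edges` and `embedding_score` are hypothetical, and keeping all edges when the mask would remove every one is a design choice made for this example.

```python
import random


def shaped_reward(reached_entity, known_answers, embedding_score):
    """Return 1 for an observed answer, else a soft score from the embedding model.

    embedding_score is a stand-in for a pretrained one-hop model that maps
    a candidate entity to a plausibility score in [0, 1].
    """
    if reached_entity in known_answers:
        return 1.0
    return embedding_score(reached_entity)


def mask_edges(outgoing_edges, drop_rate, rng):
    """Randomly hide edges so the agent explores a more diverse set of paths."""
    kept = [edge for edge in outgoing_edges if rng.random() >= drop_rate]
    # Assumption for this sketch: never leave the agent without any action.
    return kept if kept else list(outgoing_edges)


# Hypothetical toy query with one observed answer.
known_answers = {"paris"}
scores = {"lyon": 0.3}
reward_hit = shaped_reward("paris", known_answers, lambda e: scores.get(e, 0.0))
reward_soft = shaped_reward("lyon", known_answers, lambda e: scores.get(e, 0.0))

edges = [("born_in", "paris"), ("lives_in", "lyon"), ("works_in", "nice")]
visible = mask_edges(edges, drop_rate=0.5, rng=random.Random(0))
```

The soft reward means an unobserved but plausible fact is no longer treated as a hard negative, while the mask changes which actions are available from step to step so that a single spurious path is less likely to dominate training.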

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Title BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Authors Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Abstract We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Tasks Common Sense Reasoning, Cross-Lingual Natural Language Inference, Named Entity Recognition, Natural Language Inference, Question Answering, Sentence Classification
Published 2018-10-11
URL https://arxiv.org/abs/1810.04805v2
PDF https://arxiv.org/pdf/1810.04805v2.pdf
PWC https://paperswithcode.com/paper/bert-pre-training-of-deep-bidirectional
Repo https://github.com/pfecht/bert-exploration
Framework pytorch