October 21, 2019

Paper Group AWR 148

ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations. Perceptual Image Quality Assessment through Spectral Analysis of Error Representations. Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks. JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis. Qua …

ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations


Title	ModaNet: A Large-Scale Street Fashion Dataset with Polygon Annotations
Authors	Shuai Zheng, Fan Yang, M. Hadi Kiapour, Robinson Piramuthu
Abstract	Understanding clothes from a single image has strong commercial and cultural impacts on modern societies. However, this task remains a challenging computer vision problem due to wide variations in the appearance, style, brand and layering of clothing items. We present a new database called ModaNet, a large-scale collection of images based on the Paperdoll dataset. Our dataset provides 55,176 street images, fully annotated with polygons on top of the 1 million weakly annotated street images in Paperdoll. ModaNet aims to provide a technical benchmark to fairly evaluate the progress of applying the latest computer vision techniques that rely on large data for fashion understanding. The rich annotation of the dataset makes it possible to measure in detail the performance of state-of-the-art algorithms for object detection, semantic segmentation and polygon prediction on street fashion images. The polygon-based annotation dataset has been released at https://github.com/eBay/modanet, and we also host the leaderboard at EvalAI: https://evalai.cloudcv.org/featured-challenges/136/overview.
Tasks	Object Detection, Semantic Segmentation
Published	2018-07-03
URL	http://arxiv.org/abs/1807.01394v4
PDF	http://arxiv.org/pdf/1807.01394v4.pdf
PWC	https://paperswithcode.com/paper/modanet-a-large-scale-street-fashion-dataset
Repo	https://github.com/eBay/modanet
Framework	none

Perceptual Image Quality Assessment through Spectral Analysis of Error Representations


Title	Perceptual Image Quality Assessment through Spectral Analysis of Error Representations
Authors	Dogancan Temel, Ghassan AlRegib
Abstract	In this paper, we analyze the statistics of error signals to assess the perceived quality of images. Specifically, we focus on the magnitude spectrum of error images obtained from the difference of reference and distorted images. Analyzing spectral statistics over grayscale images partially models interference in spatial harmonic distortion exhibited by the visual system, but it overlooks color information and the selective and hierarchical nature of the visual system. To overcome these shortcomings, we introduce an image quality assessment algorithm based on the Spectral Understanding of Multi-scale and Multi-channel Error Representations, denoted as SUMMER. We validate the quality assessment performance over 3 databases with around 30 distortion types. These distortion types are grouped into 7 main categories: compression artifact, image noise, color artifact, communication error, blur, global distortion and local distortion. In total, we benchmark the performance of 17 algorithms along with the proposed algorithm using 5 performance metrics that measure linearity, monotonicity, accuracy, and consistency. In addition to experiments with standard performance metrics, we analyze the distribution of objective and subjective scores with histogram difference metrics and scatter plots. Moreover, we analyze the classification performance of quality assessment algorithms along with their statistical significance tests. Based on our experiments, SUMMER significantly outperforms the majority of the compared methods in all benchmark categories.
Tasks	Image Quality Assessment
Published	2018-10-14
URL	http://arxiv.org/abs/1810.05964v2
PDF	http://arxiv.org/pdf/1810.05964v2.pdf
PWC	https://paperswithcode.com/paper/perceptual-image-quality-assessment-through
Repo	https://github.com/olivesgatech/SUMMER
Framework	none
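
The abstract's starting point, the magnitude spectrum of the error image (reference minus distorted), can be illustrated with a small sketch. This is not the SUMMER algorithm, which the abstract describes as multi-scale and multi-channel; it only shows the first step on tiny grayscale images given as nested lists. The function names are hypothetical, and a naive DFT is used so that the example needs nothing beyond the standard library.

```python
import cmath


def dft2_magnitude(img):
    """Magnitude of the naive 2D discrete Fourier transform of a small image."""
    rows, cols = len(img), len(img[0])
    spectrum = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * cmath.pi * (u * y / rows + v * x / cols)
                    acc += img[y][x] * cmath.exp(1j * angle)
            out_row.append(abs(acc))
        spectrum.append(out_row)
    return spectrum


def error_magnitude_spectrum(reference, distorted):
    """Magnitude spectrum of the pixel-wise difference of two same-size images."""
    error = [
        [r - d for r, d in zip(ref_row, dist_row)]
        for ref_row, dist_row in zip(reference, distorted)
    ]
    return dft2_magnitude(error)
```

The naive transform costs O(N^2) per frequency and is only meant for toy inputs; a real pipeline would use an FFT routine instead.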

Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks


Title	Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks
Authors	Youngjin Kim, Minjung Kim, Gunhee Kim
Abstract	We propose an approach to address two issues that commonly occur during training of unsupervised GANs. First, since GANs use only a continuous latent distribution to embed multiple classes or clusters of data, they often do not correctly handle the structural discontinuity between disparate classes in a latent space. Second, discriminators of GANs easily forget about past generated samples by generators, incurring instability during adversarial training. We argue that these two infamous problems of unsupervised GAN training can be largely alleviated by a learnable memory network that both generators and discriminators can access. Generators can effectively learn representations of training samples to understand underlying cluster distributions of data, which eases the structural discontinuity problem. At the same time, discriminators can better memorize clusters of previously generated samples, which mitigates the forgetting problem. We propose a novel end-to-end GAN model named memoryGAN, which involves a memory network that is unsupervisedly trainable and integrable to many existing GAN models. With evaluations on multiple datasets such as Fashion-MNIST, CelebA, CIFAR10, and Chairs, we show that our model is probabilistically interpretable, and generates realistic image samples of high visual fidelity. The memoryGAN also achieves the state-of-the-art inception scores over unsupervised GAN models on the CIFAR10 dataset, without any optimization tricks and weaker divergences.
Tasks
Published	2018-03-05
URL	http://arxiv.org/abs/1803.01500v2
PDF	http://arxiv.org/pdf/1803.01500v2.pdf
PWC	https://paperswithcode.com/paper/memorization-precedes-generation-learning
Repo	https://github.com/whyjay/memoryGAN
Framework	tf

JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis


Title	JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis
Authors	Jaejun Lee, Raphael Tang, Jimmy Lin
Abstract	Used for simple command recognition on devices from smart routers to mobile phones, keyword spotting systems are everywhere. Ubiquitous as well are web applications, which have grown in popularity and complexity over the last decade with significant improvements in usability under cross-platform conditions. However, despite their obvious advantage in natural language interaction, voice-enabled web applications are still few and far between. In this work, we attempt to bridge this gap by bringing keyword spotting capabilities directly into the browser. To our knowledge, we are the first to demonstrate a fully-functional implementation of convolutional neural networks in pure JavaScript that runs in any standards-compliant browser. We also apply network slimming, a model compression technique, to explore the accuracy-efficiency tradeoffs, reporting latency measurements on a range of devices and software. Overall, our robust, cross-device implementation for keyword spotting realizes a new paradigm for serving neural network applications, and one of our slim models reduces latency by 66% with a minimal decrease in accuracy of 4% from 94% to 90%.
Tasks	Keyword Spotting, Model Compression
Published	2018-10-30
URL	http://arxiv.org/abs/1810.12859v1
PDF	http://arxiv.org/pdf/1810.12859v1.pdf
PWC	https://paperswithcode.com/paper/javascript-convolutional-neural-networks-for
Repo	https://github.com/castorini/honkling
Framework	pytorch

Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian


Title	Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian
Authors	Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena
Abstract	This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators following this taxonomy. Subsequently, we carried out a statistical analysis which showed that the best-performing system (neural) reduces the errors produced by the worst system (pure phrase-based) by more than half (54%). Moreover, we conducted an additional analysis of agreement errors in which we distinguished between short (phrase-level) and long distance (sentence-level) errors. We discovered that phrase-based MT approaches are of limited use for long distance agreement phenomena, for which neural MT was found to be especially effective.
Tasks	Machine Translation
Published	2018-02-02
URL	http://arxiv.org/abs/1802.01451v1
PDF	http://arxiv.org/pdf/1802.01451v1.pdf
PWC	https://paperswithcode.com/paper/quantitative-fine-grained-human-evaluation-of
Repo	https://github.com/GreenParachute/mqm-eng-cro
Framework	none

Parameter sharing between dependency parsers for related languages


Title	Parameter sharing between dependency parsers for related languages
Authors	Miryam de Lhoneux, Johannes Bjerva, Isabelle Augenstein, Anders Søgaard
Abstract	Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on what parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a monolingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, but we find that, in the case of unrelated languages, sharing too many parameters does not help.
Tasks
Published	2018-08-27
URL	http://arxiv.org/abs/1808.09055v2
PDF	http://arxiv.org/pdf/1808.09055v2.pdf
PWC	https://paperswithcode.com/paper/parameter-sharing-between-dependency-parsers
Repo	https://github.com/coastalcph/uuparser
Framework	none

Cross-Domain Image Matching with Deep Feature Maps


Title	Cross-Domain Image Matching with Deep Feature Maps
Authors	Bailey Kong, James Supancic, Deva Ramanan, Charless C. Fowlkes
Abstract	We investigate the problem of automatically determining what type of shoe left an impression found at a crime scene. This recognition problem is made difficult by the variability in types of crime scene evidence (ranging from traces of dust or oil on hard surfaces to impressions made in soil) and the lack of comprehensive databases of shoe outsole tread patterns. We find that mid-level features extracted by pre-trained convolutional neural nets are surprisingly effective descriptors for this specialized domain. However, the choice of similarity measure for matching exemplars to a query image is essential to good performance. For matching multi-channel deep features, we propose the use of multi-channel normalized cross-correlation and analyze its effectiveness. Our proposed metric significantly improves performance in matching crime scene shoeprints to laboratory test impressions. We also show its effectiveness in other cross-domain image retrieval problems: matching facade images to segmentation labels and aerial photos to map images. Finally, we introduce a discriminatively trained variant and fine-tune our system through our proposed metric, obtaining state-of-the-art performance.
Tasks	Image Retrieval
Published	2018-04-06
URL	http://arxiv.org/abs/1804.02367v2
PDF	http://arxiv.org/pdf/1804.02367v2.pdf
PWC	https://paperswithcode.com/paper/cross-domain-image-matching-with-deep-feature
Repo	https://github.com/bkong/MCNCC
Framework	none
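
As a rough illustration of the similarity measure named in the abstract, the sketch below computes a normalized cross-correlation per feature channel and averages the result over channels. This is one plausible reading under stated assumptions, not the authors' implementation: the two feature maps are assumed to be already aligned and of equal size, each channel is a flattened list of floats, and the names `ncc` and `mcncc` as well as the averaging over channels are illustrative choices.

```python
import math


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equal-length channels."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Center each channel so the score ignores per-channel offsets.
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm_a = math.sqrt(sum(x * x for x in da))
    norm_b = math.sqrt(sum(y * y for y in db))
    # eps (an assumed constant) keeps flat channels from dividing by zero.
    return sum(x * y for x, y in zip(da, db)) / (norm_a * norm_b + eps)


def mcncc(feat_a, feat_b):
    """Average the per-channel NCC over all channels of two feature maps."""
    if len(feat_a) != len(feat_b):
        raise ValueError("feature maps must have the same number of channels")
    scores = [ncc(ca, cb) for ca, cb in zip(feat_a, feat_b)]
    return sum(scores) / len(scores)
```

Normalizing each channel separately keeps one high-magnitude channel from dominating the score, which is the usual motivation for a per-channel variant over a single correlation on the stacked features.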

Polite Dialogue Generation Without Parallel Data


Title	Polite Dialogue Generation Without Parallel Data
Authors	Tong Niu, Mohit Bansal
Abstract	Stylistic dialogue response generation, with valuable applications in personality-based conversational agents, is a challenging task because the response needs to be fluent, contextually-relevant, as well as paralinguistically accurate. Moreover, parallel datasets for regular-to-stylistic pairs are usually unavailable. We present three weakly-supervised models that can generate diverse polite (or rude) dialogue responses without parallel data. Our late fusion model (Fusion) merges the decoder of an encoder-attention-decoder dialogue model with a language model trained on stand-alone polite utterances. Our label-fine-tuning (LFT) model prepends to each source sequence a politeness-score scaled label (predicted by our state-of-the-art politeness classifier) during training, and at test time is able to generate polite, neutral, and rude responses by simply scaling the label embedding by the corresponding score. Our reinforcement learning model (Polite-RL) encourages politeness generation by assigning rewards proportional to the politeness classifier score of the sampled response. We also present two retrieval-based polite dialogue model baselines. Human evaluation validates that while the Fusion and the retrieval-based models achieve politeness with poorer context-relevance, the LFT and Polite-RL models can produce significantly more polite responses without sacrificing dialogue quality.
Tasks	Dialogue Generation, Language Modelling
Published	2018-05-08
URL	http://arxiv.org/abs/1805.03162v1
PDF	http://arxiv.org/pdf/1805.03162v1.pdf
PWC	https://paperswithcode.com/paper/polite-dialogue-generation-without-parallel
Repo	https://github.com/WolfNiu/polite-dialogue-generation
Framework	tf
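
The label-fine-tuning (LFT) idea in the abstract, prepending a label embedding scaled by a politeness score to the source sequence, can be sketched with plain lists. Everything here is illustrative: the function names are hypothetical, embeddings are lists of floats, and the score is assumed to lie in [0, 1], with larger values meaning more polite.

```python
def scale_label_embedding(label_embedding, politeness_score):
    """Scale every component of the label embedding by the politeness score."""
    return [politeness_score * x for x in label_embedding]


def prepend_label(source_embeddings, label_embedding, politeness_score):
    """Return the source sequence with the scaled label embedding in front."""
    scaled = scale_label_embedding(label_embedding, politeness_score)
    return [scaled] + list(source_embeddings)
```

For example, `prepend_label([[1.0, 1.0]], [2.0, 4.0], 0.5)` yields `[[1.0, 2.0], [1.0, 1.0]]`. Changing only the score at test time is what lets a single model steer its responses toward the polite or the rude end.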

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints


Title	The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints
Authors	Andrew Hundt, Varun Jain, Chia-Hung Lin, Chris Paxton, Gregory D. Hager
Abstract	A robot can now grasp an object more effectively than ever before, but once it has the object what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at https://sites.google.com/site/costardataset.
Tasks	Neural Architecture Search, Time Series
Published	2018-10-27
URL	http://arxiv.org/abs/1810.11714v2
PDF	http://arxiv.org/pdf/1810.11714v2.pdf
PWC	https://paperswithcode.com/paper/training-frankensteins-creature-to-stack
Repo	https://github.com/ahundt/enas
Framework	tf

Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness


Title	Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness
Authors	Priyadarshini Panda, Kaushik Roy
Abstract	We introduce a Noise-based prior Learning (NoL) approach for training neural networks that are intrinsically robust to adversarial attacks. We find that the implicit generative modeling of random noise, with the same loss function used during posterior maximization, improves a model’s understanding of the data manifold, furthering adversarial robustness. We evaluate our approach’s efficacy and provide a simplistic visualization tool for understanding adversarial data, using Principal Component Analysis. Our analysis reveals that adversarial robustness, in general, manifests in models with higher variance along the high-ranked principal components. We show that models learnt with our approach perform remarkably well against a wide range of attacks. Furthermore, combining NoL with state-of-the-art adversarial training extends the robustness of a model, even beyond what it is adversarially trained for, in both white-box and black-box attack scenarios.
Tasks
Published	2018-07-05
URL	https://arxiv.org/abs/1807.02188v4
PDF	https://arxiv.org/pdf/1807.02188v4.pdf
PWC	https://paperswithcode.com/paper/implicit-generative-modeling-of-random-noise
Repo	https://github.com/panda1230/Adversarial_NoiseLearning_NoL
Framework	pytorch

Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization


Title	Lifelong Learning of Spatiotemporal Representations with Dual-Memory Recurrent Self-Organization
Authors	German I. Parisi, Jun Tani, Cornelius Weber, Stefan Wermter
Abstract	Artificial autonomous agents and robots interacting in complex environments are required to continually acquire and fine-tune knowledge over sustained periods of time. The ability to learn from continuous streams of information is referred to as lifelong learning and represents a long-standing challenge for neural network models due to catastrophic forgetting. Computational models of lifelong learning typically alleviate catastrophic forgetting in experimental scenarios with given datasets of static images and limited complexity, thereby differing significantly from the conditions artificial agents are exposed to. In more natural settings, sequential information may become progressively available over time and access to previous experience may be restricted. In this paper, we propose a dual-memory self-organizing architecture for lifelong learning scenarios. The architecture comprises two growing recurrent networks with the complementary tasks of learning object instances (episodic memory) and categories (semantic memory). Both growing networks can expand in response to novel sensory experience: the episodic memory learns fine-grained spatiotemporal representations of object instances in an unsupervised fashion while the semantic memory uses task-relevant signals to regulate structural plasticity levels and develop more compact representations from episodic experience. For the consolidation of knowledge in the absence of external sensory input, the episodic memory periodically replays trajectories of neural reactivations. We evaluate the proposed model on the CORe50 benchmark dataset for continuous object recognition, showing that we significantly outperform current methods of lifelong learning in three different incremental learning scenarios.
Tasks	Active Learning, Continuous Object Recognition, Object Recognition
Published	2018-05-28
URL	http://arxiv.org/abs/1805.10966v4
PDF	http://arxiv.org/pdf/1805.10966v4.pdf
PWC	https://paperswithcode.com/paper/lifelong-learning-of-spatiotemporal
Repo	https://github.com/giparisi/GDM
Framework	none

Computing recommendations via a Knowledge Graph-aware Autoencoder


Title	Computing recommendations via a Knowledge Graph-aware Autoencoder
Authors	Vito Bellini, Angelo Schiavone, Tommaso Di Noia, Azzurra Ragone, Eugenio Di Sciascio
Abstract	In recent years, deep learning has proven to be a game-changing technology in artificial intelligence thanks to the numerous successes it has achieved in diverse application fields. Among others, the use of deep learning for the recommendation problem, although new, looks quite promising due to its positive performance in terms of accuracy of recommendation results. In a recommendation setting, in order to predict user ratings on unknown items, a possible configuration of a deep neural network is that of autoencoders, typically used to produce a lower dimensionality representation of the original data. In this paper we present KG-AUTOENCODER, an autoencoder that bases the structure of its neural network on the semantics-aware topology of a knowledge graph, thus providing a label for neurons in the hidden layer that are eventually used to build a user profile and then compute recommendations. We show the effectiveness of KG-AUTOENCODER in terms of accuracy, diversity and novelty by comparing with state-of-the-art recommendation algorithms.
Tasks
Published	2018-07-13
URL	https://arxiv.org/abs/1807.05006v1
PDF	https://arxiv.org/pdf/1807.05006v1.pdf
PWC	https://paperswithcode.com/paper/computing-recommendations-via-a-knowledge
Repo	https://github.com/sisinflab/SEMAUTO-2.0
Framework	tf

GILE: A Generalized Input-Label Embedding for Text Classification


Title	GILE: A Generalized Input-Label Embedding for Text Classification
Authors	Nikolaos Pappas, James Henderson
Abstract	Neural text classification models typically treat output labels as categorical variables which lack description and semantics. This forces their parametrization to be dependent on the label set size, and, hence, they are unable to scale to large label sets and generalize to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels happen often at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model which generalizes over previous such models, addresses their limitations and does not compromise performance on seen labels. The model consists of a joint non-linear input-label embedding with controllable capacity and a joint-space-dependent classification unit which is trained with cross-entropy loss to optimize classification performance. We evaluate models on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. Our model outperforms monolingual and multilingual models which do not leverage label semantics and previous joint input-label space models in both scenarios.
Tasks	Multi-Task Learning, Text Classification, Zero-Shot Learning
Published	2018-06-16
URL	http://arxiv.org/abs/1806.06219v3
PDF	http://arxiv.org/pdf/1806.06219v3.pdf
PWC	https://paperswithcode.com/paper/gile-a-generalized-input-label-embedding-for
Repo	https://github.com/idiap/gile
Framework	none

Multi-Hop Knowledge Graph Reasoning with Reward Shaping


Title	Multi-Hop Knowledge Graph Reasoning with Reward Shaping
Authors	Xi Victoria Lin, Richard Socher, Caiming Xiong
Abstract	Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no golden action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and is comparable or better than embedding-based models.
Tasks	Knowledge Graphs
Published	2018-08-31
URL	http://arxiv.org/abs/1808.10568v2
PDF	http://arxiv.org/pdf/1808.10568v2.pdf
PWC	https://paperswithcode.com/paper/multi-hop-knowledge-graph-reasoning-with
Repo	https://github.com/salesforce/MultiHopKG
Framework	pytorch
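
The two modeling ideas in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released MultiHopKG code: `embedding_score` is a hypothetical callable standing in for the pretrained one-hop embedding model and is assumed to return a value in [0, 1], and `mask_edges` is a hypothetical helper that drops outgoing edges at random to push the agent toward diverse paths.

```python
import random


def shaped_reward(reached_entity, known_answers, embedding_score):
    """Reward 1.0 for an observed answer, otherwise a soft embedding-based score."""
    if reached_entity in known_answers:
        return 1.0
    # Unobserved facts get an estimated reward instead of a hard zero,
    # which softens the effect of false negatives in an incomplete graph.
    return embedding_score(reached_entity)


def mask_edges(edges, drop_rate, rng):
    """Randomly hide outgoing edges; fall back to all edges if none survive."""
    kept = [edge for edge in edges if rng.random() >= drop_rate]
    # Assumed fallback: never leave the agent without any action.
    return kept if kept else list(edges)
```

A usage example: with `scores = {"paris": 0.8}`, calling `shaped_reward("paris", {"berlin"}, scores.get)` returns the soft score 0.8, while reaching `"berlin"` returns 1.0. Passing a seeded `random.Random` as `rng` makes the masking reproducible.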

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding


Title	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Authors	Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Abstract	We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
Tasks	Common Sense Reasoning, Cross-Lingual Natural Language Inference, Named Entity Recognition, Natural Language Inference, Question Answering, Sentence Classification
Published	2018-10-11
URL	https://arxiv.org/abs/1810.04805v2
PDF	https://arxiv.org/pdf/1810.04805v2.pdf
PWC	https://paperswithcode.com/paper/bert-pre-training-of-deep-bidirectional
Repo	https://github.com/pfecht/bert-exploration
Framework	pytorch