October 16, 2019

Paper Group NAWR 15

HFT-CNN: Learning Hierarchical Category Structure for Multi-label Short Text Categorization


Title	HFT-CNN: Learning Hierarchical Category Structure for Multi-label Short Text Categorization
Authors	Kazuya Shimura, Jiyi Li, Fumiyo Fukumoto
Abstract	We focus on the multi-label categorization task for short texts and explore the use of a hierarchical structure (HS) of categories. In contrast to existing work that uses a non-hierarchical flat model, the method leverages the hierarchical relations between the pre-defined categories to tackle the data sparsity problem. Categorization performance decreases at lower levels of the HS because the amount of training data per category at a lower level is much smaller than at an upper level. We propose an approach that effectively uses the data in the upper levels to contribute to categorization in the lower levels by applying a Convolutional Neural Network (CNN) with a fine-tuning technique. The results on two benchmark datasets show that the proposed method, Hierarchical Fine-Tuning based CNN (HFT-CNN), is competitive with state-of-the-art CNN-based methods.
Tasks	Extreme Multi-Label Classification, Multi-Label Classification, Text Categorization
Published	2018-10-01
URL	https://www.aclweb.org/anthology/D18-1093/
PDF	https://www.aclweb.org/anthology/D18-1093
PWC	https://paperswithcode.com/paper/hft-cnn-learning-hierarchical-category
Repo	https://github.com/ShimShim46/HFT-CNN
Framework	none

Neural Sign Language Translation


Title	Neural Sign Language Translation
Authors	Necati Cihan Camgoz, Simon Hadfield, Oscar Koller, Hermann Ney, Richard Bowden
Abstract	Sign Language Recognition (SLR) has been an active research field for the last two decades. However, most research to date has considered SLR as a naive gesture recognition problem. SLR seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language. In contrast, we introduce the Sign Language Translation (SLT) problem. Here, the objective is to generate spoken language translations from sign language videos, taking into account the different word orders and grammar. We formalize SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge). This allows us to jointly learn the spatial representations, the underlying language model, and the mapping between sign and spoken language. To evaluate the performance of Neural SLT, we collected the first publicly available Continuous SLT dataset, RWTH-PHOENIX-Weather 2014T. It provides spoken language translations and gloss level annotations for German Sign Language videos of weather broadcasts. Our dataset contains over .95M frames with >67K signs from a sign vocabulary of >1K and >99K words from a German vocabulary of >2.8K. We report quantitative and qualitative results for various SLT setups to underpin future research in this newly established field. The upper bound for translation performance is calculated at 19.26 BLEU-4, while our end-to-end frame-level and gloss-level tokenization networks were able to achieve 9.58 and 18.13 respectively.
Tasks	Gesture Recognition, Language Modelling, Machine Translation, Sign Language Recognition, Sign Language Translation, Tokenization
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Camgoz_Neural_Sign_Language_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Camgoz_Neural_Sign_Language_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/neural-sign-language-translation
Repo	https://github.com/neccam/nslt
Framework	tf

Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline


Title	Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline
Authors	Kawin Ethayarajh
Abstract	Using a random walk model of text generation, Arora et al. (2017) proposed a strong baseline for computing sentence embeddings: take a weighted average of word embeddings and modify with SVD. This simple method even outperforms far more complex approaches such as LSTMs on textual similarity tasks. In this paper, we first show that word vector length has a confounding effect on the probability of a sentence being generated in Arora et al.'s model. We propose a random walk model that is robust to this confound, where the probability of word generation is inversely related to the angular distance between the word and sentence embeddings. Our approach beats Arora et al.'s by up to 44.4% on textual similarity tasks and is competitive with state-of-the-art methods. Unlike Arora et al.'s method, ours requires no hyperparameter tuning, which means it can be used when there is no labelled data.
Tasks	Representation Learning, Sentence Embeddings, Text Generation, Transfer Learning, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-3012/
PDF	https://www.aclweb.org/anthology/W18-3012
PWC	https://paperswithcode.com/paper/unsupervised-random-walk-sentence-embeddings
Repo	https://github.com/kawine/usif
Framework	none
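
The weighted-average step of the Arora et al. (2017) baseline mentioned in the abstract can be sketched as follows. The weight form a / (a + p(w)) and the default a = 1e-3 are taken from that baseline as commonly described, not from this listing, so treat them as assumptions. The function name and the dictionary inputs are hypothetical, and the SVD-based correction is deliberately left out. This is not the paper's own uSIF method.

```python
def sif_weighted_average(tokens, vectors, word_prob, a=1e-3):
    """Average word vectors with weights a / (a + p(w)).

    tokens:    list of words in the sentence
    vectors:   dict word -> list of floats (hypothetical embedding table)
    word_prob: dict word -> unigram probability p(w)
    a:         smoothing constant (assumed default, not from the listing)
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        if tok not in vectors:
            continue  # skip out-of-vocabulary tokens
        # Frequent words get small weights, rare words get weights near 1.
        weight = a / (a + word_prob.get(tok, 0.0))
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
        count += 1
    if count == 0:
        return total  # all-zero vector when nothing is in vocabulary
    return [x / count for x in total]
```

With a frequent word such as "the" and a rarer word such as "cat", the resulting sentence vector is dominated by the rarer word, which is the intended effect of the weighting.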

MGAN: Training Generative Adversarial Nets with Multiple Generators


Title	MGAN: Training Generative Adversarial Nets with Multiple Generators
Authors	Quan Hoang, Tu Dinh Nguyen, Trung Le, Dinh Phung
Abstract	We propose in this paper a new approach to train the Generative Adversarial Nets (GANs) with a mixture of generators to overcome the mode collapsing problem. The main intuition is to employ multiple generators, instead of using a single one as in the original GAN. The idea is simple, yet proven to be extremely effective at covering diverse data modes, easily overcoming the mode collapsing problem and delivering state-of-the-art results. A minimax formulation can be established among a classifier, a discriminator, and a set of generators, in a similar spirit to GAN. Generators create samples that are intended to come from the same distribution as the training data, whilst the discriminator determines whether samples are true data or generated by generators, and the classifier specifies which generator a sample comes from. The distinguishing feature is that internal samples are created from multiple generators, and then one of them will be randomly selected as final output similar to the mechanism of a probabilistic mixture model. We term our method Mixture Generative Adversarial Nets (MGAN). We develop theoretical analysis to prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the mixture of generators’ distributions and the empirical data distribution is minimal, whilst the JSD among generators’ distributions is maximal, hence effectively avoiding the mode collapsing problem. By utilizing parameter sharing, our proposed model adds minimal computational cost to the standard GAN, and thus can also efficiently scale to large-scale datasets. We conduct extensive experiments on synthetic 2D data and natural image databases (CIFAR-10, STL-10 and ImageNet) to demonstrate the superior performance of our MGAN in achieving state-of-the-art Inception scores over latest baselines, generating diverse and appealing recognizable objects at different resolutions, and specializing in capturing different types of objects by the generators.
Tasks
Published	2018-01-01
URL	https://openreview.net/forum?id=rkmu5b0a-
PDF	https://openreview.net/pdf?id=rkmu5b0a-
PWC	https://paperswithcode.com/paper/mgan-training-generative-adversarial-nets
Repo	https://github.com/qhoangdl/MGAN
Framework	tf
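
The sampling mechanism described in the abstract (one generator is picked at random and its output becomes the final sample) can be illustrated with a minimal sketch. Uniform selection, the function name, and the toy "generators" are assumptions made for illustration. Real generators would be neural networks, and the index returned here is the label that the classifier in the abstract would be trained to predict.

```python
import random


def sample_from_mixture(generators, noise_fn, rng):
    """Draw one sample from a mixture of generators.

    generators: list of callables mapping a noise value to a sample
    noise_fn:   callable taking the RNG and returning a noise value
    rng:        random.Random instance (for reproducibility)
    Returns (index of the chosen generator, generated sample).
    """
    # Assumption: components are chosen uniformly at random.
    index = rng.randrange(len(generators))
    sample = generators[index](noise_fn(rng))
    return index, sample


# Toy stand-ins for generators: each covers a different "mode".
toy_generators = [lambda z: -5.0 + z, lambda z: 5.0 + z]
rng = random.Random(0)
index, sample = sample_from_mixture(toy_generators, lambda r: r.gauss(0.0, 1.0), rng)
```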

Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation


Title	Evaluating Phonemic Transcription of Low-Resource Tonal Languages for Language Documentation
Authors	Oliver Adams, Trevor Cohn, Graham Neubig, Hilaria Cruz, Steven Bird, Alexis Michaud
Abstract
Tasks	Acoustic Modelling, Language Modelling, Speech Recognition
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1530/
PDF	https://www.aclweb.org/anthology/L18-1530
PWC	https://paperswithcode.com/paper/evaluation-phonemic-transcription-of-low
Repo	https://github.com/oadams/persephone
Framework	tf

Progressive Attention Guided Recurrent Network for Salient Object Detection


Title	Progressive Attention Guided Recurrent Network for Salient Object Detection
Authors	Xiaoning Zhang, Tiantian Wang, Jinqing Qi, Huchuan Lu, Gang Wang
Abstract	Effective convolutional features play an important role in saliency estimation, but how to learn powerful features for saliency is still a challenging task. FCN-based methods directly apply multi-level convolutional features without distinction, which leads to sub-optimal results due to the distraction from redundant details. In this paper, we propose a novel attention guided network which selectively integrates multi-level contextual information in a progressive manner. Attentive features generated by our network can alleviate distraction from the background and thus achieve better performance. On the other hand, it is observed that most existing algorithms conduct salient object detection by exploiting side-output features of the backbone feature extraction network. However, shallower layers of the backbone network lack the ability to obtain global semantic information, which limits effective feature learning. To address the problem, we introduce multi-path recurrent feedback to enhance our proposed progressive attention driven framework. Through multi-path recurrent connections, global semantic information from the top convolutional layer is transferred to shallower layers, which intrinsically refines the entire network. Experimental results on six benchmark datasets demonstrate that our algorithm performs favorably against the state-of-the-art approaches.
Tasks	Object Detection, Saliency Prediction, Salient Object Detection
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Zhang_Progressive_Attention_Guided_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Progressive_Attention_Guided_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/progressive-attention-guided-recurrent
Repo	https://github.com/zhangxiaoning666/PAGR
Framework	none

Wide Contextual Residual Network with Active Learning for Remote Sensing Image Classification


Title	Wide Contextual Residual Network with Active Learning for Remote Sensing Image Classification
Authors	Shengjie Liu, Haowen Luo, Ying Tu, Zhi He, Jun Li
Abstract	In this paper, we propose a wide contextual residual network (WCRN) with active learning (AL) for remote sensing image (RSI) classification. Although ResNets have achieved great success in various applications (e.g. RSI classification), their performance is limited by the requirement of abundant labeled samples. As it is very difficult and expensive to obtain class labels in the real world, we integrate the proposed WCRN with AL to improve its generalization by using the most informative training samples. Specifically, we first design a wide contextual residual network for RSI classification. We then integrate it with AL to achieve good machine generalization with a limited number of training samples. Experimental results on the University of Pavia and Flevoland datasets demonstrate that the proposed WCRN with AL can significantly reduce the number of labeled samples needed.
Tasks	Active Learning, Classification Of Hyperspectral Images, Hyperspectral Image Classification, Image Classification, Remote Sensing Image Classification
Published	2018-07-22
URL	https://ieeexplore.ieee.org/document/8517855
PDF	https://www.researchgate.net/publication/328991664_Wide_Contextual_Residual_Network_with_Active_Learning_for_Remote_Sensing_Image_Classification
PWC	https://paperswithcode.com/paper/wide-contextual-residual-network-with-active
Repo	https://github.com/codeRimoe/DL_for_RSIs
Framework	tf

Training RNNs as Fast as CNNs


Title	Training RNNs as Fast as CNNs
Authors	Tao Lei, Yu Zhang, Yoav Artzi
Abstract	Common recurrent neural network architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU) architecture, a recurrent unit that simplifies the computation and exposes more parallelism. In SRU, the majority of computation for each step is independent of the recurrence and can be easily parallelized. SRU is as fast as a convolutional layer and 5-10x faster than an optimized LSTM implementation. We study SRUs on a wide range of applications, including classification, question answering, language modeling, translation and speech recognition. Our experiments demonstrate the effectiveness of SRU and the trade-off it enables between speed and performance.
Tasks	Language Modelling, Question Answering, Speech Recognition
Published	2018-01-01
URL	https://openreview.net/forum?id=rJBiunlAW
PDF	https://openreview.net/pdf?id=rJBiunlAW
PWC	https://paperswithcode.com/paper/training-rnns-as-fast-as-cnns
Repo	https://github.com/asappresearch/sru
Framework	pytorch
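
The claim that most of the per-step computation is independent of the recurrence can be made concrete with a small sketch. The gate equations below follow the form given in the SRU paper as best recalled, not anything stated in this listing, so treat them as an assumption. The function names are hypothetical, and the sketch uses plain Python lists and requires the hidden size to equal the input size so that the highway term is well defined.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sru_layer(xs, W, Wf, bf, Wr, br):
    """Run one SRU-style layer over a sequence of input vectors xs.

    Assumed equations (hidden size == input size):
      f_t = sigmoid(Wf x_t + bf)                    forget gate
      r_t = sigmoid(Wr x_t + br)                    reset gate
      c_t = f_t * c_{t-1} + (1 - f_t) * (W x_t)     internal state
      h_t = r_t * tanh(c_t) + (1 - r_t) * x_t       output with highway term
    """
    # Step 1: all matrix products depend only on the inputs, so they could
    # be computed for every time step at once (the parallel part).
    projected = []
    for x in xs:
        x_tilde = matvec(W, x)
        f = [sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)]
        r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)]
        projected.append((x_tilde, f, r))

    # Step 2: the recurrence itself is only cheap elementwise arithmetic.
    c = [0.0] * len(W)
    hs = []
    for x, (x_tilde, f, r) in zip(xs, projected):
        c = [fi * ci + (1.0 - fi) * xi for fi, ci, xi in zip(f, c, x_tilde)]
        h = [ri * math.tanh(ci) + (1.0 - ri) * xi for ri, ci, xi in zip(r, c, x)]
        hs.append(h)
    return hs, c
```

Unlike an LSTM, neither gate reads the previous hidden state, which is what lets step 1 run without waiting on the recurrence.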

Extracting Commonsense Properties from Embeddings with Limited Human Guidance


Title	Extracting Commonsense Properties from Embeddings with Limited Human Guidance
Authors	Yiben Yang, Larry Birnbaum, Ji-Ping Wang, Doug Downey
Abstract	Intelligent systems require common sense, but automatically extracting this knowledge from text can be difficult. We propose and assess methods for extracting one type of commonsense knowledge, object-property comparisons, from pre-trained embeddings. In experiments, we show that our approach exceeds the accuracy of previous work but requires substantially less hand-annotated knowledge. Further, we show that an active learning approach that synthesizes common-sense queries can boost accuracy.
Tasks	Active Learning, Common Sense Reasoning, Zero-Shot Learning
Published	2018-07-01
URL	https://www.aclweb.org/anthology/P18-2102/
PDF	https://www.aclweb.org/anthology/P18-2102
PWC	https://paperswithcode.com/paper/extracting-commonsense-properties-from
Repo	https://github.com/yangyiben/PCE
Framework	pytorch

Deep Diffeomorphic Transformer Networks


Title	Deep Diffeomorphic Transformer Networks
Authors	Nicki Skafte Detlefsen, Oren Freifeld, Søren Hauberg
Abstract	Spatial Transformer layers allow neural networks, at least in principle, to be invariant to large spatial transformations in image data. The model has, however, seen limited uptake as most practical implementations support only transformations that are too restricted, e.g. affine or homographic maps, and/or destructive maps, such as thin plate splines. We investigate the use of flexible diffeomorphic image transformations within such networks and demonstrate that significant performance gains can be attained over currently-used models. The learned transformations are found to be both simple and intuitive, thereby providing insights into individual problem domains. With the proposed framework, a standard convolutional neural network matches state-of-the-art results on face verification with only two extra lines of simple TensorFlow code.
Tasks
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Detlefsen_Deep_Diffeomorphic_Transformer_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Detlefsen_Deep_Diffeomorphic_Transformer_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/deep-diffeomorphic-transformer-networks
Repo	https://github.com/SkafteNicki/ddtn
Framework	tf

How Time Matters: Learning Time-Decay Attention for Contextual Spoken Language Understanding in Dialogues


Title	How Time Matters: Learning Time-Decay Attention for Contextual Spoken Language Understanding in Dialogues
Authors	Shang-Yu Su, Pei-Chieh Yuan, Yun-Nung Chen
Abstract	Spoken language understanding (SLU) is an essential component in conversational systems. Most SLU components treat each utterance independently, and then the following components aggregate the multi-turn information in separate phases. In order to avoid error propagation and effectively utilize contexts, prior work leveraged history for contextual SLU. However, most previous models only paid attention to the related content in history utterances, ignoring their temporal information. In dialogues, it is intuitive that the most recent utterances are more important than the least recent ones; in other words, time-aware attention should decay with temporal distance. Therefore, this paper designs and investigates various types of time-decay attention on the sentence-level and speaker-level, and further proposes a flexible universal time-decay attention mechanism. The experiments on the benchmark Dialogue State Tracking Challenge (DSTC4) dataset show that the proposed time-decay attention mechanisms significantly improve the state-of-the-art model for contextual understanding performance.
Tasks	Dialogue State Tracking, Image Captioning, Machine Translation, Slot Filling, Spoken Dialogue Systems, Spoken Language Understanding
Published	2018-06-01
URL	https://www.aclweb.org/anthology/N18-1194/
PDF	https://www.aclweb.org/anthology/N18-1194
PWC	https://paperswithcode.com/paper/how-time-matters-learning-time-decay
Repo	https://github.com/MiuLab/Time-Decay-SLU
Framework	tf
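
The intuition that recent utterances should receive more attention than older ones can be sketched with a simple decaying weight. The inverse-power form 1 / d^power, the normalization, and all names below are illustrative assumptions. The abstract does not specify the decay functions the paper actually uses.

```python
def time_decay_weights(distances, power=1.0):
    """Turn distances (1 = most recent utterance) into attention weights.

    Assumption: weight is proportional to 1 / distance**power, then
    normalized so the weights sum to 1.
    """
    raw = [1.0 / (d ** power) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def attend_history(history_vectors, distances, power=1.0):
    """Summarize history utterance vectors as a time-decayed weighted sum."""
    weights = time_decay_weights(distances, power)
    dim = len(history_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, history_vectors))
        for i in range(dim)
    ]


# Three history utterances at distances 1, 2 and 4 turns from the current one.
weights = time_decay_weights([1, 2, 4])
```

Raising `power` makes the decay steeper, so the summary leans more heavily on the latest utterance.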

Deep Photo Enhancer: Unpaired Learning for Image Enhancement From Photographs With GANs


Title	Deep Photo Enhancer: Unpaired Learning for Image Enhancement From Photographs With GANs
Authors	Yu-Sheng Chen, Yu-Ching Wang, Man-Hsin Kao, Yung-Yu Chuang
Abstract	This paper proposes an unpaired learning method for image enhancement. Given a set of photographs with the desired characteristics, the proposed method learns a photo enhancer which transforms an input image into an enhanced image with those characteristics. The method is based on the framework of two-way generative adversarial networks (GANs) with several improvements. First, we augment the U-Net with global features and show that it is more effective. The global U-Net acts as the generator in our GAN model. Second, we improve Wasserstein GAN (WGAN) with an adaptive weighting scheme. With this scheme, training converges faster and better, and is less sensitive to parameters than WGAN-GP. Finally, we propose to use individual batch normalization layers for generators in two-way GANs. It helps generators better adapt to their own input distributions. All together, they significantly improve the stability of GAN training for our application. Both quantitative and visual results show that the proposed method is effective for enhancing images.
Tasks	Image Enhancement
Published	2018-06-01
URL	http://openaccess.thecvf.com/content_cvpr_2018/html/Chen_Deep_Photo_Enhancer_CVPR_2018_paper.html
PDF	http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Deep_Photo_Enhancer_CVPR_2018_paper.pdf
PWC	https://paperswithcode.com/paper/deep-photo-enhancer-unpaired-learning-for
Repo	https://github.com/nothinglo/Deep-Photo-Enhancer
Framework	tf

Deep One-Class Classification


Title	Deep One-Class Classification
Authors	Lukas Ruff, Robert Vandermeulen, Nico Goernitz, Lucas Deecke, Shoaib Ahmed Siddiqui, Alexander Binder, Emmanuel Müller, Marius Kloft
Abstract	Despite the great advances made by deep learning in many machine learning problems, there is a relative dearth of deep learning approaches for anomaly detection. Those approaches which do exist involve networks trained to perform a task other than anomaly detection, namely generative models or compression, which are in turn adapted for use in anomaly detection; they are not trained on an anomaly detection based objective. In this paper we introduce a new anomaly detection method—Deep Support Vector Data Description—, which is trained on an anomaly detection based objective. The adaptation to the deep regime necessitates that our neural network and training procedure satisfy certain properties, which we demonstrate theoretically. We show the effectiveness of our method on MNIST and CIFAR-10 image benchmark datasets as well as on the detection of adversarial examples of GTSRB stop signs.
Tasks	Anomaly Detection
Published	2018-07-01
URL	https://icml.cc/Conferences/2018/Schedule?showEvent=2483
PDF	http://proceedings.mlr.press/v80/ruff18a/ruff18a.pdf
PWC	https://paperswithcode.com/paper/deep-one-class-classification
Repo	https://github.com/lukasruff/Deep-SVDD
Framework	pytorch
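
An anomaly-detection-based objective of the kind the abstract refers to can be sketched as follows. The specific form (mean squared distance of network outputs to a fixed center, with the same distance used as the anomaly score) is how the one-class Deep SVDD objective is commonly summarized and is an assumption here, since the listing does not state it. The function names are hypothetical, the embeddings stand in for network outputs, and regularization terms are omitted.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def one_class_loss(embeddings, center):
    """Mean squared distance of embedded training points to the center.

    Minimizing this pulls the (assumed mostly normal) training data
    toward the center in the network's output space.
    """
    return sum(squared_distance(e, center) for e in embeddings) / len(embeddings)


def anomaly_score(embedding, center):
    """Score a test point: the farther from the center, the more anomalous."""
    return squared_distance(embedding, center)


center = [0.0, 0.0]
train_embeddings = [[1.0, 0.0], [0.0, 2.0]]
loss = one_class_loss(train_embeddings, center)
score = anomaly_score([3.0, 4.0], center)
```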

Active Learning for Non-Parametric Regression Using Purely Random Trees


Title	Active Learning for Non-Parametric Regression Using Purely Random Trees
Authors	Jack Goetz, Ambuj Tewari, Paul Zimmerman
Abstract	Active learning is the task of using labelled data to select additional points to label, with the goal of fitting the most accurate model with a fixed budget of labelled points. In binary classification, active learning is known to produce faster rates than passive learning for a broad range of settings. However, in regression, restrictive structure and tailored methods were previously needed to obtain theoretically superior performance. In this paper we propose an intuitive tree-based active learning algorithm for non-parametric regression with provable improvement over random sampling. When implemented with Mondrian Trees, our algorithm is tuning parameter free, consistent and minimax optimal for Lipschitz functions.
Tasks	Active Learning
Published	2018-12-01
URL	http://papers.nips.cc/paper/7520-active-learning-for-non-parametric-regression-using-purely-random-trees
PDF	http://papers.nips.cc/paper/7520-active-learning-for-non-parametric-regression-using-purely-random-trees.pdf
PWC	https://paperswithcode.com/paper/active-learning-for-non-parametric-regression
Repo	https://github.com/jackrgoetz/Mondrian_Tree_AL
Framework	none

Graph Classification using Structural Attention


Title	Graph Classification using Structural Attention
Authors	John Boaz Lee, Ryan Rossi, Xiangnan Kong
Abstract	Graph classification is a problem with practical applications in many different domains. To solve this problem, one usually calculates certain graph statistics (i.e., graph features) that help discriminate between graphs of different classes. When calculating such features, most existing approaches process the entire graph. In a graphlet-based approach, for instance, the entire graph is processed to get the total count of different graphlets or subgraphs. In many real-world applications, however, graphs can be noisy with discriminative patterns confined to certain regions in the graph only. In this work, we study the problem of attention-based graph classification. The use of attention allows us to focus on small but informative parts of the graph, avoiding noise in the rest of the graph. We present a novel RNN model, called the Graph Attention Model (GAM), that processes only a portion of the graph by adaptively selecting a sequence of “informative” nodes. Experimental results on multiple real-world datasets show that the proposed method is competitive against various well-known methods in graph classification even though our method is limited to only a portion of the graph.
Tasks	Graph Classification
Published	2018-07-19
URL	https://www.kdd.org/kdd2018/accepted-papers/view/graph-classification-using-structural-attention
PDF	http://ryanrossi.com/pubs/KDD18-graph-attention-model.pdf
PWC	https://paperswithcode.com/paper/graph-classification-using-structural
Repo	https://github.com/benedekrozemberczki/GAM
Framework	pytorch