Paper Group AWR 171
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction. Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers. Abstractive Summarization Using Attentive Neural Techniques. Applying Machine Learning To Maize Traits Prediction. Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search. Numeracy …
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
Title | Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction |
Authors | Kazuma Hashimoto, Yoshimasa Tsuruoka |
Abstract | A major obstacle in reinforcement learning-based sentence generation is the large action space whose size is equal to the vocabulary size of the target-side language. To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. Our method first predicts a fixed-size small vocabulary for each input to generate its target sentence. The input-specific vocabularies are then used at supervised and reinforcement learning steps, and also at test time. In our experiments on six machine translation and two image captioning datasets, our method achieves faster reinforcement learning ($\sim$2.7x faster) with less GPU memory ($\sim$2.3x less) than the full-vocabulary counterpart. The reinforcement learning with our method consistently leads to significant improvement of BLEU scores, and the scores are equal to or better than those of baselines using the full vocabularies, with faster decoding time ($\sim$3x faster) on CPUs. |
Tasks | Image Captioning, Machine Translation |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01694v2 |
http://arxiv.org/pdf/1809.01694v2.pdf | |
PWC | https://paperswithcode.com/paper/accelerated-reinforcement-learning-for |
Repo | https://github.com/hassyGo/NLG-RL |
Framework | pytorch |
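A minimal sketch of the core trick, assuming a vocabulary predictor has already chosen a small per-input word set (`small_vocab_ids` below is a random stand-in, not the NLG-RL predictor): the decoder's softmax runs over only those k words instead of the full vocabulary, which is where the reported speed and memory savings come from.

```python
import torch
import torch.nn.functional as F

full_vocab, hidden_dim, k = 50000, 256, 1000

# Full output embedding matrix (tied to the decoder in a real system).
output_weights = torch.randn(full_vocab, hidden_dim)

# Hypothetical per-input small vocabulary; in the paper this comes from a
# vocabulary predictor conditioned on the source sentence.
small_vocab_ids = torch.randint(0, full_vocab, (k,))

decoder_state = torch.randn(1, hidden_dim)      # one decoding step

# Softmax over only the k predicted words: O(k) instead of O(|V|) per step.
small_logits = decoder_state @ output_weights[small_vocab_ids].T
probs = F.softmax(small_logits, dim=-1)
next_word_id = small_vocab_ids[probs.argmax(dim=-1)]
print(next_word_id)
```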
Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers
Title | Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers |
Authors | Michael P. B. Gallaugher, Paul D. McNicholas |
Abstract | In recent years, data have become increasingly higher dimensional and, therefore, an increased need has arisen for dimension reduction techniques for clustering. Although such techniques are firmly established in the literature for multivariate data, there is a relative paucity in the area of matrix variate, or three-way, data. Furthermore, the few methods that are available all assume matrix variate normality, which is not always sensible if cluster skewness or excess kurtosis is present. Mixtures of bilinear factor analyzers using skewed matrix variate distributions are proposed. In all, four such mixture models are presented, based on matrix variate skew-t, generalized hyperbolic, variance-gamma, and normal inverse Gaussian distributions, respectively. |
Tasks | Dimensionality Reduction |
Published | 2018-09-07 |
URL | https://arxiv.org/abs/1809.02385v3 |
https://arxiv.org/pdf/1809.02385v3.pdf | |
PWC | https://paperswithcode.com/paper/mixtures-of-skewed-matrix-variate-bilinear |
Repo | https://github.com/nikpocuca/MatrixVariate.jl |
Framework | none |
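As background, a short sketch of the matrix variate (three-way) setting, covering only the symmetric matrix normal base case; the paper's skewed distributions add a latent mixing variable that this sketch omits. Sizes and scale matrices are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 5, 4                       # n observed matrices of size p x q

M = np.zeros((p, q))                      # mean matrix
A = np.linalg.cholesky(np.eye(p) + 0.3)   # row scale (row cov = A @ A.T)
B = np.linalg.cholesky(np.eye(q) + 0.2)   # column scale (col cov = B @ B.T)

# Matrix normal draws: X = M + A Z B^T with Z i.i.d. standard normal.
Z = rng.standard_normal((n, p, q))
X = M + A @ Z @ B.T
print(X.shape)                            # (100, 5, 4): three-way data
```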
Abstractive Summarization Using Attentive Neural Techniques
Title | Abstractive Summarization Using Attentive Neural Techniques |
Authors | Jacob Krantz, Jugal Kalita |
Abstract | In a world of proliferating data, the ability to rapidly summarize text is growing in importance. Automatic summarization of text can be thought of as a sequence to sequence problem. Another area of natural language processing that solves a sequence to sequence problem is machine translation, which is rapidly evolving due to the development of attention-based encoder-decoder networks. This work applies these modern techniques to abstractive summarization. We perform analysis on various attention mechanisms for summarization with the goal of developing an approach and architecture aimed at improving the state of the art. In particular, we modify and optimize a translation model with self-attention for generating abstractive sentence summaries. The effectiveness of this base model along with attention variants is compared and analyzed in the context of standardized evaluation sets and test metrics. However, we show that these metrics are limited in their ability to effectively score abstractive summaries, and propose a new approach based on the intuition that an abstractive model requires an abstractive evaluation. |
Tasks | Abstractive Text Summarization, Machine Translation |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08838v1 |
http://arxiv.org/pdf/1810.08838v1.pdf | |
PWC | https://paperswithcode.com/paper/abstractive-summarization-using-attentive |
Repo | https://github.com/jacobkrantz/VertMetric |
Framework | none |
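The paper's closing claim — that n-gram metrics undervalue valid abstraction — can be seen in a toy example. The sentences and the simplified ROUGE-2-style count below are illustrative, not the paper's proposed metric (their evaluation code lives in the linked VertMetric repo).

```python
# Two summaries with the same meaning but almost no shared bigrams.
ref = "the senate passed the new budget bill".split()
abstractive = "lawmakers approved the fresh spending legislation".split()

def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))

overlap = bigrams(ref) & bigrams(abstractive)
recall = len(overlap) / len(bigrams(ref))
print(f"shared bigrams: {overlap}, ROUGE-2-like recall: {recall:.2f}")  # 0.00
```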
Applying Machine Learning To Maize Traits Prediction
Title | Applying Machine Learning To Maize Traits Prediction |
Authors | Binbin Shi, Xupeng Chen |
Abstract | Heterosis is the improved or increased function of any biological quality in a hybrid offspring. We study what is, to date, the largest maize SNP dataset for trait prediction. We develop linear and non-linear models that consider relationships between different hybrids as well as other effects. The specially designed models prove efficient and robust in predicting maize traits. |
Tasks | |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06275v1 |
http://arxiv.org/pdf/1808.06275v1.pdf | |
PWC | https://paperswithcode.com/paper/applying-machine-learning-to-maize-traits |
Repo | https://github.com/james20141606/eMaize |
Framework | none |
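A hedged sketch of the general task setup, not the paper's exact model: predicting a quantitative trait from a SNP genotype matrix with a regularized linear baseline. All data below are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_hybrids, n_snps = 500, 2000
X = rng.integers(0, 3, size=(n_hybrids, n_snps)).astype(float)  # 0/1/2 SNPs
true_effects = rng.normal(0, 0.1, n_snps) * (rng.random(n_snps) < 0.05)
y = X @ true_effects + rng.normal(0, 1.0, n_hybrids)            # trait values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 20)).fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```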
Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search
Title | Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search |
Authors | Elias Jääsaari, Ville Hyvönen, Teemu Roos |
Abstract | Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07484v1 |
http://arxiv.org/pdf/1812.07484v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-autotuning-of-hyperparameters-in |
Repo | https://github.com/vioshyvo/mrpt |
Framework | none |
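To see the trade-off being tuned, here is a toy, self-contained illustration: a crude random-hyperplane partition whose single hyperparameter trades recall for candidate-set size. The brute-force loop over settings is exactly the slow grid search the paper avoids; the partitioning scheme is invented for illustration and far simpler than the randomized trees in mrpt.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((5000, 32)).astype(np.float32)
queries = rng.standard_normal((50, 32)).astype(np.float32)
k = 10
proj = rng.standard_normal((8, 32)).astype(np.float32)  # random directions

def exact_knn(q):
    return set(np.argsort(((data - q) ** 2).sum(axis=1))[:k])

def approx_knn(q, n_proj):
    # Toy space partitioning: keep only points on the same side of the first
    # n_proj random hyperplanes as the query, then search those exactly.
    d = proj[:n_proj]
    mask = ((data @ d.T > 0) == (q @ d.T > 0)).all(axis=1)
    cand = np.where(mask)[0]
    order = np.argsort(((data[cand] - q) ** 2).sum(axis=1))[:k]
    return set(cand[order])

for n_proj in (1, 2, 4, 8):   # the hyperparameter being "tuned"
    r = np.mean([len(approx_knn(q, n_proj) & exact_knn(q)) / k
                 for q in queries])
    print(f"hyperplanes={n_proj}: mean recall@{k} = {r:.2f}")
```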
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
Title | Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers |
Authors | Georgios P. Spithourakis, Sebastian Riedel |
Abstract | Numeracy is the ability to understand and work with numbers. It is a necessary skill for composing and understanding documents in clinical, scientific, and other technical domains. In this paper, we explore different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and propose a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary. Our evaluation on clinical and scientific datasets shows that using hierarchical models to distinguish numerals from words improves a perplexity metric on the subset of numerals by 2 and 4 orders of magnitude, respectively, over non-hierarchical models. A combination of strategies can further improve perplexity. Our continuous probability density function model reduces mean absolute percentage errors by 18% and 54% in comparison to the second best strategy for each dataset, respectively. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08154v1 |
http://arxiv.org/pdf/1805.08154v1.pdf | |
PWC | https://paperswithcode.com/paper/numeracy-for-language-models-evaluating-and |
Repo | https://github.com/uclmr/numerate-language-models |
Framework | none |
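A simplified sketch of the headline idea — scoring a numeral with a continuous density rather than a softmax over numeral tokens. The mixture-of-Gaussians head below is an illustrative stand-in; the paper's actual architecture and numeral transformations differ.

```python
import torch
import torch.nn as nn

class NumeralDensityHead(nn.Module):
    """Maps an LM hidden state to a mixture-of-Gaussians density over values."""
    def __init__(self, hidden_dim, n_components=5):
        super().__init__()
        self.params = nn.Linear(hidden_dim, 3 * n_components)

    def log_prob(self, h, value):
        logits, mu, log_sigma = self.params(h).chunk(3, dim=-1)
        comp = torch.distributions.Normal(mu, log_sigma.exp())
        log_w = torch.log_softmax(logits, dim=-1)
        # log p(v) = logsumexp_k [ log w_k + log N(v | mu_k, sigma_k) ]
        return torch.logsumexp(log_w + comp.log_prob(value.unsqueeze(-1)),
                               dim=-1)

h = torch.randn(2, 64)               # LM states at two numeral positions
values = torch.tensor([3.5, 120.0])
head = NumeralDensityHead(64)
print(head.log_prob(h, values))      # differentiable log-density per numeral
```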
Recurrent Slice Networks for 3D Segmentation of Point Clouds
Title | Recurrent Slice Networks for 3D Segmentation of Point Clouds |
Authors | Qiangui Huang, Weiyue Wang, Ulrich Neumann |
Abstract | Point clouds are an efficient data format for 3D data. However, existing 3D segmentation methods for point clouds either do not model local dependencies (PointNet) or require added computations (Kd-Net, PointNet++). This work presents a novel 3D segmentation framework, RSNet (code released at https://github.com/qianguih/RSNet), to efficiently model local structures in point clouds. The key component of the RSNet is a lightweight local dependency module. It is a combination of a novel slice pooling layer, Recurrent Neural Network (RNN) layers, and a slice unpooling layer. The slice pooling layer is designed to project features of unordered points onto an ordered sequence of feature vectors so that traditional end-to-end learning algorithms (RNNs) can be applied. The performance of RSNet is validated by comprehensive experiments on the S3DIS, ScanNet, and ShapeNet datasets. In its simplest form, RSNet surpasses all previous state-of-the-art methods on these benchmarks, and comparisons against previous state-of-the-art methods (PointNet, PointNet++) demonstrate its efficiency. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04402v2 |
http://arxiv.org/pdf/1802.04402v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-slice-networks-for-3d-segmentation |
Repo | https://github.com/qianguih/RSNet |
Framework | pytorch |
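A compact sketch of the slice pooling / RNN / slice unpooling pipeline, with bin counts and feature sizes invented for illustration (the released RSNet code is the authoritative version):

```python
import torch
import torch.nn as nn

n_points, feat_dim, n_slices = 1024, 64, 16

xyz = torch.rand(n_points, 3)               # point coordinates in [0, 1)
feats = torch.randn(n_points, feat_dim)     # per-point features

# Slice pooling: assign each point to a z-slice and max-pool features per
# slice, turning an unordered point set into an ordered sequence of vectors.
slice_ids = (xyz[:, 2] * n_slices).long().clamp(max=n_slices - 1)
pooled = torch.stack([
    feats[slice_ids == s].max(dim=0).values
    if (slice_ids == s).any() else torch.zeros(feat_dim)
    for s in range(n_slices)
])

# RNN over the ordered slices models dependencies between local regions.
rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
context, _ = rnn(pooled.unsqueeze(0))       # (1, n_slices, feat_dim)

# Slice unpooling: broadcast each slice's context back to its member points.
point_context = context.squeeze(0)[slice_ids]  # (n_points, feat_dim)
print(point_context.shape)
```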
MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices
Title | MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices |
Authors | Sheng Chen, Yang Liu, Xiang Gao, Zhen Han |
Abstract | Face Analysis Project on MXNet |
Tasks | Face Verification |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07573v4 |
http://arxiv.org/pdf/1804.07573v4.pdf | |
PWC | https://paperswithcode.com/paper/mobilefacenets-efficient-cnns-for-accurate |
Repo | https://github.com/deepinsight/insightface |
Framework | mxnet |
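A hedged sketch of the network detail MobileFaceNets is known for: replacing global average pooling with a global depthwise convolution (GDConv), i.e. a depthwise convolution whose kernel covers the whole final feature map, so spatial positions receive learned, unequal weights. Sizes below are illustrative.

```python
import torch
import torch.nn as nn

channels, h, w = 512, 7, 7
feature_map = torch.randn(1, channels, h, w)    # backbone output

# Depthwise conv whose kernel spans the full 7x7 map: one learned spatial
# weighting per channel, producing a 1x1 "global" descriptor per channel.
gdconv = nn.Conv2d(channels, channels, kernel_size=(h, w),
                   groups=channels, bias=False)
embedding = gdconv(feature_map).flatten(1)      # (1, 512) face embedding
print(embedding.shape)
```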
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Title | Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration |
Authors | Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang |
Abstract | Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to perform web-based tasks, such as booking flights or replying to emails, where a single mistake can ruin the entire sequence of actions. A common remedy is to “warm-start” the agent by pre-training it to mimic expert demonstrations, but this is prone to overfitting. Instead, we propose to constrain exploration using demonstrations. From each demonstration, we induce high-level “workflows” which constrain the allowable actions at each time step to be similar to those in the demonstration (e.g., “Step 1: click on a textbox; Step 2: enter some text”). Our exploration policy then learns to identify successful workflows and samples actions that satisfy these workflows. Workflows prune out bad exploration directions and accelerate the agent’s ability to discover rewards. We use our approach to train a novel neural policy designed to handle the semi-structured nature of websites, and evaluate on a suite of web tasks, including the recent World of Bits benchmark. We achieve new state-of-the-art results, and show that workflow-guided exploration improves sample efficiency over behavioral cloning by more than 100x. |
Tasks | |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08802v1 |
http://arxiv.org/pdf/1802.08802v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-on-web-interfaces |
Repo | https://github.com/zbyte64/pytorch-fuzzdom |
Framework | pytorch |
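A toy sketch of the workflow constraint idea, with the environment, action encoding, and constraints all invented for illustration: each workflow step admits only actions similar to the demonstrated one, and exploration samples within that set.

```python
import random

random.seed(0)

# A workflow induced from one demonstration: first click a textbox, then type.
workflow = [
    lambda a: a["type"] == "click" and a["target"] == "textbox",
    lambda a: a["type"] == "type",
]

def possible_actions():
    # Stand-in for enumerating the concrete actions on the current web page.
    return ([{"type": "click", "target": t} for t in ("textbox", "button")]
            + [{"type": "type", "text": "hello"}])

episode = []
for constraint in workflow:
    allowed = [a for a in possible_actions() if constraint(a)]
    episode.append(random.choice(allowed))  # explore only within the workflow
print(episode)
```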
Generative Adversarial Active Learning for Unsupervised Outlier Detection
Title | Generative Adversarial Active Learning for Unsupervised Outlier Detection |
Authors | Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, Xiangnan He |
Abstract | Outlier detection is an important topic in machine learning and has been used in a wide range of applications. In this paper, we approach outlier detection as a binary-classification issue by sampling potential outliers from a uniform reference distribution. However, due to the sparsity of data in high-dimensional space, a limited number of potential outliers may fail to provide sufficient information to assist the classifier in describing a boundary that can separate outliers from normal data effectively. To address this, we propose a novel Single-Objective Generative Adversarial Active Learning (SO-GAAL) method for outlier detection, which can directly generate informative potential outliers based on the mini-max game between a generator and a discriminator. Moreover, to prevent the generator from suffering mode collapse, training should stop once SO-GAAL can provide sufficiently informative potential outliers; without any prior information, however, this stopping point is extremely difficult to determine. Therefore, we expand the network structure of SO-GAAL from a single generator to multiple generators with different objectives (MO-GAAL), which can generate a reasonable reference distribution for the whole dataset. We empirically compare the proposed approach with several state-of-the-art outlier detection methods on both synthetic and real-world datasets. The results show that MO-GAAL outperforms its competitors in the majority of cases, especially for datasets with various cluster types or a high proportion of irrelevant variables. |
Tasks | Active Learning, Outlier Detection |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10816v4 |
http://arxiv.org/pdf/1809.10816v4.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-active-learning-for |
Repo | https://github.com/leibinghe/GAAL-based-outlier-detection |
Framework | tf |
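A compact, hedged sketch of the single-generator (SO-GAAL) loop on synthetic 2-D data — network sizes, training length, and data are illustrative, and the multiple-generator MO-GAAL extension is omitted:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
real = torch.randn(512, 2)                     # synthetic inlier cloud

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(200):
    fake = G(torch.randn(512, 8))              # generated potential outliers
    # Discriminator: real data -> 1, generated potential outliers -> 0.
    loss_d = (bce(D(real), torch.ones(512, 1))
              + bce(D(fake.detach()), torch.zeros(512, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: move generated points toward regions D believes are real,
    # i.e. informative points near the boundary of the data.
    loss_g = bce(D(fake), torch.ones(512, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

outlier_score = torch.sigmoid(-D(real)).squeeze(1)  # higher = more outlying
print(outlier_score.topk(5).indices)           # most outlier-like points
```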
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
Title | StNet: Local and Global Spatial-Temporal Modeling for Action Recognition |
Authors | Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen |
Abstract | Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos. Particularly, StNet stacks N successive video frames into a "super-image" which has 3N channels and applies 2D convolution on super-images to capture local spatial-temporal relationships. To model global spatial-temporal relationships, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet. It employs a separate channel-wise and temporal-wise convolution over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and can strike a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset. |
Tasks | Temporal Action Localization |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01549v3 |
http://arxiv.org/pdf/1811.01549v3.pdf | |
PWC | https://paperswithcode.com/paper/stnet-local-and-global-spatial-temporal |
Repo | https://github.com/hyperfraise/StNet |
Framework | pytorch |
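A minimal sketch of the super-image construction and the downstream temporal convolution; tensor sizes are illustrative, and a plain Conv1d stands in for the paper's temporal Xception block:

```python
import torch
import torch.nn as nn

batch, t_frames, n_stack = 2, 16, 4
video = torch.randn(batch, t_frames, 3, 112, 112)       # (B, T, 3, H, W)

# Group every N consecutive frames into one 3N-channel super-image.
super_images = video.reshape(batch * (t_frames // n_stack),
                             3 * n_stack, 112, 112)

local_conv = nn.Conv2d(3 * n_stack, 64, kernel_size=3, padding=1)
local_feats = local_conv(super_images)                   # local S-T features

# Recover the temporal axis and apply temporal convolution for global
# modeling (simplified; the paper uses a temporal Xception block here).
seq = local_feats.mean(dim=(2, 3)).reshape(batch, t_frames // n_stack, 64)
temporal_conv = nn.Conv1d(64, 64, kernel_size=3, padding=1)
out = temporal_conv(seq.transpose(1, 2))                 # (B, 64, T/N)
print(out.shape)
```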
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
Title | Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation |
Authors | Antonio Toral, Sheila Castilho, Ke Hu, Andy Way |
Abstract | We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT. |
Tasks | Machine Translation |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10432v1 |
http://arxiv.org/pdf/1808.10432v1.pdf | |
PWC | https://paperswithcode.com/paper/attaining-the-unattainable-reassessing-claims |
Repo | https://github.com/antot/human_parity_mt |
Framework | none |
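On the methodology side, the expert-vs-non-expert comparison rests on inter-annotator agreement. A tiny, hedged example of that computation with Cohen's kappa (the ratings are invented):

```python
from sklearn.metrics import cohen_kappa_score

# Two evaluators labeling which translation in each pair they judged human.
annotator_1 = ["human", "mt", "human", "human", "mt", "human"]
annotator_2 = ["human", "mt", "mt", "human", "mt", "human"]
print(cohen_kappa_score(annotator_1, annotator_2))  # 1.0 = perfect agreement
```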
Link Prediction Based on Graph Neural Networks
Title | Link Prediction Based on Graph Neural Networks |
Authors | Muhan Zhang, Yixin Chen |
Abstract | Link prediction is a key problem for network-structured data. Link prediction heuristics use score functions, such as common neighbors and the Katz index, to measure the likelihood of links. They have obtained wide practical use due to their simplicity, interpretability, and, for some of them, scalability. However, every heuristic has a strong assumption about when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable approach is to learn a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a "heuristic" that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel $\gamma$-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs preserve rich information related to link existence. Second, based on the $\gamma$-decaying theory, we propose a new algorithm to learn heuristics from local subgraphs using a graph neural network (GNN). Experimental results show unprecedented performance, working consistently well on a wide range of problems. |
Tasks | Link Prediction |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09691v3 |
http://arxiv.org/pdf/1802.09691v3.pdf | |
PWC | https://paperswithcode.com/paper/link-prediction-based-on-graph-neural |
Repo | https://github.com/muhanzhang/SEAL |
Framework | none |
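A small sketch of the extraction step this approach builds on — the k-hop enclosing subgraph around a candidate link, which a GNN then classifies. The paper's node labeling (DRNL) is omitted for brevity, and the graph here is just a networkx toy:

```python
import networkx as nx

g = nx.karate_club_graph()
x, y, k = 0, 33, 1                     # candidate link and hop radius

# Union of the k-hop neighborhoods of both endpoints.
nodes = (set(nx.single_source_shortest_path_length(g, x, cutoff=k))
         | set(nx.single_source_shortest_path_length(g, y, cutoff=k)))
enclosing = g.subgraph(nodes).copy()

# Remove the target link itself so its label is not leaked to the model.
if enclosing.has_edge(x, y):
    enclosing.remove_edge(x, y)
print(enclosing.number_of_nodes(), enclosing.number_of_edges())
```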
Long Short-Term Memory with Dynamic Skip Connections
Title | Long Short-Term Memory with Dynamic Skip Connections |
Authors | Tao Gui, Qi Zhang, Lujun Zhao, Yaosong Lin, Minlong Peng, Jingjing Gong, Xuanjing Huang |
Abstract | In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we tried to alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always tackle entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%. |
Tasks | Named Entity Recognition, Sentiment Analysis |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03873v1 |
http://arxiv.org/pdf/1811.03873v1.pdf | |
PWC | https://paperswithcode.com/paper/long-short-term-memory-with-dynamic-skip |
Repo | https://github.com/lecholin/DynamicLSTM |
Framework | tf |
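A toy sketch of the mechanism: at each step, one of the last K hidden states is selected as the recurrent input, letting the model "skip" directly to a dependent word. The greedy selection below is a simplification; the paper trains the discrete choice with reinforcement learning.

```python
import torch
import torch.nn as nn

hidden, K = 32, 5
cell = nn.LSTMCell(16, hidden)
scorer = nn.Linear(hidden, 1)              # scores candidate skip targets

h = torch.zeros(1, hidden)
c = torch.zeros(1, hidden)
history = [h]
for x in torch.randn(20, 1, 16):           # a 20-token input sequence
    candidates = torch.stack(history[-K:])             # last K hidden states
    idx = scorer(candidates).squeeze(-1).argmax(dim=0)  # greedy choice
    h_skip = candidates[idx, torch.arange(1)]  # selected state per batch item
    h, c = cell(x, (h_skip, c))
    history.append(h)
print(h.shape)                             # (1, 32)
```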
Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues
Title | Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues |
Authors | Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, Sergio Escalera |
Abstract | Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on EYEDIAP dataset, further improved by 4% when the temporal modality is included. |
Tasks | Gaze Estimation |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.03064v3 |
http://arxiv.org/pdf/1805.03064v3.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-cnn-for-3d-gaze-estimation-using |
Repo | https://github.com/crisie/CRNN-Gaze |
Framework | tf |
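A structural sketch of the many-to-one recurrent design, with small MLPs standing in for the CNN streams and all sizes invented for illustration:

```python
import torch
import torch.nn as nn

class GazeRNN(nn.Module):
    """Fuse face/eyes/landmark streams per frame, regress last-frame gaze."""
    def __init__(self, feat=64):
        super().__init__()
        self.face = nn.Linear(128, feat)    # stand-in for a face CNN
        self.eyes = nn.Linear(128, feat)    # stand-in for an eyes-region CNN
        self.lmks = nn.Linear(68 * 2, feat) # 68 2D face landmarks
        self.gru = nn.GRU(3 * feat, feat, batch_first=True)
        self.head = nn.Linear(feat, 3)      # 3D gaze vector

    def forward(self, face, eyes, lmks):
        z = torch.cat([self.face(face), self.eyes(eyes), self.lmks(lmks)], -1)
        out, _ = self.gru(z)
        return self.head(out[:, -1])        # many-to-one: last frame only

model = GazeRNN()
T = 7                                       # frames per sequence
gaze = model(torch.randn(2, T, 128), torch.randn(2, T, 128),
             torch.randn(2, T, 136))
print(gaze.shape)                           # (2, 3)
```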