October 20, 2019

Paper Group AWR 171

Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction. Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers. Abstractive Summarization Using Attentive Neural Techniques. Applying Machine Learning To Maize Traits Prediction. Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search. Numeracy …

Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction

Title Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
Authors Kazuma Hashimoto, Yoshimasa Tsuruoka
Abstract A major obstacle in reinforcement learning-based sentence generation is the large action space whose size is equal to the vocabulary size of the target-side language. To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. Our method first predicts a fixed-size small vocabulary for each input to generate its target sentence. The input-specific vocabularies are then used at supervised and reinforcement learning steps, and also at test time. In our experiments on six machine translation and two image captioning datasets, our method achieves faster reinforcement learning ($\sim$2.7x faster) with less GPU memory ($\sim$2.3x less) than the full-vocabulary counterpart. The reinforcement learning with our method consistently leads to significant improvement of BLEU scores, and the scores are equal to or better than those of baselines using the full vocabularies, with faster decoding time ($\sim$3x faster) on CPUs.
Tasks Image Captioning, Machine Translation
Published 2018-09-05
URL http://arxiv.org/abs/1809.01694v2
PDF http://arxiv.org/pdf/1809.01694v2.pdf
PWC https://paperswithcode.com/paper/accelerated-reinforcement-learning-for
Repo https://github.com/hassyGo/NLG-RL
Framework pytorch
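
The abstract's central idea, shrinking the action space to a small input-specific vocabulary, can be illustrated with a softmax restricted to a subset of token ids. This is only a minimal sketch: the function name `restricted_softmax`, the toy logits, and the chosen ids are hypothetical, and the paper's actual vocabulary predictor and generation model are not reproduced here.

```python
import math


def restricted_softmax(logits, small_vocab):
    """Softmax over only the token ids in small_vocab.

    logits: list of scores indexed by token id over the full vocabulary.
    small_vocab: token ids predicted for this input (hypothetical values).
    Returns a dict {token id: probability}; other ids receive no mass.
    """
    ids = sorted(set(small_vocab))
    top = max(logits[i] for i in ids)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - top) for i in ids}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Toy full-vocabulary scores for six tokens; only three are kept as actions.
full_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0]
probs = restricted_softmax(full_logits, small_vocab=[1, 3, 4])
```

The normalisation runs over three entries instead of six here; with a real vocabulary of tens of thousands of words, the same restriction is what would reduce per-step cost and memory.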

Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers

Title Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers
Authors Michael P. B. Gallaugher, Paul D. McNicholas
Abstract In recent years, data have become increasingly higher dimensional and, therefore, an increased need has arisen for dimension reduction techniques for clustering. Although such techniques are firmly established in the literature for multivariate data, there is a relative paucity in the area of matrix variate, or three-way, data. Furthermore, the few methods that are available all assume matrix variate normality, which is not always sensible if cluster skewness or excess kurtosis is present. Mixtures of bilinear factor analyzers using skewed matrix variate distributions are proposed. In all, four such mixture models are presented, based on matrix variate skew-t, generalized hyperbolic, variance-gamma, and normal inverse Gaussian distributions, respectively.
Tasks Dimensionality Reduction
Published 2018-09-07
URL https://arxiv.org/abs/1809.02385v3
PDF https://arxiv.org/pdf/1809.02385v3.pdf
PWC https://paperswithcode.com/paper/mixtures-of-skewed-matrix-variate-bilinear
Repo https://github.com/nikpocuca/MatrixVariate.jl
Framework none

Abstractive Summarization Using Attentive Neural Techniques

Title Abstractive Summarization Using Attentive Neural Techniques
Authors Jacob Krantz, Jugal Kalita
Abstract In a world of proliferating data, the ability to rapidly summarize text is growing in importance. Automatic summarization of text can be thought of as a sequence to sequence problem. Another area of natural language processing that solves a sequence to sequence problem is machine translation, which is rapidly evolving due to the development of attention-based encoder-decoder networks. This work applies these modern techniques to abstractive summarization. We perform analysis on various attention mechanisms for summarization with the goal of developing an approach and architecture aimed at improving the state of the art. In particular, we modify and optimize a translation model with self-attention for generating abstractive sentence summaries. The effectiveness of this base model along with attention variants is compared and analyzed in the context of standardized evaluation sets and test metrics. However, we show that these metrics are limited in their ability to effectively score abstractive summaries, and propose a new approach based on the intuition that an abstractive model requires an abstractive evaluation.
Tasks Abstractive Text Summarization, Machine Translation
Published 2018-10-20
URL http://arxiv.org/abs/1810.08838v1
PDF http://arxiv.org/pdf/1810.08838v1.pdf
PWC https://paperswithcode.com/paper/abstractive-summarization-using-attentive
Repo https://github.com/jacobkrantz/VertMetric
Framework none

Applying Machine Learning To Maize Traits Prediction

Title Applying Machine Learning To Maize Traits Prediction
Authors Binbin Shi, Xupeng Chen
Abstract Heterosis is the improved or increased function of any biological quality in a hybrid offspring. We have studied the largest maize SNP dataset to date for trait prediction. We develop linear and non-linear models that consider relationships between different hybrids as well as other effects. A specially designed model proved to be efficient and robust in predicting maize traits.
Tasks
Published 2018-08-20
URL http://arxiv.org/abs/1808.06275v1
PDF http://arxiv.org/pdf/1808.06275v1.pdf
PWC https://paperswithcode.com/paper/applying-machine-learning-to-maize-traits
Repo https://github.com/james20141606/eMaize
Framework none

Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

Title Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search
Authors Elias Jääsaari, Ville Hyvönen, Teemu Roos
Abstract Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
Tasks
Published 2018-12-18
URL http://arxiv.org/abs/1812.07484v1
PDF http://arxiv.org/pdf/1812.07484v1.pdf
PWC https://paperswithcode.com/paper/efficient-autotuning-of-hyperparameters-in
Repo https://github.com/vioshyvo/mrpt
Framework none

Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Title Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
Authors Georgios P. Spithourakis, Sebastian Riedel
Abstract Numeracy is the ability to understand and work with numbers. It is a necessary skill for composing and understanding documents in clinical, scientific, and other technical domains. In this paper, we explore different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and propose a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary. Our evaluation on clinical and scientific datasets shows that using hierarchical models to distinguish numerals from words improves a perplexity metric on the subset of numerals by 2 and 4 orders of magnitude, respectively, over non-hierarchical models. A combination of strategies can further improve perplexity. Our continuous probability density function model reduces mean absolute percentage errors by 18% and 54% in comparison to the second best strategy for each dataset, respectively.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.08154v1
PDF http://arxiv.org/pdf/1805.08154v1.pdf
PWC https://paperswithcode.com/paper/numeracy-for-language-models-evaluating-and
Repo https://github.com/uclmr/numerate-language-models
Framework none
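
The abstract reports results as mean absolute percentage error (MAPE). As a reminder of how that metric is computed, here is a minimal sketch using the standard definition; the function name `mape` and the toy numbers are illustrative and not taken from the paper.

```python
def mape(targets, predictions):
    """Mean absolute percentage error, in percent.

    Assumes every target is non-zero, since the error is taken
    relative to the magnitude of the target.
    """
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(p - t) / abs(t) for t, p in zip(targets, predictions)]
    return 100.0 * sum(ratios) / len(ratios)


# Two toy numerals, each predicted with a 10% relative error.
score = mape([100.0, 200.0], [110.0, 180.0])
```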

Recurrent Slice Networks for 3D Segmentation of Point Clouds

Title Recurrent Slice Networks for 3D Segmentation of Point Clouds
Authors Qiangui Huang, Weiyue Wang, Ulrich Neumann
Abstract Point clouds are an efficient data format for 3D data. However, existing 3D segmentation methods for point clouds either do not model local dependencies (PointNet) or require added computations (Kd-Net, PointNet++). This work presents a novel 3D segmentation framework, RSNet (code is released at https://github.com/qianguih/RSNet), to efficiently model local structures in point clouds. The key component of the RSNet is a lightweight local dependency module. It is a combination of a novel slice pooling layer, Recurrent Neural Network (RNN) layers, and a slice unpooling layer. The slice pooling layer is designed to project features of unordered points onto an ordered sequence of feature vectors so that traditional end-to-end learning algorithms (RNNs) can be applied. The performance of RSNet is validated by comprehensive experiments on the S3DIS, ScanNet, and ShapeNet datasets. In its simplest form, RSNets surpass all previous state-of-the-art methods on these benchmarks. And comparisons against previous state-of-the-art methods (PointNet, PointNet++) demonstrate the efficiency of RSNets.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04402v2
PDF http://arxiv.org/pdf/1802.04402v2.pdf
PWC https://paperswithcode.com/paper/recurrent-slice-networks-for-3d-segmentation
Repo https://github.com/qianguih/RSNet
Framework pytorch
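
To make the slice pooling and unpooling steps from the abstract concrete, the following is a rough sketch in plain Python. It assumes points are sliced along a single axis at a fixed resolution and that features within a slice are aggregated by an element-wise max; both choices, and the names `slice_pool` and `slice_unpool`, are assumptions for illustration. The RNN layers that would process the ordered sequence between the two steps are omitted.

```python
def slice_pool(points, feats, axis, resolution):
    """Group unordered points into ordered slices along one axis.

    Returns (pooled, slice_ids): pooled[i] is the element-wise max of the
    features of all points in slice i (zeros for an empty slice), and
    slice_ids[j] is the slice index of point j.
    """
    low = min(p[axis] for p in points)
    slice_ids = [int((p[axis] - low) // resolution) for p in points]
    dim = len(feats[0])
    pooled = [None] * (max(slice_ids) + 1)
    for sid, feat in zip(slice_ids, feats):
        if pooled[sid] is None:
            pooled[sid] = list(feat)
        else:
            pooled[sid] = [max(a, b) for a, b in zip(pooled[sid], feat)]
    pooled = [s if s is not None else [0.0] * dim for s in pooled]
    return pooled, slice_ids


def slice_unpool(pooled, slice_ids):
    """Copy each slice feature back to every point in that slice."""
    return [pooled[sid] for sid in slice_ids]


# Four points sliced along z (axis 2) with slice thickness 1.0.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.25), (0.0, 1.0, 0.75), (1.0, 1.0, 2.5)]
feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [5.0, 5.0]]
pooled, slice_ids = slice_pool(points, feats, axis=2, resolution=1.0)
per_point = slice_unpool(pooled, slice_ids)
```

The pooled list is ordered by position along the axis, which is what lets a sequence model consume features that came from an unordered point set.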

MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices

Title MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices
Authors Sheng Chen, Yang Liu, Xiang Gao, Zhen Han
Abstract Face Analysis Project on MXNet
Tasks Face Verification
Published 2018-04-20
URL http://arxiv.org/abs/1804.07573v4
PDF http://arxiv.org/pdf/1804.07573v4.pdf
PWC https://paperswithcode.com/paper/mobilefacenets-efficient-cnns-for-accurate
Repo https://github.com/deepinsight/insightface
Framework mxnet

Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Title Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Authors Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang
Abstract Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to perform web-based tasks, such as booking flights or replying to emails, where a single mistake can ruin the entire sequence of actions. A common remedy is to “warm-start” the agent by pre-training it to mimic expert demonstrations, but this is prone to overfitting. Instead, we propose to constrain exploration using demonstrations. From each demonstration, we induce high-level “workflows” which constrain the allowable actions at each time step to be similar to those in the demonstration (e.g., “Step 1: click on a textbox; Step 2: enter some text”). Our exploration policy then learns to identify successful workflows and samples actions that satisfy these workflows. Workflows prune out bad exploration directions and accelerate the agent’s ability to discover rewards. We use our approach to train a novel neural policy designed to handle the semi-structured nature of websites, and evaluate on a suite of web tasks, including the recent World of Bits benchmark. We achieve new state-of-the-art results, and show that workflow-guided exploration improves sample efficiency over behavioral cloning by more than 100x.
Tasks
Published 2018-02-24
URL http://arxiv.org/abs/1802.08802v1
PDF http://arxiv.org/pdf/1802.08802v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-on-web-interfaces
Repo https://github.com/zbyte64/pytorch-fuzzdom
Framework pytorch
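
The idea of a workflow constraining the allowable actions at each time step can be sketched as a list of per-step predicates that filter candidate actions. Everything below is hypothetical: the action dictionaries, the predicate form of a workflow, and uniform sampling among allowed actions are illustrative assumptions, whereas the paper's exploration policy is learned.

```python
import random


def sample_workflow_action(candidates, workflow, step, rng):
    """Sample uniformly among candidate actions allowed at this workflow step."""
    allowed = [a for a in candidates if workflow[step](a)]
    if not allowed:
        raise ValueError(f"no candidate action satisfies workflow step {step}")
    return rng.choice(allowed)


# Hypothetical workflow mirroring the abstract's example:
# "Step 1: click on a textbox; Step 2: enter some text".
workflow = [
    lambda a: a["type"] == "click" and a["element"] == "textbox",
    lambda a: a["type"] == "type",
]

candidates = [
    {"type": "click", "element": "button"},
    {"type": "click", "element": "textbox"},
    {"type": "type", "element": "textbox", "text": "hello"},
]

rng = random.Random(0)
first = sample_workflow_action(candidates, workflow, 0, rng)
second = sample_workflow_action(candidates, workflow, 1, rng)
```

Because the button click never passes the first predicate, exploration never spends an episode on it, which is the pruning effect the abstract describes.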

Generative Adversarial Active Learning for Unsupervised Outlier Detection

Title Generative Adversarial Active Learning for Unsupervised Outlier Detection
Authors Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, Xiangnan He
Abstract Outlier detection is an important topic in machine learning and has been used in a wide range of applications. In this paper, we approach outlier detection as a binary-classification issue by sampling potential outliers from a uniform reference distribution. However, due to the sparsity of data in high-dimensional space, a limited number of potential outliers may fail to provide sufficient information to assist the classifier in describing a boundary that can separate outliers from normal data effectively. To address this, we propose a novel Single-Objective Generative Adversarial Active Learning (SO-GAAL) method for outlier detection, which can directly generate informative potential outliers based on the mini-max game between a generator and a discriminator. Moreover, to prevent the generator from falling into the mode collapsing problem, the stop node of training should be determined when SO-GAAL is able to provide sufficient information. But without any prior information, it is extremely difficult for SO-GAAL. Therefore, we expand the network structure of SO-GAAL from a single generator to multiple generators with different objectives (MO-GAAL), which can generate a reasonable reference distribution for the whole dataset. We empirically compare the proposed approach with several state-of-the-art outlier detection methods on both synthetic and real-world datasets. The results show that MO-GAAL outperforms its competitors in the majority of cases, especially for datasets with various cluster types or high irrelevant variable ratio.
Tasks Active Learning, Outlier Detection
Published 2018-09-28
URL http://arxiv.org/abs/1809.10816v4
PDF http://arxiv.org/pdf/1809.10816v4.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-active-learning-for
Repo https://github.com/leibinghe/GAAL-based-outlier-detection
Framework tf
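
The starting point described in the abstract, treating outlier detection as binary classification against potential outliers sampled from a uniform reference distribution, can be sketched as a data-preparation step. This is not SO-GAAL or MO-GAAL, which replace the uniform sampler with generators; the function name, the bounding-box choice for the uniform range, and the label convention are assumptions.

```python
import random


def make_outlier_training_set(data, n_outliers, seed=0):
    """Build a labeled set: real rows get 0, uniform reference samples get 1.

    Reference samples are drawn uniformly from the axis-aligned bounding
    box of the data (an assumed choice of reference distribution).
    """
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(row[d] for row in data) for d in range(dims)]
    highs = [max(row[d] for row in data) for d in range(dims)]
    outliers = [
        [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        for _ in range(n_outliers)
    ]
    features = [list(row) for row in data] + outliers
    labels = [0] * len(data) + [1] * n_outliers
    return features, labels


data = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [2.0, 3.0]]
features, labels = make_outlier_training_set(data, n_outliers=6)
```

Any binary classifier could then be trained on `features` and `labels`. In high dimensions such uniform samples rarely land near the data, which is the sparsity problem the abstract gives as motivation for generating informative outliers instead.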

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Title StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
Authors Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen
Abstract Despite the success of deep learning for static image understanding, it remains unclear what are the most effective network architectures for the spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos. Particularly, StNet stacks N successive video frames into a super-image which has 3N channels and applies 2D convolution on super-images to capture local spatial-temporal relationship. To model global spatial-temporal relationship, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet. It employs a separate channel-wise and temporal-wise convolution over the feature sequence of video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and can strike a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset.
Tasks Temporal Action Localization
Published 2018-11-05
URL http://arxiv.org/abs/1811.01549v3
PDF http://arxiv.org/pdf/1811.01549v3.pdf
PWC https://paperswithcode.com/paper/stnet-local-and-global-spatial-temporal
Repo https://github.com/hyperfraise/StNet
Framework pytorch
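
The super-image construction in the abstract (N successive 3-channel frames stacked into one image with 3N channels) amounts to concatenating frames along the channel axis. The sketch below uses nested lists to stay dependency-free; the function name `to_super_images` and the requirement that the frame count be divisible by N are assumptions, and the 2D and temporal convolutions that follow in StNet are not shown.

```python
def to_super_images(frames, n):
    """Stack every n consecutive frames along the channel axis.

    frames: list of T frames, each a list of 3 channel planes (H x W lists).
    Returns T // n super-images, each a list of 3 * n channel planes.
    """
    if n <= 0 or len(frames) % n != 0:
        raise ValueError("number of frames must be a positive multiple of n")
    super_images = []
    for start in range(0, len(frames), n):
        channels = []
        for frame in frames[start:start + n]:
            channels.extend(frame)  # append this frame's 3 planes
        super_images.append(channels)
    return super_images


# Ten toy 1x1 RGB frames; each plane holds a unique marker value.
frames = [[[[t * 3 + c]] for c in range(3)] for t in range(10)]
supers = to_super_images(frames, n=5)
```

An ordinary 2D convolution applied to such a super-image sees pixels from all N frames at once, which is how the local temporal relationship is captured without 3D kernels.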

Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

Title Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
Authors Antonio Toral, Sheila Castilho, Ke Hu, Andy Way
Abstract We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT.
Tasks Machine Translation
Published 2018-08-30
URL http://arxiv.org/abs/1808.10432v1
PDF http://arxiv.org/pdf/1808.10432v1.pdf
PWC https://paperswithcode.com/paper/attaining-the-unattainable-reassessing-claims
Repo https://github.com/antot/human_parity_mt
Framework none

Link Prediction Based on Graph Neural Networks

Title Link Prediction Based on Graph Neural Networks
Authors Muhan Zhang, Yixin Chen
Abstract Link prediction is a key problem for network-structured data. Link prediction heuristics use some score functions, such as common neighbors and Katz index, to measure the likelihood of links. They have obtained wide practical uses due to their simplicity, interpretability, and for some of them, scalability. However, every heuristic has a strong assumption on when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable way should be learning a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a “heuristic” that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel $\gamma$-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs reserve rich information related to link existence. Second, based on the $\gamma$-decaying theory, we propose a new algorithm to learn heuristics from local subgraphs using a graph neural network (GNN). Its experimental results show unprecedented performance, working consistently well on a wide range of problems.
Tasks Link Prediction
Published 2018-02-27
URL http://arxiv.org/abs/1802.09691v3
PDF http://arxiv.org/pdf/1802.09691v3.pdf
PWC https://paperswithcode.com/paper/link-prediction-based-on-graph-neural
Repo https://github.com/muhanzhang/SEAL
Framework none
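
The two predefined heuristics named in the abstract, common neighbors and the Katz index, can be sketched on a small adjacency-set graph. The Katz index is an infinite sum over walk lengths, so the version below truncates it at `max_len`; that truncation, the function names, and the toy graph are illustrative assumptions, and this is not the learned GNN method the paper proposes.

```python
def common_neighbors(adj, x, y):
    """Number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def katz_index(adj, x, y, beta=0.1, max_len=5):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * (walks of length l from x to y)."""
    walks = {v: 0 for v in adj}  # walks[v] = number of walks of the current length from x to v
    walks[x] = 1
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {v: 0 for v in adj}
        for u, count in walks.items():
            if count:
                for v in adj[u]:
                    nxt[v] += count
        walks = nxt
        score += beta ** length * walks[y]
    return score


# Undirected toy graph: nodes 0 and 3 are not linked but share neighbors 1 and 2.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
cn = common_neighbors(adj, 0, 3)
katz = katz_index(adj, 0, 3, beta=0.1, max_len=2)
```

Both scores depend only on the neighborhood around the target pair; with `max_len=2` the Katz score counts just the two length-2 walks through nodes 1 and 2.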

Long Short-Term Memory with Dynamic Skip Connections

Title Long Short-Term Memory with Dynamic Skip Connections
Authors Tao Gui, Qi Zhang, Lujun Zhao, Yaosong Lin, Minlong Peng, Jingjing Gong, Xuanjing Huang
Abstract In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we tried to alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always tackle entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%.
Tasks Named Entity Recognition, Sentiment Analysis
Published 2018-11-09
URL http://arxiv.org/abs/1811.03873v1
PDF http://arxiv.org/pdf/1811.03873v1.pdf
PWC https://paperswithcode.com/paper/long-short-term-memory-with-dynamic-skip
Repo https://github.com/lecholin/DynamicLSTM
Framework tf

Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues

Title Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues
Authors Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, Sergio Escalera
Abstract Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on EYEDIAP dataset, further improved by 4% when the temporal modality is included.
Tasks Gaze Estimation
Published 2018-05-08
URL http://arxiv.org/abs/1805.03064v3
PDF http://arxiv.org/pdf/1805.03064v3.pdf
PWC https://paperswithcode.com/paper/recurrent-cnn-for-3d-gaze-estimation-using
Repo https://github.com/crisie/CRNN-Gaze
Framework tf