Paper Group AWR 171
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction. Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers. Abstractive Summarization Using Attentive Neural Techniques. Applying Machine Learning To Maize Traits Prediction. Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search. Numeracy …
Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction
Title | Accelerated Reinforcement Learning for Sentence Generation by Vocabulary Prediction |
Authors | Kazuma Hashimoto, Yoshimasa Tsuruoka |
Abstract | A major obstacle in reinforcement learning-based sentence generation is the large action space whose size is equal to the vocabulary size of the target-side language. To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. Our method first predicts a fixed-size small vocabulary for each input to generate its target sentence. The input-specific vocabularies are then used at supervised and reinforcement learning steps, and also at test time. In our experiments on six machine translation and two image captioning datasets, our method achieves faster reinforcement learning ($\sim$2.7x faster) with less GPU memory ($\sim$2.3x less) than the full-vocabulary counterpart. The reinforcement learning with our method consistently leads to significant improvement of BLEU scores, and the scores are equal to or better than those of baselines using the full vocabularies, with faster decoding time ($\sim$3x faster) on CPUs. |
Tasks | Image Captioning, Machine Translation |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01694v2 |
http://arxiv.org/pdf/1809.01694v2.pdf | |
PWC | https://paperswithcode.com/paper/accelerated-reinforcement-learning-for |
Repo | https://github.com/hassyGo/NLG-RL |
Framework | pytorch |
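A minimal sketch of the core trick, assuming a vocabulary predictor has already chosen a small per-input word set (`small_vocab_ids` below is a random stand-in, not the NLG-RL predictor): the decoder's softmax runs over only those k words instead of the full vocabulary, which is where the reported speed and memory savings come from.

```python
import torch
import torch.nn.functional as F

full_vocab, hidden_dim, k = 50000, 256, 1000

# Full output embedding matrix (tied to the decoder in a real system).
output_weights = torch.randn(full_vocab, hidden_dim)

# Hypothetical per-input small vocabulary; in the paper this comes from a
# vocabulary predictor conditioned on the source sentence.
small_vocab_ids = torch.randint(0, full_vocab, (k,))

decoder_state = torch.randn(1, hidden_dim)      # one decoding step

# Softmax over only the k predicted words: O(k) instead of O(|V|) per step.
small_logits = decoder_state @ output_weights[small_vocab_ids].T
probs = F.softmax(small_logits, dim=-1)
next_word_id = small_vocab_ids[probs.argmax(dim=-1)]
print(next_word_id)
```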
Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers
Title | Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers |
Authors | Michael P. B. Gallaugher, Paul D. McNicholas |
Abstract | In recent years, data have become increasingly higher dimensional and, therefore, an increased need has arisen for dimension reduction techniques for clustering. Although such techniques are firmly established in the literature for multivariate data, there is a relative paucity in the area of matrix variate, or three-way, data. Furthermore, the few methods that are available all assume matrix variate normality, which is not always sensible if cluster skewness or excess kurtosis is present. Mixtures of bilinear factor analyzers using skewed matrix variate distributions are proposed. In all, four such mixture models are presented, based on matrix variate skew-t, generalized hyperbolic, variance-gamma, and normal inverse Gaussian distributions, respectively. |
Tasks | Dimensionality Reduction |
Published | 2018-09-07 |
URL | https://arxiv.org/abs/1809.02385v3 |
https://arxiv.org/pdf/1809.02385v3.pdf | |
PWC | https://paperswithcode.com/paper/mixtures-of-skewed-matrix-variate-bilinear |
Repo | https://github.com/nikpocuca/MatrixVariate.jl |
Framework | none |
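As background, a short sketch of the matrix variate (three-way) setting, covering only the symmetric matrix normal base case; the paper's skewed distributions add a latent mixing variable that this sketch omits. Sizes and scale matrices are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 5, 4                       # n observed matrices of size p x q

M = np.zeros((p, q))                      # mean matrix
A = np.linalg.cholesky(np.eye(p) + 0.3)   # row scale (row cov = A @ A.T)
B = np.linalg.cholesky(np.eye(q) + 0.2)   # column scale (col cov = B @ B.T)

# Matrix normal draws: X = M + A Z B^T with Z i.i.d. standard normal.
Z = rng.standard_normal((n, p, q))
X = M + A @ Z @ B.T
print(X.shape)                            # (100, 5, 4): three-way data
```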
Abstractive Summarization Using Attentive Neural Techniques
Title | Abstractive Summarization Using Attentive Neural Techniques |
Authors | Jacob Krantz, Jugal Kalita |
Abstract | In a world of proliferating data, the ability to rapidly summarize text is growing in importance. Automatic summarization of text can be thought of as a sequence to sequence problem. Another area of natural language processing that solves a sequence to sequence problem is machine translation, which is rapidly evolving due to the development of attention-based encoder-decoder networks. This work applies these modern techniques to abstractive summarization. We perform analysis on various attention mechanisms for summarization with the goal of developing an approach and architecture aimed at improving the state of the art. In particular, we modify and optimize a translation model with self-attention for generating abstractive sentence summaries. The effectiveness of this base model along with attention variants is compared and analyzed in the context of standardized evaluation sets and test metrics. However, we show that these metrics are limited in their ability to effectively score abstractive summaries, and propose a new approach based on the intuition that an abstractive model requires an abstractive evaluation. |
Tasks | Abstractive Text Summarization, Machine Translation |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.08838v1 |
http://arxiv.org/pdf/1810.08838v1.pdf | |
PWC | https://paperswithcode.com/paper/abstractive-summarization-using-attentive |
Repo | https://github.com/jacobkrantz/VertMetric |
Framework | none |
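The paper's closing claim — that n-gram metrics undervalue valid abstraction — can be seen in a toy example. The sentences and the simplified ROUGE-2-style count below are illustrative, not the paper's proposed metric (their evaluation code lives in the linked VertMetric repo).

```python
# Two summaries with the same meaning but almost no shared bigrams.
ref = "the senate passed the new budget bill".split()
abstractive = "lawmakers approved the fresh spending legislation".split()

def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))

overlap = bigrams(ref) & bigrams(abstractive)
recall = len(overlap) / len(bigrams(ref))
print(f"shared bigrams: {overlap}, ROUGE-2-like recall: {recall:.2f}")  # 0.00
```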
Applying Machine Learning To Maize Traits Prediction
Title | Applying Machine Learning To Maize Traits Prediction |
Authors | Binbin Shi, Xupeng Chen |
Abstract | Heterosis is the improved or increased function of any biological quality in a hybrid offspring. We study what is, to date, the largest maize SNP dataset for trait prediction. We develop linear and non-linear models that consider relationships between different hybrids as well as other effects. The specially designed models prove efficient and robust in predicting maize traits. |
Tasks | |
Published | 2018-08-20 |
URL | http://arxiv.org/abs/1808.06275v1 |
http://arxiv.org/pdf/1808.06275v1.pdf | |
PWC | https://paperswithcode.com/paper/applying-machine-learning-to-maize-traits |
Repo | https://github.com/james20141606/eMaize |
Framework | none |
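A hedged sketch of the general task setup, not the paper's exact model: predicting a quantitative trait from a SNP genotype matrix with a regularized linear baseline. All data below are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_hybrids, n_snps = 500, 2000
X = rng.integers(0, 3, size=(n_hybrids, n_snps)).astype(float)  # 0/1/2 SNPs
true_effects = rng.normal(0, 0.1, n_snps) * (rng.random(n_snps) < 0.05)
y = X @ true_effects + rng.normal(0, 1.0, n_hybrids)            # trait values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RidgeCV(alphas=np.logspace(-2, 4, 20)).fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```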
Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search
Title | Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search |
Authors | Elias Jääsaari, Ville Hyvönen, Teemu Roos |
Abstract | Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy–speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07484v1 |
http://arxiv.org/pdf/1812.07484v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-autotuning-of-hyperparameters-in |
Repo | https://github.com/vioshyvo/mrpt |
Framework | none |
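To see the trade-off being tuned, here is a toy, self-contained illustration: a crude random-hyperplane partition whose single hyperparameter trades recall for candidate-set size. The brute-force loop over settings is exactly the slow grid search the paper avoids; the partitioning scheme is invented for illustration and far simpler than the randomized trees in mrpt.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((5000, 32)).astype(np.float32)
queries = rng.standard_normal((50, 32)).astype(np.float32)
k = 10
proj = rng.standard_normal((8, 32)).astype(np.float32)  # random directions

def exact_knn(q):
    return set(np.argsort(((data - q) ** 2).sum(axis=1))[:k])

def approx_knn(q, n_proj):
    # Toy space partitioning: keep only points on the same side of the first
    # n_proj random hyperplanes as the query, then search those exactly.
    d = proj[:n_proj]
    mask = ((data @ d.T > 0) == (q @ d.T > 0)).all(axis=1)
    cand = np.where(mask)[0]
    order = np.argsort(((data[cand] - q) ** 2).sum(axis=1))[:k]
    return set(cand[order])

for n_proj in (1, 2, 4, 8):   # the hyperparameter being "tuned"
    r = np.mean([len(approx_knn(q, n_proj) & exact_knn(q)) / k
                 for q in queries])
    print(f"hyperplanes={n_proj}: mean recall@{k} = {r:.2f}")
```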
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
Title | Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers |
Authors | Georgios P. Spithourakis, Sebastian Riedel |
Abstract | Numeracy is the ability to understand and work with numbers. It is a necessary skill for composing and understanding documents in clinical, scientific, and other technical domains. In this paper, we explore different strategies for modelling numerals with language models, such as memorisation and digit-by-digit composition, and propose a novel neural architecture that uses a continuous probability density function to model numerals from an open vocabulary. Our evaluation on clinical and scientific datasets shows that using hierarchical models to distinguish numerals from words improves a perplexity metric on the subset of numerals by 2 and 4 orders of magnitude, respectively, over non-hierarchical models. A combination of strategies can further improve perplexity. Our continuous probability density function model reduces mean absolute percentage errors by 18% and 54% in comparison to the second best strategy for each dataset, respectively. |
Tasks | |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08154v1 |
http://arxiv.org/pdf/1805.08154v1.pdf | |
PWC | https://paperswithcode.com/paper/numeracy-for-language-models-evaluating-and |
Repo | https://github.com/uclmr/numerate-language-models |
Framework | none |
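A simplified sketch of the headline idea — scoring a numeral with a continuous density rather than a softmax over numeral tokens. The mixture-of-Gaussians head below is an illustrative stand-in; the paper's actual architecture and numeral transformations differ.

```python
import torch
import torch.nn as nn

class NumeralDensityHead(nn.Module):
    """Maps an LM hidden state to a mixture-of-Gaussians density over values."""
    def __init__(self, hidden_dim, n_components=5):
        super().__init__()
        self.params = nn.Linear(hidden_dim, 3 * n_components)

    def log_prob(self, h, value):
        logits, mu, log_sigma = self.params(h).chunk(3, dim=-1)
        comp = torch.distributions.Normal(mu, log_sigma.exp())
        log_w = torch.log_softmax(logits, dim=-1)
        # log p(v) = logsumexp_k [ log w_k + log N(v | mu_k, sigma_k) ]
        return torch.logsumexp(log_w + comp.log_prob(value.unsqueeze(-1)),
                               dim=-1)

h = torch.randn(2, 64)               # LM states at two numeral positions
values = torch.tensor([3.5, 120.0])
head = NumeralDensityHead(64)
print(head.log_prob(h, values))      # differentiable log-density per numeral
```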
Recurrent Slice Networks for 3D Segmentation of Point Clouds
Title | Recurrent Slice Networks for 3D Segmentation of Point Clouds |
Authors | Qiangui Huang, Weiyue Wang, Ulrich Neumann |
Abstract | Point clouds are an efficient data format for 3D data. However, existing 3D segmentation methods for point clouds either do not model local dependencies (PointNet) or require added computations (Kd-Net, PointNet++). This work presents a novel 3D segmentation framework, RSNet (code released at https://github.com/qianguih/RSNet), to efficiently model local structures in point clouds. The key component of the RSNet is a lightweight local dependency module. It is a combination of a novel slice pooling layer, Recurrent Neural Network (RNN) layers, and a slice unpooling layer. The slice pooling layer is designed to project features of unordered points onto an ordered sequence of feature vectors so that traditional end-to-end learning algorithms (RNNs) can be applied. The performance of RSNet is validated by comprehensive experiments on the S3DIS, ScanNet, and ShapeNet datasets. In its simplest form, RSNet surpasses all previous state-of-the-art methods on these benchmarks, and comparisons against previous state-of-the-art methods (PointNet, PointNet++) demonstrate its efficiency. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04402v2 |
http://arxiv.org/pdf/1802.04402v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-slice-networks-for-3d-segmentation |
Repo | https://github.com/qianguih/RSNet |
Framework | pytorch |
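A compact sketch of the slice pooling / RNN / slice unpooling pipeline, with bin counts and feature sizes invented for illustration (the released RSNet code is the authoritative version):

```python
import torch
import torch.nn as nn

n_points, feat_dim, n_slices = 1024, 64, 16

xyz = torch.rand(n_points, 3)               # point coordinates in [0, 1)
feats = torch.randn(n_points, feat_dim)     # per-point features

# Slice pooling: assign each point to a z-slice and max-pool features per
# slice, turning an unordered point set into an ordered sequence of vectors.
slice_ids = (xyz[:, 2] * n_slices).long().clamp(max=n_slices - 1)
pooled = torch.stack([
    feats[slice_ids == s].max(dim=0).values
    if (slice_ids == s).any() else torch.zeros(feat_dim)
    for s in range(n_slices)
])

# RNN over the ordered slices models dependencies between local regions.
rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
context, _ = rnn(pooled.unsqueeze(0))       # (1, n_slices, feat_dim)

# Slice unpooling: broadcast each slice's context back to its member points.
point_context = context.squeeze(0)[slice_ids]  # (n_points, feat_dim)
print(point_context.shape)
```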
MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices
Title | MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices |
Authors | Sheng Chen, Yang Liu, Xiang Gao, Zhen Han |
Abstract | Face Analysis Project on MXNet |
Tasks | Face Verification |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07573v4 |
http://arxiv.org/pdf/1804.07573v4.pdf | |
PWC | https://paperswithcode.com/paper/mobilefacenets-efficient-cnns-for-accurate |
Repo | https://github.com/deepinsight/insightface |
Framework | mxnet |
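A hedged sketch of the network detail MobileFaceNets is known for: replacing global average pooling with a global depthwise convolution (GDConv), i.e. a depthwise convolution whose kernel covers the whole final feature map, so spatial positions receive learned, unequal weights. Sizes below are illustrative.

```python
import torch
import torch.nn as nn

channels, h, w = 512, 7, 7
feature_map = torch.randn(1, channels, h, w)    # backbone output

# Depthwise conv whose kernel spans the full 7x7 map: one learned spatial
# weighting per channel, producing a 1x1 "global" descriptor per channel.
gdconv = nn.Conv2d(channels, channels, kernel_size=(h, w),
                   groups=channels, bias=False)
embedding = gdconv(feature_map).flatten(1)      # (1, 512) face embedding
print(embedding.shape)
```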
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Title | Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration |
Authors | Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, Percy Liang |
Abstract | Reinforcement learning (RL) agents improve through trial-and-error, but when reward is sparse and the agent cannot discover successful action sequences, learning stagnates. This has been a notable problem in training deep RL agents to perform web-based tasks, such as booking flights or replying to emails, where a single mistake can ruin the entire sequence of actions. A common remedy is to “warm-start” the agent by pre-training it to mimic expert demonstrations, but this is prone to overfitting. Instead, we propose to constrain exploration using demonstrations. From each demonstration, we induce high-level “workflows” which constrain the allowable actions at each time step to be similar to those in the demonstration (e.g., “Step 1: click on a textbox; Step 2: enter some text”). Our exploration policy then learns to identify successful workflows and samples actions that satisfy these workflows. Workflows prune out bad exploration directions and accelerate the agent’s ability to discover rewards. We use our approach to train a novel neural policy designed to handle the semi-structured nature of websites, and evaluate on a suite of web tasks, including the recent World of Bits benchmark. We achieve new state-of-the-art results, and show that workflow-guided exploration improves sample efficiency over behavioral cloning by more than 100x. |
Tasks | |
Published | 2018-02-24 |
URL | http://arxiv.org/abs/1802.08802v1 |
http://arxiv.org/pdf/1802.08802v1.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-on-web-interfaces |
Repo | https://github.com/zbyte64/pytorch-fuzzdom |
Framework | pytorch |
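A toy sketch of the workflow constraint idea, with the environment, action encoding, and constraints all invented for illustration: each workflow step admits only actions similar to the demonstrated one, and exploration samples within that set.

```python
import random

random.seed(0)

# A workflow induced from one demonstration: first click a textbox, then type.
workflow = [
    lambda a: a["type"] == "click" and a["target"] == "textbox",
    lambda a: a["type"] == "type",
]

def possible_actions():
    # Stand-in for enumerating the concrete actions on the current web page.
    return ([{"type": "click", "target": t} for t in ("textbox", "button")]
            + [{"type": "type", "text": "hello"}])

episode = []
for constraint in workflow:
    allowed = [a for a in possible_actions() if constraint(a)]
    episode.append(random.choice(allowed))  # explore only within the workflow
print(episode)
```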
Generative Adversarial Active Learning for Unsupervised Outlier Detection
Title | Generative Adversarial Active Learning for Unsupervised Outlier Detection |
Authors | Yezheng Liu, Zhe Li, Chong Zhou, Yuanchun Jiang, Jianshan Sun, Meng Wang, Xiangnan He |
Abstract | Outlier detection is an important topic in machine learning and has been used in a wide range of applications. In this paper, we approach outlier detection as a binary-classification issue by sampling potential outliers from a uniform reference distribution. However, due to the sparsity of data in high-dimensional space, a limited number of potential outliers may fail to provide sufficient information to assist the classifier in describing a boundary that can separate outliers from normal data effectively. To address this, we propose a novel Single-Objective Generative Adversarial Active Learning (SO-GAAL) method for outlier detection, which can directly generate informative potential outliers based on the mini-max game between a generator and a discriminator. Moreover, to prevent the generator from suffering mode collapse, training should stop once SO-GAAL can provide sufficiently informative potential outliers; without any prior information, however, this stopping point is extremely difficult to determine. Therefore, we expand the network structure of SO-GAAL from a single generator to multiple generators with different objectives (MO-GAAL), which can generate a reasonable reference distribution for the whole dataset. We empirically compare the proposed approach with several state-of-the-art outlier detection methods on both synthetic and real-world datasets. The results show that MO-GAAL outperforms its competitors in the majority of cases, especially for datasets with various cluster types or a high proportion of irrelevant variables. |
Tasks | Active Learning, Outlier Detection |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10816v4 |
http://arxiv.org/pdf/1809.10816v4.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-active-learning-for |
Repo | https://github.com/leibinghe/GAAL-based-outlier-detection |
Framework | tf |
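A compact, hedged sketch of the single-generator (SO-GAAL) loop on synthetic 2-D data — network sizes, training length, and data are illustrative, and the multiple-generator MO-GAAL extension is omitted:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
real = torch.randn(512, 2)                     # synthetic inlier cloud

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(200):
    fake = G(torch.randn(512, 8))              # generated potential outliers
    # Discriminator: real data -> 1, generated potential outliers -> 0.
    loss_d = (bce(D(real), torch.ones(512, 1))
              + bce(D(fake.detach()), torch.zeros(512, 1)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: move generated points toward regions D believes are real,
    # i.e. informative points near the boundary of the data.
    loss_g = bce(D(fake), torch.ones(512, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

outlier_score = torch.sigmoid(-D(real)).squeeze(1)  # higher = more outlying
print(outlier_score.topk(5).indices)           # most outlier-like points
```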
StNet: Local and Global Spatial-Temporal Modeling for Action Recognition
Title | StNet: Local and Global Spatial-Temporal Modeling for Action Recognition |
Authors | Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Limin Wang, Shilei Wen |
Abstract | Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos. Particularly, StNet stacks N successive video frames into a "super-image" which has 3N channels and applies 2D convolution on super-images to capture local spatial-temporal relationships. To model global spatial-temporal relationships, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet. It employs a separate channel-wise and temporal-wise convolution over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and can strike a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset. |
Tasks | Temporal Action Localization |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01549v3 |
http://arxiv.org/pdf/1811.01549v3.pdf | |
PWC | https://paperswithcode.com/paper/stnet-local-and-global-spatial-temporal |
Repo | https://github.com/hyperfraise/StNet |
Framework | pytorch |
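A minimal sketch of the super-image construction and the downstream temporal convolution; tensor sizes are illustrative, and a plain Conv1d stands in for the paper's temporal Xception block:

```python
import torch
import torch.nn as nn

batch, t_frames, n_stack = 2, 16, 4
video = torch.randn(batch, t_frames, 3, 112, 112)       # (B, T, 3, H, W)

# Group every N consecutive frames into one 3N-channel super-image.
super_images = video.reshape(batch * (t_frames // n_stack),
                             3 * n_stack, 112, 112)

local_conv = nn.Conv2d(3 * n_stack, 64, kernel_size=3, padding=1)
local_feats = local_conv(super_images)                   # local S-T features

# Recover the temporal axis and apply temporal convolution for global
# modeling (simplified; the paper uses a temporal Xception block here).
seq = local_feats.mean(dim=(2, 3)).reshape(batch, t_frames // n_stack, 64)
temporal_conv = nn.Conv1d(64, 64, kernel_size=3, padding=1)
out = temporal_conv(seq.transpose(1, 2))                 # (B, 64, T/N)
print(out.shape)
```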
Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation
Title | Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation |
Authors | Antonio Toral, Sheila Castilho, Ke Hu, Andy Way |
Abstract | We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context. If we consider only original source text (i.e. not translated from another language, or translationese), then we find evidence showing that human parity has not been achieved. We compare the judgments of professional translators against those of non-experts and discover that those of the experts result in higher inter-annotator agreement and better discrimination between human and machine translations. In addition, we analyse the human translations of the test set and identify important translation issues. Finally, based on these findings, we provide a set of recommendations for future human evaluations of MT. |
Tasks | Machine Translation |
Published | 2018-08-30 |
URL | http://arxiv.org/abs/1808.10432v1 |
http://arxiv.org/pdf/1808.10432v1.pdf | |
PWC | https://paperswithcode.com/paper/attaining-the-unattainable-reassessing-claims |
Repo | https://github.com/antot/human_parity_mt |
Framework | none |
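On the methodology side, the expert-vs-non-expert comparison rests on inter-annotator agreement. A tiny, hedged example of that computation with Cohen's kappa (the ratings are invented):

```python
from sklearn.metrics import cohen_kappa_score

# Two evaluators labeling which translation in each pair they judged human.
annotator_1 = ["human", "mt", "human", "human", "mt", "human"]
annotator_2 = ["human", "mt", "mt", "human", "mt", "human"]
print(cohen_kappa_score(annotator_1, annotator_2))  # 1.0 = perfect agreement
```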
Link Prediction Based on Graph Neural Networks
Title | Link Prediction Based on Graph Neural Networks |
Authors | Muhan Zhang, Yixin Chen |
Abstract | Link prediction is a key problem for network-structured data. Link prediction heuristics use score functions, such as common neighbors and the Katz index, to measure the likelihood of links. They have obtained wide practical use due to their simplicity, interpretability, and, for some of them, scalability. However, every heuristic has a strong assumption about when two nodes are likely to link, which limits their effectiveness on networks where these assumptions fail. In this regard, a more reasonable approach is to learn a suitable heuristic from a given network instead of using predefined ones. By extracting a local subgraph around each target link, we aim to learn a function mapping the subgraph patterns to link existence, thus automatically learning a "heuristic" that suits the current network. In this paper, we study this heuristic learning paradigm for link prediction. First, we develop a novel $\gamma$-decaying heuristic theory. The theory unifies a wide range of heuristics in a single framework, and proves that all these heuristics can be well approximated from local subgraphs. Our results show that local subgraphs preserve rich information related to link existence. Second, based on the $\gamma$-decaying theory, we propose a new algorithm to learn heuristics from local subgraphs using a graph neural network (GNN). Experimental results show unprecedented performance, working consistently well on a wide range of problems. |
Tasks | Link Prediction |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09691v3 |
http://arxiv.org/pdf/1802.09691v3.pdf | |
PWC | https://paperswithcode.com/paper/link-prediction-based-on-graph-neural |
Repo | https://github.com/muhanzhang/SEAL |
Framework | none |
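A small sketch of the extraction step this approach builds on — the k-hop enclosing subgraph around a candidate link, which a GNN then classifies. The paper's node labeling (DRNL) is omitted for brevity, and the graph here is just a networkx toy:

```python
import networkx as nx

g = nx.karate_club_graph()
x, y, k = 0, 33, 1                     # candidate link and hop radius

# Union of the k-hop neighborhoods of both endpoints.
nodes = (set(nx.single_source_shortest_path_length(g, x, cutoff=k))
         | set(nx.single_source_shortest_path_length(g, y, cutoff=k)))
enclosing = g.subgraph(nodes).copy()

# Remove the target link itself so its label is not leaked to the model.
if enclosing.has_edge(x, y):
    enclosing.remove_edge(x, y)
print(enclosing.number_of_nodes(), enclosing.number_of_edges())
```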
Long Short-Term Memory with Dynamic Skip Connections
Title | Long Short-Term Memory with Dynamic Skip Connections |
Authors | Tao Gui, Qi Zhang, Lujun Zhao, Yaosong Lin, Minlong Peng, Jingjing Gong, Xuanjing Huang |
Abstract | In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we tried to alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always tackle entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%. |
Tasks | Named Entity Recognition, Sentiment Analysis |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03873v1 |
http://arxiv.org/pdf/1811.03873v1.pdf | |
PWC | https://paperswithcode.com/paper/long-short-term-memory-with-dynamic-skip |
Repo | https://github.com/lecholin/DynamicLSTM |
Framework | tf |
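A toy sketch of the mechanism: at each step, one of the last K hidden states is selected as the recurrent input, letting the model "skip" directly to a dependent word. The greedy selection below is a simplification; the paper trains the discrete choice with reinforcement learning.

```python
import torch
import torch.nn as nn

hidden, K = 32, 5
cell = nn.LSTMCell(16, hidden)
scorer = nn.Linear(hidden, 1)              # scores candidate skip targets

h = torch.zeros(1, hidden)
c = torch.zeros(1, hidden)
history = [h]
for x in torch.randn(20, 1, 16):           # a 20-token input sequence
    candidates = torch.stack(history[-K:])             # last K hidden states
    idx = scorer(candidates).squeeze(-1).argmax(dim=0)  # greedy choice
    h_skip = candidates[idx, torch.arange(1)]  # selected state per batch item
    h, c = cell(x, (h_skip, c))
    history.append(h)
print(h.shape)                             # (1, 32)
```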
Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues
Title | Recurrent CNN for 3D Gaze Estimation using Appearance and Shape Cues |
Authors | Cristina Palmero, Javier Selva, Mohammad Ali Bagheri, Sergio Escalera |
Abstract | Gaze behavior is an important non-verbal cue in social signal processing and human-computer interaction. In this paper, we tackle the problem of person- and head pose-independent 3D gaze estimation from remote cameras, using a multi-modal recurrent convolutional neural network (CNN). We propose to combine face, eyes region, and face landmarks as individual streams in a CNN to estimate gaze in still images. Then, we exploit the dynamic nature of gaze by feeding the learned features of all the frames in a sequence to a many-to-one recurrent module that predicts the 3D gaze vector of the last frame. Our multi-modal static solution is evaluated on a wide range of head poses and gaze directions, achieving a significant improvement of 14.6% over the state of the art on EYEDIAP dataset, further improved by 4% when the temporal modality is included. |
Tasks | Gaze Estimation |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.03064v3 |
http://arxiv.org/pdf/1805.03064v3.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-cnn-for-3d-gaze-estimation-using |
Repo | https://github.com/crisie/CRNN-Gaze |
Framework | tf |
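A structural sketch of the many-to-one recurrent design, with small MLPs standing in for the CNN streams and all sizes invented for illustration:

```python
import torch
import torch.nn as nn

class GazeRNN(nn.Module):
    """Fuse face/eyes/landmark streams per frame, regress last-frame gaze."""
    def __init__(self, feat=64):
        super().__init__()
        self.face = nn.Linear(128, feat)    # stand-in for a face CNN
        self.eyes = nn.Linear(128, feat)    # stand-in for an eyes-region CNN
        self.lmks = nn.Linear(68 * 2, feat) # 68 2D face landmarks
        self.gru = nn.GRU(3 * feat, feat, batch_first=True)
        self.head = nn.Linear(feat, 3)      # 3D gaze vector

    def forward(self, face, eyes, lmks):
        z = torch.cat([self.face(face), self.eyes(eyes), self.lmks(lmks)], -1)
        out, _ = self.gru(z)
        return self.head(out[:, -1])        # many-to-one: last frame only

model = GazeRNN()
T = 7                                       # frames per sequence
gaze = model(torch.randn(2, T, 128), torch.randn(2, T, 128),
             torch.randn(2, T, 136))
print(gaze.shape)                           # (2, 3)
```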