January 31, 2020

3093 words 15 mins read

Paper Group ANR 116

Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization. Conditioned Query Generation for Task-Oriented Dialogue Systems. Low-resource Deep Entity Resolution with Transfer and Active Learning. NCLS: Neural Cross-Lingual Summarization. Revisit Lmser and its further development based on convolutional layers. Mixout: Effective Regulariza …

Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization

Title Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization
Authors Rong Ge, Zhize Li, Weiyao Wang, Xiang Wang
Abstract Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective. For nonconvex objectives, these techniques can also find a first-order stationary point (with small gradient). However, in nonconvex optimization it is often crucial to find a second-order stationary point (with small gradient and almost PSD Hessian). In this paper, we show that Stabilized SVRG (a simple variant of SVRG) can find an $\epsilon$-second-order stationary point using only $\widetilde{O}(n^{2/3}/\epsilon^2+n/\epsilon^{1.5})$ stochastic gradients. To the best of our knowledge, this is the first second-order guarantee for a simple variant of SVRG. The running time almost matches the known guarantees for finding $\epsilon$-first-order stationary points.
Tasks
Published 2019-05-01
URL http://arxiv.org/abs/1905.00529v1
PDF http://arxiv.org/pdf/1905.00529v1.pdf
PWC https://paperswithcode.com/paper/stabilized-svrg-simple-variance-reduction-for
Repo
Framework
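
To make the variance-reduction idea concrete, here is a minimal NumPy sketch of the plain SVRG estimator on a least-squares finite sum. The stabilized variant analyzed in the paper, which adds the extra machinery needed for second-order guarantees, is not reproduced here; step size and epoch counts are illustrative.

```python
import numpy as np

def svrg(X, y, lr=0.01, epochs=20, inner_steps=None):
    """Plain SVRG on f(w) = (1/n) * sum_i (x_i . w - y_i)^2."""
    n, d = X.shape
    w = np.zeros(d)
    inner_steps = inner_steps or n
    grad_i = lambda w, i: 2 * (X[i] @ w - y[i]) * X[i]  # per-example gradient
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = 2 * X.T @ (X @ w_snap - y) / n      # full gradient at the snapshot
        for _ in range(inner_steps):
            i = np.random.randint(n)
            # Variance-reduced estimator: unbiased, and its variance shrinks
            # as the iterate stays close to the snapshot point.
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= lr * v
    return w
```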

Conditioned Query Generation for Task-Oriented Dialogue Systems

Title Conditioned Query Generation for Task-Oriented Dialogue Systems
Authors Stéphane d’Ascoli, Alice Coucke, Francesco Caltagirone, Alexandre Caulier, Marc Lelarge
Abstract Scarcity of training data for task-oriented dialogue systems is a well-known problem that is usually tackled with costly and time-consuming manual data annotation. An alternative solution is to rely on automatic text generation which, although less accurate than human supervision, has the advantage of being cheap and fast. In this paper we propose a novel controlled data generation method that could be used as a training augmentation framework for closed-domain dialogue. Our contribution is twofold. First we show how to optimally train and control the generation of intent-specific sentences using a conditional variational autoencoder. Then we introduce a novel protocol called query transfer that allows us to leverage a broad, unlabelled dataset to extract relevant information. Comparison with two different baselines shows that our method, in the appropriate regime, consistently improves the diversity of the generated queries without compromising their quality.
Tasks Task-Oriented Dialogue Systems, Text Generation
Published 2019-11-09
URL https://arxiv.org/abs/1911.03698v1
PDF https://arxiv.org/pdf/1911.03698v1.pdf
PWC https://paperswithcode.com/paper/conditioned-query-generation-for-task
Repo
Framework
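
A hedged PyTorch sketch of intent-conditioned generation with a conditional VAE follows. The GRU encoder/decoder, layer sizes, and one-hot intent conditioning are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ConditionalVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid=128, z_dim=32, n_intents=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid, batch_first=True)
        self.to_mu = nn.Linear(hid + n_intents, z_dim)
        self.to_logvar = nn.Linear(hid + n_intents, z_dim)
        self.init_h = nn.Linear(z_dim + n_intents, hid)
        self.decoder = nn.GRU(emb_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, tokens, intent_onehot):
        x = self.embed(tokens)
        _, h = self.encoder(x)                      # final hidden state summarizes the query
        h = torch.cat([h[-1], intent_onehot], dim=-1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        # Condition the decoder on both the latent code and the intent label.
        h0 = self.init_h(torch.cat([z, intent_onehot], dim=-1)).unsqueeze(0)
        dec, _ = self.decoder(x, h0)                # teacher forcing during training
        return self.out(dec), mu, logvar
```

At generation time one would sample z from the prior and feed the desired intent one-hot, which is what makes the generated sentences intent-specific.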

Low-resource Deep Entity Resolution with Transfer and Active Learning

Title Low-resource Deep Entity Resolution with Transfer and Active Learning
Authors Jungo Kasai, Kun Qian, Sairam Gurajada, Yunyao Li, Lucian Popa
Abstract Entity resolution (ER) is the task of identifying different representations of the same real-world entities across databases. It is a key step for knowledge base creation and text mining. Recent adaptation of deep learning methods for ER mitigates the need for dataset-specific feature engineering by constructing distributed representations of entity records. While these methods achieve state-of-the-art performance over benchmark data, they require large amounts of labeled data, which are typically unavailable in realistic ER applications. In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. We design an architecture that allows us to learn a transferable model from a high-resource setting to a low-resource one. To further adapt to the target dataset, we incorporate active learning that carefully selects a few informative examples to fine-tune the transferred model. Empirical evaluation demonstrates that our method achieves comparable, if not better, performance compared to state-of-the-art learning-based methods while using an order of magnitude fewer labels.
Tasks Active Learning, Entity Resolution, Feature Engineering, Transfer Learning
Published 2019-06-17
URL https://arxiv.org/abs/1906.08042v1
PDF https://arxiv.org/pdf/1906.08042v1.pdf
PWC https://paperswithcode.com/paper/low-resource-deep-entity-resolution-with
Repo
Framework
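
The active-learning component can be sketched as an uncertainty-driven labeling loop. `model` is assumed to follow the scikit-learn predict_proba/fit interface and `pair_features` is a precomputed feature matrix; both are illustrative stand-ins for the paper's deep ER model and its richer selection strategy.

```python
import numpy as np

def active_learning_round(model, pair_features, oracle, budget=10):
    """Label the record pairs the transferred model is least certain about."""
    match_prob = model.predict_proba(pair_features)[:, 1]
    uncertainty = -np.abs(match_prob - 0.5)      # highest near p = 0.5
    picks = np.argsort(uncertainty)[-budget:]    # most uncertain pairs
    labels = [oracle(i) for i in picks]          # human annotation of each pick
    model.fit(pair_features[picks], labels)      # stand-in for fine-tuning
    return model, picks
```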

NCLS: Neural Cross-Lingual Summarization

Title NCLS: Neural Cross-Lingual Summarization
Authors Junnan Zhu, Qian Wang, Yining Wang, Yu Zhou, Jiajun Zhang, Shaonan Wang, Chengqing Zong
Abstract Cross-lingual summarization (CLS) is the task of producing a summary in one particular language for a source document in a different language. Existing methods simply divide this task into two steps: summarization and translation, leading to the problem of error propagation. To address this, we present, for the first time, an end-to-end CLS framework, which we refer to as Neural Cross-Lingual Summarization (NCLS). Moreover, we propose to further improve NCLS by incorporating two related tasks, monolingual summarization and machine translation, into the training process of CLS under multi-task learning. Due to the lack of supervised CLS data, we propose a round-trip translation strategy to acquire two high-quality large-scale CLS datasets based on existing monolingual summarization datasets. Experimental results show that our NCLS achieves remarkable improvement over traditional pipeline methods on both English-to-Chinese and Chinese-to-English CLS human-corrected test sets. In addition, NCLS with multi-task learning can further significantly improve the quality of generated summaries. We make our dataset and code publicly available here: http://www.nlpr.ia.ac.cn/cip/dataset.htm.
Tasks Machine Translation, Multi-Task Learning
Published 2019-08-31
URL https://arxiv.org/abs/1909.00156v1
PDF https://arxiv.org/pdf/1909.00156v1.pdf
PWC https://paperswithcode.com/paper/ncls-neural-cross-lingual-summarization
Repo
Framework
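
The round-trip translation strategy can be sketched as translate, translate back, and filter. `translate` and `rouge_l` below are hypothetical helper functions (not part of the released code), and the threshold is illustrative.

```python
def build_cls_pairs(docs, summaries, src="en", tgt="zh", threshold=0.45):
    """Turn monolingual (doc, summary) pairs into cross-lingual training pairs."""
    pairs = []
    for doc, summ in zip(docs, summaries):
        tgt_summ = translate(summ, src=src, tgt=tgt)   # forward translation
        back = translate(tgt_summ, src=tgt, tgt=src)   # round trip back to the source language
        if rouge_l(summ, back) >= threshold:           # keep only faithful translations
            pairs.append((doc, tgt_summ))              # source doc -> target-language summary
    return pairs
```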

Revisit Lmser and its further development based on convolutional layers

Title Revisit Lmser and its further development based on convolutional layers
Authors Wenjing Huang, Shikui Tu, Lei Xu
Abstract Proposed in 1991, the Least Mean Square Error Reconstruction self-organizing network, Lmser for short, was a further development of the traditional auto-encoder (AE) that folds the architecture with respect to the central coding layer, leading to symmetric weights and neurons as well as jointly supervised and unsupervised learning. However, its advantages were only demonstrated in a one-hidden-layer implementation due to the lack of computing resources and big data at that time. In this paper, we revisit Lmser from the perspective of deep learning, develop an Lmser network based on multiple convolutional layers, which is more suitable for image-related tasks, and confirm several Lmser functions with preliminary demonstrations on image recognition, reconstruction, association recall, and so on. Experiments demonstrate that Lmser indeed works as indicated in the original paper, and it shows promising performance in various applications.
Tasks
Published 2019-04-12
URL http://arxiv.org/abs/1904.06307v1
PDF http://arxiv.org/pdf/1904.06307v1.pdf
PWC https://paperswithcode.com/paper/revisit-lmser-and-its-further-development
Repo
Framework
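
Lmser's defining feature, symmetric (tied) weights between encoder and decoder, can be sketched in PyTorch by reusing the encoder kernels in transposed convolutions. The depth and channel counts here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLmser(nn.Module):
    def __init__(self):
        super().__init__()
        # Two sets of kernels shared by the encoder and the mirrored decoder.
        self.w1 = nn.Parameter(torch.randn(16, 1, 3, 3) * 0.1)
        self.w2 = nn.Parameter(torch.randn(32, 16, 3, 3) * 0.1)

    def forward(self, x):
        h1 = F.relu(F.conv2d(x, self.w1, padding=1))    # encoder layer 1
        h2 = F.relu(F.conv2d(h1, self.w2, padding=1))   # encoder layer 2 (code)
        # Decoder mirrors the encoder with the *same* weights (weight symmetry).
        d1 = F.relu(F.conv_transpose2d(h2, self.w2, padding=1))
        recon = torch.sigmoid(F.conv_transpose2d(d1, self.w1, padding=1))
        return recon
```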

Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models

Title Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Authors Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
Abstract In natural language processing, it has been observed recently that generalization could be greatly improved by finetuning a large-scale language model pretrained on a large unlabeled corpus. Despite its recent success and wide adoption, finetuning a large pretrained language model on a downstream task is prone to degenerate performance when there are only a small number of training instances available. In this paper, we introduce a new regularization technique, which we refer to as “mixout”, motivated by dropout. Mixout stochastically mixes the parameters of two models. We show that our mixout technique regularizes learning to minimize the deviation from one of the two models and that the strength of regularization adapts along the optimization trajectory. We empirically evaluate the proposed mixout and its variants on finetuning a pretrained language model on downstream tasks. More specifically, we demonstrate that the stability of finetuning and the average accuracy greatly increase when we use the proposed approach to regularize finetuning of BERT on downstream tasks in GLUE.
Tasks Language Modelling
Published 2019-09-25
URL https://arxiv.org/abs/1909.11299v2
PDF https://arxiv.org/pdf/1909.11299v2.pdf
PWC https://paperswithcode.com/paper/mixout-effective-regularization-to-finetune
Repo
Framework
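
The core mixout operation fits in a few lines: elementwise, with probability p, a finetuned parameter is swapped back to its pretrained value, mixing toward the pretrained model the way dropout mixes toward zero. The rescaling analysis from the paper is omitted in this sketch.

```python
import torch

def mixout(param, pretrained_param, p=0.1):
    """Randomly replace a fraction p of parameter entries with pretrained values."""
    mask = torch.rand_like(param) < p            # elementwise Bernoulli(p)
    return torch.where(mask, pretrained_param, param)
```

Applied to each weight tensor during training, this stochastically pulls the finetuned model toward the pretrained one, which is the deviation-minimizing regularization effect described in the abstract.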

$t$-$k$-means: A $k$-means Variant with Robustness and Stability

Title $t$-$k$-means: A $k$-means Variant with Robustness and Stability
Authors Yang Zhang, Qingtao Tang, Yiming Li, Weipeng Huang, Shutao Xia
Abstract Lloyd’s $k$-means algorithm is one of the most classical clustering methods, widely used in data mining or as a data pre-processing procedure. However, due to the thin-tailed property of the Gaussian distribution, $k$-means suffers from relatively poor performance on heavy-tailed data or outliers. In addition, $k$-means has relatively weak stability, i.e., its results have large variance, which reduces the credibility of the model. In this paper, we propose a robust and stable $k$-means variant, the $t$-$k$-means, as well as its fast version for solving the flat clustering problem. Theoretically, we detail the derivations of $t$-$k$-means and analyze its robustness and stability in terms of the loss function, the influence function, and the expression of the clustering centers. A large number of experiments empirically demonstrate that our method is sound while preserving running efficiency.
Tasks
Published 2019-07-17
URL https://arxiv.org/abs/1907.07442v2
PDF https://arxiv.org/pdf/1907.07442v2.pdf
PWC https://paperswithcode.com/paper/t-k-means-a-k-means-variant-with-robustness
Repo
Framework
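
A hedged sketch of the heavy-tailed intuition: a Lloyd-style loop whose center updates use Student-t style weights that shrink the influence of distant points. This mirrors the robustness mechanism only; the exact $t$-$k$-means updates are derived in the paper.

```python
import numpy as np

def robust_kmeans(X, k, nu=1.0, iters=50, seed=0):
    """Lloyd-style clustering with heavy-tailed (Student-t style) center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    d = X.shape[1]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None]) ** 2).sum(-1)  # (n, k) squared distances
        assign = dists.argmin(1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts) == 0:
                continue
            # Heavy-tailed weights: far-away points (likely outliers) get downweighted.
            w = (nu + d) / (nu + ((pts - centers[j]) ** 2).sum(1))
            centers[j] = (w[:, None] * pts).sum(0) / w.sum()     # weighted mean update
    return centers, assign
```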

Improving Outbreak Detection with Stacking of Statistical Surveillance Methods

Title Improving Outbreak Detection with Stacking of Statistical Surveillance Methods
Authors Moritz Kulessa, Eneldo Loza Mencía, Johannes Fürnkranz
Abstract Epidemiologists use a variety of statistical algorithms for the early detection of outbreaks. The practical usefulness of such methods highly depends on the trade-off between the detection rate of outbreaks and the chances of raising a false alarm. Recent research has shown that the use of machine learning for the fusion of multiple statistical algorithms improves outbreak detection. Instead of relying only on the binary output (alarm or no alarm) of the statistical algorithms, we propose to make use of their p-values for training a fusion classifier. In addition, we also show that adding additional features and adapting the labeling of an epidemic period may further improve performance. For comparison and evaluation, a new measure is introduced which captures the performance of an outbreak detection method with respect to a low rate of false alarms more precisely than previous measures. Our results on synthetic data show that it is challenging to improve the performance with a trainable fusion method based on machine learning. In particular, the use of a fusion classifier that is only based on binary outputs of the statistical surveillance methods can make the overall performance worse than directly using the underlying algorithms. However, the use of p-values and additional information for learning is promising, enabling the identification of more valuable patterns for detecting outbreaks.
Tasks
Published 2019-07-17
URL https://arxiv.org/abs/1907.07464v1
PDF https://arxiv.org/pdf/1907.07464v1.pdf
PWC https://paperswithcode.com/paper/improving-outbreak-detection-with-stacking-of
Repo
Framework
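
A minimal sketch of the stacking idea, assuming scikit-learn: the per-method p-values (optionally concatenated with extra features such as weekday or case counts) become the input of a trainable fusion classifier, instead of the methods' binary alarms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fusion(p_values, extra_features, outbreak_labels):
    """Supervised fusion of surveillance detectors from their p-values.

    p_values: (n_timepoints, n_methods) matrix of per-method p-values.
    extra_features: (n_timepoints, n_extra) additional covariates.
    """
    X = np.hstack([p_values, extra_features])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, outbreak_labels)          # learn which p-value patterns precede outbreaks
    return clf
```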

Language Independent Sentiment Analysis

Title Language Independent Sentiment Analysis
Authors Muhammad Haroon Shakeel, Turki Alghamidi, Safi Faizullah, Imdadullah Khan
Abstract Social media platforms and online forums generate a rapidly increasing amount of textual data. Businesses, government agencies, and media organizations seek to perform sentiment analysis on this rich text data. The results of these analytics are used for adapting marketing strategies, customizing products, security, and various other decisions. Sentiment analysis has been extensively studied, and various methods have been developed for it with great success. These methods, however, apply to texts written in a specific language, which limits their applicability to a particular demographic and geographic region. In this paper we propose a general approach for sentiment analysis on data containing texts from multiple languages. This enables all such applications to utilize the results of sentiment analysis in a language-oblivious, or language-independent, fashion.
Tasks Sentiment Analysis
Published 2019-12-27
URL https://arxiv.org/abs/1912.11973v2
PDF https://arxiv.org/pdf/1912.11973v2.pdf
PWC https://paperswithcode.com/paper/language-independent-sentiment-analysis
Repo
Framework

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

Title Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
Authors Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye
Abstract In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors. Given a stochastic game with discount factor $\gamma\in(0,1)$, we provide an algorithm that computes an $\epsilon$-optimal strategy with high probability given $\tilde{O}((1 - \gamma)^{-3} \epsilon^{-2})$ samples from the transition function for each state-action pair. Our algorithm runs in time nearly linear in the number of samples and uses space nearly linear in the number of state-action pairs. As stochastic games generalize Markov decision processes (MDPs), our runtime and sample complexities are optimal due to Azar et al. (2013). We achieve our results by showing how to generalize near-optimal Q-learning-based algorithms for MDPs, in particular that of Sidford et al. (2018), to two-player strategy computation algorithms. This overcomes the limitations of standard Q-learning and of strategy-iteration or alternating-minimization based approaches, and we hope it will pave the way for future reinforcement learning results by facilitating the extension of MDP results to multi-agent settings with little loss.
Tasks Q-Learning
Published 2019-08-29
URL https://arxiv.org/abs/1908.11071v1
PDF https://arxiv.org/pdf/1908.11071v1.pdf
PWC https://paperswithcode.com/paper/solving-discounted-stochastic-two-player
Repo
Framework
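
For intuition, the alternating max/min Bellman backup underlying turn-based zero-sum games can be sketched with dense value iteration. The paper's contribution is a sample-based, variance-reduced Q-learning analogue of this backup; the model-based version below only illustrates the two-player structure.

```python
import numpy as np

def game_value_iteration(P, R, is_max_state, gamma=0.9, iters=500):
    """Value iteration for a turn-based zero-sum stochastic game.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    is_max_state: boolean (S,) mask of states controlled by the max player.
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * P @ v                            # (S, A) state-action values
        v = np.where(is_max_state, q.max(1), q.min(1))   # each player optimizes its own states
    return v
```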

Improving Textual Network Embedding with Global Attention via Optimal Transport

Title Improving Textual Network Embedding with Global Attention via Optimal Transport
Authors Liqun Chen, Guoyin Wang, Chenyang Tao, Dinghan Shen, Pengyu Cheng, Xinyuan Zhang, Wenlin Wang, Yizhe Zhang, Lawrence Carin
Abstract Constructing highly informative network embeddings is an important tool for network analysis. It encodes network topology, along with other useful side information, into low-dimensional node-based feature representations that can be exploited by statistical modeling. This work focuses on learning context-aware network embeddings augmented with text data. We reformulate the network-embedding problem, and present two novel strategies to improve over traditional attention mechanisms: ($i$) a content-aware sparse attention module based on optimal transport, and ($ii$) a high-level attention parsing module. Our approach yields naturally sparse and self-normalized relational inference. It can capture long-term interactions between sequences, thus addressing the challenges faced by existing textual network embedding schemes. Extensive experiments demonstrate that our model consistently outperforms alternative state-of-the-art methods.
Tasks Network Embedding
Published 2019-06-05
URL https://arxiv.org/abs/1906.01840v1
PDF https://arxiv.org/pdf/1906.01840v1.pdf
PWC https://paperswithcode.com/paper/improving-textual-network-embedding-with
Repo
Framework
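
The optimal-transport attention idea can be sketched with entropic OT: run Sinkhorn iterations between two sets of token embeddings and use the transport plan (naturally sparse and self-normalized) as attention weights. The cost function here is a simple negative dot product, an assumption for illustration; the paper's attention parsing module is not reproduced.

```python
import torch

def sinkhorn_attention(x, y, eps=0.1, iters=50):
    """OT-based attention: x (n, d) attends over y (m, d) via a transport plan."""
    cost = -x @ y.T                      # cheaper to transport between similar tokens
    K = torch.exp(-cost / eps)           # Gibbs kernel for entropic regularization
    u = torch.ones(x.size(0)) / x.size(0)  # uniform source marginal
    v = torch.ones(y.size(0)) / y.size(0)  # uniform target marginal
    a, b = u.clone(), v.clone()
    for _ in range(iters):
        a = u / (K @ b)                  # alternating marginal scaling (Sinkhorn)
        b = v / (K.T @ a)
    plan = a[:, None] * K * b[None, :]   # transport plan with prescribed marginals
    return plan @ y                      # OT-weighted context vectors
```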

Boosting LSTM Performance Through Dynamic Precision Selection

Title Boosting LSTM Performance Through Dynamic Precision Selection
Authors Franyell Silfa, Jose-Maria Arnau, Antonio Gonzàlez
Abstract The use of low numerical precision is a fundamental optimization included in modern accelerators for Deep Neural Networks (DNNs). The number of bits of the numerical representation is set to the minimum precision that is able to retain accuracy based on offline profiling, and it is kept constant for DNN inference. In this work, we explore the use of dynamic precision selection during DNN inference. We focus on Long Short-Term Memory (LSTM) networks, which represent the state-of-the-art networks for applications such as machine translation and speech recognition. Unlike conventional DNNs, LSTM networks remember information from previous evaluations by storing data in the LSTM cell state. Our key observation is that the cell state determines the amount of precision required: time steps where the cell state changes significantly require higher precision, whereas time steps where the cell state is stable can be computed with lower precision without any loss in accuracy. Based on this observation, we implement a novel hardware scheme that tracks the evolution of the elements in the LSTM cell state and dynamically selects the appropriate precision in each time step. For a set of popular LSTM networks, our scheme selects the lowest precision for more than 66% of the time, outperforming systems that fix the precision statically. We evaluate our proposal on top of a modern accelerator highly optimized for LSTM computation, and show that it provides 1.56x speedup and 23% energy savings on average without any loss in accuracy. The extra hardware to determine the appropriate precision represents a small area overhead of 8.8%.
Tasks Machine Translation, Speech Recognition
Published 2019-11-07
URL https://arxiv.org/abs/1911.04244v1
PDF https://arxiv.org/pdf/1911.04244v1.pdf
PWC https://paperswithcode.com/paper/boosting-lstm-performance-through-dynamic
Repo
Framework
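
A toy software model of the scheme: choose each time step's precision from the magnitude of the cell-state change. The thresholds, bit widths, and fixed-point quantizer below are stand-ins for the accelerator's actual logic.

```python
import numpy as np

def select_precision(c_prev, c_new, thresholds=(0.01, 0.1)):
    """Pick a bit width for the next step from how much the cell state moved."""
    delta = np.abs(c_new - c_prev).max()   # largest cell-state change this step
    if delta < thresholds[0]:
        return 4                           # stable state: low precision suffices
    elif delta < thresholds[1]:
        return 8
    return 16                              # fast-moving state: full precision

def quantize(x, bits):
    """Symmetric fixed-point model of low-precision arithmetic."""
    scale = 2 ** (bits - 1) - 1
    return np.round(np.clip(x, -1, 1) * scale) / scale
```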

STMARL: A Spatio-Temporal Multi-Agent Reinforcement Learning Approach for Cooperative Traffic Light Control

Title STMARL: A Spatio-Temporal Multi-Agent Reinforcement Learning Approach for Cooperative Traffic Light Control
Authors Yanan Wang, Tong Xu, Xin Niu, Chang Tan, Enhong Chen, Hui Xiong
Abstract The development of intelligent traffic light control systems is essential for smart transportation management. While some efforts have been made to optimize the use of individual traffic lights in an isolated way, related studies have largely ignored the fact that the use of multi-intersection traffic lights is spatially influenced, and that there is a temporal dependency of historical traffic status for current traffic light control. To that end, in this paper, we propose a novel SpatioTemporal Multi-Agent Reinforcement Learning (STMARL) framework for effectively capturing the spatio-temporal dependency of multiple related traffic lights and controlling these traffic lights in a coordinated way. Specifically, we first construct the traffic light adjacency graph based on the spatial structure among traffic lights. Then, historical traffic records are integrated with the current traffic status via a Recurrent Neural Network structure. Moreover, based on the temporally-dependent traffic information, we design a Graph Neural Network based model to represent relationships among multiple traffic lights, and the decision for each traffic light is made in a distributed way by the deep Q-learning method. Finally, experimental results on both synthetic and real-world data demonstrate the effectiveness of our STMARL framework, which also provides an insightful understanding of the influence mechanism among multi-intersection traffic lights.
Tasks Multi-agent Reinforcement Learning, Q-Learning
Published 2019-08-28
URL https://arxiv.org/abs/1908.10577v2
PDF https://arxiv.org/pdf/1908.10577v2.pdf
PWC https://paperswithcode.com/paper/stmarl-a-spatio-temporal-multi-agent
Repo
Framework
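
A compressed PyTorch sketch of the pipeline described above: a GRU over each intersection's traffic history, one round of message passing over the traffic-light adjacency graph, then per-light Q-values for distributed control. The sizes and the single-layer GNN are simplifying assumptions.

```python
import torch
import torch.nn as nn

class TrafficLightNet(nn.Module):
    def __init__(self, obs_dim, hid=64, n_actions=4):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hid, batch_first=True)  # temporal dependency
        self.msg = nn.Linear(hid, hid)                     # spatial message function
        self.q_head = nn.Linear(2 * hid, n_actions)

    def forward(self, history, adj):
        # history: (n_lights, T, obs_dim); adj: (n_lights, n_lights) normalized adjacency.
        _, h = self.rnn(history)
        h = h[-1]                                    # (n_lights, hid) temporal summary
        neighbors = torch.relu(adj @ self.msg(h))    # one step of graph message passing
        q = self.q_head(torch.cat([h, neighbors], dim=-1))
        return q                                     # per-light Q-values for DQN control
```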

Fast shared response model for fMRI data

Title Fast shared response model for fMRI data
Authors Hugo Richard, Lucas Martin, Ana Luísa Pinho, Jonathan Pillow, Bertrand Thirion
Abstract The shared response model provides a simple but effective framework to analyse fMRI data of subjects exposed to naturalistic stimuli. However, when the number of subjects or runs is large, fitting the model requires a large amount of memory and computational power, which limits its use in practice. In this work, we introduce the FastSRM algorithm, which relies on an intermediate atlas-based representation. It provides considerable speed-up in time and memory usage, and hence allows easy and fast large-scale analysis of naturalistic-stimulus fMRI data. Using four different datasets, we show that our method matches the performance of the original SRM algorithm while being about 5x faster and 20x to 40x more memory efficient. Based on this contribution, we use FastSRM to predict age from movie-watching data on the CamCAN sample. Besides delivering accurate predictions (mean absolute error of 7.5 years), FastSRM extracts topographic patterns that are predictive of age, demonstrating that brain activity during free perception reflects age.
Tasks
Published 2019-09-27
URL https://arxiv.org/abs/1909.12537v2
PDF https://arxiv.org/pdf/1909.12537v2.pdf
PWC https://paperswithcode.com/paper/fast-shared-response-model-for-fmri-data
Repo
Framework
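
For reference, the deterministic SRM that FastSRM accelerates can be fit by alternating minimization: update the shared response, then solve an orthogonal Procrustes problem per subject. The atlas-based speedup itself is not shown in this sketch.

```python
import numpy as np

def fit_srm(X_list, k=10, iters=20, seed=0):
    """Deterministic SRM: X_i (voxels x time) ~ W_i @ S with orthonormal W_i."""
    rng = np.random.default_rng(seed)
    W = [np.linalg.qr(rng.standard_normal((X.shape[0], k)))[0] for X in X_list]
    for _ in range(iters):
        # Shared response: average of back-projected subject data.
        S = np.mean([w.T @ x for w, x in zip(W, X_list)], axis=0)
        for i, x in enumerate(X_list):
            # Orthogonal Procrustes step: best orthonormal map from S to X_i.
            u, _, vt = np.linalg.svd(x @ S.T, full_matrices=False)
            W[i] = u @ vt
    return W, S
```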

Language-guided Semantic Mapping and Mobile Manipulation in Partially Observable Environments

Title Language-guided Semantic Mapping and Mobile Manipulation in Partially Observable Environments
Authors Siddharth Patki, Ethan Fahnestock, Thomas M. Howard, Matthew R. Walter
Abstract Recent advances in data-driven models for grounded language understanding have enabled robots to interpret increasingly complex instructions. Two fundamental limitations of these methods are that most require a full model of the environment to be known a priori, and they attempt to reason over a world representation that is flat and unnecessarily detailed, which limits scalability. Recent semantic mapping methods address partial observability by exploiting language as a sensor to infer a distribution over topological, metric and semantic properties of the environment. However, maintaining a distribution over highly detailed maps that can support grounding of diverse instructions is computationally expensive and hinders real-time human-robot collaboration. We propose a novel framework that learns to adapt perception according to the task in order to maintain compact distributions over semantic maps. Experiments with a mobile manipulator demonstrate more efficient instruction following in a priori unknown environments.
Tasks
Published 2019-10-22
URL https://arxiv.org/abs/1910.10034v1
PDF https://arxiv.org/pdf/1910.10034v1.pdf
PWC https://paperswithcode.com/paper/language-guided-semantic-mapping-and-mobile
Repo
Framework
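
As a toy illustration of task-adaptive perception, one can restrict the detector vocabulary to the classes an instruction actually mentions, keeping the map distribution compact. The learned language-conditioned perception model in the paper is replaced here by simple keyword matching, and the class list is hypothetical.

```python
# Hypothetical detector vocabulary for a warehouse-like environment.
DETECTOR_CLASSES = {"box", "pallet", "door", "table", "crate", "person"}

def relevant_classes(instruction):
    """Whitelist only the object classes the instruction refers to."""
    words = set(instruction.lower().split())
    return DETECTOR_CLASSES & words

# e.g. relevant_classes("pick up the crate near the door") -> {"crate", "door"}
```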