Paper Group ANR 233
Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events. On the Power of Truncated SVD for General High-rank Matrix Estimation Problems. Learning Semantic Correspondences in Technical Documentation. Analysis of Thompson Sampling for Gaussian Process Optimization in the Bandit Setting. Crim …
Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events
Title | Balancing Novelty and Salience: Adaptive Learning to Rank Entities for Timeline Summarization of High-impact Events |
Authors | Tuan Tran, Claudia Niederée, Nattiya Kanhabua, Ujwal Gadiraju, Avishek Anand |
Abstract | Long-running, high-impact events such as the Boston Marathon bombing often develop through many stages and involve a large number of entities in their unfolding. Timeline summarization of an event by key sentences eases story digestion, but does not distinguish between what a user remembers and what she might want to re-check. In this work, we present a novel approach for timeline summarization of high-impact events, which uses entities instead of sentences for summarizing the event at each individual point in time. Such entity summaries can serve as both (1) important memory cues in a retrospective event consideration and (2) pointers for personalized event exploration. In order to automatically create such summaries, it is crucial to identify the “right” entities for inclusion. We propose to learn a ranking function for entities, with a dynamically adapted trade-off between the in-document salience of entities and the informativeness of entities across documents, i.e., the level of new information associated with an entity for a time point under consideration. Furthermore, for capturing collective attention for an entity we use an innovative soft labeling approach based on Wikipedia. Our experiments on a real large news datasets confirm the effectiveness of the proposed methods. |
Tasks | Learning-To-Rank, Timeline Summarization |
Published | 2017-01-14 |
URL | http://arxiv.org/abs/1701.03947v1 |
http://arxiv.org/pdf/1701.03947v1.pdf | |
PWC | https://paperswithcode.com/paper/balancing-novelty-and-salience-adaptive |
Repo | |
Framework | |
On the Power of Truncated SVD for General High-rank Matrix Estimation Problems
Title | On the Power of Truncated SVD for General High-rank Matrix Estimation Problems |
Authors | Simon S. Du, Yining Wang, Aarti Singh |
Abstract | We show that given an estimate $\widehat{A}$ that is close to a general high-rank positive semi-definite (PSD) matrix $A$ in spectral norm (i.e., $\widehat{A}-A_2 \leq \delta$), the simple truncated SVD of $\widehat{A}$ produces a multiplicative approximation of $A$ in Frobenius norm. This observation leads to many interesting results on general high-rank matrix estimation problems, which we briefly summarize below ($A$ is an $n\times n$ high-rank PSD matrix and $A_k$ is the best rank-$k$ approximation of $A$): (1) High-rank matrix completion: By observing $\Omega(\frac{n\max{\epsilon^{-4},k^2}\mu_0^2\A_F^2\log n}{\sigma_{k+1}(A)^2})$ elements of $A$ where $\sigma_{k+1}\left(A\right)$ is the $\left(k+1\right)$-th singular value of $A$ and $\mu_0$ is the incoherence, the truncated SVD on a zero-filled matrix satisfies $\widehat{A}_k-A_F \leq (1+O(\epsilon))\A-A_k_F$ with high probability. (2)High-rank matrix de-noising: Let $\widehat{A}=A+E$ where $E$ is a Gaussian random noise matrix with zero mean and $\nu^2/n$ variance on each entry. Then the truncated SVD of $\widehat{A}$ satisfies $\widehat{A}_k-A_F \leq (1+O(\sqrt{\nu/\sigma_{k+1}(A)}))\A-A_k_F + O(\sqrt{k}\nu)$. (3) Low-rank Estimation of high-dimensional covariance: Given $N$ i.i.d.~samples $X_1,\cdots,X_N\sim\mathcal N_n(0,A)$, can we estimate $A$ with a relative-error Frobenius norm bound? We show that if $N = \Omega\left(n\max{\epsilon^{-4},k^2}\gamma_k(A)^2\log N\right)$ for $\gamma_k(A)=\sigma_1(A)/\sigma_{k+1}(A)$, then $\widehat{A}_k-A_F \leq (1+O(\epsilon))\A-A_k_F$ with high probability, where $\widehat{A}=\frac{1}{N}\sum_{i=1}^N{X_iX_i^\top}$ is the sample covariance. |
Tasks | Matrix Completion |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06861v2 |
http://arxiv.org/pdf/1702.06861v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-power-of-truncated-svd-for-general |
Repo | |
Framework | |
Learning Semantic Correspondences in Technical Documentation
Title | Learning Semantic Correspondences in Technical Documentation |
Authors | Kyle Richardson, Jonas Kuhn |
Abstract | We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representation of functions or code templates. Our approach exploits the parallel nature of such documentation, or the tight coupling between high-level text and the low-level representations we aim to learn. Data is collected by mining technical documents for such parallel text-representation pairs, which we use to train a simple semantic parsing model. We report new baseline results on sixteen novel datasets, including the standard library documentation for nine popular programming languages across seven natural languages, and a small collection of Unix utility manuals. |
Tasks | Semantic Parsing |
Published | 2017-05-13 |
URL | http://arxiv.org/abs/1705.04815v1 |
http://arxiv.org/pdf/1705.04815v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-semantic-correspondences-in |
Repo | |
Framework | |
Analysis of Thompson Sampling for Gaussian Process Optimization in the Bandit Setting
Title | Analysis of Thompson Sampling for Gaussian Process Optimization in the Bandit Setting |
Authors | Kinjal Basu, Souvik Ghosh |
Abstract | We consider the global optimization of a function over a continuous domain. At every evaluation attempt, we can observe the function at a chosen point in the domain and we reap the reward of the value observed. We assume that drawing these observations are expensive and noisy. We frame it as a continuum-armed bandit problem with a Gaussian Process prior on the function. In this regime, most algorithms have been developed to minimize some form of regret. Contrary to this popular norm, in this paper, we study the convergence of the sequential point $\boldsymbol{x}^t$ to the global optimizer $\boldsymbol{x}^*$ for the Thompson Sampling approach. Under some assumptions and regularity conditions, we show an exponential rate of convergence to the true optimal. |
Tasks | |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06808v3 |
http://arxiv.org/pdf/1705.06808v3.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-thompson-sampling-for-gaussian |
Repo | |
Framework | |
Crime incidents embedding using restricted Boltzmann machines
Title | Crime incidents embedding using restricted Boltzmann machines |
Authors | Shixiang Zhu, Yao Xie |
Abstract | We present a new approach for detecting related crime series, by unsupervised learning of the latent feature embeddings from narratives of crime record via the Gaussian-Bernoulli Restricted Boltzmann Machines (RBM). This is a drastically different approach from prior work on crime analysis, which typically considers only time and location and at most category information. After the embedding, related cases are closer to each other in the Euclidean feature space, and the unrelated cases are far apart, which is a good property can enable subsequent analysis such as detection and clustering of related cases. Experiments over several series of related crime incidents hand labeled by the Atlanta Police Department reveal the promise of our embedding methods. |
Tasks | |
Published | 2017-10-28 |
URL | http://arxiv.org/abs/1710.10513v3 |
http://arxiv.org/pdf/1710.10513v3.pdf | |
PWC | https://paperswithcode.com/paper/crime-incidents-embedding-using-restricted |
Repo | |
Framework | |
Sparse-TDA: Sparse Realization of Topological Data Analysis for Multi-Way Classification
Title | Sparse-TDA: Sparse Realization of Topological Data Analysis for Multi-Way Classification |
Authors | Wei Guo, Krithika Manohar, Steven L. Brunton, Ashis G. Banerjee |
Abstract | Topological data analysis (TDA) has emerged as one of the most promising techniques to reconstruct the unknown shapes of high-dimensional spaces from observed data samples. TDA, thus, yields key shape descriptors in the form of persistent topological features that can be used for any supervised or unsupervised learning task, including multi-way classification. Sparse sampling, on the other hand, provides a highly efficient technique to reconstruct signals in the spatial-temporal domain from just a few carefully-chosen samples. Here, we present a new method, referred to as the Sparse-TDA algorithm, that combines favorable aspects of the two techniques. This combination is realized by selecting an optimal set of sparse pixel samples from the persistent features generated by a vector-based TDA algorithm. These sparse samples are selected from a low-rank matrix representation of persistent features using QR pivoting. We show that the Sparse-TDA method demonstrates promising performance on three benchmark problems related to human posture recognition and image texture classification. |
Tasks | Texture Classification, Topological Data Analysis |
Published | 2017-01-12 |
URL | http://arxiv.org/abs/1701.03212v4 |
http://arxiv.org/pdf/1701.03212v4.pdf | |
PWC | https://paperswithcode.com/paper/sparse-tda-sparse-realization-of-topological |
Repo | |
Framework | |
A Framework for Knowledge Management and Automated Reasoning Applied on Intelligent Transport Systems
Title | A Framework for Knowledge Management and Automated Reasoning Applied on Intelligent Transport Systems |
Authors | Aneta Vulgarakis Feljan, Athanasios Karapantelakis, Leonid Mokrushin, Hongxin Liang, Rafia Inam, Elena Fersman, Carlos R. B. Azevedo, Klaus Raizer, Ricardo S. Souza |
Abstract | Cyber-Physical Systems in general, and Intelligent Transport Systems (ITS) in particular use heterogeneous data sources combined with problem solving expertise in order to make critical decisions that may lead to some form of actions e.g., driver notifications, change of traffic light signals and braking to prevent an accident. Currently, a major part of the decision process is done by human domain experts, which is time-consuming, tedious and error-prone. Additionally, due to the intrinsic nature of knowledge possession this decision process cannot be easily replicated or reused. Therefore, there is a need for automating the reasoning processes by providing computational systems a formal representation of the domain knowledge and a set of methods to process that knowledge. In this paper, we propose a knowledge model that can be used to express both declarative knowledge about the systems’ components, their relations and their current state, as well as procedural knowledge representing possible system behavior. In addition, we introduce a framework for knowledge management and automated reasoning (KMARF). The idea behind KMARF is to automatically select an appropriate problem solver based on formalized reasoning expertise in the knowledge base, and convert a problem definition to the corresponding format. This approach automates reasoning, thus reducing operational costs, and enables reusability of knowledge and methods across different domains. We illustrate the approach on a transportation planning use case. |
Tasks | |
Published | 2017-01-11 |
URL | http://arxiv.org/abs/1701.03000v1 |
http://arxiv.org/pdf/1701.03000v1.pdf | |
PWC | https://paperswithcode.com/paper/a-framework-for-knowledge-management-and |
Repo | |
Framework | |
Evolution of Convolutional Highway Networks
Title | Evolution of Convolutional Highway Networks |
Authors | Oliver Kramer |
Abstract | Convolutional highways are deep networks based on multiple stacked convolutional layers for feature preprocessing. We introduce an evolutionary algorithm (EA) for optimization of the structure and hyperparameters of convolutional highways and demonstrate the potential of this optimization setting on the well-known MNIST data set. The (1+1)-EA employs Rechenberg’s mutation rate control and a niching mechanism to overcome local optima adapts the optimization approach. An experimental study shows that the EA is capable of improving the state-of-the-art network contribution and of evolving highway networks from scratch. |
Tasks | |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03247v1 |
http://arxiv.org/pdf/1709.03247v1.pdf | |
PWC | https://paperswithcode.com/paper/evolution-of-convolutional-highway-networks |
Repo | |
Framework | |
Detecting changes in slope with an $L_0$ penalty
Title | Detecting changes in slope with an $L_0$ penalty |
Authors | Robert Maidstone, Paul Fearnhead, Adam Letchford |
Abstract | Whilst there are many approaches to detecting changes in mean for a univariate time-series, the problem of detecting multiple changes in slope has comparatively been ignored. Part of the reason for this is that detecting changes in slope is much more challenging. For example, simple binary segmentation procedures do not work for this problem, whilst efficient dynamic programming methods that work well for the change in mean problem cannot be directly used for detecting changes in slope. We present a novel dynamic programming approach, CPOP, for finding the “best” continuous piecewise-linear fit to data. We define best based on a criterion that measures fit to data using the residual sum of squares, but penalises complexity based on an $L_0$ penalty on changes in slope. We show that using such a criterion is more reliable at estimating changepoint locations than approaches that penalise complexity using an $L_1$ penalty. Empirically CPOP has good computational properties, and can analyse a time-series with over 10,000 observations and over 100 changes in a few minutes. Our method is used to analyse data on the motion of bacteria, and provides fits to the data that both have substantially smaller residual sum of squares and are more parsimonious than two competing approaches. |
Tasks | Time Series |
Published | 2017-01-06 |
URL | http://arxiv.org/abs/1701.01672v2 |
http://arxiv.org/pdf/1701.01672v2.pdf | |
PWC | https://paperswithcode.com/paper/detecting-changes-in-slope-with-an-l_0 |
Repo | |
Framework | |
Stabiliser states are efficiently PAC-learnable
Title | Stabiliser states are efficiently PAC-learnable |
Authors | Andrea Rocchetto |
Abstract | The exponential scaling of the wave function is a fundamental property of quantum systems with far reaching implications in our ability to process quantum information. A problem where these are particularly relevant is quantum state tomography. State tomography, whose objective is to obtain a full description of a quantum system, can be analysed in the framework of computational learning theory. In this model, quantum states have been shown to be Probably Approximately Correct (PAC)-learnable with sample complexity linear in the number of qubits. However, it is conjectured that in general quantum states require an exponential amount of computation to be learned. Here, using results from the literature on the efficient classical simulation of quantum systems, we show that stabiliser states are efficiently PAC-learnable. Our results solve an open problem formulated by Aaronson [Proc. R. Soc. A, 2088, (2007)] and propose learning theory as a tool for exploring the power of quantum computation. |
Tasks | Quantum State Tomography |
Published | 2017-04-30 |
URL | http://arxiv.org/abs/1705.00345v2 |
http://arxiv.org/pdf/1705.00345v2.pdf | |
PWC | https://paperswithcode.com/paper/stabiliser-states-are-efficiently-pac |
Repo | |
Framework | |
Robust Online Matrix Factorization for Dynamic Background Subtraction
Title | Robust Online Matrix Factorization for Dynamic Background Subtraction |
Authors | Hongwei Yong, Deyu Meng, Wangmeng Zuo, Lei Zhang |
Abstract | We propose an effective online background subtraction method, which can be robustly applied to practical videos that have variations in both foreground and background. Different from previous methods which often model the foreground as Gaussian or Laplacian distributions, we model the foreground for each frame with a specific mixture of Gaussians (MoG) distribution, which is updated online frame by frame. Particularly, our MoG model in each frame is regularized by the learned foreground/background knowledge in previous frames. This makes our online MoG model highly robust, stable and adaptive to practical foreground and background variations. The proposed model can be formulated as a concise probabilistic MAP model, which can be readily solved by EM algorithm. We further embed an affine transformation operator into the proposed model, which can be automatically adjusted to fit a wide range of video background transformations and make the method more robust to camera movements. With using the sub-sampling technique, the proposed method can be accelerated to execute more than 250 frames per second on average, meeting the requirement of real-time background subtraction for practical video processing tasks. The superiority of the proposed method is substantiated by extensive experiments implemented on synthetic and real videos, as compared with state-of-the-art online and offline background subtraction methods. |
Tasks | |
Published | 2017-05-28 |
URL | http://arxiv.org/abs/1705.10000v1 |
http://arxiv.org/pdf/1705.10000v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-online-matrix-factorization-for |
Repo | |
Framework | |
Proportionate gradient updates with PercentDelta
Title | Proportionate gradient updates with PercentDelta |
Authors | Sami Abu-El-Haija |
Abstract | Deep Neural Networks are generally trained using iterative gradient updates. Magnitudes of gradients are affected by many factors, including choice of activation functions and initialization. More importantly, gradient magnitudes can greatly differ across layers, with some layers receiving much smaller gradients than others. causing some layers to train slower than others and therefore slowing down the overall convergence. We analytically explain this disproportionality. Then we propose to explicitly train all layers at the same speed, by scaling the gradient w.r.t. every trainable tensor to be proportional to its current value. In particular, at every batch, we want to update all trainable tensors, such that the relative change of the L1-norm of the tensors is the same, across all layers of the network, throughout training time. Experiments on MNIST show that our method appropriately scales gradients, such that the relative change in trainable tensors is approximately equal across layers. In addition, measuring the test accuracy with training time, shows that our method trains faster than other methods, giving higher test accuracy given same budget of training steps. |
Tasks | |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1708.07227v1 |
http://arxiv.org/pdf/1708.07227v1.pdf | |
PWC | https://paperswithcode.com/paper/proportionate-gradient-updates-with |
Repo | |
Framework | |
RubyStar: A Non-Task-Oriented Mixture Model Dialog System
Title | RubyStar: A Non-Task-Oriented Mixture Model Dialog System |
Authors | Huiting Liu, Tao Lin, Hanfei Sun, Weijian Lin, Chih-Wei Chang, Teng Zhong, Alexander Rudnicky |
Abstract | RubyStar is a dialog system designed to create “human-like” conversation by combining different response generation strategies. RubyStar conducts a non-task-oriented conversation on general topics by using an ensemble of rule-based, retrieval-based and generative methods. Topic detection, engagement monitoring, and context tracking are used for managing interaction. Predictable elements of conversation, such as the bot’s backstory and simple question answering are handled by separate modules. We describe a rating scheme we developed for evaluating response generation. We find that character-level RNN is an effective generation model for general responses, with proper parameter settings; however other kinds of conversation topics might benefit from using other models. |
Tasks | Question Answering |
Published | 2017-11-08 |
URL | http://arxiv.org/abs/1711.02781v3 |
http://arxiv.org/pdf/1711.02781v3.pdf | |
PWC | https://paperswithcode.com/paper/rubystar-a-non-task-oriented-mixture-model |
Repo | |
Framework | |
Non-Asymptotic Analysis of Robust Control from Coarse-Grained Identification
Title | Non-Asymptotic Analysis of Robust Control from Coarse-Grained Identification |
Authors | Stephen Tu, Ross Boczar, Andrew Packard, Benjamin Recht |
Abstract | This work explores the trade-off between the number of samples required to accurately build models of dynamical systems and the degradation of performance in various control objectives due to a coarse approximation. In particular, we show that simple models can be easily fit from input/output data and are sufficient for achieving various control objectives. We derive bounds on the number of noisy input/output samples from a stable linear time-invariant system that are sufficient to guarantee that the corresponding finite impulse response approximation is close to the true system in the $\mathcal{H}_\infty$-norm. We demonstrate that these demands are lower than those derived in prior art which aimed to accurately identify dynamical models. We also explore how different physical input constraints, such as power constraints, affect the sample complexity. Finally, we show how our analysis fits within the established framework of robust control, by demonstrating how a controller designed for an approximate system provably meets performance objectives on the true system. |
Tasks | |
Published | 2017-07-15 |
URL | http://arxiv.org/abs/1707.04791v2 |
http://arxiv.org/pdf/1707.04791v2.pdf | |
PWC | https://paperswithcode.com/paper/non-asymptotic-analysis-of-robust-control |
Repo | |
Framework | |
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
Title | A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity |
Authors | Pablo Hernandez-Leal, Michael Kaisers, Tim Baarslag, Enrique Munoz de Cote |
Abstract | The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, which make a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principle approaches how algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research. |
Tasks | Multi-Armed Bandits |
Published | 2017-07-28 |
URL | http://arxiv.org/abs/1707.09183v2 |
http://arxiv.org/pdf/1707.09183v2.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-learning-in-multiagent |
Repo | |
Framework | |