Paper Group ANR 114
Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task
Title | Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task |
Authors | An Yang, Kai Liu, Jing Liu, Yajuan Lyu, Sujian Li |
Abstract | Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between candidate and reference answers. However, these metrics can be biased for specific question types, especially questions asking for yes-no opinions and entity lists. In this paper, we adapt the metrics to better correlate n-gram overlap with human judgment for answers to these two question types. Statistical analysis demonstrates the effectiveness of our approach. Our adaptations may provide useful guidance for the development of real-world MRC systems. |
Tasks | Machine Reading Comprehension, Question Answering, Reading Comprehension |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.03578v1 |
PDF | http://arxiv.org/pdf/1806.03578v1.pdf |
PWC | https://paperswithcode.com/paper/adaptations-of-rouge-and-bleu-to-better |
Repo | |
Framework | |
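For context, the lexical-overlap scoring that ROUGE and BLEU build on reduces to clipped n-gram matching. Below is a minimal, illustrative sketch of such a score; it is not the paper's adapted metric, and the example sentences are invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap_f1(candidate, reference, n=1):
    """Clipped n-gram overlap between candidate and reference, as an F1 score."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    matched = sum((cand & ref).values())  # clipped counts, as in BLEU
    if matched == 0:
        return 0.0
    precision = matched / sum(cand.values())
    recall = matched / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(overlap_f1("yes the battery lasts a day".split(),
                 "yes it lasts about one day".split()))  # 0.5
```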
Online learning with graph-structured feedback against adaptive adversaries
Title | Online learning with graph-structured feedback against adaptive adversaries |
Authors | Zhili Feng, Po-Ling Loh |
Abstract | We derive upper and lower bounds for the policy regret of $T$-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of $\widetilde O(T^{2/3})$ and $\widetilde O(T^{3/4})$ for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\widetilde\Omega(T^{2/3})$ is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of $\widetilde\Omega(T^{2/3})$, as well. |
Tasks | |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.00335v1 |
PDF | http://arxiv.org/pdf/1804.00335v1.pdf |
PWC | https://paperswithcode.com/paper/online-learning-with-graph-structured |
Repo | |
Framework | |
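A minimal sketch of an Exp3-style learner with graph-structured feedback, where playing an arm reveals the losses of its out-neighbors and loss estimates are importance-weighted by the probability of observing each arm. This follows the standard Exp3 template for feedback graphs against an oblivious loss sequence; the paper's variant, which handles bounded-memory adaptive adversaries, is more involved.

```python
import numpy as np

def exp3_graph(losses, out_neighbors, eta=0.05, seed=0):
    """Exp3-style learner with graph-structured feedback.

    losses: (T, K) array of losses in [0, 1]; out_neighbors[i] is the set
    of arms whose losses are revealed when arm i is played (including i
    itself for a strongly observable graph)."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    weights, total = np.ones(K), 0.0
    for t in range(T):
        p = weights / weights.sum()
        arm = rng.choice(K, p=p)
        total += losses[t, arm]
        for j in out_neighbors[arm]:
            # probability that arm j's loss is observed this round
            q_j = sum(p[i] for i in range(K) if j in out_neighbors[i])
            weights[j] *= np.exp(-eta * losses[t, j] / q_j)  # importance-weighted update
    return total / T

# full-information feedback: every arm reveals every arm's loss
T, K = 2000, 3
losses = np.random.default_rng(1).random((T, K))
print(exp3_graph(losses, [set(range(K))] * K))
```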
Leveraging the Exact Likelihood of Deep Latent Variable Models
Title | Leveraging the Exact Likelihood of Deep Latent Variable Models |
Authors | Pierre-Alexandre Mattei, Jes Frellsen |
Abstract | Deep latent variable models (DLVMs) combine the approximation abilities of deep neural networks and the statistical foundations of generative models. Variational methods are commonly used for inference; however, the exact likelihood of these models has been largely overlooked. The purpose of this work is to study the general properties of this quantity and to show how they can be leveraged in practice. We focus on important inferential problems that rely on the likelihood: estimation and missing data imputation. First, we investigate maximum likelihood estimation for DLVMs: in particular, we show that most unconstrained models used for continuous data have an unbounded likelihood function. This problematic behaviour is demonstrated to be a source of mode collapse. We also show how to ensure the existence of maximum likelihood estimates, and draw useful connections with nonparametric mixture models. Finally, we describe an algorithm for missing data imputation using the exact conditional likelihood of a deep latent variable model. On several data sets, our algorithm consistently and significantly outperforms the usual imputation scheme used for DLVMs. |
Tasks | Imputation, Latent Variable Models |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04826v4 |
PDF | http://arxiv.org/pdf/1802.04826v4.pdf |
PWC | https://paperswithcode.com/paper/leveraging-the-exact-likelihood-of-deep |
Repo | |
Framework | |
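The unboundedness result has a simple one-dimensional analogue in Gaussian mixtures: placing a component's mean on a data point and shrinking its variance sends the density at that point to infinity. A small numeric illustration (our own, not from the paper):

```python
import numpy as np
from scipy.stats import norm

x = 0.0  # a training point
for sigma in [1.0, 0.1, 0.01, 1e-4]:
    # mixture of a spike centred on x and a broad background component
    density = 0.5 * norm.pdf(x, loc=x, scale=sigma) + 0.5 * norm.pdf(x, loc=0, scale=1)
    print(f"sigma={sigma:g}  p(x)={density:.4g}")
# p(x) -> infinity as sigma -> 0: the likelihood is unbounded, mirroring
# the degeneracy the paper identifies for unconstrained DLVMs.
```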
Automatic Language Identification in Texts: A Survey
Title | Automatic Language Identification in Texts: A Survey |
Authors | Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén |
Abstract | Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used so far in the LI literature. For describing the features and methods we introduce a unified notation. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI. |
Tasks | Language Identification |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08186v2 |
PDF | http://arxiv.org/pdf/1804.08186v2.pdf |
PWC | https://paperswithcode.com/paper/automatic-language-identification-in-texts-a |
Repo | |
Framework | |
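Character n-gram profiles are among the most common features covered in the survey. A toy sketch of profile-based language identification, with invented two-language training data (a real system would use far larger profiles and a proper scoring rule):

```python
from collections import Counter

def char_ngrams(text, n=3):
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def train(samples):
    """samples: dict mapping language -> list of example strings."""
    return {lang: sum((char_ngrams(s) for s in texts), Counter())
            for lang, texts in samples.items()}

def identify(text, profiles):
    """Score by clipped n-gram overlap with each language profile."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda lang:
               sum(min(c, profiles[lang][g]) for g, c in grams.items()))

profiles = train({"en": ["the cat sat on the mat", "this is a test"],
                  "fi": ["kissa istui matolla", "tämä on testi"]})
print(identify("the dog sat", profiles))  # -> 'en'
```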
Spectral Learning of Binomial HMMs for DNA Methylation Data
Title | Spectral Learning of Binomial HMMs for DNA Methylation Data |
Authors | Chicheng Zhang, Eran A. Mukamel, Kamalika Chaudhuri |
Abstract | We consider learning the parameters of Binomial Hidden Markov Models, which may be used to model DNA methylation data. The standard algorithm for the problem is EM, which is computationally expensive for sequences on the scale of the mammalian genome. Recently developed spectral algorithms can learn the parameters of latent variable models via tensor decomposition and are highly efficient for large data. However, these methods have only been applied to categorical HMMs, and the main challenge is how to extend them to Binomial HMMs while retaining computational efficiency. We address this challenge by introducing a new feature-map-based approach that exploits specific properties of Binomial HMMs. We provide theoretical performance guarantees for our algorithm and evaluate it on real DNA methylation data. |
Tasks | Latent Variable Models |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02498v1 |
PDF | http://arxiv.org/pdf/1802.02498v1.pdf |
PWC | https://paperswithcode.com/paper/spectral-learning-of-binomial-hmms-for-dna |
Repo | |
Framework | |
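For concreteness, here is the Binomial HMM generative model scored by a scaled forward pass under assumed parameters; the paper's contribution is learning these parameters spectrally rather than by EM, which this sketch does not reproduce.

```python
import numpy as np
from scipy.stats import binom

def binomial_hmm_loglik(counts, totals, pi, A, p):
    """Log-likelihood of a Binomial HMM via the scaled forward algorithm.

    At position t we see counts[t] methylated reads out of totals[t];
    hidden state k emits counts[t] ~ Binomial(totals[t], p[k]).
    pi: initial distribution, A: transition matrix."""
    alpha = pi * binom.pmf(counts[0], totals[0], p)
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for t in range(1, len(counts)):
        alpha = (alpha @ A) * binom.pmf(counts[t], totals[t], p)
        loglik += np.log(alpha.sum())  # rescaling keeps the recursion stable
        alpha /= alpha.sum()
    return loglik

# two states: mostly unmethylated (p=0.1) vs. mostly methylated (p=0.9)
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
p = np.array([0.1, 0.9])
print(binomial_hmm_loglik(np.array([1, 0, 9]), np.array([10, 10, 10]), pi, A, p))
```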
Sequential Outlier Detection based on Incremental Decision Trees
Title | Sequential Outlier Detection based on Incremental Decision Trees |
Authors | Mohammadreza Mohaghegh Neyshabouri, Suleyman Serdar Kozat |
Abstract | We introduce an online outlier detection algorithm to detect outliers in a sequentially observed data stream. For this purpose, we use a two-stage filtering and hedging approach. In the first stage, we construct a multi-modal probability density function to model the normal samples. In the second stage, given a new observation, we label it as an anomaly if the value of the aforementioned density function at the newly observed point is below a specified threshold. To construct our multi-modal density function, we use an incremental decision tree to build a set of subspaces of the observation space. We train a single-component density function of the exponential family on the observations that fall inside each subspace represented on the tree. These single-component density functions are then adaptively combined to produce our multi-modal density function, which is shown to achieve the performance of the best convex combination of the density functions defined on the subspaces. As we observe more samples, our tree grows and produces more subspaces. As a result, our modeling power increases over time, while mitigating overfitting issues. To choose the threshold level for labeling the observations, we use an adaptive thresholding scheme. We show that our adaptive threshold level achieves the performance of the optimal pre-fixed threshold level, which knows the observation labels in hindsight. Our algorithm provides significant performance improvements over the state of the art in a wide set of experiments involving both synthetic and real data. |
Tasks | Outlier Detection |
Published | 2018-03-09 |
URL | http://arxiv.org/abs/1803.03674v1 |
PDF | http://arxiv.org/pdf/1803.03674v1.pdf |
PWC | https://paperswithcode.com/paper/sequential-outlier-detection-based-on |
Repo | |
Framework | |
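A stripped-down analogue of the second-stage density test: a single online Gaussian (one exponential-family component) with a fixed threshold, standing in for the paper's tree-structured mixture and adaptive threshold. The threshold and data below are illustrative.

```python
import numpy as np

class StreamingDetector:
    """Flag a point as an outlier when a running Gaussian density falls
    below a threshold (a single-component stand-in for the paper's
    tree-structured mixture and adaptive thresholding)."""
    def __init__(self, threshold=1e-3):
        self.n, self.mean, self.m2, self.tau = 0, 0.0, 0.0, threshold

    def score_and_update(self, x):
        if self.n >= 2:
            var = self.m2 / (self.n - 1)
            dens = np.exp(-0.5 * (x - self.mean) ** 2 / var) / np.sqrt(2 * np.pi * var)
            is_outlier = dens < self.tau
        else:
            is_outlier = False  # not enough data yet
        # Welford's online update of mean and variance
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier

det = StreamingDetector()
stream = list(np.random.default_rng(0).normal(0, 1, 200)) + [8.0]
print([i for i, x in enumerate(stream) if det.score_and_update(x)])  # should flag the 8.0
```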
Deep joint rain and haze removal from single images
Title | Deep joint rain and haze removal from single images |
Authors | Liang Shen, Zihan Yue, Quan Chen, Fan Feng, Jie Ma |
Abstract | Rain removal from a single image is a challenge that has been studied for a long time. In this paper, a novel convolutional neural network based on wavelets and the dark channel is proposed. On one hand, rain streaks correspond to the high-frequency components of the image, so the Haar wavelet transform is a good choice for separating the rain streaks from the background to some extent. More specifically, the LL subband of a rain image is more inclined to express the background information, while the LH, HL, and HH subbands tend to represent the rain streaks and the edges. On the other hand, the accumulation of rain streaks from a long distance makes the rain image look like a haze veil. We extract the dark channel of the rain image as a feature map in the network. By adding this mapping between the dark channels of the input and output images, we achieve haze removal in an indirect way. All of the parameters are optimized by back-propagation. Experiments on both synthetic and real-world datasets reveal that our method outperforms other state-of-the-art methods from both a qualitative and a quantitative perspective. |
Tasks | Rain Removal |
Published | 2018-01-21 |
URL | http://arxiv.org/abs/1801.06769v1 |
PDF | http://arxiv.org/pdf/1801.06769v1.pdf |
PWC | https://paperswithcode.com/paper/deep-joint-rain-and-haze-removal-from-single |
Repo | |
Framework | |
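The two hand-crafted inputs the abstract describes can be computed in a few lines; below is a sketch using PyWavelets for the Haar decomposition and a minimum filter for the dark channel. The 15-pixel window is our assumption, and the network itself is not shown.

```python
import numpy as np
import pywt
from scipy.ndimage import minimum_filter

def decompose(rain_image):
    """Feature maps of the kind the paper feeds its network: Haar subbands
    of the luminance (LL ~ background; LH/HL/HH ~ rain streaks and edges)
    plus the dark channel, which correlates with the haze-like veil."""
    gray = rain_image.mean(axis=2)
    ll, (lh, hl, hh) = pywt.dwt2(gray, "haar")              # single-level Haar DWT
    dark = minimum_filter(rain_image.min(axis=2), size=15)  # dark channel prior
    return ll, (lh, hl, hh), dark

img = np.random.rand(64, 64, 3).astype(np.float32)  # stand-in for a rain image
ll, highs, dark = decompose(img)
print(ll.shape, dark.shape)  # (32, 32) (64, 64)
```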
Overpruning in Variational Bayesian Neural Networks
Title | Overpruning in Variational Bayesian Neural Networks |
Authors | Brian Trippe, Richard Turner |
Abstract | The motivations for using variational inference (VI) in neural networks differ significantly from those in latent variable models. This has a counter-intuitive consequence: more expressive variational approximations can yield significantly worse predictions than less expressive families. In this work we make two contributions. First, we identify a cause of this performance gap: variational over-pruning. Second, we introduce a theoretically grounded explanation for this phenomenon. Our perspective sheds light on several related published results and provides intuition into the design of effective variational approximations of neural networks. |
Tasks | Latent Variable Models |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.06230v1 |
PDF | http://arxiv.org/pdf/1801.06230v1.pdf |
PWC | https://paperswithcode.com/paper/overpruning-in-variational-bayesian-neural |
Repo | |
Framework | |
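The pruning pressure comes from the KL term in the variational objective: a unit whose weight posterior collapses to the prior pays zero KL cost but contributes nothing to predictions. The closed-form diagonal-Gaussian KL used in mean-field VI makes this visible:

```python
import numpy as np

def kl_diag_gaussian(mu_q, sig_q, mu_p=0.0, sig_p=1.0):
    """KL( N(mu_q, sig_q^2) || N(mu_p, sig_p^2) ), summed over weights."""
    return np.sum(np.log(sig_p / sig_q)
                  + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2) - 0.5)

# A unit whose posterior has collapsed to the prior pays no KL cost but
# carries no signal -- the "pruned" state the paper analyses.
print(kl_diag_gaussian(np.zeros(10), np.ones(10)))           # 0.0: pruned unit
print(kl_diag_gaussian(np.full(10, 0.5), np.full(10, 0.1)))  # informative unit
```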
Understanding VAEs in Fisher-Shannon Plane
Title | Understanding VAEs in Fisher-Shannon Plane |
Authors | Huangjie Zheng, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Jia Wang |
Abstract | In information theory, Fisher information and Shannon information (entropy) are used to quantify, respectively, the uncertainty associated with distribution modeling and the uncertainty in specifying the outcome of given variables. These two quantities are complementary and are jointly applied to information behavior analysis in most cases. The uncertainty property of information asserts a fundamental trade-off between Fisher information and Shannon information, which illuminates the relationship between the encoder and the decoder in variational auto-encoders (VAEs). In this paper, we investigate VAEs in the Fisher-Shannon plane and demonstrate that representation learning and log-likelihood estimation are intrinsically related to these two information quantities. Through extensive qualitative and quantitative experiments, we provide a better understanding of VAEs in tasks such as high-resolution reconstruction and representation learning from the perspective of Fisher information and Shannon information. We further propose a variant of VAEs, termed the Fisher auto-encoder (FAE), to balance Fisher information and Shannon information for practical needs. Our experimental results demonstrate its promise in improving reconstruction accuracy and avoiding the non-informative latent codes observed in previous works. |
Tasks | Representation Learning |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03723v2 |
PDF | http://arxiv.org/pdf/1807.03723v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-vaes-in-fisher-shannon-plane |
Repo | |
Framework | |
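The trade-off the abstract appeals to can be stated with the standard definitions (our notation, not necessarily the paper's): for an $n$-dimensional variable, Stam's inequality lower-bounds the product of entropy power and Fisher information.

```latex
% Shannon differential entropy and entropy power of an n-dimensional X
h(X) = -\int p(x)\,\log p(x)\,dx, \qquad
N(X) = \frac{1}{2\pi e}\, e^{2h(X)/n}
% Fisher information of the location family
J(X) = \int \frac{\lVert \nabla p(x) \rVert^{2}}{p(x)}\,dx
% Stam's inequality: entropy power and Fisher information trade off
N(X)\,J(X) \ge n \quad \text{(equality iff } X \text{ is Gaussian)}
```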
Putting the Horse Before the Cart: A Generator-Evaluator Framework for Question Generation from Text
Title | Putting the Horse Before the Cart: A Generator-Evaluator Framework for Question Generation from Text |
Authors | Vishwajeet Kumar, Ganesh Ramakrishnan, Yuan-Fang Li |
Abstract | Automatic question generation (QG) is a useful yet challenging task in NLP. Recent neural network-based approaches represent the state-of-the-art in this task. In this work, we attempt to strengthen them significantly by adopting a holistic and novel generator-evaluator framework that directly optimizes objectives that reward semantics and structure. The *generator* is a sequence-to-sequence model that incorporates the *structure* and *semantics* of the question being generated. The generator predicts an answer in the passage that the question can pivot on. Employing the copy and coverage mechanisms, it also acknowledges other contextually important (and possibly rare) keywords in the passage that the question needs to conform to, while not redundantly repeating words. The *evaluator* model evaluates and assigns a reward to each predicted question based on its conformity to the *structure* of ground-truth questions. We propose two novel QG-specific reward functions for text conformity and answer conformity of the generated question. The evaluator also employs structure-sensitive rewards based on evaluation measures such as BLEU, GLEU, and ROUGE-L, which are suitable for QG. In contrast, most of the previous works only optimize the cross-entropy loss, which can induce inconsistencies between training (objective) and testing (evaluation) measures. Our evaluation shows that our approach significantly outperforms state-of-the-art systems on the widely-used SQuAD benchmark as per both automatic and human evaluation. |
Tasks | Question Generation |
Published | 2018-08-15 |
URL | https://arxiv.org/abs/1808.04961v5 |
PDF | https://arxiv.org/pdf/1808.04961v5.pdf |
PWC | https://paperswithcode.com/paper/a-framework-for-automatic-question-generation |
Repo | |
Framework | |
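Reward-based training of the generator typically takes a REINFORCE-style form: sample a question, let the evaluator score it (e.g. with BLEU against the gold question), and scale the sequence log-probability by the reward. A hedged sketch of that pattern, not the paper's exact objective:

```python
import torch

def reinforce_loss(log_probs, reward, baseline=0.0):
    """Sequence-level policy-gradient loss.  log_probs: per-token log
    p(y_t | y_<t, x) of a sampled question; reward: the evaluator's score
    for that sample (e.g. BLEU/ROUGE-L against the gold question)."""
    return -(reward - baseline) * log_probs.sum()

# toy usage: a 5-token sample that the evaluator scored at 0.62
log_probs = torch.log(torch.tensor([0.4, 0.3, 0.5, 0.2, 0.6], requires_grad=True))
loss = reinforce_loss(log_probs, reward=0.62, baseline=0.5)
loss.backward()  # gradients push up the probability of well-scored samples
```

In practice this term is combined with the usual cross-entropy loss and a learned baseline to reduce gradient variance.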
Privacy-preserving Neural Representations of Text
Title | Privacy-preserving Neural Representations of Text |
Authors | Maximin Coavoux, Shashi Narayan, Shay B. Cohen |
Abstract | This article deals with adversarial attacks on deep learning systems for Natural Language Processing (NLP) in the context of privacy protection. We study a specific type of attack: an attacker eavesdrops on the hidden representations of a neural text classifier and tries to recover information about the input text. Such a scenario may arise when the computation of a neural network is shared across multiple devices, e.g. some hidden representation is computed by a user’s device and sent to a cloud-based model. We measure the privacy of a hidden representation by the ability of an attacker to accurately predict specific private information from it, and characterize the tradeoff between the privacy and the utility of neural representations. Finally, we propose several defense methods based on modified training objectives and show that they improve the privacy of neural representations. |
Tasks | |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09408v1 |
PDF | http://arxiv.org/pdf/1808.09408v1.pdf |
PWC | https://paperswithcode.com/paper/privacy-preserving-neural-representations-of |
Repo | |
Framework | |
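The attack can be reproduced in miniature as a diagnostic probe: given hidden representations and a private attribute, train a classifier and read its accuracy as a privacy measure. The data below is synthetic; in the paper the representations come from a trained text classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=1000)                   # private attribute, e.g. author gender
z = rng.normal(size=(1000, 64)) + 0.8 * s[:, None]  # hidden representations that leak s
probe = LogisticRegression(max_iter=1000).fit(z[:800], s[:800])
print("attacker accuracy:", probe.score(z[800:], s[800:]))  # ~1.0: z is not private
```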
Utilizing Device-level Demand Forecasting for Flexibility Markets - Full Version
Title | Utilizing Device-level Demand Forecasting for Flexibility Markets - Full Version |
Authors | Bijay Neupane, Torben Bach Pedersen, Bo Thiesson |
Abstract | The uncertainty in the power supply due to fluctuating Renewable Energy Sources (RES) has severe financial and other implications for energy market players. In this paper, we present a device-level Demand Response (DR) scheme that captures the atomic (all available) flexibilities in energy demand and provides the largest possible solution space for generating demand/supply schedules that minimize market imbalances. We evaluate the effectiveness and feasibility of widely used forecasting models for device-level flexibility analysis. In a typical device-level flexibility forecast, a market player is more concerned with the *utility* that the demand flexibility brings to the market than with the intrinsic forecast accuracy. In this regard, we provide comprehensive predictive modeling and scheduling of demand flexibility from household appliances to demonstrate the viability (financial and otherwise) of introducing flexibility-based DR in the Danish/Nordic market. Further, we investigate the correlation between the potential utility and the accuracy of the demand forecast model. Furthermore, we perform a number of experiments to determine the data granularity that provides the best financial reward to market players for adopting the proposed DR scheme. A cost-benefit analysis of the forecast results shows that even with somewhat low forecast accuracy, market players can achieve regulation-cost savings of 54% of the theoretical optimum. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00702v1 |
PDF | http://arxiv.org/pdf/1805.00702v1.pdf |
PWC | https://paperswithcode.com/paper/utilizing-device-level-demand-forecasting-for |
Repo | |
Framework | |
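The utility-versus-accuracy point can be made concrete with a toy imbalance-cost evaluation: a forecast with higher MAE can still incur lower regulation cost. Prices and numbers below are illustrative, not Nordic market data.

```python
import numpy as np

def imbalance_cost(actual, forecast, up_price=2.0, down_price=0.5):
    """Regulation cost of forecast errors: under-forecast demand must be
    covered at the (expensive) up-regulation price; over-forecast demand
    is settled at the (cheap) down-regulation price."""
    err = actual - forecast
    return np.sum(np.where(err > 0, up_price * err, -down_price * err))

actual = np.array([1.0, 1.2, 0.8, 1.5])      # kWh, device-level demand
f_accurate = np.array([1.1, 1.1, 0.9, 1.4])  # lower MAE (0.10)
f_biased = np.array([1.2, 1.4, 1.0, 1.7])    # higher MAE (0.20), but over-forecasts
print(imbalance_cost(actual, f_accurate))    # 0.50
print(imbalance_cost(actual, f_biased))      # 0.40 -- cheaper despite worse MAE
```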
On a New Improvement-Based Acquisition Function for Bayesian Optimization
Title | On a New Improvement-Based Acquisition Function for Bayesian Optimization |
Authors | Umberto Noè, Dirk Husmeier |
Abstract | Bayesian optimization (BO) is a popular algorithm for solving challenging optimization tasks. It is designed for problems where the objective function is expensive to evaluate, perhaps not available in exact form, without gradient information and possibly returning noisy values. Different versions of the algorithm vary in the choice of the acquisition function, which recommends the next point at which to query the objective. Initially, researchers focused on improvement-based acquisitions, while recently the attention has shifted to more computationally expensive information-theoretical measures. In this paper we present two major contributions to the literature. First, we propose a new improvement-based acquisition function that recommends query points where the improvement is expected to be high with high confidence. The proposed algorithm is evaluated on a large set of benchmark functions from the global optimization literature, where it turns out to perform at least as well as current state-of-the-art acquisition functions, and often better. This suggests that it is a powerful default choice for BO. The novel policy is then compared to widely used global optimization solvers in order to confirm that BO methods reduce the computational costs of the optimization by keeping the number of function evaluations small. The second main contribution is an application to precision medicine, where the interest lies in the estimation of parameters of a partial differential equations model of the human pulmonary blood circulation system. Once inferred, these parameters can help clinicians in diagnosing a patient with pulmonary hypertension without going through the standard invasive procedure of right heart catheterization, which can lead to side effects and complications (e.g. severe pain, internal bleeding, thrombosis). |
Tasks | |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06918v1 |
PDF | http://arxiv.org/pdf/1808.06918v1.pdf |
PWC | https://paperswithcode.com/paper/on-a-new-improvement-based-acquisition |
Repo | |
Framework | |
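For reference, the classical improvement-based acquisition is Expected Improvement, computed from the GP posterior mean and standard deviation; the paper's acquisition instead favors points where the improvement is high with high confidence, and its exact form is not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Classical EI for minimisation, from the GP posterior mean/std."""
    sigma = np.maximum(sigma, 1e-12)  # guard against zero posterior variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

mu = np.array([0.2, -0.1, 0.0])      # posterior means at candidate points
sigma = np.array([0.05, 0.5, 0.01])  # posterior standard deviations
print(expected_improvement(mu, sigma, f_best=0.0))
# the paper's variant would down-weight the large-sigma candidate, where
# the improvement is uncertain rather than confidently high
```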
AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference
Title | AutoPruner: An End-to-End Trainable Filter Pruning Method for Efficient Deep Model Inference |
Authors | Jian-Hao Luo, Jianxin Wu |
Abstract | Channel pruning is an important family of methods to speed up deep model’s inference. Previous filter pruning algorithms regard channel pruning and model fine-tuning as two independent steps. This paper argues that combining them into a single end-to-end trainable system will lead to better results. We propose an efficient channel selection layer, namely AutoPruner, to find less important filters automatically in a joint training manner. Our AutoPruner takes previous activation responses as an input and generates a true binary index code for pruning. Hence, all the filters corresponding to zero index values can be removed safely after training. We empirically demonstrate that the gradient information of this channel selection layer is also helpful for the whole model training. By gradually erasing several weak filters, we can prevent an excessive drop in model accuracy. Compared with previous state-of-the-art pruning algorithms (including training from scratch), AutoPruner achieves significantly better performance. Furthermore, ablation experiments show that the proposed novel mini-batch pooling and binarization operations are vital for the success of filter pruning. |
Tasks | |
Published | 2018-05-23 |
URL | http://arxiv.org/abs/1805.08941v3 |
PDF | http://arxiv.org/pdf/1805.08941v3.pdf |
PWC | https://paperswithcode.com/paper/autopruner-an-end-to-end-trainable-filter |
Repo | |
Framework | |
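Guided only by the abstract, an AutoPruner-style channel-selection layer can be sketched as: mini-batch pooling of the previous activations, a fully connected map to one gate per channel, and a scaled sigmoid that is annealed toward binary. The details below (layer sizes, scale schedule) are our assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ChannelSelector(nn.Module):
    """Sketch of an AutoPruner-style channel-selection layer: pool the
    previous activations, map them to one scaled-sigmoid gate per channel,
    and multiply the gates back onto the feature map.  As the sigmoid
    scale grows during training, the gates approach a binary index code."""
    def __init__(self, channels, scale=1.0):
        super().__init__()
        self.fc = nn.Linear(channels, channels)
        self.scale = scale  # annealed upward during training

    def forward(self, x):                  # x: (N, C, H, W)
        pooled = x.mean(dim=(0, 2, 3))     # mini-batch pooling over N, H, W
        gates = torch.sigmoid(self.scale * self.fc(pooled))  # ~binary at large scale
        return x * gates.view(1, -1, 1, 1), gates

layer = ChannelSelector(channels=8, scale=10.0)
out, gates = layer(torch.randn(4, 8, 16, 16))
print(gates)  # channels with gates near 0 can be removed after training
```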
Stochasticity from function – why the Bayesian brain may need no noise
Title | Stochasticity from function – why the Bayesian brain may need no noise |
Authors | Dominik Dold, Ilja Bytschok, Akos F. Kungl, Andreas Baumbach, Oliver Breitwieser, Walter Senn, Johannes Schemmel, Karlheinz Meier, Mihai A. Petrovici |
Abstract | An increasing body of evidence suggests that the trial-to-trial variability of spiking activity in the brain is not mere noise, but rather the reflection of a sampling-based encoding scheme for probabilistic computing. Since the precise statistical properties of neural activity are important in this context, many models assume an ad-hoc source of well-behaved, explicit noise, either on the input or on the output side of single neuron dynamics, most often assuming an independent Poisson process in either case. However, these assumptions are somewhat problematic: neighboring neurons tend to share receptive fields, rendering both their input and their output correlated; at the same time, neurons are known to behave largely deterministically, as a function of their membrane potential and conductance. We suggest that spiking neural networks may, in fact, have no need for noise to perform sampling-based Bayesian inference. We study analytically the effect of auto- and cross-correlations in functionally Bayesian spiking networks and demonstrate how their effect translates to synaptic interaction strengths, rendering them controllable through synaptic plasticity. This allows even small ensembles of interconnected deterministic spiking networks to simultaneously and co-dependently shape their output activity through learning, enabling them to perform complex Bayesian computation without any need for noise, which we demonstrate in silico, both in classical simulation and in neuromorphic emulation. These results close a gap between the abstract models and the biology of functionally Bayesian spiking networks, effectively reducing the architectural constraints imposed on physical neural substrates required to perform probabilistic computing, be they biological or artificial. |
Tasks | Bayesian Inference |
Published | 2018-09-21 |
URL | https://arxiv.org/abs/1809.08045v3 |
PDF | https://arxiv.org/pdf/1809.08045v3.pdf |
PWC | https://paperswithcode.com/paper/stochasticity-from-function-why-the-bayesian |
Repo | |
Framework | |