January 29, 2020

Paper Group ANR 490

Attending to Entities for Better Text Understanding


Title	Attending to Entities for Better Text Understanding
Authors	Pengxiang Cheng, Katrin Erk
Abstract	Recent progress in NLP witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular, coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting the new state-of-the-art, while only containing a tiny fraction of parameters compared to GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions on applying similar techniques to other problems.
Tasks
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04361v1
PDF	https://arxiv.org/pdf/1911.04361v1.pdf
PWC	https://paperswithcode.com/paper/attending-to-entities-for-better-text
Repo
Framework

Quasi-Monte Carlo sampling for machine-learning partial differential equations


Title	Quasi-Monte Carlo sampling for machine-learning partial differential equations
Authors	Jingrun Chen, Rui Du, Panchi Li, Liyao Lyu
Abstract	Solving partial differential equations in high dimensions with deep neural networks has attracted significant attention in recent years. In many scenarios, the loss function is defined as an integral over a high-dimensional domain. The Monte Carlo method, together with a deep neural network, is used to overcome the curse of dimensionality where classical methods fail. Often, a deep neural network outperforms classical numerical methods in terms of both accuracy and efficiency. In this paper, we propose to use quasi-Monte Carlo sampling instead of the Monte Carlo method to approximate the loss function. To demonstrate the idea, we conduct numerical experiments in the framework of the deep Ritz method proposed by Weinan E and Bing Yu. For the same accuracy requirement, it is observed that quasi-Monte Carlo sampling reduces the size of the training data set by more than two orders of magnitude compared to that of the Monte Carlo method. Under some assumptions, we prove that quasi-Monte Carlo sampling together with the deep neural network generates a convergent series with rate proportional to the approximation accuracy of the quasi-Monte Carlo method for numerical integration. Numerically, the fitted convergence rate is a bit smaller, but the proposed approach always outperforms the Monte Carlo method. It is worth mentioning that the convergence analysis is generic whenever a loss function is approximated by the quasi-Monte Carlo method, although the observations here are based on the deep Ritz method.
Tasks
Published	2019-11-05
URL	https://arxiv.org/abs/1911.01612v1
PDF	https://arxiv.org/pdf/1911.01612v1.pdf
PWC	https://paperswithcode.com/paper/quasi-monte-carlo-sampling-for-machine
Repo
Framework
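
As a rough illustration of the sampling idea in the abstract above (not the authors' deep Ritz setup), the sketch below compares plain Monte Carlo with a quasi-Monte Carlo point set when approximating an integral over the unit square. The Halton sequence, the integrand, and all function names are assumptions chosen for the example; the abstract does not say which low-discrepancy sequence the paper uses.

```python
import math
import random


def van_der_corput(index, base):
    """Return the index-th element of the van der Corput sequence in `base`."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton_points(n, bases=(2, 3)):
    """First n points of a Halton sequence, one simple quasi-Monte Carlo point set."""
    return [tuple(van_der_corput(i, b) for b in bases) for i in range(1, n + 1)]


def integrand(x, y):
    # Hypothetical smooth integrand standing in for a loss density.
    # It is scaled so that its exact integral over [0, 1]^2 equals 1.
    return (math.pi / 2) ** 2 * math.sin(math.pi * x) * math.sin(math.pi * y)


def mean_over(points):
    """Equal-weight average of the integrand over a point set."""
    return sum(integrand(x, y) for x, y in points) / len(points)


n = 4096
rng = random.Random(0)
mc_points = [(rng.random(), rng.random()) for _ in range(n)]

mc_error = abs(mean_over(mc_points) - 1.0)
qmc_error = abs(mean_over(halton_points(n)) - 1.0)
print(f"Monte Carlo error:       {mc_error:.2e}")
print(f"quasi-Monte Carlo error: {qmc_error:.2e}")
```

Both estimators use the same equal-weight average; only the point set changes, which is why quasi-Monte Carlo sampling can be dropped into an existing loss approximation without altering the rest of the training loop.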

Comparing the Performance of the LSTM and HMM Language Models via Structural Similarity


Title	Comparing the Performance of the LSTM and HMM Language Models via Structural Similarity
Authors	Larkin Liu, Yu-Chung Lin, Joshua Reid
Abstract	Language models based on deep neural networks and on traditional stochastic modelling have both become highly functional and effective in recent times. In this work, a general survey of the two types of language modelling is conducted. We investigate the effectiveness of the Hidden Markov Model (HMM) and the Long Short-Term Memory model (LSTM). We analyze the hidden state structures common to both models and present an analysis of the structural similarity of their hidden states. We compare the LSTM’s predictive accuracy and hidden state output with those of the HMM for a varying number of hidden states. In this work, we argue that the less complex HMM can serve as an appropriate approximation of the LSTM model.
Tasks	Language Modelling
Published	2019-07-09
URL	https://arxiv.org/abs/1907.04670v3
PDF	https://arxiv.org/pdf/1907.04670v3.pdf
PWC	https://paperswithcode.com/paper/improving-the-performance-of-the-lstm-and-hmm
Repo
Framework

Efficient Dialogue State Tracking by Selectively Overwriting Memory


Title	Efficient Dialogue State Tracking by Selectively Overwriting Memory
Authors	Sungdong Kim, Sohee Yang, Gyuwan Kim, Sang-Woo Lee
Abstract	Recent works in dialogue state tracking (DST) focus on an open vocabulary-based setting to resolve the scalability and generalization issues of predefined ontology-based approaches. However, they are computationally inefficient in that they predict the dialogue state at every turn from scratch. In this paper, we consider the dialogue state as an explicit fixed-sized memory, and propose a selective overwriting mechanism for more efficient DST. This mechanism consists of two steps: (1) predicting a state operation on each of the memory slots, and (2) overwriting the memory with new values, of which only a few are generated according to the predicted state operations. Moreover, reducing the burden of the decoder by decomposing DST into two sub-tasks and guiding the decoder to focus on only one of the tasks enables more effective training and improved performance. As a result, our proposed SOM-DST (Selectively Overwriting Memory for Dialogue State Tracking) achieves state-of-the-art joint goal accuracy of 51.38% on MultiWOZ 2.0 and 52.57% on MultiWOZ 2.1 in an open vocabulary-based DST setting. In addition, a massive gap between the current accuracy and the accuracy when ground truth operations are given suggests that improving the performance of state operation prediction is a promising research direction for DST.
Tasks	Dialogue State Tracking
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03906v1
PDF	https://arxiv.org/pdf/1911.03906v1.pdf
PWC	https://paperswithcode.com/paper/efficient-dialogue-state-tracking-by
Repo
Framework
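
The two-step mechanism described in the abstract above can be sketched as follows. This is a toy illustration, not the SOM-DST model: the operation names, the slot names, and the `generate_value` callback are all assumptions, since the abstract only states that an operation is predicted per slot and that only some slots receive newly generated values.

```python
from typing import Callable, Dict, Optional

# Hypothetical operation labels; the abstract does not name the operations.
CARRYOVER, DELETE, DONTCARE, UPDATE = "carryover", "delete", "dontcare", "update"


def overwrite_memory(
    state: Dict[str, Optional[str]],
    operations: Dict[str, str],
    generate_value: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Apply one predicted operation per slot to a fixed-size dialogue state.

    The (expensive) generator is called only for slots marked UPDATE;
    every other slot is resolved without decoding a new value.
    """
    new_state: Dict[str, Optional[str]] = {}
    for slot, old_value in state.items():
        op = operations.get(slot, CARRYOVER)  # slots without a prediction are kept
        if op == CARRYOVER:
            new_state[slot] = old_value
        elif op == DELETE:
            new_state[slot] = None
        elif op == DONTCARE:
            new_state[slot] = "dontcare"
        elif op == UPDATE:
            new_state[slot] = generate_value(slot)
        else:
            raise ValueError(f"unknown operation {op!r} for slot {slot!r}")
    return new_state


# Step (1) would come from a classifier; here the predictions are hard-coded.
previous_state = {"hotel-area": "north", "hotel-stars": "4", "taxi-destination": "station"}
predicted_ops = {"hotel-stars": UPDATE, "taxi-destination": DELETE}

# Step (2): only "hotel-stars" triggers the stand-in generator.
print(overwrite_memory(previous_state, predicted_ops, lambda slot: "5"))
```

Keeping the state as a dictionary with a fixed key set mirrors the "explicit fixed-sized memory" framing: the set of slots never changes between turns, only their values.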

Towards Generalizable Forgery Detection with Locality-aware AutoEncoder


Title	Towards Generalizable Forgery Detection with Locality-aware AutoEncoder
Authors	Mengnan Du, Shiva Pentyala, Yuening Li, Xia Hu
Abstract	With advancements in deep learning techniques, it is now possible to generate super-realistic fake images and videos. These manipulated forgeries could reach a mass audience and result in adverse impacts on our society. Although much effort has been devoted to detecting forgeries, detector performance drops significantly on previously unseen but related manipulations, and detection generalization capability remains a problem. To bridge this gap, in this paper we propose the Locality-aware AutoEncoder (LAE), which combines fine-grained representation learning and enforcing locality in a unified framework. In the training process, we use a pixel-wise mask to regularize the local interpretation of LAE, forcing the model to learn an intrinsic representation from the forgery region instead of capturing artifacts in the training set and learning spurious correlations to perform detection. We further propose an active learning framework to select the challenging candidates for labeling, reducing the annotation effort needed to regularize interpretations. Experimental results indicate that LAE can indeed focus on the forgery regions to make decisions. The results further show that LAE achieves superior generalization performance compared to state-of-the-art methods on forgeries generated by alternative manipulation methods.
Tasks	Active Learning, Representation Learning
Published	2019-09-13
URL	https://arxiv.org/abs/1909.05999v1
PDF	https://arxiv.org/pdf/1909.05999v1.pdf
PWC	https://paperswithcode.com/paper/towards-generalizable-forgery-detection-with
Repo
Framework

On weighted uncertainty sampling in active learning


Title	On weighted uncertainty sampling in active learning
Authors	Vinay Jethava
Abstract	This note explores probabilistic sampling weighted by uncertainty in active learning. This method has been previously used and authors have tangentially remarked on its efficacy. The scheme has several benefits: (1) it is computationally cheap, (2) it can be implemented in a single-pass streaming fashion which is a benefit when deployed in real-world systems where different subsystems perform the suggestion scoring and extraction of user feedback, and (3) it is easily parameterizable. In this paper, we show on publicly available datasets that using probabilistic weighting is often beneficial and strikes a good compromise between exploration and representation especially when the starting set of labelled points is biased.
Tasks	Active Learning
Published	2019-09-11
URL	https://arxiv.org/abs/1909.04928v1
PDF	https://arxiv.org/pdf/1909.04928v1.pdf
PWC	https://paperswithcode.com/paper/on-weighted-uncertainty-sampling-in-active
Repo
Framework
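
A minimal sketch of the kind of scheme the abstract above describes: each incoming item is queried for a label with probability tied to the model's uncertainty, in a single pass over the stream. The uncertainty measure and the `power` and `scale` parameters are assumptions made for illustration and are not taken from the note.

```python
import random


def uncertainty(p_positive):
    """Binary uncertainty in [0, 1]: 1 at p = 0.5, 0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def stream_select(scored_stream, power=1.0, scale=1.0, rng=None):
    """Yield items to label, each kept with probability scale * uncertainty ** power.

    The stream is consumed once and nothing is buffered, so scoring and
    feedback collection can live in different subsystems.
    """
    rng = rng or random.Random()
    for item, p_positive in scored_stream:
        keep_probability = min(1.0, scale * uncertainty(p_positive) ** power)
        if rng.random() < keep_probability:
            yield item


# Hypothetical scored stream of (item id, predicted probability of the positive class).
stream = [("doc-1", 0.97), ("doc-2", 0.55), ("doc-3", 0.10), ("doc-4", 0.48)]
for chosen in stream_select(stream, power=2.0, rng=random.Random(0)):
    print("request label for", chosen)
```

Raising `power` concentrates queries on the most uncertain items, while lowering it moves the scheme toward uniform sampling; that single knob is one way to read the abstract's remark that the method is easily parameterizable.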

TEINet: Towards an Efficient Architecture for Video Recognition


Title	TEINet: Towards an Efficient Architecture for Video Recognition
Authors	Zhaoyang Liu, Donghao Luo, Yabiao Wang, Limin Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Tong Lu
Abstract	Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have witnessed remarkable progress in action recognition from videos. However, compared with their 2D counterparts, 3D convolutions often introduce a large number of parameters and incur high computational cost. To relieve this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet). The TEI module presents a different paradigm for learning temporal features by decoupling the modeling of channel correlation and temporal interaction. First, it contains a Motion Enhanced Module (MEM), which enhances the motion-related features while suppressing irrelevant information (e.g., background). Then, it introduces a Temporal Interaction Module (TIM), which supplements the temporal contextual information in a channel-wise manner. This two-stage modeling scheme is not only able to capture temporal structure flexibly and effectively, but is also efficient for model inference. We conduct extensive experiments to verify the effectiveness of TEINet on several benchmarks (e.g., Something-Something V1&V2, Kinetics, UCF101 and HMDB51). Our proposed TEINet achieves good recognition accuracy on these datasets while preserving high efficiency.
Tasks	Video Recognition
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09435v1
PDF	https://arxiv.org/pdf/1911.09435v1.pdf
PWC	https://paperswithcode.com/paper/teinet-towards-an-efficient-architecture-for
Repo
Framework

Adversarial Orthogonal Regression: Two non-Linear Regressions for Causal Inference


Title	Adversarial Orthogonal Regression: Two non-Linear Regressions for Causal Inference
Authors	M. Reza Heydari, Saber Salehkaleybar, Kun Zhang
Abstract	We propose two nonlinear regression methods, named Adversarial Orthogonal Regression (AdOR) for additive noise models and Adversarial Orthogonal Structural Equation Model (AdOSE) for the general case of structural equation models. Both methods try to make the residual of the regression independent of the regressors while making no assumption about the noise distribution. In both methods, two adversarial networks are trained simultaneously: a regression network that outputs predictions and a loss network that estimates mutual information (in AdOR) or KL-divergence (in AdOSE). These methods can be formulated as a minimax two-player game; at equilibrium, AdOR finds a deterministic map between inputs and output and estimates mutual information between residual and inputs, while AdOSE estimates a conditional probability distribution of output given inputs. The proposed methods can be used as subroutines to address several learning problems in causality, such as causal direction determination (or more generally, causal structure learning) and causal model estimation. Synthetic and real-world experiments demonstrate that the proposed methods perform remarkably well compared with previous solutions.
Tasks	Causal Inference
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04454v1
PDF	https://arxiv.org/pdf/1909.04454v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-orthogonal-regression-two-non
Repo
Framework

Fast Mean Estimation with Sub-Gaussian Rates


Title	Fast Mean Estimation with Sub-Gaussian Rates
Authors	Yeshwanth Cherapanamjeri, Nicolas Flammarion, Peter L. Bartlett
Abstract	We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d. samples and that has error bounds matching the sub-Gaussian case. The only assumptions we make about the data distribution are that it has finite mean and covariance; in particular, we make no assumptions about higher-order moments. Like the polynomial time estimator introduced by Hopkins (2018), which is based on the sum-of-squares hierarchy, our estimator achieves optimal statistical efficiency in this challenging setting, but it has a significantly faster runtime and a simpler analysis.
Tasks
Published	2019-02-06
URL	http://arxiv.org/abs/1902.01998v1
PDF	http://arxiv.org/pdf/1902.01998v1.pdf
PWC	https://paperswithcode.com/paper/fast-mean-estimation-with-sub-gaussian-rates
Repo
Framework

A discriminative approach for finding and characterizing positivity violations using decision trees


Title	A discriminative approach for finding and characterizing positivity violations using decision trees
Authors	Ehud Karavani, Peter Bak, Yishai Shimoni
Abstract	The assumption of positivity in causal inference (also known as common support and covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming that it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale to modern high-dimensional covariate spaces, or they cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest harnessing decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of positivity violation within it. We also provide an interactive version of the visualization that allows a deeper dive into the properties of each subspace.
Tasks	Causal Inference
Published	2019-07-18
URL	https://arxiv.org/abs/1907.08127v1
PDF	https://arxiv.org/pdf/1907.08127v1.pdf
PWC	https://paperswithcode.com/paper/a-discriminative-approach-for-finding-and
Repo
Framework

Quantifying Error in the Presence of Confounders for Causal Inference


Title	Quantifying Error in the Presence of Confounders for Causal Inference
Authors	Rathin Desai, Amit Sharma
Abstract	Estimating the average causal effect (ACE) is useful whenever we want to know the effect of an intervention on a given outcome. In the absence of a randomized experiment, many methods such as stratification and inverse propensity weighting have been proposed to estimate the ACE. However, it is hard to know which method is optimal for a given dataset or which hyperparameters to use for a chosen method. To this end, we provide a framework to characterize the loss of a causal inference method against the true ACE, by framing causal inference as a representation learning problem. We show that many popular methods, including back-door methods, can be considered as weighting or representation learning algorithms, and provide general error bounds for their causal estimates. In addition, we consider the case when unobserved variables can confound the causal estimate and extend the proposed bounds using principles of robust statistics, considering confounding as contamination under the Huber contamination model. These bounds are also estimable; as an example, we provide empirical bounds for the Inverse Propensity Weighting (IPW) estimator and show how the bounds can be used to optimize the threshold for clipping extreme propensity scores. Our work provides a new way to reason about competing estimators, and opens up the potential of deriving new methods by minimizing the proposed error bounds.
Tasks	Causal Inference, Representation Learning
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04805v1
PDF	https://arxiv.org/pdf/1907.04805v1.pdf
PWC	https://paperswithcode.com/paper/quantifying-error-in-the-presence-of
Repo
Framework
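
To make the clipping threshold mentioned in the abstract above concrete, here is a minimal sketch of a textbook inverse propensity weighting estimate with clipped propensity scores. It does not implement the paper's error bounds; the function name, the `clip` parameter, and the toy data are assumptions for illustration.

```python
def ipw_ate(treatment, outcome, propensity, clip=0.05):
    """Inverse-propensity-weighted estimate of the average causal effect.

    treatment:  0/1 indicators
    outcome:    observed outcomes
    propensity: estimated P(treatment = 1 | covariates) for each unit
    clip:       propensities are clipped to [clip, 1 - clip] so that no
                single unit receives an extremely large weight
    """
    total = 0.0
    for t, y, e in zip(treatment, outcome, propensity):
        e = min(max(e, clip), 1.0 - clip)
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / len(outcome)


# Hypothetical toy data: the third unit has an extreme propensity score.
treatment = [1, 1, 1, 0, 0, 0]
outcome = [3.0, 2.5, 4.0, 1.0, 1.5, 0.5]
propensity = [0.6, 0.5, 0.02, 0.4, 0.5, 0.3]

for clip in (0.0, 0.05, 0.2):
    print(f"clip={clip:.2f}  estimate={ipw_ate(treatment, outcome, propensity, clip):.3f}")
```

A larger threshold lowers the variance contributed by extreme weights at the cost of bias, which is the trade-off a data-driven choice of threshold is meant to balance.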

Hierarchically Robust Representation Learning


Title	Hierarchically Robust Representation Learning
Authors	Qi Qian, Juhua Hu, Hao Li
Abstract	With the tremendous success of deep learning in visual tasks, the representations extracted from intermediate layers of learned models, that is, deep features, have attracted much attention from researchers. Previous empirical analysis shows that those features can contain appropriate semantic information. Therefore, with a model trained on a large-scale benchmark data set (e.g., ImageNet), the extracted features can work well on other tasks. In this work, we investigate this phenomenon and demonstrate that deep features can be suboptimal because they are learned by minimizing the empirical risk. When the data distribution of the target task is different from that of the benchmark data set, the performance of deep features can degrade. Hence, we propose a hierarchically robust optimization method to learn more generic features. Considering example-level and concept-level robustness simultaneously, we formulate the problem as a distributionally robust optimization problem with Wasserstein ambiguity set constraints, and propose an efficient algorithm that fits the conventional training pipeline. Experiments on benchmark data sets demonstrate the effectiveness of the robust deep representations.
Tasks	Representation Learning
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04047v2
PDF	https://arxiv.org/pdf/1911.04047v2.pdf
PWC	https://paperswithcode.com/paper/hierarchically-robust-representation-learning
Repo
Framework

Oscillator Circuit for Spike Neural Network with Sigmoid Like Activation Function and Firing Rate Coding


Title	Oscillator Circuit for Spike Neural Network with Sigmoid Like Activation Function and Firing Rate Coding
Authors	Andrei Velichko, Petr Boriskov
Abstract	The study presents an oscillator circuit for a spike neural network with the possibility of firing rate coding and sigmoid-like activation function. The circuit contains a switching element with an S-shaped current-voltage characteristic and two capacitors; one of the capacitors is shunted by a control resistor. The circuit is characterised by a strong dependence of the frequency of relaxation oscillations on the magnitude of the control resistor. The dependence has a sigmoid-like form and we present an analytical method for dependence calculation. Finally, we describe the concept of the spike neural network architecture with firing rate coding based on the presented circuit for creating neuromorphic devices and artificial intelligence.
Tasks
Published	2019-11-23
URL	https://arxiv.org/abs/1911.10351v1
PDF	https://arxiv.org/pdf/1911.10351v1.pdf
PWC	https://paperswithcode.com/paper/oscillator-circuit-for-spike-neural-network
Repo
Framework

Generative Image Translation for Data Augmentation of Bone Lesion Pathology


Title	Generative Image Translation for Data Augmentation of Bone Lesion Pathology
Authors	Anant Gupta, Srivas Venkatesh, Sumit Chopra, Christian Ledig
Abstract	Insufficient training data and severe class imbalance are often limiting factors when developing machine learning models for the classification of rare diseases. In this work, we address the problem of classifying bone lesions from X-ray images by increasing the small number of positive samples in the training set. We propose a generative data augmentation approach based on a cycle-consistent generative adversarial network that synthesizes bone lesions on images without pathology. We pose the generative task as an image-patch translation problem that we optimize specifically for distinct bones (humerus, tibia, femur). In experimental results, we confirm that the described method mitigates the class imbalance problem in the binary classification task of bone lesion detection. We show that the augmented training sets enable the training of superior classifiers achieving better performance on a held-out test set. Additionally, we demonstrate the feasibility of transfer learning and apply a generative model that was trained on one body part to another.
Tasks	Data Augmentation, Transfer Learning
Published	2019-02-06
URL	http://arxiv.org/abs/1902.02248v1
PDF	http://arxiv.org/pdf/1902.02248v1.pdf
PWC	https://paperswithcode.com/paper/generative-image-translation-for-data
Repo
Framework

KLUCB Approach to Copeland Bandits


Title	KLUCB Approach to Copeland Bandits
Authors	Nischal Agrawal, Prasanna Chaporkar
Abstract	The multi-armed bandit (MAB) problem is a reinforcement learning framework in which an agent tries to maximise her profit by proper selection of actions through absolute feedback for each action. The dueling bandits problem is a variation of the MAB problem in which an agent chooses a pair of actions and receives relative feedback for the chosen action pair. The dueling bandits problem is well suited for modelling a setting in which it is not possible to provide quantitative feedback for each action, but qualitative feedback for each action is preferred, as in the case of human feedback. Dueling bandits have been successfully applied in applications such as online rank elicitation, information retrieval, search engine improvement and clinical online recommendation. We propose a new method called Sup-KLUCB for the K-armed dueling bandit problem, specifically the Copeland bandit problem, by converting it into a standard MAB problem. Instead of using a MAB algorithm independently for each action in a pair, as in the Sparring and Self-Sparring algorithms, we combine a pair of actions and use it as one action. Previous UCB algorithms such as Relative Upper Confidence Bound (RUCB) can be applied only in the case of Condorcet dueling bandits, whereas this algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case. In our empirical results, the method outperforms the state-of-the-art Double Thompson Sampling (DTS) in the case of Copeland dueling bandits.
Tasks	Information Retrieval
Published	2019-02-07
URL	http://arxiv.org/abs/1902.02778v1
PDF	http://arxiv.org/pdf/1902.02778v1.pdf
PWC	https://paperswithcode.com/paper/klucb-approach-to-copeland-bandits
Repo
Framework
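
The distinction between Condorcet and Copeland winners drawn in the abstract above can be illustrated with a small preference matrix. This sketch only computes the winners from known pairwise probabilities; it is not the Sup-KLUCB algorithm, and the matrix values and function names are assumptions made for the example.

```python
def copeland_scores(pref):
    """Copeland score of each arm: the number of other arms it beats.

    pref[i][j] is the probability that arm i is preferred over arm j.
    """
    k = len(pref)
    return [sum(1 for j in range(k) if j != i and pref[i][j] > 0.5) for i in range(k)]


def copeland_winners(pref):
    """Arms with the maximal Copeland score; at least one always exists."""
    scores = copeland_scores(pref)
    best = max(scores)
    return [i for i, score in enumerate(scores) if score == best]


def condorcet_winner(pref):
    """The arm that beats every other arm, or None if there is no such arm."""
    k = len(pref)
    for i, score in enumerate(copeland_scores(pref)):
        if score == k - 1:
            return i
    return None


# Hypothetical preferences: arm 0 beats arms 1 and 2 but loses to arm 3,
# so no arm beats all others.
pref = [
    [0.5, 0.7, 0.6, 0.4],
    [0.3, 0.5, 0.8, 0.6],
    [0.4, 0.2, 0.5, 0.9],
    [0.6, 0.4, 0.1, 0.5],
]

print("Copeland scores: ", copeland_scores(pref))
print("Copeland winners:", copeland_winners(pref))
print("Condorcet winner:", condorcet_winner(pref))
```

When a Condorcet winner exists it is also the unique Copeland winner, which is why the Copeland setting includes the Condorcet setting as a special case.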