Paper Group ANR 490
Attending to Entities for Better Text Understanding
Title | Attending to Entities for Better Text Understanding |
Authors | Pengxiang Cheng, Katrin Erk |
Abstract | Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and on a range of end tasks such models have achieved state-of-the-art results, approaching human performance. This demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning, where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art, while containing only a tiny fraction of the parameters of GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions for applying similar techniques to other problems. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04361v1 |
https://arxiv.org/pdf/1911.04361v1.pdf | |
PWC | https://paperswithcode.com/paper/attending-to-entities-for-better-text |
Repo | |
Framework | |
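The paper's central mechanism is supervising self-attention with coreference links. A minimal sketch of such an auxiliary loss, under assumed tensor shapes (the supervised head, matrix layout, and loss form here are illustrative, not the authors' code):

```python
import torch

def coref_attention_loss(attn_weights, coref_matrix):
    """Cross-entropy between one attention head and a coreference target.

    attn_weights: (batch, seq, seq) attention distribution of the supervised head.
    coref_matrix: (batch, seq, seq) binary matrix, 1 where two tokens corefer.
    """
    # Normalize binary coreference links into a target distribution per token;
    # rows without any link get an all-zero target and contribute no loss.
    target = coref_matrix / coref_matrix.sum(dim=-1, keepdim=True).clamp(min=1.0)
    return -(target * attn_weights.clamp(min=1e-9).log()).sum(dim=-1).mean()

# Training would then minimize task_loss + lambda_coref * coref_attention_loss(...),
# analogous to the syntactic supervision of Strubell et al. (2018).
```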
Quasi-Monte Carlo sampling for machine-learning partial differential equations
Title | Quasi-Monte Carlo sampling for machine-learning partial differential equations |
Authors | Jingrun Chen, Rui Du, Panchi Li, Liyao Lyu |
Abstract | Solving partial differential equations in high dimensions with deep neural networks has attracted significant attention in recent years. In many scenarios, the loss function is defined as an integral over a high-dimensional domain. The Monte Carlo method, together with a deep neural network, is used to overcome the curse of dimensionality where classical methods fail, and a deep neural network often outperforms classical numerical methods in terms of both accuracy and efficiency. In this paper, we propose to use quasi-Monte Carlo sampling, instead of the Monte Carlo method, to approximate the loss function. To demonstrate the idea, we conduct numerical experiments in the framework of the deep Ritz method proposed by Weinan E and Bing Yu. For the same accuracy requirement, we observe that quasi-Monte Carlo sampling reduces the size of the training data set by more than two orders of magnitude compared to the Monte Carlo method. Under some assumptions, we prove that quasi-Monte Carlo sampling together with the deep neural network generates a convergent series with a rate proportional to the approximation accuracy of the quasi-Monte Carlo method for numerical integration. Numerically the fitted convergence rate is a bit smaller, but the proposed approach always outperforms the Monte Carlo method. It is worth mentioning that the convergence analysis is generic whenever a loss function is approximated by the quasi-Monte Carlo method, although the observations here are based on the deep Ritz method. |
Tasks | |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.01612v1 |
https://arxiv.org/pdf/1911.01612v1.pdf | |
PWC | https://paperswithcode.com/paper/quasi-monte-carlo-sampling-for-machine |
Repo | |
Framework | |
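The core claim is that low-discrepancy points approximate integral-type losses far better than i.i.d. samples of the same size. A minimal sketch with SciPy's Sobol sampler on a toy integrand (the integrand and sample size are illustrative; the deep Ritz loss would involve the network and its gradients):

```python
import numpy as np
from scipy.stats import qmc

def integrand(x):
    # Toy "loss density" on the unit square with a known integral.
    return np.sin(np.pi * x[:, 0]) * np.sin(np.pi * x[:, 1])

n = 2 ** 12
rng = np.random.default_rng(0)
mc_points = rng.random((n, 2))                                 # Monte Carlo
qmc_points = qmc.Sobol(d=2, scramble=True, seed=0).random(n)   # quasi-Monte Carlo

exact = (2 / np.pi) ** 2                                       # true integral value
print("MC  error:", abs(integrand(mc_points).mean() - exact))
print("QMC error:", abs(integrand(qmc_points).mean() - exact))
```

With QMC the observed error decays close to $O(N^{-1})$ rather than the Monte Carlo rate of $O(N^{-1/2})$, which is where the reported reduction in training set size comes from.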
Comparing the Performance of the LSTM and HMM Language Models via Structural Similarity
Title | Comparing the Performance of the LSTM and HMM Language Models via Structural Similarity |
Authors | Larkin Liu, Yu-Chung Lin, Joshua Reid |
Abstract | Language models based on deep neural networks and on traditional stochastic modelling have both become highly functional and effective in recent times. In this work, a general survey of the two types of language modelling is conducted. We investigate the effectiveness of the Hidden Markov Model (HMM) and the Long Short-Term Memory model (LSTM). We analyze the hidden state structures common to both models and present an analysis of the structural similarity of the hidden states common to both HMMs and LSTMs. We compare the LSTM's predictive accuracy and hidden state output with respect to the HMM for a varying number of hidden states. In this work, we show that the less complex HMM can serve as an appropriate approximation of the LSTM model. |
Tasks | Language Modelling |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04670v3 |
https://arxiv.org/pdf/1907.04670v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-the-performance-of-the-lstm-and-hmm |
Repo | |
Framework | |
Efficient Dialogue State Tracking by Selectively Overwriting Memory
Title | Efficient Dialogue State Tracking by Selectively Overwriting Memory |
Authors | Sungdong Kim, Sohee Yang, Gyuwan Kim, Sang-Woo Lee |
Abstract | Recent work in dialogue state tracking (DST) focuses on an open vocabulary-based setting to resolve the scalability and generalization issues of predefined ontology-based approaches. However, such models are computationally inefficient in that they predict the dialogue state at every turn from scratch. In this paper, we consider the dialogue state as an explicit fixed-size memory and propose a selective overwriting mechanism for more efficient DST. This mechanism consists of two steps: (1) predicting a state operation on each of the memory slots, and (2) overwriting the memory with new values, of which only a few are generated according to the predicted state operations. Moreover, reducing the burden on the decoder by decomposing DST into two sub-tasks and guiding the decoder to focus on only one of the tasks enables more effective training and improved performance. As a result, our proposed SOM-DST (Selectively Overwriting Memory for Dialogue State Tracking) achieves state-of-the-art joint goal accuracy of 51.38% on MultiWOZ 2.0 and 52.57% on MultiWOZ 2.1 in an open vocabulary-based DST setting. In addition, the large gap between the current accuracy and the accuracy obtained when ground-truth operations are given suggests that improving state operation prediction is a promising research direction for DST. |
Tasks | Dialogue State Tracking |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03906v1 |
https://arxiv.org/pdf/1911.03906v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-dialogue-state-tracking-by |
Repo | |
Framework | |
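The overwrite step itself is simple once the per-slot operations are predicted. A minimal sketch of the state update using the paper's four operations (the operation predictor and value generator here are hypothetical stand-ins):

```python
CARRYOVER, DELETE, DONTCARE, UPDATE = "carryover", "delete", "dontcare", "update"

def som_dst_step(state, slot_ops, generate_value):
    """state: dict slot -> value; slot_ops: dict slot -> predicted operation."""
    new_state = dict(state)
    for slot, op in slot_ops.items():
        if op == DELETE:
            new_state[slot] = None                  # drop the previous value
        elif op == DONTCARE:
            new_state[slot] = "dontcare"
        elif op == UPDATE:
            new_state[slot] = generate_value(slot)  # the decoder runs only here
        # CARRYOVER: keep the previous value untouched
    return new_state

state = {"hotel-area": "north", "hotel-stars": None}
ops = {"hotel-area": CARRYOVER, "hotel-stars": UPDATE}
print(som_dst_step(state, ops, lambda slot: "4"))  # decoder invoked for one slot only
```

Efficiency comes from the decoder generating values only for the (typically few) UPDATE slots instead of regenerating the full state each turn.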
Towards Generalizable Forgery Detection with Locality-aware AutoEncoder
Title | Towards Generalizable Forgery Detection with Locality-aware AutoEncoder |
Authors | Mengnan Du, Shiva Pentyala, Yuening Li, Xia Hu |
Abstract | With advancements in deep learning techniques, it is now possible to generate super-realistic fake images and videos. These manipulated forgeries could reach a mass audience and have adverse impacts on our society. Although much effort has been devoted to detecting forgeries, detector performance drops significantly on previously unseen but related manipulations, and detection generalization remains a problem. To bridge this gap, in this paper we propose the Locality-aware AutoEncoder (LAE), which combines fine-grained representation learning and locality enforcement in a unified framework. During training, we use a pixel-wise mask to regularize the local interpretation of LAE, forcing the model to learn intrinsic representations from the forgery region instead of capturing artifacts in the training set and learning spurious correlations. We further propose an active learning framework to select challenging candidates for labeling, reducing the annotation effort required to regularize interpretations. Experimental results indicate that LAE indeed focuses on the forgery regions to make its decisions, and that it achieves superior generalization performance compared to the state of the art on forgeries generated by alternative manipulation methods. |
Tasks | Active Learning, Representation Learning |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.05999v1 |
https://arxiv.org/pdf/1909.05999v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-generalizable-forgery-detection-with |
Repo | |
Framework | |
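A minimal sketch of the locality term: alongside the reconstruction loss, attribution mass falling outside the annotated forgery mask is penalized (shapes, the attribution map, and the weighting are assumptions, not the released implementation):

```python
import torch
import torch.nn.functional as F

def lae_loss(recon, target, attribution, forgery_mask, lam=1.0):
    """recon/target: (batch, C, H, W); attribution/forgery_mask: (batch, 1, H, W),
    with mask = 1 on forged pixels."""
    reconstruction = F.mse_loss(recon, target)
    # Penalize model attention (local interpretation) outside the forgery region,
    # discouraging shortcuts based on dataset-specific artifacts.
    locality = (attribution.abs() * (1.0 - forgery_mask)).mean()
    return reconstruction + lam * locality
```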
On weighted uncertainty sampling in active learning
Title | On weighted uncertainty sampling in active learning |
Authors | Vinay Jethava |
Abstract | This note explores probabilistic sampling weighted by uncertainty in active learning. This method has been used previously, and authors have tangentially remarked on its efficacy. The scheme has several benefits: (1) it is computationally cheap; (2) it can be implemented in a single-pass streaming fashion, which is a benefit when deployed in real-world systems where different subsystems perform the suggestion scoring and the extraction of user feedback; and (3) it is easily parameterizable. In this paper, we show on publicly available datasets that probabilistic weighting is often beneficial and strikes a good compromise between exploration and representation, especially when the starting set of labelled points is biased. |
Tasks | Active Learning |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.04928v1 |
https://arxiv.org/pdf/1909.04928v1.pdf | |
PWC | https://paperswithcode.com/paper/on-weighted-uncertainty-sampling-in-active |
Repo | |
Framework | |
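A minimal sketch of the scheme: draw query points with probability proportional to their uncertainty rather than taking the top-k most uncertain (entropy is used as the score here; any uncertainty measure would do):

```python
import numpy as np

def weighted_uncertainty_sample(probs, k, rng):
    """probs: (n, n_classes) predicted class probabilities for the unlabelled pool."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    weights = entropy / entropy.sum()
    # Probabilistic rather than greedy selection: high-entropy points are
    # favoured, but low-entropy regions still get occasional exploration.
    return rng.choice(len(probs), size=k, replace=False, p=weights)

rng = np.random.default_rng(0)
pool = rng.dirichlet(np.ones(3), size=100)   # stand-in softmax outputs
print(weighted_uncertainty_sample(pool, k=5, rng=rng))
```

The streaming property mentioned in the abstract follows because each point's weight depends only on its own score, not on the rest of the pool.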
TEINet: Towards an Efficient Architecture for Video Recognition
Title | TEINet: Towards an Efficient Architecture for Video Recognition |
Authors | Zhaoyang Liu, Donghao Luo, Yabiao Wang, Limin Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Tong Lu |
Abstract | Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have achieved remarkable progress in action recognition from videos; however, compared with their 2D counterparts, 3D convolutions often introduce a large number of parameters and incur high computational cost. To relieve this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet). The TEI module presents a different paradigm for learning temporal features by decoupling the modeling of channel correlation and temporal interaction. First, it contains a Motion Enhanced Module (MEM) that enhances motion-related features while suppressing irrelevant information (e.g., background). Then, it introduces a Temporal Interaction Module (TIM) that supplements temporal contextual information in a channel-wise manner. This two-stage modeling scheme is not only able to capture temporal structure flexibly and effectively, but is also efficient at inference time. We conduct extensive experiments to verify the effectiveness of TEINet on several benchmarks (e.g., Something-Something V1&V2, Kinetics, UCF101 and HMDB51). Our proposed TEINet achieves good recognition accuracy on these datasets while preserving high efficiency. |
Tasks | Video Recognition |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09435v1 |
https://arxiv.org/pdf/1911.09435v1.pdf | |
PWC | https://paperswithcode.com/paper/teinet-towards-an-efficient-architecture-for |
Repo | |
Framework | |
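A minimal sketch of the two ingredients: a motion-enhancing gate driven by frame differences (MEM-like) and a channel-wise, i.e. depthwise, temporal convolution (TIM-like). Shapes and details are guesses for illustration, not the released model:

```python
import torch
import torch.nn as nn

class TEISketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Depthwise 1D convolution over time: each channel interacts only with
        # its own temporal neighbourhood (channel-wise temporal interaction).
        self.tim = nn.Conv1d(channels, channels, kernel_size=3,
                             padding=1, groups=channels)

    def forward(self, x):
        # x: (batch, time, channels) per-frame feature vectors.
        diff = x[:, 1:] - x[:, :-1]                        # frame-to-frame motion
        gate = torch.sigmoid(torch.cat([diff, diff[:, -1:]], dim=1))
        x = x * gate                                       # enhance motion-related channels
        return self.tim(x.transpose(1, 2)).transpose(1, 2)

print(TEISketch(64)(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```

The `groups=channels` convolution is what decouples channel correlation from temporal interaction: no cross-channel mixing happens in the temporal step.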
Adversarial Orthogonal Regression: Two non-Linear Regressions for Causal Inference
Title | Adversarial Orthogonal Regression: Two non-Linear Regressions for Causal Inference |
Authors | M. Reza Heydari, Saber Salehkaleybar, Kun Zhang |
Abstract | We propose two nonlinear regression methods, named Adversarial Orthogonal Regression (AdOR) for additive noise models and Adversarial Orthogonal Structural Equation Model (AdOSE) for the general case of structural equation models. Both methods try to make the regression residual independent of the regressors while placing no assumptions on the noise distribution. In both methods, two adversarial networks are trained simultaneously: a regression network outputs predictions, and a loss network estimates mutual information (in AdOR) or KL-divergence (in AdOSE). The methods can be formulated as a minimax two-player game; at equilibrium, AdOR finds a deterministic map between inputs and output and estimates the mutual information between residual and inputs, while AdOSE estimates a conditional probability distribution of the output given the inputs. The proposed methods can be used as subroutines to address several learning problems in causality, such as causal direction determination (or, more generally, causal structure learning) and causal model estimation. Synthetic and real-world experiments demonstrate that the proposed methods perform remarkably well compared to previous solutions. |
Tasks | Causal Inference |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04454v1 |
https://arxiv.org/pdf/1909.04454v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-orthogonal-regression-two-non |
Repo | |
Framework | |
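A minimal sketch of the AdOR-style minimax, with a MINE-like (Donsker-Varadhan) mutual information estimator standing in for the paper's loss network (the architectures, learning rates, and toy data are all illustrative):

```python
import torch

def mi_lower_bound(T, a, b):
    """Donsker-Varadhan lower bound on I(a; b) using statistics network T."""
    joint = T(torch.cat([a, b], dim=1)).mean()
    shuffled = b[torch.randperm(b.size(0))]            # break the pairing
    marginal = torch.logsumexp(T(torch.cat([a, shuffled], dim=1)), dim=0) \
               - torch.log(torch.tensor(float(a.size(0))))
    return (joint - marginal).squeeze()

f = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
T = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_T = torch.optim.Adam(T.parameters(), lr=1e-3)

x = torch.randn(256, 1)
y = x ** 2 + 0.1 * torch.randn(256, 1)                 # additive noise model
for step in range(500):
    mi = mi_lower_bound(T, f(x) - y, x)                # residual vs. regressors
    opt_T.zero_grad(); (-mi).backward(); opt_T.step()  # T maximizes the bound
    mi = mi_lower_bound(T, f(x) - y, x)
    opt_f.zero_grad(); mi.backward(); opt_f.step()     # f minimizes the dependence
```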
Fast Mean Estimation with Sub-Gaussian Rates
Title | Fast Mean Estimation with Sub-Gaussian Rates |
Authors | Yeshwanth Cherapanamjeri, Nicolas Flammarion, Peter L. Bartlett |
Abstract | We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d. samples and that has error bounds matching the sub-Gaussian case. The only assumptions we make about the data distribution are that it has finite mean and covariance; in particular, we make no assumptions about higher-order moments. Like the polynomial-time estimator introduced by Hopkins (2018), which is based on the sum-of-squares hierarchy, our estimator achieves optimal statistical efficiency in this challenging setting, but it has a significantly faster runtime and a simpler analysis. |
Tasks | |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.01998v1 |
http://arxiv.org/pdf/1902.01998v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-mean-estimation-with-sub-gaussian-rates |
Repo | |
Framework | |
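The paper's estimator is too involved for a snippet, but the classical median-of-means baseline it is benchmarked against fits in a few lines and conveys the heavy-tail setting (coordinate-wise median shown for simplicity; the theory favours the geometric median):

```python
import numpy as np

def median_of_means(x, n_buckets):
    """x: (n, d) samples with finite covariance; robust mean estimate."""
    buckets = np.array_split(x, n_buckets)
    bucket_means = np.stack([b.mean(axis=0) for b in buckets])
    return np.median(bucket_means, axis=0)  # outlier buckets get voted down

rng = np.random.default_rng(0)
x = rng.standard_t(df=2.5, size=(10_000, 5))   # heavy-tailed but finite variance
print(median_of_means(x, n_buckets=20))        # near the true mean, 0
```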
A discriminative approach for finding and characterizing positivity violations using decision trees
Title | A discriminative approach for finding and characterizing positivity violations using decision trees |
Authors | Ehud Karavani, Peter Bak, Yishai Shimoni |
Abstract | The assumption of positivity in causal inference (also known as common support or covariate overlap) is necessary to obtain valid causal estimates. Therefore, confirming that it holds in a given dataset is an important first step of any causal analysis. Most common methods to date are insufficient for discovering non-positivity, as they do not scale to modern high-dimensional covariate spaces or cannot pinpoint the subpopulation violating positivity. To overcome these issues, we suggest harnessing decision trees for detecting violations. By dividing the covariate space into mutually exclusive regions, each with maximized homogeneity of treatment groups, decision trees can be used to automatically detect subspaces violating positivity. By augmenting the method with an additional random forest model, we can quantify the robustness of the violation within each subspace. This solution is scalable and provides an interpretable characterization of the subspaces in which violations occur. We provide a visualization of the stratification rules that define each subpopulation, combined with the severity of the positivity violation within it, as well as an interactive version of the visualization that allows a deeper dive into the properties of each subspace. |
Tasks | Causal Inference |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08127v1 |
https://arxiv.org/pdf/1907.08127v1.pdf | |
PWC | https://paperswithcode.com/paper/a-discriminative-approach-for-finding-and |
Repo | |
Framework | |
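A minimal sketch of the detection step with scikit-learn: fit a shallow tree to predict treatment from covariates, then flag leaves dominated by one treatment group (the depth, leaf size, and 5% threshold are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
t = (rng.random(5000) < 0.5).astype(int)
t[X[:, 0] > 1.5] = 1        # planted violation: nobody with x0 > 1.5 is untreated

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=100).fit(X, t)
leaves = tree.apply(X)      # leaf index for every sample
for leaf_id in np.unique(leaves):
    p_treated = t[leaves == leaf_id].mean()
    if p_treated < 0.05 or p_treated > 0.95:
        print(f"leaf {leaf_id}: P(treated) = {p_treated:.2f} -> positivity suspect")
```

The paper additionally fits a random forest to score how robust each flagged subspace is, and renders the leaf-defining split rules as an interactive visualization.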
Quantifying Error in the Presence of Confounders for Causal Inference
Title | Quantifying Error in the Presence of Confounders for Causal Inference |
Authors | Rathin Desai, Amit Sharma |
Abstract | Estimating the average causal effect (ACE) is useful whenever we want to know the effect of an intervention on a given outcome. In the absence of a randomized experiment, many methods, such as stratification and inverse propensity weighting, have been proposed to estimate the ACE. However, it is hard to know which method is optimal for a given dataset or which hyperparameters to use for a chosen method. To this end, we provide a framework to characterize the loss of a causal inference method against the true ACE by framing causal inference as a representation learning problem. We show that many popular methods, including back-door methods, can be considered weighting or representation learning algorithms, and we provide general error bounds for their causal estimates. In addition, we consider the case when unobserved variables can confound the causal estimate and extend the proposed bounds using principles of robust statistics, treating confounding as contamination under the Huber contamination model. These bounds are also estimable; as an example, we provide empirical bounds for the Inverse Propensity Weighting (IPW) estimator and show how the bounds can be used to optimize the threshold for clipping extreme propensity scores. Our work provides a new way to reason about competing estimators and opens up the potential of deriving new methods by minimizing the proposed error bounds. |
Tasks | Causal Inference, Representation Learning |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04805v1 |
https://arxiv.org/pdf/1907.04805v1.pdf | |
PWC | https://paperswithcode.com/paper/quantifying-error-in-the-presence-of |
Repo | |
Framework | |
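A minimal sketch of the IPW estimator with clipped propensity scores, the estimator whose clipping threshold the proposed bounds can be used to tune (the threshold and the simulated data are illustrative):

```python
import numpy as np

def ipw_ate(y, t, propensity, clip=0.05):
    e = np.clip(propensity, clip, 1 - clip)   # clip extreme scores
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
e = 1 / (1 + np.exp(-2 * x))                  # true propensity score
t = (rng.random(10_000) < e).astype(int)      # confounded treatment assignment
y = 2.0 * t + x + rng.normal(size=10_000)     # true ATE = 2.0
print(ipw_ate(y, t, e))                       # close to 2.0; a naive mean difference is not
```

Raising `clip` lowers the variance from extreme weights but introduces bias; the paper's estimable bounds give a principled way to pick this trade-off.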
Hierarchically Robust Representation Learning
Title | Hierarchically Robust Representation Learning |
Authors | Qi Qian, Juhua Hu, Hao Li |
Abstract | With the tremendous success of deep learning in visual tasks, the representations extracted from intermediate layers of learned models, that is, deep features, attract much attention from researchers. Previous empirical analysis shows that those features can contain appropriate semantic information. Therefore, with a model trained on a large-scale benchmark data set (e.g., ImageNet), the extracted features can work well on other tasks. In this work, we investigate this phenomenon and demonstrate that deep features can be suboptimal because they are learned by minimizing the empirical risk: when the data distribution of the target task differs from that of the benchmark data set, the performance of deep features can degrade. Hence, we propose a hierarchically robust optimization method to learn more generic features. Considering example-level and concept-level robustness simultaneously, we formulate the problem as a distributionally robust optimization problem with Wasserstein ambiguity set constraints, and we propose an efficient algorithm that fits the conventional training pipeline. Experiments on benchmark data sets demonstrate the effectiveness of the robust deep representations. |
Tasks | Representation Learning |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04047v2 |
https://arxiv.org/pdf/1911.04047v2.pdf | |
PWC | https://paperswithcode.com/paper/hierarchically-robust-representation-learning |
Repo | |
Framework | |
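A minimal sketch of the example-level ingredient using a KL-regularized worst case (exponential tilting), a generic distributionally robust surrogate; the paper's actual formulation uses Wasserstein ambiguity sets and also handles concept-level robustness:

```python
import torch

def dro_loss(per_example_loss, temperature=1.0):
    """Soft worst-case risk: weights proportional to exp(loss / temperature)."""
    weights = torch.softmax(per_example_loss.detach() / temperature, dim=0)
    return (weights * per_example_loss).sum()

losses = torch.tensor([0.1, 0.2, 2.5], requires_grad=True)
print(dro_loss(losses))   # dominated by the hard example, unlike the plain mean
```

As `temperature` grows this recovers ordinary empirical risk; as it shrinks, it approaches the loss of the single worst example.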
Oscillator Circuit for Spike Neural Network with Sigmoid Like Activation Function and Firing Rate Coding
Title | Oscillator Circuit for Spike Neural Network with Sigmoid Like Activation Function and Firing Rate Coding |
Authors | Andrei Velichko, Petr Boriskov |
Abstract | The study presents an oscillator circuit for a spiking neural network with the possibility of firing rate coding and a sigmoid-like activation function. The circuit contains a switching element with an S-shaped current-voltage characteristic and two capacitors; one of the capacitors is shunted by a control resistor. The circuit is characterised by a strong dependence of the frequency of relaxation oscillations on the magnitude of the control resistor. The dependence has a sigmoid-like form, and we present an analytical method for calculating it. Finally, we describe the concept of a spiking neural network architecture with firing rate coding based on the presented circuit for creating neuromorphic devices and artificial intelligence. |
Tasks | |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10351v1 |
https://arxiv.org/pdf/1911.10351v1.pdf | |
PWC | https://paperswithcode.com/paper/oscillator-circuit-for-spike-neural-network |
Repo | |
Framework | |
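For intuition on rate coding with relaxation oscillations, a textbook RC threshold-oscillator model shows how firing frequency tracks a control resistance (this generic model is not the paper's S-type switch circuit, whose frequency-resistance curve is sigmoid-like):

```python
import numpy as np

def relaxation_frequency(R_control, C=1e-6, V_dd=5.0, V_th=3.0, V_reset=0.5):
    """Charge C through R_control toward V_dd; fire and reset on reaching V_th."""
    t_charge = R_control * C * np.log((V_dd - V_reset) / (V_dd - V_th))
    return 1.0 / t_charge

for R in (1e3, 1e4, 1e5):
    print(f"R = {R:8.0f} ohm -> f = {relaxation_frequency(R):10.1f} Hz")
```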
Generative Image Translation for Data Augmentation of Bone Lesion Pathology
Title | Generative Image Translation for Data Augmentation of Bone Lesion Pathology |
Authors | Anant Gupta, Srivas Venkatesh, Sumit Chopra, Christian Ledig |
Abstract | Insufficient training data and severe class imbalance are often limiting factors when developing machine learning models for the classification of rare diseases. In this work, we address the problem of classifying bone lesions from X-ray images by increasing the small number of positive samples in the training set. We propose a generative data augmentation approach based on a cycle-consistent generative adversarial network that synthesizes bone lesions on images without pathology. We pose the generative task as an image-patch translation problem that we optimize specifically for distinct bones (humerus, tibia, femur). In experimental results, we confirm that the described method mitigates the class imbalance problem in the binary classification task of bone lesion detection. We show that the augmented training sets enable the training of superior classifiers achieving better performance on a held-out test set. Additionally, we demonstrate the feasibility of transfer learning and apply a generative model that was trained on one body part to another. |
Tasks | Data Augmentation, Transfer Learning |
Published | 2019-02-06 |
URL | http://arxiv.org/abs/1902.02248v1 |
http://arxiv.org/pdf/1902.02248v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-image-translation-for-data |
Repo | |
Framework | |
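A minimal sketch of the cycle-consistency term at the heart of the approach: translating healthy -> lesion -> healthy should reproduce the input patch (the one-layer "generators" are untrained stand-ins; the real ones are deep CycleGAN generators):

```python
import torch
import torch.nn as nn

G = nn.Conv2d(1, 1, 3, padding=1)    # healthy -> lesion generator (stand-in)
F_ = nn.Conv2d(1, 1, 3, padding=1)   # lesion -> healthy generator (stand-in)

healthy = torch.randn(8, 1, 64, 64)  # batch of healthy bone patches
lesion = torch.randn(8, 1, 64, 64)   # batch of lesion patches

# L1 cycle loss in both directions; full training adds adversarial losses so
# that G(healthy) looks like a real lesion patch and vice versa.
cycle_loss = (F_(G(healthy)) - healthy).abs().mean() \
           + (G(F_(lesion)) - lesion).abs().mean()
print(cycle_loss.item())
```

Trained this way per bone (humerus, tibia, femur), `G` can mint synthetic positive patches to rebalance the classifier's training set.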
KLUCB Approach to Copeland Bandits
Title | KLUCB Approach to Copeland Bandits |
Authors | Nischal Agrawal, Prasanna Chaporkar |
Abstract | The multi-armed bandit (MAB) problem is a reinforcement learning framework in which an agent tries to maximise her profit by properly selecting actions based on absolute feedback for each action. The dueling bandits problem is a variation of the MAB problem in which an agent chooses a pair of actions and receives relative feedback for the chosen action pair. The dueling bandits problem is well suited for modelling settings in which it is not possible to provide quantitative feedback for each action, but qualitative feedback is preferred, as in the case of human feedback. Dueling bandits have been successfully applied in applications such as online rank elicitation, information retrieval, search engine improvement and online clinical recommendation. We propose a new method called Sup-KLUCB for the K-armed dueling bandit problem, specifically the Copeland bandit problem, by converting it into a standard MAB problem. Instead of using a MAB algorithm independently for each action in a pair, as in the Sparring and Self-Sparring algorithms, we combine a pair of actions and use it as one action. Previous UCB algorithms such as Relative Upper Confidence Bound (RUCB) apply only to Condorcet dueling bandits, whereas our algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case. Our empirical results outperform the state-of-the-art Double Thompson Sampling (DTS) in the case of Copeland dueling bandits. |
Tasks | Information Retrieval |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02778v1 |
http://arxiv.org/pdf/1902.02778v1.pdf | |
PWC | https://paperswithcode.com/paper/klucb-approach-to-copeland-bandits |
Repo | |
Framework | |
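A minimal sketch of the KL-UCB index for Bernoulli rewards, the standard building block that Sup-KLUCB applies after reducing the dueling problem to a standard MAB (the reduction itself is the paper's contribution and is not shown):

```python
import math

def kl_bernoulli(p, q):
    p = min(max(p, 1e-9), 1 - 1e-9)
    q = min(max(q, 1e-9), 1 - 1e-9)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, precision=1e-6):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection."""
    bound = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

print(klucb_index(mean=0.4, pulls=20, t=100))  # optimistic index above 0.4
```

At each round the agent plays the arm (here, an action pair treated as one arm) with the highest index.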