Paper Group ANR 416
On the Computational Power of RNNs
Title | On the Computational Power of RNNs |
Authors | Samuel A. Korsky, Robert C. Berwick |
Abstract | Recent neural network architectures such as the basic recurrent neural network (RNN) and Gated Recurrent Unit (GRU) have gained prominence as end-to-end learning architectures for natural language processing tasks. But what is the computational power of such systems? We prove that finite precision RNNs with one hidden layer and ReLU activation and finite precision GRUs are exactly as computationally powerful as deterministic finite automata. Allowing arbitrary precision, we prove that RNNs with one hidden layer and ReLU activation are at least as computationally powerful as pushdown automata. If we also allow infinite precision, infinite edge weights, and nonlinear output activation functions, we prove that GRUs are at least as computationally powerful as pushdown automata. All results are shown constructively. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06349v2 |
https://arxiv.org/pdf/1906.06349v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-computational-power-of-rnns |
Repo | |
Framework | |
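To make the DFA direction of the result above concrete, here is a minimal sketch, assuming a toy two-state parity automaton: a finite-precision, one-hidden-layer ReLU RNN whose hidden state one-hot encodes (DFA state, input symbol) pairs. It illustrates the flavor of such constructions rather than the paper's exact one; the pair encoding and all names are illustrative choices.

```python
import numpy as np

Q, SIGMA = 2, 2                      # parity DFA: states {even, odd}, alphabet {0, 1}
delta = lambda s, a: s ^ a           # toy transition function
idx = lambda s, a: s * SIGMA + a     # hidden unit for the pair (state, input)

# One-hidden-layer update h' = relu(W @ h + U @ x + b), with h one-hot over pairs.
W = np.zeros((Q * SIGMA, Q * SIGMA))
U = np.zeros((Q * SIGMA, SIGMA))
b = -np.ones(Q * SIGMA)
for sp in range(Q):
    for ap in range(SIGMA):
        U[idx(sp, ap), ap] = 1.0                 # fires only on input symbol ap
        for s in range(Q):
            if delta(s, ap) == sp:               # fires only if the DFA moves s -> sp on ap
                for a in range(SIGMA):
                    W[idx(sp, ap), idx(s, a)] = 1.0

def run(word, q0=0):
    h = np.zeros(Q * SIGMA)
    h[idx(q0, 0)] = 1.0                          # encode the start state (input slot is a dummy)
    for sym in word:
        x = np.eye(SIGMA)[sym]
        h = np.maximum(W @ h + U @ x + b, 0.0)   # exactly one affine map + ReLU per step
    return int(np.argmax(h)) // SIGMA            # recover the current DFA state

assert run([1, 0, 1, 1]) == 1                    # odd number of ones -> state "odd"
assert run([1, 1, 0, 0]) == 0                    # even number of ones -> state "even"
```

Each time step applies a single affine map followed by a ReLU, so the simulation stays within the finite-precision, one-hidden-layer regime the theorem addresses.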
Interpretable Almost-Matching-Exactly With Instrumental Variables
Title | Interpretable Almost-Matching-Exactly With Instrumental Variables |
Authors | M. Usaid Awan, Yameng Liu, Marco Morucci, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky |
Abstract | Uncertainty in the estimation of the causal effect in observational studies is often due to unmeasured confounding, i.e., the presence of unobserved covariates linking treatments and outcomes. Instrumental Variables (IV) are commonly used to reduce the effects of unmeasured confounding. Existing methods for IV estimation either require strong parametric assumptions, use arbitrary distance metrics, or do not scale well to large datasets. We propose a matching framework for IV in the presence of observed categorical confounders that addresses these weaknesses. Our method first matches units exactly, and then successively drops variables to approximately match the remaining units on as many variables as possible. We show that our algorithm constructs better matches than other existing methods on simulated datasets, and we produce interesting results in an application to political canvassing. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11658v2 |
https://arxiv.org/pdf/1906.11658v2.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-almost-matching-exactly-with |
Repo | |
Framework | |
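A minimal sketch of the almost-matching-exactly idea in the abstract above, assuming purely categorical covariates and a toy drop-order heuristic; the paper's algorithm additionally accounts for treatment and instrument structure, which this sketch omits.

```python
from collections import defaultdict

def ame_match(units, covariates):
    """units: list of dicts mapping covariate name -> category."""
    groups, matched = [], set()
    active = list(covariates)
    unmatched = list(range(len(units)))
    while unmatched and active:
        buckets = defaultdict(list)
        for i in unmatched:
            buckets[tuple(units[i][c] for c in active)].append(i)
        for members in buckets.values():
            if len(members) >= 2:                 # enough units to form a matched group
                groups.append((tuple(active), members))
                matched.update(members)
        unmatched = [i for i in unmatched if i not in matched]
        active.pop()                              # toy heuristic: drop the last covariate
    return groups

units = [{"age": "30s", "edu": "hs"}, {"age": "30s", "edu": "hs"},
         {"age": "40s", "edu": "hs"}, {"age": "40s", "edu": "ba"}]
print(ame_match(units, ["age", "edu"]))   # exact match on both, then on "age" only
```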
Learning Deterministic Weighted Automata with Queries and Counterexamples
Title | Learning Deterministic Weighted Automata with Queries and Counterexamples |
Authors | Gail Weiss, Yoav Goldberg, Eran Yahav |
Abstract | We present an algorithm for extraction of a probabilistic deterministic finite automaton (PDFA) from a given black-box language model, such as a recurrent neural network (RNN). The algorithm is a variant of the exact-learning algorithm L*, adapted to a probabilistic setting with noise. The key insight is the use of conditional probabilities for observations, and the introduction of a local tolerance when comparing them. When applied to RNNs, our algorithm often achieves better word error rate (WER) and normalised discounted cumulative gain (NDCG) than those achieved by spectral extraction of weighted finite automata (WFA) from the same networks. PDFAs are substantially more expressive than n-grams, and are guaranteed to be stochastic and deterministic - unlike spectrally extracted WFAs. |
Tasks | Language Modelling |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13895v2 |
https://arxiv.org/pdf/1910.13895v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-deterministic-weighted-automata-with |
Repo | |
Framework | |
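A minimal sketch of the local-tolerance test described above, assuming an absolute per-token comparison of conditional probabilities; the function name and the exact criterion are illustrative assumptions, not the paper's precise definition.

```python
def similar(p, q, t=0.1):
    """p, q: dicts mapping token -> conditional next-token probability at a state."""
    tokens = set(p) | set(q)
    return all(abs(p.get(a, 0.0) - q.get(a, 0.0)) <= t for a in tokens)

# During extraction, a newly observed state is merged with an existing PDFA state
# only if `similar` holds; otherwise it is added as a fresh state.
print(similar({"a": 0.55, "b": 0.45}, {"a": 0.60, "b": 0.40}))  # True: within tolerance
print(similar({"a": 0.90, "b": 0.10}, {"a": 0.60, "b": 0.40}))  # False: distributions differ
```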
Unbiased estimators for random design regression
Title | Unbiased estimators for random design regression |
Authors | Michał Dereziński, Manfred K. Warmuth, Daniel Hsu |
Abstract | In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over d-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the sample is drawn i.i.d. from the input distribution, the least squares solution for that sample can be viewed as the natural estimator of the optimum. Unfortunately, this estimator almost always incurs an undesirable bias coming from the randomness of the input points. In this paper we show that it is possible to draw a non-i.i.d. sample of input points such that, regardless of the response model, the least squares solution is an unbiased estimator of the optimum. Moreover, this sample can be produced efficiently by augmenting a previously drawn i.i.d. sample with an additional set of d points drawn jointly from the input distribution rescaled by the squared volume spanned by the points. Motivated by this, we develop a theoretical framework for studying volume-rescaled sampling, and in the process prove a number of new matrix expectation identities. We use them to show that for any input distribution and $\epsilon>0$ there is a random design consisting of $O(d\log d+ d/\epsilon)$ points from which an unbiased estimator can be constructed whose square loss over the entire distribution is with high probability bounded by $1+\epsilon$ times the loss of the optimum. We provide efficient algorithms for generating such unbiased estimators in a number of practical settings and support our claims experimentally. |
Tasks | |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03411v1 |
https://arxiv.org/pdf/1907.03411v1.pdf | |
PWC | https://paperswithcode.com/paper/unbiased-estimators-for-random-design |
Repo | |
Framework | |
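A minimal sketch of volume-rescaled sampling over a small finite pool of candidate points: draw a size-d subset with probability proportional to the squared volume it spans, det(X_S)^2. Brute-force enumeration is used for clarity; the paper gives efficient samplers, and the pool and dimensions here are illustrative.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
d, pool_size = 3, 12
X = rng.normal(size=(pool_size, d))          # candidate input points

subsets = list(combinations(range(pool_size), d))
weights = np.array([np.linalg.det(X[list(S)]) ** 2 for S in subsets])
probs = weights / weights.sum()              # Pr[S] proportional to squared volume

S = subsets[rng.choice(len(subsets), p=probs)]
print("volume-rescaled subset:", S)
```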
Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling
Title | Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling |
Authors | Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian |
Abstract | Geometric deep learning is increasingly important thanks to the popularity of 3D sensors. Inspired by recent advances in the NLP domain, we introduce the self-attention transformer to consume point clouds. We develop Point Attention Transformers (PATs), using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention. We demonstrate its ability to process size-varying inputs, and prove its permutation equivariance. Moreover, prior work relies on data-dependent heuristics (e.g., Furthest Point Sampling) to hierarchically select subsets of input points. We therefore propose, for the first time, an end-to-end learnable and task-agnostic sampling operation, named Gumbel Subset Sampling (GSS), to select a representative subset of input points. Equipped with Gumbel-Softmax, it produces a “soft” continuous subset in the training phase, and a “hard” discrete subset in the test phase. By selecting representative subsets in a hierarchical fashion, the networks learn a stronger representation of the input sets at lower computation cost. Experiments on classification and segmentation benchmarks show the effectiveness and efficiency of our methods. Furthermore, we propose a novel application that processes event camera streams as point clouds, and achieve state-of-the-art performance on the DVS128 Gesture Dataset. |
Tasks | |
Published | 2019-04-06 |
URL | http://arxiv.org/abs/1904.03375v1 |
http://arxiv.org/pdf/1904.03375v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-point-clouds-with-self-attention-and |
Repo | |
Framework | |
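A minimal sketch of the Gumbel-Softmax mechanism behind Gumbel Subset Sampling, simplified to selecting a single point: soft continuous weights during training, a hard discrete choice at test time. The temperature and per-point scores are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5, hard=False):
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                      # Gumbel(0, 1) noise
    z = (logits + g) / tau
    y = np.exp(z - z.max())
    y /= y.sum()                                 # soft selection weights
    if hard:
        onehot = np.zeros_like(y)
        onehot[np.argmax(y)] = 1.0               # discrete selection at test time
        return onehot
    return y

scores = np.array([1.0, 2.0, 0.5, 0.1])          # hypothetical per-point scores
print(gumbel_softmax(scores))                    # training: “soft” continuous weights
print(gumbel_softmax(scores, hard=True))         # test: “hard” discrete choice
```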
Benchmarking the Performance and Power of AI Accelerators for AI Training
Title | Benchmarking the Performance and Power of AI Accelerators for AI Training |
Authors | Yuxin Wang, Qiang Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Kaiyong Zhao, Xiaowen Chu |
Abstract | Deep learning has become widely used in complex AI applications. Yet, training a deep neural network (DNN) model requires a huge amount of computation, taking a long running time and consuming a lot of energy. Nowadays, many-core AI accelerators (e.g., GPUs and TPUs) are designed to improve AI training performance. However, processors from different vendors perform very differently in terms of performance and power consumption. To investigate the differences among several popular off-the-shelf processors (i.e., Intel CPU, NVIDIA GPU, AMD GPU and Google TPU) in training DNNs, we carry out a detailed benchmark study on the performance and power (when possible) of these processors when training a representative set of DNNs, including three classical convolutional neural networks (CNNs), a recurrent neural network (LSTM), Deep Speech 2, and Transformer. We try to understand the impact of the hardware, the vendor’s software library, and the deep learning framework on the final training performance. Our evaluation results are valuable in two directions. For end-users, they provide a guide for selecting a proper accelerator for training DNN models. For vendors, the advantages and disadvantages revealed by our evaluation could inform future architecture design and software library optimization. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06842v5 |
https://arxiv.org/pdf/1909.06842v5.pdf | |
PWC | https://paperswithcode.com/paper/performance-and-power-evaluation-of-ai |
Repo | |
Framework | |
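A minimal sketch of the benchmarking methodology described above, assuming a placeholder train_step and batch size: throughput is measured over a fixed number of iterations after warm-up. Power readings, where available, would come from vendor tooling (e.g., nvidia-smi) outside this snippet.

```python
import time

BATCH_SIZE, WARMUP, ITERS = 64, 5, 50            # illustrative settings

def benchmark(train_step):
    for _ in range(WARMUP):                      # exclude one-time compilation/cache effects
        train_step()
    start = time.perf_counter()
    for _ in range(ITERS):
        train_step()
    elapsed = time.perf_counter() - start
    return ITERS * BATCH_SIZE / elapsed          # samples per second

print(f"{benchmark(lambda: time.sleep(0.01)):.1f} samples/s (dummy step)")
```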
Embedded Agency
Title | Embedded Agency |
Authors | Abram Demski, Scott Garrabrant |
Abstract | Traditional models of rational action treat the agent as though it is cleanly separated from its environment, and can act on that environment from the outside. Such agents have a known functional relationship with their environment, can model their environment in every detail, and do not need to reason about themselves or their internal parts. We provide an informal survey of obstacles to formalizing good reasoning for agents embedded in their environment. Such agents must optimize an environment that is not of type “function”; they must rely on models that fit within the modeled environment; and they must reason about themselves as just another physical system, made of parts that can be modified and that can work at cross purposes. |
Tasks | |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09469v1 |
http://arxiv.org/pdf/1902.09469v1.pdf | |
PWC | https://paperswithcode.com/paper/embedded-agency |
Repo | |
Framework | |
Improving Tree-LSTM with Tree Attention
Title | Improving Tree-LSTM with Tree Attention |
Authors | Mahtab Ahmed, Muhammad Rifayat Samee, Robert E. Mercer |
Abstract | In Natural Language Processing (NLP), we often need to extract information from tree topology. Sentence structure can be represented via a dependency tree or a constituency tree structure. For this reason, a variant of LSTMs, named Tree-LSTM, was proposed to work on tree topology. In this paper, we design a generalized attention framework for both dependency and constituency trees by encoding variants of decomposable attention inside a Tree-LSTM cell. We evaluated our models on a semantic relatedness task, achieving notable results compared with Tree-LSTM based methods without attention as well as other neural and non-neural methods, and competitive results compared with Tree-LSTM based methods with attention. |
Tasks | |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.00066v1 |
http://arxiv.org/pdf/1901.00066v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-tree-lstm-with-tree-attention |
Repo | |
Framework | |
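A minimal sketch of placing decomposable attention inside a child-sum Tree-LSTM cell: children hidden states are combined by attention weights instead of a plain sum. The dimensions, the bilinear scoring function, and the random weights are illustrative assumptions, not the paper's exact cell.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # hidden size (illustrative)
Wq = rng.normal(scale=0.1, size=(D, D))  # attention projection (illustrative)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attentive_child_sum(x, child_h):
    """x: node input embedding (D,); child_h: (num_children, D) hidden states."""
    scores = child_h @ (Wq @ x)          # score each child against the node input
    alpha = softmax(scores)              # attention weights over children
    return alpha @ child_h               # weighted summary replaces the plain child sum

# Inside the Tree-LSTM cell, this summary h_tilde would feed the usual gates,
# e.g. i = sigmoid(W_i @ x + U_i @ h_tilde + b_i), and likewise for f, o, u.
x = rng.normal(size=D)
child_h = rng.normal(size=(3, D))
print(attentive_child_sum(x, child_h))
```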
Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains
Title | Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains |
Authors | Sathya N. Ravi, Abhay Venkatesh, Glenn Moo Fung, Vikas Singh |
Abstract | Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The $F_\beta$ measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent examples that are used in many applications. We find that for most medium to large sized datasets, scalability issues severely limit our ability to leverage the benefits of such regularizers. Importantly, despite some recent progress, the key technical impediment is that such objectives remain difficult to optimize via backpropagation procedures. While an efficient general-purpose strategy for this problem still remains elusive, in this paper, we show that for many data-dependent nondecomposable regularizers that are relevant in applications, sizable gains in efficiency are possible with minimal code-level changes; in other words, no specialized tools or numerical schemes are needed. Our procedure involves a reparameterization followed by a partial dualization; this leads to a formulation that has provably cheap projection operators. We present a detailed analysis of the runtime and convergence properties of our algorithm. On the experimental side, we show that a direct use of our scheme significantly improves the state-of-the-art IoU measures reported for the MSCOCO Stuff segmentation dataset. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12398v1 |
https://arxiv.org/pdf/1909.12398v1.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-nondecomposable-data-dependent |
Repo | |
Framework | |
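A minimal sketch of the reparameterize-then-partially-dualize recipe on a toy constrained problem: alternate a primal gradient step on the Lagrangian with a projected dual ascent step, where the dual projection is cheap (a clip to the nonnegative orthant). The quadratic objective, the constraint, and the step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=2)
lam, eta_w, eta_l = 0.0, 0.05, 0.05

loss = lambda w: 0.5 * np.sum(w ** 2) - w[0]        # toy decomposable loss
g = lambda w: 1.0 - np.sum(w)                       # toy nondecomposable constraint g(w) <= 0
grad_loss = lambda w: w - np.array([1.0, 0.0])
grad_g = lambda w: -np.ones(2)

for _ in range(500):
    w -= eta_w * (grad_loss(w) + lam * grad_g(w))   # primal descent on the Lagrangian
    lam = max(lam + eta_l * g(w), 0.0)              # dual ascent + cheap projection onto lam >= 0

print(w, g(w))                                      # constraint approximately satisfied
```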
Adversarial Networks and Autoencoders: The Primal-Dual Relationship and Generalization Bounds
Title | Adversarial Networks and Autoencoders: The Primal-Dual Relationship and Generalization Bounds |
Authors | Hisham Husain, Richard Nock, Robert C. Williamson |
Abstract | Since the introduction of Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), the literature on generative modelling has witnessed an overwhelming resurgence. The impressive, yet elusive empirical performance of GANs has led to the rise of many GAN-VAE hybrids, with the hope of attaining GAN-level performance along with the additional benefits of VAEs, such as an encoder for feature reduction, which GANs do not offer. Recently, the Wasserstein Autoencoder (WAE) was proposed, achieving performance similar to that of GANs, yet it is still unclear whether the two are fundamentally different or can be further improved into a unified model. In this work, we study the $f$-GAN and WAE models and make two main discoveries. First, we find that the $f$-GAN and WAE objectives partake in a primal-dual relationship and are equivalent under some assumptions, which then allows us to explicate the success of WAE. Second, the equivalence result allows us to, for the first time, prove generalization bounds for Autoencoder models, which is a pertinent problem when it comes to theoretical analyses of generative models. Furthermore, we show that the WAE objective is related to other statistical quantities such as the $f$-divergence and, in particular, is upper bounded by the Wasserstein distance, which then allows us to tap into existing efficient (regularized) optimal transport solvers. Our findings thus present the first primal-dual relationship between GANs and Autoencoder models, comment on generalization abilities, and take a step towards unifying these models. |
Tasks | |
Published | 2019-02-03 |
URL | http://arxiv.org/abs/1902.00985v2 |
http://arxiv.org/pdf/1902.00985v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-networks-and-autoencoders-the |
Repo | |
Framework | |
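For reference, the two objectives whose primal-dual relationship is studied take the following standard forms (notation follows the original f-GAN and WAE formulations; this is background, not the paper's new result):

```latex
% f-GAN: variational (dual) form of the f-divergence between data P and model Q_\theta
\min_{\theta}\ \max_{T}\ \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q_\theta}\big[f^{*}(T(x))\big]

% WAE: reconstruction cost plus a penalty keeping the aggregate posterior Q_Z near the prior P_Z
\min_{G,\, Q(Z \mid X)}\ \mathbb{E}_{P_X}\,\mathbb{E}_{Q(Z \mid X)}\big[c\big(x, G(z)\big)\big] \;+\; \lambda\, \mathcal{D}(Q_Z, P_Z)
```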
Unsupervised Writer Adaptation for Synthetic-to-Real Handwritten Word Recognition
Title | Unsupervised Writer Adaptation for Synthetic-to-Real Handwritten Word Recognition |
Authors | Lei Kang, Marçal Rusiñol, Alicia Fornés, Pau Riba, Mauricio Villegas |
Abstract | Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions on real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges: (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance; thus, it provides a practical and generic approach to dealing with new document collections without requiring any expensive and tedious manual annotation step. |
Tasks | Data Augmentation, Synthetic Data Generation |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08473v1 |
https://arxiv.org/pdf/1909.08473v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-writer-adaptation-for-synthetic |
Repo | |
Framework | |
Context-Dependent Models for Predicting and Characterizing Facial Expressiveness
Title | Context-Dependent Models for Predicting and Characterizing Facial Expressiveness |
Authors | Victoria Lin, Jeffrey M. Girard, Louis-Philippe Morency |
Abstract | In recent years, extensive research has emerged in affective computing on topics like automatic emotion recognition and determining the signals that characterize individual emotions. Much less studied, however, is expressiveness, or the extent to which someone shows any feeling or emotion. Expressiveness is related to personality and mental health and plays a crucial role in social interaction. As such, the ability to automatically detect or predict expressiveness can facilitate significant advancements in areas ranging from psychiatric care to artificial social intelligence. Motivated by these potential applications, we present an extension of the BP4D+ dataset with human ratings of expressiveness and develop methods for (1) automatically predicting expressiveness from visual data and (2) defining relationships between interpretable visual signals and expressiveness. In addition, we study the emotional context in which expressiveness occurs and hypothesize that different sets of signals are indicative of expressiveness in different contexts (e.g., in response to surprise or in response to pain). Analysis of our statistical models confirms our hypothesis. Consequently, by looking at expressiveness separately in distinct emotional contexts, our predictive models show significant improvements over baselines and achieve comparable results to human performance in terms of correlation with the ground truth. |
Tasks | Emotion Recognition |
Published | 2019-12-10 |
URL | https://arxiv.org/abs/1912.04523v1 |
https://arxiv.org/pdf/1912.04523v1.pdf | |
PWC | https://paperswithcode.com/paper/context-dependent-models-for-predicting-and |
Repo | |
Framework | |
Deep causal representation learning for unsupervised domain adaptation
Title | Deep causal representation learning for unsupervised domain adaptation |
Authors | Raha Moraffah, Kai Shu, Adrienne Raglin, Huan Liu |
Abstract | Studies show that the representations learned by deep neural networks can be transferred to similar prediction tasks in other domains for which we do not have enough labeled data. However, as we transition to higher layers in the model, the representations become more task-specific and less generalizable. Recent research on deep domain adaptation proposed to mitigate this problem by forcing the deep model to learn more transferable feature representations across domains. This is achieved by incorporating domain adaptation methods into the deep learning pipeline. The majority of existing models learn transferable feature representations that are highly correlated with the outcome. However, correlations are not always transferable. In this paper, we propose a novel deep causal representation learning framework for unsupervised domain adaptation, in which we learn domain-invariant causal representations of the input from the source domain. We simulate a virtual target domain using reweighted samples from the source domain and estimate the causal effect of features on the outcomes. An extensive comparative study demonstrates the strengths of the proposed model for unsupervised domain adaptation via causal representations. |
Tasks | Domain Adaptation, Representation Learning, Unsupervised Domain Adaptation |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12417v1 |
https://arxiv.org/pdf/1910.12417v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-causal-representation-learning-for |
Repo | |
Framework | |
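A minimal sketch of simulating a virtual target domain by reweighting source samples, a standard density-ratio device consistent with the abstract above. The Gaussian source/target feature distributions and the use of exact densities are illustrative assumptions; in practice the ratio would itself be estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(loc=0.0, scale=1.0, size=1000)     # source-domain feature

def density(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights w(x) = p_target(x) / p_source(x) make weighted source
# averages mimic target-domain expectations.
w = density(src, 1.0, 1.0) / density(src, 0.0, 1.0)
w /= w.mean()

print(np.mean(src), np.average(src, weights=w))     # ~0.0 vs ~1.0 (shifted toward target)
```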
Towards automated mobile-phone-based plant pathology management
Title | Towards automated mobile-phone-based plant pathology management |
Authors | Nantheera Anantrasirichai, Sion Hannuna, Nishan Canagarajah |
Abstract | This paper presents a framework which uses computer vision algorithms to standardise images and analyse them to identify crop diseases automatically. The tools are created to bridge the information gap between farmers, advisory call centres and agricultural experts using images of diseased/infected crops captured by mobile phones. These images are generally sensitive to a number of factors, including camera type and lighting. We therefore propose a technique for standardising the colour of plant images within the context of the advisory system. Subsequently, to aid the advisory process, the disease recognition process is automated using image processing in conjunction with machine learning techniques. We describe our proposed leaf extraction, affected-area segmentation and disease classification techniques. The proposed disease recognition system is tested on six mango diseases, and the results show over 80% accuracy. The final output of our system is a list of possible diseases with relevant management advice. |
Tasks | |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09239v2 |
https://arxiv.org/pdf/1912.09239v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-automated-mobile-phone-based-plant |
Repo | |
Framework | |
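A minimal sketch of colour standardisation using a gray-world white balance, included as a stand-in illustrating the idea; the paper's technique is tailored to its advisory-system context and may differ.

```python
import numpy as np

def gray_world(img):
    """img: H x W x 3 float array in [0, 1]."""
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / channel_means     # equalise per-channel means
    return np.clip(img * gains, 0.0, 1.0)

# A synthetic image with a colour cast (illustrative):
img = np.random.default_rng(0).uniform(size=(4, 4, 3)) * np.array([1.0, 0.8, 0.6])
print(gray_world(img).reshape(-1, 3).mean(axis=0))   # roughly equal channel means
```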
LTLf Synthesis with Fairness and Stability Assumptions
Title | LTLf Synthesis with Fairness and Stability Assumptions |
Authors | Shufang Zhu, Giuseppe De Giacomo, Geguang Pu, Moshe Vardi |
Abstract | In synthesis, assumptions are constraints on the environment that rule out certain environment behaviors. A key observation here is that even if we consider systems with LTLf goals on finite traces, environment assumptions need to be expressed over infinite traces, since accomplishing the agent goals may require an unbounded number of environment actions. To solve synthesis with respect to finite-trace LTLf goals under infinite-trace assumptions, we could reduce the problem to LTL synthesis. Unfortunately, while synthesis in LTLf and in LTL have the same worst-case complexity (both 2EXPTIME-complete), the algorithms available for LTL synthesis perform much worse in practice than those for LTLf synthesis. In this work we show that in interesting cases we can avoid such a detour to LTL synthesis and keep the simplicity of LTLf synthesis. Specifically, we develop a BDD-based fixpoint technique for handling basic forms of fairness and of stability assumptions. We show, empirically, that this technique performs much better than standard LTL synthesis. |
Tasks | |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07804v1 |
https://arxiv.org/pdf/1912.07804v1.pdf | |
PWC | https://paperswithcode.com/paper/ltlf-synthesis-with-fairness-and-stability |
Repo | |
Framework | |
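A minimal sketch of the fixpoint flavor at the heart of such synthesis procedures: the least fixpoint of the set of states from which the agent can force reaching a goal in a two-player game graph. An explicit dict-based game stands in for the paper's symbolic BDD representation, and the example graph is illustrative.

```python
def attractor(goal, agent_succ, env_succ):
    """agent_succ / env_succ: dicts state -> set of successors, over disjoint
    agent-controlled and environment-controlled states."""
    win = set(goal)
    changed = True
    while changed:                                       # iterate to a least fixpoint
        changed = False
        for s, succ in agent_succ.items():
            if s not in win and succ & win:              # agent has SOME move into win
                win.add(s); changed = True
        for s, succ in env_succ.items():
            if s not in win and succ and succ <= win:    # env has NO move avoiding win
                win.add(s); changed = True
    return win

agent_succ = {"a0": {"e0"}, "a1": {"e1", "goal"}}
env_succ = {"e0": {"a1"}, "e1": {"a0", "goal"}}
print(attractor({"goal"}, agent_succ, env_succ))         # all states are winning here
```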