Paper Group ANR 583
Language coverage and generalization in RNN-based continuous sentence embeddings for interacting agents. Understanding Undesirable Word Embedding Associations. Learning Features with Differentiable Closed-Form Solver for Tracking. Transfer Learning for Causal Sentence Detection. Graph-based data clustering via multiscale community detection. Federated Learning with Differential Privacy: Algorithms and Performance Analysis. Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes. Analyzing CART. The Many-to-Many Mapping Between Concordance Correlation Coefficient and Mean Square Error. Fairness Warnings and Fair-MAML: Learning Fairly with Minimal Data. Explainable AI: A Neurally-Inspired Decision Stack Framework. Towards Human Body-Part Learning for Model-Free Gait Recognition. Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders. A Comprehensive Review On Various State Of Art Techniques For Eye Blink Detection. Generative predecessor models for sample-efficient imitation learning.
Language coverage and generalization in RNN-based continuous sentence embeddings for interacting agents
Title | Language coverage and generalization in RNN-based continuous sentence embeddings for interacting agents |
Authors | Luca Celotti, Simon Brodeur, Jean Rouat |
Abstract | Continuous sentence embeddings using recurrent neural networks (RNNs), where variable-length sentences are encoded into fixed-dimensional vectors, are often the main building blocks of architectures applied to language tasks such as dialogue generation. While it is known that those embeddings are able to learn some structures of language (e.g. grammar) in a purely data-driven manner, there is very little work on the objective evaluation of their ability to cover the whole language space and to generalize to sentences outside the language bias of the training data. Using a manually designed context-free grammar (CFG) to generate a large-scale dataset of sentences related to the content of realistic 3D indoor scenes, we evaluate the language coverage and generalization abilities of the most common continuous sentence embeddings based on RNNs. We also propose a new embedding method based on arithmetic coding, AriEL, that is not data-driven and that efficiently encodes in continuous space any sentence from the CFG. We find that RNN-based embeddings underfit the training data and cover only a small subset of the language defined by the CFG. They also fail to learn the underlying CFG and to generalize to unbiased sentences from that same CFG. We find that AriEL provides an insightful baseline. |
Tasks | Dialogue Generation, Sentence Embeddings |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.02002v1 |
https://arxiv.org/pdf/1911.02002v1.pdf | |
PWC | https://paperswithcode.com/paper/language-coverage-and-generalization-in-rnn |
Repo | |
Framework | |
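The core idea behind AriEL in the abstract above, using arithmetic coding to map any sentence from a known grammar to a point in a continuous space, can be illustrated with a toy sketch. This is a hypothetical, one-dimensional illustration with a made-up vocabulary and uniform next-token probabilities, not the authors' implementation (AriEL exploits the CFG's structure and a multi-dimensional code):

```python
# Toy arithmetic-coding encoder: maps a token sequence to a point in [0, 1).
# Floating-point precision limits sentence length in this toy version.

VOCAB = ["<eos>", "the", "chair", "table", "is", "red", "blue"]  # assumed toy vocabulary
IDX = {w: i for i, w in enumerate(VOCAB)}

def encode(tokens):
    low, high = 0.0, 1.0
    for tok in tokens + ["<eos>"]:
        width = (high - low) / len(VOCAB)
        i = IDX[tok]
        low, high = low + i * width, low + (i + 1) * width
    return (low + high) / 2          # any point in the final interval identifies the sentence

def decode(code, max_len=20):
    low, high, out = 0.0, 1.0, []
    for _ in range(max_len):
        width = (high - low) / len(VOCAB)
        i = int((code - low) / width)
        tok = VOCAB[i]
        low, high = low + i * width, low + (i + 1) * width
        if tok == "<eos>":
            return out
        out.append(tok)
    return out

sentence = ["the", "chair", "is", "red"]
z = encode(sentence)
assert decode(z) == sentence         # the continuous code round-trips back to the sentence
print(z)
```

Unlike a trained RNN encoder, nothing here is data-driven: every valid token sequence gets its own sub-interval by construction, which is what makes full language coverage possible in principle.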
Understanding Undesirable Word Embedding Associations
Title | Understanding Undesirable Word Embedding Associations |
Authors | Kawin Ethayarajh, David Duvenaud, Graeme Hirst |
Abstract | Word embeddings are often criticized for capturing undesirable word associations such as gender stereotypes. However, methods for measuring and removing such biases remain poorly understood. We show that for any embedding model that implicitly does matrix factorization, debiasing vectors post hoc using subspace projection (Bolukbasi et al., 2016) is, under certain conditions, equivalent to training on an unbiased corpus. We also prove that WEAT, the most common association test for word embeddings, systematically overestimates bias. Given that the subspace projection method is provably effective, we use it to derive a new measure of association called the $\textit{relational inner product association}$ (RIPA). Experiments with RIPA reveal that, on average, skipgram with negative sampling (SGNS) does not make most words any more gendered than they are in the training corpus. However, for gender-stereotyped words, SGNS actually amplifies the gender association in the corpus. |
Tasks | Word Embeddings |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06361v1 |
https://arxiv.org/pdf/1908.06361v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-undesirable-word-embedding-1 |
Repo | |
Framework | |
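A rough NumPy sketch of the two operations discussed in the abstract above: debiasing by subspace projection (Bolukbasi et al., 2016) and the relational inner product association (RIPA), here restricted to a single bias direction. The toy vectors are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
emb = {w: rng.normal(size=dim) for w in ["doctor", "nurse", "he", "she"]}  # toy vectors

# Bias (relation) direction: normalized difference of a defining word pair.
b = emb["he"] - emb["she"]
b /= np.linalg.norm(b)

def ripa(w, b):
    """Relational inner product association: scalar projection of w onto the relation vector b."""
    return float(w @ b)

def debias(w, b):
    """Subspace projection: remove the component of w lying along the bias direction b."""
    return w - (w @ b) * b

for word in ["doctor", "nurse"]:
    before = ripa(emb[word], b)
    after = ripa(debias(emb[word], b), b)
    print(f"{word}: RIPA before={before:+.3f}, after debiasing={after:+.3f}")  # 'after' is ~0
```

By construction the projected vectors have zero inner product with the bias direction, which is the sense in which the post hoc correction can mimic training on an unbiased corpus.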
Learning Features with Differentiable Closed-Form Solver for Tracking
Title | Learning Features with Differentiable Closed-Form Solver for Tracking |
Authors | Linyu Zheng, Ming Tang, Jinqiao Wang, Hanqing Lu |
Abstract | We present a novel and easy-to-implement training framework for visual tracking. Our approach mainly focuses on learning feature embeddings in an end-to-end way that generalizes well to trackers based on an online, discriminatively trained ridge regression model. This goal is achieved efficiently by taking advantage of two important facts. 1) The ridge regression problem has a closed-form solution and is implicitly differentiable under its optimality condition. Its solver can therefore be embedded as a layer with efficient forward and backward passes when training deep convolutional neural networks. 2) The Woodbury identity can be used to solve the ridge regression problem efficiently when high-dimensional feature embeddings are employed. Moreover, to address the extreme foreground-background class imbalance during training, we modify the original shrinkage loss and employ it as the loss function for efficient and effective training. The core parts of the proposed training framework can be implemented in a few lines of code in current popular deep learning frameworks, so our approach is easy to follow. Extensive experiments on six public benchmarks, OTB2015, NFS, TrackingNet, GOT10k, VOT2018, and VOT2019, show that the proposed tracker achieves state-of-the-art performance while running at over 30 FPS. Code will be made available. |
Tasks | Visual Tracking |
Published | 2019-06-25 |
URL | https://arxiv.org/abs/1906.10414v1 |
https://arxiv.org/pdf/1906.10414v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-features-with-differentiable-closed |
Repo | |
Framework | |
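The two facts the abstract above relies on, a closed-form ridge solution and the Woodbury identity for the high-dimensional case, can be checked numerically with a short NumPy sketch (not the authors' tracker code). Because the solve is differentiable, the same closed form can be embedded as a layer in an autodiff framework:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 64, 512, 1.0            # n samples, d-dim features (d >> n), ridge weight
X = rng.normal(size=(n, d))
y = rng.normal(size=(n, 1))

# Primal closed form: w = (X^T X + lam I_d)^{-1} X^T y  -- inverts a d x d matrix.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Woodbury / dual form: w = X^T (X X^T + lam I_n)^{-1} y -- inverts only an n x n matrix,
# which is what keeps the solver cheap when high-dimensional embeddings are used.
w_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

assert np.allclose(w_primal, w_dual, atol=1e-6)
print("max abs difference:", np.abs(w_primal - w_dual).max())
```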
Transfer Learning for Causal Sentence Detection
Title | Transfer Learning for Causal Sentence Detection |
Authors | Manolis Kyriakakis, Ion Androutsopoulos, Joan Ginés i Ametllé, Artur Saudabayev |
Abstract | We consider the task of detecting sentences that express causality, as a step towards mining causal relations from texts. To bypass the scarcity of causal instances in relation extraction datasets, we exploit transfer learning, namely ELMo and BERT, using a bidirectional GRU with self-attention (BIGRUATT) as a baseline. We experiment with both generic public relation extraction datasets and a new biomedical causal sentence detection dataset, a subset of which we make publicly available. We find that transfer learning helps only on very small datasets. With larger datasets, BIGRUATT reaches a performance plateau, beyond which neither larger datasets nor transfer learning help. |
Tasks | Relation Extraction, Transfer Learning |
Published | 2019-06-18 |
URL | https://arxiv.org/abs/1906.07544v2 |
https://arxiv.org/pdf/1906.07544v2.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-for-causal-sentence |
Repo | |
Framework | |
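A minimal PyTorch sketch of a BIGRUATT-style classifier, a bidirectional GRU with additive self-attention pooling over its hidden states, as a hedged guess at the baseline architecture named in the abstract above (layer sizes and vocabulary size are made up):

```python
import torch
import torch.nn as nn

class BiGRUAtt(nn.Module):
    """Bidirectional GRU with self-attention pooling for sentence classification (sketch)."""
    def __init__(self, vocab_size, embed_dim=100, hidden=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)              # scores each time step
        self.out = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids):                        # token_ids: (batch, seq_len)
        h, _ = self.gru(self.embed(token_ids))           # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.att(torch.tanh(h)), dim=1)  # attention over time steps
        context = (weights * h).sum(dim=1)               # weighted sum of hidden states
        return self.out(context)                         # class logits

model = BiGRUAtt(vocab_size=10_000)
logits = model(torch.randint(1, 10_000, (4, 25)))        # a batch of 4 sentences of length 25
print(logits.shape)                                       # torch.Size([4, 2])
```

To use ELMo or BERT as in the paper, the embedding layer would be swapped for contextual representations, with the rest of the classifier unchanged.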
Graph-based data clustering via multiscale community detection
Title | Graph-based data clustering via multiscale community detection |
Authors | Zijing Liu, Mauricio Barahona |
Abstract | We present a graph-theoretical approach to data clustering, which combines the creation of a graph from the data with Markov Stability, a multiscale community detection framework. We show how the multiscale capabilities of the method allow the estimation of the number of clusters, as well as alleviating the sensitivity to the parameters in graph construction. We use both synthetic and benchmark real datasets to compare and evaluate several graph construction methods and clustering algorithms, and show that multiscale graph-based clustering achieves improved performance compared to popular clustering methods without the need to set externally the number of clusters. |
Tasks | Community Detection, graph construction |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.04491v2 |
https://arxiv.org/pdf/1909.04491v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-based-data-clustering-via-multiscale |
Repo | |
Framework | |
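Markov Stability itself is not part of common Python libraries, but the overall pipeline in the abstract above, building a sparse geometric graph from the data and then scanning community structure across scales, can be approximated with a hedged sketch. It assumes scikit-learn and networkx 3.x, and uses a resolution-swept Louvain as a stand-in for the Markov time sweep:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph

# 1) Graph construction: symmetrized k-nearest-neighbour graph over the data points.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
G = nx.from_scipy_sparse_array(A.maximum(A.T))

# 2) Scan community structure across scales; stable plateaus in the number of
#    communities suggest a natural number of clusters without fixing it upfront.
for resolution in [0.2, 0.5, 1.0, 2.0, 5.0]:
    parts = louvain_communities(G, resolution=resolution, seed=0)
    print(f"resolution={resolution:>4}: {len(parts)} communities")
```

The scale sweep is the point: rather than committing to one number of clusters, partitions that persist across a range of scales are taken as the meaningful ones.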
Federated Learning with Differential Privacy: Algorithms and Performance Analysis
Title | Federated Learning with Differential Privacy: Algorithms and Performance Analysis |
Authors | Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H. Yang, Farokhi Farhad, Shi Jin, Tony Q. S. Quek, H. Vincent Poor |
Abstract | In this paper, to effectively prevent information leakage, we propose a novel framework based on the concept of differential privacy (DP), in which artificial noise is added to the parameters on the client side before aggregation, termed noising before model aggregation FL (NbAFL). First, we prove that NbAFL can satisfy DP under distinct protection levels by properly adapting the variances of the artificial noise. Then we develop a theoretical convergence bound on the loss function of the trained FL model in NbAFL. Specifically, the theoretical bound reveals the following three key properties: 1) There is a tradeoff between convergence performance and privacy protection level, i.e., better convergence performance leads to a lower protection level; 2) Given a fixed privacy protection level, increasing the number $N$ of overall clients participating in FL can improve the convergence performance; 3) There is an optimal number of maximum aggregation times (communication rounds) in terms of convergence performance for a given protection level. Furthermore, we propose a $K$-random scheduling strategy, where $K$ ($1<K<N$) clients are randomly selected from the $N$ overall clients to participate in each aggregation. We also develop the corresponding convergence bound of the loss function in this case and show that the $K$-random scheduling strategy retains the above three properties. Moreover, we find that there is an optimal $K$ that achieves the best convergence performance at a fixed privacy level. Evaluations demonstrate that our theoretical results are consistent with simulations, thereby facilitating the design of various privacy-preserving FL algorithms with different tradeoff requirements on convergence performance and privacy levels. |
Tasks | |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00222v2 |
https://arxiv.org/pdf/1911.00222v2.pdf | |
PWC | https://paperswithcode.com/paper/performance-analysis-on-federated-learning |
Repo | |
Framework | |
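A hedged NumPy sketch of the noising-before-model-aggregation idea from the abstract above: each client clips its parameter vector and adds Gaussian noise locally, and only then does the server average. The clipping bound, the noise calibration (a standard Gaussian-mechanism scale), and the toy data are illustrative assumptions, not the paper's exact protocol or privacy accounting:

```python
import numpy as np

rng = np.random.default_rng(0)
N, dim = 10, 100                        # number of clients, model size
C, eps, delta = 1.0, 1.0, 1e-5          # clipping bound and per-round DP budget (assumed)

# Standard Gaussian-mechanism noise scale for L2 sensitivity C (illustrative calibration).
sigma = C * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

client_params = [rng.normal(size=dim) for _ in range(N)]

def noise_before_aggregation(w):
    """Client side: clip to L2 norm C, then add Gaussian noise before uploading."""
    w = w * min(1.0, C / np.linalg.norm(w))
    return w + rng.normal(scale=sigma, size=w.shape)

# Server side: plain averaging of the already-noised client parameters.
global_update = np.mean([noise_before_aggregation(w) for w in client_params], axis=0)
print(global_update[:5])
```

Averaging over more clients shrinks the variance of the aggregated noise, which is the intuition behind property 2) in the abstract.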
Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes
Title | Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes |
Authors | Jie Cao, Michael Tanana, Zac E. Imel, Eric Poitras, David C. Atkins, Vivek Srikumar |
Abstract | Automatically analyzing dialogue can help understand and guide behavior in domains such as counseling, where interactions are largely mediated by conversation. In this paper, we study modeling behavioral codes used to assess a psychotherapy treatment style called Motivational Interviewing (MI), which is effective for addressing substance abuse and related problems. Specifically, we address the problem of providing real-time guidance to therapists with a dialogue observer that (1) categorizes therapist and client MI behavioral codes and (2) forecasts codes for upcoming utterances to help guide the conversation and potentially alert the therapist. For both tasks, we define neural network models that build upon recent successes in dialogue modeling. Our experiments demonstrate that our models can outperform several baselines for both tasks. We also report the results of a careful analysis that reveals the impact of the various network design tradeoffs for modeling therapy dialogue. |
Tasks | |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00326v1 |
https://arxiv.org/pdf/1907.00326v1.pdf | |
PWC | https://paperswithcode.com/paper/observing-dialogue-in-therapy-categorizing |
Repo | |
Framework | |
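A hedged PyTorch sketch of the two tasks described in the abstract above: a recurrent dialogue observer that (1) labels the current utterance with an MI behavioral code and (2) forecasts the code of the next utterance. Utterance encodings, the size of the code set, and layer sizes are assumptions; the authors' models are more elaborate:

```python
import torch
import torch.nn as nn

class DialogueObserver(nn.Module):
    """GRU over utterance encodings with two heads: categorize current code, forecast next code."""
    def __init__(self, utt_dim=256, hidden=128, num_codes=8):
        super().__init__()
        self.gru = nn.GRU(utt_dim, hidden, batch_first=True)
        self.categorize = nn.Linear(hidden, num_codes)    # code of the utterance just observed
        self.forecast = nn.Linear(hidden, num_codes)      # code of the upcoming utterance

    def forward(self, utterances):                        # (batch, num_utts, utt_dim)
        states, _ = self.gru(utterances)                  # (batch, num_utts, hidden)
        return self.categorize(states), self.forecast(states)

model = DialogueObserver()
utts = torch.randn(2, 12, 256)                            # 2 sessions, 12 encoded utterances each
cat_logits, next_logits = model(utts)
print(cat_logits.shape, next_logits.shape)                # (2, 12, 8) twice
```

Because the GRU only looks at past utterances, the forecasting head can be queried in real time, which matches the guidance scenario the abstract describes.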
Analyzing CART
Title | Analyzing CART |
Authors | Jason M. Klusowski |
Abstract | Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART. In doing so, we find that the training error is governed by Pearson’s correlation between the optimal decision stump and response data in each node, which we bound by solving a quadratic program. We leverage this to show that CART with cost-complexity pruning achieves a good bias-variance tradeoff when the depth scales with the logarithm of the sample size. Data dependent quantities, which adapt to the local dimensionality and structure of the regression surface, are seen to govern the rates of convergence of the prediction error. |
Tasks | |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10086v6 |
https://arxiv.org/pdf/1906.10086v6.pdf | |
PWC | https://paperswithcode.com/paper/best-split-nodes-for-regression-trees |
Repo | |
Framework | |
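The abstract's observation that the impurity reduction of the best split is tied to the Pearson correlation between the fitted decision stump and the response can be checked numerically. A small NumPy sketch on synthetic data, assuming the standard CART sum-of-squares criterion:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = np.sin(3 * x) + 0.3 * rng.normal(size=200)

def best_stump(x, y):
    """Exhaustively pick the split point that maximizes the reduction in sum of squares."""
    best = (None, -np.inf)
    for s in np.unique(x)[:-1]:
        left, right = y[x <= s], y[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        reduction = ((y - y.mean()) ** 2).sum() - sse
        if reduction > best[1]:
            best = (s, reduction)
    return best

split, reduction = best_stump(x, y)
fitted = np.where(x <= split, y[x <= split].mean(), y[x > split].mean())

# The impurity reduction equals (Pearson correlation between stump fit and y)^2 times node SSE.
rho = np.corrcoef(fitted, y)[0, 1]
assert np.isclose(reduction, rho ** 2 * ((y - y.mean()) ** 2).sum())
print(f"split at x <= {split:.3f}, reduction {reduction:.3f}, corr {rho:.3f}")
```

The identity holds because the stump's group means are exactly the least-squares fit on the split indicator, so the fraction of variance it explains is the squared correlation between fit and response.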
The Many-to-Many Mapping Between Concordance Correlation Coefficient and Mean Square Error
Title | The Many-to-Many Mapping Between Concordance Correlation Coefficient and Mean Square Error |
Authors | Vedhas Pandit, Björn Schuller |
Abstract | While the mean square error (MSE) continues to retain its place as one of the most popular loss functions today, the concordance correlation coefficient (CCC) is one of the most widely used reproducibility indices and performance measures, introduced by Lin in 1989. Surprisingly enough, we are yet to witness a formally established relationship between these two popular utility functions, despite their ubiquitous and ever-growing simultaneous usage in much of the correlation research, e.g. interrater agreement, multivariate predictions and assay validation. While minimisation of the $L_p$ norm of the errors or of its positive powers (e.g. MSE) is effectively aimed at CCC maximisation, we establish in this paper the sheer ineffectiveness of this popular strategy, with underlying concrete reasons. To this end, for the first time, we derive and present the formulation of the many-to-many mapping between the MSE and the CCC. As a consequence, we propose $\frac{MSE(x,y)}{cov(x,y)}$ as a more effective loss function. We also establish conditions for CCC optimisation given a fixed MSE and, as a logical next step, given a fixed set of error coefficients. We present a few interesting (albeit apparent) mathematical paradoxes we discovered through this CCC optimisation endeavour. The newly discovered mapping not only uncovers the counter-intuitive fact that $MSE_1 < MSE_2$ does \emph{not} necessarily translate to $CCC_1 > CCC_2$, but also provides the precise range of possible CCC values for a given MSE. The study thereby also aims to inspire the growing use of CCC-inspired loss functions such as $\frac{MSE(x,y)}{cov(x,y)}$ in place of the traditional $L_p$ error losses for multivariate regression in general. |
Tasks | Sentiment Analysis, Time Series |
Published | 2019-02-14 |
URL | https://arxiv.org/abs/1902.05180v3 |
https://arxiv.org/pdf/1902.05180v3.pdf | |
PWC | https://paperswithcode.com/paper/on-many-to-many-mapping-between-concordance |
Repo | |
Framework | |
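The quantities in the abstract above are easy to compute directly, and a tiny made-up example reproduces the counter-intuitive claim that a lower MSE need not mean a higher CCC. A NumPy sketch:

```python
import numpy as np

def mse(x, y):
    return np.mean((x - y) ** 2)

def ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    vx, vy = x.var(), y.var()
    cov = ((x - x.mean()) * (y - y.mean())).mean()
    return 2 * cov / (vx + vy + (x.mean() - y.mean()) ** 2)

y = np.array([0.0, 1.0, 2.0, 3.0])
pred_a = np.full(4, y.mean())   # constant prediction at the mean of the targets
pred_b = y + 1.5                # correct shape, constant offset

print(mse(pred_a, y), ccc(pred_a, y))   # MSE 1.25, CCC 0.0
print(mse(pred_b, y), ccc(pred_b, y))   # MSE 2.25, CCC ~0.53
# pred_a has the lower MSE, yet pred_b has the far higher CCC -- and the proposed
# surrogate MSE/cov is undefined for pred_a because cov(pred_a, y) = 0.
```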
Fairness Warnings and Fair-MAML: Learning Fairly with Minimal Data
Title | Fairness Warnings and Fair-MAML: Learning Fairly with Minimal Data |
Authors | Dylan Slack, Sorelle Friedler, Emile Givental |
Abstract | Motivated by concerns surrounding the fairness effects of sharing and transferring fair machine learning tools, we propose two algorithms: Fairness Warnings and Fair-MAML. The first is a model-agnostic algorithm that provides interpretable boundary conditions for when a fairly trained model may not behave fairly on similar but slightly different tasks within a given domain. The second is a fair meta-learning approach to train models that can be quickly fine-tuned to specific tasks from only a small number of sample instances while balancing fairness and accuracy. We demonstrate experimentally the individual utility of each model using relevant baselines and provide, to our knowledge, the first experiment on K-shot fairness, i.e. training a fair model on a new task with only K data points. Then, we illustrate the usefulness of both algorithms as a combined method for training models from a few data points on new tasks while using Fairness Warnings as interpretable boundary conditions under which the newly trained model may not be fair. |
Tasks | Meta-Learning |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09092v2 |
https://arxiv.org/pdf/1908.09092v2.pdf | |
PWC | https://paperswithcode.com/paper/fairness-warnings-and-fair-maml-learning |
Repo | |
Framework | |
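A hedged sketch of the K-shot fine-tuning step the abstract above alludes to: a small logistic model adapted on K labeled points with a loss that trades accuracy off against a demographic-parity penalty. The penalty form, the toy data, and the trade-off weight gamma are illustrative assumptions, not the paper's exact Fair-MAML objective, and the Fairness Warnings procedure is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)
K, dim, gamma, lr = 16, 5, 2.0, 0.5         # K-shot set size, feature dim, fairness weight, step

X = rng.normal(size=(K, dim))
a = np.arange(K) % 2                        # protected attribute (alternating, for the toy set)
y = (X[:, 0] + 0.5 * a + 0.2 * rng.normal(size=K) > 0).astype(float)

w = np.zeros(dim)
for _ in range(200):                        # plain gradient descent on the combined loss
    p = 1.0 / (1.0 + np.exp(-X @ w))
    gap = p[a == 1].mean() - p[a == 0].mean()              # demographic-parity gap
    grad_task = X.T @ (p - y) / K                          # mean logistic-loss gradient
    s = p * (1.0 - p)
    grad_gap = (X[a == 1] * s[a == 1, None]).mean(axis=0) \
             - (X[a == 0] * s[a == 0, None]).mean(axis=0)
    w -= lr * (grad_task + gamma * 2.0 * gap * grad_gap)   # d/dw [task loss + gamma * gap^2]

print("demographic-parity gap after fine-tuning:", abs(gap))
```

In the meta-learning setting, this fine-tuning step would start from meta-learned initial weights rather than zeros, which is what makes fair adaptation from only K points plausible.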
Explainable AI: A Neurally-Inspired Decision Stack Framework
Title | Explainable AI: A Neurally-Inspired Decision Stack Framework |
Authors | J. L. Olds, M. S. Khan, M. Nayebpour, N. Koizumi |
Abstract | European Law now requires AI to be explainable in the context of adverse decisions affecting European Union (EU) citizens. At the same time, it is expected that there will be increasing instances of AI failure as it operates on imperfect data. This paper puts forward a neurally-inspired framework called decision stacks that can provide for a way forward in research aimed at developing explainable AI. Leveraging findings from memory systems in biological brains, the decision stack framework operationalizes the definition of explainability and then proposes a test that can potentially reveal how a given AI decision came to its conclusion. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10300v1 |
https://arxiv.org/pdf/1908.10300v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-ai-a-neurally-inspired-decision |
Repo | |
Framework | |
Towards Human Body-Part Learning for Model-Free Gait Recognition
Title | Towards Human Body-Part Learning for Model-Free Gait Recognition |
Authors | Imad Rida |
Abstract | Gait-based biometrics aims to discriminate among people by the way or manner in which they walk. It is a biometric that operates at a distance, which gives it many advantages over other biometric modalities. State-of-the-art methods require only limited cooperation from individuals; consequently, contrary to other modalities, gait is a non-invasive approach. As a behavioral trait, gait is difficult to circumvent. Moreover, gait analysis can be performed without the subject being aware of it, so it is harder to tamper with one's own biometric signature. In this paper we review the different features and approaches used in gait recognition. A novel method that learns discriminative human body parts to improve recognition accuracy is introduced. Extensive experiments are performed on the CASIA gait benchmark database and results are compared to state-of-the-art methods. |
Tasks | Gait Recognition |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01620v1 |
http://arxiv.org/pdf/1904.01620v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-human-body-part-learning-for-model |
Repo | |
Framework | |
Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders
Title | Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders |
Authors | Alexey Drutsa |
Abstract | We study revenue optimization learning algorithms for repeated second-price auctions with reserve where a seller interacts with multiple strategic bidders each of which holds a fixed private valuation for a good and seeks to maximize his expected future cumulative discounted surplus. We propose a novel algorithm that has strategic regret upper bound of $O(\log\log T)$ for worst-case valuations. This pricing is based on our novel transformation that upgrades an algorithm designed for the setup with a single buyer to the multi-buyer case. We provide theoretical guarantees on the ability of a transformed algorithm to learn the valuation of a strategic buyer, which has uncertainty about the future due to the presence of rivals. |
Tasks | |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09331v1 |
https://arxiv.org/pdf/1906.09331v1.pdf | |
PWC | https://paperswithcode.com/paper/reserve-pricing-in-repeated-second-price |
Repo | |
Framework | |
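For readers unfamiliar with the setting, the allocation and payment rule of a single second-price auction with reserve, the mechanism repeated in the abstract above, is easy to state in code. A minimal sketch of one round; the paper's actual contribution, the reserve-update algorithm with $O(\log\log T)$ strategic regret, is not reproduced here:

```python
def second_price_with_reserve(bids, reserve):
    """One round: the highest bidder wins iff their bid clears the reserve,
    and pays the larger of the reserve and the second-highest bid."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    if bids[winner] < reserve:
        return None, 0.0                      # good is not sold this round
    second = bids[order[1]] if len(bids) > 1 else 0.0
    return winner, max(reserve, second)

# A strategic bidder with private valuation v earns (v - payment) when winning;
# the seller's problem is to adapt the reserve across rounds against such bidders.
print(second_price_with_reserve([0.9, 0.6, 0.4], reserve=0.7))   # (0, 0.7)
print(second_price_with_reserve([0.9, 0.8, 0.4], reserve=0.7))   # (0, 0.8)
print(second_price_with_reserve([0.5, 0.4], reserve=0.7))        # (None, 0.0)
```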
A Comprehensive Review On Various State Of Art Techniques For Eye Blink Detection
Title | A Comprehensive Review On Various State Of Art Techniques For Eye Blink Detection |
Authors | Sannidhan MS, Sunil Kumar Aithal, Abhir Bhandary |
Abstract | Computer vision is considered one of the most important areas of research and has produced many applications that have proved useful both for research and for society. Today we witness many road mishaps that happen simply because of a lack of concentration while driving. To help avoid this kind of disaster in day-to-day life, many technologies focus on tracking a vehicle driver's concentration. One such technology uses eye blink detection to estimate the driver's level of concentration. With the advent of many high-end yet cost-effective camera devices, it has become more efficient and cheaper to use eye blink detection for this purpose. Hence, this paper presents an exhaustive review of the implementations of various eye blink detection algorithms. Such detection systems have also been extended to other applications, such as drowsiness detection, fatigue detection and expression detection. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1912.05017v1 |
https://arxiv.org/pdf/1912.05017v1.pdf | |
PWC | https://paperswithcode.com/paper/a-comprehensive-review-on-various-state-of |
Repo | |
Framework | |
Generative predecessor models for sample-efficient imitation learning
Title | Generative predecessor models for sample-efficient imitation learning |
Authors | Yannick Schroecker, Mel Vecerik, Jonathan Scholz |
Abstract | We propose Generative Predecessor Models for Imitation Learning (GPRIL), a novel imitation learning algorithm that matches the state-action distribution to the distribution observed in expert demonstrations, using generative models to reason probabilistically about alternative histories of demonstrated states. We show that this approach allows an agent to learn robust policies using only a small number of expert demonstrations and self-supervised interactions with the environment. We derive this approach from first principles and compare it empirically to a state-of-the-art imitation learning method, showing that it outperforms or matches its performance on two simulated robot manipulation tasks and demonstrate significantly higher sample efficiency by applying the algorithm on a real robot. |
Tasks | Imitation Learning |
Published | 2019-04-01 |
URL | http://arxiv.org/abs/1904.01139v1 |
http://arxiv.org/pdf/1904.01139v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-predecessor-models-for-sample-1 |
Repo | |
Framework | |