Paper Group ANR 1619
A Surprisingly Robust Trick for Winograd Schema Challenge
Title | A Surprisingly Robust Trick for Winograd Schema Challenge |
Authors | Vid Kocijan, Ana-Maria Cretu, Oana-Maria Camburu, Yordan Yordanov, Thomas Lukasiewicz |
Abstract | The Winograd Schema Challenge (WSC) dataset WSC273 and its inference counterpart WNLI are popular benchmarks for natural language understanding and commonsense reasoning. In this paper, we show that the performance of three language models on WSC273 strongly improves when they are fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR). We additionally generate a large unsupervised WSC-like dataset. By fine-tuning the BERT language model both on the introduced and on the WSCR dataset, we achieve overall accuracies of 72.5% and 74.7% on WSC273 and WNLI, improving the previous state-of-the-art solutions by 8.8% and 9.6%, respectively. Furthermore, our fine-tuned models are also consistently more robust on the “complex” subsets of WSC273, introduced by Trichelair et al. (2018). |
Tasks | Language Modelling |
Published | 2019-05-15 |
URL | https://arxiv.org/abs/1905.06290v2 |
https://arxiv.org/pdf/1905.06290v2.pdf | |
PWC | https://paperswithcode.com/paper/a-surprisingly-robust-trick-for-winograd |
Repo | |
Framework | |
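The scoring trick behind this line of work is compact enough to illustrate. Below is a minimal sketch, not the authors' released code: each candidate antecedent replaces the pronoun, the candidate's tokens are masked, and BERT's masked-LM probabilities decide between candidates. The `transformers` usage, the `bert-base-uncased` model choice, and the `[BLANK]` placeholder convention are our assumptions.

```python
# Hedged sketch of masked-LM candidate scoring for Winograd schemas: replace
# the pronoun placeholder with mask tokens, then average the log-probability
# BERT assigns to the candidate's tokens at those positions.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

def candidate_score(sentence_with_blank: str, candidate: str) -> float:
    """Mean log-probability of the candidate's tokens in place of '[BLANK]'."""
    cand_ids = tokenizer(candidate, add_special_tokens=False)["input_ids"]
    masks = " ".join([tokenizer.mask_token] * len(cand_ids))
    enc = tokenizer(sentence_with_blank.replace("[BLANK]", masks),
                    return_tensors="pt")
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**enc).logits[0]
    logp = torch.log_softmax(logits[mask_pos], dim=-1)
    return sum(logp[i, t].item() for i, t in enumerate(cand_ids)) / len(cand_ids)

sent = "The trophy didn't fit in the suitcase because [BLANK] was too big."
print(max(["the trophy", "the suitcase"], key=lambda c: candidate_score(sent, c)))
```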
Inducing Constituency Trees through Neural Machine Translation
Title | Inducing Constituency Trees through Neural Machine Translation |
Authors | Phu Mon Htut, Kyunghyun Cho, Samuel R. Bowman |
Abstract | Latent tree learning (LTL) methods learn to parse sentences using only indirect supervision from a downstream task. Recent advances in latent tree learning have made it possible to recover moderately high quality tree structures by training with language modeling or auto-encoding objectives. In this work, we explore the hypothesis that decoding in machine translation, as a conditional language modeling task, will produce better tree structures since it offers a similar training signal as language modeling, but with more semantic signal. We adapt two existing latent-tree language models – PRPN and ON-LSTM – for use in translation. We find that they indeed recover trees that are better in F1 score than those seen in language modeling on the WSJ test set, while maintaining strong translation quality. We observe that translation is a better objective than language modeling for inducing trees, marking the first success at latent tree learning using a machine translation objective. Additionally, our findings suggest that, although translation provides better signal for inducing trees than language modeling, translation models can perform well without exploiting the latent tree structure. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.10056v1 |
https://arxiv.org/pdf/1909.10056v1.pdf | |
PWC | https://paperswithcode.com/paper/190910056 |
Repo | |
Framework | |
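The tree-quality metric reported here, unlabeled bracketing F1 against WSJ gold parses, is easy to state concretely. A self-contained sketch follows; the example span sets are invented purely for illustration.

```python
# Unlabeled bracketing F1: overlap between predicted and gold constituency
# spans, where a span is a (start, end) token-index pair.
def bracket_f1(pred_spans: set, gold_spans: set) -> float:
    tp = len(pred_spans & gold_spans)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_spans)
    recall = tp / len(gold_spans)
    return 2 * precision * recall / (precision + recall)

gold = {(0, 5), (0, 2), (2, 5), (3, 5)}   # spans of a gold parse
pred = {(0, 5), (0, 2), (2, 4), (3, 5)}   # spans induced by a latent-tree model
print(f"F1 = {bracket_f1(pred, gold):.3f}")  # 0.750
```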
Personalizing human activity recognition models using incremental learning
Title | Personalizing human activity recognition models using incremental learning |
Authors | Pekka Siirtola, Heli Koskimäki, Juha Röning |
Abstract | In this study, the aim is to personalize inertial sensor data-based human activity recognition models using incremental learning. At first, the recognition is based on a user-independent model. However, when personal streaming data becomes available, the incremental learning-based recognition model can be updated, and therefore personalized, based on the data without user interruption. The incremental learning algorithm used is Learn++, an ensemble method that can use any classifier as a base classifier. The study compares three different base classifiers: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and classification and regression tree (CART). Experiments are based on a publicly open data set, and they show that already a small personal training data set can improve the classification accuracy. The improvement is 4.6 percentage units using LDA as the base classifier, 2.0 percentage units using QDA, and 2.3 percentage units using CART. However, if the user-independent model used in the first phase of the recognition process is not accurate enough, personalization cannot improve recognition accuracy. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12628v1 |
https://arxiv.org/pdf/1905.12628v1.pdf | |
PWC | https://paperswithcode.com/paper/personalizing-human-activity-recognition |
Repo | |
Framework | |
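To make the update step concrete, here is a simplified sketch of Learn++-style personalization: each incoming personal batch trains one new base classifier, and members vote with accuracy-derived weights. The full Learn++ algorithm also reweights training instances, which this sketch omits; the synthetic data and the LDA-only base classifier are stand-ins.

```python
# Simplified Learn++-style incremental ensemble: one new base classifier per
# batch, voting weight log((1 - err) / err) from that batch's training error.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

class IncrementalEnsemble:
    def __init__(self):
        self.members, self.weights = [], []

    def update(self, X, y):
        clf = LinearDiscriminantAnalysis().fit(X, y)
        acc = clf.score(X, y)
        self.members.append(clf)
        self.weights.append(np.log(max(acc, 1e-6) / max(1 - acc, 1e-6)))

    def predict(self, X):
        votes = np.zeros((len(X), 2))
        for clf, w in zip(self.members, self.weights):
            votes[np.arange(len(X)), clf.predict(X).astype(int)] += w
        return votes.argmax(axis=1)

ens = IncrementalEnsemble()
X0 = rng.normal(size=(200, 3)); y0 = (X0[:, 0] > 0).astype(int)
ens.update(X0, y0)                               # user-independent model
Xp = rng.normal(size=(30, 3)) + 0.3              # shifted personal data
yp = (Xp[:, 0] > 0.3).astype(int)
ens.update(Xp, yp)                               # personalization step
print(ens.predict(Xp[:5]))
```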
SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization
Title | SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization |
Authors | Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi |
Abstract | In this paper, we propose and analyze SPARQ-SGD, an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models. Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent. In SPARQ-SGD, each node takes at least a fixed number ($H$) of local gradient steps and then checks if the model parameters have significantly changed compared to its last update; it communicates further compressed model parameters only when there is a significant change, as specified by a (design) criterion. We prove that SPARQ-SGD converges as $O(\frac{1}{nT})$ and $O(\frac{1}{\sqrt{nT}})$ in the strongly convex and non-convex settings, respectively, demonstrating that such aggressive compression, including event-triggered communication, model sparsification, and quantization, does not affect the overall convergence rate compared to uncompressed decentralized training, thereby theoretically yielding communication efficiency for “free”. We evaluate SPARQ-SGD over real datasets to demonstrate significant savings in communication over the state-of-the-art. |
Tasks | Quantization, Stochastic Optimization |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14280v2 |
https://arxiv.org/pdf/1910.14280v2.pdf | |
PWC | https://paperswithcode.com/paper/sparq-sgd-event-triggered-and-compressed |
Repo | |
Framework | |
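The event-triggered rule is the heart of the method and is easy to sketch for a single node: take $H$ local steps, then transmit a sparsified, quantized delta only when the accumulated change crosses a threshold. The toy objective, trigger threshold, and the particular sign-and-mean-magnitude quantizer below are illustrative choices of ours, not the paper's exact compression operators.

```python
# Single-node sketch of SPARQ-SGD's communication rule: H local SGD steps,
# then an event check; on trigger, send a top-k sparsified, quantized delta.
import numpy as np

rng = np.random.default_rng(1)
d, H, k_sparse, threshold, lr = 20, 5, 4, 0.05, 0.1

def grad(x):                        # gradient of a toy strongly convex loss
    return x - 1.0 + 0.01 * rng.normal(size=d)

x = np.zeros(d)
last_sent = x.copy()
for round_ in range(10):
    for _ in range(H):              # H local gradient steps between checks
        x -= lr * grad(x)
    delta = x - last_sent
    if np.linalg.norm(delta) > threshold:             # event trigger
        top = np.argsort(np.abs(delta))[-k_sparse:]   # top-k sparsification
        msg = np.zeros(d)
        msg[top] = np.sign(delta[top]) * np.abs(delta[top]).mean()  # quantize
        last_sent = last_sent + msg   # what peers now believe this node holds
        print(f"round {round_}: sent {k_sparse} of {d} coordinates")
    # else: stay silent this round, saving communication
```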
Importance of user inputs while using incremental learning to personalize human activity recognition models
Title | Importance of user inputs while using incremental learning to personalize human activity recognition models |
Authors | Pekka Siirtola, Heli Koskimäki, Juha Röning |
Abstract | In this study, the importance of user inputs is studied in the context of personalizing human activity recognition models using incremental learning. Inertial sensor data from three body positions are used, and the classification is based on the Learn++ ensemble method. Three different approaches to updating models are compared: non-supervised, semi-supervised, and supervised. The non-supervised approach relies fully on predicted labels, the supervised approach fully on user-labeled data, and the proposed semi-supervised method is a combination of the two. Our experiments show that by relying on predicted labels with high confidence, and asking the user to label only uncertain observations (from 12% to 26% of the observations, depending on the base classifier used), almost as low error rates can be achieved as with the supervised approach; the difference is less than 2 percentage units. Moreover, unlike the non-supervised approach, the semi-supervised approach does not suffer from drastic concept drift, and thus the error rate of the non-supervised approach is over 5 percentage units higher than that of the semi-supervised approach. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11775v1 |
https://arxiv.org/pdf/1905.11775v1.pdf | |
PWC | https://paperswithcode.com/paper/importance-of-user-inputs-while-using |
Repo | |
Framework | |
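The semi-supervised rule reduces to a confidence gate over a streaming prediction loop. A hedged sketch follows, with a logistic-regression stand-in for the Learn++ ensemble and an invented 0.9 confidence threshold:

```python
# Confidence-gated labeling for model personalization: keep the predicted
# label when confident, query the user only for uncertain observations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4)); y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = LogisticRegression().fit(X[:100], y[:100])   # user-independent model

queried, labels = 0, []
for i in range(100, 300):                 # streaming personal data
    proba = model.predict_proba(X[i:i + 1])[0]
    if proba.max() >= 0.9:                # confident: keep predicted label
        labels.append(int(proba.argmax()))
    else:                                 # uncertain: ask the user
        labels.append(int(y[i]))          # stands in for a real user prompt
        queried += 1
print(f"user was asked to label {queried / 200:.0%} of the stream")
```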
Integrating Motion into Vision Models for Better Visual Prediction
Title | Integrating Motion into Vision Models for Better Visual Prediction |
Authors | Michael Hazoglou, Todd Hylton |
Abstract | We demonstrate an improved vision system that learns a model of its environment using a self-supervised, predictive learning method. The system includes a pan-tilt camera, a foveated visual input, a saccading reflex to servo the foveated region to areas of high prediction error, input frame transformation synced to the camera motion, and a recursive, hierarchical machine learning technique based on the Predictive Vision Model. In earlier work, which did not integrate camera motion into the vision model, prediction was impaired and camera movement suffered from undesired feedback effects. Here we detail the integration of camera motion into the predictive learning system and show improved visual prediction and saccadic behavior. From these experiences, we speculate on the integration of additional sensory and motor systems into self-supervised, predictive learning models. |
Tasks | |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01661v1 |
https://arxiv.org/pdf/1912.01661v1.pdf | |
PWC | https://paperswithcode.com/paper/integrating-motion-into-vision-models-for |
Repo | |
Framework | |
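The saccading reflex described above amounts to servoing the fovea toward the peak of a prediction-error map. A toy sketch, with a random error map standing in for the model's actual per-pixel errors:

```python
# Pick the next saccade target as the argmax of the prediction-error map,
# expressed as a pan/tilt offset from the current fixation at image center.
import numpy as np

rng = np.random.default_rng(3)
pred_error = rng.random((48, 64))              # per-pixel prediction error map
row, col = np.unravel_index(pred_error.argmax(), pred_error.shape)
pan, tilt = col - 64 // 2, row - 48 // 2       # offset from current fixation
print(f"saccade command: pan {pan:+d}, tilt {tilt:+d} pixels")
```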
Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Function
Title | Screening Data Points in Empirical Risk Minimization via Ellipsoidal Regions and Safe Loss Function |
Authors | Grégoire Mialon, Alexandre d’Aspremont, Julien Mairal |
Abstract | We design simple screening tests to automatically discard data samples in empirical risk minimization without losing optimization guarantees. We derive loss functions that produce dual objectives with a sparse solution. We also show how to regularize convex losses to ensure such a dual sparsity-inducing property, and propose a general method to design screening tests for classification or regression based on ellipsoidal approximations of the optimal set. In addition to producing computational gains, our approach also allows us to compress a dataset into a subset of representative points. |
Tasks | |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02566v2 |
https://arxiv.org/pdf/1912.02566v2.pdf | |
PWC | https://paperswithcode.com/paper/screening-data-points-in-empirical-risk |
Repo | |
Framework | |
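A ball-shaped special case of the ellipsoidal screening test makes the idea concrete: if the optimum is known to lie within radius $r$ of a center $c$, any sample whose hinge-loss margin stays at least 1 over the whole ball is provably inactive at the optimum and can be discarded. The center, radius, and data below are illustrative; the paper derives sharper ellipsoidal regions and treats regression losses as well.

```python
# Ball-based safe screening for the hinge loss: the worst-case margin of
# sample i over the ball {w : ||w - c|| <= r} is y_i <c, x_i> - r ||x_i||;
# if it is >= 1, the sample's loss is zero everywhere on the ball.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))
y = np.sign(X @ rng.normal(size=10) + 1e-9)

c = rng.normal(size=10)        # center of a region containing the optimum
r = 0.1                        # its radius (e.g. from a duality-gap bound)

worst_margin = y * (X @ c) - r * np.linalg.norm(X, axis=1)
keep = worst_margin < 1.0      # samples that can still be active
print(f"screened out {np.mean(~keep):.0%} of the samples")
```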
Language Modeling through Long Term Memory Network
Title | Language Modeling through Long Term Memory Network |
Authors | Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie |
Abstract | Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Memory Networks, which contain memory, are popularly used to learn patterns in sequential data. Sequential data have long sequences that hold relationships. RNNs can handle long sequences but suffer from the vanishing and exploding gradient problems. While LSTMs and other memory networks address this problem, they are not capable of handling long sequences (patterns of 50 or more data points). Language modelling tasks that require learning from longer sequences are affected by the need for more information in memory. This paper introduces the Long Term Memory network (LTM), which can tackle the exploding and vanishing gradient problems and handle long sequences without forgetting. LTM is designed to scale data in the memory and give a higher weight to the input in the sequence. LTM avoids overfitting by scaling the cell state after achieving the optimal results. LTM is tested on the Penn Treebank and Text8 datasets, achieving test perplexities of 83 and 82, respectively. With 650 LTM cells, a test perplexity of 67 is achieved for Penn Treebank, and with 600 cells, a test perplexity of 77 for Text8. LTM achieves state-of-the-art results using only ten hidden LTM cells for both datasets. |
Tasks | Language Modelling |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.08936v1 |
http://arxiv.org/pdf/1904.08936v1.pdf | |
PWC | https://paperswithcode.com/paper/language-modeling-through-long-term-memory |
Repo | |
Framework | |
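For readers unfamiliar with the reported metric: perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. A trivial sketch with made-up token probabilities:

```python
# Perplexity from per-token model probabilities p(token | history).
import math

token_probs = [0.10, 0.02, 0.30, 0.05]           # illustrative values only
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"perplexity = {math.exp(nll):.1f}")       # ~13.5 here; 83 on PTB in the paper
```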
Application and Computation of Probabilistic Neural Plasticity
Title | Application and Computation of Probabilistic Neural Plasticity |
Authors | Soaad Hossain |
Abstract | The discovery of neural plasticity has proved that throughout the life of a human being, the brain reorganizes itself through forming new neural connections. The formation of new neural connections is achieved through the brain’s effort to adapt to new environments or to changes in the existing environment. Despite the realization of neural plasticity, there is a lack of understanding of the probability of neural plasticity occurring given some event. Using ordinary differential equations, neural firing equations, and spike-train statistics, we show how an additive short-term memory (STM) equation can be formulated to approach the computation of neural plasticity. We then show how the additive STM equation can be used for probabilistic inference in computable neural plasticity, and for the computation of probabilistic neural plasticity. We also provide a brief introduction to the theory of probabilistic neural plasticity and conclude by showing how it can be applied to multiple disciplines such as behavioural science, machine learning, and psychiatry. |
Tasks | |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1907.00689v1 |
https://arxiv.org/pdf/1907.00689v1.pdf | |
PWC | https://paperswithcode.com/paper/application-and-computation-of-probabilistic |
Repo | |
Framework | |
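The additive STM equation referenced in the abstract is, in Grossberg-style form, $\dot{x}_i = -a_i x_i + \sum_j b_{ij} f(x_j) + I_i$. A forward-Euler integration sketch with illustrative coefficients of ours:

```python
# Forward-Euler integration of an additive short-term-memory network:
# dx_i/dt = -a_i x_i + sum_j b_ij f(x_j) + I_i.
import numpy as np

rng = np.random.default_rng(5)
n, dt, steps = 5, 0.01, 1000
a = np.ones(n)                         # passive decay rates
B = 0.1 * rng.normal(size=(n, n))      # synaptic weights b_ij
I = rng.normal(size=n)                 # external input
f = np.tanh                            # neural firing nonlinearity

x = np.zeros(n)
for _ in range(steps):
    x += dt * (-a * x + B @ f(x) + I)  # one Euler step of the STM equation
print(np.round(x, 3))                  # approximate steady-state activity
```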
A Consolidated System for Robust Multi-Document Entity Risk Extraction and Taxonomy Augmentation
Title | A Consolidated System for Robust Multi-Document Entity Risk Extraction and Taxonomy Augmentation |
Authors | Berk Ekmekci, Eleanor Hagerman, Blake Howald |
Abstract | We introduce a hybrid human-automated system that provides scalable entity-risk relation extractions across large data sets. Given an expert-defined keyword taxonomy, entities, and data sources, the system returns text extractions based on bidirectional token distances between entities and keywords, and expands taxonomy coverage with word vector encodings. Our system represents a more simplified architecture compared to alerting-focused systems, motivated by high-coverage use cases in the risk mining space such as due diligence activities and intelligence gathering. We provide an overview of the system and expert evaluations for a range of token distances. We demonstrate that single- and multi-sentence distance groups significantly outperform baseline extractions, with shorter, single sentences being preferred by analysts. As the taxonomy expands, the amount of relevant information increases and multi-sentence extractions become more preferred, but this is tempered against entity-risk relations becoming more indirect. We discuss the implications of these observations for users, the management of ambiguity and taxonomy expansion, and future system modifications. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10368v1 |
https://arxiv.org/pdf/1909.10368v1.pdf | |
PWC | https://paperswithcode.com/paper/190910368 |
Repo | |
Framework | |
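The core extraction rule, a bidirectional token distance between an entity mention and a taxonomy keyword, fits in a short function. A compact sketch with a simplified whitespace tokenizer, an invented example sentence, and a context-window heuristic of ours:

```python
# Return text windows around each (entity, risk-keyword) pair whose
# bidirectional token distance is within a bound.
def extract(tokens, entities, keywords, max_dist=10):
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in entities:
            continue
        for j, other in enumerate(tokens):
            if other.lower() in keywords and abs(i - j) <= max_dist:
                lo, hi = min(i, j), max(i, j)
                hits.append(" ".join(tokens[max(lo - 2, 0):hi + 3]))
    return hits

tokens = "Regulators fined Acme after the bribery investigation concluded".split()
print(extract(tokens, entities={"acme"}, keywords={"bribery", "fraud"}))
```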
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Title | Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent |
Authors | Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington |
Abstract | A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions. |
Tasks | Gaussian Processes |
Published | 2019-02-18 |
URL | https://arxiv.org/abs/1902.06720v4 |
https://arxiv.org/pdf/1902.06720v4.pdf | |
PWC | https://paperswithcode.com/paper/wide-neural-networks-of-any-depth-evolve-as |
Repo | |
Framework | |
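The paper's central object is the first-order Taylor expansion $f_{\mathrm{lin}}(x) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top(\theta - \theta_0)$. The finite-difference sketch below compares a tiny network to its linearization after a small parameter step; it illustrates the comparison only, not the infinite-width regime in which the agreement becomes exact.

```python
# Compare a toy network's output after a parameter step against its
# first-order Taylor expansion around the initial parameters.
import numpy as np

rng = np.random.default_rng(6)

def f(theta, x):                          # toy 1-hidden-layer scalar network
    W1, w2 = theta[:8].reshape(4, 2), theta[8:]
    return w2 @ np.tanh(W1 @ x)

theta0 = rng.normal(size=12) / np.sqrt(12)
x = np.array([0.3, -0.7])

eps = 1e-6                                # finite-difference Jacobian wrt theta
grad = np.array([(f(theta0 + eps * np.eye(12)[k], x) - f(theta0, x)) / eps
                 for k in range(12)])

step = 0.01 * rng.normal(size=12)         # a small parameter update
print("true  :", f(theta0 + step, x))
print("linear:", f(theta0, x) + grad @ step)
```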
A novel framework of the fuzzy c-means distances problem based weighted distance
Title | A novel framework of the fuzzy c-means distances problem based weighted distance |
Authors | Andy Arief Setyawan, Ahmad Ilham |
Abstract | Clustering plays a major role in data mining and is widely applied in pattern recognition and image segmentation. Fuzzy C-means (FCM) is the most widely used clustering algorithm; it has proven efficient, fast, and easy to implement. However, FCM uses the Euclidean distance, which often leads to clustering errors, especially when handling multidimensional and noisy data. In the last few years, many distance metrics have been proposed by researchers to improve the performance of FCM algorithms, the majority of them weighted distances. In this paper, we propose the Canberra weighted distance to improve the performance of the FCM algorithm. Experimental results using UCI data sets show that the proposed method is superior to the original method and other clustering methods. |
Tasks | Semantic Segmentation |
Published | 2019-07-31 |
URL | https://arxiv.org/abs/1907.13513v1 |
https://arxiv.org/pdf/1907.13513v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-framework-of-the-fuzzy-c-means |
Repo | |
Framework | |
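A hedged sketch of fuzzy c-means with the Canberra distance swapped in for the Euclidean one follows. One caveat: with a non-Euclidean distance, the weighted-mean center update below is a common heuristic rather than an exact minimizer, and the paper's precise weighting scheme may differ.

```python
# FCM with Canberra distance: standard membership update
# u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)), heuristic weighted-mean centers.
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
c, m, n = 2, 2.0, len(X)

def canberra(x, v):
    return np.sum(np.abs(x - v) / (np.abs(x) + np.abs(v) + 1e-12), axis=-1)

V = X[rng.choice(n, c, replace=False)]                  # initial centers
for _ in range(30):
    D = np.stack([canberra(X, v) for v in V]) + 1e-12   # c x n distances
    U = 1.0 / np.sum((D[:, None, :] / D[None, :, :]) ** (2 / (m - 1)), axis=1)
    V = (U ** m @ X) / np.sum(U ** m, axis=1, keepdims=True)
print(np.round(V, 2))                                   # recovered centers
```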
Factored Contextual Policy Search with Bayesian Optimization
Title | Factored Contextual Policy Search with Bayesian Optimization |
Authors | Robert Pinsler, Peter Karkus, Andras Kupcsik, David Hsu, Wee Sun Lee |
Abstract | Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different task contexts. Contextual policy search offers data-efficient learning and generalization by explicitly conditioning the policy on a parametric context space. In this paper, we further structure the contextual policy representation. We propose to factor contexts into two components: target contexts that describe the task objectives, e.g. target position for throwing a ball; and environment contexts that characterize the environment, e.g. initial position or mass of the ball. Our key observation is that experience can be directly generalized over target contexts. We show that this can be easily exploited in contextual policy search algorithms. In particular, we apply factorization to a Bayesian optimization approach to contextual policy search both in sampling-based and active learning settings. Our simulation results show faster learning and better generalization in various robotic domains. See our supplementary video: https://youtu.be/MNTbBAOufDY. |
Tasks | Active Learning |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.11761v1 |
http://arxiv.org/pdf/1904.11761v1.pdf | |
PWC | https://paperswithcode.com/paper/factored-contextual-policy-search-with-1 |
Repo | |
Framework | |
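The key observation, that experience generalizes directly over target contexts, can be shown with a toy throw: one rollout executed under a fixed environment context can be re-scored against any target after the fact. The projectile model below is our stand-in for the paper's robotic domains.

```python
# One executed rollout (fixed env context) re-evaluated against many target
# contexts, since the target enters only through the reward function.
import numpy as np

def throw(speed, angle, g=9.81):             # env context: gravity g
    """Simulate once; return the landing distance of the ball."""
    return speed ** 2 * np.sin(2 * angle) / g

def reward(landing, target):                 # target context enters only here
    return -(landing - target) ** 2

landing = throw(speed=8.0, angle=np.pi / 5)  # a single executed rollout
for target in [3.0, 5.0, 6.5]:               # re-scored against many targets
    print(f"target {target}: reward {reward(landing, target):.2f}")
```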
A Machine Learning-Based Detection Technique for Optical Fiber Nonlinearity Mitigation
Title | A Machine Learning-Based Detection Technique for Optical Fiber Nonlinearity Mitigation |
Authors | Abdelkerim Amari, Xiang Lin, Octavia A. Dobre, Ramachandran Venkatesan, Alex Alvarado |
Abstract | We investigate the performance of a machine learning classification technique, called the Parzen window, to mitigate the fiber nonlinearity in the context of dispersion managed and dispersion unmanaged systems. The technique is applied for detection at the receiver side, and deals with the non-Gaussian nonlinear effects by designing improved decision boundaries. We also propose a two-stage mitigation technique using digital back propagation and the Parzen window for dispersion unmanaged systems. In this case, digital back propagation compensates for the deterministic nonlinearity and the Parzen window deals with the stochastic nonlinear signal-noise interactions, which are not taken into account by digital back propagation. A performance improvement of up to 0.4 dB in terms of Q-factor is observed. |
Tasks | |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1903.01549v2 |
http://arxiv.org/pdf/1903.01549v2.pdf | |
PWC | https://paperswithcode.com/paper/a-machine-learning-based-detection-technique |
Repo | |
Framework | |
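A minimal Parzen-window detector is a few lines: classify a received symbol by the class whose kernel density estimate is highest, which bends decision boundaries away from the Gaussian-optimal ones. The toy 4-PAM constellation and heavy-tailed noise below merely stand in for the paper's fiber-channel distributions.

```python
# Parzen-window (kernel density) detection: pick the class whose KDE over
# its training samples assigns the highest density to the received symbol.
import numpy as np

rng = np.random.default_rng(8)
centers = np.array([-3, -1, 1, 3], dtype=float)        # 4-PAM constellation
train = {k: c + 0.4 * rng.standard_t(df=3, size=200)   # heavy-tailed noise
         for k, c in enumerate(centers)}

def parzen_score(x, samples, h=0.2):
    return np.mean(np.exp(-0.5 * ((x - samples) / h) ** 2))  # Gaussian kernel

received = 0.7
print("decision:", max(train, key=lambda k: parzen_score(received, train[k])))
```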
Disparity-Augmented Trajectories for Human Activity Recognition
Title | Disparity-Augmented Trajectories for Human Activity Recognition |
Authors | Pejman Habashi, Boubakeur Boufama, Imran Shafiq Ahmad |
Abstract | Numerous methods for human activity recognition have been proposed in the past two decades. Many of these methods are based on sparse representation, which describes the whole video content by a set of local features. Trajectories, being mid-level sparse features, are capable of describing the motion of an interest-point in 2D space. 2D trajectories might be affected by viewpoint changes, potentially decreasing their accuracy. In this paper, we initially propose and compare different 2D trajectory-based algorithms for human activity recognition. Moreover, we propose a new way of fusing disparity information with 2D trajectory information, without the calculation of 3D reconstruction. The obtained results show a 2.76% improvement when using disparity-augmented trajectories, compared to using the classical 2D trajectory information only. Furthermore, we have also tested our method on the challenging Hollywood 3D dataset, and we have obtained competitive results, at a faster speed. |
Tasks | 3D Reconstruction, Activity Recognition, Human Activity Recognition |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05344v1 |
https://arxiv.org/pdf/1905.05344v1.pdf | |
PWC | https://paperswithcode.com/paper/disparity-augmented-trajectories-for-human |
Repo | |
Framework | |
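The augmentation itself is simple to sketch: where a classical dense-trajectory descriptor concatenates normalized $(dx, dy)$ displacements, the per-frame disparity is appended as a third channel, with no 3D reconstruction. The tracked points and disparity values below are synthetic stand-ins.

```python
# Disparity-augmented trajectory descriptor: per-frame (dx, dy, dd) steps,
# normalized by the total 2D displacement magnitude as in dense trajectories.
import numpy as np

rng = np.random.default_rng(9)
xy = np.cumsum(rng.normal(size=(16, 2)), axis=0)            # 2D track, 16 frames
disparity = 10 + np.cumsum(rng.normal(scale=0.1, size=16))  # stereo disparity

steps = np.diff(np.column_stack([xy, disparity]), axis=0)   # (dx, dy, dd) steps
norm = np.linalg.norm(steps[:, :2], axis=1).sum() + 1e-12   # 2D normalizer
descriptor = (steps / norm).ravel()
print(descriptor.shape)                                     # 15 steps x 3 = (45,)
```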