January 29, 2020

Paper Group ANR 616

Generation of Policy-Level Explanations for Reinforcement Learning

Title Generation of Policy-Level Explanations for Reinforcement Learning
Authors Nicholay Topin, Manuela Veloso
Abstract Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, $O(F^2 \cdot \text{tr\_samples})$. By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.12044v1
PDF https://arxiv.org/pdf/1905.12044v1.pdf
PWC https://paperswithcode.com/paper/generation-of-policy-level-explanations-for
Repo
Framework
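The abstract above describes Abstracted Policy Graphs as Markov chains over abstract states. As a rough illustration only, the sketch below estimates such a chain from observed transitions, assuming the mapping from concrete to abstract states is already given. The names `abstract_transition_probs` and `abstract_of` are hypothetical, and the paper's actual procedure for deriving the abstraction from a learned value function is not reproduced here.

```python
from collections import Counter, defaultdict

def abstract_transition_probs(transitions, abstract_of):
    """Estimate a Markov chain over abstract states.

    transitions: iterable of (state, next_state) pairs observed under a policy.
    abstract_of: hypothetical function mapping a concrete state to its
                 abstract state (assumed given; not the paper's method).
    Returns {abstract_state: {next_abstract_state: probability}}.
    """
    counts = defaultdict(Counter)
    for state, next_state in transitions:
        counts[abstract_of(state)][abstract_of(next_state)] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }

# Toy usage: integer states grouped into two abstract states.
observed = [(0, 1), (1, 2), (2, 3), (0, 2)]
chain = abstract_transition_probs(observed, lambda s: "low" if s < 2 else "high")
```

Each row of the result sums to one, so a single decision can be read in the context of where the policy is expected to go next.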

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation

Title TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
Authors Wubo Li, Wei Zou, Xiangang Li
Abstract Multimodality provides more promising performance than unimodality in most tasks. However, efficiently learning the semantics of multimodal representations is extremely challenging. To tackle this, we propose the Transformer-based Cross-modal Translator (TCT) to learn unimodal sequence representations by translating from other related multimodal sequences in a supervised manner. Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on video-grounded dialogue, which uses multimodality. The proposed method reports new state-of-the-art performance on video-grounded dialogue, which indicates that representations learned by TCT are more semantically meaningful than those obtained by directly using unimodality.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1911.05186v1
PDF https://arxiv.org/pdf/1911.05186v1.pdf
PWC https://paperswithcode.com/paper/tct-a-cross-supervised-learning-method-for
Repo
Framework

A Stochastic LBFGS Algorithm for Radio Interferometric Calibration

Title A Stochastic LBFGS Algorithm for Radio Interferometric Calibration
Authors Sarod Yatawatta, Lukas De Clercq, Hanno Spreeuw, Faruk Diblen
Abstract We present a stochastic, limited-memory Broyden Fletcher Goldfarb Shanno (LBFGS) algorithm that is suitable for handling very large amounts of data. A direct application of this algorithm is radio interferometric calibration of raw data at fine time and frequency resolution. Almost all existing radio interferometric calibration algorithms assume that it is possible to fit the dataset being calibrated into memory. Therefore, the raw data is averaged in time and frequency to reduce its size by many orders of magnitude before calibration is performed. However, this averaging is detrimental for the detection of some signals of interest that have narrow bandwidth and time duration such as fast radio bursts (FRBs). Using the proposed algorithm, it is possible to calibrate data at such a fine resolution that they cannot be entirely loaded into memory, thus preserving such signals. As an additional demonstration, we use the proposed algorithm for training deep neural networks and compare the performance against the mainstream first order optimization algorithms that are used in deep learning.
Tasks Calibration
Published 2019-04-11
URL http://arxiv.org/abs/1904.05619v2
PDF http://arxiv.org/pdf/1904.05619v2.pdf
PWC https://paperswithcode.com/paper/a-stochastic-lbfgs-algorithm-for-radio
Repo
Framework

Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information

Title Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information
Authors Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov
Abstract We address the problem of predicting the leading political ideology, i.e., left-center-right bias, for YouTube channels of news media. Previous work on the problem has focused exclusively on text and on analysis of the language used, topics discussed, sentiment, and the like. In contrast, here we study videos, which yields an interesting multimodal setup. Starting with gold annotations about the leading political ideology of major world news media from Media Bias/Fact Check, we searched on YouTube to find their corresponding channels, and we downloaded a recent sample of videos from each channel. We crawled more than 1,000 YouTube hours along with the corresponding subtitles and metadata, thus producing a new multimodal dataset. We further developed a multimodal deep-learning architecture for the task. Our analysis shows that the use of acoustic signal helped to improve bias detection by more than 6% absolute over using text and metadata only. We release the dataset to the research community, hoping to help advance the field of multi-modal political bias detection.
Tasks
Published 2019-10-20
URL https://arxiv.org/abs/1910.08948v1
PDF https://arxiv.org/pdf/1910.08948v1.pdf
PWC https://paperswithcode.com/paper/predicting-the-leading-political-ideology-of
Repo
Framework

Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope

Title Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope
Authors Tolga Birdal, Umut Şimşekli
Abstract We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. In particular, we present two algorithms: (1) Birkhoff-Riemannian L-BFGS for optimizing the relaxed version of the combinatorially intractable cycle consistency loss in a principled manner, (2) Birkhoff-Riemannian Langevin Monte Carlo for generating samples on the Birkhoff Polytope and estimating the confidence of the found solutions. To this end, we first introduce the very recently developed Riemannian geometry of the Birkhoff Polytope. Next, we introduce a new probabilistic synchronization model in the form of a Markov Random Field (MRF). Finally, based on the first order retraction operators, we formulate our problem as simulating a stochastic differential equation and devise new integrators. We show on both synthetic and real datasets that we achieve high quality multi-graph matching results with faster convergence and reliable confidence/uncertainty estimates.
Tasks Graph Matching
Published 2019-04-11
URL http://arxiv.org/abs/1904.05814v1
PDF http://arxiv.org/pdf/1904.05814v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-permutation-synchronization
Repo
Framework

Distant Learning for Entity Linking with Automatic Noise Detection

Title Distant Learning for Entity Linking with Automatic Noise Detection
Authors Phong Le, Ivan Titov
Abstract Accurate entity linkers have been produced for domains and languages where annotated data (i.e., texts linked to a knowledge base) is available. However, little progress has been made for the settings where no or very limited amounts of labeled data are present (e.g., legal or most scientific domains). In this work, we show how we can learn to link mentions without having any labeled examples, only a knowledge base and a collection of unannotated texts from the corresponding domain. In order to achieve this, we frame the task as a multi-instance learning problem and rely on surface matching to create initial noisy labels. As the learning signal is weak and our surrogate labels are noisy, we introduce a noise detection component in our model: it lets the model detect and disregard examples which are likely to be noisy. Our method, jointly learning to detect noise and link entities, greatly outperforms the surface matching baseline. For a subset of entity categories, it even approaches the performance of supervised learning.
Tasks Entity Linking
Published 2019-05-17
URL https://arxiv.org/abs/1905.07189v2
PDF https://arxiv.org/pdf/1905.07189v2.pdf
PWC https://paperswithcode.com/paper/distant-learning-for-entity-linking-with
Repo
Framework

A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs

Title A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs
Authors Ling Jiang, Yang Hu, Xilin Xia, Qiuhua Liang, Andrea Soltoggio
Abstract The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban flood and managing its risk. A solution is to develop effective approaches to reconstruct high-resolution DEMs from their low-resolution equivalents that are more widely available. However, the current high-resolution DEM reconstruction approaches mainly focus on natural topography. Few attempts have been made for urban topography which is typically an integration of complex man-made and natural features. This study proposes a novel multi-scale mapping approach based on convolutional neural network (CNN) to deal with the complex characteristics of urban topography and reconstruct high-resolution urban DEMs. The proposed multi-scale CNN model is firstly trained using urban DEMs that contain topographic features at different resolutions, and then used to reconstruct the urban DEM at a specified (high) resolution from a low-resolution equivalent. A two-level accuracy assessment approach is also designed to evaluate the performance of the proposed urban DEM reconstruction method, in terms of numerical accuracy and morphological accuracy. The proposed DEM reconstruction approach is applied to a 121 km² urbanized area in London, UK. Compared with other commonly used methods, the current CNN based approach produces superior results, providing a cost-effective innovative method to acquire high-resolution DEMs in other data-scarce environments.
Tasks
Published 2019-07-19
URL https://arxiv.org/abs/1907.12898v2
PDF https://arxiv.org/pdf/1907.12898v2.pdf
PWC https://paperswithcode.com/paper/a-multi-scale-mapping-approach-based-on-a
Repo
Framework

Deep Structured Neural Network for Event Temporal Relation Extraction

Title Deep Structured Neural Network for Event Temporal Relation Extraction
Authors Rujun Han, I-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, Nanyun Peng
Abstract We propose a novel deep structured learning framework for event temporal relation extraction. The model consists of 1) a recurrent neural network (RNN) to learn scoring functions for pair-wise relations, and 2) a structured support vector machine (SSVM) to make joint predictions. The neural network automatically learns representations that account for long-term contexts to provide robust features for the structured model, while the SSVM incorporates domain knowledge such as transitive closure of temporal relations as constraints to make better globally consistent decisions. By jointly training the two components, our model combines the benefits of both data-driven learning and knowledge exploitation. Experimental results on three high-quality event temporal relation datasets (TCR, MATRES, and TB-Dense) demonstrate that incorporated with pre-trained contextualized embeddings, the proposed model achieves significantly better performances than the state-of-the-art methods on all three datasets. We also provide thorough ablation studies to investigate our model.
Tasks Relation Extraction
Published 2019-09-22
URL https://arxiv.org/abs/1909.10094v2
PDF https://arxiv.org/pdf/1909.10094v2.pdf
PWC https://paperswithcode.com/paper/190910094
Repo
Framework

Deep Octonion Networks

Title Deep Octonion Networks
Authors Jiasong Wu, Ling Xu, Youyong Kong, Lotfi Senhadji, Huazhong Shu
Abstract Deep learning is a hot research topic in the field of machine learning. Real-value neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, the deep complex networks (DCNs) and the deep quaternion networks (DQNs) have attracted more and more attention. The octonion algebra, which is an extension of complex algebra and quaternion algebra, can provide more efficient and compact expression. This paper constructs a general framework of deep octonion networks (DONs) and provides the main building blocks of DONs such as octonion convolution, octonion batch normalization and octonion weight initialization; DONs are then used in image classification tasks for CIFAR-10 and CIFAR-100 data sets. Compared with the DRNs, the DCNs, and the DQNs, the proposed DONs have better convergence and higher classification accuracy. The success of DONs is also explained by multi-task learning.
Tasks Image Classification, Multi-Task Learning
Published 2019-03-20
URL http://arxiv.org/abs/1903.08478v1
PDF http://arxiv.org/pdf/1903.08478v1.pdf
PWC https://paperswithcode.com/paper/deep-octonion-networks
Repo
Framework

Sparsely Activated Networks: A new method for decomposing and compressing data

Title Sparsely Activated Networks: A new method for decomposing and compressing data
Authors Paschalis Bizopoulos
Abstract Recent literature on unsupervised learning focused on designing structural priors with the aim of learning meaningful features, but without considering the description length of the representations. In this thesis, first we introduce the $\varphi$ metric that evaluates unsupervised models based on their reconstruction accuracy and the degree of compression of their internal representations. We then present and define two activation functions (Identity, ReLU) as base of reference and three sparse activation functions (top-k absolutes, Extrema-Pool indices, Extrema) as candidate structures that minimize the previously defined metric $\varphi$. We lastly present Sparsely Activated Networks (SANs) that consist of kernels with shared weights that, during encoding, are convolved with the input and then passed through a sparse activation function. During decoding, the same weights are convolved with the sparse activation map and subsequently the partial reconstructions from each weight are summed to reconstruct the input. We compare SANs using the five previously defined activation functions on a variety of datasets (Physionet, UCI-epilepsy, MNIST, FMNIST) and show that models that are selected using $\varphi$ have small description representation length and consist of interpretable kernels.
Tasks
Published 2019-10-30
URL https://arxiv.org/abs/1911.00400v1
PDF https://arxiv.org/pdf/1911.00400v1.pdf
PWC https://paperswithcode.com/paper/sparsely-activated-networks-a-new-method-for
Repo
Framework
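Of the sparse activation functions named in the abstract above, "top-k absolutes" is the simplest to illustrate. The sketch below is one plausible reading of that name (keep the k entries with the largest magnitude, zero everything else) on a 1-D activation map. The function name `topk_absolutes` and the tie-breaking rule are assumptions for illustration, not taken from the author's code.

```python
def topk_absolutes(activations, k):
    """Keep the k entries with the largest absolute value; zero the rest.

    Assumption: ties are broken in favour of the earlier position,
    which follows from Python's stable sort.
    """
    if k <= 0:
        return [0.0] * len(activations)
    by_magnitude = sorted(range(len(activations)),
                          key=lambda i: -abs(activations[i]))
    keep = set(by_magnitude[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(activations)]

# Toy usage: only the two strongest responses survive.
sparse_map = topk_absolutes([0.1, -3.0, 2.0, -0.5], k=2)
```

A sparse map like this needs only k (index, value) pairs to store, which is the kind of compression the $\varphi$ metric is said to reward alongside reconstruction accuracy.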

Training ASR models by Generation of Contextual Information

Title Training ASR models by Generation of Contextual Information
Authors Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
Abstract Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data. However, in many applications and locales, only moderate amounts of data are available, which has led to a surge in semi- and weakly-supervised learning research. In this paper, we conduct a large-scale study evaluating the effectiveness of weakly-supervised learning for speech recognition by using loosely related contextual information as a surrogate for ground-truth labels. For weakly supervised training, we use 50k hours of public English social media videos along with their respective titles and post text to train an encoder-decoder transformer model. Our best encoder-decoder models achieve an average of 20.8% WER reduction over a 1000 hours supervised baseline, and an average of 13.4% WER reduction when using only the weakly supervised encoder for CTC fine-tuning. Our results show that our setup for weak supervision improved both the encoder acoustic representations as well as the decoder language generation abilities.
Tasks Speech Recognition, Text Generation
Published 2019-10-27
URL https://arxiv.org/abs/1910.12367v2
PDF https://arxiv.org/pdf/1910.12367v2.pdf
PWC https://paperswithcode.com/paper/training-asr-models-by-generation-of
Repo
Framework

Large-Batch Training for LSTM and Beyond

Title Large-Batch Training for LSTM and Beyond
Authors Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh
Abstract Large-batch training approaches have enabled researchers to utilize large-scale distributed processing and greatly accelerate deep-neural net (DNN) training. For example, by scaling the batch size from 256 to 32K, researchers have been able to reduce the training time of ResNet50 on ImageNet from 29 hours to 2.2 minutes (Ying et al., 2018). In this paper, we propose a new approach called linear-epoch gradual-warmup (LEGW) for better large-batch training. With LEGW, we are able to conduct large-batch training for both CNNs and RNNs with the Sqrt Scaling scheme. LEGW enables Sqrt Scaling scheme to be useful in practice and as a result we achieve much better results than the Linear Scaling learning rate scheme. For LSTM applications, we are able to scale the batch size by a factor of 64 without losing accuracy and without tuning the hyper-parameters. For CNN applications, LEGW is able to achieve the same accuracy even as we scale the batch size to 32K. LEGW works better than previous large-batch auto-tuning techniques. LEGW achieves a 5.3X average speedup over the baselines for four LSTM-based applications on the same hardware. We also provide some theoretical explanations for LEGW.
Tasks
Published 2019-01-24
URL http://arxiv.org/abs/1901.08256v1
PDF http://arxiv.org/pdf/1901.08256v1.pdf
PWC https://paperswithcode.com/paper/large-batch-training-for-lstm-and-beyond
Repo
Framework
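The abstract above combines Sqrt Scaling of the learning rate with a warmup period that lengthens as the batch size grows. A minimal sketch of that idea follows, assuming that multiplying the batch size by k multiplies the learning rate by sqrt(k) and the number of warmup epochs by k. The function names `legw_scale` and `warmup_lr`, and the linear ramp shape, are illustrative assumptions rather than the authors' implementation.

```python
import math

def legw_scale(base_batch, base_lr, base_warmup_epochs, new_batch):
    """Scale a tuned baseline configuration to a new batch size.

    Assumption: with k = new_batch / base_batch, the peak learning rate
    grows by sqrt(k) (Sqrt Scaling) and the warmup length grows by k
    (linear-epoch warmup).
    """
    k = new_batch / base_batch
    return base_lr * math.sqrt(k), base_warmup_epochs * k

def warmup_lr(peak_lr, warmup_epochs, epoch):
    """Ramp the learning rate linearly up to peak_lr (assumed ramp shape)."""
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return peak_lr
    return peak_lr * epoch / warmup_epochs

# Toy usage: a 4x larger batch doubles the peak rate and quadruples warmup.
peak, warmup = legw_scale(base_batch=256, base_lr=0.1,
                          base_warmup_epochs=0.5, new_batch=1024)
```

Under these assumptions only the baseline needs tuning; larger batch sizes reuse it through the two scaling rules.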

AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks

Title AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks
Authors Ankit Gupta, Elliott Ruebush
Abstract According to the United Nations World Water Assessment Programme, every day, 2 million tons of sewage and industrial and agricultural waste are discharged into the world's water. In order to address this pervasive issue of increasing water pollution, while ensuring that the global population has an efficient, accurate, and low cost method to assess whether the water they drink is contaminated, we propose AquaSight, a novel mobile application that utilizes deep learning methods, specifically Convolutional Neural Networks, for automated water impurity detection. After comprehensive training with a dataset of 105 images representing varying magnitudes of contamination, the deep learning algorithm achieved a 96 percent accuracy and loss of 0.108. Furthermore, the machine learning model uses efficient analysis of the turbidity and transparency levels of water to estimate a particular sample of water's level of contamination. When deployed, the AquaSight system will provide an efficient way for individuals to secure an estimation of water quality, alerting local and national government to take action and potentially saving millions of lives worldwide.
Tasks
Published 2019-07-17
URL https://arxiv.org/abs/1907.07573v1
PDF https://arxiv.org/pdf/1907.07573v1.pdf
PWC https://paperswithcode.com/paper/aquasight-automatic-water-impurity-detection
Repo
Framework

Abductive Commonsense Reasoning

Title Abductive Commonsense Reasoning
Authors Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi
Abstract Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks – (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform–despite their strong performance on the related but more narrowly defined task of entailment NLI–pointing to interesting avenues for future research.
Tasks Natural Language Inference, Question Answering
Published 2019-08-15
URL https://arxiv.org/abs/1908.05739v2
PDF https://arxiv.org/pdf/1908.05739v2.pdf
PWC https://paperswithcode.com/paper/abductive-commonsense-reasoning
Repo
Framework

Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition

Title Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition
Authors Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn W. Schuller
Abstract Deep reinforcement learning (deep RL) is a combination of deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with their environment. This led to breakthroughs in many complex tasks that were previously difficult to solve. However, deep RL requires a large amount of training time that makes it difficult to use in various real-life applications like human-computer interaction (HCI). Therefore, in this paper, we study pre-training in deep RL to reduce the training time and improve the performance in speech recognition, a popular application of HCI. We achieve significantly improved performance in less time on a publicly available speech command recognition dataset.
Tasks Speech Recognition
Published 2019-10-24
URL https://arxiv.org/abs/1910.11256v2
PDF https://arxiv.org/pdf/1910.11256v2.pdf
PWC https://paperswithcode.com/paper/pre-training-in-deep-reinforcement-learning
Repo
Framework