Paper Group ANR 616
Generation of Policy-Level Explanations for Reinforcement Learning. TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation. A Stochastic LBFGS Algorithm for Radio Interferometric Calibration. Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information. Probabilistic Permuta …
Generation of Policy-Level Explanations for Reinforcement Learning
Title | Generation of Policy-Level Explanations for Reinforcement Learning |
Authors | Nicholay Topin, Manuela Veloso |
Abstract | Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, $O(F^2 \cdot tr\_samples)$. By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.12044v1 |
https://arxiv.org/pdf/1905.12044v1.pdf | |
PWC | https://paperswithcode.com/paper/generation-of-policy-level-explanations-for |
Repo | |
Framework | |
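The abstract above describes building a Markov chain over abstract states from a learned value function and observed transitions. Below is a minimal, hypothetical sketch of that idea (not the authors' feature-importance-based abstraction): states are grouped by binning their value estimates, and transition counts between the resulting abstract states are normalized into a Markov chain.

```python
import numpy as np
from collections import defaultdict

def abstract_policy_graph(states, next_states, value_fn, n_bins=10):
    """Toy abstraction: group states by binned value estimates and count
    transitions between the resulting abstract states.

    states, next_states : arrays of shape (n_samples, n_features)
    value_fn            : callable mapping a batch of states to scalar values
    """
    v, v_next = value_fn(states), value_fn(next_states)
    # Bin edges computed from the observed value range.
    edges = np.linspace(v.min(), v.max(), n_bins + 1)
    src = np.clip(np.digitize(v, edges) - 1, 0, n_bins - 1)
    dst = np.clip(np.digitize(v_next, edges) - 1, 0, n_bins - 1)

    counts = defaultdict(float)
    for s, d in zip(src, dst):
        counts[(s, d)] += 1.0

    # Row-normalise counts into a Markov transition matrix over abstract states.
    P = np.zeros((n_bins, n_bins))
    for (s, d), c in counts.items():
        P[s, d] = c
    row_sums = P.sum(axis=1, keepdims=True)
    return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)

# Usage with a made-up value function on random transitions:
rng = np.random.default_rng(0)
S, S_next = rng.normal(size=(500, 4)), rng.normal(size=(500, 4))
P = abstract_policy_graph(S, S_next, value_fn=lambda x: x.sum(axis=1))
print(P.shape)  # (10, 10) transition matrix between abstract states
```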
TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
Title | TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation |
Authors | Wubo Li, Wei Zou, Xiangang Li |
Abstract | Multimodal approaches generally provide better performance than unimodal ones on most tasks. However, efficiently learning the semantics of representations from multiple modalities is extremely challenging. To tackle this, we propose the Transformer-based Cross-modal Translator (TCT), which learns unimodal sequence representations by translating from other related multimodal sequences in a supervised manner. Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on the video-grounded dialogue task, which uses multiple modalities. The proposed method reports new state-of-the-art performance on video-grounded dialogue, indicating that the representations learned by TCT are more semantically meaningful than those obtained directly from a single modality. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1911.05186v1 |
https://arxiv.org/pdf/1911.05186v1.pdf | |
PWC | https://paperswithcode.com/paper/tct-a-cross-supervised-learning-method-for |
Repo | |
Framework | |
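As a rough illustration of the cross-modal translation idea (a sketch under assumed dimensions, not the authors' MTN-TCT architecture), the snippet below uses PyTorch's `nn.Transformer` to translate a source-modality sequence into a target-modality sequence; the encoder output can then serve as the learned unimodal representation.

```python
import torch
import torch.nn as nn

class CrossModalTranslator(nn.Module):
    """Toy transformer that translates one modality sequence into another."""
    def __init__(self, src_dim, tgt_dim, d_model=128, nhead=4):
        super().__init__()
        self.src_proj = nn.Linear(src_dim, d_model)
        self.tgt_proj = nn.Linear(tgt_dim, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d_model, tgt_dim)

    def forward(self, src_seq, tgt_seq):
        memory_in = self.src_proj(src_seq)   # (B, S, d_model)
        tgt_in = self.tgt_proj(tgt_seq)      # (B, T, d_model)
        decoded = self.transformer(memory_in, tgt_in)
        return self.out(decoded)             # predicted target-modality features

# Example: "translate" a video-feature sequence into a text-feature sequence.
model = CrossModalTranslator(src_dim=64, tgt_dim=32)
video = torch.randn(8, 20, 64)   # batch of 8, 20 frames, 64-d features
text = torch.randn(8, 15, 32)    # batch of 8, 15 tokens, 32-d features
pred = model(video, text)
loss = nn.functional.mse_loss(pred, text)  # supervised translation loss
```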
A Stochastic LBFGS Algorithm for Radio Interferometric Calibration
Title | A Stochastic LBFGS Algorithm for Radio Interferometric Calibration |
Authors | Sarod Yatawatta, Lukas De Clercq, Hanno Spreeuw, Faruk Diblen |
Abstract | We present a stochastic, limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm that is suitable for handling very large amounts of data. A direct application of this algorithm is radio interferometric calibration of raw data at fine time and frequency resolution. Almost all existing radio interferometric calibration algorithms assume that it is possible to fit the dataset being calibrated into memory. Therefore, the raw data is averaged in time and frequency to reduce its size by many orders of magnitude before calibration is performed. However, this averaging is detrimental for the detection of some signals of interest that have narrow bandwidth and time duration, such as fast radio bursts (FRBs). Using the proposed algorithm, it is possible to calibrate data at such a fine resolution that they cannot be entirely loaded into memory, thus preserving such signals. As an additional demonstration, we use the proposed algorithm for training deep neural networks and compare its performance against the mainstream first-order optimization algorithms used in deep learning. |
Tasks | Calibration |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05619v2 |
http://arxiv.org/pdf/1904.05619v2.pdf | |
PWC | https://paperswithcode.com/paper/a-stochastic-lbfgs-algorithm-for-radio |
Repo | |
Framework | |
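L-BFGS builds a search direction from a short history of parameter and gradient differences via the two-loop recursion; a stochastic variant applies it to minibatch gradients. The sketch below illustrates that recursion on a toy least-squares problem (it omits the robustness safeguards a production calibration solver would need, and is not the authors' implementation).

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate -H^{-1} @ grad from the most recent
    (s_k = x_{k+1}-x_k, y_k = g_{k+1}-g_k) pairs."""
    q = grad.copy()
    stack = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        stack.append((rho, a, s, y))
    s, y = s_hist[-1], y_hist[-1]
    q *= (s @ y) / (y @ y)                  # initial scaling H_0
    for rho, a, s, y in reversed(stack):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q                               # descent direction

# Toy stochastic loop on f(x) = 0.5 ||A x - b||^2 with minibatched rows.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 10)), rng.normal(size=200)
x = np.zeros(10)
s_hist, y_hist, x_prev, g_prev, m = [], [], None, None, 5
for step in range(100):
    idx = rng.choice(200, size=32, replace=False)      # minibatch
    g = A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)    # stochastic gradient
    # Keep only curvature pairs with positive s^T y (noisy pairs are skipped).
    if g_prev is not None and (x - x_prev) @ (g - g_prev) > 1e-10:
        s_hist.append(x - x_prev); y_hist.append(g - g_prev)
        s_hist, y_hist = s_hist[-m:], y_hist[-m:]
    d = lbfgs_direction(g, s_hist, y_hist) if s_hist else -g
    x_prev, g_prev = x.copy(), g
    x = x + 0.1 * d
print(np.linalg.norm(A @ x - b))
```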
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information
Title | Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information |
Authors | Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov |
Abstract | We address the problem of predicting the leading political ideology, i.e., left-center-right bias, for YouTube channels of news media. Previous work on the problem has focused exclusively on text and on analysis of the language used, topics discussed, sentiment, and the like. In contrast, here we study videos, which yields an interesting multimodal setup. Starting with gold annotations about the leading political ideology of major world news media from Media Bias/Fact Check, we searched on YouTube to find their corresponding channels, and we downloaded a recent sample of videos from each channel. We crawled more than 1,000 hours of YouTube video along with the corresponding subtitles and metadata, thus producing a new multimodal dataset. We further developed a multimodal deep-learning architecture for the task. Our analysis shows that the use of the acoustic signal helped to improve bias detection by more than 6% absolute over using text and metadata only. We release the dataset to the research community, hoping to help advance the field of multimodal political bias detection. |
Tasks | |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08948v1 |
https://arxiv.org/pdf/1910.08948v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-leading-political-ideology-of |
Repo | |
Framework | |
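The multimodal architecture is not specified in the abstract; as a generic, hypothetical baseline for the same task, the sketch below fuses acoustic, textual, and metadata feature vectors by concatenation and predicts the three-way left/center/right label (all dimensions are assumptions).

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy 3-way (left/center/right) classifier fusing three modality vectors."""
    def __init__(self, acoustic_dim=128, text_dim=300, meta_dim=20, hidden=64):
        super().__init__()
        self.acoustic = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.text = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.meta = nn.Sequential(nn.Linear(meta_dim, hidden), nn.ReLU())
        self.head = nn.Linear(3 * hidden, 3)  # logits over {left, center, right}

    def forward(self, acoustic, text, meta):
        fused = torch.cat([self.acoustic(acoustic), self.text(text),
                           self.meta(meta)], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 300), torch.randn(4, 20))
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 1]))
```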
Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope
Title | Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope |
Authors | Tolga Birdal, Umut Şimşekli |
Abstract | We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. In particular, we present two algorithms: (1) Birkhoff-Riemannian L-BFGS for optimizing the relaxed version of the combinatorially intractable cycle consistency loss in a principled manner, and (2) Birkhoff-Riemannian Langevin Monte Carlo for generating samples on the Birkhoff Polytope and estimating the confidence of the found solutions. To this end, we first introduce the very recently developed Riemannian geometry of the Birkhoff Polytope. Next, we introduce a new probabilistic synchronization model in the form of a Markov Random Field (MRF). Finally, based on first-order retraction operators, we formulate our problem as simulating a stochastic differential equation and devise new integrators. We show on both synthetic and real datasets that we achieve high-quality multi-graph matching results with faster convergence and reliable confidence/uncertainty estimates. |
Tasks | Graph Matching |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05814v1 |
http://arxiv.org/pdf/1904.05814v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-permutation-synchronization |
Repo | |
Framework | |
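The Birkhoff Polytope is the set of doubly stochastic matrices (non-negative, with rows and columns summing to one). The sketch below shows the classic Sinkhorn-Knopp normalization that maps a positive matrix into this set; it illustrates the relaxation domain only, not the paper's Riemannian L-BFGS or Langevin samplers.

```python
import numpy as np

def sinkhorn_projection(M, n_iters=100, eps=1e-9):
    """Alternately normalise rows and columns of a positive matrix so it
    approaches the Birkhoff polytope (doubly stochastic matrices)."""
    X = np.asarray(M, dtype=float) + eps
    for _ in range(n_iters):
        X = X / X.sum(axis=1, keepdims=True)  # rows sum to 1
        X = X / X.sum(axis=0, keepdims=True)  # columns sum to 1
    return X

rng = np.random.default_rng(0)
X = sinkhorn_projection(rng.random((5, 5)))
print(np.allclose(X.sum(axis=1), 1, atol=1e-3),
      np.allclose(X.sum(axis=0), 1, atol=1e-3))  # True True
```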
Distant Learning for Entity Linking with Automatic Noise Detection
Title | Distant Learning for Entity Linking with Automatic Noise Detection |
Authors | Phong Le, Ivan Titov |
Abstract | Accurate entity linkers have been produced for domains and languages where annotated data (i.e., texts linked to a knowledge base) is available. However, little progress has been made for the settings where no or very limited amounts of labeled data are present (e.g., legal or most scientific domains). In this work, we show how we can learn to link mentions without having any labeled examples, only a knowledge base and a collection of unannotated texts from the corresponding domain. In order to achieve this, we frame the task as a multi-instance learning problem and rely on surface matching to create initial noisy labels. As the learning signal is weak and our surrogate labels are noisy, we introduce a noise detection component in our model: it lets the model detect and disregard examples which are likely to be noisy. Our method, jointly learning to detect noise and link entities, greatly outperforms the surface matching baseline. For a subset of entity categories, it even approaches the performance of supervised learning. |
Tasks | Entity Linking |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07189v2 |
https://arxiv.org/pdf/1905.07189v2.pdf | |
PWC | https://paperswithcode.com/paper/distant-learning-for-entity-linking-with |
Repo | |
Framework | |
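The distant-learning setup starts from surface matching between mention strings and knowledge-base entity names to produce noisy surrogate labels. A toy sketch of that label-generation step is below (hypothetical similarity scoring, not the authors' pipeline).

```python
from difflib import SequenceMatcher

def surface_match_candidates(mention, kb_entities, top_k=3):
    """Rank knowledge-base entities by string similarity to the mention,
    producing noisy candidate labels for distant supervision."""
    scored = [(name, SequenceMatcher(None, mention.lower(), name.lower()).ratio())
              for name in kb_entities]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

kb = ["Barack Obama", "Barack Obama Sr.", "Michelle Obama", "Obama, Fukui"]
print(surface_match_candidates("Obama", kb))
# The highest-scoring candidates become (possibly wrong) surrogate labels;
# the model's noise-detection component later learns to down-weight bad ones.
```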
A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs
Title | A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs |
Authors | Ling Jiang, Yang Hu, Xilin Xia, Qiuhua Liang, Andrea Soltoggio |
Abstract | The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban floods and managing their risk. A solution is to develop effective approaches to reconstruct high-resolution DEMs from their low-resolution equivalents, which are more widely available. However, current high-resolution DEM reconstruction approaches mainly focus on natural topography. Few attempts have been made for urban topography, which is typically an integration of complex man-made and natural features. This study proposes a novel multi-scale mapping approach based on a convolutional neural network (CNN) to deal with the complex characteristics of urban topography and reconstruct high-resolution urban DEMs. The proposed multi-scale CNN model is first trained using urban DEMs that contain topographic features at different resolutions, and then used to reconstruct the urban DEM at a specified (high) resolution from a low-resolution equivalent. A two-level accuracy assessment approach is also designed to evaluate the performance of the proposed urban DEM reconstruction method in terms of numerical accuracy and morphological accuracy. The proposed DEM reconstruction approach is applied to a 121 km² urbanized area in London, UK. Compared with other commonly used methods, the CNN-based approach produces superior results, providing a cost-effective and innovative method to acquire high-resolution DEMs in other data-scarce environments. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.12898v2 |
https://arxiv.org/pdf/1907.12898v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-scale-mapping-approach-based-on-a |
Repo | |
Framework | |
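As a rough, assumed illustration of DEM super-resolution with a CNN (a single-scale SRCNN-style model, not the paper's multi-scale architecture), the sketch below upsamples a low-resolution elevation grid and predicts a residual correction.

```python
import torch
import torch.nn as nn

class DemSuperResolution(nn.Module):
    """Toy SRCNN-style network: upsample a low-resolution DEM, then refine."""
    def __init__(self, scale=4):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear",
                                    align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),
        )

    def forward(self, low_res_dem):
        coarse = self.upsample(low_res_dem)   # (B, 1, H*scale, W*scale)
        return coarse + self.refine(coarse)   # residual correction of elevations

model = DemSuperResolution(scale=4)
low_res = torch.randn(2, 1, 32, 32)           # e.g. coarse-grid DEM tiles
high_res_pred = model(low_res)                # finer-grid DEM prediction
print(high_res_pred.shape)                    # torch.Size([2, 1, 128, 128])
```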
Deep Structured Neural Network for Event Temporal Relation Extraction
Title | Deep Structured Neural Network for Event Temporal Relation Extraction |
Authors | Rujun Han, I-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, Nanyun Peng |
Abstract | We propose a novel deep structured learning framework for event temporal relation extraction. The model consists of 1) a recurrent neural network (RNN) that learns scoring functions for pairwise relations, and 2) a structured support vector machine (SSVM) that makes joint predictions. The neural network automatically learns representations that account for long-term contexts to provide robust features for the structured model, while the SSVM incorporates domain knowledge, such as the transitive closure of temporal relations, as constraints to make better globally consistent decisions. By jointly training the two components, our model combines the benefits of both data-driven learning and knowledge exploitation. Experimental results on three high-quality event temporal relation datasets (TCR, MATRES, and TB-Dense) demonstrate that, when combined with pre-trained contextualized embeddings, the proposed model achieves significantly better performance than state-of-the-art methods on all three datasets. We also provide thorough ablation studies to investigate our model. |
Tasks | Relation Extraction |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.10094v2 |
https://arxiv.org/pdf/1909.10094v2.pdf | |
PWC | https://paperswithcode.com/paper/190910094 |
Repo | |
Framework | |
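One of the domain constraints mentioned above is transitive closure of temporal relations. The toy check below flags predictions that violate transitivity for a two-label subset (BEFORE/AFTER); the paper's SSVM inference handles a richer label set and enforces such constraints jointly.

```python
from itertools import permutations

# Toy composition table for two relations; the full task uses a richer label set.
COMPOSE = {("BEFORE", "BEFORE"): "BEFORE", ("AFTER", "AFTER"): "AFTER"}

def violates_transitivity(relations):
    """relations maps ordered event pairs (a, b) to predicted labels.
    Returns True if any (a,b), (b,c) composition contradicts (a,c)."""
    events = {e for pair in relations for e in pair}
    for a, b, c in permutations(events, 3):
        r_ab, r_bc, r_ac = relations.get((a, b)), relations.get((b, c)), relations.get((a, c))
        implied = COMPOSE.get((r_ab, r_bc))
        if implied is not None and r_ac is not None and r_ac != implied:
            return True
    return False

preds = {("e1", "e2"): "BEFORE", ("e2", "e3"): "BEFORE", ("e1", "e3"): "AFTER"}
print(violates_transitivity(preds))  # True: e1<e2<e3 contradicts e1>e3
```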
Deep Octonion Networks
Title | Deep Octonion Networks |
Authors | Jiasong Wu, Ling Xu, Youyong Kong, Lotfi Senhadji, Huazhong Shu |
Abstract | Deep learning is a hot research topic in the field of machine learning. Real-valued neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, deep complex networks (DCNs) and deep quaternion networks (DQNs) have attracted more and more attention. The octonion algebra, which is an extension of the complex and quaternion algebras, can provide a more efficient and compact representation. This paper constructs a general framework of deep octonion networks (DONs) and provides the main building blocks of DONs, such as octonion convolution, octonion batch normalization, and octonion weight initialization; DONs are then used in image classification tasks on the CIFAR-10 and CIFAR-100 datasets. Compared with DRNs, DCNs, and DQNs, the proposed DONs have better convergence and higher classification accuracy. The success of DONs is also explained through multi-task learning. |
Tasks | Image Classification, Multi-Task Learning |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08478v1 |
http://arxiv.org/pdf/1903.08478v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-octonion-networks |
Repo | |
Framework | |
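Octonions arise from quaternions via the Cayley-Dickson construction: an octonion is a pair of quaternions (a, b), with product (a, b)(c, d) = (ac - conj(d) b, da + b conj(c)). The sketch below implements this product on 8-dimensional real vectors to illustrate the underlying algebra (it is not the paper's convolution or batch-normalization layers).

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def octonion_mul(o1, o2):
    """Cayley-Dickson product: treat each octonion as a pair of quaternions
    (a, b), with (a, b)(c, d) = (ac - conj(d) b, da + b conj(c))."""
    a, b = o1[:4], o1[4:]
    c, d = o2[:4], o2[4:]
    left = quat_mul(a, c) - quat_mul(quat_conj(d), b)
    right = quat_mul(d, a) + quat_mul(b, quat_conj(c))
    return np.concatenate([left, right])

# Sanity check: the octonion norm is multiplicative, |o1 o2| = |o1| |o2|.
rng = np.random.default_rng(0)
o1, o2 = rng.normal(size=8), rng.normal(size=8)
print(np.isclose(np.linalg.norm(octonion_mul(o1, o2)),
                 np.linalg.norm(o1) * np.linalg.norm(o2)))  # True
```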
Sparsely Activated Networks: A new method for decomposing and compressing data
Title | Sparsely Activated Networks: A new method for decomposing and compressing data |
Authors | Paschalis Bizopoulos |
Abstract | Recent literature on unsupervised learning has focused on designing structural priors with the aim of learning meaningful features, but without considering the description length of the representations. In this thesis, we first introduce the $\varphi$ metric, which evaluates unsupervised models based on their reconstruction accuracy and the degree of compression of their internal representations. We then present and define two activation functions (Identity, ReLU) as baselines and three sparse activation functions (top-k absolutes, Extrema-Pool indices, Extrema) as candidate structures that minimize the previously defined metric $\varphi$. We lastly present Sparsely Activated Networks (SANs), which consist of kernels with shared weights that, during encoding, are convolved with the input and then passed through a sparse activation function. During decoding, the same weights are convolved with the sparse activation map, and subsequently the partial reconstructions from each weight are summed to reconstruct the input. We compare SANs using the five previously defined activation functions on a variety of datasets (Physionet, UCI-epilepsy, MNIST, FMNIST) and show that models selected using $\varphi$ have a small representation description length and consist of interpretable kernels. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1911.00400v1 |
https://arxiv.org/pdf/1911.00400v1.pdf | |
PWC | https://paperswithcode.com/paper/sparsely-activated-networks-a-new-method-for |
Repo | |
Framework | |
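Two of the sparse activation functions named above are easy to state directly. The sketch below gives minimal 1-D versions of top-k absolutes and Extrema as illustrations of the idea, not the thesis code.

```python
import numpy as np

def topk_absolutes(x, k):
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def extrema(x):
    """Keep local extrema of a 1-D signal (sign changes of the discrete slope)."""
    out = np.zeros_like(x)
    d = np.diff(x)
    is_ext = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
    out[is_ext] = x[is_ext]
    return out

x = np.array([0.1, 2.0, -0.3, -1.5, 0.2, 0.9, 0.05])
print(topk_absolutes(x, 2))  # only the two largest-magnitude samples survive
print(extrema(x))            # only interior peaks/valleys survive
```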
Training ASR models by Generation of Contextual Information
Title | Training ASR models by Generation of Contextual Information |
Authors | Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed |
Abstract | Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data. However, in many applications and locales, only moderate amounts of data are available, which has led to a surge in semi- and weakly-supervised learning research. In this paper, we conduct a large-scale study evaluating the effectiveness of weakly-supervised learning for speech recognition by using loosely related contextual information as a surrogate for ground-truth labels. For weakly supervised training, we use 50k hours of public English social media videos along with their respective titles and post text to train an encoder-decoder transformer model. Our best encoder-decoder models achieve an average of 20.8% WER reduction over a 1,000-hour supervised baseline, and an average of 13.4% WER reduction when using only the weakly supervised encoder for CTC fine-tuning. Our results show that our setup for weak supervision improved both the encoder's acoustic representations and the decoder's language generation abilities. |
Tasks | Speech Recognition, Text Generation |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12367v2 |
https://arxiv.org/pdf/1910.12367v2.pdf | |
PWC | https://paperswithcode.com/paper/training-asr-models-by-generation-of |
Repo | |
Framework | |
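The gains above are reported as relative word error rate (WER) reductions. For reference, a minimal word-level WER implementation via edit distance is sketched below (standard metric code, not the paper's evaluation pipeline).

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("turn the lights off", "turn lights of"))  # 0.5
```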
Large-Batch Training for LSTM and Beyond
Title | Large-Batch Training for LSTM and Beyond |
Authors | Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh |
Abstract | Large-batch training approaches have enabled researchers to utilize large-scale distributed processing and greatly accelerate deep neural network (DNN) training. For example, by scaling the batch size from 256 to 32K, researchers have been able to reduce the training time of ResNet50 on ImageNet from 29 hours to 2.2 minutes (Ying et al., 2018). In this paper, we propose a new approach called linear-epoch gradual-warmup (LEGW) for better large-batch training. With LEGW, we are able to conduct large-batch training for both CNNs and RNNs with the Sqrt Scaling scheme. LEGW enables the Sqrt Scaling scheme to be useful in practice, and as a result we achieve much better results than with the Linear Scaling learning-rate scheme. For LSTM applications, we are able to scale the batch size by a factor of 64 without losing accuracy and without tuning the hyper-parameters. For CNN applications, LEGW is able to achieve the same accuracy even as we scale the batch size to 32K. LEGW works better than previous large-batch auto-tuning techniques. LEGW achieves a 5.3X average speedup over the baselines for four LSTM-based applications on the same hardware. We also provide some theoretical explanations for LEGW. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08256v1 |
http://arxiv.org/pdf/1901.08256v1.pdf | |
PWC | https://paperswithcode.com/paper/large-batch-training-for-lstm-and-beyond |
Repo | |
Framework | |
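LEGW pairs the square-root learning-rate scaling rule with a warmup whose length in epochs grows linearly with the batch-size scaling factor. The sketch below encodes that schedule under assumed base values (batch 256, learning rate 0.1, one warmup epoch), which are illustrative rather than the paper's settings.

```python
import math

def legw_learning_rate(epoch, batch_size, base_batch=256, base_lr=0.1,
                       base_warmup_epochs=1.0):
    """Linear-Epoch Gradual Warmup with sqrt learning-rate scaling.

    Scaling the batch by k multiplies the peak LR by sqrt(k) and the
    warmup length (in epochs) by k; the LR ramps linearly during warmup."""
    k = batch_size / base_batch
    peak_lr = base_lr * math.sqrt(k)          # sqrt scaling rule
    warmup_epochs = base_warmup_epochs * k    # linear-epoch warmup
    if epoch < warmup_epochs:
        return peak_lr * (epoch + 1) / warmup_epochs
    return peak_lr

# Example: batch scaled 16x -> 4x peak LR, reached after 16 warmup epochs.
for e in [0, 7, 15, 16, 30]:
    print(e, round(legw_learning_rate(e, batch_size=4096), 4))
```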
AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks
Title | AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks |
Authors | Ankit Gupta, Elliott Ruebush |
Abstract | According to the United Nations World Water Assessment Programme, every day, 2 million tons of sewage and industrial and agricultural waste are discharged into the world's water. In order to address this pervasive issue of increasing water pollution, while ensuring that the global population has an efficient, accurate, and low-cost method to assess whether the water they drink is contaminated, we propose AquaSight, a novel mobile application that utilizes deep learning methods, specifically convolutional neural networks, for automated water impurity detection. After comprehensive training with a dataset of 105 images representing varying magnitudes of contamination, the deep learning algorithm achieved 96 percent accuracy with a loss of 0.108. Furthermore, the machine learning model uses efficient analysis of the turbidity and transparency levels of water to estimate a particular water sample's level of contamination. When deployed, the AquaSight system will provide an efficient way for individuals to obtain an estimate of water quality, alerting local and national governments to take action and potentially saving millions of lives worldwide. |
Tasks | |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07573v1 |
https://arxiv.org/pdf/1907.07573v1.pdf | |
PWC | https://paperswithcode.com/paper/aquasight-automatic-water-impurity-detection |
Repo | |
Framework | |
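The paper does not publish its network, so the sketch below is only a generic small CNN of the kind described: a few convolution blocks pooled into a softmax over assumed contamination classes.

```python
import torch
import torch.nn as nn

class WaterImpurityCNN(nn.Module):
    """Toy CNN mapping an RGB water image to contamination-level logits."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = WaterImpurityCNN()
logits = model(torch.randn(4, 3, 64, 64))   # batch of 4 water photos
print(logits.shape)                          # torch.Size([4, 3])
```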
Abductive Commonsense Reasoning
Title | Abductive Commonsense Reasoning |
Authors | Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi |
Abstract | Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks: (i) Abductive NLI, a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG, a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform, despite their strong performance on the related but more narrowly defined task of entailment NLI, pointing to interesting avenues for future research. |
Tasks | Natural Language Inference, Question Answering |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05739v2 |
https://arxiv.org/pdf/1908.05739v2.pdf | |
PWC | https://paperswithcode.com/paper/abductive-commonsense-reasoning |
Repo | |
Framework | |
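Abductive NLI asks which of two candidate explanations better fits a pair of observations. A simple hedged baseline (not the paper's best model) is to score each observation-hypothesis-observation narrative with a pretrained language model and pick the lower-loss one, sketched below with the Hugging Face `transformers` package (assumed available).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def narrative_loss(obs1, hypothesis, obs2):
    """Average token-level negative log-likelihood of the full narrative."""
    text = f"{obs1} {hypothesis} {obs2}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

obs1 = "Jenny left a window open when she went to work."
obs2 = "When she returned, her house was a mess."
h1 = "A thief climbed in through the window."
h2 = "She won the lottery that afternoon."
choice = min([h1, h2], key=lambda h: narrative_loss(obs1, h, obs2))
print(choice)  # the lower-loss (more plausible) explanation
```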
Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition
Title | Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition |
Authors | Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn W. Schuller |
Abstract | Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning principles to create efficient methods that learn by interacting with their environment. This has led to breakthroughs in many complex tasks that were previously difficult to solve. However, deep RL requires a large amount of training time, which makes it difficult to use in various real-life applications such as human-computer interaction (HCI). Therefore, in this paper, we study pre-training in deep RL to reduce the training time and improve performance in speech recognition, a popular application of HCI. We achieve significantly improved performance in less time on a publicly available speech command recognition dataset. |
Tasks | Speech Recognition |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11256v2 |
https://arxiv.org/pdf/1910.11256v2.pdf | |
PWC | https://paperswithcode.com/paper/pre-training-in-deep-reinforcement-learning |
Repo | |
Framework | |
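Pre-training here means initialising the RL agent's network from weights learned on a related task before reinforcement learning begins. The sketch below shows that weight transfer generically in PyTorch, with hypothetical layer sizes and task heads rather than the authors' network.

```python
import torch
import torch.nn as nn

def make_encoder():
    """Shared feature extractor for both the supervised and the RL network."""
    return nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU())

# Stage 1: supervised pre-training on, e.g., keyword classification.
pretrain_net = nn.Sequential(make_encoder(), nn.Linear(128, 10))
# ... train pretrain_net with cross-entropy here ...
torch.save(pretrain_net[0].state_dict(), "encoder.pt")

# Stage 2: RL fine-tuning reuses the pre-trained encoder under a new policy head.
class PolicyNet(nn.Module):
    def __init__(self, n_actions=10):
        super().__init__()
        self.encoder = make_encoder()
        self.policy_head = nn.Linear(128, n_actions)

    def forward(self, features):
        return self.policy_head(self.encoder(features))

agent = PolicyNet()
agent.encoder.load_state_dict(torch.load("encoder.pt"))  # warm start for deep RL
logits = agent(torch.randn(2, 40))  # action preferences for 2 audio feature frames
```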