Paper Group ANR 616
Generation of Policy-Level Explanations for Reinforcement Learning. TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation. A Stochastic LBFGS Algorithm for Radio Interferometric Calibration. Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information. Probabilistic Permuta …
Generation of Policy-Level Explanations for Reinforcement Learning
Title | Generation of Policy-Level Explanations for Reinforcement Learning |
Authors | Nicholay Topin, Manuela Veloso |
Abstract | Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, $O(F^2 \cdot tr\_samples)$. By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.12044v1 |
https://arxiv.org/pdf/1905.12044v1.pdf | |
PWC | https://paperswithcode.com/paper/generation-of-policy-level-explanations-for |
Repo | |
Framework | |
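The abstract above describes building a Markov chain over abstract states from a learned value function and observed transitions. Below is a minimal, hypothetical sketch of that idea (not the authors' feature-importance-based abstraction): states are grouped by binning their value estimates, and transition counts between the resulting abstract states are normalized into a Markov chain.

```python
import numpy as np
from collections import defaultdict

def abstract_policy_graph(states, next_states, value_fn, n_bins=10):
    """Toy abstraction: group states by binned value estimates and count
    transitions between the resulting abstract states.

    states, next_states : arrays of shape (n_samples, n_features)
    value_fn            : callable mapping a batch of states to scalar values
    """
    v, v_next = value_fn(states), value_fn(next_states)
    # Bin edges computed from the observed value range.
    edges = np.linspace(v.min(), v.max(), n_bins + 1)
    src = np.clip(np.digitize(v, edges) - 1, 0, n_bins - 1)
    dst = np.clip(np.digitize(v_next, edges) - 1, 0, n_bins - 1)

    counts = defaultdict(float)
    for s, d in zip(src, dst):
        counts[(s, d)] += 1.0

    # Row-normalise counts into a Markov transition matrix over abstract states.
    P = np.zeros((n_bins, n_bins))
    for (s, d), c in counts.items():
        P[s, d] = c
    row_sums = P.sum(axis=1, keepdims=True)
    return np.divide(P, row_sums, out=np.zeros_like(P), where=row_sums > 0)

# Usage with a made-up value function on random transitions:
rng = np.random.default_rng(0)
S, S_next = rng.normal(size=(500, 4)), rng.normal(size=(500, 4))
P = abstract_policy_graph(S, S_next, value_fn=lambda x: x.sum(axis=1))
print(P.shape)  # (10, 10) transition matrix between abstract states
```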
TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
Title | TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation |
Authors | Wubo Li, Wei Zou, Xiangang Li |
Abstract | Multimodal approaches generally provide better performance than unimodal ones on most tasks. However, efficiently learning the semantics of representations from multiple modalities is extremely challenging. To tackle this, we propose the Transformer-based Cross-modal Translator (TCT), which learns unimodal sequence representations by translating from other related multimodal sequences in a supervised manner. Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on the video-grounded dialogue task, which uses multiple modalities. The proposed method reports new state-of-the-art performance on video-grounded dialogue, indicating that the representations learned by TCT are more semantically meaningful than those obtained directly from a single modality. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1911.05186v1 |
https://arxiv.org/pdf/1911.05186v1.pdf | |
PWC | https://paperswithcode.com/paper/tct-a-cross-supervised-learning-method-for |
Repo | |
Framework | |
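As a rough illustration of the cross-modal translation idea (a sketch under assumed dimensions, not the authors' MTN-TCT architecture), the snippet below uses PyTorch's `nn.Transformer` to translate a source-modality sequence into a target-modality sequence; the encoder output can then serve as the learned unimodal representation.

```python
import torch
import torch.nn as nn

class CrossModalTranslator(nn.Module):
    """Toy transformer that translates one modality sequence into another."""
    def __init__(self, src_dim, tgt_dim, d_model=128, nhead=4):
        super().__init__()
        self.src_proj = nn.Linear(src_dim, d_model)
        self.tgt_proj = nn.Linear(tgt_dim, d_model)
        self.transformer = nn.Transformer(d_model=d_model, nhead=nhead,
                                          num_encoder_layers=2,
                                          num_decoder_layers=2,
                                          batch_first=True)
        self.out = nn.Linear(d_model, tgt_dim)

    def forward(self, src_seq, tgt_seq):
        memory_in = self.src_proj(src_seq)   # (B, S, d_model)
        tgt_in = self.tgt_proj(tgt_seq)      # (B, T, d_model)
        decoded = self.transformer(memory_in, tgt_in)
        return self.out(decoded)             # predicted target-modality features

# Example: "translate" a video-feature sequence into a text-feature sequence.
model = CrossModalTranslator(src_dim=64, tgt_dim=32)
video = torch.randn(8, 20, 64)   # batch of 8, 20 frames, 64-d features
text = torch.randn(8, 15, 32)    # batch of 8, 15 tokens, 32-d features
pred = model(video, text)
loss = nn.functional.mse_loss(pred, text)  # supervised translation loss
```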
A Stochastic LBFGS Algorithm for Radio Interferometric Calibration
Title | A Stochastic LBFGS Algorithm for Radio Interferometric Calibration |
Authors | Sarod Yatawatta, Lukas De Clercq, Hanno Spreeuw, Faruk Diblen |
Abstract | We present a stochastic, limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm that is suitable for handling very large amounts of data. A direct application of this algorithm is radio interferometric calibration of raw data at fine time and frequency resolution. Almost all existing radio interferometric calibration algorithms assume that it is possible to fit the dataset being calibrated into memory. Therefore, the raw data is averaged in time and frequency to reduce its size by many orders of magnitude before calibration is performed. However, this averaging is detrimental for the detection of some signals of interest that have narrow bandwidth and time duration, such as fast radio bursts (FRBs). Using the proposed algorithm, it is possible to calibrate data at such a fine resolution that they cannot be entirely loaded into memory, thus preserving such signals. As an additional demonstration, we use the proposed algorithm for training deep neural networks and compare its performance against the mainstream first-order optimization algorithms used in deep learning. |
Tasks | Calibration |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05619v2 |
http://arxiv.org/pdf/1904.05619v2.pdf | |
PWC | https://paperswithcode.com/paper/a-stochastic-lbfgs-algorithm-for-radio |
Repo | |
Framework | |
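L-BFGS builds a search direction from a short history of parameter and gradient differences via the two-loop recursion; a stochastic variant applies it to minibatch gradients. The sketch below illustrates that recursion on a toy least-squares problem (it omits the robustness safeguards a production calibration solver would need, and is not the authors' implementation).

```python
import numpy as np

def lbfgs_direction(grad, s_hist, y_hist):
    """Two-loop recursion: approximate -H^{-1} @ grad from the most recent
    (s_k = x_{k+1}-x_k, y_k = g_{k+1}-g_k) pairs."""
    q = grad.copy()
    stack = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        stack.append((rho, a, s, y))
    s, y = s_hist[-1], y_hist[-1]
    q *= (s @ y) / (y @ y)                  # initial scaling H_0
    for rho, a, s, y in reversed(stack):
        b = rho * (y @ q)
        q += (a - b) * s
    return -q                               # descent direction

# Toy stochastic loop on f(x) = 0.5 ||A x - b||^2 with minibatched rows.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 10)), rng.normal(size=200)
x = np.zeros(10)
s_hist, y_hist, x_prev, g_prev, m = [], [], None, None, 5
for step in range(100):
    idx = rng.choice(200, size=32, replace=False)      # minibatch
    g = A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)    # stochastic gradient
    # Keep only curvature pairs with positive s^T y (noisy pairs are skipped).
    if g_prev is not None and (x - x_prev) @ (g - g_prev) > 1e-10:
        s_hist.append(x - x_prev); y_hist.append(g - g_prev)
        s_hist, y_hist = s_hist[-m:], y_hist[-m:]
    d = lbfgs_direction(g, s_hist, y_hist) if s_hist else -g
    x_prev, g_prev = x.copy(), g
    x = x + 0.1 * d
print(np.linalg.norm(A @ x - b))
```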
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information
Title | Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information |
Authors | Yoan Dinkov, Ahmed Ali, Ivan Koychev, Preslav Nakov |
Abstract | We address the problem of predicting the leading political ideology, i.e., left-center-right bias, for YouTube channels of news media. Previous work on the problem has focused exclusively on text and on analysis of the language used, topics discussed, sentiment, and the like. In contrast, here we study videos, which yields an interesting multimodal setup. Starting with gold annotations about the leading political ideology of major world news media from Media Bias/Fact Check, we searched on YouTube to find their corresponding channels, and we downloaded a recent sample of videos from each channel. We crawled more than 1,000 hours of YouTube video along with the corresponding subtitles and metadata, thus producing a new multimodal dataset. We further developed a multimodal deep-learning architecture for the task. Our analysis shows that the use of the acoustic signal helped to improve bias detection by more than 6% absolute over using text and metadata only. We release the dataset to the research community, hoping to help advance the field of multimodal political bias detection. |
Tasks | |
Published | 2019-10-20 |
URL | https://arxiv.org/abs/1910.08948v1 |
https://arxiv.org/pdf/1910.08948v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-leading-political-ideology-of |
Repo | |
Framework | |
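The multimodal architecture is not specified in the abstract; as a generic, hypothetical baseline for the same task, the sketch below fuses acoustic, textual, and metadata feature vectors by concatenation and predicts the three-way left/center/right label (all dimensions are assumptions).

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy 3-way (left/center/right) classifier fusing three modality vectors."""
    def __init__(self, acoustic_dim=128, text_dim=300, meta_dim=20, hidden=64):
        super().__init__()
        self.acoustic = nn.Sequential(nn.Linear(acoustic_dim, hidden), nn.ReLU())
        self.text = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.meta = nn.Sequential(nn.Linear(meta_dim, hidden), nn.ReLU())
        self.head = nn.Linear(3 * hidden, 3)  # logits over {left, center, right}

    def forward(self, acoustic, text, meta):
        fused = torch.cat([self.acoustic(acoustic), self.text(text),
                           self.meta(meta)], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 300), torch.randn(4, 20))
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 1]))
```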
Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope
Title | Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope |
Authors | Tolga Birdal, Umut Şimşekli |
Abstract | We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. In particular, we present two algorithms: (1) Birkhoff-Riemannian L-BFGS for optimizing the relaxed version of the combinatorially intractable cycle consistency loss in a principled manner, and (2) Birkhoff-Riemannian Langevin Monte Carlo for generating samples on the Birkhoff Polytope and estimating the confidence of the found solutions. To this end, we first introduce the very recently developed Riemannian geometry of the Birkhoff Polytope. Next, we introduce a new probabilistic synchronization model in the form of a Markov Random Field (MRF). Finally, based on first-order retraction operators, we formulate our problem as simulating a stochastic differential equation and devise new integrators. We show on both synthetic and real datasets that we achieve high-quality multi-graph matching results with faster convergence and reliable confidence/uncertainty estimates. |
Tasks | Graph Matching |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05814v1 |
http://arxiv.org/pdf/1904.05814v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-permutation-synchronization |
Repo | |
Framework | |
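The Birkhoff Polytope is the set of doubly stochastic matrices (non-negative, with rows and columns summing to one). The sketch below shows the classic Sinkhorn-Knopp normalization that maps a positive matrix into this set; it illustrates the relaxation domain only, not the paper's Riemannian L-BFGS or Langevin samplers.

```python
import numpy as np

def sinkhorn_projection(M, n_iters=100, eps=1e-9):
    """Alternately normalise rows and columns of a positive matrix so it
    approaches the Birkhoff polytope (doubly stochastic matrices)."""
    X = np.asarray(M, dtype=float) + eps
    for _ in range(n_iters):
        X = X / X.sum(axis=1, keepdims=True)  # rows sum to 1
        X = X / X.sum(axis=0, keepdims=True)  # columns sum to 1
    return X

rng = np.random.default_rng(0)
X = sinkhorn_projection(rng.random((5, 5)))
print(np.allclose(X.sum(axis=1), 1, atol=1e-3),
      np.allclose(X.sum(axis=0), 1, atol=1e-3))  # True True
```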
Distant Learning for Entity Linking with Automatic Noise Detection
Title | Distant Learning for Entity Linking with Automatic Noise Detection |
Authors | Phong Le, Ivan Titov |
Abstract | Accurate entity linkers have been produced for domains and languages where annotated data (i.e., texts linked to a knowledge base) is available. However, little progress has been made for the settings where no or very limited amounts of labeled data are present (e.g., legal or most scientific domains). In this work, we show how we can learn to link mentions without having any labeled examples, only a knowledge base and a collection of unannotated texts from the corresponding domain. In order to achieve this, we frame the task as a multi-instance learning problem and rely on surface matching to create initial noisy labels. As the learning signal is weak and our surrogate labels are noisy, we introduce a noise detection component in our model: it lets the model detect and disregard examples which are likely to be noisy. Our method, jointly learning to detect noise and link entities, greatly outperforms the surface matching baseline. For a subset of entity categories, it even approaches the performance of supervised learning. |
Tasks | Entity Linking |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07189v2 |
https://arxiv.org/pdf/1905.07189v2.pdf | |
PWC | https://paperswithcode.com/paper/distant-learning-for-entity-linking-with |
Repo | |
Framework | |
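The distant-learning setup starts from surface matching between mention strings and knowledge-base entity names to produce noisy surrogate labels. A toy sketch of that label-generation step is below (hypothetical similarity scoring, not the authors' pipeline).

```python
from difflib import SequenceMatcher

def surface_match_candidates(mention, kb_entities, top_k=3):
    """Rank knowledge-base entities by string similarity to the mention,
    producing noisy candidate labels for distant supervision."""
    scored = [(name, SequenceMatcher(None, mention.lower(), name.lower()).ratio())
              for name in kb_entities]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

kb = ["Barack Obama", "Barack Obama Sr.", "Michelle Obama", "Obama, Fukui"]
print(surface_match_candidates("Obama", kb))
# The highest-scoring candidates become (possibly wrong) surrogate labels;
# the model's noise-detection component later learns to down-weight bad ones.
```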
A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs
Title | A Multi-Scale Mapping Approach Based on a Deep Learning CNN Model for Reconstructing High-Resolution Urban DEMs |
Authors | Ling Jiang, Yang Hu, Xilin Xia, Qiuhua Liang, Andrea Soltoggio |
Abstract | The shortage of high-resolution urban digital elevation model (DEM) datasets has been a challenge for modelling urban floods and managing their risk. A solution is to develop effective approaches to reconstruct high-resolution DEMs from their low-resolution equivalents, which are more widely available. However, current high-resolution DEM reconstruction approaches mainly focus on natural topography. Few attempts have been made for urban topography, which is typically an integration of complex man-made and natural features. This study proposes a novel multi-scale mapping approach based on a convolutional neural network (CNN) to deal with the complex characteristics of urban topography and reconstruct high-resolution urban DEMs. The proposed multi-scale CNN model is first trained using urban DEMs that contain topographic features at different resolutions, and then used to reconstruct the urban DEM at a specified (high) resolution from a low-resolution equivalent. A two-level accuracy assessment approach is also designed to evaluate the performance of the proposed urban DEM reconstruction method in terms of numerical accuracy and morphological accuracy. The proposed DEM reconstruction approach is applied to a 121 km² urbanized area in London, UK. Compared with other commonly used methods, the CNN-based approach produces superior results, providing a cost-effective and innovative method to acquire high-resolution DEMs in other data-scarce environments. |
Tasks | |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.12898v2 |
https://arxiv.org/pdf/1907.12898v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-scale-mapping-approach-based-on-a |
Repo | |
Framework | |
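As a rough, assumed illustration of DEM super-resolution with a CNN (a single-scale SRCNN-style model, not the paper's multi-scale architecture), the sketch below upsamples a low-resolution elevation grid and predicts a residual correction.

```python
import torch
import torch.nn as nn

class DemSuperResolution(nn.Module):
    """Toy SRCNN-style network: upsample a low-resolution DEM, then refine."""
    def __init__(self, scale=4):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode="bilinear",
                                    align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),
        )

    def forward(self, low_res_dem):
        coarse = self.upsample(low_res_dem)   # (B, 1, H*scale, W*scale)
        return coarse + self.refine(coarse)   # residual correction of elevations

model = DemSuperResolution(scale=4)
low_res = torch.randn(2, 1, 32, 32)           # e.g. coarse-grid DEM tiles
high_res_pred = model(low_res)                # finer-grid DEM prediction
print(high_res_pred.shape)                    # torch.Size([2, 1, 128, 128])
```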
Deep Structured Neural Network for Event Temporal Relation Extraction
Title | Deep Structured Neural Network for Event Temporal Relation Extraction |
Authors | Rujun Han, I-Hung Hsu, Mu Yang, Aram Galstyan, Ralph Weischedel, Nanyun Peng |
Abstract | We propose a novel deep structured learning framework for event temporal relation extraction. The model consists of 1) a recurrent neural network (RNN) that learns scoring functions for pairwise relations, and 2) a structured support vector machine (SSVM) that makes joint predictions. The neural network automatically learns representations that account for long-term contexts to provide robust features for the structured model, while the SSVM incorporates domain knowledge, such as the transitive closure of temporal relations, as constraints to make better globally consistent decisions. By jointly training the two components, our model combines the benefits of both data-driven learning and knowledge exploitation. Experimental results on three high-quality event temporal relation datasets (TCR, MATRES, and TB-Dense) demonstrate that, when combined with pre-trained contextualized embeddings, the proposed model achieves significantly better performance than state-of-the-art methods on all three datasets. We also provide thorough ablation studies to investigate our model. |
Tasks | Relation Extraction |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.10094v2 |
https://arxiv.org/pdf/1909.10094v2.pdf | |
PWC | https://paperswithcode.com/paper/190910094 |
Repo | |
Framework | |
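One of the domain constraints mentioned above is transitive closure of temporal relations. The toy check below flags predictions that violate transitivity for a two-label subset (BEFORE/AFTER); the paper's SSVM inference handles a richer label set and enforces such constraints jointly.

```python
from itertools import permutations

# Toy composition table for two relations; the full task uses a richer label set.
COMPOSE = {("BEFORE", "BEFORE"): "BEFORE", ("AFTER", "AFTER"): "AFTER"}

def violates_transitivity(relations):
    """relations maps ordered event pairs (a, b) to predicted labels.
    Returns True if any (a,b), (b,c) composition contradicts (a,c)."""
    events = {e for pair in relations for e in pair}
    for a, b, c in permutations(events, 3):
        r_ab, r_bc, r_ac = relations.get((a, b)), relations.get((b, c)), relations.get((a, c))
        implied = COMPOSE.get((r_ab, r_bc))
        if implied is not None and r_ac is not None and r_ac != implied:
            return True
    return False

preds = {("e1", "e2"): "BEFORE", ("e2", "e3"): "BEFORE", ("e1", "e3"): "AFTER"}
print(violates_transitivity(preds))  # True: e1<e2<e3 contradicts e1>e3
```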
Deep Octonion Networks
Title | Deep Octonion Networks |
Authors | Jiasong Wu, Ling Xu, Youyong Kong, Lotfi Senhadji, Huazhong Shu |
Abstract | Deep learning is a hot research topic in the field of machine learning. Real-valued neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, deep complex networks (DCNs) and deep quaternion networks (DQNs) have attracted more and more attention. The octonion algebra, which is an extension of the complex and quaternion algebras, can provide a more efficient and compact representation. This paper constructs a general framework of deep octonion networks (DONs) and provides the main building blocks of DONs, such as octonion convolution, octonion batch normalization, and octonion weight initialization; DONs are then used in image classification tasks on the CIFAR-10 and CIFAR-100 datasets. Compared with DRNs, DCNs, and DQNs, the proposed DONs have better convergence and higher classification accuracy. The success of DONs is also explained through multi-task learning. |
Tasks | Image Classification, Multi-Task Learning |
Published | 2019-03-20 |
URL | http://arxiv.org/abs/1903.08478v1 |
http://arxiv.org/pdf/1903.08478v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-octonion-networks |
Repo | |
Framework | |
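Octonions arise from quaternions via the Cayley-Dickson construction: an octonion is a pair of quaternions (a, b), with product (a, b)(c, d) = (ac - conj(d) b, da + b conj(c)). The sketch below implements this product on 8-dimensional real vectors to illustrate the underlying algebra (it is not the paper's convolution or batch-normalization layers).

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) arrays."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def octonion_mul(o1, o2):
    """Cayley-Dickson product: treat each octonion as a pair of quaternions
    (a, b), with (a, b)(c, d) = (ac - conj(d) b, da + b conj(c))."""
    a, b = o1[:4], o1[4:]
    c, d = o2[:4], o2[4:]
    left = quat_mul(a, c) - quat_mul(quat_conj(d), b)
    right = quat_mul(d, a) + quat_mul(b, quat_conj(c))
    return np.concatenate([left, right])

# Sanity check: the octonion norm is multiplicative, |o1 o2| = |o1| |o2|.
rng = np.random.default_rng(0)
o1, o2 = rng.normal(size=8), rng.normal(size=8)
print(np.isclose(np.linalg.norm(octonion_mul(o1, o2)),
                 np.linalg.norm(o1) * np.linalg.norm(o2)))  # True
```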
Sparsely Activated Networks: A new method for decomposing and compressing data
Title | Sparsely Activated Networks: A new method for decomposing and compressing data |
Authors | Paschalis Bizopoulos |
Abstract | Recent literature on unsupervised learning has focused on designing structural priors with the aim of learning meaningful features, but without considering the description length of the representations. In this thesis, we first introduce the $\varphi$ metric, which evaluates unsupervised models based on their reconstruction accuracy and the degree of compression of their internal representations. We then present and define two activation functions (Identity, ReLU) as baselines and three sparse activation functions (top-k absolutes, Extrema-Pool indices, Extrema) as candidate structures that minimize the previously defined metric $\varphi$. We lastly present Sparsely Activated Networks (SANs), which consist of kernels with shared weights that, during encoding, are convolved with the input and then passed through a sparse activation function. During decoding, the same weights are convolved with the sparse activation map, and subsequently the partial reconstructions from each weight are summed to reconstruct the input. We compare SANs using the five previously defined activation functions on a variety of datasets (Physionet, UCI-epilepsy, MNIST, FMNIST) and show that models selected using $\varphi$ have a small representation description length and consist of interpretable kernels. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1911.00400v1 |
https://arxiv.org/pdf/1911.00400v1.pdf | |
PWC | https://paperswithcode.com/paper/sparsely-activated-networks-a-new-method-for |
Repo | |
Framework | |
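Two of the sparse activation functions named above are easy to state directly. The sketch below gives minimal 1-D versions of top-k absolutes and Extrema as illustrations of the idea, not the thesis code.

```python
import numpy as np

def topk_absolutes(x, k):
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def extrema(x):
    """Keep local extrema of a 1-D signal (sign changes of the discrete slope)."""
    out = np.zeros_like(x)
    d = np.diff(x)
    is_ext = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1
    out[is_ext] = x[is_ext]
    return out

x = np.array([0.1, 2.0, -0.3, -1.5, 0.2, 0.9, 0.05])
print(topk_absolutes(x, 2))  # only the two largest-magnitude samples survive
print(extrema(x))            # only interior peaks/valleys survive
```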
Training ASR models by Generation of Contextual Information
Title | Training ASR models by Generation of Contextual Information |
Authors | Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed |
Abstract | Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data. However, in many applications and locales, only moderate amounts of data are available, which has led to a surge in semi- and weakly-supervised learning research. In this paper, we conduct a large-scale study evaluating the effectiveness of weakly-supervised learning for speech recognition by using loosely related contextual information as a surrogate for ground-truth labels. For weakly supervised training, we use 50k hours of public English social media videos along with their respective titles and post text to train an encoder-decoder transformer model. Our best encoder-decoder models achieve an average of 20.8% WER reduction over a 1,000-hour supervised baseline, and an average of 13.4% WER reduction when using only the weakly supervised encoder for CTC fine-tuning. Our results show that our setup for weak supervision improved both the encoder's acoustic representations and the decoder's language generation abilities. |
Tasks | Speech Recognition, Text Generation |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12367v2 |
https://arxiv.org/pdf/1910.12367v2.pdf | |
PWC | https://paperswithcode.com/paper/training-asr-models-by-generation-of |
Repo | |
Framework | |
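The gains above are reported as relative word error rate (WER) reductions. For reference, a minimal word-level WER implementation via edit distance is sketched below (standard metric code, not the paper's evaluation pipeline).

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("turn the lights off", "turn lights of"))  # 0.5
```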
Large-Batch Training for LSTM and Beyond
Title | Large-Batch Training for LSTM and Beyond |
Authors | Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh |
Abstract | Large-batch training approaches have enabled researchers to utilize large-scale distributed processing and greatly accelerate deep neural network (DNN) training. For example, by scaling the batch size from 256 to 32K, researchers have been able to reduce the training time of ResNet50 on ImageNet from 29 hours to 2.2 minutes (Ying et al., 2018). In this paper, we propose a new approach called linear-epoch gradual-warmup (LEGW) for better large-batch training. With LEGW, we are able to conduct large-batch training for both CNNs and RNNs with the Sqrt Scaling scheme. LEGW enables the Sqrt Scaling scheme to be useful in practice, and as a result we achieve much better results than with the Linear Scaling learning-rate scheme. For LSTM applications, we are able to scale the batch size by a factor of 64 without losing accuracy and without tuning the hyper-parameters. For CNN applications, LEGW is able to achieve the same accuracy even as we scale the batch size to 32K. LEGW works better than previous large-batch auto-tuning techniques. LEGW achieves a 5.3X average speedup over the baselines for four LSTM-based applications on the same hardware. We also provide some theoretical explanations for LEGW. |
Tasks | |
Published | 2019-01-24 |
URL | http://arxiv.org/abs/1901.08256v1 |
http://arxiv.org/pdf/1901.08256v1.pdf | |
PWC | https://paperswithcode.com/paper/large-batch-training-for-lstm-and-beyond |
Repo | |
Framework | |
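LEGW pairs the square-root learning-rate scaling rule with a warmup whose length in epochs grows linearly with the batch-size scaling factor. The sketch below encodes that schedule under assumed base values (batch 256, learning rate 0.1, one warmup epoch), which are illustrative rather than the paper's settings.

```python
import math

def legw_learning_rate(epoch, batch_size, base_batch=256, base_lr=0.1,
                       base_warmup_epochs=1.0):
    """Linear-Epoch Gradual Warmup with sqrt learning-rate scaling.

    Scaling the batch by k multiplies the peak LR by sqrt(k) and the
    warmup length (in epochs) by k; the LR ramps linearly during warmup."""
    k = batch_size / base_batch
    peak_lr = base_lr * math.sqrt(k)          # sqrt scaling rule
    warmup_epochs = base_warmup_epochs * k    # linear-epoch warmup
    if epoch < warmup_epochs:
        return peak_lr * (epoch + 1) / warmup_epochs
    return peak_lr

# Example: batch scaled 16x -> 4x peak LR, reached after 16 warmup epochs.
for e in [0, 7, 15, 16, 30]:
    print(e, round(legw_learning_rate(e, batch_size=4096), 4))
```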
AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks
Title | AquaSight: Automatic Water Impurity Detection Utilizing Convolutional Neural Networks |
Authors | Ankit Gupta, Elliott Ruebush |
Abstract | According to the United Nations World Water Assessment Programme, every day, 2 million tons of sewage and industrial and agricultural waste are discharged into the world's water. In order to address this pervasive issue of increasing water pollution, while ensuring that the global population has an efficient, accurate, and low-cost method to assess whether the water they drink is contaminated, we propose AquaSight, a novel mobile application that utilizes deep learning methods, specifically convolutional neural networks, for automated water impurity detection. After comprehensive training with a dataset of 105 images representing varying magnitudes of contamination, the deep learning algorithm achieved 96 percent accuracy with a loss of 0.108. Furthermore, the machine learning model uses efficient analysis of the turbidity and transparency levels of water to estimate a particular water sample's level of contamination. When deployed, the AquaSight system will provide an efficient way for individuals to obtain an estimate of water quality, alerting local and national governments to take action and potentially saving millions of lives worldwide. |
Tasks | |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07573v1 |
https://arxiv.org/pdf/1907.07573v1.pdf | |
PWC | https://paperswithcode.com/paper/aquasight-automatic-water-impurity-detection |
Repo | |
Framework | |
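The paper does not publish its network, so the sketch below is only a generic small CNN of the kind described: a few convolution blocks pooled into a softmax over assumed contamination classes.

```python
import torch
import torch.nn as nn

class WaterImpurityCNN(nn.Module):
    """Toy CNN mapping an RGB water image to contamination-level logits."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = WaterImpurityCNN()
logits = model(torch.randn(4, 3, 64, 64))   # batch of 4 water photos
print(logits.shape)                          # torch.Size([4, 3])
```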
Abductive Commonsense Reasoning
Title | Abductive Commonsense Reasoning |
Authors | Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi |
Abstract | Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks: (i) Abductive NLI, a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG, a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform, despite their strong performance on the related but more narrowly defined task of entailment NLI, pointing to interesting avenues for future research. |
Tasks | Natural Language Inference, Question Answering |
Published | 2019-08-15 |
URL | https://arxiv.org/abs/1908.05739v2 |
https://arxiv.org/pdf/1908.05739v2.pdf | |
PWC | https://paperswithcode.com/paper/abductive-commonsense-reasoning |
Repo | |
Framework | |
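Abductive NLI asks which of two candidate explanations better fits a pair of observations. A simple hedged baseline (not the paper's best model) is to score each observation-hypothesis-observation narrative with a pretrained language model and pick the lower-loss one, sketched below with the Hugging Face `transformers` package (assumed available).

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def narrative_loss(obs1, hypothesis, obs2):
    """Average token-level negative log-likelihood of the full narrative."""
    text = f"{obs1} {hypothesis} {obs2}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

obs1 = "Jenny left a window open when she went to work."
obs2 = "When she returned, her house was a mess."
h1 = "A thief climbed in through the window."
h2 = "She won the lottery that afternoon."
choice = min([h1, h2], key=lambda h: narrative_loss(obs1, h, obs2))
print(choice)  # the lower-loss (more plausible) explanation
```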
Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition
Title | Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition |
Authors | Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn W. Schuller |
Abstract | Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning principles to create efficient methods that learn by interacting with their environment. This has led to breakthroughs in many complex tasks that were previously difficult to solve. However, deep RL requires a large amount of training time, which makes it difficult to use in various real-life applications such as human-computer interaction (HCI). Therefore, in this paper, we study pre-training in deep RL to reduce the training time and improve performance in speech recognition, a popular application of HCI. We achieve significantly improved performance in less time on a publicly available speech command recognition dataset. |
Tasks | Speech Recognition |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11256v2 |
https://arxiv.org/pdf/1910.11256v2.pdf | |
PWC | https://paperswithcode.com/paper/pre-training-in-deep-reinforcement-learning |
Repo | |
Framework | |
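Pre-training here means initialising the RL agent's network from weights learned on a related task before reinforcement learning begins. The sketch below shows that weight transfer generically in PyTorch, with hypothetical layer sizes and task heads rather than the authors' network.

```python
import torch
import torch.nn as nn

def make_encoder():
    """Shared feature extractor for both the supervised and the RL network."""
    return nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU())

# Stage 1: supervised pre-training on, e.g., keyword classification.
pretrain_net = nn.Sequential(make_encoder(), nn.Linear(128, 10))
# ... train pretrain_net with cross-entropy here ...
torch.save(pretrain_net[0].state_dict(), "encoder.pt")

# Stage 2: RL fine-tuning reuses the pre-trained encoder under a new policy head.
class PolicyNet(nn.Module):
    def __init__(self, n_actions=10):
        super().__init__()
        self.encoder = make_encoder()
        self.policy_head = nn.Linear(128, n_actions)

    def forward(self, features):
        return self.policy_head(self.encoder(features))

agent = PolicyNet()
agent.encoder.load_state_dict(torch.load("encoder.pt"))  # warm start for deep RL
logits = agent(torch.randn(2, 40))  # action preferences for 2 audio feature frames
```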