April 2, 2020

3274 words 16 mins read

Paper Group ANR 314

RECAST: Interactive Auditing of Automatic Toxicity Detection Models. Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis. Object Relational Graph with Teacher-Recommended Learning for Video Captioning. Quantized Neural Network Inference with Precision Batching. A more abstractive summarization model. Interval Neural Networks as …

RECAST: Interactive Auditing of Automatic Toxicity Detection Models

Title RECAST: Interactive Auditing of Automatic Toxicity Detection Models
Authors Austin P. Wright, Omar Shaikh, Haekyu Park, Will Epperson, Muhammed Ahmed, Stephane Pinel, Diyi Yang, Duen Horng Chau
Abstract As toxic language becomes nearly pervasive online, there has been increasing interest in leveraging advances in natural language processing (NLP), such as very large transformer models, to automatically detect and remove toxic comments. Despite fairness concerns, a lack of adversarial robustness, and limited prediction explainability in deep learning systems, there is currently little work on auditing these systems and understanding how they work for both developers and users. We present our ongoing work, RECAST, an interactive tool for examining toxicity detection models by visualizing explanations for predictions and providing alternative wordings for detected toxic speech.
Tasks
Published 2020-01-07
URL https://arxiv.org/abs/2001.01819v1
PDF https://arxiv.org/pdf/2001.01819v1.pdf
PWC https://paperswithcode.com/paper/recast-interactive-auditing-of-automatic
Repo
Framework
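
The abstract does not specify which explanation method RECAST visualizes. As a hedged illustration of the general idea only, the sketch below scores word-level importance for a toxicity classifier by leave-one-out occlusion: the drop in predicted toxicity when a word is removed. `toxicity_score` is a hypothetical stand-in for any real model, not the authors' system.

```python
def toxicity_score(text):
    """Hypothetical toxicity model: fraction of flagged words (a toy stand-in)."""
    flagged = {"stupid", "idiot", "trash"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / max(len(words), 1)

def explain(text):
    """Word importance via leave-one-out occlusion against the base score."""
    words = text.split()
    base = toxicity_score(text)
    importances = []
    for i in range(len(words)):
        occluded = " ".join(words[:i] + words[i + 1:])
        importances.append((words[i], base - toxicity_score(occluded)))
    return importances

for word, weight in explain("that idea is stupid and lazy"):
    print(f"{word:>8s}: {weight:+.3f}")
```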

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

Title Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Authors Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan
Abstract We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
Tasks
Published 2020-03-16
URL https://arxiv.org/abs/2003.07337v1
PDF https://arxiv.org/pdf/2003.07337v1.pdf
PWC https://paperswithcode.com/paper/is-temporal-difference-learning-optimal-an
Repo
Framework
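
The TD(0)-with-averaging baseline the paper analyzes is short enough to sketch. Below is a minimal version on a toy two-state Markov reward process; the step-size schedule, iteration count, and toy chain are assumptions, and the variance-reduced variants the paper proposes are not shown.

```python
# A minimal sketch of TD(0) policy evaluation with Polyak-Ruppert iterate
# averaging, under generative-model access (sample any state's transition).
import numpy as np

def td0_polyak(P, r, gamma, n_iters=20_000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(r)
    theta = np.zeros(n)          # current value estimate
    theta_bar = np.zeros(n)      # Polyak-Ruppert running average
    for t in range(1, n_iters + 1):
        s = rng.integers(n)                      # generative model: pick a state,
        s_next = rng.choice(n, p=P[s])           # sample a next state from P(.|s)
        td_error = r[s] + gamma * theta[s_next] - theta[s]
        theta[s] += td_error / np.sqrt(t)        # polynomial step size (assumed)
        theta_bar += (theta - theta_bar) / t     # average of the iterates
    return theta_bar

P = np.array([[0.9, 0.1], [0.2, 0.8]])           # toy 2-state chain
r = np.array([1.0, 0.0])
print(td0_polyak(P, r, gamma=0.9))
print(np.linalg.solve(np.eye(2) - 0.9 * P, r))   # exact values for comparison
```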

Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Title Object Relational Graph with Teacher-Recommended Learning for Video Captioning
Authors Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha
Abstract Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation, owing to the neglect of interactions between objects, and receive insufficient training for content-related words, owing to the long-tailed word distribution. In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method that makes full use of a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model. The ELM generates semantically similar word proposals that extend the ground-truth words used for training, addressing the long-tailed problem. Experimental evaluations on three benchmarks (MSVD, MSR-VTT, and VATEX) show that the proposed ORG-TRL system achieves state-of-the-art performance. Extensive ablation studies and visualizations illustrate the effectiveness of our system.
Tasks Language Modelling, Video Captioning
Published 2020-02-26
URL https://arxiv.org/abs/2002.11566v1
PDF https://arxiv.org/pdf/2002.11566v1.pdf
PWC https://paperswithcode.com/paper/object-relational-graph-with-teacher
Repo
Framework
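
The teacher-recommended learning idea can be sketched as a loss function: alongside cross-entropy on the ground-truth word, the captioner is pulled toward the external language model's soft word distribution. The mixing weight `alpha` and the toy tensors are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def trl_loss(caption_logits, elm_logits, gt_ids, alpha=0.5):
    # Standard cross-entropy against the ground-truth caption words.
    ce = F.cross_entropy(caption_logits, gt_ids)
    # KL divergence toward the ELM's "recommended" word distribution,
    # which widens supervision beyond the single ground-truth word.
    kl = F.kl_div(F.log_softmax(caption_logits, dim=-1),
                  F.softmax(elm_logits, dim=-1),
                  reduction="batchmean")
    return (1 - alpha) * ce + alpha * kl

# Toy usage: 4 decoding steps, vocabulary of 100 words.
logits = torch.randn(4, 100)          # captioner outputs
teacher = torch.randn(4, 100)         # external language model outputs
targets = torch.randint(0, 100, (4,)) # ground-truth word ids
print(trl_loss(logits, teacher, targets))
```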

Quantized Neural Network Inference with Precision Batching

Title Quantized Neural Network Inference with Precision Batching
Authors Maximilian Lam, Zachary Yedidia, Colby Banbury, Vijay Janapa Reddi
Abstract We present PrecisionBatching, a quantized inference algorithm for speeding up neural network execution on traditional hardware platforms at low bitwidths without the need for retraining or recalibration. PrecisionBatching decomposes a neural network into individual bitlayers and accumulates them using fast 1-bit operations while maintaining activations in full precision. PrecisionBatching not only facilitates quantized inference at low bitwidths (< 8 bits) without retraining or recalibration, but also 1) enables traditional hardware platforms to realize inference speedups at a finer granularity of quantization (e.g., 1-16 bit execution) and 2) allows accuracy and speedup tradeoffs at runtime by exposing the number of bitlayers to accumulate as a tunable parameter. Across a variety of applications (MNIST, language modeling, natural language inference) and neural network architectures (fully connected, RNN, LSTM), PrecisionBatching yields end-to-end speedups of over 8x on a GPU within a < 1% error margin of the full-precision baseline, outperforming traditional 8-bit quantized inference by 1.5x-2x at the same error tolerance.
Tasks Language Modelling, Natural Language Inference, Quantization
Published 2020-02-26
URL https://arxiv.org/abs/2003.00822v1
PDF https://arxiv.org/pdf/2003.00822v1.pdf
PWC https://paperswithcode.com/paper/quantized-neural-network-inference-with
Repo
Framework
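
The bitlayer decomposition can be shown directly. The NumPy sketch below (an illustration, not the authors' GPU kernels) splits an unsigned 4-bit weight matrix into 1-bit planes and accumulates scaled partial products against full-precision activations; handling of signed weights is omitted as a simplifying assumption.

```python
import numpy as np

def bitlayer_matmul(w_quant, x, n_bits):
    """w_quant: (out, in) unsigned ints in [0, 2**n_bits); x: (in,) floats."""
    acc = np.zeros(w_quant.shape[0])
    for b in range(n_bits):
        bit_plane = (w_quant >> b) & 1        # one 1-bit layer of the weights
        acc += (2 ** b) * (bit_plane @ x)     # accumulate the scaled partials
    return acc

rng = np.random.default_rng(0)
w = rng.integers(0, 16, size=(8, 32))         # 4-bit quantized weights
x = rng.standard_normal(32)                   # full-precision activations
assert np.allclose(bitlayer_matmul(w, x, 4), w @ x)   # exact reconstruction
print("bitlayer accumulation matches the dense product")

# Dropping high-order planes from the loop trades accuracy for speed -- the
# runtime knob the paper exposes via the number of accumulated bitlayers.
```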

A more abstractive summarization model

Title A more abstractive summarization model
Authors Satyaki Chakraborty, Xinya Li, Sayak Chakraborty
Abstract The pointer-generator network is an extremely popular method for text summarization. More recent works in this domain still build on top of the baseline pointer-generator, either by augmenting it with a content selection phase or by decomposing the decoder into a contextual network and a language model. However, all such models based on the pointer-generator architecture generate few novel words in the summary and mostly copy words from the source text. In our work, we first thoroughly investigate why the pointer-generator network is unable to generate novel words, and then address this by adding an out-of-vocabulary (OOV) penalty. This enables us to improve the amount of novelty/abstraction significantly. We use normalized n-gram novelty scores as a metric for determining the level of abstraction. Moreover, we also report ROUGE scores of our model, since most summarization models are evaluated with R-1, R-2, and R-L scores.
Tasks Abstractive Text Summarization, Language Modelling, Text Summarization
Published 2020-02-25
URL https://arxiv.org/abs/2002.10959v1
PDF https://arxiv.org/pdf/2002.10959v1.pdf
PWC https://paperswithcode.com/paper/a-more-abstractive-summarization-model
Repo
Framework
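
The n-gram novelty metric mentioned above is simple to compute: the fraction of summary n-grams that never appear in the source. The sketch below uses whitespace tokenization and n-gram sets, both simplifying assumptions.

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_novelty(source, summary, n=2):
    """Fraction of the summary's n-grams that do not occur in the source."""
    src, summ = source.split(), summary.split()
    summary_ngrams = ngrams(summ, n)
    if not summary_ngrams:
        return 0.0
    novel = summary_ngrams - ngrams(src, n)
    return len(novel) / len(summary_ngrams)

source = "the cat sat on the mat near the door"
summary = "a cat rested on a mat"
print(f"bigram novelty: {ngram_novelty(source, summary):.2f}")
```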

Interval Neural Networks as Instability Detectors for Image Reconstructions

Title Interval Neural Networks as Instability Detectors for Image Reconstructions
Authors Jan Macdonald, Maximilian März, Luis Oala, Wojciech Samek
Abstract This work investigates the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Indeed, in a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates on two use cases how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. Such an ability is crucial to ensure a safe use of deep learning-based methods for medical image reconstruction.
Tasks Image Reconstruction
Published 2020-03-27
URL https://arxiv.org/abs/2003.13471v1
PDF https://arxiv.org/pdf/2003.13471v1.pdf
PWC https://paperswithcode.com/paper/interval-neural-networks-as-instability
Repo
Framework
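
The detection logic itself is a thresholding step on top of an uncertainty map. As a hedged sketch: given per-pixel uncertainty maps for reconstructions (produced, e.g., by an interval neural network as in the next entry), flag a reconstruction as potentially unstable when its mean uncertainty is anomalously high. The quantile-based calibration on known-clean data is an assumption, not necessarily the paper's procedure.

```python
import numpy as np

def flag_unstable(uncertainty_maps, calibration_maps, quantile=0.99):
    # Calibrate a threshold from reconstructions known to be well-behaved.
    scores_cal = calibration_maps.reshape(len(calibration_maps), -1).mean(axis=1)
    threshold = np.quantile(scores_cal, quantile)
    scores = uncertainty_maps.reshape(len(uncertainty_maps), -1).mean(axis=1)
    return scores > threshold

rng = np.random.default_rng(0)
clean = rng.random((100, 32, 32)) * 0.1                # toy calibration maps
test = np.concatenate([rng.random((5, 32, 32)) * 0.1,  # in-distribution inputs
                       rng.random((5, 32, 32)) * 0.5]) # last 5 "unstable"
print(flag_unstable(test, clean))
```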

Interval Neural Networks: Uncertainty Scores

Title Interval Neural Networks: Uncertainty Scores
Authors Luis Oala, Cosmas Heiß, Jan Macdonald, Maximilian März, Wojciech Samek, Gitta Kutyniok
Abstract We propose a fast, non-Bayesian method for producing uncertainty scores in the output of pre-trained deep neural networks (DNNs) using a data-driven interval propagating network. This interval neural network (INN) has interval-valued parameters and propagates its input using interval arithmetic. The INN produces sensible lower and upper bounds encompassing the ground truth. We provide theoretical justification for the validity of these bounds. Furthermore, its asymmetric uncertainty scores offer additional, directional information beyond what Gaussian-based, symmetric variance estimation can provide. We find that noise in the data is adequately captured by the intervals produced with our method. In numerical experiments on an image reconstruction task, we demonstrate the practical utility of INNs as a proxy for the prediction error in comparison to two state-of-the-art uncertainty quantification methods. In summary, INNs produce fast, theoretically justified uncertainty scores for DNNs that are easy to interpret, come with added information, and serve as improved error proxies - features that may prove useful in advancing the usability of DNNs, especially in sensitive applications such as health care.
Tasks Image Reconstruction
Published 2020-03-25
URL https://arxiv.org/abs/2003.11566v1
PDF https://arxiv.org/pdf/2003.11566v1.pdf
PWC https://paperswithcode.com/paper/interval-neural-networks-uncertainty-scores
Repo
Framework
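
The core interval-arithmetic step can be sketched for one affine layer. The version below assumes nonnegative inputs (e.g., pixel intensities, as after a ReLU), under which the sign-split bounds are exact; the uniform interval spread `delta` around a point weight matrix is a hypothetical stand-in for the INN's learned interval parameters.

```python
import numpy as np

def interval_affine(xl, xu, W_lo, W_hi, b):
    # With inputs in [xl, xu] and 0 <= xl, the lower bound uses W_lo and the
    # upper bound uses W_hi, each split by weight sign (worst/best case input).
    lo = np.maximum(W_lo, 0) @ xl + np.minimum(W_lo, 0) @ xu + b
    hi = np.maximum(W_hi, 0) @ xu + np.minimum(W_hi, 0) @ xl + b
    return lo, hi

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
delta = 0.05 * np.abs(W)            # hypothetical learned interval spread
b = np.zeros(4)
x = rng.random(8)                   # nonnegative input, e.g., pixel values

lo, hi = interval_affine(x, x, W - delta, W + delta, b)
lo, hi = np.maximum(lo, 0), np.maximum(hi, 0)   # ReLU is monotone on intervals
print("point output:   ", np.maximum(W @ x + b, 0))
print("interval widths:", hi - lo)              # per-output uncertainty scores
```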

Sequence Preserving Network Traffic Generation

Title Sequence Preserving Network Traffic Generation
Authors Sigal Shaked, Amos Zamir, Roman Vainshtein, Moshe Unger, Lior Rokach, Rami Puzis, Bracha Shapira
Abstract We present the Network Traffic Generator (NTG), a framework for perturbing recorded network traffic with the purpose of generating diverse but realistic background traffic for network simulation and what-if analysis in enterprise environments. The framework preserves many characteristics of the original traffic recorded in an enterprise, as well as sequences of network activities. Using the proposed framework, the original traffic flows are profiled using 200 cross-protocol features. The traffic is aggregated into flows of packets between IP pairs and clustered into groups of similar network activities. Sequences of network activities are then extracted. We examined two methods for extracting sequences of activities: a Markov model and a neural language model. Finally, new traffic is generated using the extracted model. We developed a prototype of the framework and conducted extensive experiments based on two real network traffic collections. Hypothesis testing was used to examine the differences between the distributions of original and generated features, showing that 30-100% of the extracted features were preserved. Small differences between n-gram perplexities in sequences of network activities in the original and generated traffic indicate that sequences of network activities were well preserved.
Tasks Language Modelling
Published 2020-02-23
URL https://arxiv.org/abs/2002.09832v1
PDF https://arxiv.org/pdf/2002.09832v1.pdf
PWC https://paperswithcode.com/paper/sequence-preserving-network-traffic
Repo
Framework
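
The Markov-model variant of the sequence-extraction step can be sketched in a few lines: assuming flows have already been clustered into activity labels, fit a first-order transition matrix and sample new, sequence-preserving activity streams from it. The add-one smoothing and the toy sequences are assumptions.

```python
import numpy as np

def fit_markov(sequences, n_states):
    counts = np.ones((n_states, n_states))       # add-one smoothing (assumed)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def sample_sequence(P, start, length, rng):
    seq = [start]
    for _ in range(length - 1):
        seq.append(rng.choice(len(P), p=P[seq[-1]]))  # next activity cluster
    return seq

# Toy "activity" sequences over 3 clusters of similar flows.
observed = [[0, 1, 1, 2, 0], [0, 1, 2, 2, 0], [1, 2, 0, 0, 1]]
P = fit_markov(observed, n_states=3)
rng = np.random.default_rng(0)
print(sample_sequence(P, start=0, length=8, rng=rng))
```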

Active Learning for Entity Alignment

Title Active Learning for Entity Alignment
Authors Max Berrendorf, Evgeniy Faerman, Volker Tresp
Abstract In this work, we propose a novel framework for the labeling of entity alignments in knowledge graph datasets. Different strategies for selecting informative instances for the human labeler form the core of our framework. We illustrate how the labeling of entity alignments differs from assigning class labels to single instances and how these differences affect labeling efficiency. Based on these considerations, we propose and evaluate different active and passive learning strategies. One of our main findings is that passive learning approaches, which can be efficiently precomputed and deployed more easily, achieve performance comparable to the active learning strategies. Moreover, we can dynamically learn to combine these scores to obtain an even better heuristic.
Tasks Active Learning, Entity Alignment
Published 2020-01-24
URL https://arxiv.org/abs/2001.08943v1
PDF https://arxiv.org/pdf/2001.08943v1.pdf
PWC https://paperswithcode.com/paper/active-learning-for-entity-alignment
Repo
Framework
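
As one hedged illustration of an instance-selection heuristic for this setting (illustrative only, not the paper's exact strategies): rank entities in the first graph by the margin between their two most similar counterparts in the second graph, on the intuition that a small margin signals an ambiguous, informative case to label. The random embeddings are placeholders.

```python
import numpy as np

def margin_ranking(emb_a, emb_b):
    # Cosine similarity between every entity in graph A and graph B.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T
    top2 = np.sort(sim, axis=1)[:, -2:]       # two best matches per entity
    margin = top2[:, 1] - top2[:, 0]          # small margin = ambiguous match
    return np.argsort(margin)                 # most informative entities first

rng = np.random.default_rng(0)
queries = margin_ranking(rng.standard_normal((50, 16)),
                         rng.standard_normal((60, 16)))
print("label these entities first:", queries[:5])
```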

Quantum Semantic Learning by Reverse Annealing an Adiabatic Quantum Computer

Title Quantum Semantic Learning by Reverse Annealing an Adiabatic Quantum Computer
Authors Lorenzo Rocutto, Claudio Destri, Enrico Prati
Abstract Boltzmann Machines constitute a class of neural networks with applications to image reconstruction, pattern classification and unsupervised learning in general. Their most common variants, called Restricted Boltzmann Machines (RBMs), exhibit a good trade-off between computability on existing silicon-based hardware and generality of possible applications. Still, the diffusion of RBMs is quite limited, since their training process proves to be hard. The advent of commercial Adiabatic Quantum Computers (AQCs) raised the expectation that implementing RBMs on such quantum devices could speed up training relative to conventional hardware. To date, however, the implementation of RBM networks on AQCs has been limited by the low qubit connectivity when each qubit acts as a node of the neural network. Here we demonstrate the feasibility of a complete RBM on AQCs, thanks to an embedding that associates its nodes to virtual qubits, thus outperforming previous implementations based on incomplete graphs. Moreover, to accelerate the learning, we implement a semantic quantum search which, contrary to previous proposals, takes the input data as initial boundary conditions to start each learning step of the RBM, thanks to a reverse annealing schedule. Such an approach, unlike the more conventional forward annealing schedule, allows sampling configurations in a meaningful neighborhood of the training data, mimicking the behavior of the classical Gibbs sampling algorithm. We show that learning based on reverse annealing quickly raises the sampling probability of a meaningful subset of configurations. Even without a proper optimization of the annealing schedule, the RBM semantically trained by reverse annealing achieves better scores on reconstruction tasks.
Tasks Image Reconstruction
Published 2020-03-25
URL https://arxiv.org/abs/2003.11945v2
PDF https://arxiv.org/pdf/2003.11945v2.pdf
PWC https://paperswithcode.com/paper/quantum-semantic-learning-by-reverse
Repo
Framework
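
The reverse-annealing schedule is designed to mimic classical Gibbs sampling around the training data. As a purely classical point of reference (no quantum-hardware API involved), here is one Gibbs step of an RBM initialized at a data vector, as in contrastive divergence; sizes and parameters are toy assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_vis, b_hid, rng):
    # Sample hidden units given the (data-initialized) visible units...
    h_prob = sigmoid(v @ W + b_hid)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)
    # ...then resample visible units given the hidden sample.
    v_prob = sigmoid(h @ W.T + b_vis)
    v_new = (rng.random(v_prob.shape) < v_prob).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((6, 4))            # toy visible-hidden weights
v_data = rng.integers(0, 2, size=6).astype(float)
v_sample, h_sample = gibbs_step(v_data, W, np.zeros(6), np.zeros(4), rng)
print("reconstruction:", v_sample)
```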

A Dataset Independent Set of Baselines for Relation Prediction in Argument Mining

Title A Dataset Independent Set of Baselines for Relation Prediction in Argument Mining
Authors Oana Cocarascu, Elena Cabrio, Serena Villata, Francesca Toni
Abstract Argument Mining is the research area which aims at extracting argument components and predicting argumentative relations (i.e., support and attack) from text. In particular, numerous approaches have been proposed in the literature to predict the relations holding between arguments, and application-specific annotated resources have been built for this purpose. Despite the fact that these resources were created to experiment on the same task, the definition of a single relation prediction method that can be successfully applied to a significant portion of these datasets remains an open research problem in Argument Mining. This means that none of the methods proposed in the literature can be easily ported from one resource to another. In this paper, we address this problem by proposing a set of dataset-independent strong neural baselines which obtain homogeneous results on all the datasets proposed in the literature for the argumentative relation prediction task. Thus, our baselines can be employed by the Argument Mining community to compare more effectively how well a method performs on the argumentative relation prediction task.
Tasks Argument Mining
Published 2020-02-14
URL https://arxiv.org/abs/2003.04970v1
PDF https://arxiv.org/pdf/2003.04970v1.pdf
PWC https://paperswithcode.com/paper/a-dataset-independent-set-of-baselines-for
Repo
Framework
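
The abstract does not detail the baselines' architectures. As a hedged illustration of the task setup only, the sketch below classifies an (argument, argument) pair into {support, attack, no relation} with a shared bag-of-embeddings encoder and an MLP head - one plausible generic baseline, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class RelationBaseline(nn.Module):
    def __init__(self, vocab_size, dim=64, n_classes=3):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)   # shared argument encoder
        self.head = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, n_classes))

    def forward(self, arg1_ids, arg2_ids):
        # Encode each argument, concatenate, and predict the relation class.
        pair = torch.cat([self.emb(arg1_ids), self.emb(arg2_ids)], dim=-1)
        return self.head(pair)

model = RelationBaseline(vocab_size=10_000)
a1 = torch.randint(0, 10_000, (2, 12))   # batch of 2 token-id sequences
a2 = torch.randint(0, 10_000, (2, 9))
print(model(a1, a2).shape)               # -> (2, 3) relation logits
```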

Path Planning in Dynamic Environments using Generative RNNs and Monte Carlo Tree Search

Title Path Planning in Dynamic Environments using Generative RNNs and Monte Carlo Tree Search
Authors Stuart Eiffert, He Kong, Navid Pirmarzdashti, Salah Sukkarieh
Abstract State-of-the-art methods for robotic path planning in dynamic environments, such as crowds or traffic, rely on hand-crafted motion models for agents. These models often do not reflect interactions of agents in real-world scenarios. To overcome this limitation, this paper proposes an integrated path planning framework using generative Recurrent Neural Networks within a Monte Carlo Tree Search (MCTS). This approach uses a learnt model of social response to predict crowd dynamics during planning across the action space. This extends our recent work using generative RNNs to learn the relationship between planned robotic actions and the likely response of a crowd. We show that the proposed framework can considerably improve motion prediction accuracy during interactions, allowing more effective path planning. The performance of our method is compared in simulation with existing methods for collision avoidance in a crowd of pedestrians, demonstrating the ability to control future states of nearby individuals. We also conduct preliminary real-world tests to validate the effectiveness of our method.
Tasks motion prediction
Published 2020-01-30
URL https://arxiv.org/abs/2001.11597v1
PDF https://arxiv.org/pdf/2001.11597v1.pdf
PWC https://paperswithcode.com/paper/path-planning-in-dynamic-environments-using
Repo
Framework
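
A heavily simplified sketch of the planning loop: candidate robot actions are scored by rolling a learned crowd-response model forward, as the MCTS rollouts in the paper do. The full tree search (UCB selection, node expansion) is omitted, and `crowd_model`, the cost terms, and the action set are all assumptions standing in for the learned RNN and the paper's objective.

```python
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = np.array([[0.0, 0.5], [0.3, 0.3], [-0.3, 0.3]])  # toy velocity set

def crowd_model(robot_pos, agents):
    """Stub for the generative RNN: agents drift away from the robot."""
    away = agents - robot_pos
    return agents + 0.1 * away / np.linalg.norm(away, axis=1, keepdims=True)

def rollout_cost(robot_pos, agents, goal, depth=5):
    cost = 0.0
    for _ in range(depth):
        robot_pos = robot_pos + ACTIONS[rng.integers(len(ACTIONS))]
        agents = crowd_model(robot_pos, agents)   # predicted crowd response
        dists = np.linalg.norm(agents - robot_pos, axis=1)
        cost += np.linalg.norm(goal - robot_pos) + 5.0 * (dists < 0.5).sum()
    return cost

def plan(robot_pos, agents, goal, n_rollouts=64):
    # Score each first action by the mean cost of random rollouts after it.
    scores = [(np.mean([rollout_cost(robot_pos + a, agents, goal)
                        for _ in range(n_rollouts // len(ACTIONS))]), i)
              for i, a in enumerate(ACTIONS)]
    return ACTIONS[min(scores)[1]]

agents = rng.standard_normal((4, 2)) * 2.0
print("chosen action:", plan(np.zeros(2), agents, goal=np.array([0.0, 5.0])))
```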

Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM)

Title Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM)
Authors David A. Wood, Jeremy Lynch, Sina Kafiabadi, Emily Guilhem, Aisha Al Busaidi, Antanas Montvila, Thomas Varsavsky, Juveria Siddiqui, Naveen Gadapa, Matthew Townend, Martin Kiik, Keena Patel, Gareth Barker, Sebastian Ourselin, James H. Cole, Thomas C. Booth
Abstract Labelling large datasets for training high-capacity neural networks is a major obstacle to the development of deep learning-based medical imaging applications. Here we present a transformer-based network for magnetic resonance imaging (MRI) radiology report classification which automates this task by assigning image labels on the basis of free-text expert radiology reports. Our model’s performance is comparable to that of an expert radiologist, and better than that of an expert physician, demonstrating the feasibility of this approach. We make code available online for researchers to label their own MRI datasets for medical imaging applications.
Tasks
Published 2020-02-16
URL https://arxiv.org/abs/2002.06588v1
PDF https://arxiv.org/pdf/2002.06588v1.pdf
PWC https://paperswithcode.com/paper/automated-labelling-using-an-attention-model
Repo
Framework
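
The general approach - classifying free-text radiology reports with a pretrained transformer - can be sketched with the Hugging Face `transformers` library. The checkpoint name and the two labels below are placeholders, not the authors' released model, and the classification head is randomly initialized until fine-tuned on labeled reports.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "bert-base-uncased"   # placeholder; the paper uses its own model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint,
                                                           num_labels=2)

report = "MRI brain: no evidence of acute infarct or space-occupying lesion."
inputs = tokenizer(report, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
# The label names are assumed for illustration; fine-tuning defines them.
print({"normal": probs[0, 0].item(), "abnormal": probs[0, 1].item()})
```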

On the performance of different excitation-residual blocks for Acoustic Scene Classification

Title On the performance of different excitation-residual blocks for Acoustic Scene Classification
Authors Javier Naranjo-Alcazar, Sergi Perez-Castanos, Pedro Zuccarello, Maximo Cobos
Abstract Acoustic Scene Classification (ASC) is a problem in the field of machine listening whose objective is to classify/tag an audio clip with a predefined label describing a scene location. Interest in this topic has grown so much over the years that an annual international challenge (Detection and Classification of Acoustic Scenes and Events, DCASE) is held to propose novel solutions. Solutions to these problems often incorporate methods such as data augmentation or ensembles of multiple models. Although the main line of research in the state of the art usually implements these methods, considerable improvements and state-of-the-art results can also be achieved solely by modifying the architecture of convolutional neural networks (CNNs). In this work we propose two novel squeeze-excitation blocks to improve the accuracy of an ASC framework by modifying the architecture of the residual block in a CNN, together with an analysis of several state-of-the-art blocks. The main idea of squeeze-excitation blocks is to learn spatial and channel-wise feature maps independently instead of jointly, as standard CNNs do. This is done via global grouping operators, linear operators, and a final calibration between the input of the block and the relationships obtained by that block. The behavior of the block, and therefore of the entire neural network, can be modified depending on the input to the block, the residual configuration, and the placement of the non-linear activations within the block. The analysis was carried out using the TAU Urban Acoustic Scenes 2019 dataset presented in the DCASE 2019 edition. All configurations discussed in this document exceed the baseline proposed by the DCASE organization by 13 percentage points. In turn, the novel configurations proposed in this paper exceed the residual configurations proposed in previous works.
Tasks Acoustic Scene Classification, Calibration, Data Augmentation, Scene Classification
Published 2020-03-20
URL https://arxiv.org/abs/2003.09284v1
PDF https://arxiv.org/pdf/2003.09284v1.pdf
PWC https://paperswithcode.com/paper/on-the-performance-of-different-excitation
Repo
Framework
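
A minimal PyTorch sketch of a squeeze-excitation residual block of the kind analyzed here: global average pooling ("squeeze"), a two-layer bottleneck ("excitation"), and channel-wise recalibration of the residual branch. Where exactly the recalibration and non-linearities sit relative to the residual sum is the design axis the paper explores; the placement below is one assumption.

```python
import torch
import torch.nn as nn

class SEResidualBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # squeeze: global pooling
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                             # per-channel gates in (0, 1)
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        out = out * self.se(out)                      # channel recalibration
        return self.relu(out + x)                     # residual connection

block = SEResidualBlock(32)
print(block(torch.randn(1, 32, 64, 64)).shape)        # -> (1, 32, 64, 64)
```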

On transfer learning of neural networks using bi-fidelity data for uncertainty propagation

Title On transfer learning of neural networks using bi-fidelity data for uncertainty propagation
Authors Subhayan De, Jolene Britton, Matthew Reynolds, Ryan Skinner, Kenneth Jansen, Alireza Doostan
Abstract Due to their high degree of expressiveness, neural networks have recently been used as surrogate models for mapping inputs of an engineering system to outputs of interest. Once trained, neural networks are computationally inexpensive to evaluate and remove the need for repeated evaluations of computationally expensive models in uncertainty quantification applications. However, given the highly parameterized construction of neural networks, especially deep neural networks, accurate training often requires large amounts of simulation data that may not be available in the case of computationally expensive systems. In this paper, to alleviate this issue for uncertainty propagation, we explore the application of transfer learning techniques using training data generated from both high- and low-fidelity models. We explore two strategies for coupling these two datasets during the training procedure, namely, standard transfer learning and bi-fidelity weighted learning. In the former approach, a neural network model mapping the inputs to the outputs of interest is trained on the low-fidelity data. The high-fidelity data is then used to adapt the parameters of the upper layer(s) of the low-fidelity network, or to train a simpler neural network to map the output of the low-fidelity network to that of the high-fidelity model. In the latter approach, the entire low-fidelity network parameters are updated using data generated via a Gaussian process model trained with a small high-fidelity dataset. The parameter updates are performed via a variant of stochastic gradient descent with learning rates given by the Gaussian process model. Using three numerical examples, we illustrate the utility of these bi-fidelity transfer learning methods, focusing on the accuracy improvement achieved by transfer learning over standard training approaches.
Tasks Transfer Learning
Published 2020-02-11
URL https://arxiv.org/abs/2002.04495v1
PDF https://arxiv.org/pdf/2002.04495v1.pdf
PWC https://paperswithcode.com/paper/on-transfer-learning-of-neural-networks-using
Repo
Framework
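
The "standard transfer learning" strategy described above can be sketched in PyTorch: pretrain a surrogate on plentiful low-fidelity samples, freeze the lower layers, and adapt only the upper layer(s) on the scarce high-fidelity data. Layer sizes, the synthetic data, and the training schedule are toy assumptions, and the Gaussian-process-driven bi-fidelity weighted variant is not shown.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def fit(model, params, x, y, epochs=200, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

x_lo = torch.randn(2000, 8)                      # abundant low-fidelity runs
y_lo = x_lo.sum(dim=1, keepdim=True)             # stand-in cheap model output
x_hi = torch.randn(50, 8)                        # scarce high-fidelity runs
y_hi = x_hi.sum(dim=1, keepdim=True) + 0.1 * x_hi[:, :1] ** 2

fit(net, net.parameters(), x_lo, y_lo)           # stage 1: low-fidelity training
for p in net[:2].parameters():                   # freeze the lower layers...
    p.requires_grad = False
fit(net, net[2:].parameters(), x_hi, y_hi)       # ...adapt upper layers only
```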