January 31, 2020

3055 words 15 mins read

Paper Group ANR 91

Multilingual ASR with Massive Data Augmentation. Composing Diverse Policies for Temporally Extended Tasks. An Introduction to Variational Autoencoders. A Joint Model for Multimodal Document Quality Assessment. A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM. TACAM: Topic And Context Aw …

Multilingual ASR with Massive Data Augmentation

Title Multilingual ASR with Massive Data Augmentation
Authors Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig
Abstract Towards developing high-performing ASR for low-resource languages, common approaches to the lack of resources are to make use of data from multiple languages and to augment the training data by creating acoustic variations. In this work we present a single grapheme-based ASR model learned on 7 geographically proximal languages, using standard hybrid BLSTM-HMM acoustic models with a lattice-free MMI objective. We build the single ASR grapheme set by taking the union over each language-specific grapheme set, and we find that such a multilingual ASR model can perform language-independent recognition on all 7 languages and substantially outperform each monolingual ASR model. Secondly, we evaluate the efficacy of multiple data augmentation alternatives within each language, as well as their complementarity with multilingual modeling. Overall, we show that the proposed multilingual ASR with various data augmentation not only recognizes any of the languages in the training set, but also provides large ASR performance improvements.
Tasks Data Augmentation
Published 2019-09-14
URL https://arxiv.org/abs/1909.06522v1
PDF https://arxiv.org/pdf/1909.06522v1.pdf
PWC https://paperswithcode.com/paper/multilingual-asr-with-massive-data
Repo
Framework
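
The grapheme-union step described in the abstract above is easy to make concrete. Below is a minimal sketch, not the authors' code: it builds one shared output inventory by taking the union of per-language grapheme sets, with the `corpora` dictionary acting as a hypothetical stand-in for the per-language training transcripts.

```python
def grapheme_set(transcripts):
    """Collect the set of graphemes (characters) seen in a list of transcripts."""
    return {ch for line in transcripts for ch in line if not ch.isspace()}

# Hypothetical stand-in for the transcripts of the 7 training languages.
corpora = {
    "lang_a": ["hello world"],
    "lang_b": ["hola mundo"],
}

# One output inventory covering every language, so a single acoustic model
# can emit symbols for any of them.
shared_graphemes = sorted(set().union(*(grapheme_set(t) for t in corpora.values())))
print(shared_graphemes)
```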

Composing Diverse Policies for Temporally Extended Tasks

Title Composing Diverse Policies for Temporally Extended Tasks
Authors Daniel Angelov, Yordan Hristov, Michael Burke, Subramanian Ramamoorthy
Abstract Robot control policies for temporally extended and sequenced tasks are often characterized by discontinuous switches between different local dynamics. These change-points are often exploited in hierarchical motion planning to build approximate models and to facilitate the design of local, region-specific controllers. However, it becomes combinatorially challenging to implement such a pipeline for complex temporally extended tasks, especially when the sub-controllers work on different information streams, time scales and action spaces. In this paper, we introduce a method that can compose diverse policies comprising motion planning trajectories, dynamic motion primitives and neural network controllers. We introduce a global goal scoring estimator that uses local, per-motion primitive dynamics models and corresponding activation state-space sets to sequence diverse policies in a locally optimal fashion. We use expert demonstrations to convert what is typically viewed as a gradient-based learning process into a planning process without explicitly specifying pre- and post-conditions. We first illustrate the proposed framework using an MDP benchmark to showcase robustness to action and model dynamics mismatch, and then with a particularly complex physical gear assembly task, solved on a PR2 robot. We show that the proposed approach successfully discovers the optimal sequence of controllers and solves both tasks efficiently.
Tasks Hierarchical Reinforcement Learning, Motion Planning
Published 2019-07-18
URL https://arxiv.org/abs/1907.08199v3
PDF https://arxiv.org/pdf/1907.08199v3.pdf
PWC https://paperswithcode.com/paper/composing-diverse-policies-for-temporally
Repo
Framework
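
As a rough illustration of the sequencing idea in the abstract above, the sketch below scores each applicable controller with its own local outcome model and runs the best one. The `Option` class, its fields, and the distance-to-goal score are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class Option:
    def __init__(self, name, activation_set, predict_outcome):
        self.name = name
        self.activation_set = activation_set    # callable: state -> bool (where the option applies)
        self.predict_outcome = predict_outcome  # local model: state -> predicted end state

def select_option(state, goal, options):
    """Pick the applicable option whose local model predicts ending closest to the goal."""
    applicable = [o for o in options if o.activation_set(state)]
    scores = [-np.linalg.norm(o.predict_outcome(state) - goal) for o in applicable]
    return applicable[int(np.argmax(scores))]

# Toy usage with two hand-made options.
opts = [
    Option("reach", lambda s: True,       lambda s: s + np.array([1.0, 0.0])),
    Option("push",  lambda s: s[0] > 0.5, lambda s: s + np.array([0.0, 1.0])),
]
print(select_option(np.array([1.0, 0.0]), np.array([1.0, 1.0]), opts).name)  # -> "push"
```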

An Introduction to Variational Autoencoders

Title An Introduction to Variational Autoencoders
Authors Diederik P. Kingma, Max Welling
Abstract Variational autoencoders provide a principled framework for learning deep latent-variable models and corresponding inference models. In this work, we provide an introduction to variational autoencoders and some important extensions.
Tasks Latent Variable Models
Published 2019-06-06
URL https://arxiv.org/abs/1906.02691v3
PDF https://arxiv.org/pdf/1906.02691v3.pdf
PWC https://paperswithcode.com/paper/an-introduction-to-variational-autoencoders
Repo
Framework
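
For quick reference, the central quantity in the VAE framework is the evidence lower bound (ELBO), maximized jointly over the generative parameters θ and the inference (encoder) parameters φ; the standard form (not quoted from the paper) is:

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}_{\theta,\phi}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```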

A Joint Model for Multimodal Document Quality Assessment

Title A Joint Model for Multimodal Document Quality Assessment
Authors Aili Shen, Bahar Salehi, Timothy Baldwin, Jianzhong Qi
Abstract The quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one. In this paper, we explore this task in the context of assessing the quality of Wikipedia articles and academic papers. Observing that the visual rendering of a document can capture implicit quality indicators that are not present in the document text — such as images, font choices, and visual layout — we propose a joint model that combines the text content with a visual rendering of the document for document quality assessment. Experimental results over two datasets reveal that textual and visual features are complementary, achieving state-of-the-art results.
Tasks
Published 2019-01-04
URL http://arxiv.org/abs/1901.01010v2
PDF http://arxiv.org/pdf/1901.01010v2.pdf
PWC https://paperswithcode.com/paper/a-joint-model-for-multimodal-document-quality
Repo
Framework
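
A minimal sketch of the joint idea above, assuming a generic two-branch design rather than the paper's exact architecture: encode the text and a rendered page image separately, concatenate the two embeddings, and classify quality from the fused vector (six classes here as a stand-in for Wikipedia's quality scale).

```python
import torch
import torch.nn as nn

class JointQualityModel(nn.Module):
    def __init__(self, text_dim=300, visual_channels=3, num_classes=6):
        super().__init__()
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, 128), nn.ReLU())
        self.visual_encoder = nn.Sequential(                 # consumes a rendered page image
            nn.Conv2d(visual_channels, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 128), nn.ReLU(),
        )
        self.classifier = nn.Linear(128 + 128, num_classes)  # fuse text and visual branches

    def forward(self, text_features, page_image):
        fused = torch.cat([self.text_encoder(text_features),
                           self.visual_encoder(page_image)], dim=-1)
        return self.classifier(fused)

model = JointQualityModel()
logits = model(torch.randn(2, 300), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 6])
```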

A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM

Title A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM
Authors Siming Ma, David Brooks, Gu-Yeon Wei
Abstract We present a new algorithm for training neural networks with binary activations and multi-level weights, which enables efficient processing-in-memory circuits with eNVM. Binary activations obviate costly DACs and ADCs. Multi-level weights leverage multi-level eNVM cells. Compared with previous quantization algorithms, our method not only works for feed-forward networks, including fully-connected and convolutional ones, but also achieves higher accuracy and noise resilience for recurrent networks. In particular, we present an RNN trigger-word detection PIM accelerator, whose modeling results demonstrate high performance using our new training algorithm.
Tasks Quantization
Published 2019-11-30
URL https://arxiv.org/abs/1912.00106v2
PDF https://arxiv.org/pdf/1912.00106v2.pdf
PWC https://paperswithcode.com/paper/a-binary-activation-multi-level-weight-rnn
Repo
Framework
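
The two quantizers named in the abstract above can be sketched generically: binarize activations with a straight-through estimator so gradients still flow, and snap weights to a few evenly spaced levels that a multi-level eNVM cell could store. This is a common recipe, assumed here for illustration, and not necessarily the authors' training algorithm.

```python
import torch

class BinaryAct(torch.autograd.Function):
    """Sign activation with a clipped straight-through gradient."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                       # binary activation (sign of the pre-activation)
    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()   # pass gradients only near the threshold

def quantize_weights(w, levels=4):
    """Snap weights to a small set of evenly spaced levels (multi-level cells)."""
    scale = w.abs().max() / (levels // 2)
    return torch.clamp(torch.round(w / scale), -(levels // 2), levels // 2) * scale

x = torch.randn(5, requires_grad=True)
BinaryAct.apply(x).sum().backward()
print(x.grad)
print(quantize_weights(torch.randn(3, 3)))
```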

TACAM: Topic And Context Aware Argument Mining

Title TACAM: Topic And Context Aware Argument Mining
Authors Michael Fromm, Evgeniy Faerman, Thomas Seidl
Abstract In this work we address the problem of argument search. The purpose of argument search is the distillation of pro and contra arguments for requested topics from large text corpora. In previous works, the usual approach is to use a standard search engine to extract text parts which are relevant to the given topic and subsequently use an argument recognition algorithm to select arguments from them. The main challenge in the argument recognition task, which is also known as argument mining, is that sentences containing arguments are often structurally similar to purely informative sentences without any stance about the topic; in fact, they only differ semantically. Most approaches use topic or search term information only for the first search step and therefore assume that arguments can be classified independently of a topic. We argue that topic information is crucial for argument mining, since the topic defines the semantic context of an argument. More precisely, we propose different models for the classification of arguments, which take information about the topic of an argument into account. Moreover, to enrich the context of a topic and to let models understand the context of the potential argument better, we integrate information from different external sources such as knowledge graphs or pre-trained NLP models. Our evaluation shows that considering topic information, especially in connection with external information, provides a significant performance boost for the argument mining task.
Tasks Argument Mining, Knowledge Graphs
Published 2019-05-26
URL https://arxiv.org/abs/1906.00923v2
PDF https://arxiv.org/pdf/1906.00923v2.pdf
PWC https://paperswithcode.com/paper/190600923
Repo
Framework
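
The core modeling claim above reduces to conditioning the argument classifier on a topic representation instead of the sentence alone. The sketch below is an illustrative stand-in: the sentence and topic encoders are assumed to exist upstream (the paper draws such context from pre-trained NLP models and knowledge graphs), and only the fusion and the three-way pro/contra/no-argument decision are shown.

```python
import torch
import torch.nn as nn

class TopicAwareArgumentClassifier(nn.Module):
    def __init__(self, emb_dim=256, num_labels=3):   # pro / contra / no argument
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * emb_dim, 128), nn.ReLU(), nn.Linear(128, num_labels)
        )

    def forward(self, sentence_emb, topic_emb):
        # Concatenate sentence and topic representations before scoring.
        return self.scorer(torch.cat([sentence_emb, topic_emb], dim=-1))

clf = TopicAwareArgumentClassifier()
print(clf(torch.randn(4, 256), torch.randn(4, 256)).shape)  # torch.Size([4, 3])
```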

Learning-Based Animation of Clothing for Virtual Try-On

Title Learning-Based Animation of Clothing for Virtual Try-On
Authors Igor Santesteban, Miguel A. Otaduy, Dan Casas
Abstract This paper presents a learning-based clothing animation method for highly efficient virtual try-on simulation. Given a garment, we preprocess a rich database of physically-based dressed character simulations, for multiple body shapes and animations. Then, using this database, we train a learning-based model of cloth drape and wrinkles, as a function of body shape and dynamics. We propose a model that separates global garment fit, due to body shape, from local garment wrinkles, due to both pose dynamics and body shape. We use a recurrent neural network to regress garment wrinkles, and we achieve highly plausible nonlinear effects, in contrast to the blending artifacts suffered by previous methods. At runtime, dynamic virtual try-on animations are produced in just a few milliseconds for garments with thousands of triangles. We show a qualitative and quantitative analysis of the results.
Tasks
Published 2019-03-17
URL http://arxiv.org/abs/1903.07190v1
PDF http://arxiv.org/pdf/1903.07190v1.pdf
PWC https://paperswithcode.com/paper/learning-based-animation-of-clothing-for
Repo
Framework
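
A rough sketch of the decomposition described above, assuming SMPL-like shape and pose inputs as placeholders: a static, shape-conditioned global fit plus per-frame wrinkle displacements regressed by a recurrent network from the pose sequence. Layer sizes and the single-linear global-fit branch are illustrative simplifications, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ClothDeformer(nn.Module):
    def __init__(self, shape_dim=10, pose_dim=72, num_verts=1000):
        super().__init__()
        self.global_fit = nn.Linear(shape_dim, num_verts * 3)      # coarse fit from body shape
        self.wrinkle_rnn = nn.GRU(shape_dim + pose_dim, 256, batch_first=True)
        self.wrinkle_out = nn.Linear(256, num_verts * 3)           # fine, dynamic wrinkles

    def forward(self, shape, pose_seq):
        # shape: (B, shape_dim); pose_seq: (B, T, pose_dim)
        T = pose_seq.shape[1]
        fit = self.global_fit(shape).unsqueeze(1)                  # (B, 1, V*3), static per shape
        rnn_in = torch.cat([shape.unsqueeze(1).expand(-1, T, -1), pose_seq], dim=-1)
        h, _ = self.wrinkle_rnn(rnn_in)
        return fit + self.wrinkle_out(h)                           # (B, T, V*3) vertex displacements

model = ClothDeformer()
print(model(torch.randn(2, 10), torch.randn(2, 30, 72)).shape)     # torch.Size([2, 30, 3000])
```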

ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation

Title ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation
Authors Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer
Abstract We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography during natural reading and during annotation. This corpus contains gaze and brain activity data of 739 sentences, 349 in a normal reading paradigm and 390 in a task-specific paradigm, in which the 18 participants actively search for a semantic relation type in the given sentences as a linguistic annotation task. This new dataset complements ZuCo 1.0 by providing experiments designed to analyze the differences in cognitive processing between natural reading and annotation. The data is freely available here: https://osf.io/2urht/.
Tasks Eye Tracking
Published 2019-12-02
URL https://arxiv.org/abs/1912.00903v3
PDF https://arxiv.org/pdf/1912.00903v3.pdf
PWC https://paperswithcode.com/paper/zuco-20-a-dataset-of-physiological-recordings
Repo
Framework

SID4VAM: A Benchmark Dataset with Synthetic Images for Visual Attention Modeling

Title SID4VAM: A Benchmark Dataset with Synthetic Images for Visual Attention Modeling
Authors David Berga, Xosé R. Fdez-Vidal, Xavier Otazu, Xosé M. Pardo
Abstract A benchmark of saliency model performance on a synthetic image dataset is provided. Model performance is evaluated through saliency metrics as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images with known salient regions. Images were generated with 15 distinct types of low-level features (e.g. orientation, brightness, color, size…) in a target-distractor pop-out style of synthetic pattern. We used Free-Viewing and Visual Search task instructions and 7 feature contrasts for each feature category. Our study reveals that state-of-the-art deep learning saliency models do not perform well with synthetic pattern images; instead, models with spectral/Fourier inspiration outperform the others in saliency metrics and are more consistent with human psychophysical experimentation. This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye-tracking image datasets.
Tasks Eye Tracking
Published 2019-10-29
URL https://arxiv.org/abs/1910.13066v1
PDF https://arxiv.org/pdf/1910.13066v1.pdf
PWC https://paperswithcode.com/paper/sid4vam-a-benchmark-dataset-with-synthetic-1
Repo
Framework

ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare

Title ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare
Authors Akshay Arora, Arun Nethi, Priyanka Kharat, Vency Verghese, Grant Jenkins, Steve Miff, Vikas Chowdhry, Xiao Wang
Abstract In recent times, machine learning (ML) and artificial intelligence (AI) based systems have evolved and scaled across different industries such as finance, retail, insurance, energy utilities, etc. Among other things, they have been used to predict patterns of customer behavior, to generate pricing models, and to predict the return on investments. But the successes in deploying machine learning models at scale in those industries have not translated into the healthcare setting. There are multiple reasons why integrating ML models into healthcare has not been widely successful, but from a technical perspective, general-purpose commercial machine learning platforms are not a good fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with stringent security and privacy needs. In this paper, we describe Isthmus, a turnkey, cloud-based platform which addresses the challenges above and reduces time to market for operationalizing ML/AI in healthcare. Towards the end, we describe three case studies which shed light on Isthmus capabilities. These include (1) supporting an end-to-end lifecycle of a model which predicts trauma survivability at hospital trauma centers, (2) bringing in and harmonizing data from disparate sources to create a community data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which can leverage real-time and longitudinal information to make advanced time-sensitive predictions.
Tasks
Published 2019-09-29
URL https://arxiv.org/abs/1909.13343v2
PDF https://arxiv.org/pdf/1909.13343v2.pdf
PWC https://paperswithcode.com/paper/isthmus-secure-scalable-real-time-and-robust
Repo
Framework

Towards Learning Cross-Modal Perception-Trace Models

Title Towards Learning Cross-Modal Perception-Trace Models
Authors Achim Rettinger, Viktoria Bogdanova, Philipp Niemann
Abstract Representation learning is a key element of state-of-the-art deep learning approaches. It enables the transformation of raw data into structured vector-space embeddings. Such embeddings are able to capture the distributional semantics of their context, e.g. by word windows on natural language sentences, graph walks on knowledge graphs or convolutions on images. So far, this context is manually defined, resulting in heuristics which are solely optimized for computational performance on certain tasks like link prediction. However, such heuristic models of context are fundamentally different to how humans capture information. For instance, when reading a multi-modal webpage (i) humans do not perceive all parts of a document equally: some words and parts of images are skipped, others are revisited several times, which makes the perception trace highly non-sequential; and (ii) humans construct meaning from a document's content by shifting their attention between text and image, among other things, guided by layout and design elements. In this paper we empirically investigate the difference between human perception and the context heuristics of basic embedding models. We conduct eye-tracking experiments to capture the underlying characteristics of human perception of media documents containing a mixture of text and images. Based on that, we devise a prototypical computational perception-trace model, called CMPM. We evaluate empirically how CMPM can improve a basic skip-gram embedding approach. Our results suggest that even with a basic human-inspired computational perception model, there is huge potential for improving embeddings, since such a model inherently captures multiple modalities, as well as layout and design elements.
Tasks Eye Tracking, Knowledge Graphs, Link Prediction, Representation Learning
Published 2019-10-18
URL https://arxiv.org/abs/1910.08549v1
PDF https://arxiv.org/pdf/1910.08549v1.pdf
PWC https://paperswithcode.com/paper/towards-learning-cross-modal-perception-trace
Repo
Framework
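
The contrast the abstract above draws between layout-defined contexts and perception traces can be shown with a toy example: standard skip-gram pairs come from a fixed window over the text order, while a perception-trace variant would draw the same kind of pairs from the order in which a reader actually fixated tokens (and image regions). The fixation list below is fabricated purely for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) pairs from a token sequence with a fixed window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

text_order = ["the", "robot", "grasps", "the", "gear"]
fixation_order = ["robot", "gear", "grasps", "robot", "IMAGE_REGION_3"]  # hypothetical trace

print(skipgram_pairs(text_order)[:4])       # contexts defined by the text layout
print(skipgram_pairs(fixation_order)[:4])   # contexts defined by the perception trace
```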

Wasserstein distances for evaluating cross-lingual embeddings

Title Wasserstein distances for evaluating cross-lingual embeddings
Authors Georgios Balikas, Ioannis Partalas
Abstract Word embeddings are high-dimensional vector representations of words that capture their semantic similarity in the vector space. There exist several algorithms for learning such embeddings, both for a single language and for several languages jointly. In this work we propose to evaluate collections of embeddings by adapting downstream natural language tasks to the optimal transport framework. We show how the family of Wasserstein distances can be used to solve cross-lingual document retrieval and cross-lingual document classification problems. We argue for the advantages of this approach compared to more traditional evaluation methods for embeddings, such as bilingual lexicon induction. Our experimental results suggest that using Wasserstein distances on these problems outperforms several strong baselines and performs on par with state-of-the-art models.
Tasks Cross-Lingual Document Classification, Document Classification, Semantic Similarity, Semantic Textual Similarity, Word Embeddings
Published 2019-10-24
URL https://arxiv.org/abs/1910.11005v2
PDF https://arxiv.org/pdf/1910.11005v2.pdf
PWC https://paperswithcode.com/paper/wasserstein-distances-for-evaluating-cross
Repo
Framework
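
A minimal sketch of the evaluation idea above: treat each document as a uniform distribution over its word embeddings and compare documents across languages with an exact optimal-transport (Wasserstein) cost. The POT library is one possible backend choice, and the random vectors stand in for aligned cross-lingual embeddings.

```python
import numpy as np
import ot  # Python Optimal Transport (pip install POT)

def document_wasserstein(doc_a_vecs, doc_b_vecs):
    """Exact OT cost between two documents seen as uniform point clouds of word vectors."""
    a = np.full(len(doc_a_vecs), 1.0 / len(doc_a_vecs))
    b = np.full(len(doc_b_vecs), 1.0 / len(doc_b_vecs))
    cost = ot.dist(doc_a_vecs, doc_b_vecs, metric="euclidean")
    return ot.emd2(a, b, cost)

doc_en = np.random.rand(12, 300)   # placeholder embeddings for an English document
doc_fr = np.random.rand(9, 300)    # placeholder embeddings for a French document
print(document_wasserstein(doc_en, doc_fr))
```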

Hierarchical Transformers for Long Document Classification

Title Hierarchical Transformers for Long Document Classification
Authors Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak
Abstract BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations - applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. Then, we propagate each output through a single recurrent layer, or another transformer, followed by a softmax activation. We obtain the final classification decision after the last segment has been consumed. We show that both BERT extensions are quick to fine-tune and converge after as little as 1 epoch of training on a small, domain-specific data set. We successfully apply them in three different tasks involving customer call satisfaction prediction and topic classification, and obtain a significant improvement over the baseline models in two of them.
Tasks Document Classification, Transfer Learning
Published 2019-10-23
URL https://arxiv.org/abs/1910.10781v1
PDF https://arxiv.org/pdf/1910.10781v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-transformers-for-long-document
Repo
Framework
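
The recurrence-over-chunks variant described above can be sketched as follows, assuming recent versions of the transformers library; the model name, chunk size, and two-class head are illustrative choices rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
lstm = nn.LSTM(input_size=768, hidden_size=128, batch_first=True)
head = nn.Linear(128, 2)   # e.g. satisfied vs. not satisfied

def classify_long_document(text, chunk_words=200):
    """Split into chunks, BERT-encode each chunk, run an LSTM over [CLS] vectors, classify."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    cls_vectors = []
    with torch.no_grad():
        for chunk in chunks:
            inputs = tokenizer(chunk, return_tensors="pt", truncation=True, max_length=256)
            cls_vectors.append(bert(**inputs).last_hidden_state[:, 0])   # per-chunk [CLS]
    sequence = torch.stack(cls_vectors, dim=1)          # (1, num_chunks, 768)
    _, (h_n, _) = lstm(sequence)
    return torch.softmax(head(h_n[-1]), dim=-1)

print(classify_long_document("word " * 1000))
```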

Adaptive Compression-based Lifelong Learning

Title Adaptive Compression-based Lifelong Learning
Authors Shivangi Srivastava, Maxim Berman, Matthew B. Blaschko, Devis Tuia
Abstract The problem of a deep learning model losing performance on a previously learned task when fine-tuned to a new one is a phenomenon known as catastrophic forgetting. There are two major ways to mitigate this problem: either preserving activations of the initial network during training with a new task, or restricting the new network activations to remain close to the initial ones. The latter approach falls under the umbrella of lifelong learning, where the model is updated in a way that it performs well on both old and new tasks, without having access to the old task's training samples anymore. Recently, approaches that prune networks to free capacity during sequential learning of tasks have been gaining popularity. Such approaches allow learning small networks while making redundant parameters available for the next tasks. The common problem encountered with these approaches is that the pruning percentage is hard-coded, irrespective of the number of samples, the complexity of the learning task, and the number of classes in the dataset. We propose a method based on Bayesian optimization to perform adaptive compression/pruning of the network and show its effectiveness in lifelong learning. Our method learns to perform heavy pruning for small and/or simple datasets while using milder compression rates for large and/or complex data. Experiments on classification and semantic segmentation demonstrate the applicability of learned network compression, where we are able to effectively preserve performance across sequences of tasks of varying complexity.
Tasks Semantic Segmentation
Published 2019-07-23
URL https://arxiv.org/abs/1907.09695v1
PDF https://arxiv.org/pdf/1907.09695v1.pdf
PWC https://paperswithcode.com/paper/adaptive-compression-based-lifelong-learning
Repo
Framework
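
The adaptive-pruning idea above can be sketched with plain magnitude pruning, where the per-task pruning rate would be chosen by Bayesian optimization over validation performance; a simple search over a few candidate rates stands in for that optimizer here, and the model and metric are placeholders.

```python
import torch
import torch.nn as nn

def prune_by_magnitude(model, rate):
    """Zero out the fraction `rate` of smallest-magnitude weights in each Linear layer."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Linear):
                k = int(rate * m.weight.numel())
                if k > 0:
                    threshold = m.weight.abs().flatten().kthvalue(k).values
                    m.weight[m.weight.abs() <= threshold] = 0.0
    return model

def objective(rate, build_model, validate):
    # Negative validation score, so that smaller is better for the search below.
    return -validate(prune_by_magnitude(build_model(), rate))

build = lambda: nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
validate = lambda model: torch.rand(1).item()   # placeholder for accuracy on held-out data
best_rate = min([0.2, 0.5, 0.8], key=lambda r: objective(r, build, validate))
print(best_rate)
```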

Comparing Knowledge-based Reinforcement Learning to Neural Networks in a Strategy Game

Title Comparing Knowledge-based Reinforcement Learning to Neural Networks in a Strategy Game
Authors Liudmyla Nechepurenko, Viktor Voss, Vyacheslav Gritsenko
Abstract The paper reports on an experiment in which a Knowledge-Based Reinforcement Learning (KB-RL) method was compared to a Neural Network (NN) approach in solving a classical Artificial Intelligence (AI) task. In contrast to NNs, which require a substantial amount of data to learn a good policy, the KB-RL method seeks to encode human knowledge into the solution, considerably reducing the amount of data needed for a good policy. By means of Reinforcement Learning (RL), KB-RL learns to optimize the model and improves the output of the system. Furthermore, KB-RL offers the advantage of a clear explanation of the decisions taken, as well as transparent reasoning behind the solution. The goal of the reported experiment was to examine the performance of the KB-RL method in contrast to the Neural Network and to explore the capabilities of KB-RL to deliver a strong solution for AI tasks. The results show that, within the designed settings, KB-RL outperformed the NN and was able to learn a better policy from the available amount of data. These results support the view that Artificial Intelligence can benefit from the discovery and study of alternative approaches, potentially extending the frontiers of AI.
Tasks Game of Go
Published 2019-01-15
URL https://arxiv.org/abs/1901.04626v2
PDF https://arxiv.org/pdf/1901.04626v2.pdf
PWC https://paperswithcode.com/paper/comparing-knowledge-based-reinforcement
Repo
Framework