Paper Group ANR 91
Multilingual ASR with Massive Data Augmentation. Composing Diverse Policies for Temporally Extended Tasks. An Introduction to Variational Autoencoders. A Joint Model for Multimodal Document Quality Assessment. A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM. TACAM: Topic And Context Aw …
Multilingual ASR with Massive Data Augmentation
Title | Multilingual ASR with Massive Data Augmentation |
Authors | Chunxi Liu, Qiaochu Zhang, Xiaohui Zhang, Kritika Singh, Yatharth Saraf, Geoffrey Zweig |
Abstract | To develop high-performing ASR for low-resource languages, two approaches to addressing the lack of resources are to use data from multiple languages and to augment the training data by creating acoustic variations. In this work we present a single grapheme-based ASR model learned on 7 geographically proximal languages, using standard hybrid BLSTM-HMM acoustic models with a lattice-free MMI objective. We build the single ASR grapheme set by taking the union of the language-specific grapheme sets, and we find that such a multilingual ASR model can perform language-independent recognition on all 7 languages and substantially outperform each monolingual ASR model. Secondly, we evaluate the efficacy of multiple data augmentation alternatives within language, as well as their complementarity with multilingual modeling. Overall, we show that the proposed multilingual ASR with various data augmentation can not only recognize any language in the training set, but also provide large ASR performance improvements. |
Tasks | Data Augmentation |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06522v1 (PDF: https://arxiv.org/pdf/1909.06522v1.pdf) |
PWC | https://paperswithcode.com/paper/multilingual-asr-with-massive-data |
Repo | |
Framework | |
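The abstract above describes building the shared grapheme set as the union of the language-specific grapheme sets. A minimal sketch of that one step follows. The language names and word lists are hypothetical toy data, not the paper's, and treating each Unicode character as a grapheme is a simplifying assumption.

```python
def grapheme_set(words):
    """Collect the distinct graphemes (here: single characters) used in a word list."""
    return {ch for word in words for ch in word}


def shared_grapheme_set(lexicons):
    """Union the per-language grapheme sets into one multilingual inventory."""
    shared = set()
    for words in lexicons.values():
        shared |= grapheme_set(words)
    return shared


# Hypothetical toy lexicons; the paper's 7 languages and data are not reproduced here.
lexicons = {
    "lang_a": ["kala", "mala"],
    "lang_b": ["kozo", "mezo"],
}
shared = shared_grapheme_set(lexicons)
```

Graphemes that occur in several languages appear once in the shared set, so a single output layer can cover every language in the training mix.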
Composing Diverse Policies for Temporally Extended Tasks
Title | Composing Diverse Policies for Temporally Extended Tasks |
Authors | Daniel Angelov, Yordan Hristov, Michael Burke, Subramanian Ramamoorthy |
Abstract | Robot control policies for temporally extended and sequenced tasks are often characterized by discontinuous switches between different local dynamics. These change-points are often exploited in hierarchical motion planning to build approximate models and to facilitate the design of local, region-specific controllers. However, it becomes combinatorially challenging to implement such a pipeline for complex temporally extended tasks, especially when the sub-controllers work on different information streams, time scales and action spaces. In this paper, we introduce a method that can compose diverse policies comprising motion planning trajectories, dynamic motion primitives and neural network controllers. We introduce a global goal scoring estimator that uses local, per-motion primitive dynamics models and corresponding activation state-space sets to sequence diverse policies in a locally optimal fashion. We use expert demonstrations to convert what is typically viewed as a gradient-based learning process into a planning process without explicitly specifying pre- and post-conditions. We first illustrate the proposed framework using an MDP benchmark to showcase robustness to action and model dynamics mismatch, and then with a particularly complex physical gear assembly task, solved on a PR2 robot. We show that the proposed approach successfully discovers the optimal sequence of controllers and solves both tasks efficiently. |
Tasks | Hierarchical Reinforcement Learning, Motion Planning |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.08199v3 (PDF: https://arxiv.org/pdf/1907.08199v3.pdf) |
PWC | https://paperswithcode.com/paper/composing-diverse-policies-for-temporally |
Repo | |
Framework | |
An Introduction to Variational Autoencoders
Title | An Introduction to Variational Autoencoders |
Authors | Diederik P. Kingma, Max Welling |
Abstract | Variational autoencoders provide a principled framework for learning deep latent-variable models and corresponding inference models. In this work, we provide an introduction to variational autoencoders and some important extensions. |
Tasks | Latent Variable Models |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02691v3 (PDF: https://arxiv.org/pdf/1906.02691v3.pdf) |
PWC | https://paperswithcode.com/paper/an-introduction-to-variational-autoencoders |
Repo | |
Framework | |
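For orientation, the objective usually associated with variational autoencoders is the evidence lower bound (ELBO). The statement below uses one common notation, which is an assumption here since the abstract gives no formulas: $p_\theta$ is the generative model, $q_\phi$ the inference model, $x$ an observation, and $z$ the latent variable.

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}_{\theta,\phi}(x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \right]
  = \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
    - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p_\theta(z) \right)
```

The first term rewards accurate reconstruction of $x$ from $z$; the KL term keeps the approximate posterior close to the prior.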
A Joint Model for Multimodal Document Quality Assessment
Title | A Joint Model for Multimodal Document Quality Assessment |
Authors | Aili Shen, Bahar Salehi, Timothy Baldwin, Jianzhong Qi |
Abstract | The quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one. In this paper, we explore this task in the context of assessing the quality of Wikipedia articles and academic papers. Observing that the visual rendering of a document can capture implicit quality indicators that are not present in the document text — such as images, font choices, and visual layout — we propose a joint model that combines the text content with a visual rendering of the document for document quality assessment. Experimental results over two datasets reveal that textual and visual features are complementary, achieving state-of-the-art results. |
Tasks | |
Published | 2019-01-04 |
URL | http://arxiv.org/abs/1901.01010v2 (PDF: http://arxiv.org/pdf/1901.01010v2.pdf) |
PWC | https://paperswithcode.com/paper/a-joint-model-for-multimodal-document-quality |
Repo | |
Framework | |
A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM
Title | A binary-activation, multi-level weight RNN and training algorithm for processing-in-memory inference with eNVM |
Authors | Siming Ma, David Brooks, Gu-Yeon Wei |
Abstract | We present a new algorithm for training neural networks with binary activations and multi-level weights, which enables efficient processing-in-memory circuits with eNVM. Binary activations obviate costly DACs and ADCs. Multi-level weights leverage multi-level eNVM cells. Compared with previous quantization algorithms, our method not only works for feed-forward networks including fully-connected and convolutional, but also achieves higher accuracy and noise resilience for recurrent networks. In particular, we present a RNN trigger-word detection PIM accelerator, whose modeling results demonstrate high performance using our new training algorithm. |
Tasks | Quantization |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00106v2 (PDF: https://arxiv.org/pdf/1912.00106v2.pdf) |
PWC | https://paperswithcode.com/paper/a-binary-activation-multi-level-weight-rnn |
Repo | |
Framework | |
TACAM: Topic And Context Aware Argument Mining
Title | TACAM: Topic And Context Aware Argument Mining |
Authors | Michael Fromm, Evgeniy Faerman, Thomas Seidl |
Abstract | In this work we address the problem of argument search. The purpose of argument search is the distillation of pro and contra arguments for requested topics from large text corpora. In previous works, the usual approach is to use a standard search engine to extract text parts which are relevant to the given topic and subsequently use an argument recognition algorithm to select arguments from them. The main challenge in the argument recognition task, which is also known as argument mining, is that often sentences containing arguments are structurally similar to purely informative sentences without any stance about the topic. In fact, they only differ semantically. Most approaches use topic or search term information only for the first search step and therefore assume that arguments can be classified independently of a topic. We argue that topic information is crucial for argument mining, since the topic defines the semantic context of an argument. Precisely, we propose different models for the classification of arguments, which take information about a topic of an argument into account. Moreover, to enrich the context of a topic and to let models understand the context of the potential argument better, we integrate information from different external sources such as Knowledge Graphs or pre-trained NLP models. Our evaluation shows that considering topic information, especially in connection with external information, provides a significant performance boost for the argument mining task. |
Tasks | Argument Mining, Knowledge Graphs |
Published | 2019-05-26 |
URL | https://arxiv.org/abs/1906.00923v2 (PDF: https://arxiv.org/pdf/1906.00923v2.pdf) |
PWC | https://paperswithcode.com/paper/190600923 |
Repo | |
Framework | |
Learning-Based Animation of Clothing for Virtual Try-On
Title | Learning-Based Animation of Clothing for Virtual Try-On |
Authors | Igor Santesteban, Miguel A. Otaduy, Dan Casas |
Abstract | This paper presents a learning-based clothing animation method for highly efficient virtual try-on simulation. Given a garment, we preprocess a rich database of physically-based dressed character simulations, for multiple body shapes and animations. Then, using this database, we train a learning-based model of cloth drape and wrinkles, as a function of body shape and dynamics. We propose a model that separates global garment fit, due to body shape, from local garment wrinkles, due to both pose dynamics and body shape. We use a recurrent neural network to regress garment wrinkles, and we achieve highly plausible nonlinear effects, in contrast to the blending artifacts suffered by previous methods. At runtime, dynamic virtual try-on animations are produced in just a few milliseconds for garments with thousands of triangles. We show qualitative and quantitative analysis of results. |
Tasks | |
Published | 2019-03-17 |
URL | http://arxiv.org/abs/1903.07190v1 (PDF: http://arxiv.org/pdf/1903.07190v1.pdf) |
PWC | https://paperswithcode.com/paper/learning-based-animation-of-clothing-for |
Repo | |
Framework | |
ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation
Title | ZuCo 2.0: A Dataset of Physiological Recordings During Natural Reading and Annotation |
Authors | Nora Hollenstein, Marius Troendle, Ce Zhang, Nicolas Langer |
Abstract | We recorded and preprocessed ZuCo 2.0, a new dataset of simultaneous eye-tracking and electroencephalography during natural reading and during annotation. This corpus contains gaze and brain activity data of 739 sentences, 349 in a normal reading paradigm and 390 in a task-specific paradigm, in which the 18 participants actively search for a semantic relation type in the given sentences as a linguistic annotation task. This new dataset complements ZuCo 1.0 by providing experiments designed to analyze the differences in cognitive processing between natural reading and annotation. The data is freely available here: https://osf.io/2urht/. |
Tasks | Eye Tracking |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00903v3 (PDF: https://arxiv.org/pdf/1912.00903v3.pdf) |
PWC | https://paperswithcode.com/paper/zuco-20-a-dataset-of-physiological-recordings |
Repo | |
Framework | |
SID4VAM: A Benchmark Dataset with Synthetic Images for Visual Attention Modeling
Title | SID4VAM: A Benchmark Dataset with Synthetic Images for Visual Attention Modeling |
Authors | David Berga, Xosé R. Fdez-Vidal, Xavier Otazu, Xosé M. Pardo |
Abstract | A benchmark of saliency model performance on a synthetic image dataset is provided. Model performance is evaluated through saliency metrics as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images with known salient regions. Images were generated with 15 distinct types of low-level features (e.g. orientation, brightness, color, size…) with a target-distractor pop-out type of synthetic patterns. We have used Free-Viewing and Visual Search task instructions and 7 feature contrasts for each feature category. Our study reveals that state-of-the-art Deep Learning saliency models do not perform well with synthetic pattern images; instead, models with Spectral/Fourier inspiration outperform others in saliency metrics and are more consistent with human psychophysical experimentation. This study proposes a new way to evaluate saliency models in the forthcoming literature, accounting for synthetic images with uniquely low-level feature contexts, distinct from previous eye tracking image datasets. |
Tasks | Eye Tracking |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13066v1 (PDF: https://arxiv.org/pdf/1910.13066v1.pdf) |
PWC | https://paperswithcode.com/paper/sid4vam-a-benchmark-dataset-with-synthetic-1 |
Repo | |
Framework | |
ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare
Title | ISTHMUS: Secure, Scalable, Real-time and Robust Machine Learning Platform for Healthcare |
Authors | Akshay Arora, Arun Nethi, Priyanka Kharat, Vency Verghese, Grant Jenkins, Steve Miff, Vikas Chowdhry, Xiao Wang |
Abstract | In recent times, machine learning (ML) and artificial intelligence (AI) based systems have evolved and scaled across different industries such as finance, retail, insurance, energy utilities, etc. Among other things, they have been used to predict patterns of customer behavior, to generate pricing models, and to predict the return on investments. But the successes in deploying machine learning models at scale in those industries have not translated into the healthcare setting. There are multiple reasons why integrating ML models into healthcare has not been widely successful, but from a technical perspective, general-purpose commercial machine learning platforms are not a good fit for healthcare due to complexities in handling data quality issues, mandates to demonstrate clinical relevance, and a lack of ability to monitor performance in a highly regulated environment with stringent security and privacy needs. In this paper, we describe Isthmus, a turnkey, cloud-based platform which addresses the challenges above and reduces time to market for operationalizing ML/AI in healthcare. Towards the end, we describe three case studies which shed light on Isthmus capabilities. These include (1) supporting an end-to-end lifecycle of a model which predicts trauma survivability at hospital trauma centers, (2) bringing in and harmonizing data from disparate sources to create a community data platform for inferring population as well as patient level insights for Social Determinants of Health (SDoH), and (3) ingesting live-streaming data from various IoT sensors to build models, which can leverage real-time and longitudinal information to make advanced time-sensitive predictions. |
Tasks | |
Published | 2019-09-29 |
URL | https://arxiv.org/abs/1909.13343v2 (PDF: https://arxiv.org/pdf/1909.13343v2.pdf) |
PWC | https://paperswithcode.com/paper/isthmus-secure-scalable-real-time-and-robust |
Repo | |
Framework | |
Towards Learning Cross-Modal Perception-Trace Models
Title | Towards Learning Cross-Modal Perception-Trace Models |
Authors | Achim Rettinger, Viktoria Bogdanova, Philipp Niemann |
Abstract | Representation learning is a key element of state-of-the-art deep learning approaches. It enables raw data to be transformed into structured vector space embeddings. Such embeddings are able to capture the distributional semantics of their context, e.g. by word windows on natural language sentences, graph walks on knowledge graphs or convolutions on images. So far, this context is manually defined, resulting in heuristics which are solely optimized for computational performance on certain tasks like link-prediction. However, such heuristic models of context are fundamentally different from how humans capture information. For instance, when reading a multi-modal webpage, (i) humans do not perceive all parts of a document equally: some words and parts of images are skipped, while others are revisited several times, which makes the perception trace highly non-sequential; (ii) humans construct meaning from a document’s content by shifting their attention between text and image, guided by, among other things, layout and design elements. In this paper we empirically investigate the difference between human perception and context heuristics of basic embedding models. We conduct eye tracking experiments to capture the underlying characteristics of human perception of media documents containing a mixture of text and images. Based on that, we devise a prototypical computational perception-trace model, called CMPM. We evaluate empirically how CMPM can improve a basic skip-gram embedding approach. Our results suggest that even with a basic human-inspired computational perception model, there is a huge potential for improving embeddings, since such a model inherently captures multiple modalities as well as layout and design elements. |
Tasks | Eye Tracking, Knowledge Graphs, Link Prediction, Representation Learning |
Published | 2019-10-18 |
URL | https://arxiv.org/abs/1910.08549v1 (PDF: https://arxiv.org/pdf/1910.08549v1.pdf) |
PWC | https://paperswithcode.com/paper/towards-learning-cross-modal-perception-trace |
Repo | |
Framework | |
Wasserstein distances for evaluating cross-lingual embeddings
Title | Wasserstein distances for evaluating cross-lingual embeddings |
Authors | Georgios Balikas, Ioannis Partalas |
Abstract | Word embeddings are high-dimensional vector representations of words that capture their semantic similarity in the vector space. There exist several algorithms for learning such embeddings, both for a single language and for several languages jointly. In this work we propose to evaluate collections of embeddings by adapting downstream natural language tasks to the optimal transport framework. We show how the family of Wasserstein distances can be used to solve the cross-lingual document retrieval and cross-lingual document classification problems. We argue for the advantages of this approach compared to more traditional evaluation methods of embeddings like bilingual lexical induction. Our experimental results suggest that using Wasserstein distances on these problems outperforms several strong baselines and performs on par with state-of-the-art models. |
Tasks | Cross-Lingual Document Classification, Document Classification, Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.11005v2 (PDF: https://arxiv.org/pdf/1910.11005v2.pdf) |
PWC | https://paperswithcode.com/paper/wasserstein-distances-for-evaluating-cross |
Repo | |
Framework | |
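To make the retrieval idea in the abstract above concrete, here is a small sketch that ranks candidate documents by a Wasserstein distance between bags of word embeddings. It is an illustration under strong assumptions and not the authors' implementation: every document has the same number of words, all words carry equal weight, the ground cost is Euclidean, and the 2-D vectors are invented. Under those assumptions an optimal transport plan can be taken to be a one-to-one matching, so brute force over permutations gives the exact 1-Wasserstein distance for tiny inputs.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean ground cost between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wasserstein1_uniform(xs, ys):
    """Exact 1-Wasserstein distance between equal-size point sets with uniform weights.

    Each point carries mass 1/n, so the optimum is attained by a permutation;
    we enumerate all of them, which is only feasible for very small n.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expects two non-empty point sets of equal size")
    n = len(xs)
    best = min(
        sum(euclidean(xs[i], ys[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best / n


def retrieve(query, candidates):
    """Return the index of the candidate document closest to the query."""
    return min(
        range(len(candidates)),
        key=lambda i: wasserstein1_uniform(query, candidates[i]),
    )


# Hypothetical 2-D "cross-lingual embeddings" for three tiny documents.
query = [(0.0, 0.0), (1.0, 0.0)]
doc_a = [(1.0, 0.1), (0.0, 0.1)]  # close to the query, words in a different order
doc_b = [(5.0, 5.0), (6.0, 5.0)]  # far from the query
best_index = retrieve(query, [doc_b, doc_a])
```

Because the distance matches words rather than comparing them position by position, `doc_a` is retrieved even though its words are listed in a different order. Documents of realistic length need a proper optimal transport solver instead of enumeration.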
Hierarchical Transformers for Long Document Classification
Title | Hierarchical Transformers for Long Document Classification |
Authors | Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak |
Abstract | BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations - applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We segment the input into smaller chunks and feed each of them into the base model. Then, we propagate each output through a single recurrent layer, or another transformer, followed by a softmax activation. We obtain the final classification decision after the last segment has been consumed. We show that both BERT extensions are quick to fine-tune and converge after as little as 1 epoch of training on a small, domain-specific data set. We successfully apply them in three different tasks involving customer call satisfaction prediction and topic classification, and obtain a significant improvement over the baseline models in two of them. |
Tasks | Document Classification, Transfer Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10781v1 (PDF: https://arxiv.org/pdf/1910.10781v1.pdf) |
PWC | https://paperswithcode.com/paper/hierarchical-transformers-for-long-document |
Repo | |
Framework | |
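The pipeline in the abstract above (segment the input, encode each chunk, pass the chunk outputs through a recurrent layer, apply a softmax after the last segment) can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the chunk size and stride, the `toy_encode` stand-in for the fine-tuned base model, and the hand-set weights. A real system would use BERT outputs and learned parameters.

```python
import math


def split_into_chunks(tokens, size, stride):
    """Split a token list into windows of at most `size` tokens, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
        start += stride
    return chunks


def toy_encode(chunk):
    """Hypothetical stand-in for the base model: maps a chunk to a 2-d feature vector."""
    n = max(len(chunk), 1)
    return [sum(chunk) / n, len(chunk) / 10.0]


def recurrent_classify(chunk_vectors, w_in, w_rec, w_out):
    """Run one tanh recurrent layer over the chunk vectors, then softmax the final state."""
    hidden = [0.0] * len(w_rec)
    for x in chunk_vectors:
        hidden = [
            math.tanh(
                sum(w_in[j][k] * x[k] for k in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            )
            for j in range(len(hidden))
        ]
    logits = [
        sum(w_out[c][j] * hidden[j] for j in range(len(hidden)))
        for c in range(len(w_out))
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical token ids and hand-set weights, for illustration only.
tokens = list(range(10))
chunks = split_into_chunks(tokens, size=4, stride=3)
w_in = [[0.1, 0.2], [-0.1, 0.3]]
w_rec = [[0.0, 0.1], [0.1, 0.0]]
w_out = [[1.0, -1.0], [-1.0, 1.0]]
probs = recurrent_classify([toy_encode(c) for c in chunks], w_in, w_rec, w_out)
```

The stride is smaller than the chunk size so that consecutive chunks overlap by one token; whether and how much to overlap is a design choice the abstract does not specify.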
Adaptive Compression-based Lifelong Learning
Title | Adaptive Compression-based Lifelong Learning |
Authors | Shivangi Srivastava, Maxim Berman, Matthew B. Blaschko, Devis Tuia |
Abstract | The problem of a deep learning model losing performance on a previously learned task when fine-tuned to a new one is a phenomenon known as Catastrophic forgetting. There are two major ways to mitigate this problem: either preserving activations of the initial network during training with a new task; or restricting the new network activations to remain close to the initial ones. The latter approach falls under the denomination of lifelong learning, where the model is updated in a way that it performs well on both old and new tasks, without having access to the old task’s training samples anymore. Recently, approaches like pruning networks for freeing network capacity during sequential learning of tasks have been gaining in popularity. Such approaches allow learning small networks while making redundant parameters available for the next tasks. The common problem encountered with these approaches is that the pruning percentage is hard-coded, irrespective of the number of samples, of the complexity of the learning task and of the number of classes in the dataset. We propose a method based on Bayesian optimization to perform adaptive compression/pruning of the network and show its effectiveness in lifelong learning. Our method learns to perform heavy pruning for small and/or simple datasets while using milder compression rates for large and/or complex data. Experiments on classification and semantic segmentation demonstrate the applicability of learning network compression, where we are able to effectively preserve performances along sequences of tasks of varying complexity. |
Tasks | Semantic Segmentation |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09695v1 (PDF: https://arxiv.org/pdf/1907.09695v1.pdf) |
PWC | https://paperswithcode.com/paper/adaptive-compression-based-lifelong-learning |
Repo | |
Framework | |
Comparing Knowledge-based Reinforcement Learning to Neural Networks in a Strategy Game
Title | Comparing Knowledge-based Reinforcement Learning to Neural Networks in a Strategy Game |
Authors | Liudmyla Nechepurenko, Viktor Voss, Vyacheslav Gritsenko |
Abstract | The paper reports on an experiment, in which a Knowledge-Based Reinforcement Learning (KB-RL) method was compared to a Neural Network (NN) approach in solving a classical Artificial Intelligence (AI) task. In contrast to NNs, which require a substantial amount of data to learn a good policy, the KB-RL method seeks to encode human knowledge into the solution, considerably reducing the amount of data needed for a good policy. By means of Reinforcement Learning (RL), KB-RL learns to optimize the model and improves the output of the system. Furthermore, KB-RL offers the advantage of a clear explanation of the taken decisions as well as transparent reasoning behind the solution. The goal of the reported experiment was to examine the performance of the KB-RL method in contrast to the Neural Network and to explore the capabilities of KB-RL to deliver a strong solution for the AI tasks. The results show that, within the designed settings, KB-RL outperformed the NN, and was able to learn a better policy from the available amount of data. These results support the opinion that Artificial Intelligence can benefit from the discovery and study of alternative approaches, potentially extending the frontiers of AI. |
Tasks | Game of Go |
Published | 2019-01-15 |
URL | https://arxiv.org/abs/1901.04626v2 (PDF: https://arxiv.org/pdf/1901.04626v2.pdf) |
PWC | https://paperswithcode.com/paper/comparing-knowledge-based-reinforcement |
Repo | |
Framework | |