October 17, 2019


Paper Group ANR 841

End-to-End Speech-Driven Facial Animation with Temporal GANs. Intensity and Rescale Invariant Copy Move Forgery Detection Techniques. A Factoid Question Answering System for Vietnamese. Streaming End-to-end Speech Recognition For Mobile Devices. A Dialogue Annotation Scheme for Weight Management Chat using the Trans-Theoretical Model of Health Beha …

End-to-End Speech-Driven Facial Animation with Temporal GANs


Title	End-to-End Speech-Driven Facial Animation with Temporal GANs
Authors	Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
Abstract	Speech-driven facial animation is the process which uses speech signals to automatically synthesize a talking character. The majority of work in this domain creates a mapping from audio features to visual features. This often requires post-processing using computer graphics techniques to produce realistic albeit subject dependent results. We present a system for generating videos of a talking head, using a still image of a person and an audio clip containing speech, that doesn’t rely on any handcrafted intermediate features. To the best of our knowledge, this is the first method capable of generating subject independent realistic videos directly from raw audio. Our method can generate videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. We achieve this by using a temporal GAN with 2 discriminators, which are capable of capturing different aspects of the video. The effect of each component in our system is quantified through an ablation study. The generated videos are evaluated based on their sharpness, reconstruction quality, and lip-reading accuracy. Finally, a user study is conducted, confirming that temporal GANs lead to more natural sequences than a static GAN-based approach.
Tasks
Published	2018-05-23
URL	http://arxiv.org/abs/1805.09313v4
PDF	http://arxiv.org/pdf/1805.09313v4.pdf
PWC	https://paperswithcode.com/paper/end-to-end-speech-driven-facial-animation
Repo
Framework

Intensity and Rescale Invariant Copy Move Forgery Detection Techniques


Title	Intensity and Rescale Invariant Copy Move Forgery Detection Techniques
Authors	Tejas K, Swathi C, Rajesh Kumar M
Abstract	In the contemporary world, digital media such as videos and images act as an active medium for carrying valuable information across the globe on all fronts. However, several techniques have evolved for tampering with images, which has made their authenticity untrustworthy. Copy-Move Forgery (CMF) is one of the most common forgeries, in which a cluster of pixels is duplicated within the same image, potentially with post-processing. Various state-of-the-art techniques developed in recent years are effective in detecting passive image forgery. However, most methods fail when the copied region is rescaled or has some intensity added before being pasted, due to de-synchronization of pixels in the searching process. To tackle this problem, the paper proposes distinct novel algorithms that use Hu's invariant moments and the Discrete Cosine Transform (DCT) to attain rescale-invariant and intensity-invariant CMF detection, respectively. The experiments conducted quantitatively and qualitatively demonstrate the effectiveness of the algorithm.
Tasks
Published	2018-09-11
URL	http://arxiv.org/abs/1809.04154v1
PDF	http://arxiv.org/pdf/1809.04154v1.pdf
PWC	https://paperswithcode.com/paper/intensity-and-rescale-invariant-copy-move
Repo
Framework
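
As a rough illustration of why Hu's invariant moments suit rescale-invariant matching, the sketch below computes the first Hu moment of a small synthetic patch and of an upscaled copy. This is a minimal NumPy sketch, not the paper's algorithm: the function name `first_hu_moment`, the rectangular test patch, and the pixel-repetition upscaling are all assumptions made for illustration.

```python
import numpy as np


def first_hu_moment(patch: np.ndarray) -> float:
    """Return the first Hu moment, phi1 = eta20 + eta02, of a 2-D patch."""
    patch = patch.astype(float)
    ys, xs = np.mgrid[: patch.shape[0], : patch.shape[1]]
    m00 = patch.sum()
    # Intensity-weighted centroid of the patch.
    x_bar = (xs * patch).sum() / m00
    y_bar = (ys * patch).sum() / m00
    # Second-order central moments.
    mu20 = (((xs - x_bar) ** 2) * patch).sum()
    mu02 = (((ys - y_bar) ** 2) * patch).sum()
    # Normalised central moments divide by m00 ** (1 + (p + q) / 2),
    # which is m00 ** 2 for second-order moments.
    return (mu20 + mu02) / m00 ** 2


# Hypothetical test data: a bright rectangle on a dark background.
patch = np.zeros((32, 32))
patch[8:18, 6:26] = 1.0

# Upscale by a factor of 2 through pixel repetition.
upscaled = np.kron(patch, np.ones((2, 2)))

phi_small = first_hu_moment(patch)
phi_large = first_hu_moment(upscaled)
print(phi_small, phi_large)
```

The two printed values agree up to a small discretisation error, which is the property a rescale-invariant block descriptor relies on. The remaining six Hu moments are built from the same normalised central moments.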

A Factoid Question Answering System for Vietnamese


Title	A Factoid Question Answering System for Vietnamese
Authors	Phuong Le-Hong, Duc-Thien Bui
Abstract	In this paper, we describe the development of an end-to-end factoid question answering system for the Vietnamese language. This system combines both statistical models and ontology-based methods in a chain of processing modules to provide high-quality mappings from natural language text to entities. We present the challenges in the development of such an intelligent user interface for an isolating language like Vietnamese and show that techniques developed for inflectional languages cannot be applied “as is”. Our question answering system can answer a wide range of general knowledge questions with promising accuracy on a test set.
Tasks	Question Answering
Published	2018-03-02
URL	http://arxiv.org/abs/1803.00712v3
PDF	http://arxiv.org/pdf/1803.00712v3.pdf
PWC	https://paperswithcode.com/paper/a-factoid-question-answering-system-for
Repo
Framework

Streaming End-to-end Speech Recognition For Mobile Devices


Title	Streaming End-to-end Speech Recognition For Mobile Devices
Authors	Yanzhang He, Tara N. Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, Qiao Liang, Deepti Bhatia, Yuan Shangguan, Bo Li, Golan Pundak, Khe Chai Sim, Tom Bagby, Shuo-yiin Chang, Kanishka Rao, Alexander Gruenstein
Abstract	End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decode speech utterances in a streaming fashion, in real time; they must be robust to the long tail of use cases; they must be able to leverage user-specific context (e.g., contact lists); and above all, they must be extremely accurate. In this work, we describe our efforts at building an E2E speech recognizer using a recurrent neural network transducer. In experimental evaluations, we find that the proposed approach can outperform a conventional CTC-based model in terms of both latency and accuracy in a number of evaluation categories.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2018-11-15
URL	http://arxiv.org/abs/1811.06621v1
PDF	http://arxiv.org/pdf/1811.06621v1.pdf
PWC	https://paperswithcode.com/paper/streaming-end-to-end-speech-recognition-for
Repo
Framework

A Dialogue Annotation Scheme for Weight Management Chat using the Trans-Theoretical Model of Health Behavior Change


Title	A Dialogue Annotation Scheme for Weight Management Chat using the Trans-Theoretical Model of Health Behavior Change
Authors	Ramesh Manuvinakurike, Sumanth Bharadwaj, Kallirroi Georgila
Abstract	In this study we collect and annotate human-human role-play dialogues in the domain of weight management. There are two roles in the conversation: the “seeker” who is looking for ways to lose weight and the “helper” who provides suggestions to help the “seeker” in their weight loss journey. The chat dialogues collected are then annotated with a novel annotation scheme inspired by a popular health behavior change theory called “trans-theoretical model of health behavior change”. We also build classifiers to automatically predict the annotation labels used in our corpus. We find that classification accuracy improves when oracle segmentations of the interlocutors’ sentences are provided compared to directly classifying unsegmented sentences.
Tasks
Published	2018-07-11
URL	http://arxiv.org/abs/1807.03948v1
PDF	http://arxiv.org/pdf/1807.03948v1.pdf
PWC	https://paperswithcode.com/paper/a-dialogue-annotation-scheme-for-weight-1
Repo
Framework

Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System


Title	Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System
Authors	Judith Gaspers, Penny Karanasou, Rajen Chatterjee
Abstract	This paper investigates the use of Machine Translation (MT) to bootstrap a Natural Language Understanding (NLU) system for a new language for the use case of a large-scale voice-controlled device. The goal is to decrease the cost and time needed to get an annotated corpus for the new language, while still having a large enough coverage of user requests. Different methods of filtering MT data in order to keep utterances that improve NLU performance, as well as language-specific post-processing methods, are investigated. These methods are tested in a large-scale NLU task by translating around 10 million training utterances from English to German. The results show a large improvement from using MT data over both a grammar-based baseline and an in-house data collection baseline, while greatly reducing the manual effort. Both filtering and post-processing approaches improve results further.
Tasks	Machine Translation
Published	2018-05-23
URL	http://arxiv.org/abs/1805.09119v1
PDF	http://arxiv.org/pdf/1805.09119v1.pdf
PWC	https://paperswithcode.com/paper/selecting-machine-translated-data-for-quick
Repo
Framework

Dimension-free Information Concentration via Exp-Concavity


Title	Dimension-free Information Concentration via Exp-Concavity
Authors	Ya-Ping Hsieh, Volkan Cevher
Abstract	Information concentration of probability measures has important implications in learning theory. Recently, it was discovered that the information content of a log-concave distribution concentrates around its differential entropy, albeit with an unpleasant dependence on the ambient dimension. In this work, we prove that if the potentials of the log-concave distribution are exp-concave, which is a central notion for fast rates in online and statistical learning, then the concentration of information can be further improved to depend only on the exp-concavity parameter, and hence, it can be dimension independent. Central to our proof is a novel yet simple application of the variance Brascamp-Lieb inequality. In the context of learning theory, our concentration-of-information result immediately yields high-probability counterparts of many previous bounds that hold only in expectation.
Tasks
Published	2018-02-26
URL	http://arxiv.org/abs/1802.09301v1
PDF	http://arxiv.org/pdf/1802.09301v1.pdf
PWC	https://paperswithcode.com/paper/dimension-free-information-concentration-via
Repo
Framework

Context-encoding Variational Autoencoder for Unsupervised Anomaly Detection


Title	Context-encoding Variational Autoencoder for Unsupervised Anomaly Detection
Authors	David Zimmerer, Simon A. A. Kohl, Jens Petersen, Fabian Isensee, Klaus H. Maier-Hein
Abstract	Unsupervised learning can leverage large-scale data sources without the need for annotations. In this context, deep learning-based auto encoders have shown great potential in detecting anomalies in medical images. However, state-of-the-art anomaly scores are still based on the reconstruction error, which lacks in two essential parts: it ignores the model-internal representation employed for reconstruction, and it lacks formal assertions and comparability between samples. We address these shortcomings by proposing the Context-encoding Variational Autoencoder (ceVAE) which combines reconstruction- with density-based anomaly scoring. This improves the sample- as well as pixel-wise results. In our experiments on the BraTS-2017 and ISLES-2015 segmentation benchmarks, the ceVAE achieves unsupervised ROC-AUCs of 0.95 and 0.89, respectively, thus outperforming state-of-the-art methods by a considerable margin.
Tasks	Anomaly Detection, Unsupervised Anomaly Detection
Published	2018-12-14
URL	http://arxiv.org/abs/1812.05941v1
PDF	http://arxiv.org/pdf/1812.05941v1.pdf
PWC	https://paperswithcode.com/paper/context-encoding-variational-autoencoder-for
Repo
Framework
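
To make the idea of combining reconstruction-based and density-based anomaly scoring more concrete, here is a minimal NumPy sketch. It is not the ceVAE scoring rule from the paper. It assumes a VAE with a diagonal Gaussian posterior and a standard normal prior, and it adds the squared reconstruction error to the KL divergence term. The function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var) -> float:
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I)."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return float(0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var))


def anomaly_score(x, x_rec, mu, log_var, weight: float = 1.0) -> float:
    """Squared reconstruction error plus a weighted KL (density) term.

    `weight` is an illustrative knob for balancing the two terms; it is
    an assumption of this sketch, not a parameter taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    reconstruction_error = float(np.sum((x - x_rec) ** 2))
    return reconstruction_error + weight * kl_to_standard_normal(mu, log_var)
```

A sample that is reconstructed perfectly but whose latent code lies far from the prior still receives a non-zero score under this sketch, which a reconstruction-only score would miss.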

Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence


Title	Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence
Authors	Yi Xu, Qi Qi, Qihang Lin, Rong Jin, Tianbao Yang
Abstract	Difference of convex (DC) functions cover a broad family of non-convex and possibly non-smooth and non-differentiable functions, and have wide applications in machine learning and statistics. Although deterministic algorithms for DC functions have been extensively studied, stochastic optimization that is more suitable for learning with big data remains under-explored. In this paper, we propose new stochastic optimization algorithms and study their first-order convergence theories for solving a broad family of DC functions. We improve the existing algorithms and theories of stochastic optimization for DC functions from both practical and theoretical perspectives. On the practical side, our algorithm is more user-friendly without requiring a large mini-batch size and more efficient by saving unnecessary computations. On the theoretical side, our convergence analysis does not necessarily require the involved functions to be smooth with Lipschitz continuous gradient. Instead, the convergence rate of the proposed stochastic algorithm is automatically adaptive to the Hölder continuity of the gradient of one component function. Moreover, we extend the proposed stochastic algorithms for DC functions to solve problems with a general non-convex non-differentiable regularizer, which does not necessarily have a DC decomposition but enjoys an efficient proximal mapping. To the best of our knowledge, this is the first work that gives the first non-asymptotic convergence for solving non-convex optimization whose objective has a general non-convex non-differentiable regularizer.
Tasks	Stochastic Optimization
Published	2018-11-28
URL	http://arxiv.org/abs/1811.11829v2
PDF	http://arxiv.org/pdf/1811.11829v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-optimization-for-dc-functions-and
Repo
Framework

Class Representative Autoencoder for Low Resolution Multi-Spectral Gender Classification


Title	Class Representative Autoencoder for Low Resolution Multi-Spectral Gender Classification
Authors	Maneet Singh, Shruti Nagpal, Richa Singh, Mayank Vatsa
Abstract	Gender is one of the most common attributes used to describe an individual. It is used in multiple domains such as human computer interaction, marketing, security, and demographic reports. Research has been performed to automate the task of gender recognition in constrained environments using face images; however, limited attention has been given to gender classification in unconstrained scenarios. This work attempts to address the challenging problem of gender classification in multi-spectral low resolution face images. We propose a robust Class Representative Autoencoder model, termed AutoGen, for this task. The proposed model aims to minimize the intra-class variations while maximizing the inter-class variations for the learned feature representations. Results on visible as well as near infrared spectrum data for different resolutions and multiple databases depict the efficacy of the proposed model. Comparative results with existing approaches and two commercial off-the-shelf systems further motivate the use of class representative features for classification.
Tasks
Published	2018-05-21
URL	http://arxiv.org/abs/1805.07905v1
PDF	http://arxiv.org/pdf/1805.07905v1.pdf
PWC	https://paperswithcode.com/paper/class-representative-autoencoder-for-low
Repo
Framework

SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning


Title	SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Authors	Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson
Abstract	Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approach and helps the RL practitioners to understand the high-level behavior of the system better. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. The task-level interpretability is enabled by relating symbolic actions to options. This framework features a planner – controller – meta-controller architecture, which takes charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input. Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches.
Tasks	Decision Making
Published	2018-10-31
URL	http://arxiv.org/abs/1811.00090v4
PDF	http://arxiv.org/pdf/1811.00090v4.pdf
PWC	https://paperswithcode.com/paper/sdrl-interpretable-and-data-efficient-deep
Repo
Framework

Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions


Title	Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions
Authors	Tim Sainburg, Marvin Thielk, Brad Theilman, Benjamin Migliori, Timothy Gentner
Abstract	We present a neural network architecture based upon the Autoencoder (AE) and Generative Adversarial Network (GAN) that promotes a convex latent distribution by training adversarially on latent space interpolations. By using an AE as both the generator and discriminator of a GAN, we pass a pixel-wise error function across the discriminator, yielding an AE which produces non-blurry samples that match both high- and low-level features of the original images. Interpolations between images in this space remain within the latent-space distribution of real images as trained by the discriminator, and therefore preserve realistic resemblances to the network inputs. Code available at https://github.com/timsainb/GAIA
Tasks
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06650v3
PDF	http://arxiv.org/pdf/1807.06650v3.pdf
PWC	https://paperswithcode.com/paper/generative-adversarial-interpolative
Repo
Framework
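
The latent space interpolations mentioned in the abstract above can be pictured as convex combinations of two latent codes. The sketch below shows only that interpolation step, not the adversarial training or the GAIA architecture. The function name `interpolate_latents` and the example vectors are assumptions for illustration.

```python
import numpy as np


def interpolate_latents(z_a: np.ndarray, z_b: np.ndarray, steps: int = 5) -> np.ndarray:
    """Return `steps` points on the straight line from z_a to z_b.

    Each row is the convex combination (1 - alpha) * z_a + alpha * z_b,
    with alpha evenly spaced in [0, 1].
    """
    alphas = np.linspace(0.0, 1.0, steps)
    return np.array([(1.0 - alpha) * z_a + alpha * z_b for alpha in alphas])


# Hypothetical latent codes for two images.
z_a = np.array([0.0, 1.0, -2.0])
z_b = np.array([4.0, 1.0, 2.0])
path = interpolate_latents(z_a, z_b, steps=5)
```

In an autoencoder setting, each row of `path` would be passed through the decoder. A convex latent distribution means that those intermediate points still decode to realistic samples.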

Deep RNNs Encode Soft Hierarchical Syntax


Title	Deep RNNs Encode Soft Hierarchical Syntax
Authors	Terra Blevins, Omer Levy, Luke Zettlemoyer
Abstract	We present a set of experiments to demonstrate that deep recurrent neural networks (RNNs) learn internal representations that capture soft hierarchical notions of syntax from highly varied supervision. We consider four syntax tasks at different depths of the parse tree; for each word, we predict its part of speech as well as the first (parent), second (grandparent) and third level (great-grandparent) constituent labels that appear above it. These predictions are made from representations produced at different depths in networks that are pretrained with one of four objectives: dependency parsing, semantic role labeling, machine translation, or language modeling. In every case, we find a correspondence between network depth and syntactic depth, suggesting that a soft syntactic hierarchy emerges. This effect is robust across all conditions, indicating that the models encode significant amounts of syntax even in the absence of an explicit syntactic training supervision.
Tasks	Dependency Parsing, Language Modelling, Machine Translation, Semantic Role Labeling
Published	2018-05-11
URL	http://arxiv.org/abs/1805.04218v1
PDF	http://arxiv.org/pdf/1805.04218v1.pdf
PWC	https://paperswithcode.com/paper/deep-rnns-encode-soft-hierarchical-syntax
Repo
Framework

Robust Artificial Intelligence and Robust Human Organizations


Title	Robust Artificial Intelligence and Robust Human Organizations
Authors	Thomas G. Dietterich
Abstract	Every AI system is deployed by a human organization. In high risk applications, the combined human plus AI system must function as a high-reliability organization in order to avoid catastrophic errors. This short note reviews the properties of high-reliability organizations and draws implications for the development of AI technology and the safe application of that technology.
Tasks
Published	2018-11-27
URL	http://arxiv.org/abs/1811.10840v1
PDF	http://arxiv.org/pdf/1811.10840v1.pdf
PWC	https://paperswithcode.com/paper/robust-artificial-intelligence-and-robust
Repo
Framework

Long Short-Term Memory Networks for CSI300 Volatility Prediction with Baidu Search Volume


Title	Long Short-Term Memory Networks for CSI300 Volatility Prediction with Baidu Search Volume
Authors	Yu-Long Zhou, Ren-Jie Han, Qian Xu, Wei-Ke Zhang
Abstract	Intense volatility in financial markets affects people worldwide. Therefore, relatively accurate prediction of volatility is critical. We suggest that massive data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. First, we select 28 finance-related keywords as indicators of public mood and macroeconomic factors. Then the daily search volumes of those 28 keywords, based on the Baidu index, are collected manually from June 1, 2006 to October 29, 2017. We apply a Long Short-Term Memory neural network to forecast CSI300 volatility using those search volume data. Compared to the benchmark GARCH model, our forecast is more accurate, which demonstrates the effectiveness of the LSTM neural network in volatility forecasting.
Tasks
Published	2018-05-29
URL	http://arxiv.org/abs/1805.11954v1
PDF	http://arxiv.org/pdf/1805.11954v1.pdf
PWC	https://paperswithcode.com/paper/long-short-term-memory-networks-for-csi300
Repo
Framework
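
For readers unfamiliar with the GARCH benchmark named in the abstract above, the following sketch shows the conditional-variance recursion of a GARCH(1,1) model. The abstract does not state the model order, so GARCH(1,1) is an assumption here. The parameters are taken as given rather than estimated, and the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def garch11_variances(
    returns: Sequence[float],
    omega: float,
    alpha: float,
    beta: float,
    initial_variance: float,
) -> List[float]:
    """Conditional variances under GARCH(1,1) with fixed parameters.

    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1],
    with sigma2[0] set to `initial_variance`.
    """
    variances = [initial_variance]
    for previous_return in returns[:-1]:
        variances.append(
            omega + alpha * previous_return ** 2 + beta * variances[-1]
        )
    return variances


# Hypothetical returns and parameters, chosen only to exercise the recursion.
sample_returns = [1.0, -1.0, 1.0, -1.0]
sample_variances = garch11_variances(
    sample_returns, omega=0.1, alpha=0.1, beta=0.8, initial_variance=1.0
)
```

In practice the parameters would be fitted by maximum likelihood. The recursion itself is what produces the one-step-ahead volatility forecast that an LSTM forecast would be compared against.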