January 30, 2020

Paper Group ANR 424

LMVP: Video Predictor with Leaked Motion Information

Title LMVP: Video Predictor with Leaked Motion Information
Authors Dong Wang, Yitong Li, Wei Cao, Liqun Chen, Qi Wei, Lawrence Carin
Abstract We propose a Leaked Motion Video Predictor (LMVP) to predict future frames by capturing the spatial and temporal dependencies from given inputs. The motion is modeled by a newly proposed component, motion guider, which plays the role of both learner and teacher. Specifically, it learns the temporal features from real data and guides the generator to predict future frames. The spatial consistency in video is modeled by an adaptive filtering network. To further ensure the spatio-temporal consistency of the prediction, a discriminator is also adopted to distinguish the real and generated frames. Further, the discriminator leaks information to the motion guider and the generator to help the learning of motion. The proposed LMVP can effectively learn the static and temporal features in videos without the need for human labeling. Experiments on synthetic and real data demonstrate that LMVP can yield state-of-the-art results.
Tasks
Published 2019-06-24
URL https://arxiv.org/abs/1906.10101v1
PDF https://arxiv.org/pdf/1906.10101v1.pdf
PWC https://paperswithcode.com/paper/lmvp-video-predictor-with-leaked-motion
Repo
Framework

A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning

Title A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning
Authors Gonçalo M. Correia, André F. T. Martins
Abstract Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits. APE systems are usually trained by complementing human post-edited data with large, artificial data generated through back-translations, a time-consuming process often no easier than training an MT system from scratch. In this paper, we propose an alternative where we fine-tune pre-trained BERT models on both the encoder and decoder of an APE system, exploring several parameter sharing strategies. By only training on a dataset of 23K sentences for 3 hours on a single GPU, we obtain results that are competitive with systems that were trained on 5M artificial sentences. When we add this artificial data, our method obtains state-of-the-art results.
Tasks Automatic Post-Editing, Machine Translation, Transfer Learning
Published 2019-06-14
URL https://arxiv.org/abs/1906.06253v1
PDF https://arxiv.org/pdf/1906.06253v1.pdf
PWC https://paperswithcode.com/paper/a-simple-and-effective-approach-to-automatic
Repo
Framework

A concrete example of inclusive design: deaf-oriented accessibility

Title A concrete example of inclusive design: deaf-oriented accessibility
Authors Claudia Bianchini, Fabrizio Borgia, Maria de Marsico
Abstract One of the continuing challenges of Human Computer Interaction research is the full inclusion of people with special needs into the digital world. In particular, this crucial category includes people who experience some kind of limitation in exploiting traditional information communication channels. One immediately thinks about blind people, and several research efforts aim at addressing their needs. On the contrary, limitations suffered by deaf people are often underestimated. This is often the result of a kind of ignorance or misunderstanding of the real nature of their communication difficulties. This chapter aims both at increasing the awareness of deaf problems in the digital world and at proposing the project of a comprehensive solution for their better inclusion. As for the former goal, we will provide a bird’s-eye presentation of the history and evolution of the understanding of deafness issues, and of strategies to address them. As for the latter, we will present the design, implementation and evaluation of the first nucleus of a comprehensive digital framework to facilitate the access of deaf people into the digital world.
Tasks
Published 2019-11-27
URL https://arxiv.org/abs/1911.13207v1
PDF https://arxiv.org/pdf/1911.13207v1.pdf
PWC https://paperswithcode.com/paper/a-concrete-example-of-inclusive-design-deaf
Repo
Framework

Domain-Relevant Embeddings for Medical Question Similarity

Title Domain-Relevant Embeddings for Medical Question Similarity
Authors Clara McCreery, Namit Katariya, Anitha Kannan, Manish Chablani, Xavier Amatriain
Abstract The rate at which medical questions are asked online far exceeds the capacity of qualified people to answer them, and many of these questions are not unique. Identifying same-question pairs could enable questions to be answered more effectively. While many research efforts have focused on the problem of general question similarity for non-medical applications, these approaches do not generalize well to the medical domain, where medical expertise is often required to determine semantic similarity. In this paper, we show how a semi-supervised approach of pre-training a neural network on medical question-answer pairs is a particularly useful intermediate task for the ultimate goal of determining medical question similarity. While other pre-training tasks yield an accuracy below 78.7% on this task, our model achieves an accuracy of 82.6% with the same number of training examples, and an accuracy of 80.0% with a much smaller training set.
Tasks Question Answering, Question Similarity, Semantic Similarity, Semantic Textual Similarity
Published 2019-10-09
URL https://arxiv.org/abs/1910.04192v2
PDF https://arxiv.org/pdf/1910.04192v2.pdf
PWC https://paperswithcode.com/paper/domain-relevant-embeddings-for-medical
Repo
Framework

Traditional Machine Learning for Pitch Detection

Title Traditional Machine Learning for Pitch Detection
Authors Thomas Drugman, Goeric Huybrechts, Viacheslav Klimkov, Alexis Moinet
Abstract Pitch detection is a fundamental problem in speech processing as F0 is used in a large number of applications. Recent articles have proposed deep learning for robust pitch tracking. In this paper, we consider voicing detection as a classification problem and F0 contour estimation as a regression problem. For both tasks, acoustic features from multiple domains and traditional machine learning methods are used. The discrimination power of existing and proposed features is assessed through mutual information. Multiple supervised and unsupervised approaches are compared. A significant relative reduction of voicing errors over the best baseline is obtained: 20% with the best clustering method (K-means) and 45% with a Multi-Layer Perceptron. For F0 contour estimation, the benefits of regression techniques are limited though. We investigate whether those objective gains translate in a parametric synthesis task. Clear perceptual preferences are observed for the proposed approach over two widely-used baselines (RAPT and DIO).
Tasks
Published 2019-03-04
URL http://arxiv.org/abs/1903.01290v1
PDF http://arxiv.org/pdf/1903.01290v1.pdf
PWC https://paperswithcode.com/paper/traditional-machine-learning-for-pitch
Repo
Framework

Neural Rerendering in the Wild

Title Neural Rerendering in the Wild
Authors Moustafa Meshry, Dan B Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla
Abstract We explore total scene capture – recording, modeling, and rerendering a scene under varying appearance such as season and time of day. Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud. For each photo, we render the scene points into a deep framebuffer, and train a neural network to learn the mapping of these initial renderings to the actual photos. This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians. The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions. We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling. We also compare results with prior work on scene reconstruction from internet photos.
Tasks 3D Reconstruction
Published 2019-04-08
URL http://arxiv.org/abs/1904.04290v1
PDF http://arxiv.org/pdf/1904.04290v1.pdf
PWC https://paperswithcode.com/paper/neural-rerendering-in-the-wild
Repo
Framework

Corporate IT-support Help-Desk Process Hybrid-Automation Solution with Machine Learning Approach

Title Corporate IT-support Help-Desk Process Hybrid-Automation Solution with Machine Learning Approach
Authors Kuruparan Shanmugalingam, Nisal Chandrasekara, Calvin Hindle, Gihan Fernando, Chanaka Gunawardhana
Abstract Comprehensive IT support teams in large-scale organizations require more manpower to handle engagement and requests from employees across different channels on a 24/7 basis. An automated help desk for technical email queries is proposed to provide instant, real-time solutions and email categorisation. Email topic modelling with various machine learning and deep learning approaches is compared across different features, aiming for a scalable, generalised solution alongside sure-shot static rules. The email’s title, body, attachment, OCR text, and some custom engineered features are given as inputs. XGBoost cascaded hierarchical models and a Bi-LSTM model with word embeddings perform well, showing 77.3% overall accuracy on the real-world corporate email data set. By introducing thresholding techniques, the overall automation system architecture reaches 85.6% accuracy on real-world corporate emails. The combination of quick fixes, static rules, and ML categorization as a low-cost inference solution reduces human effort by 81% in the process of automation and real-time implementation.
Tasks Optical Character Recognition, Word Embeddings
Published 2019-09-18
URL https://arxiv.org/abs/1909.09018v1
PDF https://arxiv.org/pdf/1909.09018v1.pdf
PWC https://paperswithcode.com/paper/corporate-it-support-help-desk-process-hybrid
Repo
Framework

Hybrid Reinforcement Learning with Expert State Sequences

Title Hybrid Reinforcement Learning with Expert State Sequences
Authors Xiaoxiao Guo, Shiyu Chang, Mo Yu, Gerald Tesauro, Murray Campbell
Abstract Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
Tasks Atari Games, Imitation Learning
Published 2019-03-11
URL http://arxiv.org/abs/1903.04110v1
PDF http://arxiv.org/pdf/1903.04110v1.pdf
PWC https://paperswithcode.com/paper/hybrid-reinforcement-learning-with-expert
Repo
Framework

A Matrix Factorization Model for Hellinger-based Trust Management in Social Internet of Things

Title A Matrix Factorization Model for Hellinger-based Trust Management in Social Internet of Things
Authors Soroush Aalibagi, Hamidreza Mahyar, Ali Movaghar, H. Eugene Stanley
Abstract The Social Internet of Things (SIoT), an integration of the Internet of Things and social network paradigms, has been introduced to build a network of smart nodes which are capable of establishing social links. In order to deal with misbehaving service provider nodes, service requestor nodes must evaluate their trustworthiness levels. In this paper, we propose a novel trust management mechanism in the SIoT to predict the most reliable service provider for a service requestor, which reduces the risk of exposure to malicious nodes. We model an SIoT with a flexible bipartite graph (containing two sets of nodes: service providers and requestors), then build the corresponding social network among service requestor nodes, using Hellinger distance. After that, we develop a social trust model, by using nodes’ centrality and similarity measures, to extract behavioral trust between the network nodes. Finally, a matrix factorization technique is designed to extract latent features of SIoT nodes to mitigate the data sparsity and cold start problems. We analyze the effect of parameters in the proposed trust prediction mechanism on prediction accuracy. The results indicate that using feedback from the neighboring nodes of a specific service requestor with high Hellinger similarity in our mechanism outperforms the best existing methods. We also show that utilizing the social trust model, which only considers the similarity measure, significantly improves the accuracy of the prediction mechanism. Furthermore, we evaluate the effectiveness of the proposed trust management system through a real-world SIoT application. Our results demonstrate that the proposed mechanism is resilient to different types of network attacks and that it can accurately find the proper service provider with high trustworthiness.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.12432v2
PDF https://arxiv.org/pdf/1909.12432v2.pdf
PWC https://paperswithcode.com/paper/a-matrix-factorization-model-for-hellinger
Repo
Framework
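
The abstract above builds the requestor-side social network from Hellinger distance but does not say which distributions are compared. As a minimal sketch, assume each service requestor is summarized by a normalized feedback distribution over the same set of providers (this representation and the function names are assumptions for illustration, not the authors' implementation). The Hellinger distance between two discrete distributions P and Q is (1/√2)·‖√P − √Q‖₂, which lies in [0, 1].

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions.

    p and q are equal-length sequences of non-negative numbers that each
    sum to 1. The result is 0 for identical distributions and 1 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def hellinger_similarity(p, q):
    """Hypothetical similarity score: one minus the Hellinger distance."""
    return 1.0 - hellinger_distance(p, q)


# Hypothetical feedback distributions of two requestors over three providers.
requestor_a = [0.7, 0.2, 0.1]
requestor_b = [0.6, 0.3, 0.1]
print(hellinger_similarity(requestor_a, requestor_b))
```

Turning the distance into a similarity with `1 - distance` is one simple choice; the paper may define Hellinger similarity differently.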

Almost Unsupervised Text to Speech and Automatic Speech Recognition

Title Almost Unsupervised Text to Speech and Automatic Speech Recognition
Authors Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu
Abstract Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing, and both achieve impressive performance thanks to recent advances in deep learning and large amounts of aligned speech and text data. However, the lack of aligned data poses a major practical problem for TTS and ASR on low-resource languages. In this paper, by leveraging the dual nature of the two tasks, we propose an almost unsupervised learning method that leverages only a few hundred paired samples and extra unpaired data for TTS and ASR. Our method consists of the following components: (1) a denoising auto-encoder, which reconstructs speech and text sequences respectively to develop the capability of language modeling in both the speech and text domains; (2) dual transformation, where the TTS model transforms the text $y$ into speech $\hat{x}$, and the ASR model leverages the transformed pair $(\hat{x},y)$ for training, and vice versa, to boost the accuracy of the two tasks; (3) bidirectional sequence modeling, which addresses error propagation, especially in long speech and text sequences, when training with few paired data; (4) a unified model structure, which combines all the above components for TTS and ASR based on the Transformer model. Our method achieves 99.84% in terms of word level intelligible rate and 2.68 MOS for TTS, and 11.7% PER for ASR on the LJSpeech dataset, by leveraging only 200 paired speech and text samples (about 20 minutes of audio), together with extra unpaired speech and text data.
Tasks Denoising, Language Modelling, Speech Recognition
Published 2019-05-13
URL https://arxiv.org/abs/1905.06791v2
PDF https://arxiv.org/pdf/1905.06791v2.pdf
PWC https://paperswithcode.com/paper/almost-unsupervised-text-to-speech-and
Repo
Framework

Evaluating aleatoric and epistemic uncertainties of time series deep learning models for soil moisture predictions

Title Evaluating aleatoric and epistemic uncertainties of time series deep learning models for soil moisture predictions
Authors Kuai Fang, Chaopeng Shen, Daniel Kifer
Abstract Soil moisture is an important variable that determines floods, vegetation health, agricultural productivity, and land surface feedbacks to the atmosphere. Accurately modeling soil moisture has important implications in both weather and climate models. The recently available satellite-based observations give us a unique opportunity to build data-driven models to predict soil moisture instead of using land surface models, but previously there was no uncertainty estimate. We tested Monte Carlo dropout (MCD) with an aleatoric term for our long short-term memory models for this problem, and asked if the uncertainty terms behave as they were argued to. We show that the method successfully captures the predictive error after tuning a hyperparameter on a representative training dataset. We show the MCD uncertainty estimate, as previously argued, does detect dissimilarity.
Tasks Time Series
Published 2019-06-10
URL https://arxiv.org/abs/1906.04595v1
PDF https://arxiv.org/pdf/1906.04595v1.pdf
PWC https://paperswithcode.com/paper/evaluating-aleatoric-and-epistemic
Repo
Framework
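
The abstract above pairs Monte Carlo dropout with an aleatoric term but gives no formulas. A common way to combine the two, sketched here as an assumption rather than the authors' exact procedure, is to run several stochastic forward passes with dropout active, have each pass output a predicted mean and a predicted noise variance, and then split the total predictive variance into an epistemic part (spread of the means across passes) and an aleatoric part (average predicted noise variance). The function name and array layout are hypothetical.

```python
import numpy as np


def combine_uncertainty(means, variances):
    """Combine outputs of several Monte Carlo dropout passes.

    means, variances: array-likes of shape (n_passes, n_points), where each
    row holds the predicted mean and predicted noise variance from one
    stochastic forward pass.

    Returns (prediction, epistemic, aleatoric, total), each of shape
    (n_points,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prediction = means.mean(axis=0)       # average over dropout passes
    epistemic = means.var(axis=0)         # disagreement between passes
    aleatoric = variances.mean(axis=0)    # average predicted noise variance
    return prediction, epistemic, aleatoric, epistemic + aleatoric


# Hypothetical outputs of two dropout passes for two time steps.
pass_means = [[1.0, 2.0], [3.0, 2.0]]
pass_vars = [[0.5, 0.1], [0.5, 0.3]]
prediction, epistemic, aleatoric, total = combine_uncertainty(pass_means, pass_vars)
print(prediction, epistemic, aleatoric, total)
```

In this toy input the passes disagree only at the first time step, so the epistemic term is nonzero there and zero at the second.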

One-Shot Learning for Text-to-SQL Generation

Title One-Shot Learning for Text-to-SQL Generation
Authors Dongjun Lee, Jaesik Yoon, Jongyun Song, Sanggil Lee, Sungroh Yoon
Abstract Most deep learning approaches for text-to-SQL generation are limited to the WikiSQL dataset, which only supports very simple queries. Recently, template-based and sequence-to-sequence approaches were proposed to support complex queries, which contain join queries, nested queries, and other types. However, Finegan-Dollak et al. (2018) demonstrated that both approaches lack the ability to generate SQL of unseen templates. In this paper, we propose a template-based one-shot learning model for text-to-SQL generation so that the model can generate SQL of an untrained template based on a single example. First, we classify the SQL template using the Matching Network that is augmented by our novel architecture, the Candidate Search Network. Then, we fill the variable slots in the predicted template using the Pointer Network. We show that our model outperforms state-of-the-art approaches for various text-to-SQL datasets in two aspects: 1) the SQL generation accuracy for the trained templates, and 2) the adaptability to the unseen SQL templates based on a single example without any additional training.
Tasks One-Shot Learning, Text-To-Sql
Published 2019-04-26
URL http://arxiv.org/abs/1905.11499v1
PDF http://arxiv.org/pdf/1905.11499v1.pdf
PWC https://paperswithcode.com/paper/190511499
Repo
Framework

Causally Driven Incremental Multi Touch Attribution Using a Recurrent Neural Network

Title Causally Driven Incremental Multi Touch Attribution Using a Recurrent Neural Network
Authors Ruihuan Du, Yu Zhong, Harikesh Nair, Bo Cui, Ruyang Shou
Abstract This paper describes a practical system for Multi Touch Attribution (MTA) for use by a publisher of digital ads. We developed this system for JD.com, an eCommerce company, which is also a publisher of digital ads in China. The approach has two steps. The first step (‘response modeling’) fits a user-level model for purchase of a product as a function of the user’s exposure to ads. The second (‘credit allocation’) uses the fitted model to allocate the incremental part of the observed purchase due to advertising, to the ads the user is exposed to over the previous T days. To implement step one, we train a Recurrent Neural Network (RNN) on user-level conversion and exposure data. The RNN has the advantage of flexibly handling the sequential dependence in the data in a semi-parametric way. The specific RNN formulation we implement captures the impact of advertising intensity, timing, competition, and user-heterogeneity, which are known to be relevant to ad-response. To implement step two, we compute Shapley Values, which have the advantage of having axiomatic foundations and satisfying fairness considerations. The specific formulation of the Shapley Value we implement respects incrementality by allocating the overall incremental improvement in conversion to the exposed ads, while handling the sequence-dependence of exposures on the observed outcomes. The system is under production at JD.com, and scales to handle the high dimensionality of the problem on the platform (attribution of the orders of about 300M users, for roughly 160K brands, across 200+ ad-types, served about 80B ad-impressions over a typical 15-day period).
Tasks
Published 2019-02-01
URL http://arxiv.org/abs/1902.00215v3
PDF http://arxiv.org/pdf/1902.00215v3.pdf
PWC https://paperswithcode.com/paper/causally-driven-incremental-multi-touch
Repo
Framework
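
The credit-allocation step in the abstract above relies on Shapley Values. The sketch below computes the standard set-based Shapley value exactly for a small number of ads, using a hypothetical `value` function that maps a set of ads to a conversion probability (in the paper this role is played by the fitted RNN response model). It does not reproduce the paper's sequence-aware formulation; the ad names and probabilities are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a small set of players.

    players: list of hashable identifiers (here, ads).
    value: function taking a frozenset of players and returning a number.
    """
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value(coalition | {player}) - value(coalition)
                total += weight * marginal
        result[player] = total
    return result


# Hypothetical conversion probabilities for each set of exposed ads.
conversion = {
    frozenset(): 0.02,                      # baseline with no ads
    frozenset({"search"}): 0.05,
    frozenset({"display"}): 0.04,
    frozenset({"search", "display"}): 0.08,
}

credits = shapley_values(["search", "display"], lambda s: conversion[s])
print(credits)
```

The credits sum to the difference between the value of all ads and the no-ad baseline, which matches the idea of allocating only the incremental part of the outcome. Exact enumeration grows exponentially with the number of ads, so a production system at the scale described would need something more efficient.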

A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models

Title A Neural Virtual Anchor Synthesizer based on Seq2Seq and GAN Models
Authors Zipeng Wang, Zhaoxiang Liu, Zezhou Chen, Huan Hu, Shiguo Lian
Abstract This paper presents a novel framework to generate realistic face video of an anchor, who is reading certain news. This task is also known as Virtual Anchor. Given some paragraphs of words, we first utilize a pretrained Word2Vec model to embed each word into a vector; then we utilize a Seq2Seq-based model to translate these word embeddings into action units and head poses of the target anchor; these action units and head poses will be concatenated with facial landmarks as well as the former $n$ synthesized frames, and the concatenation serves as input of a Pix2PixHD-based model to synthesize realistic facial images for the virtual anchor. The experimental results demonstrate our framework is feasible for the synthesis of virtual anchor.
Tasks Word Embeddings
Published 2019-08-20
URL https://arxiv.org/abs/1908.07262v2
PDF https://arxiv.org/pdf/1908.07262v2.pdf
PWC https://paperswithcode.com/paper/a-neural-virtual-anchor-synthesizer-based-on
Repo
Framework

A CNN-RNN Architecture for Multi-Label Weather Recognition

Title A CNN-RNN Architecture for Multi-Label Weather Recognition
Authors Bin Zhao, Xuelong Li, Xiaoqiang Lu, Zhigang Wang
Abstract Weather Recognition plays an important role in our daily lives and many computer vision applications. However, recognizing the weather conditions from a single image remains challenging and has not been studied thoroughly. Generally, most previous works treat weather recognition as a single-label classification task, namely, determining whether an image belongs to a specific weather class or not. This treatment is not always appropriate, since more than one weather condition may appear simultaneously in a single image. To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one label according to the displayed weather conditions. Specifically, a CNN-RNN based multi-label classification approach is proposed in this paper. The convolutional neural network (CNN) is extended with a channel-wise attention model to extract the most correlated visual features. The Recurrent Neural Network (RNN) further processes the features and excavates the dependencies among weather classes. Finally, the weather labels are predicted step by step. Besides, we construct two datasets for the weather recognition task and explore the relationships among different weather conditions. Experimental results demonstrate the superiority and effectiveness of the proposed approach. The newly constructed datasets will be available at https://github.com/wzgwzg/Multi-Label-Weather-Recognition.
Tasks Multi-Label Classification
Published 2019-04-24
URL http://arxiv.org/abs/1904.10709v1
PDF http://arxiv.org/pdf/1904.10709v1.pdf
PWC https://paperswithcode.com/paper/a-cnn-rnn-architecture-for-multi-label
Repo
Framework
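
To make the single-label versus multi-label distinction in the abstract above concrete, the sketch below encodes a set of weather labels as a multi-hot target vector and decodes per-class scores back into labels. The class list and the 0.5 threshold are assumptions for illustration; the paper predicts labels step by step with an RNN rather than by thresholding independent scores.

```python
# Hypothetical weather classes; the paper's datasets may use different ones.
CLASSES = ["sunny", "cloudy", "foggy", "rainy", "snowy"]


def to_multi_hot(labels, classes=CLASSES):
    """Encode a collection of labels as a 0/1 vector with one slot per class."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if c in labels else 0 for c in classes]


def from_scores(scores, classes=CLASSES, threshold=0.5):
    """Decode per-class scores into the list of classes at or above threshold."""
    return [c for c, s in zip(classes, scores) if s >= threshold]


# An image that is both cloudy and rainy gets two active slots.
target = to_multi_hot({"cloudy", "rainy"})
predicted = from_scores([0.1, 0.8, 0.2, 0.7, 0.05])
print(target, predicted)
```

A single-label setup would allow exactly one active slot per image, which cannot represent an image that is, for example, both cloudy and rainy.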