January 30, 2020

3087 words 15 mins read

Paper Group ANR 241



Conversational Agents for Insurance Companies: From Theory to Practice

Title Conversational Agents for Insurance Companies: From Theory to Practice
Authors Falko Koetter, Matthias Blohm, Jens Drawehn, Monika Kochanowski, Joscha Goetzer, Daniel Graziotin, Stefan Wagner
Abstract Advances in artificial intelligence have renewed interest in conversational agents. In addition to software developers, today all kinds of employees show interest in new technologies and their possible applications for customers. German insurance companies are generally interested in improving their customer service and digitizing their business processes. In this work we investigate the potential use of conversational agents in insurance companies theoretically, determining which classes of agents are of interest to insurance companies and identifying relevant use cases and requirements. We add two practical parts: first, we develop a showcase prototype for an exemplary insurance scenario in claim management; second, we create a prototype focusing on customer service in a chatbot hackathon, fostering innovation in interdisciplinary teams. We describe the results of both prototypes in detail, evaluate both chatbots against criteria defined for each setting, compare the results, and draw conclusions about the maturity of chatbot technology for practical use, describing the opportunities and challenges that companies, especially small and medium enterprises, face.
Tasks Chatbot
Published 2019-12-18
URL https://arxiv.org/abs/1912.08473v1
PDF https://arxiv.org/pdf/1912.08473v1.pdf
PWC https://paperswithcode.com/paper/conversational-agents-for-insurance-companies
Repo
Framework

Wi-Fringe: Leveraging Text Semantics in WiFi CSI-Based Device-Free Named Gesture Recognition

Title Wi-Fringe: Leveraging Text Semantics in WiFi CSI-Based Device-Free Named Gesture Recognition
Authors Md Tamzeed Islam, Shahriar Nirjon
Abstract The lack of adequate training data is one of the major hurdles in WiFi-based activity recognition systems. In this paper, we propose Wi-Fringe, a WiFi CSI-based device-free human gesture recognition system that recognizes named gestures, i.e., activities and gestures that have a semantically meaningful name in the English language, as opposed to arbitrary free-form gestures. Given a list of activities (only their names in English text), along with zero or more training examples (WiFi CSI values) per activity, Wi-Fringe is able to detect all of the activities at runtime. In other words, a subset of the activities that Wi-Fringe detects does not require any training examples at all.
Tasks Activity Recognition, Gesture Recognition
Published 2019-08-16
URL https://arxiv.org/abs/1908.06803v1
PDF https://arxiv.org/pdf/1908.06803v1.pdf
PWC https://paperswithcode.com/paper/wi-fringe-leveraging-text-semantics-in-wifi
Repo
Framework
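The zero-shot mechanism the abstract alludes to can be illustrated with a minimal sketch: project CSI-derived features into the word-embedding space of the activity names and label a gesture by its nearest name. Everything here (shapes, the linear projection, the embedding source) is an illustrative assumption, not Wi-Fringe’s actual design.

```python
import numpy as np

def classify_zero_shot(csi_feature, name_embeddings, projection):
    """csi_feature: (d,) feature vector extracted from WiFi CSI.
    name_embeddings: dict mapping activity name -> (k,) word embedding.
    projection: (k, d) learned map from CSI features into the word space."""
    z = projection @ csi_feature  # project the gesture into semantic space
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    # the activity whose name embedding is closest wins, so activities with
    # zero CSI training examples can still be recognized by name alone
    return max(name_embeddings, key=lambda name: cos(z, name_embeddings[name]))
```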

Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR

Title Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR
Authors Rohit Kumar, Anirudh Sreeram, Anurenjan Purushothaman, Sriram Ganapathy
Abstract The state-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this paper, we attempt to move away from the requirement of supervised clean recordings for training the mask estimator. Models based on signal enhancement and beamforming using multi-channel linear prediction serve as the required mask estimate. In this way, model training can also be carried out on real recordings of noisy speech rather than only the simulated ones used in a typical teacher model. Several experiments performed on noisy and reverberant environments in the CHiME-3 corpus as well as the REVERB challenge corpus highlight the effectiveness of the proposed approach. The ASR results for the proposed approach are significantly better than those of a teacher model trained on an out-of-domain dataset and on par with oracle mask estimators trained on the in-domain dataset.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12617v1
PDF https://arxiv.org/pdf/1911.12617v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-neural-mask-estimator-for
Repo
Framework
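For readers unfamiliar with how the masks are consumed downstream, here is a hedged sketch of mask-driven generalized eigenvalue (GEV) beamforming: mask-weighted spatial covariance matrices for speech and noise are formed per frequency, and the beamforming filter is the principal generalized eigenvector. This is the standard GEV recipe, not the paper's code; shapes and the regularization constant are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def gev_beamform(X, speech_mask, noise_mask):
    """X: (channels, frames, freqs) complex STFT of the array recording.
    speech_mask, noise_mask: (frames, freqs) values in [0, 1]."""
    C, T, F = X.shape
    out = np.zeros((T, F), dtype=complex)
    for f in range(F):
        Xf = X[:, :, f]  # (C, T) snapshot matrix at this frequency
        # mask-weighted spatial covariance (PSD) matrices for speech and noise
        phi_s = (speech_mask[:, f] * Xf) @ Xf.conj().T / (speech_mask[:, f].sum() + 1e-9)
        phi_n = (noise_mask[:, f] * Xf) @ Xf.conj().T / (noise_mask[:, f].sum() + 1e-9)
        # GEV filter: principal generalized eigenvector of (phi_s, phi_n)
        _, vecs = eigh(phi_s, phi_n + 1e-9 * np.eye(C))
        w = vecs[:, -1]  # eigenvector for the largest eigenvalue
        out[:, f] = w.conj() @ Xf
    return out
```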

Reducing Lateral Visual Biases in Displays

Title Reducing Lateral Visual Biases in Displays
Authors Inbar Huberman, Raanan Fattal
Abstract The human visual system is composed of multiple physiological components that apply multiple mechanisms in order to cope with the rich visual content it encounters. The complexity of this system leads to non-trivial relations between what we see and what we perceive, and in particular between the raw intensities of an image that we display and the ones we perceive, where various visual biases and illusions are introduced. In this paper we describe a method for reducing a large class of biases related to the lateral inhibition mechanism in the human retina, where neurons suppress the activity of neighboring receptors. Among these biases are the well-known Mach bands and halos that appear around smooth and sharp image gradients, as well as the appearance of false contrasts between identical regions. The new method removes these visual biases by computing an image that contains counter biases, such that when this laterally-compensated image is viewed on a display, the inserted biases cancel the ones created in the retina.
Tasks
Published 2019-04-11
URL http://arxiv.org/abs/1904.05614v1
PDF http://arxiv.org/pdf/1904.05614v1.pdf
PWC https://paperswithcode.com/paper/reducing-lateral-visual-biases-in-displays
Repo
Framework
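As a toy illustration of the compensation idea (not the paper's algorithm), suppose lateral inhibition is approximated by a linear difference-of-Gaussians (DoG) response R. One can then iterate toward a display image D whose retinal response R(D) matches the intended image. The DoG parameters and the fixed-point scheme below are arbitrary assumptions for the sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(img, sigma_center=1.0, sigma_surround=3.0, k=0.6):
    # center-surround model: narrow excitation minus scaled wide inhibition
    return gaussian_filter(img, sigma_center) - k * gaussian_filter(img, sigma_surround)

def counter_bias(img, iters=20):
    """Fixed-point iteration D <- D - (R(D) - img), nudging the displayed
    image D so that the modeled retinal response approximates img."""
    D = img.astype(float).copy()
    for _ in range(iters):
        D -= dog_response(D) - img
    return D
```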

Learning Navigation by Visual Localization and Trajectory Prediction

Title Learning Navigation by Visual Localization and Trajectory Prediction
Authors Iulia Paraicu, Marius Leordeanu
Abstract When driving, people make decisions based on current traffic as well as their desired route. They have a mental map of known routes and are often able to navigate without needing directions. Current self-driving models improve their performance when using additional GPS information. Here we aim to push self-driving research forward and perform route planning even in the absence of GPS. Our system learns to predict, in real time, the vehicle’s current location and future trajectory, as a function of time, on a known map, given only the raw video stream and the intended destination. The GPS signal is available only at training time, with training data annotation being fully automatic. Unlike other published models, we predict the vehicle’s trajectory for up to seven seconds ahead, from which complete steering, speed and acceleration information can be derived for the entire time span. Trajectories capture navigational information on multiple levels, from instant steering commands that depend on present traffic and obstacles ahead, to longer-term navigation decisions towards a specific destination. We collect our dataset with a regular car and a smartphone that records video and GPS streams. The GPS data is used to derive ground-truth supervision labels and to create an analytical representation of the traversed map. In tests, our system outperforms published methods on visual localization and steering and gives accurate navigation assistance between any two known locations.
Tasks Trajectory Prediction, Visual Localization
Published 2019-10-07
URL https://arxiv.org/abs/1910.02818v1
PDF https://arxiv.org/pdf/1910.02818v1.pdf
PWC https://paperswithcode.com/paper/learning-navigation-by-visual-localization
Repo
Framework
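A schematic sketch of the two outputs the abstract describes: the current map location plus a future trajectory as a function of time, predicted from video alone. The backbone, feature sizes, and sampling rate below are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class NavNet(nn.Module):
    def __init__(self, feat_dim=128, horizon=14):  # e.g. 7 s at an assumed 2 Hz
        super().__init__()
        self.horizon = horizon
        self.backbone = nn.Sequential(             # stand-in visual encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU())
        self.loc_head = nn.Linear(feat_dim, 2)             # current (x, y) on the map
        self.traj_head = nn.Linear(feat_dim, horizon * 2)  # future (x, y) per time step

    def forward(self, frames):                     # frames: (B, 3, H, W)
        h = self.backbone(frames)
        return self.loc_head(h), self.traj_head(h).view(-1, self.horizon, 2)
```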

Vid2Game: Controllable Characters Extracted from Real-World Videos

Title Vid2Game: Controllable Characters Extracted from Real-World Videos
Authors Oran Gafni, Lior Wolf, Yaniv Taigman
Abstract We are given a video of a person performing a certain activity, from which we extract a controllable model. The model generates novel image sequences of that person, according to arbitrary user-defined control signals, typically marking the displacement of the moving body. The generated video can have an arbitrary background, and effectively captures both the dynamics and appearance of the person. The method is based on two networks. The first network maps a current pose and a single-instance control signal to the next pose. The second network maps the current pose, the new pose, and a given background to an output frame. Both networks include multiple novelties that enable high-quality performance. This is demonstrated on multiple characters extracted from various videos of dancers and athletes.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.08379v1
PDF http://arxiv.org/pdf/1904.08379v1.pdf
PWC https://paperswithcode.com/paper/vid2game-controllable-characters-extracted
Repo
Framework
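The two-network decomposition is the heart of the abstract and can be sketched at a high level: one module advances the pose under a control signal, the other renders a frame from the poses and a background. Both modules below are bare placeholders under assumed input encodings (pose vectors and single-channel pose maps); the paper's networks contain many additional components.

```python
import torch
import torch.nn as nn

class Pose2Pose(nn.Module):
    """Maps (current pose, control signal) to the next pose."""
    def __init__(self, pose_dim=34, ctrl_dim=2):   # e.g. 17 assumed 2D joints
        super().__init__()
        self.net = nn.Sequential(nn.Linear(pose_dim + ctrl_dim, 256), nn.ReLU(),
                                 nn.Linear(256, pose_dim))
    def forward(self, pose, ctrl):
        return pose + self.net(torch.cat([pose, ctrl], dim=-1))  # residual pose update

class Pose2Frame(nn.Module):
    """Maps (current pose map, next pose map, background) to an output frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1 + 1 + 3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, pose_map, next_pose_map, background):
        return self.net(torch.cat([pose_map, next_pose_map, background], dim=1))
```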

Unified Semantic Parsing with Weak Supervision

Title Unified Semantic Parsing with Weak Supervision
Authors Priyanka Agrawal, Parag Jain, Ayushi Dalmia, Abhishek Bansal, Ashish Mittal, Karthik Sankaranarayanan
Abstract Semantic parsing over multiple knowledge bases enables a parser to exploit structural similarities of programs across multiple domains. However, the fundamental challenge lies in obtaining high-quality annotations of (utterance, program) pairs across the various domains needed for training such models. To overcome this, we propose a novel framework to build a unified multi-domain semantic parser trained only with weak supervision (denotations). Weakly supervised training is particularly arduous, as the program search space grows exponentially in a multi-domain setting. To solve this, we incorporate a multi-policy distillation mechanism in which we first train domain-specific semantic parsers (teachers) using weak supervision in the absence of ground truth programs, and then train a single unified parser (student) from the domain-specific policies obtained from these teachers. The resulting semantic parser is not only compact but also generalizes better and generates more accurate programs. Furthermore, it does not require the user to provide a domain label while querying. On the standard Overnight dataset (containing multiple domains), we demonstrate that the proposed model improves performance by 20% in terms of denotation accuracy compared to baseline techniques.
Tasks Semantic Parsing
Published 2019-06-12
URL https://arxiv.org/abs/1906.05062v1
PDF https://arxiv.org/pdf/1906.05062v1.pdf
PWC https://paperswithcode.com/paper/unified-semantic-parsing-with-weak
Repo
Framework
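A hedged sketch of the distillation step: the student parser is trained to match the softened output distributions of the domain-specific teachers. Only the loss is shown; the parser models, temperature, and tensor shapes are illustrative assumptions, not the paper's exact objective.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student token distributions.
    Both tensors: (batch, seq_len, vocab)."""
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    # KL(teacher || student), scaled by t^2 as is conventional in distillation
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)
```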

A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen

Title A Smart Sliding Chinese Pinyin Input Method Editor on Touchscreen
Authors Zhuosheng Zhang, Zhen Meng, Hai Zhao
Abstract This paper presents a smart sliding Chinese pinyin Input Method Editor (IME) for touchscreen devices, which lets the user slide a finger from one key to another on the touchscreen instead of tapping keys one by one, while the target Chinese character sequence is predicted during the sliding process to help the user input Chinese characters efficiently. Moreover, the layout of our IME’s virtual keyboard adapts to the user’s sliding for more efficient input. The layout adaptation is implemented with Recurrent Neural Networks (RNN) and deep reinforcement learning. The pinyin-to-character converter is implemented with a sequence-to-sequence (Seq2Seq) model to predict the target Chinese sequence. A sliding simulator is built to automatically produce sliding samples for model training and virtual keyboard testing. The key advantage of our proposed IME is that nearly all of its built-in tactics can be optimized automatically with deep learning algorithms based only on user behavior. Empirical studies verify the effectiveness of the proposed model and show improved user input efficiency.
Tasks
Published 2019-09-03
URL https://arxiv.org/abs/1909.01063v2
PDF https://arxiv.org/pdf/1909.01063v2.pdf
PWC https://paperswithcode.com/paper/a-smart-sliding-chinese-pinyin-input-method
Repo
Framework
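One ingredient of such a pipeline can be sketched concretely (this is an illustration, not the paper's implementation): turning a finger-sliding trajectory into the sequence of keys it passes over, which a pinyin-to-character Seq2Seq model would then decode.

```python
def keys_on_path(path, key_centers):
    """path: list of (x, y) touch points sampled along the slide.
    key_centers: dict mapping key label -> (x, y) center on the keyboard.
    Returns the keys in visiting order, with consecutive duplicates collapsed."""
    def nearest(p):
        return min(key_centers,
                   key=lambda k: (key_centers[k][0] - p[0]) ** 2
                               + (key_centers[k][1] - p[1]) ** 2)
    keys = [nearest(p) for p in path]
    return [k for i, k in enumerate(keys) if i == 0 or k != keys[i - 1]]
```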

Solving Irregular and Data-enriched Differential Equations using Deep Neural Networks

Title Solving Irregular and Data-enriched Differential Equations using Deep Neural Networks
Authors Craig Michoski, Milos Milosavljevic, Todd Oliver, David Hatch
Abstract Recent work has introduced a simple numerical method for solving partial differential equations (PDEs) with deep neural networks (DNNs). This paper reviews and extends the method while applying it to analyze one of the most fundamental features in numerical PDEs and nonlinear analysis: irregular solutions. First, the Sod shock tube solution to the compressible Euler equations is discussed, analyzed, and then compared to conventional finite element and finite volume methods. These methods are extended to consider performance improvements and simultaneous parameter space exploration. Next, a shock solution to compressible magnetohydrodynamics (MHD) is solved for, and used in a scenario where experimental data is utilized to enhance a PDE system that is a priori insufficient to validate against the observed/experimental data. This is accomplished by enriching the model PDE system with source terms and using supervised training on synthetic experimental data. The resulting DNN framework for PDEs seems to demonstrate an almost fantastical ease of system prototyping and natural integration of large data sets (be they synthetic or experimental), all while enabling single-pass exploration of the entire parameter space.
Tasks
Published 2019-05-10
URL https://arxiv.org/abs/1905.04351v1
PDF https://arxiv.org/pdf/1905.04351v1.pdf
PWC https://paperswithcode.com/paper/solving-irregular-and-data-enriched
Repo
Framework
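The underlying method (often called a physics-informed neural network) admits a compact sketch: a network u(x, t) is trained so that the PDE residual, computed via automatic differentiation at sampled collocation points, vanishes. Burgers' equation below is just a stand-in for the Euler/MHD systems in the paper, and the network size and sampling are arbitrary.

```python
import torch
import torch.nn as nn

# small fully-connected network approximating the solution u(x, t)
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def pde_residual(x, t, nu=0.01):
    xt = torch.stack([x, t], dim=-1).requires_grad_(True)
    u = net(xt)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = du[..., 0:1], du[..., 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][..., 0:1]
    return u_t + u * u_x - nu * u_xx  # Burgers: u_t + u*u_x - nu*u_xx = 0

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, t = torch.rand(1024), torch.rand(1024)  # random collocation points
opt.zero_grad()
loss = pde_residual(x, t).pow(2).mean()    # boundary/data terms would be added here
loss.backward()
opt.step()
```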

Self-Enhanced Convolutional Network for Facial Video Hallucination

Title Self-Enhanced Convolutional Network for Facial Video Hallucination
Authors Chaowei Fang, Guanbin Li, Xiaoguang Han, Yizhou Yu
Abstract As a domain-specific super-resolution problem, facial image hallucination has enjoyed a series of breakthroughs thanks to advances in deep convolutional neural networks. However, direct migration of existing methods to video still struggles to achieve good performance due to the lack of alignment and consistency modelling in the temporal domain. Taking advantage of the high inter-frame dependency in videos, we propose a self-enhanced convolutional network for facial video hallucination. It is implemented by making full use of preceding super-resolved frames and a temporal window of adjacent low-resolution frames. Specifically, the algorithm first obtains the initial high-resolution inference of each frame by taking into consideration a sequence of consecutive low-resolution inputs through temporal consistency modelling. It then recurrently exploits the reconstructed results and intermediate features of a sequence of preceding frames to improve the initial super-resolution of the current frame by modelling the coherence of structural facial features across frames. Quantitative and qualitative evaluations demonstrate the superiority of the proposed algorithm over state-of-the-art methods. Moreover, our algorithm also achieves excellent performance on the task of general video super-resolution in a single-shot setting.
Tasks Super-Resolution, Video Super-Resolution
Published 2019-11-23
URL https://arxiv.org/abs/1911.11136v1
PDF https://arxiv.org/pdf/1911.11136v1.pdf
PWC https://paperswithcode.com/paper/self-enhanced-convolutional-network-for
Repo
Framework
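A schematic sketch of the recurrence the abstract describes: each frame's super-resolution is produced from a window of low-resolution neighbors plus the previously super-resolved frame. The layers and scale factor are placeholders, not the authors' network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentSR(nn.Module):
    def __init__(self, window=3, scale=4):
        super().__init__()
        in_ch = 3 * window + 3  # stacked LR window plus the previous SR frame
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))  # upsample by rearranging channels to pixels

    def forward(self, lr_window, prev_sr):
        # bring the previous high-res output down to LR size so it can be stacked
        prev_down = F.interpolate(prev_sr, size=lr_window.shape[-2:])
        return self.body(torch.cat([lr_window, prev_down], dim=1))
```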

Dreaddit: A Reddit Dataset for Stress Analysis in Social Media

Title Dreaddit: A Reddit Dataset for Stress Analysis in Social Media
Authors Elsbeth Turcan, Kathleen McKeown
Abstract Stress is a nigh-universal human experience, particularly in the online world. While stress can be a motivator, too much stress is associated with many negative health outcomes, making its identification useful across a range of domains. However, existing computational research typically only studies stress in domains such as speech, or in short genres such as Twitter. We present Dreaddit, a new text corpus of lengthy multi-domain social media data for the identification of stress. Our dataset consists of 190K posts from five different categories of Reddit communities; we additionally label 3.5K total segments taken from 3K posts using Amazon Mechanical Turk. We present preliminary supervised learning methods for identifying stress, both neural and traditional, and analyze the complexity and diversity of the data and characteristics of each category.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1911.00133v1
PDF https://arxiv.org/pdf/1911.00133v1.pdf
PWC https://paperswithcode.com/paper/dreaddit-a-reddit-dataset-for-stress-analysis-1
Repo
Framework
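As a concrete example of the kind of "traditional" supervised baseline the abstract mentions, a TF-IDF plus logistic regression classifier takes a few lines. The example texts and binary labels below are invented placeholders, not Dreaddit data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# toy stand-ins; the real inputs would be the labeled Dreaddit segments
texts = ["I can't stop worrying about my exams and I can't sleep.",
         "Had a relaxing walk in the park today."]
labels = [1, 0]  # 1 = stress expressed, 0 = no stress

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["so anxious about tomorrow"]))
```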

Using 3D Convolutional Neural Networks to Learn Spatiotemporal Features for Automatic Surgical Gesture Recognition in Video

Title Using 3D Convolutional Neural Networks to Learn Spatiotemporal Features for Automatic Surgical Gesture Recognition in Video
Authors Isabel Funke, Sebastian Bodenstedt, Florian Oehme, Felix von Bechtolsheim, Jürgen Weitz, Stefanie Speidel
Abstract Automatically recognizing surgical gestures is a crucial step towards a thorough understanding of surgical skill. Possible areas of application include automatic skill assessment, intra-operative monitoring of critical surgical steps, and semi-automation of surgical tasks. Solutions that rely only on the laparoscopic video and do not require additional sensor hardware are especially attractive as they can be implemented at low cost in many scenarios. However, surgical gesture recognition based only on video is a challenging problem that requires effective means to extract both visual and temporal information from the video. Previous approaches mainly rely on frame-wise feature extractors, either handcrafted or learned, which fail to capture the dynamics in surgical video. To address this issue, we propose to use a 3D Convolutional Neural Network (CNN) to learn spatiotemporal features from consecutive video frames. We evaluate our approach on recordings of robot-assisted suturing on a bench-top model, which are taken from the publicly available JIGSAWS dataset. Our approach achieves high frame-wise surgical gesture recognition accuracies of more than 84%, outperforming comparable models that either extract only spatial features or model spatial and low-level temporal information separately. For the first time, these results demonstrate the benefit of spatiotemporal CNNs for video-based surgical gesture recognition.
Tasks Gesture Recognition, Surgical Gesture Recognition
Published 2019-07-26
URL https://arxiv.org/abs/1907.11454v1
PDF https://arxiv.org/pdf/1907.11454v1.pdf
PWC https://paperswithcode.com/paper/using-3d-convolutional-neural-networks-to
Repo
Framework
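The spatiotemporal idea reduces to a simple pattern: a 3D CNN convolves jointly over time and space, so a clip of consecutive frames, rather than a single frame, is classified into a gesture. The layer sizes below are illustrative, not the paper's 3D architecture.

```python
import torch
import torch.nn as nn

num_gestures = 10
model = nn.Sequential(
    nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),  # kernels span (time, H, W)
    nn.MaxPool3d(2),
    nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(64, num_gestures))

clip = torch.randn(1, 3, 16, 112, 112)  # batch, channels, frames, H, W
logits = model(clip)                    # one gesture prediction per clip
```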

Learning a Neural 3D Texture Space from 2D Exemplars

Title Learning a Neural 3D Texture Space from 2D Exemplars
Authors Philipp Henzler, Niloy J. Mitra, Tobias Ritschel
Abstract We propose a generative model of 2D and 3D natural textures that offers diversity and visual fidelity at high computational efficiency. This is enabled by a family of methods that extend ideas from classic stochastic procedural texturing (Perlin noise) to learned, deep non-linearities. The key idea is a hard-coded, tunable and differentiable step that feeds multiple transformed random 2D or 3D fields into an MLP that can be sampled over infinite domains. Our model encodes all exemplars from a diverse set of textures without needing to be retrained for each exemplar. Applications include texture interpolation and learning 3D textures from 2D exemplars.
Tasks
Published 2019-12-09
URL https://arxiv.org/abs/1912.04158v1
PDF https://arxiv.org/pdf/1912.04158v1.pdf
PWC https://paperswithcode.com/paper/learning-a-neural-3d-texture-space-from-2d
Repo
Framework
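The core construction (random fields sampled at a query point, fed through an MLP, evaluable anywhere in an unbounded domain) can be sketched in a few lines. This toy uses sinusoidal random Fourier features as the "transformed random fields"; the paper's exact parameterization and training differ.

```python
import torch
import torch.nn as nn

n_fields, hidden = 16, 64
freqs = torch.randn(n_fields, 3) * 4.0   # fixed random 3D frequencies
phases = torch.rand(n_fields) * 6.28318  # fixed random phases
mlp = nn.Sequential(nn.Linear(n_fields, hidden), nn.ReLU(),
                    nn.Linear(hidden, 3), nn.Sigmoid())  # RGB in [0, 1]

def texture(points):                     # points: (N, 3), any coordinates
    fields = torch.sin(points @ freqs.T + phases)  # (N, n_fields) random fields
    return mlp(fields)

rgb = texture(torch.rand(1000, 3) * 10.0)  # sample over an unbounded domain
```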

Towards Disentangled Representations for Human Retargeting by Multi-view Learning

Title Towards Disentangled Representations for Human Retargeting by Multi-view Learning
Authors Chao Yang, Xiaofeng Liu, Qingming Tang, C. -C. Jay Kuo
Abstract We study the problem of learning disentangled representations for data across multiple domains and its applications in human retargeting. Our goal is to map an input image to an identity-invariant latent representation that captures intrinsic factors such as expressions and poses. To this end, we present a novel multi-view learning approach that leverages various data sources such as images, keypoints, and poses. Our model consists of multiple id-conditioned VAEs for different views of the data. During training, we encourage the latent embeddings to be consistent across these views. Our observation is that auxiliary data like keypoints and poses contain critical, id-agnostic semantic information, and it is easier to train a disentangling CVAE on these simpler views to separate such semantics from other id-specific attributes. We show that training multi-view CVAEs while encouraging latent consistency guides the image encoding to preserve the semantics of expressions and poses, leading to improved disentangled representations and better human retargeting results.
Tasks Multi-View Learning
Published 2019-12-12
URL https://arxiv.org/abs/1912.06265v1
PDF https://arxiv.org/pdf/1912.06265v1.pdf
PWC https://paperswithcode.com/paper/towards-disentangled-representations-for
Repo
Framework
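A hedged sketch of the latent-consistency objective: the encoders for different views (image, keypoints, pose) of the same instance are pushed to agree in latent space, on top of their per-view (C)VAE losses, which are elided here. The exact form of the consistency term is an assumption.

```python
import torch

def latent_consistency_loss(latents):
    """latents: list of (batch, dim) latent means, one per view (image,
    keypoints, pose, ...) of the same instances."""
    center = torch.stack(latents).mean(dim=0)  # per-instance mean embedding
    # pull every view's embedding toward the shared center
    return sum(((z - center) ** 2).mean() for z in latents) / len(latents)
```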

Deep Manifold Embedding for Hyperspectral Image Classification

Title Deep Manifold Embedding for Hyperspectral Image Classification
Authors Zhiqiang Gong, Weidong Hu, Xiaoyong Du, Ping Zhong, Panhe Hu
Abstract Deep learning methods have played an increasingly important role in hyperspectral image classification. However, general deep learning methods mainly take advantage of the information of the sample itself, or of pairwise information between samples, while ignoring the intrinsic data structure within the whole dataset. To tackle this problem, this work develops a novel deep manifold embedding method (DMEM) for hyperspectral image classification. First, each class in the image is modelled as a specific nonlinear manifold, and the geodesic distance is used to measure the correlation between samples. Then, based on hierarchical clustering, the manifold structure of the data can be captured and each nonlinear data manifold can be divided into several sub-classes. Finally, considering the distribution of each sub-class and the correlation between different sub-classes, the DMEM is constructed to preserve the estimated geodesic distances on the data manifold between the learned low-dimensional features of different samples. Experiments over three real-world hyperspectral image datasets demonstrate the effectiveness of the proposed method.
Tasks Hyperspectral Image Classification, Image Classification
Published 2019-12-24
URL https://arxiv.org/abs/1912.11264v1
PDF https://arxiv.org/pdf/1912.11264v1.pdf
PWC https://paperswithcode.com/paper/deep-manifold-embedding-for-hyperspectral
Repo
Framework
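One building block can be made concrete: geodesic distances on the data manifold are commonly approximated by shortest paths over a k-nearest-neighbor graph (as in Isomap), which the clustering and embedding losses would then consume. This is the standard construction under assumed toy data, not the authors' exact procedure.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X = np.random.rand(200, 50)  # toy stand-in: 200 pixels, 50 spectral bands
knn = kneighbors_graph(X, n_neighbors=8, mode="distance")
geodesic = shortest_path(knn, method="D", directed=False)  # Dijkstra over the graph
```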