Paper Group ANR 680
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
Title | Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks |
Authors | Seyed Ali Jalalifar, Hosein Hasani, Hamid Aghajan |
Abstract | We present a novel approach to generating photo-realistic images of a face with accurate lip sync, given an audio input. Using a recurrent neural network, we generate mouth landmarks from audio features. We then exploit the power of conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks can produce a sequence of natural faces in sync with an input audio track. (An illustrative sketch of this two-stage pipeline follows this entry.) |
Tasks | |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07461v1 |
PDF | http://arxiv.org/pdf/1803.07461v1.pdf |
PWC | https://paperswithcode.com/paper/speech-driven-facial-reenactment-using |
Repo | |
Framework | |
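The two-stage pipeline above (audio to landmarks via an RNN, landmarks to face via a conditional generator) can be sketched as follows. This is a minimal PyTorch sketch under assumed layer sizes and feature dimensions (28 audio features, 20 landmarks, 32x32 output), not the paper's actual architecture.

```python
# Hedged sketch of the two-stage pipeline; all sizes are assumptions.
import torch
import torch.nn as nn

class AudioToLandmarks(nn.Module):
    """RNN stage: audio feature sequence -> mouth landmark sequence."""
    def __init__(self, n_audio=28, n_landmarks=20):
        super().__init__()
        self.rnn = nn.LSTM(n_audio, 128, batch_first=True)
        self.head = nn.Linear(128, n_landmarks * 2)    # (x, y) per landmark

    def forward(self, audio):                          # (B, T, n_audio)
        h, _ = self.rnn(audio)
        return self.head(h)                            # (B, T, n_landmarks*2)

class LandmarkConditionedGenerator(nn.Module):
    """cGAN generator stage: noise + landmarks -> face image."""
    def __init__(self, n_landmarks=20, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_landmarks * 2, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),    # 16 -> 32
        )

    def forward(self, z, landmarks):
        return self.net(torch.cat([z, landmarks], dim=1))

lm = AudioToLandmarks()(torch.randn(1, 50, 28))        # 50 audio frames
face = LandmarkConditionedGenerator()(torch.randn(1, 100), lm[:, 0])
print(lm.shape, face.shape)              # (1, 50, 40) (1, 3, 32, 32)
```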
Holographic Visualisation of Radiology Data and Automated Machine Learning-based Medical Image Segmentation
Title | Holographic Visualisation of Radiology Data and Automated Machine Learning-based Medical Image Segmentation |
Authors | Lucian Trestioreanu |
Abstract | Within this thesis we propose a platform for combining Augmented Reality (AR) hardware with machine learning in a user-oriented pipeline, offering medical staff an intuitive 3D visualization of volumetric Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) medical image segmentations inside the AR headset, with no human intervention needed for loading, processing and segmenting medical images. The AR visualization, based on Microsoft HoloLens, employs a modular and thus scalable frontend-backend architecture for real-time visualizations on multiple AR headsets. As Convolutional Neural Networks (CNNs) have recently demonstrated superior performance for image semantic segmentation, the pipeline also includes a fully automated CNN algorithm for segmenting the liver from CT scans. The model is based on the Deep Retinal Image Understanding (DRIU) model, a Fully Convolutional Network with side outputs from feature maps of different resolutions, extracted at different stages of the network. The algorithm is 2.5D, meaning the input is a set of consecutive scan slices. The experiments were performed on the Liver Tumor Segmentation Challenge (LiTS) dataset for liver segmentation and demonstrated good results and flexibility. While multiple approaches exist in the domain, only a few have focused on overcoming the practical issues that still largely keep this technology out of operating rooms. In line with this, we are next planning an evaluation by medical doctors and radiologists in a real-world environment. (An illustrative sketch of the 2.5D input scheme follows this entry.) |
Tasks | Computed Tomography (CT), Liver Segmentation, Medical Image Segmentation, Semantic Segmentation |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.04929v1 |
PDF | http://arxiv.org/pdf/1808.04929v1.pdf |
PWC | https://paperswithcode.com/paper/holographic-visualisation-of-radiology-data |
Repo | |
Framework | |
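For readers unfamiliar with the "2.5D" input scheme mentioned above, the following sketch shows how consecutive CT slices can be stacked into per-slice training samples. The window size of 3 and the array shapes are assumptions, not values from the thesis.

```python
# Sketch of a "2.5D" input: each sample is a stack of consecutive axial
# slices fed to a 2D network as channels. Window size 3 is an assumption.
import numpy as np

def make_25d_samples(volume, window=3):
    """volume: (n_slices, H, W) scan -> (n_slices, window, H, W) stacks."""
    half = window // 2
    padded = np.pad(volume, ((half, half), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[i:i + window] for i in range(volume.shape[0])])

ct = np.random.randn(64, 128, 128).astype(np.float32)  # toy CT volume
print(make_25d_samples(ct).shape)  # (64, 3, 128, 128)
```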
Deep Sequential Segmentation of Organs in Volumetric Medical Scans
Title | Deep Sequential Segmentation of Organs in Volumetric Medical Scans |
Authors | Alexey Novikov, David Major, Maria Wimmer, Dimitrios Lenis, Katja Bühler |
Abstract | Segmentation in 3D scans plays an increasingly important role in current clinical practice, supporting diagnosis, tissue quantification and treatment planning. Current 3D approaches based on convolutional neural networks usually suffer from at least three main issues caused predominantly by implementation constraints: first, they require resizing the volume to lower-resolution reference dimensions; second, the capacity of such approaches is very limited due to memory restrictions; and third, all slices of a volume have to be available at any given training or testing time. We address these problems with a U-Net-like architecture consisting of bidirectional convolutional LSTM and convolutional, pooling, upsampling and concatenation layers enclosed in time-distributed wrappers. Our network can either process full volumes sequentially or segment slabs of slices on demand. We demonstrate the performance of our architecture on vertebrae and liver segmentation tasks in 3D CT scans. (An illustrative sketch of this layer arrangement follows this entry.) |
Tasks | Liver Segmentation |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02437v2 |
PDF | http://arxiv.org/pdf/1807.02437v2.pdf |
PWC | https://paperswithcode.com/paper/deep-sequential-segmentation-of-organs-in |
Repo | |
Framework | |
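A hedged Keras sketch of the described layer arrangement: per-slice 2D convolution, pooling and upsampling wrapped in TimeDistributed, with a bidirectional ConvLSTM across the slice axis and a U-Net-style skip connection. Depth, channel counts and input size are illustrative assumptions, far smaller than a real segmentation network.

```python
# Tiny illustrative stand-in for the described architecture family.
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(None, 64, 64, 1))       # (slices, H, W, channels)
x = layers.TimeDistributed(
    layers.Conv2D(16, 3, padding="same", activation="relu"))(inp)
skip = x
x = layers.TimeDistributed(layers.MaxPooling2D())(x)
x = layers.Bidirectional(
    layers.ConvLSTM2D(16, 3, padding="same", return_sequences=True))(x)
x = layers.TimeDistributed(layers.UpSampling2D())(x)
x = layers.Concatenate()([x, skip])               # U-Net-style skip link
out = layers.TimeDistributed(layers.Conv2D(1, 1, activation="sigmoid"))(x)

model = Model(inp, out)   # per-slice masks for a whole slab of slices
model.summary()
```

Because the time dimension is left as None, the same model can run over a full volume or over short slabs of slices, matching the on-demand processing the abstract describes.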
Localization under Topological Uncertainty for Lane Identification of Autonomous Vehicles
Title | Localization under Topological Uncertainty for Lane Identification of Autonomous Vehicles |
Authors | Samer B. Nashed, David M. Ilstrup, Joydeep Biswas |
Abstract | Autonomous vehicles (AVs) require accurate metric and topological location estimates for safe, effective navigation and decision-making. Although many high-definition (HD) roadmaps exist, they are not always accurate, since public roads are dynamic, shaped unpredictably by both human activity and nature. Thus, AVs must be able to handle situations in which the topology specified by the map does not agree with reality. We present the Variable Structure Multiple Hidden Markov Model (VSM-HMM) as a framework for localizing in the presence of topological uncertainty, and demonstrate its effectiveness on an AV where lane membership is modeled as a topological localization process. VSM-HMMs use a dynamic set of HMMs to simultaneously reason about location within a set of most likely current topologies, and may therefore also be applied to topological structure estimation as well as AV lane estimation. In addition, we present an extension to the Earth Mover’s Distance which allows uncertainty to be taken into account when computing the distance between belief distributions on simplices of arbitrary relative sizes. (An illustrative sketch of HMM-based lane filtering follows this entry.) |
Tasks | Autonomous Vehicles, Decision Making |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01378v1 |
PDF | http://arxiv.org/pdf/1803.01378v1.pdf |
PWC | https://paperswithcode.com/paper/localization-under-topological-uncertainty |
Repo | |
Framework | |
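The VSM-HMM itself manages a dynamic set of HMMs; the sketch below shows only the underlying building block, a single discrete HMM forward filter over lane membership. All transition and emission probabilities are invented for illustration.

```python
# Single-HMM lane filter: states are lanes, the forward update fuses a
# lane-change model with noisy lane detections. Probabilities are invented.
import numpy as np

T = np.array([[0.90, 0.10, 0.00],    # transition: mostly stay in lane
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])
E = np.array([[0.80, 0.15, 0.05],    # emission: sensor confuses neighbors
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])

belief = np.full(3, 1 / 3)           # uniform prior over 3 lanes
for obs in [0, 0, 1, 1, 1]:          # toy sequence of detected lanes
    belief = E[:, obs] * (T.T @ belief)   # predict, then correct
    belief /= belief.sum()
print(np.round(belief, 3))           # mass shifts toward lane 1
```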
Face Destylization
Title | Face Destylization |
Authors | Fatemeh Shiri, Xin Yu, Fatih Porikli, Piotr Koniusz |
Abstract | Numerous style transfer methods that produce artistic styles of portraits have been proposed to date. However, the inverse problem of converting stylized portraits back into realistic faces has yet to be investigated thoroughly. Reverting an artistic portrait to its original photo-realistic face image has the potential to facilitate human perception and identity analysis. In this paper, we propose a novel Face Destylization Neural Network (FDNN) to restore latent photo-realistic faces from stylized ones. We develop a Style Removal Network composed of convolutional, fully-connected and deconvolutional layers. The convolutional layers extract facial components from stylized face images. Subsequently, the fully-connected layer maps the extracted feature maps of stylized images to the corresponding feature maps of real faces, and the deconvolutional layers generate real faces from the transferred feature maps. To encourage the destylized faces to resemble authentic face images, we employ a discriminative network consisting of convolutional and fully-connected layers. We demonstrate the effectiveness of our network through experiments on an extensive set of synthetic images. Furthermore, we show that our network can recover faces from stylized portraits and real paintings for which the stylized data was unavailable during the training phase. (An illustrative sketch of the conv-fc-deconv shape follows this entry.) |
Tasks | Style Transfer |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01237v1 |
PDF | http://arxiv.org/pdf/1802.01237v1.pdf |
PWC | https://paperswithcode.com/paper/face-destylization |
Repo | |
Framework | |
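A minimal PyTorch sketch of the convolutional, fully-connected, deconvolutional shape described for the Style Removal Network. Channel counts and the 32x32 resolution are invented, and the discriminator and training losses are omitted.

```python
# Invented-size sketch of the conv -> fc -> deconv Style Removal Network.
import torch
import torch.nn as nn

class StyleRemovalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(                  # extract facial components
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(), # 16 -> 8
        )
        self.fc = nn.Sequential(                   # stylized -> real features
            nn.Flatten(), nn.Linear(64 * 8 * 8, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
        )
        self.dec = nn.Sequential(                  # generate a real face
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, stylized):                   # (B, 3, 32, 32)
        return self.dec(self.fc(self.enc(stylized)))

print(StyleRemovalNet()(torch.randn(1, 3, 32, 32)).shape)  # (1, 3, 32, 32)
```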
CReaM: Condensed Real-time Models for Depth Prediction using Convolutional Neural Networks
Title | CReaM: Condensed Real-time Models for Depth Prediction using Convolutional Neural Networks |
Authors | Andrew Spek, Thanuja Dharmasiri, Tom Drummond |
Abstract | Since the resurgence of CNNs, the robotic vision community has developed a range of algorithms that perform classification, semantic segmentation and structure prediction (depths, normals, surface curvature) using neural networks. While some of these models achieve state-of-the-art results and superhuman performance, deploying them in a time-critical robotic environment remains an ongoing challenge. Real-time frameworks are of paramount importance to build a robotic society where humans and robots integrate seamlessly. To this end, we present a novel real-time structure prediction framework that predicts depth at 30 fps on an NVIDIA TX2. At the time of writing, this is the first piece of work to showcase such a capability on a mobile platform. We also demonstrate with extensive experiments that neural networks with very large model capacities can be leveraged to train accurate condensed model architectures in a “from teacher to student” style of knowledge transfer. (An illustrative sketch of such a distillation loss follows this entry.) |
Tasks | Depth Estimation, Semantic Segmentation, Transfer Learning |
Published | 2018-07-24 |
URL | http://arxiv.org/abs/1807.08931v1 |
PDF | http://arxiv.org/pdf/1807.08931v1.pdf |
PWC | https://paperswithcode.com/paper/cream-condensed-real-time-models-for-depth |
Repo | |
Framework | |
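The “from teacher to student” knowledge transfer can be sketched as a blended regression loss in which the student imitates the teacher's depth maps alongside ground truth. The L1 terms and the 50/50 weighting are assumptions, not the paper's exact objective.

```python
# Blended teacher/ground-truth supervision for a depth student network.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, gt, alpha=0.5):
    """L1 imitation of the teacher's depth map, mixed with true depth."""
    return (alpha * F.l1_loss(student, teacher)
            + (1 - alpha) * F.l1_loss(student, gt))

s = torch.rand(2, 1, 48, 64)   # student depth prediction (toy sizes)
t = torch.rand(2, 1, 48, 64)   # frozen teacher's prediction
g = torch.rand(2, 1, 48, 64)   # ground-truth depth
print(distillation_loss(s, t, g).item())
```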
An EMG Gesture Recognition System with Flexible High-Density Sensors and Brain-Inspired High-Dimensional Classifier
Title | An EMG Gesture Recognition System with Flexible High-Density Sensors and Brain-Inspired High-Dimensional Classifier |
Authors | Ali Moin, Andy Zhou, Abbas Rahimi, Simone Benatti, Alisha Menon, Senam Tamakloe, Jonathan Ting, Natasha Yamamoto, Yasser Khan, Fred Burghardt, Luca Benini, Ana C. Arias, Jan M. Rabaey |
Abstract | EMG-based gesture recognition shows promise for human-machine interaction, but systems are often afflicted by signal and electrode variability that degrades performance over time. We present an end-to-end system combating this variability using a large-area, high-density sensor array and a robust classification algorithm. EMG electrodes are fabricated on a flexible substrate and interfaced to a custom wireless device for 64-channel signal acquisition and streaming. We use brain-inspired high-dimensional (HD) computing for processing EMG features in one-shot learning. The HD algorithm is tolerant to noise and electrode misplacement and can quickly learn from a few gestures without gradient descent or back-propagation. We achieve an average classification accuracy of 96.64% for five gestures, with only 7% degradation when training and testing across different days. Our system maintains this accuracy when trained with only three trials of gestures; it also demonstrates accuracy comparable to the state of the art when trained with one trial. (An illustrative sketch of HD classification follows this entry.) |
Tasks | Gesture Recognition, One-Shot Learning |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10237v2 |
PDF | http://arxiv.org/pdf/1802.10237v2.pdf |
PWC | https://paperswithcode.com/paper/an-emg-gesture-recognition-system-with |
Repo | |
Framework | |
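A toy sketch of the brain-inspired HD classification idea: fixed random bipolar hypervectors encode the 64 channels, a few examples per gesture are bundled into class prototypes, and queries are matched by similarity, with no gradient descent involved. The encoding and toy data here are simplifications, not the paper's feature pipeline.

```python
# Toy HD classifier: random bipolar channel hypervectors, bundled class
# prototypes, similarity-based matching. Encoding is a simplification.
import numpy as np

rng = np.random.default_rng(0)
D, n_channels = 10_000, 64
channel_hv = rng.choice([-1, 1], size=(n_channels, D))  # fixed random codes

def encode(features):                   # (n_channels,) -> bipolar (D,)
    return np.sign(features @ channel_hv)

def train(examples_by_class):           # one prototype per gesture class
    return {c: np.sign(sum(encode(x) for x in xs))
            for c, xs in examples_by_class.items()}

def classify(prototypes, features):     # nearest prototype by dot product
    query = encode(features)
    return max(prototypes, key=lambda c: prototypes[c] @ query)

means = {c: rng.normal(size=n_channels) for c in range(5)}  # 5 toy gestures
data = {c: [means[c] + 0.3 * rng.normal(size=n_channels) for _ in range(3)]
        for c in range(5)}
protos = train(data)
query = means[2] + 0.3 * rng.normal(size=n_channels)
print(classify(protos, query))          # expected to print 2
```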
From the User to the Medium: Neural Profiling Across Web Communities
Title | From the User to the Medium: Neural Profiling Across Web Communities |
Authors | Mohammad Akbari, Kunal Relia, Anas Elghafari, Rumi Chunara |
Abstract | Online communities provide a unique way for individuals to access information from those in similar circumstances, which can be critical for health conditions that require daily and personalized management. As these groups and topics often arise organically, identifying the types of topics discussed is necessary to understand their needs. Moreover, these communities and the people in them can be quite diverse, yet existing community detection methods have not been extended to evaluate such heterogeneity, in part because they have not exploited semantic relations between textual features of user-generated content. We therefore develop an approach, NeuroCom, that finds dense groups of users as communities in a latent space inferred from neural representations of the users’ published content. By embedding words and messages, we show that NeuroCom demonstrates improved clustering and identifies more nuanced discussion topics than other common unsupervised learning approaches. (An illustrative stand-in pipeline follows this entry.) |
Tasks | Community Detection |
Published | 2018-12-03 |
URL | https://arxiv.org/abs/1812.00912v1 |
PDF | https://arxiv.org/pdf/1812.00912v1.pdf |
PWC | https://paperswithcode.com/paper/from-the-user-to-the-medium-neural-profiling |
Repo | |
Framework | |
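NeuroCom's exact neural representation is not specified in this summary, so the following is a generic stand-in for the overall shape of the pipeline: embed users' text into a latent space, then cluster to find dense user groups. TF-IDF plus truncated SVD is an assumed substitute for the paper's neural embedding.

```python
# Stand-in pipeline: latent text space + clustering to find communities.
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

user_posts = [                      # one document of posts per user
    "insulin dose glucose monitor readings",
    "blood sugar insulin pump supplies",
    "marathon training plan and shoes",
    "running pace long run training",
]
latent = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
X = latent.fit_transform(user_posts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: health-management users vs. running users
```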
Accounting for Unobservable Heterogeneity in Cross Section Using Spatial First Differences
Title | Accounting for Unobservable Heterogeneity in Cross Section Using Spatial First Differences |
Authors | Hannah Druckenmiller, Solomon Hsiang |
Abstract | We develop a cross-sectional research design to identify causal effects in the presence of unobservable heterogeneity without instruments. When units are dense in physical space, it may be sufficient to regress the “spatial first differences” (SFD) of the outcome on the treatment and omit all covariates. The identifying assumptions of SFD are similar in mathematical structure and plausibility to those of other quasi-experimental designs. We use SFD to obtain new estimates for the effects of time-invariant geographic factors, soil and climate, on long-run agricultural productivities, relationships crucial for economic decisions such as land management and climate policy, but notoriously confounded by unobservables. (A worked toy example follows this entry.) |
Tasks | Time Series |
Published | 2018-10-16 |
URL | https://arxiv.org/abs/1810.07216v2 |
PDF | https://arxiv.org/pdf/1810.07216v2.pdf |
PWC | https://paperswithcode.com/paper/accounting-for-unobservable-heterogeneity-in |
Repo | |
Framework | |
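A worked toy example of the SFD idea: when an unobserved confounder varies smoothly across space, differencing adjacent units removes most of it, so regressing the differenced outcome on the differenced treatment recovers the causal slope that naive cross-sectional OLS misses. All data-generating numbers below are invented.

```python
# Toy SFD demo: a spatially smooth confounder biases naive OLS but
# (mostly) drops out of adjacent differences. Numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 500
confounder = np.cumsum(rng.normal(0, 0.1, n))     # smooth across space
treatment = 0.5 * confounder + rng.normal(0, 1, n)
outcome = 2.0 * treatment + 3.0 * confounder + rng.normal(0, 1, n)

naive = np.polyfit(treatment, outcome, 1)[0]      # biased upward
dy, dx = np.diff(outcome), np.diff(treatment)
sfd = (dx @ dy) / (dx @ dx)                       # SFD slope estimate
print(f"naive OLS: {naive:.2f}  SFD: {sfd:.2f}  truth: 2.00")
```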
#SarcasmDetection is soooo general! Towards a Domain-Independent Approach for Detecting Sarcasm
Title | #SarcasmDetection is soooo general! Towards a Domain-Independent Approach for Detecting Sarcasm |
Authors | Natalie Parde, Rodney D. Nielsen |
Abstract | Automatic sarcasm detection methods have traditionally been designed for maximum performance on a specific domain. This poses challenges for those wishing to transfer those approaches to other existing or novel domains, which may be typified by very different language characteristics. We develop a general set of features and evaluate it under different training scenarios using in-domain and/or out-of-domain training data. The best-performing scenario, training on both while employing a domain adaptation step, achieves an F1 of 0.780, well above baseline F1-measures of 0.515 and 0.345. We also show that the approach outperforms the best results from prior work on the same target domain. (An illustrative sketch of one such adaptation step follows this entry.) |
Tasks | Domain Adaptation, Sarcasm Detection |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.03369v1 |
PDF | http://arxiv.org/pdf/1806.03369v1.pdf |
PWC | https://paperswithcode.com/paper/sarcasmdetection-is-soooo-general-towards-a |
Repo | |
Framework | |
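One widely used “domain adaptation step” for training on in-domain plus out-of-domain data is Daumé III's (2007) feature augmentation; whether this is the exact step used in the paper is an assumption here. Each feature vector is copied into a shared block plus one block per domain, letting the learner weight general and domain-specific evidence separately.

```python
# Daume III (2007) "frustratingly easy" feature augmentation sketch.
import numpy as np

def augment(x, domain, domains=("source", "target")):
    """Return [shared | source-only | target-only] copies of x."""
    blocks = [x] + [x if domain == d else np.zeros_like(x) for d in domains]
    return np.concatenate(blocks)

x = np.array([1.0, 2.0, 3.0])
print(augment(x, "source"))  # [1. 2. 3. 1. 2. 3. 0. 0. 0.]
print(augment(x, "target"))  # [1. 2. 3. 0. 0. 0. 1. 2. 3.]
```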
Learning Implicit Generative Models by Teaching Explicit Ones
Title | Learning Implicit Generative Models by Teaching Explicit Ones |
Authors | Chao Du, Kun Xu, Chongxuan Li, Jun Zhu, Bo Zhang |
Abstract | Implicit generative models are difficult to train because no explicit density functions are defined. Generative adversarial nets (GANs) present a minimax framework to train such models, which, however, can suffer from mode collapse due to the nature of the JS-divergence. This paper presents a learning-by-teaching (LBT) approach to learning implicit models, which intrinsically avoids the mode collapse problem by optimizing a KL-divergence rather than the JS-divergence used in GANs. In LBT, an auxiliary explicit model is introduced to fit the distribution defined by the implicit model, while the latter teaches the explicit model to match the data distribution. LBT is formulated as a bilevel optimization problem whose optimal generator matches the true data distribution. LBT can be naturally integrated with GANs to derive a hybrid LBT-GAN that enjoys complementary benefits. Finally, we present a stochastic gradient ascent algorithm with unrolling to solve the challenging learning problems. Experimental results demonstrate the effectiveness of our method. (A toy sketch of the teaching loop follows this entry.) |
Tasks | Bilevel Optimization |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03870v2 |
PDF | http://arxiv.org/pdf/1807.03870v2.pdf |
PWC | https://paperswithcode.com/paper/learning-implicit-generative-models-by |
Repo | |
Framework | |
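A deliberately tiny sketch of the teaching loop: an explicit model (here a 1-D Gaussian, fitted in closed form) is made to explain the implicit generator's samples, and the generator is updated so that this fitted model also explains real data. A faithful implementation would unroll the inner optimization as the paper describes; this toy version skips that.

```python
# Toy teaching loop: explicit 1-D Gaussian fits the generator's samples;
# the generator moves so the fitted Gaussian explains real data too.
import torch

real = torch.randn(512) * 2.0 + 5.0           # real data: N(5, 2)
shift = torch.tensor(0.0, requires_grad=True) # the "implicit generator"
opt = torch.optim.Adam([shift], lr=0.05)

for _ in range(300):
    fake = torch.randn(512) + shift           # implicit samples
    mu, sigma = fake.mean(), fake.std()       # inner MLE fit (closed form)
    nll = -torch.distributions.Normal(mu, sigma).log_prob(real).mean()
    opt.zero_grad()
    nll.backward()                            # teach: explain the real data
    opt.step()

print(shift.item())                           # drifts toward 5.0
```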
Adaptive Shooting for Bots in First Person Shooter Games Using Reinforcement Learning
Title | Adaptive Shooting for Bots in First Person Shooter Games Using Reinforcement Learning |
Authors | Frank G. Glavin, Michael G. Madden |
Abstract | In current state-of-the-art commercial first-person shooter games, computer-controlled bots, also known as non-player characters, can often be easily distinguished from those controlled by humans. Tell-tale signs such as failed navigation, “sixth sense” knowledge of human players’ whereabouts and deterministic, scripted behaviors are some of the causes of this. We propose, however, that one of the biggest indicators of non-humanlike behavior in these games can be found in the weapon-shooting capability of the bot. Consistently perfect accuracy and “locking on” to opponents in their visual field from any distance are capabilities of bots that are not found in human players. Traditionally, the bot is handicapped in some way, with either a timed reaction delay or a random perturbation of its aim, which does not adapt or improve over time. We hypothesize that enabling the bot to learn the skill of shooting through trial and error, in the same way a human player learns, will lead to greater variation in game-play and produce less predictable non-player characters. This paper describes a reinforcement learning shooting mechanism that adapts shooting over time based on a dynamic reward signal derived from the damage caused to opponents. (An illustrative tabular sketch follows this entry.) |
Tasks | |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05554v1 |
PDF | http://arxiv.org/pdf/1806.05554v1.pdf |
PWC | https://paperswithcode.com/paper/adaptive-shooting-for-bots-in-first-person |
Repo | |
Framework | |
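The abstract does not pin down the exact algorithm, so the sketch below uses plain tabular Q-learning with damage as the reward; the discretized states, aim-offset actions and toy damage model are all invented stand-ins for the paper's setup.

```python
# Tabular Q-learning with damage as reward; states/actions are invented.
import random

actions = [-2, -1, 0, 1, 2]                  # aim offset, in degrees
Q = {(s, a): 0.0 for s in range(5) for a in actions}
alpha, gamma, eps = 0.1, 0.9, 0.1

def damage(state, a):                        # toy: best offset = state - 2
    return max(0.0, 3.0 - abs((state - 2) - a))

state = random.randrange(5)
for _ in range(5000):
    a = (random.choice(actions) if random.random() < eps
         else max(actions, key=lambda b: Q[(state, b)]))
    reward, nxt = damage(state, a), random.randrange(5)
    best_next = max(Q[(nxt, b)] for b in actions)
    Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
    state = nxt

print({s: max(actions, key=lambda b: Q[(s, b)]) for s in range(5)})
# roughly {0: -2, 1: -1, 2: 0, 3: 1, 4: 2} after learning
```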
A Boo(n) for Evaluating Architecture Performance
Title | A Boo(n) for Evaluating Architecture Performance |
Authors | Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst |
Abstract | We point out important problems with the common practice of using the best single-model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, such as random parameter initialization and random data shuffling. Reporting the best single-model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems. (A worked sketch of the expectation follows this entry.) |
Tasks | |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.01961v2 |
PDF | http://arxiv.org/pdf/1807.01961v2.pdf |
PWC | https://paperswithcode.com/paper/a-boon-for-evaluating-architecture |
Repo | |
Framework | |
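The expectation behind a best-out-of-$n$ statistic can be computed from $m$ recorded runs via order statistics of the empirical distribution (sampling with replacement). Whether this matches the paper's exact $\text{Boo}_n$ normalization is an assumption; the expectation itself is standard.

```python
# Expected best-of-n from m runs via empirical order statistics.
import numpy as np

def expected_best_of_n(results, n):
    r = np.sort(np.asarray(results, dtype=float))
    m = len(r)
    i = np.arange(1, m + 1)
    w = (i / m) ** n - ((i - 1) / m) ** n    # P(the max equals r_i)
    return float(w @ r)

runs = [0.71, 0.74, 0.69, 0.73, 0.72, 0.75]  # validation scores of 6 runs
for n in (1, 2, 5):
    print(n, round(expected_best_of_n(runs, n), 4))
# n=1 reproduces the mean; larger n approaches the single best run
```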
Visual-Texual Emotion Analysis with Deep Coupled Video and Danmu Neural Networks
Title | Visual-Texual Emotion Analysis with Deep Coupled Video and Danmu Neural Networks |
Authors | Chenchen Li, Jialin Wang, Hongwei Wang, Miao Zhao, Wenjie Li, Xiaotie Deng |
Abstract | User emotion analysis for videos aims to automatically recognize the general emotional status of viewers from the multimedia content embedded in the online video stream. Existing works fall into two categories: 1) visual-based methods, which focus on visual content and extract a specific set of features from videos; however, it is generally hard to learn a mapping from low-level video pixels to the high-level emotion space due to great intra-class variance; and 2) textual-based methods, which focus on user-generated comments associated with videos; however, the word representations learned by traditional linguistic approaches typically lack emotion information, and global comments usually reflect viewers’ high-level understandings rather than instantaneous emotions. To address these limitations, we propose to jointly exploit video content and user-generated texts for emotion analysis. In particular, we exploit a new type of user-generated text, “danmu”: real-time comments floating on the video that convey rich information about viewers’ emotional opinions. To enhance the emotion discriminativeness of words in textual feature extraction, we propose Emotional Word Embedding (EWE), which learns text representations by jointly considering their semantics and emotions. We then propose a novel visual-textual emotion analysis model with Deep Coupled Video and Danmu Neural networks (DCVDN), in which visual and textual features are synchronously extracted and fused into a comprehensive representation by deep-canonically-correlated-autoencoder-based multi-view learning. Through extensive experiments on a self-crawled real-world video-danmu dataset, we show that DCVDN significantly outperforms state-of-the-art baselines. (An illustrative sketch of the EWE idea follows this entry.) |
Tasks | Emotion Recognition, Multi-View Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07485v1 |
PDF | http://arxiv.org/pdf/1811.07485v1.pdf |
PWC | https://paperswithcode.com/paper/visual-texual-emotion-analysis-with-deep |
Repo | |
Framework | |
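A hedged sketch of the EWE idea only: word vectors receive gradients from both a semantic (context-prediction) head and an emotion-label head, so emotion information shapes the embedding. Model sizes, the toy batch and the simple summed loss are assumptions; the DCVDN fusion model is not sketched here.

```python
# Word vectors trained with both a semantic and an emotion objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, n_emotions = 1000, 64, 6
emb = nn.Embedding(vocab, dim)
ctx_head = nn.Linear(dim, vocab)            # predict a context word
emo_head = nn.Linear(dim, n_emotions)       # predict an emotion label
params = (list(emb.parameters()) + list(ctx_head.parameters())
          + list(emo_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

words = torch.randint(0, vocab, (32,))      # toy batch of target words
ctx = torch.randint(0, vocab, (32,))        # toy context words
emo = torch.randint(0, n_emotions, (32,))   # toy emotion labels
v = emb(words)
loss = F.cross_entropy(ctx_head(v), ctx) + F.cross_entropy(emo_head(v), emo)
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```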
Least Square Error Method Robustness of Computation: What is not usually considered and taught
Title | Least Square Error Method Robustness of Computation: What is not usually considered and taught |
Authors | Vaclav Skala |
Abstract | There are many practical applications based on Least Square Error (LSE) approximation, which minimizes the squared error along a “vertical” axis. The LSE method is simple and convenient for analytical purposes as well. However, if the data span several orders of magnitude, or if non-linear LSE is used, severe numerical instability can be expected. This contribution describes a simple method for LSE computation over data with a large span, which is especially convenient when the “standard” pseudoinverse matrix is ill-conditioned. It is based on an LSE solution using orthogonal basis vectors instead of orthonormal basis vectors. The presented approach has been used for linear regression as well as for approximation using radial basis functions. (A numerical sketch of the conditioning issue follows this entry.) |
Tasks | |
Published | 2018-01-01 |
URL | http://arxiv.org/abs/1802.07591v1 |
PDF | http://arxiv.org/pdf/1802.07591v1.pdf |
PWC | https://paperswithcode.com/paper/least-square-error-method-robustness-of |
Repo | |
Framework | |
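A numerical sketch of the conditioning issue the abstract raises: on data spanning many orders of magnitude, the normal-equations (pseudoinverse) route squares the condition number and loses accuracy, while a QR factorization, which works with orthogonal basis vectors, fares much better. The toy polynomial data below are invented.

```python
# Conditioning demo: normal equations vs. QR on a monomial basis over
# data spanning ~7 orders of magnitude. Coefficients are invented.
import numpy as np

x = np.linspace(1e-3, 1e4, 2000)
y = 2.0 + 1e-4 * x + 1e-8 * x**2                 # true coefficients
A = np.vander(x, 3, increasing=True)             # columns [1, x, x^2]

print(np.linalg.cond(A.T @ A))                   # enormous: near 1/eps

coef_ne = np.linalg.pinv(A.T @ A) @ (A.T @ y)    # pseudoinverse route
Q, R = np.linalg.qr(A)                           # orthogonal-basis route
coef_qr = np.linalg.solve(R, Q.T @ y)

print(coef_ne)  # typically far from [2, 1e-4, 1e-8]
print(coef_qr)  # much closer to the true coefficients
```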