Paper Group ANR 522
Robust Multimodal Image Registration Using Deep Recurrent Reinforcement Learning
Title | Robust Multimodal Image Registration Using Deep Recurrent Reinforcement Learning |
Authors | Shanhui Sun, Jing Hu, Mingqing Yao, Jinrong Hu, Xiaodong Yang, Qi Song, Xi Wu |
Abstract | The crucial components of a conventional image registration method are the choice of the right feature representations and similarity measures. These two components, although elaborately designed, are somewhat handcrafted using human knowledge. To this end, these two components are tackled in an end-to-end manner via reinforcement learning in this work. Specifically, an artificial agent, which is composed of a combined policy and value network, is trained to adjust the moving image in the right direction. We train this network using an asynchronous reinforcement learning algorithm, where a customized reward function is also leveraged to encourage robust image registration. This trained network is further incorporated with a lookahead inference to improve the registration capability. The advantage of this algorithm is fully demonstrated by our superior performance on clinical MR and CT image pairs compared with other state-of-the-art medical image registration methods. |
Tasks | Image Registration, Medical Image Registration |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2002.03733v1 |
https://arxiv.org/pdf/2002.03733v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-multimodal-image-registration-using |
Repo | |
Framework | |
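The agent-based formulation above can be sketched with a toy rigid-registration environment: a distance-based reward (positive when an action moves the images toward alignment) and a greedy one-step lookahead rule standing in for the trained policy/value network. The 2D translation-only action space and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy 2D rigid-registration environment: the agent nudges the moving image's
# translation one pixel at a time, and the reward is the reduction in distance
# to a (training-time) ground-truth offset, echoing the paper's distance-based
# reward idea.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # +x, -x, +y, -y pixel shifts

class RegistrationEnv:
    def __init__(self, target_offset):
        self.target = np.array(target_offset, dtype=float)
        self.offset = np.zeros(2)  # current translation of the moving image

    def step(self, action_idx):
        prev_dist = np.linalg.norm(self.offset - self.target)
        self.offset += ACTIONS[action_idx]
        dist = np.linalg.norm(self.offset - self.target)
        reward = prev_dist - dist  # positive when moving toward alignment
        return self.offset.copy(), reward, dist < 0.5

def greedy_lookahead(offset, target):
    # one-step lookahead: pick the action whose resulting offset is closest
    return int(np.argmin([np.linalg.norm(offset + np.array(a) - target)
                          for a in ACTIONS]))

env = RegistrationEnv(target_offset=(3, -2))
obs, done, steps = env.offset.copy(), False, 0
while not done and steps < 20:
    obs, reward, done = env.step(greedy_lookahead(obs, env.target))
    steps += 1
```

In the paper the lookahead is applied to the learned value estimates rather than to a known target, which is only available at training time.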
Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection
Title | Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection |
Authors | Xixia Xu, Qi Zou, Xue Lin |
Abstract | We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation. Due to enhanced feature representation, our method can well handle crowded, cluttered and occluded scenes. More specifically, a Feature Aggregation and Selection Module (FASM), which constructs hierarchical multi-scale feature aggregation and makes the aggregated features discriminative, is proposed to get more accurate fine-grained representation, leading to more precise joint locations. Then, we perform a simple Feature Fusion (FF) strategy which effectively fuses high-resolution spatial features and low-resolution semantic features to obtain more reliable context information for well-estimated joints. Finally, we build a Dense Upsampling Convolution (DUC) module to generate more precise predictions, which can recover missing joint details that are usually unavailable in common upsampling processes. As a result, the predicted keypoint heatmaps are more accurate. Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods and achieves superior performance on three benchmark datasets: the recent large-scale CrowdPose dataset, the COCO keypoint detection dataset and the MPII Human Pose dataset. Our code will be released upon acceptance. |
Tasks | Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.10238v1 |
https://arxiv.org/pdf/2003.10238v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-person-pose-estimation-with-enhanced-1 |
Repo | |
Framework | |
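The Dense Upsampling Convolution head described above recovers resolution by predicting r² sub-pixel values per location and rearranging them, rather than interpolating. A minimal numpy sketch of that rearrangement (the pixel-shuffle step; the preceding learned convolution is omitted, and the channel layout convention is an assumption):

```python
import numpy as np

def dense_upsampling(x, r):
    # x: (C*r*r, H, W) feature map -> (C, H*r, W*r). Each group of r*r channels
    # at spatial position (h, w) fills the r x r output patch at (h*r, w*r);
    # the upsampling is learned by the conv that produced x, not interpolated.
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

This is the same rearrangement PyTorch exposes as `nn.PixelShuffle`; the keypoint heatmaps are then read off the upsampled map.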
AnimePose: Multi-person 3D pose estimation and animation
Title | AnimePose: Multi-person 3D pose estimation and animation |
Authors | Laxman Kumarapu, Prerana Mukherjee |
Abstract | 3D animation of humans in action is quite challenging as it involves using a huge setup with several motion trackers all over the person’s body to track the movements of every limb. This is time-consuming and may cause the person discomfort in wearing exoskeleton body suits with motion sensors. In this work, we present a simple yet effective solution to generate 3D animation of multiple persons from a 2D video using deep learning. Although significant improvement has been achieved recently in 3D human pose estimation, most prior works perform well only for single-person pose estimation, while multi-person pose estimation remains a challenging problem. In this work, we first propose a supervised multi-person 3D pose estimation and animation framework, namely AnimePose, for a given input RGB video sequence. The pipeline of the proposed system consists of various modules: i) person detection and segmentation, ii) depth map estimation, iii) lifting 2D to 3D information for person localization, and iv) person trajectory prediction and human pose tracking. Our proposed system produces results comparable to previous state-of-the-art 3D multi-person pose estimation methods on the publicly available MuCo-3DHP and MuPoTS-3D datasets, and it also outperforms previous state-of-the-art human pose tracking methods by a significant margin of 11.7% performance gain in MOTA score on the PoseTrack 2018 dataset. |
Tasks | 3D Human Pose Estimation, 3D Multi-person Pose Estimation, 3D Pose Estimation, Human Detection, Multi-Person Pose Estimation, Pose Estimation, Pose Tracking, Trajectory Prediction |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02792v1 |
https://arxiv.org/pdf/2002.02792v1.pdf | |
PWC | https://paperswithcode.com/paper/animepose-multi-person-3d-pose-estimation-and |
Repo | |
Framework | |
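Module iii) of the pipeline, lifting 2D detections to 3D using the estimated depth map, amounts to pinhole back-projection at each keypoint. A minimal sketch assuming known camera intrinsics (`fx`, `fy`, `cx`, `cy` are hypothetical parameters; the paper's actual lifting stage is more involved):

```python
def lift_2d_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy):
    # Back-project pixel keypoints (u, v) into camera coordinates using the
    # estimated depth at that pixel and pinhole intrinsics:
    #   X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth(u, v)
    pts3d = []
    for u, v in keypoints_2d:
        z = depth_map[int(v)][int(u)]  # depth indexed as [row][col]
        pts3d.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return pts3d
```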
Schema-Guided Dialogue State Tracking Task at DSTC8
Title | Schema-Guided Dialogue State Tracking Task at DSTC8 |
Authors | Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan |
Abstract | This paper gives an overview of the Schema-Guided Dialogue State Tracking task of the 8th Dialogue System Technology Challenge. The goal of this task is to develop dialogue state tracking models suitable for large-scale virtual assistants, with a focus on data-efficient joint modeling across domains and zero-shot generalization to new APIs. This task provided a new dataset, consisting of over 16,000 dialogues in the training set spanning 16 domains, to highlight these challenges, and a baseline model capable of zero-shot generalization to new APIs. Twenty-five teams participated, developing a range of neural network models that exceeded the baseline's performance by a wide margin. The submissions incorporated a variety of pre-trained encoders and data augmentation techniques. This paper describes the task definition, dataset and evaluation methodology. We also summarize the approach and results of the submitted systems to highlight the overall trends in the state-of-the-art. |
Tasks | Data Augmentation, Dialogue State Tracking |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.01359v1 |
https://arxiv.org/pdf/2002.01359v1.pdf | |
PWC | https://paperswithcode.com/paper/schema-guided-dialogue-state-tracking-task-at |
Repo | |
Framework | |
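The zero-shot generalization described above rests on conditioning the tracker on natural-language slot descriptions from the schema rather than on slot-specific parameters. A toy bag-of-words sketch of that idea (real submissions used pre-trained encoders such as BERT; `match_slot` and the example schema below are illustrative assumptions):

```python
from collections import Counter
import math

def bow(text):
    # bag-of-words term counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_slot(utterance, schema):
    # schema maps slot names to natural-language descriptions; the description,
    # not a slot-specific classifier, drives the match. That is the crux of
    # zero-shot transfer: a new API only needs new descriptions.
    scores = {s: cosine(bow(utterance), bow(desc)) for s, desc in schema.items()}
    return max(scores, key=scores.get)

# hypothetical two-slot schema for a restaurant service
schema = {
    "restaurant_name": "name of the restaurant to book",
    "party_size": "number of people for the restaurant booking",
}
```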
Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending
Title | Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending |
Authors | Steve Tsham Mpinda Ataky, Jonathan de Matos, Alceu de S. Britto Jr., Luiz E. S. Oliveira, Alessandro L. Koerich |
Abstract | Data imbalance is a major problem that affects several machine learning algorithms. Such problems are troublesome because most learning algorithms attempt to optimize a loss function based on error measures that do not take into account the data imbalance. Accordingly, the learning algorithm simply generates a trivial model that is biased toward predicting the most frequent class in the training data. Data augmentation techniques have been used to mitigate the data imbalance problem. However, in the case of histopathologic images (HIs), low-level as well as high-level data augmentation techniques still present performance issues when applied in the presence of inter-patient variability, whereby the model tends to learn color representations that are in fact related to the staining process. In this paper, we propose an approach capable of not only augmenting an HI database but also distributing the inter-patient variability by means of image blending using Gaussian-Laplacian pyramids. The proposed approach consists of computing the Gaussian pyramids of two images from different patients and deriving their Laplacian pyramids. Afterwards, the left half of one image and the right half of another are joined at each level of the Laplacian pyramid, and from the joint pyramids, the original image is reconstructed. This composition, resulting from the blending process, combines stain variation of two patients, avoiding that color misleads the learning process. Experimental results on the BreakHis dataset have shown promising gains vis-à-vis the majority of traditional techniques presented in the literature. |
Tasks | Data Augmentation |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2002.00072v1 |
https://arxiv.org/pdf/2002.00072v1.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-for-histopathological |
Repo | |
Framework | |
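The blending procedure described in the abstract, namely building Laplacian pyramids of two patients' images, joining left and right halves at every level, then reconstructing, can be sketched with plain numpy. This sketch uses a simple box-filter downsample and nearest-neighbour upsample instead of the usual Gaussian kernel, and assumes image sides divisible by 2^levels:

```python
import numpy as np

def downsample(img):
    # 2x2 box-filter downsample (stand-in for a Gaussian blur + subsample)
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # nearest-neighbour 2x upsample (stand-in for Gaussian expand)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    pyr, cur = [], img
    for _ in range(levels):
        down = downsample(cur)
        pyr.append(cur - upsample(down))  # band-pass detail at this scale
        cur = down
    pyr.append(cur)  # low-frequency residual (top of the Gaussian pyramid)
    return pyr

def blend(img_a, img_b, levels=3):
    pyr_a = laplacian_pyramid(img_a, levels)
    pyr_b = laplacian_pyramid(img_b, levels)
    # join left half of A with right half of B at every pyramid level
    joined = []
    for a, b in zip(pyr_a, pyr_b):
        w = a.shape[1] // 2
        joined.append(np.hstack([a[:, :w], b[:, w:]]))
    # collapse the joint pyramid back into a full-resolution image
    out = joined[-1]
    for lap in reversed(joined[:-1]):
        out = upsample(out) + lap
    return out
```

Blending in the pyramid domain, rather than pasting halves directly, is what smooths the seam across scales and mixes the two patients' stain statistics.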
Real-Time Well Log Prediction From Drilling Data Using Deep Learning
Title | Real-Time Well Log Prediction From Drilling Data Using Deep Learning |
Authors | Rayan Kanfar, Obai Shaikh, Mehrdad Yousefzadeh, Tapan Mukerji |
Abstract | The objective is to study the feasibility of predicting subsurface rock properties in wells from real-time drilling data. Geophysical logs, namely, density, porosity and sonic logs are of paramount importance for subsurface resource estimation and exploitation. These wireline petro-physical measurements are selectively deployed as they are expensive to acquire; meanwhile, drilling information is recorded in every drilled well. Hence a predictive tool for wireline log prediction from drilling data can help management make decisions about data acquisition, especially for delineation and production wells. This problem is non-linear with strong interactions between drilling parameters; hence the potential for deep learning to address this problem is explored. We present a workflow for data augmentation and feature engineering using Distance-based Global Sensitivity Analysis. We propose an Inception-based Convolutional Neural Network combined with a Temporal Convolutional Network as the deep learning model. The model is designed to learn both low and high frequency content of the data. Twelve wells from the Equinor dataset for the Volve field in the North Sea are used for learning. The model predictions not only capture trends but are also physically consistent across density, porosity, and sonic logs. On the test data, the mean square error reaches a low value of 0.04 but the correlation coefficient plateaus around 0.6. The model is nevertheless able to differentiate between different types of rocks such as cemented sandstone, unconsolidated sands, and shale. |
Tasks | Data Augmentation, Feature Engineering |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10156v1 |
https://arxiv.org/pdf/2001.10156v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-well-log-prediction-from-drilling |
Repo | |
Framework | |
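The Temporal Convolutional Network component relies on causal dilated convolutions, so each predicted log sample depends only on current and earlier drilling measurements. A minimal single-channel sketch (the paper's model is a full Inception-CNN/TCN hybrid; this only illustrates the causal-dilation arithmetic):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    # y[t] = sum_k w[k] * x[t - k*dilation], with implicit zero-padding on the
    # left so that y[t] never depends on future samples (causality). Larger
    # dilations widen the receptive field without adding parameters.
    t_len, k_len = len(x), len(w)
    y = np.zeros(t_len)
    for t in range(t_len):
        for k in range(k_len):
            j = t - k * dilation
            if j >= 0:
                y[t] += w[k] * x[j]
    return y
```

Stacking such layers with dilations 1, 2, 4, ... gives the exponentially growing temporal context a TCN uses to capture both low- and high-frequency log content.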
Visualisation of Medical Image Fusion and Translation for Accurate Diagnosis of High Grade Gliomas
Title | Visualisation of Medical Image Fusion and Translation for Accurate Diagnosis of High Grade Gliomas |
Authors | Nishant Kumar, Nico Hoffmann, Matthias Kirsch, Stefan Gumhold |
Abstract | Medical image fusion combines two or more modalities into a single view, while medical image translation synthesizes new images and assists in data augmentation. Together, these methods help in faster diagnosis of high grade malignant gliomas. However, these methods might be untrustworthy, which is why neurosurgeons demand a robust visualisation tool to verify the reliability of the fusion and translation results before they make pre-operative surgical decisions. In this paper, we propose a novel approach to compute a confidence heat map between the source-target image pair by estimating the information transfer from the source to the target image using the joint probability distribution of the two images. We evaluate several fusion and translation methods using our visualisation procedure and showcase its robustness in enabling neurosurgeons to make finer clinical decisions. |
Tasks | Data Augmentation |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09535v3 |
https://arxiv.org/pdf/2001.09535v3.pdf | |
PWC | https://paperswithcode.com/paper/visualisation-of-medical-image-fusion-and |
Repo | |
Framework | |
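The confidence heat map idea, estimating information transfer from the joint probability distribution of the source-target pair, can be sketched as per-pixel pointwise mutual information read off a joint intensity histogram. The quantization scheme and bin count below are assumptions; the paper's exact estimator may differ:

```python
import numpy as np

def confidence_heatmap(src, tgt, bins=8):
    # Quantize both images, build their joint intensity histogram, and read off
    # pointwise mutual information log p(s,t) / (p(s) p(t)) for each pixel's
    # (source bin, target bin) pair: high where intensities reliably co-occur.
    def quantize(img):
        lo, hi = img.min(), img.max()
        q = ((img - lo) / (hi - lo + 1e-12) * bins).astype(int)
        return np.minimum(q, bins - 1)

    qs, qt = quantize(src), quantize(tgt)
    joint = np.zeros((bins, bins))
    np.add.at(joint, (qs.ravel(), qt.ravel()), 1.0)
    joint /= joint.sum()
    ps, pt = joint.sum(axis=1), joint.sum(axis=0)  # marginals
    with np.errstate(divide="ignore"):
        pmi = np.log(joint) - np.log(ps[:, None] * pt[None, :])
    return pmi[qs, qt]  # per-pixel confidence map
```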
Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses
Title | Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses |
Authors | Niharika Jain, Alberto Olmo, Sailik Sengupta, Lydia Manikonda, Subbarao Kambhampati |
Abstract | Recently, the use of synthetic data generated by GANs has become a popular method to do data augmentation for many applications. While practitioners celebrate this as an economical way to obtain synthetic data for training data-hungry machine learning models, it is not clear that they recognize the perils of such an augmentation technique when applied to an already-biased dataset. Although one expects GANs to replicate the distribution of the original data, in real-world settings with limited data and finite network capacity, GANs suffer from mode collapse, especially when the data comes from online social media platforms or the web, which are rarely balanced. In this paper, we show that in settings where data exhibits bias along some axes (e.g., gender, race), failure modes of Generative Adversarial Networks (GANs) exacerbate the biases in the generated data. More often than not, this bias is unavoidable; we empirically demonstrate that, given a dataset of headshots of engineering faculty collected from 47 online university directory webpages in the United States that is biased toward white males, a state-of-the-art (unconditional variant of) GAN “imagines” faces of synthetic engineering professors that have masculine facial features and white skin color (inferred using human studies and a state-of-the-art gender recognition system). We also conduct a preliminary case study to highlight how Snapchat’s explosively popular “female” filter (widely accepted to use a conditional variant of GAN) ends up consistently lightening the skin tones in women of color when trying to make face images appear more feminine. Our study is meant to serve as a cautionary tale for the lay practitioners who may unknowingly increase the bias in their training data by using GAN-based augmentation techniques with web data, and to showcase the dangers of using biased datasets for facial applications. |
Tasks | Data Augmentation |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09528v1 |
https://arxiv.org/pdf/2001.09528v1.pdf | |
PWC | https://paperswithcode.com/paper/imperfect-imaganation-implications-of-gans |
Repo | |
Framework | |
Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training
Title | Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training |
Authors | Seung Hee Yang, Minhwa Chung |
Abstract | Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than that of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were recorded for automatic voice-keyboard recognition in a previous study, the generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech. Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that the recognition performance is improved compared with the original dysarthric speech after performing adversarial training, as the absolute WER has been lowered by 33.4%. It demonstrates that the proposed GAN-based conversion method is useful for improving dysarthric speech intelligibility. |
Tasks | Speech Recognition |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.04260v1 |
https://arxiv.org/pdf/2001.04260v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-dysarthric-speech-intelligibility |
Repo | |
Framework | |
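The cycle-consistency constraint at the core of the dysarthric-to-healthy conversion requires that mapping features to the healthy domain and back reproduces the original. A toy sketch with affine stand-ins for the generators (the real model trains neural generators on spectral features and pairs this L1 term with adversarial losses):

```python
import numpy as np

def cycle_consistency_loss(g, f, x_dys, x_hea, lam=10.0):
    # g: dysarthric -> healthy generator, f: healthy -> dysarthric generator.
    # L1 reconstruction error after a full round trip in each direction,
    # weighted by lambda as in the usual CycleGAN objective.
    loss_dys = np.mean(np.abs(f(g(x_dys)) - x_dys))
    loss_hea = np.mean(np.abs(g(f(x_hea)) - x_hea))
    return lam * (loss_dys + loss_hea)

g = lambda x: 2.0 * x + 1.0    # stand-in generator
f = lambda x: (x - 1.0) / 2.0  # its exact inverse, so the cycle loss vanishes
x = np.array([0.5, -1.0, 2.0])
loss = cycle_consistency_loss(g, f, x, x)
```

The unpaired setting is the point: 18,700 dysarthric and 8,610 healthy utterances need not be parallel, because the cycle term, not aligned pairs, anchors content preservation.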
Open Challenge for Correcting Errors of Speech Recognition Systems
Title | Open Challenge for Correcting Errors of Speech Recognition Systems |
Authors | Marek Kubis, Zygmunt Vetulani, Mikołaj Wypych, Tomasz Ziętkiewicz |
Abstract | The paper announces a new long-term challenge for improving the performance of automatic speech recognition systems. The goal of the challenge is to investigate methods of correcting the recognition results on the basis of errors previously made by the speech processing system. The dataset prepared for the task is described and evaluation criteria are presented. |
Tasks | Speech Recognition |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.03041v1 |
https://arxiv.org/pdf/2001.03041v1.pdf | |
PWC | https://paperswithcode.com/paper/open-challenge-for-correcting-errors-of |
Repo | |
Framework | |
Streaming automatic speech recognition with the transformer model
Title | Streaming automatic speech recognition with the transformer model |
Authors | Niko Moritz, Takaaki Hori, Jonathan Le Roux |
Abstract | Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR). Recently, the transformer architecture, which uses self-attention to model temporal context information, has been shown to achieve significantly lower word error rates (WERs) compared to recurrent neural network (RNN) based system architectures. Despite this success, its practical use is limited to offline ASR tasks, since encoder-decoder architectures typically require an entire speech utterance as input. In this work, we propose a transformer based end-to-end ASR system for streaming ASR, where an output must be generated shortly after each spoken word. To achieve this, we apply time-restricted self-attention for the encoder and triggered attention for the encoder-decoder attention mechanism. Our proposed streaming transformer architecture achieves 2.8% and 7.2% WER for the “clean” and “other” test data of LibriSpeech, which to our knowledge is the best published streaming end-to-end ASR result for this task. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02674v4 |
https://arxiv.org/pdf/2001.02674v4.pdf | |
PWC | https://paperswithcode.com/paper/streaming-automatic-speech-recognition-with |
Repo | |
Framework | |
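The time-restricted self-attention used in the streaming encoder limits each frame's attention to a bounded window of left and right context, which is what bounds the latency. A numpy sketch of a single masked attention head (window sizes and shapes below are illustrative):

```python
import numpy as np

def time_restricted_self_attention(q, k, v, left, right):
    # Standard scaled dot-product attention, except frame t may only attend to
    # frames j with t - left <= j <= t + right; everything else is masked to
    # -inf before the softmax. A small `right` bounds streaming latency.
    t_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(t_len)
    allowed = (idx[None, :] >= idx[:, None] - left) & \
              (idx[None, :] <= idx[:, None] + right)
    scores = np.where(allowed, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

Because frame t never attends past t + right, the encoder can emit outputs after a fixed lookahead instead of waiting for the whole utterance.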
Aspect Term Extraction using Graph-based Semi-Supervised Learning
Title | Aspect Term Extraction using Graph-based Semi-Supervised Learning |
Authors | Gunjan Ansari, Chandni Saxena, Tanvir Ahmad, M. N. Doja |
Abstract | Aspect-based sentiment analysis is a major subarea of sentiment analysis. Many supervised and unsupervised approaches have been proposed in the past for detecting and analyzing the sentiment of aspect terms. In this paper, a graph-based semi-supervised learning approach for aspect term extraction is proposed. In this approach, every identified token in the review document is classified as an aspect or non-aspect term from a small set of labeled tokens using a label spreading algorithm. k-Nearest Neighbor (kNN) graph sparsification is employed in the proposed approach to make it more time and memory efficient. The proposed work is further extended to determine the polarity of the opinion words associated with the identified aspect terms in a review sentence, to generate a visual aspect-based summary of review documents. The experimental study is conducted on benchmark and crawled datasets of the restaurant and laptop domains with varying numbers of labeled instances. The results show that the proposed approach achieves good results in terms of Precision, Recall and Accuracy with limited availability of labeled data. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2003.04968v1 |
https://arxiv.org/pdf/2003.04968v1.pdf | |
PWC | https://paperswithcode.com/paper/aspect-term-extraction-using-graph-based-semi |
Repo | |
Framework | |
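The pipeline above, building a sparse kNN graph over tokens and then spreading a small set of aspect/non-aspect labels across it, can be sketched in numpy. This follows the standard label-spreading iteration with a symmetrically normalized graph; the token features, k, and α below are illustrative assumptions:

```python
import numpy as np

def knn_graph(x, k):
    # symmetric kNN adjacency; sparsifying to k neighbours is what keeps the
    # propagation time- and memory-efficient on large token sets
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros_like(d)
    for i in range(len(x)):
        w[i, np.argsort(d[i])[:k]] = 1.0
    return np.maximum(w, w.T)  # symmetrize

def label_spreading(w, y, alpha=0.9, iters=50):
    # y: -1 for unlabeled tokens, otherwise the class id (e.g. 0 = non-aspect,
    # 1 = aspect) of the small labeled seed set
    n_classes = y.max() + 1
    y0 = np.zeros((len(y), n_classes))
    y0[y >= 0, y[y >= 0]] = 1.0
    deg = w.sum(axis=1)
    deg[deg == 0] = 1.0
    s = w / np.sqrt(deg[:, None] * deg[None, :])  # D^{-1/2} W D^{-1/2}
    f = y0.copy()
    for _ in range(iters):
        f = alpha * s @ f + (1 - alpha) * y0  # spread, then clamp toward seeds
    return f.argmax(axis=1)
```

With token embeddings as `x`, two labeled seeds are enough here to label both clusters, which mirrors the paper's limited-labeled-data setting.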