Paper Group ANR 522
Robust Multimodal Image Registration Using Deep Recurrent Reinforcement Learning
Title | Robust Multimodal Image Registration Using Deep Recurrent Reinforcement Learning |
Authors | Shanhui Sun, Jing Hu, Mingqing Yao, Jinrong Hu, Xiaodong Yang, Qi Song, Xi Wu |
Abstract | The crucial components of a conventional image registration method are the choice of the right feature representations and similarity measures. These two components, although elaborately designed, are somewhat handcrafted using human knowledge. To this end, these two components are tackled in an end-to-end manner via reinforcement learning in this work. Specifically, an artificial agent, which is composed of a combined policy and value network, is trained to adjust the moving image in the right direction. We train this network using an asynchronous reinforcement learning algorithm, where a customized reward function is also leveraged to encourage robust image registration. This trained network is further incorporated with a lookahead inference to improve the registration capability. The advantage of this algorithm is fully demonstrated by our superior performance on clinical MR and CT image pairs compared with other state-of-the-art medical image registration methods. |
Tasks | Image Registration, Medical Image Registration |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2002.03733v1 |
https://arxiv.org/pdf/2002.03733v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-multimodal-image-registration-using |
Repo | |
Framework | |
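The agent-based formulation above can be sketched with a toy rigid-registration environment: a distance-based reward (positive when an action moves the images toward alignment) and a greedy one-step lookahead rule standing in for the trained policy/value network. The 2D translation-only action space and all names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy 2D rigid-registration environment: the agent nudges the moving image's
# translation one pixel at a time, and the reward is the reduction in distance
# to a (training-time) ground-truth offset, echoing the paper's distance-based
# reward idea.
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # +x, -x, +y, -y pixel shifts

class RegistrationEnv:
    def __init__(self, target_offset):
        self.target = np.array(target_offset, dtype=float)
        self.offset = np.zeros(2)  # current translation of the moving image

    def step(self, action_idx):
        prev_dist = np.linalg.norm(self.offset - self.target)
        self.offset += ACTIONS[action_idx]
        dist = np.linalg.norm(self.offset - self.target)
        reward = prev_dist - dist  # positive when moving toward alignment
        return self.offset.copy(), reward, dist < 0.5

def greedy_lookahead(offset, target):
    # one-step lookahead: pick the action whose resulting offset is closest
    return int(np.argmin([np.linalg.norm(offset + np.array(a) - target)
                          for a in ACTIONS]))

env = RegistrationEnv(target_offset=(3, -2))
obs, done, steps = env.offset.copy(), False, 0
while not done and steps < 20:
    obs, reward, done = env.step(greedy_lookahead(obs, env.target))
    steps += 1
```

In the paper the lookahead is applied to the learned value estimates rather than to a known target, which is only available at training time.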
Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection
Title | Multi-Person Pose Estimation with Enhanced Feature Aggregation and Selection |
Authors | Xixia Xu, Qi Zou, Xue Lin |
Abstract | We propose a novel Enhanced Feature Aggregation and Selection network (EFASNet) for multi-person 2D human pose estimation. Due to enhanced feature representation, our method can well handle crowded, cluttered and occluded scenes. More specifically, a Feature Aggregation and Selection Module (FASM), which constructs hierarchical multi-scale feature aggregation and makes the aggregated features discriminative, is proposed to get more accurate fine-grained representation, leading to more precise joint locations. Then, we perform a simple Feature Fusion (FF) strategy which effectively fuses high-resolution spatial features and low-resolution semantic features to obtain more reliable context information for well-estimated joints. Finally, we build a Dense Upsampling Convolution (DUC) module to generate more precise predictions, which can recover missing joint details that are usually unavailable in common upsampling processes. As a result, the predicted keypoint heatmaps are more accurate. Comprehensive experiments demonstrate that the proposed approach outperforms the state-of-the-art methods and achieves superior performance on three benchmark datasets: the recent large-scale CrowdPose dataset, the COCO keypoint detection dataset and the MPII Human Pose dataset. Our code will be released upon acceptance. |
Tasks | Keypoint Detection, Multi-Person Pose Estimation, Pose Estimation |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.10238v1 |
https://arxiv.org/pdf/2003.10238v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-person-pose-estimation-with-enhanced-1 |
Repo | |
Framework | |
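The Dense Upsampling Convolution head described above recovers resolution by predicting r² sub-pixel values per location and rearranging them, rather than interpolating. A minimal numpy sketch of that rearrangement (the pixel-shuffle step; the preceding learned convolution is omitted, and the channel layout convention is an assumption):

```python
import numpy as np

def dense_upsampling(x, r):
    # x: (C*r*r, H, W) feature map -> (C, H*r, W*r). Each group of r*r channels
    # at spatial position (h, w) fills the r x r output patch at (h*r, w*r);
    # the upsampling is learned by the conv that produced x, not interpolated.
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)
```

This is the same rearrangement PyTorch exposes as `nn.PixelShuffle`; the keypoint heatmaps are then read off the upsampled map.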
AnimePose: Multi-person 3D pose estimation and animation
Title | AnimePose: Multi-person 3D pose estimation and animation |
Authors | Laxman Kumarapu, Prerana Mukherjee |
Abstract | 3D animation of humans in action is quite challenging as it involves using a huge setup with several motion trackers all over the person’s body to track the movements of every limb. This is time-consuming and may cause the person discomfort in wearing exoskeleton body suits with motion sensors. In this work, we present a simple yet effective solution to generate 3D animation of multiple persons from a 2D video using deep learning. Although significant improvement has been achieved recently in 3D human pose estimation, most prior works perform well only for single-person pose estimation, while multi-person pose estimation remains a challenging problem. In this work, we first propose a supervised multi-person 3D pose estimation and animation framework, namely AnimePose, for a given input RGB video sequence. The pipeline of the proposed system consists of various modules: i) person detection and segmentation, ii) depth map estimation, iii) lifting 2D to 3D information for person localization, and iv) person trajectory prediction and human pose tracking. Our proposed system produces results comparable to previous state-of-the-art 3D multi-person pose estimation methods on the publicly available MuCo-3DHP and MuPoTS-3D datasets, and it also outperforms previous state-of-the-art human pose tracking methods by a significant margin of 11.7% performance gain in MOTA score on the PoseTrack 2018 dataset. |
Tasks | 3D Human Pose Estimation, 3D Multi-person Pose Estimation, 3D Pose Estimation, Human Detection, Multi-Person Pose Estimation, Pose Estimation, Pose Tracking, Trajectory Prediction |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02792v1 |
https://arxiv.org/pdf/2002.02792v1.pdf | |
PWC | https://paperswithcode.com/paper/animepose-multi-person-3d-pose-estimation-and |
Repo | |
Framework | |
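Module iii) of the pipeline, lifting 2D detections to 3D using the estimated depth map, amounts to pinhole back-projection at each keypoint. A minimal sketch assuming known camera intrinsics (`fx`, `fy`, `cx`, `cy` are hypothetical parameters; the paper's actual lifting stage is more involved):

```python
def lift_2d_to_3d(keypoints_2d, depth_map, fx, fy, cx, cy):
    # Back-project pixel keypoints (u, v) into camera coordinates using the
    # estimated depth at that pixel and pinhole intrinsics:
    #   X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth(u, v)
    pts3d = []
    for u, v in keypoints_2d:
        z = depth_map[int(v)][int(u)]  # depth indexed as [row][col]
        pts3d.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return pts3d
```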
Schema-Guided Dialogue State Tracking Task at DSTC8
Title | Schema-Guided Dialogue State Tracking Task at DSTC8 |
Authors | Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, Pranav Khaitan |
Abstract | This paper gives an overview of the Schema-Guided Dialogue State Tracking task of the 8th Dialogue System Technology Challenge. The goal of this task is to develop dialogue state tracking models suitable for large-scale virtual assistants, with a focus on data-efficient joint modeling across domains and zero-shot generalization to new APIs. This task provided a new dataset, consisting of over 16,000 dialogues in the training set spanning 16 domains, to highlight these challenges, and a baseline model capable of zero-shot generalization to new APIs. Twenty-five teams participated, developing a range of neural network models that exceeded the baseline's performance by a wide margin. The submissions incorporated a variety of pre-trained encoders and data augmentation techniques. This paper describes the task definition, dataset and evaluation methodology. We also summarize the approach and results of the submitted systems to highlight the overall trends in the state-of-the-art. |
Tasks | Data Augmentation, Dialogue State Tracking |
Published | 2020-02-02 |
URL | https://arxiv.org/abs/2002.01359v1 |
https://arxiv.org/pdf/2002.01359v1.pdf | |
PWC | https://paperswithcode.com/paper/schema-guided-dialogue-state-tracking-task-at |
Repo | |
Framework | |
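The zero-shot generalization described above rests on conditioning the tracker on natural-language slot descriptions from the schema rather than on slot-specific parameters. A toy bag-of-words sketch of that idea (real submissions used pre-trained encoders such as BERT; `match_slot` and the example schema below are illustrative assumptions):

```python
from collections import Counter
import math

def bow(text):
    # bag-of-words term counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_slot(utterance, schema):
    # schema maps slot names to natural-language descriptions; the description,
    # not a slot-specific classifier, drives the match. That is the crux of
    # zero-shot transfer: a new API only needs new descriptions.
    scores = {s: cosine(bow(utterance), bow(desc)) for s, desc in schema.items()}
    return max(scores, key=scores.get)

# hypothetical two-slot schema for a restaurant service
schema = {
    "restaurant_name": "name of the restaurant to book",
    "party_size": "number of people for the restaurant booking",
}
```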
Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending
Title | Data Augmentation for Histopathological Images Based on Gaussian-Laplacian Pyramid Blending |
Authors | Steve Tsham Mpinda Ataky, Jonathan de Matos, Alceu de S. Britto Jr., Luiz E. S. Oliveira, Alessandro L. Koerich |
Abstract | Data imbalance is a major problem that affects several machine learning algorithms. Such problems are troublesome because most learning algorithms attempt to optimize a loss function based on error measures that do not take into account the data imbalance. Accordingly, the learning algorithm simply generates a trivial model that is biased toward predicting the most frequent class in the training data. Data augmentation techniques have been used to mitigate the data imbalance problem. However, in the case of histopathologic images (HIs), low-level as well as high-level data augmentation techniques still present performance issues when applied in the presence of inter-patient variability, whereby the model tends to learn color representations that are in fact related to the staining process. In this paper, we propose an approach capable of not only augmenting an HI database but also distributing the inter-patient variability by means of image blending using Gaussian-Laplacian pyramids. The proposed approach consists of computing the Gaussian pyramids of two images from different patients and deriving their Laplacian pyramids. Afterwards, the left half of one image and the right half of another are joined at each level of the Laplacian pyramid, and from the joint pyramids, the original image is reconstructed. This composition, resulting from the blending process, combines stain variation of two patients, avoiding that color misleads the learning process. Experimental results on the BreakHis dataset have shown promising gains vis-à-vis the majority of traditional techniques presented in the literature. |
Tasks | Data Augmentation |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2002.00072v1 |
https://arxiv.org/pdf/2002.00072v1.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-for-histopathological |
Repo | |
Framework | |
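The blending procedure described in the abstract, namely building Laplacian pyramids of two patients' images, joining left and right halves at every level, then reconstructing, can be sketched with plain numpy. This sketch uses a simple box-filter downsample and nearest-neighbour upsample instead of the usual Gaussian kernel, and assumes image sides divisible by 2^levels:

```python
import numpy as np

def downsample(img):
    # 2x2 box-filter downsample (stand-in for a Gaussian blur + subsample)
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # nearest-neighbour 2x upsample (stand-in for Gaussian expand)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels):
    pyr, cur = [], img
    for _ in range(levels):
        down = downsample(cur)
        pyr.append(cur - upsample(down))  # band-pass detail at this scale
        cur = down
    pyr.append(cur)  # low-frequency residual (top of the Gaussian pyramid)
    return pyr

def blend(img_a, img_b, levels=3):
    pyr_a = laplacian_pyramid(img_a, levels)
    pyr_b = laplacian_pyramid(img_b, levels)
    # join left half of A with right half of B at every pyramid level
    joined = []
    for a, b in zip(pyr_a, pyr_b):
        w = a.shape[1] // 2
        joined.append(np.hstack([a[:, :w], b[:, w:]]))
    # collapse the joint pyramid back into a full-resolution image
    out = joined[-1]
    for lap in reversed(joined[:-1]):
        out = upsample(out) + lap
    return out
```

Blending in the pyramid domain, rather than pasting halves directly, is what smooths the seam across scales and mixes the two patients' stain statistics.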
Real-Time Well Log Prediction From Drilling Data Using Deep Learning
Title | Real-Time Well Log Prediction From Drilling Data Using Deep Learning |
Authors | Rayan Kanfar, Obai Shaikh, Mehrdad Yousefzadeh, Tapan Mukerji |
Abstract | The objective is to study the feasibility of predicting subsurface rock properties in wells from real-time drilling data. Geophysical logs, namely, density, porosity and sonic logs are of paramount importance for subsurface resource estimation and exploitation. These wireline petro-physical measurements are selectively deployed as they are expensive to acquire; meanwhile, drilling information is recorded in every drilled well. Hence a predictive tool for wireline log prediction from drilling data can help management make decisions about data acquisition, especially for delineation and production wells. This problem is non-linear with strong interactions between drilling parameters; hence the potential for deep learning to address this problem is explored. We present a workflow for data augmentation and feature engineering using Distance-based Global Sensitivity Analysis. We propose an Inception-based Convolutional Neural Network combined with a Temporal Convolutional Network as the deep learning model. The model is designed to learn both low and high frequency content of the data. Twelve wells from the Equinor dataset for the Volve field in the North Sea are used for learning. The model predictions not only capture trends but are also physically consistent across density, porosity, and sonic logs. On the test data, the mean square error reaches a low value of 0.04 but the correlation coefficient plateaus around 0.6. The model is nevertheless able to differentiate between different types of rocks such as cemented sandstone, unconsolidated sands, and shale. |
Tasks | Data Augmentation, Feature Engineering |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10156v1 |
https://arxiv.org/pdf/2001.10156v1.pdf | |
PWC | https://paperswithcode.com/paper/real-time-well-log-prediction-from-drilling |
Repo | |
Framework | |
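The Temporal Convolutional Network component relies on causal dilated convolutions, so each predicted log sample depends only on current and earlier drilling measurements. A minimal single-channel sketch (the paper's model is a full Inception-CNN/TCN hybrid; this only illustrates the causal-dilation arithmetic):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    # y[t] = sum_k w[k] * x[t - k*dilation], with implicit zero-padding on the
    # left so that y[t] never depends on future samples (causality). Larger
    # dilations widen the receptive field without adding parameters.
    t_len, k_len = len(x), len(w)
    y = np.zeros(t_len)
    for t in range(t_len):
        for k in range(k_len):
            j = t - k * dilation
            if j >= 0:
                y[t] += w[k] * x[j]
    return y
```

Stacking such layers with dilations 1, 2, 4, ... gives the exponentially growing temporal context a TCN uses to capture both low- and high-frequency log content.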
Visualisation of Medical Image Fusion and Translation for Accurate Diagnosis of High Grade Gliomas
Title | Visualisation of Medical Image Fusion and Translation for Accurate Diagnosis of High Grade Gliomas |
Authors | Nishant Kumar, Nico Hoffmann, Matthias Kirsch, Stefan Gumhold |
Abstract | Medical image fusion combines two or more modalities into a single view, while medical image translation synthesizes new images and assists in data augmentation. Together, these methods help in faster diagnosis of high grade malignant gliomas. However, these methods might be untrustworthy, which is why neurosurgeons demand a robust visualisation tool to verify the reliability of the fusion and translation results before they make pre-operative surgical decisions. In this paper, we propose a novel approach to compute a confidence heat map between the source-target image pair by estimating the information transfer from the source to the target image using the joint probability distribution of the two images. We evaluate several fusion and translation methods using our visualisation procedure and showcase its robustness in enabling neurosurgeons to make finer clinical decisions. |
Tasks | Data Augmentation |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09535v3 |
https://arxiv.org/pdf/2001.09535v3.pdf | |
PWC | https://paperswithcode.com/paper/visualisation-of-medical-image-fusion-and |
Repo | |
Framework | |
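The confidence heat map idea, estimating information transfer from the joint probability distribution of the source-target pair, can be sketched as per-pixel pointwise mutual information read off a joint intensity histogram. The quantization scheme and bin count below are assumptions; the paper's exact estimator may differ:

```python
import numpy as np

def confidence_heatmap(src, tgt, bins=8):
    # Quantize both images, build their joint intensity histogram, and read off
    # pointwise mutual information log p(s,t) / (p(s) p(t)) for each pixel's
    # (source bin, target bin) pair: high where intensities reliably co-occur.
    def quantize(img):
        lo, hi = img.min(), img.max()
        q = ((img - lo) / (hi - lo + 1e-12) * bins).astype(int)
        return np.minimum(q, bins - 1)

    qs, qt = quantize(src), quantize(tgt)
    joint = np.zeros((bins, bins))
    np.add.at(joint, (qs.ravel(), qt.ravel()), 1.0)
    joint /= joint.sum()
    ps, pt = joint.sum(axis=1), joint.sum(axis=0)  # marginals
    with np.errstate(divide="ignore"):
        pmi = np.log(joint) - np.log(ps[:, None] * pt[None, :])
    return pmi[qs, qt]  # per-pixel confidence map
```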
Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses
Title | Imperfect ImaGANation: Implications of GANs Exacerbating Biases on Facial Data Augmentation and Snapchat Selfie Lenses |
Authors | Niharika Jain, Alberto Olmo, Sailik Sengupta, Lydia Manikonda, Subbarao Kambhampati |
Abstract | Recently, the use of synthetic data generated by GANs has become a popular method to do data augmentation for many applications. While practitioners celebrate this as an economical way to obtain synthetic data for training data-hungry machine learning models, it is not clear that they recognize the perils of such an augmentation technique when applied to an already-biased dataset. Although one expects GANs to replicate the distribution of the original data, in real-world settings with limited data and finite network capacity, GANs suffer from mode collapse, especially when the data comes from online social media platforms or the web, which are rarely balanced. In this paper, we show that in settings where data exhibits bias along some axes (e.g., gender, race), failure modes of Generative Adversarial Networks (GANs) exacerbate the biases in the generated data. More often than not, this bias is unavoidable; we empirically demonstrate that, given a dataset of headshots of engineering faculty collected from 47 online university directory webpages in the United States that is biased toward white males, a state-of-the-art (unconditional variant of) GAN “imagines” faces of synthetic engineering professors that have masculine facial features and white skin color (inferred using human studies and a state-of-the-art gender recognition system). We also conduct a preliminary case study to highlight how Snapchat’s explosively popular “female” filter (widely accepted to use a conditional variant of GAN) ends up consistently lightening the skin tones in women of color when trying to make face images appear more feminine. Our study is meant to serve as a cautionary tale for the lay practitioners who may unknowingly increase the bias in their training data by using GAN-based augmentation techniques with web data, and to showcase the dangers of using biased datasets for facial applications. |
Tasks | Data Augmentation |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09528v1 |
https://arxiv.org/pdf/2001.09528v1.pdf | |
PWC | https://paperswithcode.com/paper/imperfect-imaganation-implications-of-gans |
Repo | |
Framework | |
Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training
Title | Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training |
Authors | Seung Hee Yang, Minhwa Chung |
Abstract | Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than that of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were recorded for automatic voice-keyboard recognition in a previous study, the generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech. Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that the recognition performance is improved compared with the original dysarthric speech after performing adversarial training, as the absolute WER has been lowered by 33.4%. It demonstrates that the proposed GAN-based conversion method is useful for improving dysarthric speech intelligibility. |
Tasks | Speech Recognition |
Published | 2020-01-10 |
URL | https://arxiv.org/abs/2001.04260v1 |
https://arxiv.org/pdf/2001.04260v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-dysarthric-speech-intelligibility |
Repo | |
Framework | |
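The cycle-consistency constraint at the core of the dysarthric-to-healthy conversion requires that mapping features to the healthy domain and back reproduces the original. A toy sketch with affine stand-ins for the generators (the real model trains neural generators on spectral features and pairs this L1 term with adversarial losses):

```python
import numpy as np

def cycle_consistency_loss(g, f, x_dys, x_hea, lam=10.0):
    # g: dysarthric -> healthy generator, f: healthy -> dysarthric generator.
    # L1 reconstruction error after a full round trip in each direction,
    # weighted by lambda as in the usual CycleGAN objective.
    loss_dys = np.mean(np.abs(f(g(x_dys)) - x_dys))
    loss_hea = np.mean(np.abs(g(f(x_hea)) - x_hea))
    return lam * (loss_dys + loss_hea)

g = lambda x: 2.0 * x + 1.0    # stand-in generator
f = lambda x: (x - 1.0) / 2.0  # its exact inverse, so the cycle loss vanishes
x = np.array([0.5, -1.0, 2.0])
loss = cycle_consistency_loss(g, f, x, x)
```

The unpaired setting is the point: 18,700 dysarthric and 8,610 healthy utterances need not be parallel, because the cycle term, not aligned pairs, anchors content preservation.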
Open Challenge for Correcting Errors of Speech Recognition Systems
Title | Open Challenge for Correcting Errors of Speech Recognition Systems |
Authors | Marek Kubis, Zygmunt Vetulani, Mikołaj Wypych, Tomasz Ziętkiewicz |
Abstract | The paper announces a new long-term challenge for improving the performance of automatic speech recognition systems. The goal of the challenge is to investigate methods of correcting the recognition results on the basis of errors previously made by the speech processing system. The dataset prepared for the task is described and evaluation criteria are presented. |
Tasks | Speech Recognition |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.03041v1 |
https://arxiv.org/pdf/2001.03041v1.pdf | |
PWC | https://paperswithcode.com/paper/open-challenge-for-correcting-errors-of |
Repo | |
Framework | |
Streaming automatic speech recognition with the transformer model
Title | Streaming automatic speech recognition with the transformer model |
Authors | Niko Moritz, Takaaki Hori, Jonathan Le Roux |
Abstract | Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art results in end-to-end automatic speech recognition (ASR). Recently, the transformer architecture, which uses self-attention to model temporal context information, has been shown to achieve significantly lower word error rates (WERs) compared to recurrent neural network (RNN) based system architectures. Despite this success, its practical use is limited to offline ASR tasks, since encoder-decoder architectures typically require an entire speech utterance as input. In this work, we propose a transformer based end-to-end ASR system for streaming ASR, where an output must be generated shortly after each spoken word. To achieve this, we apply time-restricted self-attention for the encoder and triggered attention for the encoder-decoder attention mechanism. Our proposed streaming transformer architecture achieves 2.8% and 7.2% WER for the “clean” and “other” test data of LibriSpeech, which to our knowledge is the best published streaming end-to-end ASR result for this task. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02674v4 |
https://arxiv.org/pdf/2001.02674v4.pdf | |
PWC | https://paperswithcode.com/paper/streaming-automatic-speech-recognition-with |
Repo | |
Framework | |
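The time-restricted self-attention used in the streaming encoder limits each frame's attention to a bounded window of left and right context, which is what bounds the latency. A numpy sketch of a single masked attention head (window sizes and shapes below are illustrative):

```python
import numpy as np

def time_restricted_self_attention(q, k, v, left, right):
    # Standard scaled dot-product attention, except frame t may only attend to
    # frames j with t - left <= j <= t + right; everything else is masked to
    # -inf before the softmax. A small `right` bounds streaming latency.
    t_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(t_len)
    allowed = (idx[None, :] >= idx[:, None] - left) & \
              (idx[None, :] <= idx[:, None] + right)
    scores = np.where(allowed, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ v
```

Because frame t never attends past t + right, the encoder can emit outputs after a fixed lookahead instead of waiting for the whole utterance.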
Aspect Term Extraction using Graph-based Semi-Supervised Learning
Title | Aspect Term Extraction using Graph-based Semi-Supervised Learning |
Authors | Gunjan Ansari, Chandni Saxena, Tanvir Ahmad, M. N. Doja |
Abstract | Aspect-based sentiment analysis is a major subarea of sentiment analysis. Many supervised and unsupervised approaches have been proposed in the past for detecting and analyzing the sentiment of aspect terms. In this paper, a graph-based semi-supervised learning approach for aspect term extraction is proposed. In this approach, every identified token in the review document is classified as an aspect or non-aspect term from a small set of labeled tokens using a label spreading algorithm. k-Nearest Neighbor (kNN) graph sparsification is employed in the proposed approach to make it more time and memory efficient. The proposed work is further extended to determine the polarity of the opinion words associated with the identified aspect terms in a review sentence, to generate a visual aspect-based summary of review documents. The experimental study is conducted on benchmark and crawled datasets of the restaurant and laptop domains with varying numbers of labeled instances. The results show that the proposed approach achieves good results in terms of Precision, Recall and Accuracy with limited availability of labeled data. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2003.04968v1 |
https://arxiv.org/pdf/2003.04968v1.pdf | |
PWC | https://paperswithcode.com/paper/aspect-term-extraction-using-graph-based-semi |
Repo | |
Framework | |
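The pipeline above, building a sparse kNN graph over tokens and then spreading a small set of aspect/non-aspect labels across it, can be sketched in numpy. This follows the standard label-spreading iteration with a symmetrically normalized graph; the token features, k, and α below are illustrative assumptions:

```python
import numpy as np

def knn_graph(x, k):
    # symmetric kNN adjacency; sparsifying to k neighbours is what keeps the
    # propagation time- and memory-efficient on large token sets
    d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros_like(d)
    for i in range(len(x)):
        w[i, np.argsort(d[i])[:k]] = 1.0
    return np.maximum(w, w.T)  # symmetrize

def label_spreading(w, y, alpha=0.9, iters=50):
    # y: -1 for unlabeled tokens, otherwise the class id (e.g. 0 = non-aspect,
    # 1 = aspect) of the small labeled seed set
    n_classes = y.max() + 1
    y0 = np.zeros((len(y), n_classes))
    y0[y >= 0, y[y >= 0]] = 1.0
    deg = w.sum(axis=1)
    deg[deg == 0] = 1.0
    s = w / np.sqrt(deg[:, None] * deg[None, :])  # D^{-1/2} W D^{-1/2}
    f = y0.copy()
    for _ in range(iters):
        f = alpha * s @ f + (1 - alpha) * y0  # spread, then clamp toward seeds
    return f.argmax(axis=1)
```

With token embeddings as `x`, two labeled seeds are enough here to label both clusters, which mirrors the paper's limited-labeled-data setting.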