January 29, 2020

Paper Group ANR 632

End-to-End Trainable Non-Collaborative Dialog System


Title	End-to-End Trainable Non-Collaborative Dialog System
Authors	Yu Li, Kun Qian, Weiyan Shi, Zhou Yu
Abstract	End-to-end task-oriented dialog models have achieved promising performance on collaborative tasks where users willingly coordinate with the system to complete a given task. In non-collaborative settings such as negotiation and persuasion, however, users and systems do not share a common goal. As a result, compared to collaborative tasks, people use social content to build rapport and trust in these non-collaborative settings in order to advance their goals. To handle social content, we introduce a hierarchical intent annotation scheme, which can be generalized to different non-collaborative dialog tasks. Building upon TransferTransfo (Wolf et al. 2019), we propose an end-to-end neural network model to generate diverse coherent responses. Our model utilizes intent and semantic slots as the intermediate sentence representation to guide the generation process. In addition, we design a filter to select appropriate responses based on whether these intermediate representations fit the designed task and conversation constraints. Our non-collaborative dialog model guides users to complete the task while simultaneously keeping them engaged. We test our approach on our newly proposed ANTISCAM dataset and an existing PERSUASIONFORGOOD dataset. Both automatic and human evaluations suggest that our model outperforms multiple baselines in these two non-collaborative tasks.
Tasks
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10742v1
PDF	https://arxiv.org/pdf/1911.10742v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-trainable-non-collaborative-dialog
Repo
Framework

Soccer Team Vectors


Title	Soccer Team Vectors
Authors	Robert Müller, Stefan Langer, Fabian Ritz, Christoph Roch, Steffen Illium, Claudia Linnhoff-Popien
Abstract	In this work we present STEVE (Soccer TEam VEctors), a principled approach for learning real-valued vectors for soccer teams where similar teams are close to each other in the resulting vector space. STEVE relies only on freely available information about the matches teams played in the past. These vectors can serve as input to various machine learning tasks. Evaluating on the task of team market value estimation, STEVE outperforms all its competitors. Moreover, we use STEVE for similarity search and to rank soccer teams.
Tasks
Published	2019-07-30
URL	https://arxiv.org/abs/1908.00698v2
PDF	https://arxiv.org/pdf/1908.00698v2.pdf
PWC	https://paperswithcode.com/paper/soccer-team-vectors
Repo
Framework

Adversarial Pixel-Level Generation of Semantic Images


Title	Adversarial Pixel-Level Generation of Semantic Images
Authors	Emanuele Ghelfi, Paolo Galeone, Michele De Simoni, Federico Di Mattia
Abstract	Generative Adversarial Networks (GANs) have obtained extraordinary success in the generation of realistic images, a domain where a lower pixel-level accuracy is acceptable. We study the problem, not yet tackled in the literature, of generating semantic images starting from a prior distribution. Intuitively this problem can be approached using standard methods and architectures. However, a better-suited approach is needed to avoid generating blurry, hallucinated and thus unusable images since tasks like semantic segmentation require pixel-level exactness. In this work, we present a novel architecture for learning to generate pixel-level accurate semantic images, namely Semantic Generative Adversarial Networks (SemGANs). The experimental evaluation shows that our architecture outperforms standard ones from both a quantitative and a qualitative point of view in many semantic image generation tasks.
Tasks	Image Generation, Semantic Segmentation
Published	2019-06-27
URL	https://arxiv.org/abs/1906.12195v1
PDF	https://arxiv.org/pdf/1906.12195v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-pixel-level-generation-of
Repo
Framework

Identifying Pediatric Vascular Anomalies With Deep Learning


Title	Identifying Pediatric Vascular Anomalies With Deep Learning
Authors	Justin Chan, Sharat Raju, Randall Bly, Jonathan A. Perkins, Shyamnath Gollakota
Abstract	Vascular anomalies, more colloquially known as birthmarks, affect up to 1 in 10 infants. Though many of these lesions self-resolve, some types can result in medical complications or disfigurement without proper diagnosis or management. Accurately diagnosing vascular anomalies is challenging for pediatricians and primary care physicians due to subtle visual differences and similarity to other pediatric dermatologic conditions. This can result in delayed or incorrect referrals for treatment. To address this problem, we developed a convolutional neural network (CNN) to automatically classify images of vascular anomalies and other pediatric skin conditions to aid physicians with diagnosis. We constructed a dataset of 21,681 clinical images, including data collected between 2002 and 2018 at Seattle Children’s Hospital as well as five dermatologist-curated online repositories, and built a taxonomy over vascular anomalies and other common pediatric skin lesions. The CNN achieved an average AUC of 0.9731 when ten-fold cross-validation was performed across a taxonomy of 12 classes. The classifier’s average AUC and weighted F1 score were 0.9889 and 0.9732, respectively, when evaluated on a previously unseen test set of six of these classes. Further, when used as an aid by pediatricians (n = 7), the classifier increased their average visual diagnostic accuracy from 73.10% to 91.67%. The classifier runs in real time on a smartphone and has the potential to improve diagnosis of these conditions, particularly in resource-limited areas.
Tasks
Published	2019-09-16
URL	https://arxiv.org/abs/1909.07046v1
PDF	https://arxiv.org/pdf/1909.07046v1.pdf
PWC	https://paperswithcode.com/paper/identifying-pediatric-vascular-anomalies-with
Repo
Framework

Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback


Title	Unbiased Recommender Learning from Missing-Not-At-Random Implicit Feedback
Authors	Yuta Saito, Suguru Yaginuma, Yuta Nishino, Hayato Sakata, Kazuhide Nakata
Abstract	Recommender systems widely use implicit feedback such as click data because of its general availability. Although the presence of clicks signals the users’ preference to some extent, the lack of such clicks does not necessarily indicate a negative response from the users, as it is possible that the users were not exposed to the items (positive-unlabeled problem). This makes it difficult to predict the users’ preferences from implicit feedback. Previous studies addressed the positive-unlabeled problem by uniformly upweighting the loss for the positive feedback data or by estimating, via the EM algorithm, the confidence that each data point carries relevance information. However, these methods failed to address the missing-not-at-random problem in which popular or frequently recommended items are more likely to be clicked than other items even if a user does not have a considerable interest in them. To overcome these limitations, we first define an ideal loss function to be optimized to realize recommendations that maximize the relevance and propose an unbiased estimator for the ideal loss. Subsequently, we analyze the variance of the proposed unbiased estimator and further propose a clipped estimator that includes the unbiased estimator as a special case. We demonstrate that the clipped estimator is expected to improve the performance of the recommender system, by considering the bias-variance trade-off. We conduct semi-synthetic and real-world experiments and demonstrate that the proposed method largely outperforms the baselines. In particular, the proposed method works better for rare items that are less frequently observed in the training data. The findings indicate that the proposed method can better achieve the objective of recommending items with the highest relevance.
Tasks	Causal Inference, Recommendation Systems
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03601v3
PDF	https://arxiv.org/pdf/1909.03601v3.pdf
PWC	https://paperswithcode.com/paper/relevance-matrix-factorization
Repo
Framework
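
The abstract above describes an unbiased loss estimator and a clipped variant that contains it as a special case. The following is a minimal sketch of one common inverse-propensity-weighted form of such a loss, not necessarily the exact estimator in the paper. The function name, the log-loss choice, and the `clip` parameter are assumptions made for illustration; `propensities` stands for the (assumed known) probability that a user was exposed to each item.

```python
import math

def clipped_ips_loss(clicks, propensities, preds, clip=0.0):
    """Illustrative inverse-propensity-weighted log loss (assumed form).

    clicks:       0/1 click indicators
    propensities: exposure probabilities in (0, 1]
    preds:        predicted relevance probabilities in (0, 1)
    clip:         lower bound applied to each propensity; clip=0.0 leaves
                  the propensities untouched (the unclipped special case)
    """
    total = 0.0
    for y, theta, p in zip(clicks, propensities, preds):
        weight = y / max(theta, clip)            # reweighted click signal
        positive_loss = -math.log(p)             # loss if the item is relevant
        negative_loss = -math.log(1.0 - p)       # loss if the item is irrelevant
        total += weight * positive_loss + (1.0 - weight) * negative_loss
    return total / len(clicks)
```

Raising `clip` bounds the weights, which trades some bias for lower variance; with `clip=1.0` every weight is at most 1 and the loss reduces to the ordinary log loss on clicks.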

Multi-scale Octave Convolutions for Robust Speech Recognition


Title	Multi-scale Octave Convolutions for Robust Speech Recognition
Authors	Joanna Rownicka, Peter Bell, Steve Renals
Abstract	We propose a multi-scale octave convolution layer to learn robust speech representations efficiently. Octave convolutions were introduced by Chen et al. [1] in the computer vision field to reduce the spatial redundancy of the feature maps by decomposing the output of a convolutional layer into feature maps at two different spatial resolutions, one octave apart. This approach improved the efficiency as well as the accuracy of the CNN models. The accuracy gain was attributed to the enlargement of the receptive field in the original input space. We argue that octave convolutions likewise improve the robustness of learned representations due to the use of average pooling in the lower resolution group, acting as a low-pass filter. We test this hypothesis by evaluating on two noisy speech corpora, Aurora-4 and AMI. We extend the octave convolution concept to multiple resolution groups and multiple octaves. To evaluate the robustness of the inferred representations, we report the similarity between clean and noisy encodings using an affine projection loss as a proxy robustness measure. The results show that the proposed method reduces the WER by up to 6.6% relative for Aurora-4 and 3.6% for AMI, while improving the computational efficiency of the CNN acoustic models.
Tasks	Robust Speech Recognition, Speech Recognition
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14443v1
PDF	https://arxiv.org/pdf/1910.14443v1.pdf
PWC	https://paperswithcode.com/paper/multi-scale-octave-convolutions-for-robust
Repo
Framework
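
The claim that average pooling acts as a low-pass filter can be illustrated with a toy one-dimensional example. This is only a sketch: the paper works with convolutional feature maps, and the function name and the two test signals below are assumptions chosen for illustration.

```python
def average_pool_1d(signal, width=2):
    """Non-overlapping average pooling; a trailing partial window is dropped."""
    return [
        sum(signal[i:i + width]) / width
        for i in range(0, len(signal) - width + 1, width)
    ]

# A constant (low-frequency) signal and a sign-alternating (high-frequency) one.
slow = [1.0, 1.0, 1.0, 1.0]
fast = [1.0, -1.0, 1.0, -1.0]

pooled_slow = average_pool_1d(slow)   # constant component is preserved
pooled_fast = average_pool_1d(fast)   # alternating component cancels out
```

Pooling with `width=2` halves the resolution, which corresponds to one octave, and the fastest-varying component of the input is removed in the process.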

Why do These Match? Explaining the Behavior of Image Similarity Models


Title	Why do These Match? Explaining the Behavior of Image Similarity Models
Authors	Bryan A. Plummer, Mariya I. Vasileva, Vitali Petsiuk, Kate Saenko, David Forsyth
Abstract	Explaining a deep learning model can help users understand its behavior and allow researchers to discern its shortcomings. Recent work has primarily focused on explaining models for tasks like image classification or visual question answering. In this paper, we introduce an explanation approach for image similarity models, where a model’s output is a semantic feature representation rather than a classification. In this task, an explanation depends on both of the input images, so standard methods do not apply. We propose an explanation method that pairs a saliency map identifying important image regions with an attribute that best explains the match. We find that our explanations are more human-interpretable than saliency maps alone, and can also improve performance on the classic task of attribute recognition. The ability of our approach to generalize is demonstrated on two datasets from very different domains, Polyvore Outfits and Animals with Attributes 2.
Tasks	Image Classification, Question Answering, Visual Question Answering
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10797v1
PDF	https://arxiv.org/pdf/1905.10797v1.pdf
PWC	https://paperswithcode.com/paper/why-do-these-match-explaining-the-behavior-of
Repo
Framework

RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models


Title	RSL19BD at DBDC4: Ensemble of Decision Tree-based and LSTM-based Models
Authors	Chih-Hao Wang, Sosuke Kato, Tetsuya Sakai
Abstract	RSL19BD (Waseda University Sakai Laboratory) participated in the Fourth Dialogue Breakdown Detection Challenge (DBDC4) and submitted five runs to both English and Japanese subtasks. In these runs, we utilise the Decision Tree-based model and the Long Short-Term Memory-based (LSTM-based) model following the approaches of RSL17BD and KTH in the Third Dialogue Breakdown Detection Challenge (DBDC3) respectively. The Decision Tree-based model follows the approach of RSL17BD but utilises RandomForestRegressor instead of ExtraTreesRegressor. In addition, instead of predicting the mean and the variance of the probability distribution of the three breakdown labels, it predicts the probability of each label directly. The LSTM-based model follows the approach of KTH with some changes in the architecture and utilises Convolutional Neural Network (CNN) to perform text feature extraction. In addition, instead of targeting the single breakdown label and minimising the categorical cross entropy loss, it targets the probability distribution of the three breakdown labels and minimises the mean squared error. Run 1 utilises a Decision Tree-based model; Run 2 utilises an LSTM-based model; Run 3 performs an ensemble of 5 LSTM-based models; Run 4 performs an ensemble of Run 1 and Run 2; Run 5 performs an ensemble of Run 1 and Run 3. Run 5 statistically significantly outperformed all other runs in terms of MSE (NB, PB, B) for the English data and all other runs except Run 4 in terms of MSE (NB, PB, B) for the Japanese data (alpha level = 0.05).
Tasks
Published	2019-05-06
URL	https://arxiv.org/abs/1905.01799v3
PDF	https://arxiv.org/pdf/1905.01799v3.pdf
PWC	https://paperswithcode.com/paper/rsl19bd-at-dbdc4-ensemble-of-decision-tree
Repo
Framework
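
The abstract above describes models that predict a probability distribution over the three breakdown labels (NB, PB, B), scored by mean squared error, and runs that ensemble several models. A minimal sketch follows, assuming the ensemble is a plain average of the predicted distributions; the abstract does not state how the runs were combined, and the function names are hypothetical.

```python
def ensemble_average(predictions):
    """Average several predicted label distributions element-wise.

    Averaging is an assumption; the source does not specify the ensembling rule.
    """
    n_models = len(predictions)
    n_labels = len(predictions[0])
    return [sum(p[k] for p in predictions) / n_models for k in range(n_labels)]

def mse(pred, target):
    """Mean squared error between predicted and target (NB, PB, B) distributions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Two hypothetical model outputs for one system utterance.
model_a = [0.6, 0.3, 0.1]
model_b = [0.4, 0.3, 0.3]
combined = ensemble_average([model_a, model_b])
```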

Deep Learning in Medical Image Registration: A Review


Title	Deep Learning in Medical Image Registration: A Review
Authors	Yabo Fu, Yang Lei, Tonghe Wang, Walter J. Curran, Tian Liu, Xiaofeng Yang
Abstract	This paper presents a review of deep learning (DL) based medical image registration methods. We summarized the latest developments and applications of DL-based registration methods in the medical field. These methods were classified into seven categories according to their methods, functions and popularity. A detailed review of each category was presented, highlighting important contributions and identifying specific challenges. A short assessment was presented following the detailed review of each category to summarize its achievements and future potential. We provided a comprehensive comparison among DL-based methods for lung and brain deformable registration using benchmark datasets. Lastly, we analyzed the statistics of all the cited works from various aspects, revealing the popularity and future trends of development in medical image registration using deep learning.
Tasks	Image Registration, Medical Image Registration
Published	2019-12-27
URL	https://arxiv.org/abs/1912.12318v1
PDF	https://arxiv.org/pdf/1912.12318v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-in-medical-image-registration-a-1
Repo
Framework

Domain Expansion in DNN-based Acoustic Models for Robust Speech Recognition


Title	Domain Expansion in DNN-based Acoustic Models for Robust Speech Recognition
Authors	Shahram Ghorbani, Soheil Khorram, John H. L. Hansen
Abstract	Training acoustic models with sequentially incoming data, while both leveraging new data and avoiding the forgetting effect, is an essential obstacle to achieving human-level intelligence in speech recognition. An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then use this dataset to retrain the acoustic models. However, as the amount of training data grows, storing and retraining on such a large-scale dataset becomes practically impossible. To deal with this problem, we study several domain expansion techniques which exploit only the data of the new domain to build a stronger model for all domains. These techniques are aimed at learning the new domain with a minimal forgetting effect (i.e., they maintain original model performance). These techniques modify the adaptation procedure by imposing new constraints including (1) weight constraint adaptation (WCA): keeping the model parameters close to the original model parameters; (2) elastic weight consolidation (EWC): slowing down training for parameters that are important for previously established domains; (3) soft KL-divergence (SKLD): restricting the KL-divergence between the original and the adapted model output distributions; and (4) hybrid SKLD-EWC: incorporating both SKLD and EWC constraints. We evaluate these techniques in an accent adaptation task in which we adapt a deep neural network (DNN) acoustic model trained with native English to three different English accents: Australian, Hispanic, and Indian. The experimental results show that SKLD significantly outperforms EWC, and EWC works better than WCA. The hybrid SKLD-EWC technique results in the best overall performance.
Tasks	Robust Speech Recognition, Speech Recognition
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00565v1
PDF	https://arxiv.org/pdf/1910.00565v1.pdf
PWC	https://paperswithcode.com/paper/domain-expansion-in-dnn-based-acoustic-models
Repo
Framework
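
The constraints listed in the abstract above can be sketched as penalty terms added to the adaptation loss. The forms below are the standard textbook versions (squared distance, Fisher-weighted squared distance, and KL divergence) and are assumptions here; the abstract does not give the exact formulation, weighting, or how the terms are combined. All function names are hypothetical, and parameters are flattened into plain lists for simplicity.

```python
import math

def wca_penalty(params, orig_params):
    """WCA-style term: squared distance to the original parameters."""
    return sum((p - q) ** 2 for p, q in zip(params, orig_params))

def ewc_penalty(params, orig_params, importance):
    """EWC-style term: squared distance weighted by per-parameter importance
    (for example a Fisher information estimate), so that parameters which
    mattered for earlier domains are more expensive to move."""
    return sum(f * (p - q) ** 2 for p, q, f in zip(params, orig_params, importance))

def kl_divergence(orig_probs, adapted_probs):
    """SKLD-style term: KL divergence from the original model's output
    distribution to the adapted model's output distribution."""
    return sum(
        o * math.log(o / a)
        for o, a in zip(orig_probs, adapted_probs)
        if o > 0.0
    )
```

A hybrid in the spirit of SKLD-EWC would add both `kl_divergence` and `ewc_penalty` terms, each scaled by its own coefficient, to the loss on the new-domain data.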

Hyperspectral Data Augmentation


Title	Hyperspectral Data Augmentation
Authors	Jakub Nalepa, Michal Myller, Michal Kawulok
Abstract	Data augmentation is a popular technique which helps improve generalization capabilities of deep neural networks. It plays a pivotal role in remote-sensing scenarios in which the amount of high-quality ground truth data is limited, and acquiring new examples is costly or impossible. This is a common problem in hyperspectral imaging, where manual annotation of image data is difficult, expensive, and prone to human bias. In this letter, we propose online data augmentation of hyperspectral data which is executed during inference rather than before the training of deep networks. This is in contrast to all other state-of-the-art hyperspectral augmentation algorithms which increase the size (and representativeness) of training sets. Additionally, we introduce a new principal component analysis based augmentation. The experiments revealed that our data augmentation algorithms improve generalization of deep networks and work in real time, and that the online approach can be effectively combined with offline techniques to enhance the classification accuracy.
Tasks	Data Augmentation
Published	2019-03-13
URL	http://arxiv.org/abs/1903.05580v1
PDF	http://arxiv.org/pdf/1903.05580v1.pdf
PWC	https://paperswithcode.com/paper/hyperspectral-data-augmentation
Repo
Framework

Cumulative Adaptation for BLSTM Acoustic Models


Title	Cumulative Adaptation for BLSTM Acoustic Models
Authors	Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney
Abstract	This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation invariant representations, is used for robust acoustic modelling. Further, i-vectors were used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set. By enhancing the first-pass i-vector based adaptation with a second-pass adaptation using speaker and environment dependent transformations within the network, a further relative improvement of 5% in word error rate was achieved. We have reevaluated the features used to estimate i-vectors and their normalization to achieve the best performance in a modern large scale automatic speech recognition system.
Tasks	Acoustic Modelling, Robust Speech Recognition, Speech Recognition
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06207v1
PDF	https://arxiv.org/pdf/1906.06207v1.pdf
PWC	https://paperswithcode.com/paper/cumulative-adaptation-for-blstm-acoustic
Repo
Framework

DeepEthnic: Multi-Label Ethnic Classification from Face Images


Title	DeepEthnic: Multi-Label Ethnic Classification from Face Images
Authors	Katia Huri, Eli David, Nathan S. Netanyahu
Abstract	Ethnic group classification is a well-researched problem, which has been pursued mainly during the past two decades via traditional approaches of image processing and machine learning. In this paper, we propose a method of classifying a face image into an ethnic group by applying transfer learning from a previously trained classification network for large-scale data recognition. Our proposed method yields state-of-the-art success rates of 99.02%, 99.76%, 99.2%, and 96.7%, respectively, for the four ethnic groups: African, Asian, Caucasian, and Indian.
Tasks	Transfer Learning
Published	2019-12-06
URL	https://arxiv.org/abs/1912.02983v1
PDF	https://arxiv.org/pdf/1912.02983v1.pdf
PWC	https://paperswithcode.com/paper/deepethnic-multi-label-ethnic-classification
Repo
Framework

Contextual Text Denoising with Masked Language Models


Title	Contextual Text Denoising with Masked Language Models
Authors	Yifu Sun, Haoming Jiang
Abstract	Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks. Unfortunately, state-of-the-art models are vulnerable to noisy texts. We propose a new contextual text denoising algorithm based on the ready-to-use masked language model. The proposed algorithm does not require retraining of the model and can be integrated into any NLP system without additional training on paired cleaning training data. We evaluate our method under synthetic noise and natural noise and show that the proposed algorithm can use context information to correct noisy text and improve performance on noisy inputs in several downstream tasks.
Tasks	Denoising, Language Modelling
Published	2019-10-30
URL	https://arxiv.org/abs/1910.14080v1
PDF	https://arxiv.org/pdf/1910.14080v1.pdf
PWC	https://paperswithcode.com/paper/contextual-text-denoising-with-masked
Repo
Framework

On approximating $\nabla f$ with neural networks


Title	On approximating $\nabla f$ with neural networks
Authors	Saeed Saremi
Abstract	Consider a feedforward neural network $\psi: \mathbb{R}^d\rightarrow \mathbb{R}^d$ such that $\psi\approx \nabla f$, where $f:\mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth function; therefore, $\psi$ must satisfy $\partial_j \psi_i = \partial_i \psi_j$ pointwise. We prove a theorem that a $\psi$ network with more than one hidden layer can only represent one feature in its first hidden layer; this is a dramatic departure from the well-known results for one hidden layer. The proof of the theorem is straightforward, where two backward paths and a weight-tying matrix play the key roles. We then present the alternative, the implicit parametrization, where the neural network is $\phi: \mathbb{R}^d \rightarrow \mathbb{R}$ and $\nabla \phi \approx \nabla f$; in addition, a “soft analysis” of $\nabla \phi$ gives a dual perspective on the theorem. Throughout, we come back to recent probabilistic models that are formulated as $\nabla \phi \approx \nabla f$, and conclude with a critique of denoising autoencoders.
Tasks	Denoising
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12744v2
PDF	https://arxiv.org/pdf/1910.12744v2.pdf
PWC	https://paperswithcode.com/paper/on-approximating-nabla-f-with-neural-networks
Repo
Framework
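
The pointwise condition $\partial_j \psi_i = \partial_i \psi_j$ from the abstract above says that the Jacobian of $\psi$ is symmetric. A small numerical sketch of that check follows, using central finite differences; the helper names and the two example vector fields are assumptions chosen for illustration and do not come from the paper.

```python
def jacobian(psi, x, eps=1e-6):
    """Central-difference Jacobian: J[i][j] approximates d psi_i / d x_j."""
    d = len(x)
    columns = []
    for j in range(d):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = psi(x_plus), psi(x_minus)
        columns.append([(a - b) / (2 * eps) for a, b in zip(f_plus, f_minus)])
    return [[columns[j][i] for j in range(d)] for i in range(d)]

def is_symmetric(matrix, tol=1e-4):
    """True if matrix[i][j] and matrix[j][i] agree within tol for all i, j."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )

def gradient_field(x):
    """Gradient of f(x, y) = x^2 * y, so the symmetry condition holds."""
    return [2 * x[0] * x[1], x[0] ** 2]

def rotation_field(x):
    """A rotation field; it is not the gradient of any scalar function."""
    return [-x[1], x[0]]

point = [1.5, -0.7]
gradient_ok = is_symmetric(jacobian(gradient_field, point))
rotation_ok = is_symmetric(jacobian(rotation_field, point))
```

Here `gradient_ok` is true and `rotation_ok` is false, which shows why a network that outputs a vector directly must be constrained to satisfy the symmetry, whereas differentiating a scalar network $\phi$ satisfies it by construction.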