January 30, 2020

Paper Group ANR 259

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking. CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography. Better Algorithms for Stochastic Bandits with Adversarial Corruptions. Atom Responding Machine for Dialog Generat …

Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking


Title	Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking
Authors	Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari
Abstract	This paper proposes a generative moment matching network (GMMN)-based post-filter that provides inter-utterance pitch variation for deep neural network (DNN)-based singing voice synthesis. The natural pitch variation of a human singing voice leads to a richer musical experience and is used in double-tracking, a recording method in which two performances of the same phrase are recorded and mixed to create a richer, layered sound. However, singing voices synthesized using conventional DNN-based methods never vary because the synthesis process is deterministic and only one waveform is synthesized from one musical score. To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis. Experimental evaluations suggest that 1) our approach can provide perceptible inter-utterance pitch variation while preserving speech quality. We extend our approach to double-tracking, and the evaluation demonstrates that 2) GMMN-based neural double-tracking is perceptually closer to natural double-tracking than conventional signal processing-based artificial double-tracking is.
Tasks
Published	2019-02-09
URL	http://arxiv.org/abs/1902.03389v1
PDF	http://arxiv.org/pdf/1902.03389v1.pdf
PWC	https://paperswithcode.com/paper/generative-moment-matching-network-based
Repo
Framework
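A generative moment matching network is usually trained by minimizing the maximum mean discrepancy (MMD) between generated and real samples. The sketch below computes a biased squared-MMD estimate with a Gaussian kernel in plain Python. The function names, the bandwidth, and the toy vectors are illustrative assumptions, not details taken from the paper.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(real, generated, bandwidth=1.0):
    """Biased estimate of squared MMD between two sample sets."""
    return (mean_kernel(real, real, bandwidth)
            + mean_kernel(generated, generated, bandwidth)
            - 2.0 * mean_kernel(real, generated, bandwidth))

# Hypothetical toy "modulation spectrum" vectors, for illustration only.
natural = [[0.1, 0.4], [0.2, 0.5], [0.0, 0.3]]
synthetic = [[1.1, 1.4], [1.2, 1.5], [1.0, 1.3]]
loss = mmd_squared(natural, synthetic)
```

A generator trained against this criterion would be updated to drive `loss` toward zero; the network and its optimizer are omitted here.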

CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography


Title	CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography
Authors	Chongsheng Cheng, Zhexiong Shang, Zhigang Shen
Abstract	Delamination assessment of the bridge deck plays a vital role in bridge health monitoring. Thermography, as one of the nondestructive technologies for delamination detection, has the advantage of efficient data acquisition, but interpreting the data for accurate delamination shape profiling remains challenging. Due to environmental variation and the irregular size and depth of delamination, conventional processing methods based on temperature contrast fall short in accurately segmenting delamination. Inspired by the recent development of deep learning architectures for image segmentation, a Convolutional Neural Network (CNN)-based framework was investigated for its applicability to delamination segmentation under variations in temperature contrast and shape diffusion. The models were developed based on the Dense Convolutional Network (DenseNet) and trained on thermal images collected for mimicked delamination in concrete slabs with different depths under an experimental setup. The results suggested satisfactory performance in accurately profiling the delamination shapes.
Tasks	Semantic Segmentation
Published	2019-04-11
URL	http://arxiv.org/abs/1904.05509v1
PDF	http://arxiv.org/pdf/1904.05509v1.pdf
PWC	https://paperswithcode.com/paper/cnn-based-deep-architecture-for-reinforced
Repo
Framework

Better Algorithms for Stochastic Bandits with Adversarial Corruptions


Title	Better Algorithms for Stochastic Bandits with Adversarial Corruptions
Authors	Anupam Gupta, Tomer Koren, Kunal Talwar
Abstract	We study the stochastic multi-armed bandits problem in the presence of adversarial corruption. We present a new algorithm for this problem whose regret is nearly optimal, substantially improving upon previous work. Our algorithm is agnostic to the level of adversarial contamination and can tolerate a significant amount of corruption with virtually no degradation in performance.
Tasks	Multi-Armed Bandits
Published	2019-02-22
URL	http://arxiv.org/abs/1902.08647v2
PDF	http://arxiv.org/pdf/1902.08647v2.pdf
PWC	https://paperswithcode.com/paper/better-algorithms-for-stochastic-bandits-with
Repo
Framework

Atom Responding Machine for Dialog Generation


Title	Atom Responding Machine for Dialog Generation
Authors	Ganbin Zhou, Ping Luo, Jingwu Chen, Fen Lin, Leyu Lin, Qing He
Abstract	Recently, improving the relevance and diversity of dialogue systems has attracted wide attention. For a post x, the corresponding response y is usually diverse in the real-world corpus, while the conventional encoder-decoder model tends to output high-frequency (safe but trivial) responses and thus struggles to handle the large number of responding styles. To address these issues, we propose the Atom Responding Machine (ARM), which is based on a proposed encoder-composer-decoder network trained by a teacher-student framework. To enrich the generated responses, ARM introduces a large number of molecule-mechanisms as various responding styles, which are formed by taking different combinations of a few atom-mechanisms. In other words, even a few atom-mechanisms can make a mickle of molecule-mechanisms. The experiments demonstrate the diversity and quality of the responses generated by ARM. We also present the generating process to show the underlying interpretability of the results.
Tasks
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05532v2
PDF	https://arxiv.org/pdf/1905.05532v2.pdf
PWC	https://paperswithcode.com/paper/atom-responding-machine-for-dialog-generation
Repo
Framework

Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network


Title	Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network
Authors	Oscar Chang, Yuling Yao, David Williams-King, Hod Lipson
Abstract	Two main obstacles preventing the widespread adoption of variational Bayesian neural networks are the high parameter overhead that makes them infeasible on large networks, and the difficulty of implementation, which can be thought of as “programming overhead.” MC dropout [Gal and Ghahramani, 2016] is popular because it sidesteps these obstacles. Nevertheless, dropout is often harmful to model performance when used in networks with batch normalization layers [Li et al., 2018], which are an indispensable part of modern neural networks. We construct a general variational family for ensemble-based Bayesian neural networks that encompasses dropout as a special case. We further present two specific members of this family that work well with batch normalization layers, while retaining the benefits of low parameter and programming overhead, comparable to non-Bayesian training. Our proposed methods improve predictive accuracy and achieve almost perfect calibration on a ResNet-18 trained with ImageNet.
Tasks	Calibration
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09453v1
PDF	https://arxiv.org/pdf/1905.09453v1.pdf
PWC	https://paperswithcode.com/paper/ensemble-model-patching-a-parameter-efficient
Repo
Framework

Multi-Label Product Categorization Using Multi-Modal Fusion Models


Title	Multi-Label Product Categorization Using Multi-Modal Fusion Models
Authors	Pasawee Wirojwatanakul, Artit Wangperawong
Abstract	In this study, we investigated multi-modal approaches using images, descriptions, and titles to categorize e-commerce products on Amazon. Specifically, we examined late fusion models, where the modalities are fused at the decision level. Products were each assigned multiple labels, and the label hierarchy was flattened and filtered. For our individual baseline models, we modified a CNN architecture to classify the description and title, and then modified Keras’ ResNet-50 to classify the images, achieving $F_1$ scores of 77.0%, 82.7%, and 61.0%, respectively. In comparison, our tri-modal late fusion model can classify products more effectively than single-modal models can, improving the $F_1$ score to 88.2%. Each modality complemented the shortcomings of the other modalities, demonstrating that increasing the number of modalities can be an effective method for improving the performance of multi-label classification problems.
Tasks	Multi-Label Classification, Product Categorization
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00420v2
PDF	https://arxiv.org/pdf/1907.00420v2.pdf
PWC	https://paperswithcode.com/paper/multi-label-product-categorization-using
Repo
Framework
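Decision-level (late) fusion can be sketched as combining the per-label probabilities produced by each modality's classifier and thresholding the result. The example below assumes simple averaging and a 0.5 threshold; the abstract does not state the fusion rule, so both choices, along with the function names and toy numbers, are assumptions made for illustration.

```python
def late_fusion(prob_sets, threshold=0.5):
    """Average per-label probabilities across modalities, then threshold."""
    n_labels = len(prob_sets[0])
    fused = [sum(p[i] for p in prob_sets) / len(prob_sets)
             for i in range(n_labels)]
    return [1 if p >= threshold else 0 for p in fused]

def f1_score(y_true, y_pred):
    """F1 for one multi-label example given binary label vectors."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical per-label probabilities from three single-modality models.
image_probs = [0.9, 0.2, 0.4]
description_probs = [0.6, 0.7, 0.3]
title_probs = [0.8, 0.65, 0.1]
true_labels = [1, 1, 0]

fused_labels = late_fusion([image_probs, description_probs, title_probs])
image_only_labels = late_fusion([image_probs])
```

In this toy case the image model alone misses the second label, while the fused prediction recovers it, which mirrors the complementarity the abstract describes.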

Massively Multilingual Adversarial Speech Recognition


Title	Massively Multilingual Adversarial Speech Recognition
Authors	Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky
Abstract	We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2019-04-03
URL	http://arxiv.org/abs/1904.02210v1
PDF	http://arxiv.org/pdf/1904.02210v1.pdf
PWC	https://paperswithcode.com/paper/massively-multilingual-adversarial-speech
Repo
Framework

Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs


Title	Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs
Authors	Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral
Abstract	The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception, and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict a sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated by evidence from cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD), which contains images of wooden blocks in different configurations, and the sequence of moves to rearrange one configuration to the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods under-perform in inferring temporal event-sequences and fail at inductive generalization. We then propose a modular two-step approach: Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach on natural images, we seek to pave the way for future research on event sequencing for real world scenes.
Tasks
Published	2019-05-28
URL	https://arxiv.org/abs/1905.12042v1
PDF	https://arxiv.org/pdf/1905.12042v1.pdf
PWC	https://paperswithcode.com/paper/blocksworld-revisited-learning-and-reasoning
Repo
Framework

X-BERT: eXtreme Multi-label Text Classification with using Bidirectional Encoder Representations from Transformers


Title	X-BERT: eXtreme Multi-label Text Classification with using Bidirectional Encoder Representations from Transformers
Authors	Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon
Abstract	Extreme multi-label text classification (XMC) concerns tagging input text with the most relevant labels from an extremely large set. Recently, pre-trained language representation models such as BERT (Bidirectional Encoder Representations from Transformers) have been shown to achieve outstanding performance on many NLP tasks including sentence classification with small label sets (typically fewer than thousands). However, there are several challenges in extending BERT to the XMC problem, such as (i) the difficulty of capturing dependencies or correlations among labels, whose features may come from heterogeneous sources, and (ii) the difficulty of scaling tractably to the extreme label setting, because the Softmax bottleneck scales linearly with the output space. To overcome these challenges, we propose X-BERT, the first scalable solution to finetune BERT models on the XMC problem. Specifically, X-BERT leverages both the label and input text to build label representations, which induces semantic label clusters to better model label dependencies. At the heart of X-BERT is a procedure to finetune BERT models to capture the contextual relations between input text and the induced label clusters. Finally, an ensemble of the different BERT models trained on heterogeneous label clusters yields our best final model, a state-of-the-art XMC method. In particular, on a Wiki dataset with around 0.5 million labels, the precision@1 of X-BERT is 67.87%, a substantial improvement over the neural baseline fastText and a state-of-the-art XMC approach Parabel, which achieve 32.58% and 60.91% precision@1, respectively.
Tasks	Extreme Multi-Label Classification, Multi-Label Classification, Multi-Label Text Classification, Product Categorization, Sentence Classification, Text Classification
Published	2019-05-07
URL	https://arxiv.org/abs/1905.02331v3
PDF	https://arxiv.org/pdf/1905.02331v3.pdf
PWC	https://paperswithcode.com/paper/a-modular-deep-learning-approach-for-extreme
Repo
Framework
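The precision@1 figures quoted in the abstract measure how often the single top-ranked label is a correct one. A minimal sketch of precision@k follows; the function names and the toy label scores are hypothetical and only illustrate the metric, not the X-BERT model itself.

```python
def precision_at_k(scores, relevant, k=1):
    """Fraction of the k highest-scoring labels that are relevant."""
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for label in top_k if label in relevant) / k

def mean_precision_at_k(examples, k=1):
    """Average precision@k over (scores, relevant_labels) pairs."""
    return sum(precision_at_k(s, r, k) for s, r in examples) / len(examples)

# Hypothetical model scores for two documents over a tiny label set.
examples = [
    ({"physics": 0.9, "biology": 0.3, "history": 0.1}, {"physics"}),
    ({"physics": 0.2, "biology": 0.5, "history": 0.4}, {"history"}),
]
p_at_1 = mean_precision_at_k(examples, k=1)
p_at_2 = mean_precision_at_k(examples, k=2)
```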

Target-less registration of point clouds: A review


Title	Target-less registration of point clouds: A review
Authors	Yue Pan
Abstract	Point cloud registration is one of the basic steps of point cloud processing and has many applications in remote sensing and robotics. In this report, we summarize the basic workflow of target-less point cloud registration, namely correspondence determination and transformation estimation. We then review three commonly used groups of registration approaches: feature-matching-based methods, the iterative closest point algorithm, and random hypothesize-and-verify methods. In addition, we analyze the advantages and disadvantages of these methods and introduce their common application scenarios. Finally, we discuss the challenges of current point cloud registration methods and propose several open questions for the future development of automatic registration approaches.
Tasks	Point Cloud Registration
Published	2019-12-29
URL	https://arxiv.org/abs/1912.12756v1
PDF	https://arxiv.org/pdf/1912.12756v1.pdf
PWC	https://paperswithcode.com/paper/target-less-registration-of-point-clouds-a
Repo
Framework
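The two-step workflow named in the abstract, correspondence determination followed by transformation estimation, is what an iterative closest point loop repeats. The sketch below is a simplified 2D version in plain Python, assuming brute-force nearest-neighbor matching and a closed-form least-squares rigid fit; all function names are hypothetical, and practical implementations work in 3D with spatial indexing and outlier rejection.

```python
import math

def nearest_neighbors(source, target):
    """Correspondence determination: closest target point per source point."""
    return [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in source]

def estimate_rigid_2d(source, matched):
    """Transformation estimation: least-squares rotation and translation."""
    n = len(source)
    cx_s = sum(p[0] for p in source) / n
    cy_s = sum(p[1] for p in source) / n
    cx_t = sum(q[0] for q in matched) / n
    cy_t = sum(q[1] for q in matched) / n
    cos_sum = sin_sum = 0.0
    for (px, py), (qx, qy) in zip(source, matched):
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_t, qy - cy_t
        cos_sum += ax * bx + ay * by
        sin_sum += ax * by - ay * bx
    theta = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_t - (c * cx_s - s * cy_s)
    ty = cy_t - (s * cx_s + c * cy_s)
    return theta, tx, ty

def apply_rigid_2d(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(source, target, iterations=20):
    """Alternate matching and rigid fitting for a fixed number of rounds."""
    current = list(source)
    for _ in range(iterations):
        matched = nearest_neighbors(current, target)
        theta, tx, ty = estimate_rigid_2d(current, matched)
        current = apply_rigid_2d(current, theta, tx, ty)
    return current

# Toy clouds: the target is the source under a small known motion.
source_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
target_points = apply_rigid_2d(source_points, 0.05, 0.1, -0.1)
aligned = icp_2d(source_points, target_points)
```

Like any local method of this kind, the loop depends on a reasonable initial alignment; with a large initial offset the nearest-neighbor matches can be wrong and the result may settle in a poor local minimum.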

Scribble-based Hierarchical Weakly Supervised Learning for Brain Tumor Segmentation


Title	Scribble-based Hierarchical Weakly Supervised Learning for Brain Tumor Segmentation
Authors	Zhanghexuan Ji, Yan Shen, Chunwei Ma, Mingchen Gao
Abstract	The recent state-of-the-art deep learning methods have significantly improved brain tumor segmentation. However, fully supervised training requires a large amount of manually labeled masks, which is highly time-consuming and needs domain expertise. Weakly supervised learning with scribbles provides a good trade-off between model accuracy and the effort of manual labeling. However, for segmenting the hierarchical brain tumor structures, manually labeling scribbles for each substructure could still be demanding. In this paper, we use only two kinds of weak labels, i.e., scribbles on whole tumor and healthy brain tissue, and global labels for the presence of each substructure, to train a deep learning model to segment all the sub-regions. Specifically, we train two networks in two phases: first, we only use whole tumor scribbles to train a whole tumor (WT) segmentation network, which roughly recovers the WT mask of training data; then we cluster the WT region with the guide of global labels. The rough substructure segmentation from clustering is used as weak labels to train the second network. The dense CRF loss is used to refine the weakly supervised segmentation. We evaluate our approach on the BraTS2017 dataset and achieve competitive WT dice score as well as comparable scores on substructure segmentation compared to an upper bound when trained with fully annotated masks.
Tasks	Brain Tumor Segmentation
Published	2019-11-05
URL	https://arxiv.org/abs/1911.02014v1
PDF	https://arxiv.org/pdf/1911.02014v1.pdf
PWC	https://paperswithcode.com/paper/scribble-based-hierarchical-weakly-supervised
Repo
Framework
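The Dice score used to evaluate the whole-tumor and substructure segmentations measures the overlap between a predicted mask and a reference mask. A minimal sketch on flattened binary masks follows; the function name and the toy masks are hypothetical, and returning 1.0 when both masks are empty is a convention chosen here, not something stated in the abstract.

```python
def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * intersection / total

# Hypothetical flattened masks (1 = tumor voxel, 0 = background).
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
score = dice_score(predicted, reference)
```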

DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM


Title	DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM
Authors	Bao Wang, Quanquan Gu, March Boedihardjo, Farzin Barekat, Stanley J. Osher
Abstract	Machine learning (ML) models trained by differentially private stochastic gradient descent (DP-SGD) have much lower utility than the non-private ones. To mitigate this degradation, we propose a DP Laplacian smoothing SGD (DP-LSSGD) to train ML models with differential privacy (DP) guarantees. At the core of DP-LSSGD is the Laplacian smoothing, which smooths out the Gaussian noise used in the Gaussian mechanism. Under the same amount of noise used in the Gaussian mechanism, DP-LSSGD attains the same DP guarantee, but in practice, DP-LSSGD makes training both convex and nonconvex ML models more stable and enables the trained models to generalize better. The proposed algorithm is simple to implement and the extra computational complexity and memory overhead compared with DP-SGD are negligible. DP-LSSGD is applicable to train a large variety of ML models, including DNNs. The code is available at https://github.com/BaoWangMath/DP-LSSGD.
Tasks	Stochastic Optimization
Published	2019-06-28
URL	https://arxiv.org/abs/1906.12056v2
PDF	https://arxiv.org/pdf/1906.12056v2.pdf
PWC	https://paperswithcode.com/paper/dp-lssgd-a-stochastic-optimization-method-to
Repo
Framework
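The idea of smoothing a noisy private gradient can be sketched as follows: clip the gradient, add Gaussian noise, then solve (I - sigma * L) x = g, where L is a one-dimensional discrete Laplacian with periodic boundary. This is a simplified illustration in plain Python using a naive DFT. The function names, parameter values, and the exact form of the operator are assumptions here; the authors' repository linked in the abstract is the authoritative implementation.

```python
import cmath
import math
import random

def clip_and_add_noise(grad, clip_norm, noise_std, rng):
    """DP-SGD-style step: clip the gradient norm, then add Gaussian noise."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, noise_std) for g in grad]

def laplacian_smooth(grad, sigma=1.0):
    """Solve (I - sigma * L) x = grad for the periodic 1-D Laplacian L."""
    n = len(grad)
    # Forward DFT (naive O(n^2) version to stay dependency-free).
    spectrum = [sum(grad[j] * cmath.exp(-2j * math.pi * k * j / n)
                    for j in range(n)) for k in range(n)]
    # The operator is circulant, so it is diagonal in the Fourier basis.
    scaled = [spectrum[k] / (1.0 + 2.0 * sigma
                             - 2.0 * sigma * math.cos(2.0 * math.pi * k / n))
              for k in range(n)]
    # Inverse DFT; the imaginary part is numerical noise for real input.
    return [(sum(scaled[k] * cmath.exp(2j * math.pi * k * j / n)
                 for k in range(n)) / n).real for j in range(n)]

def apply_operator(x, sigma=1.0):
    """Multiply by (I - sigma * L) directly, to check the solver."""
    n = len(x)
    return [(1.0 + 2.0 * sigma) * x[i]
            - sigma * x[(i - 1) % n]
            - sigma * x[(i + 1) % n] for i in range(n)]

rng = random.Random(0)
raw_gradient = [1.0, -2.0, 0.5, 3.0, -1.0]
noisy = clip_and_add_noise(raw_gradient, clip_norm=1.0, noise_std=0.5, rng=rng)
smoothed = laplacian_smooth(noisy, sigma=1.0)
```

Because smoothing is applied after the noise is added, it is a post-processing step, which is consistent with the abstract's statement that the privacy guarantee is unchanged for the same amount of noise.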

On Correlation of Features Extracted by Deep Neural Networks


Title	On Correlation of Features Extracted by Deep Neural Networks
Authors	Babajide O. Ayinde, Tamer Inanc, Jacek M. Zurada
Abstract	Redundancy in deep neural network (DNN) models has always been one of their most intriguing and important properties. DNNs have been shown to overparameterize, or extract a lot of redundant features. In this work, we explore the impact of size (both width and depth), activation function, and weight initialization on the susceptibility of deep neural network models to extract redundant features. To estimate the number of redundant features in each layer, all the features of a given layer are hierarchically clustered according to their relative cosine distances in feature space and a set threshold. It is shown that both network size and activation function are the two most important components that foster the tendency of DNNs to extract redundant features. The concept is illustrated using deep multilayer perceptron and convolutional neural networks on MNIST digits recognition and CIFAR-10 dataset, respectively.
Tasks
Published	2019-01-30
URL	http://arxiv.org/abs/1901.10900v1
PDF	http://arxiv.org/pdf/1901.10900v1.pdf
PWC	https://paperswithcode.com/paper/on-correlation-of-features-extracted-by-deep
Repo
Framework
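The redundancy estimate described in the abstract, clustering a layer's features by cosine distance up to a set threshold, can be sketched as a small agglomerative clustering routine. The version below assumes average linkage and nonzero feature vectors; the linkage rule, the threshold value, and all names are assumptions made for illustration rather than the authors' exact procedure.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def cluster_features(features, threshold):
    """Merge the closest clusters until none are within the threshold."""
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        # Average pairwise cosine distance between two clusters.
        total = sum(cosine_distance(features[i], features[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def count_redundant(features, threshold):
    """Features beyond one representative per cluster count as redundant."""
    return len(features) - len(cluster_features(features, threshold))

# Hypothetical weight vectors of five features in one layer.
layer_features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0], [1.0, 1.0]]
redundant = count_redundant(layer_features, threshold=0.05)
```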

Deep Reinforcement Learning for Single-Shot Diagnosis and Adaptation in Damaged Robots


Title	Deep Reinforcement Learning for Single-Shot Diagnosis and Adaptation in Damaged Robots
Authors	Shresth Verma, Haritha S. Nair, Gaurav Agarwal, Joydip Dhar, Anupam Shukla
Abstract	Robotics has proved to be an indispensable tool in many industrial as well as social applications, such as warehouse automation, manufacturing, disaster robotics, etc. In most of these scenarios, damage to the agent while accomplishing mission-critical tasks can result in failure. To enable robotic adaptation in such situations, the agent needs to adopt policies which are robust to a diverse set of damages and must do so with minimum computational complexity. We thus propose a damage aware control architecture which diagnoses the damage prior to gait selection while also incorporating domain randomization in the damage space for learning a robust policy. To implement damage awareness, we have used a Long Short Term Memory based supervised learning network which diagnoses the damage and predicts the type of damage. The main novelty of this approach is that only a single policy is trained to adapt against a wide variety of damages and the diagnosis is done in a single trial at the time of damage.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01240v1
PDF	https://arxiv.org/pdf/1910.01240v1.pdf
PWC	https://paperswithcode.com/paper/deep-reinforcement-learning-for-single-shot
Repo
Framework

Detection of speech events and speaker characteristics through photo-plethysmographic signal neural processing


Title	Detection of speech events and speaker characteristics through photo-plethysmographic signal neural processing
Authors	Guillermo Cámbara, Jordi Luque, Mireia Farrús
Abstract	The use of the photoplethysmogram (PPG) signal for heart and sleep monitoring is now common in smartphones and wrist wearables. Beyond these common uses, it has been proposed and reported that personal information can be extracted from PPG for other purposes, such as biometric tasks. In this work, we explore several end-to-end convolutional neural network architectures for detecting human characteristics such as gender or person identity. In addition, we evaluate whether speech/non-speech events may be inferred from the PPG signal, where speech might translate into fluctuations in the pulse signal. The obtained results are promising and clearly show the potential of fully end-to-end topologies for automatic extraction of meaningful biomarkers, even from a noisy signal sampled by a low-cost PPG sensor. The AUCs for the best architectures put forward the PPG wave as a biological discriminant, reaching 79% and 89.0% for the gender and person verification tasks, respectively. Furthermore, speech detection experiments reporting AUCs around 69% encourage further exploration of the feasibility of PPG for speech processing tasks.
Tasks
Published	2019-11-12
URL	https://arxiv.org/abs/1911.04808v1
PDF	https://arxiv.org/pdf/1911.04808v1.pdf
PWC	https://paperswithcode.com/paper/detection-of-speech-events-and-speaker
Repo
Framework