Paper Group ANR 259
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking
Title | Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking |
Authors | Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari |
Abstract | This paper proposes a generative moment matching network (GMMN)-based post-filter that provides inter-utterance pitch variation for deep neural network (DNN)-based singing voice synthesis. The natural pitch variation of a human singing voice leads to a richer musical experience and is used in double-tracking, a recording method in which two performances of the same phrase are recorded and mixed to create a richer, layered sound. However, singing voices synthesized using conventional DNN-based methods never vary because the synthesis process is deterministic and only one waveform is synthesized from one musical score. To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis. Experimental evaluations suggest that 1) our approach can provide perceptible inter-utterance pitch variation while preserving speech quality. We extend our approach to double-tracking, and the evaluation demonstrates that 2) GMMN-based neural double-tracking is perceptually closer to natural double-tracking than conventional signal processing-based artificial double-tracking is. |
Tasks | |
Published | 2019-02-09 |
URL | http://arxiv.org/abs/1902.03389v1 |
http://arxiv.org/pdf/1902.03389v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-moment-matching-network-based |
Repo | |
Framework | |
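The GMMN named in the abstract above belongs to a family of models trained by matching moments between generated and natural samples, usually through the maximum mean discrepancy (MMD). The following is a minimal sketch of a biased squared-MMD estimate with a Gaussian kernel. The function names, the bandwidth value, and the toy vectors are assumptions for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets xs and ys."""

    def mean_kernel(a_set, b_set):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical toy samples standing in for natural and generated features.
natural = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
generated = [[2.0, 2.1], [2.2, 1.9], [1.9, 2.0]]
gap = mmd_squared(natural, generated)
```

A generator trained with this criterion would be updated to push `gap` toward zero, so that its samples become statistically indistinguishable from the natural ones under the chosen kernel.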
CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography
Title | CNN-Based Deep Architecture for Reinforced Concrete Delamination Segmentation Through Thermography |
Authors | Chongsheng Cheng, Zhexiong Shang, Zhigang Shen |
Abstract | Delamination assessment of the bridge deck plays a vital role in bridge health monitoring. Thermography, as one of the nondestructive technologies for delamination detection, has the advantage of efficient data acquisition. However, interpreting the data for accurate delamination shape profiling remains challenging. Due to environmental variation and the irregular size and depth of delaminations, conventional processing methods based on temperature contrast fall short in accurate segmentation of delamination. Inspired by the recent development of deep learning architectures for image segmentation, a Convolutional Neural Network (CNN) based framework was investigated for its applicability to delamination segmentation under variations in temperature contrast and shape diffusion. The models were developed based on the Dense Convolutional Network (DenseNet) and trained on thermal images collected for mimicked delamination in concrete slabs with different depths under an experimental setup. The results suggested satisfactory performance in accurately profiling the delamination shapes. |
Tasks | Semantic Segmentation |
Published | 2019-04-11 |
URL | http://arxiv.org/abs/1904.05509v1 |
http://arxiv.org/pdf/1904.05509v1.pdf | |
PWC | https://paperswithcode.com/paper/cnn-based-deep-architecture-for-reinforced |
Repo | |
Framework | |
Better Algorithms for Stochastic Bandits with Adversarial Corruptions
Title | Better Algorithms for Stochastic Bandits with Adversarial Corruptions |
Authors | Anupam Gupta, Tomer Koren, Kunal Talwar |
Abstract | We study the stochastic multi-armed bandits problem in the presence of adversarial corruption. We present a new algorithm for this problem whose regret is nearly optimal, substantially improving upon previous work. Our algorithm is agnostic to the level of adversarial contamination and can tolerate a significant amount of corruption with virtually no degradation in performance. |
Tasks | Multi-Armed Bandits |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08647v2 |
http://arxiv.org/pdf/1902.08647v2.pdf | |
PWC | https://paperswithcode.com/paper/better-algorithms-for-stochastic-bandits-with |
Repo | |
Framework | |
Atom Responding Machine for Dialog Generation
Title | Atom Responding Machine for Dialog Generation |
Authors | Ganbin Zhou, Ping Luo, Jingwu Chen, Fen Lin, Leyu Lin, Qing He |
Abstract | Recently, improving the relevance and diversity of dialogue systems has attracted wide attention. For a post x, the corresponding response y is usually diverse in a real-world corpus, while the conventional encoder-decoder model tends to output high-frequency (safe but trivial) responses and thus has difficulty handling the large number of responding styles. To address these issues, we propose the Atom Responding Machine (ARM), which is based on a proposed encoder-composer-decoder network trained by a teacher-student framework. To enrich the generated responses, ARM introduces a large number of molecule-mechanisms as various responding styles, which are constructed by taking different combinations of a few atom-mechanisms. In other words, even a few atom-mechanisms can make a mickle of molecule-mechanisms. The experiments demonstrate the diversity and quality of the responses generated by ARM. We also present the generating process to show the underlying interpretability of the results. |
Tasks | |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05532v2 |
https://arxiv.org/pdf/1905.05532v2.pdf | |
PWC | https://paperswithcode.com/paper/atom-responding-machine-for-dialog-generation |
Repo | |
Framework | |
Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network
Title | Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network |
Authors | Oscar Chang, Yuling Yao, David Williams-King, Hod Lipson |
Abstract | Two main obstacles preventing the widespread adoption of variational Bayesian neural networks are the high parameter overhead that makes them infeasible on large networks, and the difficulty of implementation, which can be thought of as “programming overhead.” MC dropout [Gal and Ghahramani, 2016] is popular because it sidesteps these obstacles. Nevertheless, dropout is often harmful to model performance when used in networks with batch normalization layers [Li et al., 2018], which are an indispensable part of modern neural networks. We construct a general variational family for ensemble-based Bayesian neural networks that encompasses dropout as a special case. We further present two specific members of this family that work well with batch normalization layers, while retaining the benefits of low parameter and programming overhead, comparable to non-Bayesian training. Our proposed methods improve predictive accuracy and achieve almost perfect calibration on a ResNet-18 trained with ImageNet. |
Tasks | Calibration |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09453v1 |
https://arxiv.org/pdf/1905.09453v1.pdf | |
PWC | https://paperswithcode.com/paper/ensemble-model-patching-a-parameter-efficient |
Repo | |
Framework | |
Multi-Label Product Categorization Using Multi-Modal Fusion Models
Title | Multi-Label Product Categorization Using Multi-Modal Fusion Models |
Authors | Pasawee Wirojwatanakul, Artit Wangperawong |
Abstract | In this study, we investigated multi-modal approaches using images, descriptions, and titles to categorize e-commerce products on Amazon. Specifically, we examined late fusion models, where the modalities are fused at the decision level. Products were each assigned multiple labels, and the hierarchy in the labels was flattened and filtered. For our individual baseline models, we modified a CNN architecture to classify the description and title, and then modified Keras’ ResNet-50 to classify the images, achieving $F_1$ scores of 77.0%, 82.7%, and 61.0%, respectively. In comparison, our tri-modal late fusion model can classify products more effectively than single-modal models can, improving the $F_1$ score to 88.2%. Each modality complemented the shortcomings of the other modalities, demonstrating that increasing the number of modalities can be an effective method for improving the performance of multi-label classification problems. |
Tasks | Multi-Label Classification, Product Categorization |
Published | 2019-06-30 |
URL | https://arxiv.org/abs/1907.00420v2 |
https://arxiv.org/pdf/1907.00420v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-label-product-categorization-using |
Repo | |
Framework | |
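The abstract above states only that the modalities are fused at the decision level. As a rough illustration of what that can mean for multi-label output, the sketch below averages per-label probabilities from three single-modality classifiers and then thresholds them. The averaging rule, the 0.5 threshold, and the scores are assumptions, not the fusion policy reported in the paper.

```python
def late_fusion(prob_by_modality, threshold=0.5):
    """Average per-label probabilities across modalities, then threshold."""
    modalities = list(prob_by_modality.values())
    num_labels = len(modalities[0])
    fused = [
        sum(m[i] for m in modalities) / len(modalities) for i in range(num_labels)
    ]
    predicted = [i for i, p in enumerate(fused) if p >= threshold]
    return fused, predicted


# Hypothetical per-label probabilities for one product with three candidate labels.
scores = {
    "image": [0.2, 0.7, 0.4],
    "description": [0.6, 0.8, 0.1],
    "title": [0.9, 0.6, 0.1],
}
fused, predicted = late_fusion(scores)
```

In this toy case the image model alone would miss label 0, but the description and title scores pull the fused probability above the threshold, which is the complementarity effect the abstract describes.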
Massively Multilingual Adversarial Speech Recognition
Title | Massively Multilingual Adversarial Speech Recognition |
Authors | Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky |
Abstract | We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additional pretraining objectives in encouraging language-independent encoder representations: a context-independent phoneme objective paired with a language-adversarial classification objective. |
Tasks | End-To-End Speech Recognition, Speech Recognition |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02210v1 |
http://arxiv.org/pdf/1904.02210v1.pdf | |
PWC | https://paperswithcode.com/paper/massively-multilingual-adversarial-speech |
Repo | |
Framework | |
Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs
Title | Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs |
Authors | Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral |
Abstract | The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception, and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict a sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated by evidence in cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD), which contains images of wooden blocks in different configurations, and the sequence of moves to rearrange one configuration to the other. We first explore the use of existing deep learning architectures and show that these end-to-end methods under-perform in inferring temporal event-sequences and fail at inductive generalization. We then propose a modular two-step approach: Visual Perception followed by Event-Sequencing, and demonstrate improved performance by combining learning and reasoning. Finally, by showing an extension of our approach on natural images, we seek to pave the way for future research on event sequencing for real world scenes. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.12042v1 |
https://arxiv.org/pdf/1905.12042v1.pdf | |
PWC | https://paperswithcode.com/paper/blocksworld-revisited-learning-and-reasoning |
Repo | |
Framework | |
X-BERT: eXtreme Multi-label Text Classification with using Bidirectional Encoder Representations from Transformers
Title | X-BERT: eXtreme Multi-label Text Classification with using Bidirectional Encoder Representations from Transformers |
Authors | Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon |
Abstract | Extreme multi-label text classification (XMC) concerns tagging input text with the most relevant labels from an extremely large set. Recently, pre-trained language representation models such as BERT (Bidirectional Encoder Representations from Transformers) have been shown to achieve outstanding performance on many NLP tasks including sentence classification with small label sets (typically fewer than thousands). However, there are several challenges in extending BERT to the XMC problem, such as (i) the difficulty of capturing dependencies or correlations among labels, whose features may come from heterogeneous sources, and (ii) the tractability of scaling to the extreme label setting, given that the Softmax bottleneck scales linearly with the output space. To overcome these challenges, we propose X-BERT, the first scalable solution to finetune BERT models on the XMC problem. Specifically, X-BERT leverages both the label and input text to build label representations, which induces semantic label clusters to better model label dependencies. At the heart of X-BERT is a procedure to finetune BERT models to capture the contextual relations between input text and the induced label clusters. Finally, an ensemble of the different BERT models trained on heterogeneous label clusters yields our best final model, a state-of-the-art XMC method. In particular, on a Wiki dataset with around 0.5 million labels, the precision@1 of X-BERT is 67.87%, a substantial improvement over the neural baseline fastText and a state-of-the-art XMC approach Parabel, which achieve 32.58% and 60.91% precision@1, respectively. |
Tasks | Extreme Multi-Label Classification, Multi-Label Classification, Multi-Label Text Classification, Product Categorization, Sentence Classification, Text Classification |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02331v3 |
https://arxiv.org/pdf/1905.02331v3.pdf | |
PWC | https://paperswithcode.com/paper/a-modular-deep-learning-approach-for-extreme |
Repo | |
Framework | |
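The X-BERT results above are reported as precision@1. For readers unfamiliar with the metric, here is a minimal sketch of precision@k for multi-label ranking; the function name and the toy labels are illustrative assumptions and have no connection to the paper's data or code.

```python
def precision_at_k(ranked_labels, true_labels, k=1):
    """Fraction of the top-k predicted labels that are relevant, averaged over examples."""
    total = 0.0
    for ranked, truth in zip(ranked_labels, true_labels):
        hits = sum(1 for label in ranked[:k] if label in truth)
        total += hits / k
    return total / len(ranked_labels)


# Hypothetical ranked predictions and ground-truth label sets for three documents.
ranked = [["a", "b", "c"], ["d", "a", "b"], ["c", "b", "a"]]
truth = [{"a", "c"}, {"a"}, {"c"}]
p_at_1 = precision_at_k(ranked, truth, k=1)
p_at_2 = precision_at_k(ranked, truth, k=2)
```

With k=1 the metric reduces to asking whether the single highest-ranked label is correct, which is why it remains meaningful even when the label set has hundreds of thousands of entries.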
Target-less registration of point clouds: A review
Title | Target-less registration of point clouds: A review |
Authors | Yue Pan |
Abstract | Point cloud registration is one of the basic steps of point cloud processing and has many applications in remote sensing and robotics. In this report, we summarize the basic workflow of target-less point cloud registration, namely correspondence determination and transformation estimation. We then review three commonly used groups of registration approaches: feature-matching based methods, the iterative closest point algorithm, and random hypothesize-and-verify based methods. We also analyze the advantages and disadvantages of these methods and introduce their common application scenarios. Finally, we discuss the challenges of current point cloud registration methods and propose several open questions for the future development of automatic registration approaches. |
Tasks | Point Cloud Registration |
Published | 2019-12-29 |
URL | https://arxiv.org/abs/1912.12756v1 |
https://arxiv.org/pdf/1912.12756v1.pdf | |
PWC | https://paperswithcode.com/paper/target-less-registration-of-point-clouds-a |
Repo | |
Framework | |
Scribble-based Hierarchical Weakly Supervised Learning for Brain Tumor Segmentation
Title | Scribble-based Hierarchical Weakly Supervised Learning for Brain Tumor Segmentation |
Authors | Zhanghexuan Ji, Yan Shen, Chunwei Ma, Mingchen Gao |
Abstract | Recent state-of-the-art deep learning methods have significantly improved brain tumor segmentation. However, fully supervised training requires a large amount of manually labeled masks, which is highly time-consuming and needs domain expertise. Weakly supervised learning with scribbles provides a good trade-off between model accuracy and the effort of manual labeling. However, for segmenting the hierarchical brain tumor structures, manually labeling scribbles for each substructure could still be demanding. In this paper, we use only two kinds of weak labels, i.e., scribbles on whole tumor and healthy brain tissue, and global labels for the presence of each substructure, to train a deep learning model to segment all the sub-regions. Specifically, we train two networks in two phases: first, we only use whole tumor scribbles to train a whole tumor (WT) segmentation network, which roughly recovers the WT mask of training data; then we cluster the WT region with the guidance of global labels. The rough substructure segmentation from clustering is used as weak labels to train the second network. The dense CRF loss is used to refine the weakly supervised segmentation. We evaluate our approach on the BraTS2017 dataset and achieve a competitive WT Dice score as well as comparable scores on substructure segmentation compared to an upper bound when trained with fully annotated masks. |
Tasks | Brain Tumor Segmentation |
Published | 2019-11-05 |
URL | https://arxiv.org/abs/1911.02014v1 |
https://arxiv.org/pdf/1911.02014v1.pdf | |
PWC | https://paperswithcode.com/paper/scribble-based-hierarchical-weakly-supervised |
Repo | |
Framework | |
DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM
Title | DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM |
Authors | Bao Wang, Quanquan Gu, March Boedihardjo, Farzin Barekat, Stanley J. Osher |
Abstract | Machine learning (ML) models trained by differentially private stochastic gradient descent (DP-SGD) have much lower utility than the non-private ones. To mitigate this degradation, we propose a DP Laplacian smoothing SGD (DP-LSSGD) to train ML models with differential privacy (DP) guarantees. At the core of DP-LSSGD is the Laplacian smoothing, which smooths out the Gaussian noise used in the Gaussian mechanism. Under the same amount of noise used in the Gaussian mechanism, DP-LSSGD attains the same DP guarantee, but in practice, DP-LSSGD makes training both convex and nonconvex ML models more stable and enables the trained models to generalize better. The proposed algorithm is simple to implement and the extra computational complexity and memory overhead compared with DP-SGD are negligible. DP-LSSGD can be applied to train a large variety of ML models, including DNNs. The code is available at https://github.com/BaoWangMath/DP-LSSGD. |
Tasks | Stochastic Optimization |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12056v2 |
https://arxiv.org/pdf/1906.12056v2.pdf | |
PWC | https://paperswithcode.com/paper/dp-lssgd-a-stochastic-optimization-method-to |
Repo | |
Framework | |
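The DP-LSSGD abstract above says that Laplacian smoothing is applied to the Gaussian noise of the Gaussian mechanism, without giving the operator. The sketch below assumes the form commonly used in the Laplacian smoothing literature: solve (I - sigma * L) x = g, where L is the 1-D discrete Laplacian with periodic boundary. The gradient clipping step, all parameter values, and the function names are further assumptions, and the linked repository remains the authoritative implementation.

```python
import cmath
import math
import random


def laplacian_smooth(vec, sigma=1.0):
    """Solve (I - sigma * L) x = vec, with L the periodic 1-D discrete Laplacian.

    The system matrix is circulant, so the DFT diagonalizes it. A naive O(n^2)
    DFT is used here only to keep the sketch dependency-free.
    """
    n = len(vec)
    spectrum = [
        sum(vec[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]
    # Eigenvalues of I - sigma * L are 1 + 2 * sigma * (1 - cos(2 * pi * k / n)).
    scaled = [
        spectrum[k] / (1.0 + 2.0 * sigma * (1.0 - math.cos(2.0 * math.pi * k / n)))
        for k in range(n)
    ]
    return [
        (sum(scaled[k] * cmath.exp(2j * math.pi * m * k / n) for k in range(n)) / n).real
        for m in range(n)
    ]


def noisy_smoothed_gradient(grad, clip_norm=1.0, noise_std=0.5, sigma=1.0, rng=None):
    """Clip the gradient (assumed step), add Gaussian noise, then smooth the result."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    noisy = [g * scale + rng.gauss(0.0, noise_std) for g in grad]
    return laplacian_smooth(noisy, sigma)
```

Because every eigenvalue of the assumed operator is at least 1, smoothing never increases the Euclidean norm of its input and damps high-frequency components most, which is one way to read the abstract's claim that the injected noise is smoothed out at no extra privacy cost.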
On Correlation of Features Extracted by Deep Neural Networks
Title | On Correlation of Features Extracted by Deep Neural Networks |
Authors | Babajide O. Ayinde, Tamer Inanc, Jacek M. Zurada |
Abstract | Redundancy in deep neural network (DNN) models has always been one of their most intriguing and important properties. DNNs have been shown to overparameterize, or extract a lot of redundant features. In this work, we explore the impact of size (both width and depth), activation function, and weight initialization on the susceptibility of deep neural network models to extract redundant features. To estimate the number of redundant features in each layer, all the features of a given layer are hierarchically clustered according to their relative cosine distances in feature space and a set threshold. It is shown that network size and activation function are the two most important components that foster the tendency of DNNs to extract redundant features. The concept is illustrated using deep multilayer perceptrons and convolutional neural networks on MNIST digit recognition and the CIFAR-10 dataset, respectively. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10900v1 |
http://arxiv.org/pdf/1901.10900v1.pdf | |
PWC | https://paperswithcode.com/paper/on-correlation-of-features-extracted-by-deep |
Repo | |
Framework | |
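The redundancy estimate in the abstract above (hierarchical clustering of a layer's features by cosine distance, cut at a set threshold) can be sketched in a few lines. Average linkage, the 0.1 threshold, and the toy weight vectors below are assumptions, since the abstract does not specify them.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cluster_features(features, threshold=0.1):
    """Average-linkage agglomerative clustering that stops at `threshold`."""
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        dists = [cosine_distance(features[i], features[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        pairs = [
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        best, a, b = min(pairs)
        if best > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def count_redundant(features, threshold=0.1):
    """Features beyond one representative per cluster are counted as redundant."""
    return len(features) - len(cluster_features(features, threshold))


# Hypothetical weight vectors for five units in one layer.
weights = [
    [1.0, 0.0, 0.0],
    [0.99, 0.05, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.98, 0.1],
    [0.0, 0.0, 1.0],
]
redundant = count_redundant(weights)
```

Here units 0 and 1 point in nearly the same direction, as do units 2 and 3, so two of the five features are counted as redundant under the chosen threshold.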
Deep Reinforcement Learning for Single-Shot Diagnosis and Adaptation in Damaged Robots
Title | Deep Reinforcement Learning for Single-Shot Diagnosis and Adaptation in Damaged Robots |
Authors | Shresth Verma, Haritha S. Nair, Gaurav Agarwal, Joydip Dhar, Anupam Shukla |
Abstract | Robotics has proved to be an indispensable tool in many industrial as well as social applications, such as warehouse automation, manufacturing, disaster robotics, etc. In most of these scenarios, damage to the agent while accomplishing mission-critical tasks can result in failure. To enable robotic adaptation in such situations, the agent needs to adopt policies which are robust to a diverse set of damages and must do so with minimum computational complexity. We thus propose a damage-aware control architecture which diagnoses the damage prior to gait selection while also incorporating domain randomization in the damage space for learning a robust policy. To implement damage awareness, we have used a Long Short-Term Memory based supervised learning network which diagnoses the damage and predicts the type of damage. The main novelty of this approach is that only a single policy is trained to adapt against a wide variety of damages and the diagnosis is done in a single trial at the time of damage. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01240v1 |
https://arxiv.org/pdf/1910.01240v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-single-shot |
Repo | |
Framework | |
Detection of speech events and speaker characteristics through photo-plethysmographic signal neural processing
Title | Detection of speech events and speaker characteristics through photo-plethysmographic signal neural processing |
Authors | Guillermo Cámbara, Jordi Luque, Mireia Farrús |
Abstract | The use of the photoplethysmogram (PPG) signal for heart and sleep monitoring is commonly found nowadays in smartphones and wrist wearables. Besides common usages, it has been proposed and reported that person information can be extracted from PPG for other uses, like biometry tasks. In this work, we explore several end-to-end convolutional neural network architectures for detection of human characteristics such as gender or person identity. In addition, we evaluate whether speech/non-speech events may be inferred from the PPG signal, where speech might translate into fluctuations in the pulse signal. The obtained results are promising and clearly show the potential of fully end-to-end topologies for automatic extraction of meaningful biomarkers, even from a noisy signal sampled by a low-cost PPG sensor. The AUCs for the best architectures put forward the PPG wave as a biological discriminant, reaching 79% and 89.0% for the gender and person verification tasks, respectively. Furthermore, speech detection experiments reporting AUCs around 69% encourage further exploration of the feasibility of PPG for speech processing tasks. |
Tasks | |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04808v1 |
https://arxiv.org/pdf/1911.04808v1.pdf | |
PWC | https://paperswithcode.com/paper/detection-of-speech-events-and-speaker |
Repo | |
Framework | |