January 26, 2020

Paper Group ANR 1590

Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation

Title Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
Authors Yijiong Lin, Jiancong Huang, Matthieu Zimmer, Juan Rojas, Paul Weng
Abstract Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL to reuse observed data more efficiently. The first, called Kaleidoscope Experience Replay, exploits reflectional symmetries, while the second, called Goal-augmented Experience Replay, takes advantage of lax goal definitions. Our preliminary experimental results show a large increase in learning speed.
Tasks Data Augmentation
Published 2019-10-19
URL https://arxiv.org/abs/1910.09959v3
PDF https://arxiv.org/pdf/1910.09959v3.pdf
PWC https://paperswithcode.com/paper/towards-more-sample-efficiency
Repo
Framework
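
As a rough illustration of the reflectional-symmetry idea in the abstract above, the sketch below mirrors each stored transition across a plane to obtain an extra training sample. The transition layout (3-D vectors for observation, action and goal), the choice of the xz-plane as the symmetry plane, and the names `Transition`, `reflect_y`, `reflect_transition` and `augment` are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Transition:
    """Hypothetical replay-buffer entry for a goal-conditioned task."""
    obs: Vec3
    action: Vec3
    goal: Vec3
    next_obs: Vec3
    reward: float


def reflect_y(v: Vec3) -> Vec3:
    """Mirror a vector across the xz-plane (assumed symmetry plane)."""
    x, y, z = v
    return (x, -y, z)


def reflect_transition(t: Transition) -> Transition:
    """Mirror every spatial quantity; the reward is kept unchanged,
    which is only valid if the task really is symmetric about the plane."""
    return Transition(
        obs=reflect_y(t.obs),
        action=reflect_y(t.action),
        goal=reflect_y(t.goal),
        next_obs=reflect_y(t.next_obs),
        reward=t.reward,
    )


def augment(buffer: List[Transition]) -> List[Transition]:
    """Return the original transitions followed by their mirrored copies."""
    return buffer + [reflect_transition(t) for t in buffer]
```

Under these assumptions, each observed transition yields two training samples at no extra interaction cost, which is the kind of data reuse the abstract describes.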

Elimination of All Bad Local Minima in Deep Learning

Title Elimination of All Bad Local Minima in Deep Learning
Authors Kenji Kawaguchi, Leslie Pack Kaelbling
Abstract In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition of local minima, which provides new insight into the elimination of local minima and is applicable to analyze various models and transformations of objective functions beyond the elimination of local minima.
Tasks
Published 2019-01-02
URL https://arxiv.org/abs/1901.00279v2
PDF https://arxiv.org/pdf/1901.00279v2.pdf
PWC https://paperswithcode.com/paper/elimination-of-all-bad-local-minima-in-deep
Repo
Framework

Multi-scale discriminative Region Discovery for Weakly-Supervised Object Localization

Title Multi-scale discriminative Region Discovery for Weakly-Supervised Object Localization
Authors Pei Lv, Haiyu Yu, Junxiao Xue, Junjin Cheng, Lisha Cui, Bing Zhou, Mingliang Xu, Yi Yang
Abstract Localizing objects in an image with weak supervision is a key research problem in the computer vision community. Many existing Weakly-Supervised Object Localization (WSOL) approaches tackle this problem by estimating the most discriminative regions with feature maps (activation maps) obtained by a Deep Convolutional Neural Network; that is, only the objects or parts of them with the most discriminative response will be located. However, the activation maps often display different local maximum responses or relatively weak responses when one image contains multiple objects of the same type or small objects. In this paper, we propose a simple yet effective multi-scale discriminative region discovery method to localize not only more complete objects but also as many objects as possible with only image-level class labels. The gradient weights flowing into different convolutional layers of the CNN are taken as the input of our method, which differs from previous methods that consider only those of the final convolutional layer. To mine more discriminative regions for the task of object localization, the multiple local maxima from the gradient weight maps are leveraged to generate the localization map with a parallel sliding window. Furthermore, multi-scale localization maps from different convolutional layers are fused to produce the final result. We evaluate the proposed method, built on VGGnet, on the ILSVRC 2016, CUB-200-2011 and PASCAL VOC 2012 datasets. On ILSVRC 2016, the proposed method yields a Top-1 localization error of 48.65%, which outperforms previous results by 2.75%. On PASCAL VOC 2012, our approach achieves the highest localization accuracy of 0.43. Even on the CUB-200-2011 dataset, our method still achieves competitive results.
Tasks Object Localization, Weakly-Supervised Object Localization
Published 2019-09-24
URL https://arxiv.org/abs/1909.10698v1
PDF https://arxiv.org/pdf/1909.10698v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-discriminative-region-discovery
Repo
Framework

Is Two Better than One? Effects of Multiple Agents on User Persuasion

Title Is Two Better than One? Effects of Multiple Agents on User Persuasion
Authors Reshmashree B. Kantharaju, Dominic De Franco, Alison Pease, Catherine Pelachaud
Abstract Virtual humans need to be persuasive in order to promote behaviour change in human users. While several studies have focused on understanding the numerous aspects that influence the degree of persuasion, most of them are limited to dyadic interactions. In this paper, we present an evaluation study focused on understanding the effects of multiple agents on user persuasion. Along with gender and status (authoritative & peer), we also look at the type of focus employed by the agent, i.e., user-directed, where the agent aims to persuade by addressing the user directly, and vicarious, where the agent aims to persuade the user, who is an observer, indirectly by engaging another agent in the discussion. Participants were randomly assigned to one of the 12 conditions and presented with a persuasive message by one or several virtual agents. A questionnaire was used to measure perceived interpersonal attitude, credibility and persuasion. Results indicate that credibility positively affects persuasion. In general, the multiple-agent setting, irrespective of the focus, was more persuasive than the single-agent setting. Although participants favored the user-directed setting, reported it to be persuasive, and had an increased level of trust in the agents, the actual change in persuasion score shows that the vicarious setting was the most effective in inducing behaviour change. In addition, the study revealed that authoritative agents were the most persuasive.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05248v1
PDF http://arxiv.org/pdf/1904.05248v1.pdf
PWC https://paperswithcode.com/paper/is-two-better-than-one-effects-of-multiple
Repo
Framework

Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances

Title Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances
Authors Sanna Wager, George Tzanetakis, Cheng-i Wang, Lijiang Guo, Aswin Sivaraman, Minje Kim
Abstract We describe a machine-learning approach to pitch correcting a solo singing performance in a karaoke setting, where the solo voice and accompaniment are on separate tracks. The proposed approach addresses the situation where no musical score of either the vocals or the accompaniment exists: it predicts the amount of correction from the relationship between the spectral contents of the vocal and accompaniment tracks. Hence, the pitch shift in cents suggested by the model can be used to make the voice sound in tune with the accompaniment. This approach differs from commercially used automatic pitch correction systems, where notes in the vocal tracks are shifted to be centered around notes in a user-defined score or mapped to the closest pitch among the twelve equal-tempered scale degrees. We train the model using a dataset of 4,702 amateur karaoke performances selected for good intonation. We present a Convolutional Gated Recurrent Unit (CGRU) model to accomplish this task. This method can be extended into unsupervised pitch correction of a vocal performance, popularly referred to as autotuning.
Tasks
Published 2019-02-03
URL http://arxiv.org/abs/1902.00956v1
PDF http://arxiv.org/pdf/1902.00956v1.pdf
PWC https://paperswithcode.com/paper/deep-autotuner-a-data-driven-approach-to
Repo
Framework
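
The abstract above expresses the predicted correction as a pitch shift in cents. The snippet below only illustrates that unit, using the standard definition of 1200 cents per octave; it does not reproduce the CGRU model, and the function names `cents_between` and `shift_by_cents` are hypothetical.

```python
import math


def cents_between(f_ref: float, f: float) -> float:
    """Signed interval from f_ref to f in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(f / f_ref)


def shift_by_cents(f: float, cents: float) -> float:
    """Frequency obtained by shifting f by the given number of cents."""
    return f * 2.0 ** (cents / 1200.0)


# Example: a sung note slightly flat relative to A4 = 440 Hz.
sung = 435.0
correction = cents_between(sung, 440.0)   # positive: shift upwards
corrected = shift_by_cents(sung, correction)
```

A shift of 100 cents corresponds to one equal-tempered semitone, so a correction of a few tens of cents is a fraction of a semitone.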

SMiRL: Surprise Minimizing RL in Dynamic Environments

Title SMiRL: Surprise Minimizing RL in Dynamic Environments
Authors Glen Berseth, Daniel Geng, Coline Devin, Nicholas Rhinehart, Chelsea Finn, Dinesh Jayaraman, Sergey Levine
Abstract All living organisms struggle against the forces of nature to carve out a maintainable niche. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called Surprise Minimizing RL (SMiRL). SMiRL alternates between learning a density model to evaluate the surprise of a stimulus, and improving the policy to seek more predictable stimuli. This process maximizes a lower-bound on the negative entropy of the states, which can be seen as maximizing the agent’s ability to maintain order in the environment. The policy seeks out stable and repeatable situations that counteract the environment’s prevailing sources of entropy. This might include avoiding other hostile agents, or finding a stable, balanced pose for a bipedal robot in the face of disturbance forces. We demonstrate that our surprise minimizing agents can successfully play Tetris and Doom, control a humanoid to avoid falls, and navigate to escape enemies in a maze without any task-specific reward supervision. We further show that SMiRL can be used together with standard task rewards to accelerate reward-driven learning.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05510v2
PDF https://arxiv.org/pdf/1912.05510v2.pdf
PWC https://paperswithcode.com/paper/smirl-surprise-minimizing-rl-in-dynamic
Repo
Framework
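
The alternation described in the abstract above (fit a density model to visited states, then reward the agent for reaching states the model finds likely) can be sketched as follows. The choice of an independent Gaussian per state dimension, the variance floor, and the names `RunningGaussian` and `surprise_reward` are assumptions for illustration only; the abstract does not specify the density model.

```python
import math
from typing import List, Sequence


class RunningGaussian:
    """Diagonal Gaussian fitted online to the states seen so far
    (Welford's algorithm). An assumed, minimal density model."""

    def __init__(self, dim: int, min_var: float = 1e-3) -> None:
        self.n = 0
        self.mean: List[float] = [0.0] * dim
        self.m2: List[float] = [0.0] * dim   # sum of squared deviations
        self.min_var = min_var               # floor to avoid zero variance

    def update(self, state: Sequence[float]) -> None:
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def log_prob(self, state: Sequence[float]) -> float:
        total = 0.0
        for i, x in enumerate(state):
            var = max(self.m2[i] / self.n, self.min_var) if self.n > 0 else 1.0
            total += -0.5 * (math.log(2.0 * math.pi * var)
                             + (x - self.mean[i]) ** 2 / var)
        return total


def surprise_reward(model: RunningGaussian, state: Sequence[float]) -> float:
    """Reward = log-likelihood of the new state under the current model
    (low surprise gives high reward); the model is then updated."""
    reward = model.log_prob(state)
    model.update(state)
    return reward
```

A policy trained on this reward is pushed towards states it has seen often, which matches the abstract's description of seeking "more predictable stimuli".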

Learning to Predict Explainable Plots for Neural Story Generation

Title Learning to Predict Explainable Plots for Neural Story Generation
Authors Gang Chen, Yang Liu, Huanbo Luan, Meng Zhang, Qun Liu, Maosong Sun
Abstract Story generation is an important natural language processing task that aims to generate coherent stories automatically. While the use of neural networks has proven effective in improving story generation, how to learn to generate an explainable high-level plot still remains a major challenge. In this work, we propose a latent variable model for neural story generation. The model treats an outline, which is a natural language sentence explainable to humans, as a latent variable to represent a high-level plot that bridges the input and output. We adopt an external summarization model to guide the latent variable model to learn how to generate outlines from training data. Experiments show that our approach achieves significant improvements over state-of-the-art methods in both automatic and human evaluations.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.02395v2
PDF https://arxiv.org/pdf/1912.02395v2.pdf
PWC https://paperswithcode.com/paper/learning-to-predict-explainable-plots-for
Repo
Framework

Online Optimization with Predictions and Non-convex Losses

Title Online Optimization with Predictions and Non-convex Losses
Authors Yiheng Lin, Gautam Goel, Adam Wierman
Abstract We study online optimization in a setting where an online learner seeks to optimize a per-round hitting cost, which may be non-convex, while incurring a movement cost when changing actions between rounds. We ask: under what general conditions is it possible for an online learner to leverage predictions of future cost functions in order to achieve near-optimal costs? Prior work has provided near-optimal online algorithms for specific combinations of assumptions about hitting and switching costs, but no general results are known. In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC), provides a $1+O(1/w)$ competitive ratio, where $w$ is the number of predictions available to the learner. Our conditions do not require the cost functions to be convex, and we also derive competitive ratio results for non-convex hitting and movement costs. Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs. Further, we give an example of a natural instance, Convex Body Chasing (CBC), where the sufficient conditions are not satisfied and we can prove that no online algorithm can have a competitive ratio that converges to 1.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03827v2
PDF https://arxiv.org/pdf/1911.03827v2.pdf
PWC https://paperswithcode.com/paper/online-optimization-with-predictions-and-non
Repo
Framework
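
For readers unfamiliar with the terminology in the abstract above, the quantities involved can be written out as follows. The symbols $f_t$ (hitting cost in round $t$), $c$ (movement cost), $x_t$ (action in round $t$) and $T$ (number of rounds) are notation assumed here for illustration, not taken from the paper; only the ratio $1+O(1/w)$ comes from the abstract.

```latex
% Total cost of an algorithm choosing actions x_1, ..., x_T (assumed notation)
\mathrm{cost}(\mathrm{ALG}) \;=\; \sum_{t=1}^{T} \Bigl( f_t(x_t) + c(x_t, x_{t-1}) \Bigr)

% Competitive ratio claimed for SFHC with w predictions
\mathrm{cost}(\mathrm{SFHC}) \;\le\; \bigl(1 + O(1/w)\bigr)\,\mathrm{cost}(\mathrm{OPT})
```

Here $\mathrm{OPT}$ denotes the offline optimum that knows all cost functions in advance, so the bound says that the gap to the optimum shrinks as more predictions become available.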

Structure fusion based on graph convolutional networks for semi-supervised classification

Title Structure fusion based on graph convolutional networks for semi-supervised classification
Authors Guangfeng Lin, Jing Wang, Kaiyang Liao, Fan Zhao, Wanjun Chen
Abstract Because of the diversity and complexity of multi-view data in semi-supervised classification, most existing graph convolutional networks focus on network architecture construction or on preserving the salient graph structure, and ignore the contribution of the complete graph structure to semi-supervised classification. To mine a more complete distribution structure from multi-view data while considering both specificity and commonality, we propose structure fusion based on graph convolutional networks (SF-GCN) for improving the performance of semi-supervised classification. SF-GCN can not only retain the specific characteristics of each view by spectral embedding, but also capture the common style of multi-view data through a distance metric between multi-graph structures. Assuming a linear relationship between multi-graph structures, we construct the optimization function of the structure fusion model by balancing the specificity loss and the commonality loss. By solving this function, we simultaneously obtain the fused spectral embedding from the multi-view data and the fused structure, which serves as the adjacency matrix input to graph convolutional networks for semi-supervised classification. Experiments demonstrate that SF-GCN outperforms the state of the art on three challenging citation network datasets: Cora, Citeseer and Pubmed.
Tasks Node Classification
Published 2019-07-02
URL https://arxiv.org/abs/1907.02586v1
PDF https://arxiv.org/pdf/1907.02586v1.pdf
PWC https://paperswithcode.com/paper/structure-fusion-based-on-graph-convolutional
Repo
Framework

CC-Net: Image Complexity Guided Network Compression for Biomedical Image Segmentation

Title CC-Net: Image Complexity Guided Network Compression for Biomedical Image Segmentation
Authors Suraj Mishra, Peixian Liang, Adam Czajka, Danny Z. Chen, X. Sharon Hu
Abstract Convolutional neural networks (CNNs) for biomedical image analysis are often of very large size, resulting in high memory requirement and high latency of operations. Searching for an acceptable compressed representation of the base CNN for a specific imaging application typically involves a series of time-consuming training/validation experiments to achieve a good compromise between network size and accuracy. To address this challenge, we propose CC-Net, a new image complexity-guided CNN compression scheme for biomedical image segmentation. Given a CNN model, CC-Net predicts the final accuracy of networks of different sizes based on the average image complexity computed from the training data. It then selects a multiplicative factor for producing a desired network with acceptable network accuracy and size. Experiments show that CC-Net is effective for generating compressed segmentation networks, retaining up to 95% of the base network segmentation accuracy and utilizing only 0.1% of trainable parameters of the full-sized networks in the best case.
Tasks Semantic Segmentation
Published 2019-01-06
URL https://arxiv.org/abs/1901.01578v2
PDF https://arxiv.org/pdf/1901.01578v2.pdf
PWC https://paperswithcode.com/paper/cc-net-image-complexity-guided-network
Repo
Framework

Speech Recognition with Augmented Synthesized Speech

Title Speech Recognition with Augmented Synthesized Speech
Authors Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu
Abstract Recent success of the Tacotron speech synthesis architecture and its variants in producing natural-sounding multi-speaker synthesized speech has raised the exciting possibility of replacing expensive, manually transcribed, domain-specific, human speech that is used to train speech recognizers. The multi-speaker speech synthesis architecture can learn latent embedding spaces of prosody, speaker and style variations derived from input acoustic representations, thereby allowing for manipulation of the synthesized speech. In this paper, we evaluate the feasibility of enhancing speech recognition performance with speech synthesis, using two corpora from different domains. We explore algorithms to provide the acoustic and lexical diversity needed for robust speech recognition. Finally, we demonstrate the feasibility of this approach as a data augmentation strategy for domain-transfer. We find that improvements to speech recognition performance are achievable by augmenting training data with synthesized material. However, there remains a substantial gap in performance between recognizers trained on human speech and those trained on synthesized speech.
Tasks Data Augmentation, Robust Speech Recognition, Speech Recognition, Speech Synthesis
Published 2019-09-25
URL https://arxiv.org/abs/1909.11699v1
PDF https://arxiv.org/pdf/1909.11699v1.pdf
PWC https://paperswithcode.com/paper/speech-recognition-with-augmented-synthesized
Repo
Framework

Network Shrinkage Estimation

Title Network Shrinkage Estimation
Authors Nesreen K. Ahmed, Nick Duffield
Abstract Networks are a natural representation of complex systems across the sciences, and higher-order dependencies are central to the understanding and modeling of these systems. However, in many practical applications such as online social networks, networks are massive, dynamic, and naturally streaming, where pairwise interactions become available one at a time in some arbitrary order. The massive size and streaming nature of these networks allow only partial observation, since it is infeasible to analyze the entire network. Under such scenarios, it is challenging to study the higher-order structural and connectivity patterns of streaming networks. In this work, we consider the fundamental problem of estimating the higher-order dependencies using adaptive sampling. We propose a novel adaptive, single-pass sampling framework and unbiased estimators for higher-order network analysis of large streaming networks. Our algorithms exploit adaptive techniques to identify edges that are highly informative for efficiently estimating the higher-order structure of streaming networks from small sample data. We also introduce a novel James-Stein-type shrinkage estimator to minimize the estimation error. Our approach is fully analytic with theoretical guarantees, computationally efficient, and can be incrementally updated in a streaming setting. Numerical experiments on large networks show that our approach is superior to baseline methods.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.01087v1
PDF https://arxiv.org/pdf/1908.01087v1.pdf
PWC https://paperswithcode.com/paper/network-shrinkage-estimation
Repo
Framework

Efficient Sketching Algorithm for Sparse Binary Data

Title Efficient Sketching Algorithm for Sparse Binary Data
Authors Rameshwar Pratap, Debajyoti Bera, Karthik Revanuru
Abstract Recent advances in the WWW, IoT, social networks, e-commerce, etc. have generated a large volume of data. These data are mostly represented as high-dimensional, sparse datasets. Many fundamental subroutines of common data analytic tasks such as clustering, classification, ranking, nearest neighbour search, etc. scale poorly with the dimension of the dataset. In this work, we address this problem and propose a sketching (alternatively, dimensionality reduction) algorithm, BinSketch (Binary Data Sketch), for sparse binary datasets. BinSketch preserves the binary version of the dataset after sketching and maintains estimates for multiple similarity measures such as Jaccard, Cosine, Inner-Product similarities, and Hamming distance, on the same sketch. We present a theoretical analysis of our algorithm and complement it with extensive experimentation on several real-world datasets. We compare the performance of our algorithm with the state-of-the-art algorithms in terms of mean-square error and ranking. Our proposed algorithm offers comparable accuracy while showing a significant speedup in the dimensionality reduction time with respect to the other candidate algorithms. Our proposal is simple, easy to implement, and therefore can be adopted in practice.
Tasks Dimensionality Reduction
Published 2019-10-10
URL https://arxiv.org/abs/1910.04658v1
PDF https://arxiv.org/pdf/1910.04658v1.pdf
PWC https://paperswithcode.com/paper/efficient-sketching-algorithm-for-sparse
Repo
Framework
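
To make the idea of a binary-to-binary sketch concrete, the example below compresses a sparse binary vector by sending each original coordinate to a random bucket and OR-ing the bits that land together. This bucket-and-OR construction is an assumption about how such a sketch could be built; the abstract above does not spell out the construction, and the paper's similarity estimators are not reproduced here. The names `make_bucket_map`, `sketch` and `hamming` are hypothetical.

```python
import random
from typing import List, Set


def make_bucket_map(dim: int, sketch_len: int, seed: int = 0) -> List[int]:
    """Assign each of the `dim` original coordinates to a random bucket."""
    rng = random.Random(seed)
    return [rng.randrange(sketch_len) for _ in range(dim)]


def sketch(ones: Set[int], bucket_map: List[int], sketch_len: int) -> List[int]:
    """Sketch a sparse binary vector given as the set of its 1-positions.
    A sketch bit is 1 if any original 1 falls into that bucket (logical OR)."""
    out = [0] * sketch_len
    for i in ones:
        out[bucket_map[i]] = 1
    return out


def hamming(a: List[int], b: List[int]) -> int:
    """Hamming distance between two equal-length binary sketches."""
    return sum(x != y for x, y in zip(a, b))


# Usage: two sparse vectors in 10,000 dimensions reduced to 64 bits each.
bucket_map = make_bucket_map(dim=10_000, sketch_len=64, seed=42)
u = sketch({3, 17, 256, 9_001}, bucket_map, 64)
v = sketch({3, 17, 300, 9_001}, bucket_map, 64)
distance_in_sketch_space = hamming(u, v)
```

Because the sketch only touches the nonzero positions, its cost grows with the sparsity of the input rather than with the full dimension, which is one plausible reason such schemes are fast on sparse data.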

FD-FCN: 3D Fully Dense and Fully Convolutional Network for Semantic Segmentation of Brain Anatomy

Title FD-FCN: 3D Fully Dense and Fully Convolutional Network for Semantic Segmentation of Brain Anatomy
Authors Binbin Yang, Weiwei Zhang
Abstract In this paper, a 3D patch-based fully dense and fully convolutional network (FD-FCN) is proposed for fast and accurate segmentation of subcortical structures in T1-weighted magnetic resonance images. Developed from the seminal FCN with an end-to-end learning-based approach and constructed from newly designed dense blocks including a dense fully-connected layer, the proposed FD-FCN differs from other FCN-based methods and outperforms them in both efficiency and accuracy. Compared with the U-shaped architecture, FD-FCN discards the upsampling path for model fitness. To alleviate the problem of parameter explosion, the inputs of dense blocks are no longer directly passed to subsequent layers. This architecture greatly reduces both memory and time consumption during training. Although FD-FCN is slimmed down, it achieves better dense-inference capability than other conventional networks. This benefits from the construction of the network architecture and the incorporation of redesigned dense blocks. The multi-scale FD-FCN models both local and global context by embedding intermediate-layer outputs in the final prediction, which encourages consistency between features extracted at different scales and embeds fine-grained information directly in the segmentation process. In addition, dense blocks are rebuilt to enlarge the receptive fields without significantly increasing parameters, and spectral coordinates are exploited for spatial context of the original input patch. The experiments were performed on the IBSR dataset, and FD-FCN produced an accurate segmentation result with an overall Dice overlap value of 89.81% for 11 brain structures in 53 seconds, an absolute improvement in Dice accuracy of at least 3.66% over state-of-the-art 3D FCN-based methods.
Tasks Semantic Segmentation
Published 2019-07-22
URL https://arxiv.org/abs/1907.09194v1
PDF https://arxiv.org/pdf/1907.09194v1.pdf
PWC https://paperswithcode.com/paper/fd-fcn-3d-fully-dense-and-fully-convolutional
Repo
Framework

Bayesian Parznets for Robust Speech Recognition in the Waveform Domain

Title Bayesian Parznets for Robust Speech Recognition in the Waveform Domain
Authors Dino Oglic, Zoran Cvetkovic, Peter Sollich
Abstract We propose a novel family of band-pass filters for efficient spectral decomposition of signals. Previous work has already established the effectiveness of representations based on static band-pass filtering of speech signals (e.g., mel-frequency cepstral coefficients and deep scattering spectrum). A potential shortcoming of these approaches is the fact that the parameters specifying such a representation are fixed a priori and not learned using the available data. To address this limitation, we propose a family of filters defined via cosine modulations of Parzen windows, where the modulation frequency models the center of a spectral band-pass filter and the length of a Parzen window is inversely proportional to its bandwidth. We propose to learn these filters as part of a multilayer convolutional operator using stochastic variational inference based on Gaussian dropout posteriors and sparsity inducing priors. Such a prior leads to an intractable integral defining the Kullback–Leibler divergence term for which we propose an effective approximation based on the Gauss–Hermite quadrature. Our empirical results demonstrate that modulation filter-learning can be statistically significantly more effective than static band-pass filtering on continuous speech recognition from raw speech. This is also the first work to achieve state-of-the-art results on speech recognition using variational inference.
Tasks Bayesian Inference, Robust Speech Recognition, Speech Recognition
Published 2019-06-23
URL https://arxiv.org/abs/1906.09526v2
PDF https://arxiv.org/pdf/1906.09526v2.pdf
PWC https://paperswithcode.com/paper/parzen-filters-for-spectral-decomposition-of
Repo
Framework
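
The filter family described in the abstract above, a cosine modulation of a Parzen window, can be sketched in a few lines. The piecewise-cubic form used for the Parzen window is the standard textbook definition; the sampling scheme, the parameter values and the names `parzen_window` and `modulated_filter` are assumptions for illustration, and the variational learning of the filter parameters is not shown.

```python
import math
from typing import List


def parzen_window(t: float, length: float) -> float:
    """Parzen window of total support `length`, centred at t = 0."""
    u = abs(t) / (length / 2.0)
    if u <= 0.5:
        return 1.0 - 6.0 * u * u * (1.0 - u)
    if u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0


def modulated_filter(center_freq: float, length: float,
                     sample_rate: float, num_taps: int) -> List[float]:
    """Sample w(t) * cos(2*pi*f*t) symmetrically around t = 0.
    `center_freq` sets the band centre; a longer window gives a narrower band."""
    half = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = (n - half) / sample_rate
        taps.append(parzen_window(t, length)
                    * math.cos(2.0 * math.pi * center_freq * t))
    return taps


# Usage: a 5 ms filter centred at 1 kHz for 16 kHz audio (illustrative values).
taps = modulated_filter(center_freq=1000.0, length=0.005,
                        sample_rate=16000.0, num_taps=101)
```

In a learned setting, `center_freq` and `length` would be the trainable parameters of each filter rather than fixed constants.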