January 25, 2020

2940 words · 14 min read

Paper Group ANR 1705

Independent and automatic evaluation of acoustic-to-articulatory inversion models

Title Independent and automatic evaluation of acoustic-to-articulatory inversion models
Authors Parrot Maud, Millet Juliette, Dunbar Ewan
Abstract Reconstruction of articulatory trajectories from the acoustic speech signal has been proposed for improving speech recognition and text-to-speech synthesis. However, to be useful in these settings, articulatory reconstruction must be speaker-independent. Furthermore, as most research focuses on single, small datasets with few speakers, robust articulatory reconstruction could profit from combining datasets. Standard evaluation measures such as root mean square error and Pearson correlation are inappropriate for evaluating the speaker independence of models or the usefulness of combining datasets. We present a new evaluation for articulatory reconstruction which is independent of the articulatory dataset used for training: the phone discrimination ABX task. We use the ABX measure to evaluate a Bi-LSTM-based model trained on three datasets (14 speakers), and show that it gives information complementary to the standard measures, and enables us to evaluate the effects of dataset merging, as well as the speaker independence of the model.
Tasks Speech Recognition, Speech Synthesis, Text-To-Speech Synthesis
Published 2019-11-15
URL https://arxiv.org/abs/1911.06573v1
PDF https://arxiv.org/pdf/1911.06573v1.pdf
PWC https://paperswithcode.com/paper/independent-and-automatic-evaluation-of
Repo
Framework
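
The ABX evaluation above is easy to sketch. Below is a minimal, hypothetical version in Python: a triple (A, B, X) counts as correct when X, drawn from the same phone category as A, is closer to A than to B under some distance. Real ABX pipelines typically use DTW over frame-level distances; the mean-pooled cosine distance here is a simplifying assumption, and all data is synthetic.

```python
# A minimal machine-ABX sketch on hypothetical data: X is the same phone
# category as A; the triple counts as correct when d(A, X) < d(B, X).
import numpy as np

def abx_score(triples, dist):
    """Fraction of (A, B, X) triples where X is closer to A than to B."""
    correct = sum(dist(a, x) < dist(b, x) for a, b, x in triples)
    return correct / len(triples)

def mean_cosine(u, v):
    # Average frames into one vector, then take cosine distance; full ABX
    # pipelines typically use DTW over frame-level distances instead.
    u, v = u.mean(axis=0), v.mean(axis=0)
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
# Toy articulatory trajectories: (frames, channels) arrays for two phone classes.
phone_a = [rng.normal(0.0, 1.0, (20, 12)) for _ in range(5)]
phone_b = [rng.normal(0.5, 1.0, (20, 12)) for _ in range(5)]
triples = [(a, b, x) for a in phone_a for b in phone_b for x in phone_a if x is not a]
print(f"ABX accuracy: {abx_score(triples, mean_cosine):.2f}")
```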

What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis

Title What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis
Authors Chung-Yi Li, Pei-Chieh Yuan, Hung-Yi Lee
Abstract End-to-end speech recognition systems have achieved competitive results compared to traditional systems. However, the complex transformations between layers, given highly variable acoustic signals, are hard to analyze. In this paper, we present our ASR probing model, which synthesizes speech from the hidden representations of end-to-end ASR to examine the information maintained after each layer's computation. Listening to the synthesized speech, we observe a gradual removal of speaker variability and noise as the layers go deeper, which aligns with previous studies on how deep networks function in speech recognition. This paper is the first study to analyze an end-to-end speech recognition model by demonstrating what each layer hears. Speaker verification and speech enhancement measurements on the synthesized speech are also conducted to further confirm our observations.
Tasks End-To-End Speech Recognition, Speaker Verification, Speech Enhancement, Speech Recognition, Speech Synthesis
Published 2019-11-04
URL https://arxiv.org/abs/1911.01102v1
PDF https://arxiv.org/pdf/1911.01102v1.pdf
PWC https://paperswithcode.com/paper/what-does-a-network-layer-hear-analyzing
Repo
Framework
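
The probing idea above can be sketched with a small decoder trained to reconstruct the input spectrogram from one layer's hidden states; whatever the decoder recovers is what that layer still "hears". The PyTorch sketch below is an assumed setup, not the authors' exact architecture, and uses random tensors as stand-ins for a frozen encoder's activations.

```python
# A minimal probing-decoder sketch: freeze a trained ASR encoder, pick one
# layer's hidden states, and train a small decoder to reconstruct the mels.
import torch
import torch.nn as nn

hidden_dim, n_mels = 256, 80

class ProbeDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 512), nn.ReLU(), nn.Linear(512, n_mels)
        )

    def forward(self, h):          # h: (batch, time, hidden_dim)
        return self.net(h)         # -> (batch, time, n_mels)

decoder = ProbeDecoder()
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

# Stand-ins for a frozen encoder's layer-k activations and the target mels.
h = torch.randn(8, 100, hidden_dim)        # detached hidden states
mels = torch.randn(8, 100, n_mels)         # ground-truth spectrogram

loss = nn.functional.mse_loss(decoder(h), mels)
opt.zero_grad(); loss.backward(); opt.step()
```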

Transferring neural speech waveform synthesizers to musical instrument sounds generation

Title Transferring neural speech waveform synthesizers to musical instrument sounds generation
Authors Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi
Abstract Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural source-filter (NSF) model have shown good performance in speech synthesis despite their different methods of waveform generation. The similarity between speech and music audio synthesis techniques suggests interesting avenues to explore regarding the best way to apply speech synthesizers in the music domain. This work compares the three neural synthesizers for musical instrument sound generation under three scenarios: training from scratch on music data, zero-shot learning from the speech domain, and fine-tuning-based adaptation from the speech to the music domain. The results of a large-scale perceptual test demonstrate that the performance of all three synthesizers improved when they were pre-trained on speech data and fine-tuned on music data, which indicates the usefulness of knowledge from speech data for music audio generation. Among the synthesizers, WaveGlow showed the best potential in zero-shot learning, while NSF performed best in the other scenarios and could generate samples perceptually close to natural audio.
Tasks Audio Generation, Speech Synthesis, Zero-Shot Learning
Published 2019-10-27
URL https://arxiv.org/abs/1910.12381v2
PDF https://arxiv.org/pdf/1910.12381v2.pdf
PWC https://paperswithcode.com/paper/transferring-neural-speech-waveform
Repo
Framework
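
The three scenarios compared above reduce to a simple training recipe. The sketch below (a stand-in model and synthetic data, both assumptions) shows from-scratch training, zero-shot transfer, and fine-tuning of speech-pretrained weights on music data.

```python
# A hedged sketch of the three transfer scenarios for one synthesizer.
import copy
import torch

def train(model, data, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for step, (cond, target) in zip(range(steps), data):
        loss = torch.nn.functional.l1_loss(model(cond), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

model = torch.nn.Linear(16, 16)            # stand-in for WaveNet/WaveGlow/NSF
speech_data = [(torch.randn(4, 16), torch.randn(4, 16)) for _ in range(10)]
music_data = [(torch.randn(4, 16), torch.randn(4, 16)) for _ in range(10)]

scratch = train(copy.deepcopy(model), music_data, steps=10, lr=1e-3)
pretrained = train(copy.deepcopy(model), speech_data, steps=10, lr=1e-3)
zero_shot = copy.deepcopy(pretrained)       # evaluate on music with no updates
fine_tuned = train(copy.deepcopy(pretrained), music_data, steps=10, lr=1e-4)
```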

Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Title Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Authors Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby
Abstract Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be addressed using simple location-relative attention mechanisms that do away with content-based query/key comparisons. We compare two families of attention mechanisms: location-relative GMM-based mechanisms and additive energy-based mechanisms. We suggest simple modifications to GMM-based attention that allow it to align quickly and consistently during training, and introduce a new location-relative attention mechanism to the additive energy-based family, called Dynamic Convolution Attention (DCA). We compare the various mechanisms in terms of alignment speed and consistency during training, naturalness, and ability to generalize to long utterances, and conclude that GMM attention and DCA can generalize to very long utterances, while preserving naturalness for shorter, in-domain utterances.
Tasks Speech Synthesis
Published 2019-10-23
URL https://arxiv.org/abs/1910.10288v1
PDF https://arxiv.org/pdf/1910.10288v1.pdf
PWC https://paperswithcode.com/paper/location-relative-attention-mechanisms-for
Repo
Framework
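
GMM-based attention, one of the two families compared above, can be sketched directly: each decoder step predicts mixture weights, forward-only position shifts, and scales, and the alignment is a mixture of Gaussians over encoder positions. The parameterization below (softplus shifts, normalized weights) is one assumed variant among those the paper discusses.

```python
# A minimal sketch of location-relative GMM attention: the mixture means can
# only move forward, which is what makes the mechanism robust on long inputs.
import numpy as np

def gmm_attention(prev_means, w_logits, delta_logits, scale_logits, n_pos):
    w = np.exp(w_logits) / np.exp(w_logits).sum()          # mixture weights
    means = prev_means + np.logaddexp(0.0, delta_logits)   # softplus: monotonic shift
    scales = np.logaddexp(0.0, scale_logits) + 1e-3
    pos = np.arange(n_pos)[:, None]                        # (n_pos, K)
    probs = np.exp(-0.5 * ((pos - means) / scales) ** 2) / (scales * np.sqrt(2 * np.pi))
    alignment = (probs * w).sum(axis=1)                    # (n_pos,)
    return alignment / alignment.sum(), means

K, n_pos = 5, 100
means = np.zeros(K)
rng = np.random.default_rng(0)
for step in range(3):   # three decoder steps; logits would come from the network
    align, means = gmm_attention(means, rng.normal(size=K), rng.normal(size=K),
                                 rng.normal(size=K), n_pos)
    print("attention peak at position", int(align.argmax()))
```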

MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies

Title MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
Authors Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine
Abstract Humans are able to perform a myriad of sophisticated tasks by drawing upon skills acquired through prior experience. For autonomous agents to have this capability, they must be able to extract reusable skills from past experience that can be recombined in new ways for subsequent tasks. Furthermore, when controlling complex high-dimensional morphologies, such as humanoid bodies, tasks often require coordination of multiple skills simultaneously. Learning discrete primitives for every combination of skills quickly becomes prohibitive. Composable primitives that can be recombined to create a large variety of behaviors can be more suitable for modeling this combinatorial explosion. In this work, we propose multiplicative compositional policies (MCP), a method for learning reusable motor skills that can be composed to produce a range of complex behaviors. Our method factorizes an agent’s skills into a collection of primitives, where multiple primitives can be activated simultaneously via multiplicative composition. This flexibility allows the primitives to be transferred and recombined to elicit new behaviors as necessary for novel tasks. We demonstrate that MCP is able to extract composable skills for highly complex simulated characters from pre-training tasks, such as motion imitation, and then reuse these skills to solve challenging continuous control tasks, such as dribbling a soccer ball to a goal, and picking up an object and transporting it to a target location.
Tasks Continuous Control
Published 2019-05-23
URL https://arxiv.org/abs/1905.09808v1
PDF https://arxiv.org/pdf/1905.09808v1.pdf
PWC https://paperswithcode.com/paper/mcp-learning-composable-hierarchical-control
Repo
Framework
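
The multiplicative composition at the heart of MCP has a convenient closed form for Gaussian primitives: a weighted product of Gaussians is itself Gaussian with precision-weighted statistics. A minimal per-dimension sketch, with hand-set gating weights standing in for the learned gating network:

```python
# Multiplicative Gaussian composition: pi(a|s) proportional to the product of
# pi_i(a|s)^{w_i}, with non-negative gating weights w_i.
import numpy as np

def compose(mus, sigmas, w):
    """mus, sigmas: (k, action_dim); w: (k,) non-negative gating weights."""
    prec = (w[:, None] / sigmas ** 2).sum(axis=0)          # composed precision
    mu = (w[:, None] * mus / sigmas ** 2).sum(axis=0) / prec
    return mu, 1.0 / np.sqrt(prec)

# Two primitives pulling a 2-D action different ways; gating blends them.
mus = np.array([[1.0, 0.0], [0.0, 1.0]])
sigmas = np.ones((2, 2))
mu, sigma = compose(mus, sigmas, w=np.array([0.75, 0.25]))
print(mu, sigma)   # mean sits closer to the more strongly activated primitive
```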

ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions

Title ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions
Authors Bo Kang, Jefrey Lijffijt, Tijl De Bie
Abstract Networks are powerful data structures, but are challenging to work with for conventional machine learning methods. Network Embedding (NE) methods attempt to resolve this by learning vector representations for the nodes, for subsequent use in downstream machine learning tasks. Link Prediction (LP) is one such downstream task that is an important use case and popular benchmark for NE methods. Unfortunately, while NE methods perform exceedingly well at this task, they are lacking in transparency compared to simpler LP approaches. We introduce ExplaiNE, an approach that offers counterfactual explanations for NE-based LP methods by identifying existing links in the network that explain the predicted links. ExplaiNE is applicable to a broad class of NE algorithms. An extensive empirical evaluation for the NE method Conditional Network Embedding in particular demonstrates its accuracy and scalability.
Tasks Link Prediction, Network Embedding
Published 2019-04-22
URL http://arxiv.org/abs/1904.12694v1
PDF http://arxiv.org/pdf/1904.12694v1.pdf
PWC https://paperswithcode.com/paper/190412694
Repo
Framework
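
ExplaiNE derives closed-form gradient-based explanations for Conditional Network Embedding; the sketch below substitutes a plain SVD embedding and a brute-force finite difference, purely to illustrate the counterfactual idea: rank existing edges by how much deleting each one lowers the predicted link's score.

```python
# An illustrative stand-in for ExplaiNE's idea, not the paper's method:
# re-embed the graph with each existing edge removed and see which removal
# hurts the predicted link most.
import numpy as np

def embed(A, dim=4):
    U, S, _ = np.linalg.svd(A)
    return U[:, :dim] * np.sqrt(S[:dim])

def link_score(A, i, j, dim=4):
    X = embed(A, dim)
    return X[i] @ X[j]

rng = np.random.default_rng(1)
A = (rng.random((10, 10)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                      # undirected, no self-loops

i, j = 0, 1
base = link_score(A, i, j)
edges = [(u, v) for u, v in zip(*np.nonzero(np.triu(A)))]
impact = []
for (u, v) in edges:
    B = A.copy(); B[u, v] = B[v, u] = 0.0
    impact.append((base - link_score(B, i, j), (u, v)))
impact.sort(reverse=True)
print("most explanatory edges:", [e for _, e in impact[:3]])
```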

Randomized Exploration for Non-Stationary Stochastic Linear Bandits

Title Randomized Exploration for Non-Stationary Stochastic Linear Bandits
Authors Baekjin Kim, Ambuj Tewari
Abstract We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice. The first approach replaces optimism with simple randomization when using confidence sets. The second adds random perturbations to the current estimate before maximizing the expected reward. For non-stationary linear bandits, where each action is associated with a $d$-dimensional feature and the unknown parameter is time-varying with total variation $B_T$, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS), via these two perturbation approaches. We highlight the trade-off between statistical optimality and computational efficiency: the former asymptotically achieves the optimal dynamic regret $\tilde{\mathcal{O}}(d^{2/3}B_T^{1/3}T^{2/3})$, while the latter is oracle-efficient with an extra logarithmic gap in the number of arms compared to the minimax-optimal dynamic regret. In a simulation study, both algorithms empirically show outstanding performance in tackling the conservatism issue that Discounted LinUCB (D-LinUCB) struggles with.
Tasks
Published 2019-12-11
URL https://arxiv.org/abs/1912.05695v3
PDF https://arxiv.org/pdf/1912.05695v3.pdf
PWC https://paperswithcode.com/paper/near-optimal-oracle-efficient-algorithms-for
Repo
Framework
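
The second perturbation approach above (D-LinTS-style) is easy to sketch: maintain a discounted least-squares estimate so it can track a drifting parameter, then sample a perturbed parameter around the estimate before the argmax, instead of adding an optimism bonus. The constants and the exact discounted update below are simplified assumptions.

```python
# Randomized exploration for a non-stationary linear bandit, sketched.
import numpy as np

d, gamma, lam, noise = 3, 0.95, 1.0, 0.5
V = lam * np.eye(d)                # discounted Gram matrix
b = np.zeros(d)
rng = np.random.default_rng(0)

for t in range(200):
    arms = rng.normal(size=(10, d))
    theta_hat = np.linalg.solve(V, b)
    # Perturb the estimate (covariance ~ V^-1) instead of using a UCB bonus.
    theta_tilde = rng.multivariate_normal(theta_hat, np.linalg.inv(V))
    x = arms[np.argmax(arms @ theta_tilde)]
    theta_star = np.array([np.sin(t / 50), np.cos(t / 50), 0.5])  # drifting target
    r = x @ theta_star + noise * rng.normal()
    # Discounting forgets stale data so the estimate can follow the drift.
    V = gamma * V + np.outer(x, x) + (1 - gamma) * lam * np.eye(d)
    b = gamma * b + r * x
```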

Face Detection with Feature Pyramids and Landmarks

Title Face Detection with Feature Pyramids and Landmarks
Authors Samuel W. F. Earp, Pavit Noinongyao, Justin A. Cairns, Ankush Ganguly
Abstract Accurate face detection and facial landmark localization are crucial to any face recognition system. We present a series of three single-stage RCNNs with different-sized backbones (MobileNetV2-25, MobileNetV2-100, and ResNet101) and a six-layer feature pyramid trained exclusively on the WIDER FACE dataset. We compare the face detection and landmark accuracies using eight context module architectures, four proposed by previous research and four modified versions. We find no evidence that any of the proposed architectures significantly outperforms the others, and postulate that the random initialization of the additional layers is at least of equal importance. To show this, we present a model that achieves near state-of-the-art performance on WIDER FACE and also provides high-accuracy landmarks with a simple context module. We also present results using MobileNetV2 backbones, which achieve over 90% average precision on the WIDER FACE hard validation set while being able to run in real time. By comparing with other published results, we show that our models exceed the state of the art for similar-sized RCNNs and match the performance of much heavier networks.
Tasks Face Alignment, Face Detection, Face Recognition
Published 2019-12-02
URL https://arxiv.org/abs/1912.00596v2
PDF https://arxiv.org/pdf/1912.00596v2.pdf
PWC https://paperswithcode.com/paper/face-detection-with-feature-pyramids-and
Repo
Framework
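
A context module of the kind compared above can be sketched as a small SSH-style block: parallel 3x3 paths that enlarge the receptive field, concatenated back to the pyramid's channel count. The layout below is one assumed variant, not any of the paper's eight architectures specifically.

```python
# A simple SSH-style context module sketch for one feature-pyramid level.
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c // 2, 3, padding=1)
        self.conv2a = nn.Conv2d(c, c // 4, 3, padding=1)
        self.conv2b = nn.Conv2d(c // 4, c // 4, 3, padding=1)   # ~5x5 field
        self.conv3 = nn.Conv2d(c // 4, c // 4, 3, padding=1)    # ~7x7 field
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        p1 = self.conv1(x)
        p2 = self.conv2b(self.relu(self.conv2a(x)))
        p3 = self.conv3(self.relu(p2))
        return self.relu(torch.cat([p1, p2, p3], dim=1))        # back to c channels

feat = torch.randn(1, 256, 40, 40)        # one pyramid level
out = ContextModule(256)(feat)            # -> (1, 256, 40, 40)
```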

Rarely-switching linear bandits: optimization of causal effects for the real world

Title Rarely-switching linear bandits: optimization of causal effects for the real world
Authors Benjamin Lansdell, Sofia Triantafillou, Konrad Kording
Abstract Excessively changing policies in many real-world scenarios is difficult, unethical, or expensive. After all, doctor guidelines, tax codes, and price lists can only be reprinted so often. We may thus want to change a policy only when it is probable that the change is beneficial. In cases where a policy is a threshold on contextual variables, we can estimate treatment effects for populations lying at the threshold. This allows for a schedule of incremental policy updates that lets us optimize a policy while making few detrimental changes. Using this idea, and the theory of linear contextual bandits, we present a conservative policy updating procedure which updates a deterministic policy only when justified. We extend the theory of linear bandits to this rarely-switching case, proving that such procedures share the same regret, up to constant scaling, as the common LinUCB algorithm. However, the algorithm makes far fewer changes to its policy and, of those changes, fewer are detrimental. We provide simulations and an analysis of an infant health and well-being causal inference dataset, showing that the algorithm efficiently learns a good policy with few changes. Our approach allows efficiently solving problems where changes are to be avoided, with potential applications in medicine, economics, and beyond.
Tasks Causal Inference, Multi-Armed Bandits
Published 2019-05-30
URL https://arxiv.org/abs/1905.13121v2
PDF https://arxiv.org/pdf/1905.13121v2.pdf
PWC https://paperswithcode.com/paper/rarely-switching-linear-bandits-optimization
Repo
Framework
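
The rarely-switching idea can be sketched as a deployed estimate that is replaced only when the running least-squares estimate leaves a confidence ball around it. The test statistic and constants below are simplified assumptions, not the paper's exact procedure.

```python
# Keep serving the deployed parameter; switch only when provably stale.
import numpy as np

d, lam, beta = 2, 1.0, 2.0
V, b = lam * np.eye(d), np.zeros(d)
deployed = np.zeros(d)
rng = np.random.default_rng(0)
switches = 0

for t in range(500):
    x = rng.normal(size=d)
    r = x @ np.array([1.0, -0.5]) + 0.1 * rng.normal()
    V += np.outer(x, x); b += r * x
    theta_hat = np.linalg.solve(V, b)
    gap = theta_hat - deployed
    # Switch only when the estimate has moved outside the deployed ball.
    if np.sqrt(gap @ V @ gap) > beta:
        deployed = theta_hat
        switches += 1

print(f"{switches} policy changes over 500 rounds")
```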

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

Title Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Authors Tom Everitt, Marcus Hutter
Abstract Can an arbitrarily intelligent reinforcement learning agent be kept under control by a human user? Or do agents with sufficient intelligence inevitably find ways to shortcut their reward signal? This question impacts how far reinforcement learning can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we use an intuitive yet precise graphical model called causal influence diagrams to formalize reward tampering problems. We also describe a number of modifications to the reinforcement learning objective that prevent incentives for reward tampering. We verify the solutions using recently developed graphical criteria for inferring agent incentives from causal influence diagrams. Along the way, we also compare corrigibility and self-preservation properties of the various solutions, and discuss how they can be combined into a single agent without reward tampering incentives.
Tasks
Published 2019-08-13
URL https://arxiv.org/abs/1908.04734v3
PDF https://arxiv.org/pdf/1908.04734v3.pdf
PWC https://paperswithcode.com/paper/reward-tampering-problems-and-solutions-in
Repo
Framework
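
A toy numeric illustration of the tampering problem (not the paper's causal-influence-diagram formalism): an agent that maximizes its observed future signal prefers to rewrite that signal, while evaluating plans with the current, untampered reward function, one class of fix the paper analyzes, does not.

```python
# The agent can "work" for reward 1 or "tamper", which rewrites its reward
# signal to 10. Tampering changes the signal the agent receives, not the task.
def rollout_return(action, use_current_rf):
    if action == "work":
        return 1.0
    observed, current_rf = 10.0, 0.0   # tampered signal vs. true task reward
    return current_rf if use_current_rf else observed

for flag in (False, True):
    best = max(("work", "tamper"), key=lambda a: rollout_return(a, flag))
    print(f"current-RF evaluation={flag}: agent chooses {best!r}")
```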

LPRNet: Lightweight Deep Network by Low-rank Pointwise Residual Convolution

Title LPRNet: Lightweight Deep Network by Low-rank Pointwise Residual Convolution
Authors Bin Sun, Jun Li, Ming Shao, Yun Fu
Abstract Deep learning has become popular in recent years primarily due to powerful computing devices such as GPUs. However, deploying these deep models on end-user devices, smartphones, or embedded systems with limited resources is challenging. To reduce the computation and memory costs, we propose a novel lightweight deep learning module based on low-rank pointwise residual (LPR) convolution, called LPRNet. Essentially, LPR uses low-rank approximation in pointwise convolution to further reduce the module size, while keeping depthwise convolutions as a residual module to rectify the LPR module. This is critical when the low-rankness undermines the convolution process. We embody our design by replacing modules of identical input-output dimension in MobileNet and ShuffleNetV2. Experiments on visual recognition tasks, including image classification and face alignment on popular benchmarks, show that our LPRNet achieves competitive performance with a significant reduction in FLOPs and memory cost compared to state-of-the-art deep models focused on model compression.
Tasks Face Alignment, Image Classification, Model Compression
Published 2019-10-25
URL https://arxiv.org/abs/1910.11853v3
PDF https://arxiv.org/pdf/1910.11853v3.pdf
PWC https://paperswithcode.com/paper/lprnet-lightweight-deep-network-by-low-rank
Repo
Framework
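
The LPR block described above is compact enough to sketch directly: factor the channels-by-channels pointwise convolution through a rank-r bottleneck, and keep a depthwise 3x3 branch as the residual that rectifies the low-rank path. A hedged PyTorch sketch (layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class LPRBlock(nn.Module):
    def __init__(self, channels, rank):
        super().__init__()
        # Rank-r factorization of a channels x channels pointwise conv.
        self.down = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.up = nn.Conv2d(rank, channels, kernel_size=1, bias=False)
        # Depthwise residual branch (one 3x3 filter per channel).
        self.dw = nn.Conv2d(channels, channels, kernel_size=3,
                            padding=1, groups=channels, bias=False)

    def forward(self, x):
        return self.up(self.down(x)) + self.dw(x)

x = torch.randn(1, 64, 32, 32)
y = LPRBlock(64, rank=8)(x)   # 64*8 + 8*64 = 1024 pointwise weights vs 64*64 = 4096
```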

The Fuzzy ROC

Title The Fuzzy ROC
Authors Giovanni Parmigiani
Abstract The fuzzy ROC extends Receiver Operating Characteristic (ROC) curve visualization to the situation where some data points, falling in an indeterminacy region, are not classified. It addresses two challenges: defining sensitivity and specificity bounds under indeterminacy, and visually summarizing the large number of possibilities arising from different choices of indeterminacy zones.
Tasks
Published 2019-03-04
URL http://arxiv.org/abs/1903.01868v1
PDF http://arxiv.org/pdf/1903.01868v1.pdf
PWC https://paperswithcode.com/paper/the-fuzzy-roc
Repo
Framework
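
One reading of the bounding problem above: cases in the indeterminacy zone are unclassified, so sensitivity lies between a worst case (all indeterminate positives counted as misses) and a best case (all counted as hits). The sketch below, with hypothetical thresholds lo and hi delimiting the indeterminacy zone, computes those bounds on synthetic scores.

```python
import numpy as np

def sensitivity_bounds(scores, labels, lo, hi):
    pos = labels == 1
    called_pos = scores >= hi                    # confident positive calls
    indeterminate = (scores > lo) & (scores < hi)
    tp = (called_pos & pos).sum()
    ind_pos = (indeterminate & pos).sum()
    n_pos = pos.sum()
    # Worst case: indeterminate positives are misses; best case: hits.
    return tp / n_pos, (tp + ind_pos) / n_pos

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
scores = rng.random(200) * 0.5 + labels * 0.4
print(sensitivity_bounds(scores, labels, lo=0.4, hi=0.6))
```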

Capsule Networks with Max-Min Normalization

Title Capsule Networks with Max-Min Normalization
Authors Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan
Abstract Capsule Networks (CapsNet) use the Softmax function to convert the logits of the routing coefficients into a set of normalized values that signify the assignment probabilities between capsules in adjacent layers. We show that the use of Softmax prevents capsule layers from forming optimal couplings between lower and higher-level capsules. Softmax constrains the dynamic range of the routing coefficients and leads to probabilities that remain mostly uniform after several routing iterations. Instead, we propose the use of Max-Min normalization. Max-Min performs a scale-invariant normalization of the logits that allows each lower-level capsule to take on an independent value, constrained only by the bounds of normalization. Max-Min provides consistent improvement in test accuracy across five datasets and allows more routing iterations without a decrease in network performance. A single CapsNet trained using Max-Min achieves an improved test error of 0.20% on the MNIST dataset. With a simple 3-model majority vote, we achieve a test error of 0.17% on MNIST.
Tasks
Published 2019-03-22
URL http://arxiv.org/abs/1903.09662v1
PDF http://arxiv.org/pdf/1903.09662v1.pdf
PWC https://paperswithcode.com/paper/capsule-networks-with-max-min-normalization
Repo
Framework
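
The normalization itself is one line; the sketch below contrasts it with softmax on toy logits. Because max-min does not force the coefficients to sum to one, each lower-level capsule can take on an independent value, which is the paper's central point.

```python
import numpy as np

def max_min(logits, eps=1e-12):
    lo = logits.min(axis=-1, keepdims=True)
    hi = logits.max(axis=-1, keepdims=True)
    return (logits - lo) / (hi - lo + eps)   # each coefficient lands in [0, 1]

logits = np.array([[1.0, 1.2, 0.8], [3.0, -1.0, 0.5]])
print("softmax :", np.exp(logits) / np.exp(logits).sum(-1, keepdims=True))
print("max-min :", max_min(logits))
# Unlike softmax, max-min does not constrain the coefficients to sum to 1,
# so the dynamic range of the routing coefficients is preserved.
```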

Truncated Cauchy Non-negative Matrix Factorization

Title Truncated Cauchy Non-negative Matrix Factorization
Authors Naiyang Guan, Tongliang Liu, Yangmuzi Zhang, Dacheng Tao, Larry S. Davis
Abstract Non-negative matrix factorization (NMF) minimizes the Euclidean distance between the data matrix and its low-rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated Cauchy loss that handles outliers by truncating large errors, and develop Truncated CauchyNMF to robustly learn the subspace on noisy datasets contaminated by outliers. We theoretically analyze the robustness of Truncated CauchyNMF in comparison with competing models, and prove that Truncated CauchyNMF has a generalization bound which converges at a rate of order $O(\sqrt{{\ln n}/{n}})$, where $n$ is the sample size. We evaluate Truncated CauchyNMF by image clustering on both simulated and real datasets. The experimental results on datasets containing gross corruptions validate the effectiveness and robustness of Truncated CauchyNMF for learning robust subspaces.
Tasks Image Clustering
Published 2019-06-02
URL https://arxiv.org/abs/1906.00495v1
PDF https://arxiv.org/pdf/1906.00495v1.pdf
PWC https://paperswithcode.com/paper/190600495
Repo
Framework
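
The truncated Cauchy loss lends itself to a reweighting scheme: give each entry a Cauchy weight, zero out entries whose residual exceeds the truncation threshold, and run weighted multiplicative NMF updates. The half-quadratic sketch below is an assumed implementation, not the authors' exact algorithm.

```python
import numpy as np

def truncated_cauchy_nmf(X, k, gamma=0.5, eps=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        E = X - W @ H
        Q = 1.0 / (1.0 + (E / gamma) ** 2)   # Cauchy weights
        Q[np.abs(E) > eps] = 0.0             # truncate: ignore gross outliers
        W *= ((Q * X) @ H.T) / ((Q * (W @ H)) @ H.T + 1e-12)
        H *= (W.T @ (Q * X)) / (W.T @ (Q * (W @ H)) + 1e-12)
    return W, H

X = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
X[0, 0] = 100.0                              # a gross corruption
W, H = truncated_cauchy_nmf(X, k=5)
print(abs(X - W @ H)[1:].mean())             # fit is driven by clean entries
```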

CroP: Color Constancy Benchmark Dataset Generator

Title CroP: Color Constancy Benchmark Dataset Generator
Authors Nikola Banić, Karlo Koščević, Marko Subašić, Sven Lončarić
Abstract Implementing color constancy as a pre-processing step in contemporary digital cameras is of significant importance, as it removes the influence of scene illumination on object colors. Several benchmark color constancy datasets have been created for the purpose of developing and testing new color constancy methods. However, they all have numerous drawbacks, including a small number of images, erroneously extracted ground-truth illuminations, long histories of misuse, violations of their stated assumptions, etc. To overcome these and similar problems, this paper proposes a color constancy benchmark dataset generator. For a given camera sensor, it enables the generation of any number of realistic raw images taken in a subset of the real world, namely images of printed photographs. Datasets with such images share many positive features with other existing real-world datasets, while some of the negative features are completely eliminated. The generated images can be successfully used to train methods that afterward achieve high accuracy on real-world datasets. This opens the way for creating datasets large enough for advanced deep learning techniques. Experimental results are presented and discussed. The source code is available at http://www.fer.unizg.hr/ipg/resources/color_constancy/.
Tasks Color Constancy
Published 2019-03-29
URL http://arxiv.org/abs/1903.12581v1
PDF http://arxiv.org/pdf/1903.12581v1.pdf
PWC https://paperswithcode.com/paper/crop-color-constancy-benchmark-dataset
Repo
Framework
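
The generator's core step, as the abstract describes it, can be sketched as applying a chosen illuminant to a linear image of a printed photograph in the target sensor's RGB space, so the ground-truth illumination is known by construction. The noise model and values below are assumptions.

```python
import numpy as np

def generate_raw(linear_print_image, illuminant_rgb, noise_std=0.002, seed=0):
    rng = np.random.default_rng(seed)
    raw = linear_print_image * illuminant_rgb          # per-channel color cast
    raw += rng.normal(0.0, noise_std, raw.shape)       # simple sensor noise
    return np.clip(raw, 0.0, 1.0)

canonical = np.random.default_rng(2).random((8, 8, 3))  # stand-in print image
illuminant = np.array([1.0, 0.8, 0.6])                  # warm illuminant
raw = generate_raw(canonical, illuminant)
# Ground truth is exact by construction, avoiding extraction errors:
print("true illuminant:", illuminant / np.linalg.norm(illuminant))
```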