Paper Group ANR 1705
Independent and automatic evaluation of acoustic-to-articulatory inversion models. What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis. Transferring neural speech waveform synthesizers to musical instrument sounds generation. Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis …
Independent and automatic evaluation of acoustic-to-articulatory inversion models
Title | Independent and automatic evaluation of acoustic-to-articulatory inversion models |
Authors | Parrot Maud, Millet Juliette, Dunbar Ewan |
Abstract | Reconstruction of articulatory trajectories from the acoustic speech signal has been proposed for improving speech recognition and text-to-speech synthesis. However, to be useful in these settings, articulatory reconstruction must be speaker independent. Furthermore, as most research focuses on single, small datasets with few speakers, robust articulatory reconstruction could profit from combining datasets. Standard evaluation measures such as root mean square error and Pearson correlation are inappropriate for evaluating the speaker-independence of models or the usefulness of combining datasets. We present a new evaluation for articulatory reconstruction which is independent of the articulatory dataset used for training: the phone discrimination ABX task. We use the ABX measure to evaluate a Bi-LSTM based model trained on 3 datasets (14 speakers), and show that it gives information complementary to the standard measures, and enables us to evaluate the effects of dataset merging, as well as the speaker independence of the model. (A code sketch of the ABX measure follows this table.) |
Tasks | Speech Recognition, Speech Synthesis, Text-To-Speech Synthesis |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06573v1 |
PDF | https://arxiv.org/pdf/1911.06573v1.pdf |
PWC | https://paperswithcode.com/paper/independent-and-automatic-evaluation-of |
Repo | |
Framework | |
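To make the ABX evaluation concrete: a triplet (A, B, X) is scored correct when X, which shares its phone category with A, lies closer to A than to B under some sequence distance. Below is a minimal sketch, assuming length-normalized DTW over Euclidean frame distances; the paper's exact distance and triplet-sampling scheme may differ, and the random arrays are toy stand-ins for real articulatory reconstructions.

```python
# A minimal sketch of an ABX phone-discrimination score, assuming DTW-aligned
# Euclidean distance between feature sequences; not the paper's exact setup.
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two (frames x dims) sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)  # length-normalized

def abx_score(triplets):
    """Fraction of (A, B, X) triplets where X is closer to A (same phone) than to B."""
    correct = sum(dtw_distance(a, x) < dtw_distance(b, x) for a, b, x in triplets)
    return correct / len(triplets)

# Toy usage: random trajectories standing in for real reconstructions; A and X
# come from one "phone", B from another.
rng = np.random.default_rng(0)
triplets = [(rng.normal(size=(20, 12)), rng.normal(size=(18, 12)) + 1.0,
             rng.normal(size=(22, 12))) for _ in range(50)]
print(f"ABX accuracy: {abx_score(triplets):.2f}")
```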
What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis
Title | What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis |
Authors | Chung-Yi Li, Pei-Chieh Yuan, Hung-Yi Lee |
Abstract | End-to-end speech recognition systems have achieved competitive results compared to traditional systems. However, the complex transformations involved between layers, given highly variable acoustic signals, are hard to analyze. In this paper, we present our ASR probing model, which synthesizes speech from hidden representations of end-to-end ASR to examine the information maintained after each layer's computation. Listening to the synthesized speech, we observe a gradual removal of speaker variability and noise as the layers go deeper, which aligns with previous studies on how deep networks function in speech recognition. This paper is the first study to analyze an end-to-end speech recognition model by demonstrating what each layer hears. Speaker verification and speech enhancement measurements on the synthesized speech are also conducted to further confirm our observations. (A code sketch of the probing setup follows this table.) |
Tasks | End-To-End Speech Recognition, Speaker Verification, Speech Enhancement, Speech Recognition, Speech Synthesis |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01102v1 |
PDF | https://arxiv.org/pdf/1911.01102v1.pdf |
PWC | https://paperswithcode.com/paper/what-does-a-network-layer-hear-analyzing |
Repo | |
Framework | |
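The probing recipe the abstract describes can be sketched as follows: freeze the ASR model, read out one layer's activations, and train a small decoder to reconstruct the input spectrogram from them. This is a hedged sketch, not the authors' implementation; the `return_layer` keyword on `encoder` and the data loader are hypothetical placeholders.

```python
# A hedged sketch of the probing setup: the ASR model is frozen, and a small
# decoder is trained to map one layer's activations back to log-mel frames.
# `encoder(mels, return_layer=...)` and `loader` are hypothetical placeholders.
import torch
import torch.nn as nn

class ProbeDecoder(nn.Module):
    """Maps hidden activations (B, T, H) back to spectrogram frames (B, T, M)."""
    def __init__(self, hidden_dim=512, n_mels=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.ReLU(), nn.Linear(256, n_mels))

    def forward(self, h):
        return self.net(h)

def probe_layer(encoder, layer_idx, loader, steps=1000):
    """Train a decoder on one layer's activations; the ASR weights stay frozen."""
    decoder = ProbeDecoder()
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    for step, (mels, _) in zip(range(steps), loader):
        with torch.no_grad():                     # never update the ASR model
            h = encoder(mels, return_layer=layer_idx)
        loss = nn.functional.l1_loss(decoder(h), mels)
        opt.zero_grad(); loss.backward(); opt.step()
    return decoder  # synthesizing from `decoder` reveals what the layer "hears"
```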
Transferring neural speech waveform synthesizers to musical instrument sounds generation
Title | Transferring neural speech waveform synthesizers to musical instrument sounds generation |
Authors | Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi |
Abstract | Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural-source-filter (NSF) model have shown good performance in speech synthesis despite their different methods of waveform generation. The similarity between speech and music audio synthesis techniques suggests interesting avenues to explore in terms of the best way to apply speech synthesizers in the music domain. This work compares three neural synthesizers used for generating musical instrument sounds under three scenarios: training from scratch on music data, zero-shot learning from the speech domain, and fine-tuning-based adaptation from the speech to the music domain. The results of a large-scale perceptual test demonstrated that the performance of all three synthesizers improved when they were pre-trained on speech data and fine-tuned on music data, which indicates the usefulness of knowledge from speech data for music audio generation. Among the synthesizers, WaveGlow showed the best potential in zero-shot learning, while NSF performed best in the other scenarios and could generate samples that were perceptually close to natural audio. |
Tasks | Audio Generation, Speech Synthesis, Zero-Shot Learning |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12381v2 |
PDF | https://arxiv.org/pdf/1910.12381v2.pdf |
PWC | https://paperswithcode.com/paper/transferring-neural-speech-waveform |
Repo | |
Framework | |
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Title | Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis |
Authors | Eric Battenberg, RJ Skerry-Ryan, Soroosh Mariooryad, Daisy Stanton, David Kao, Matt Shannon, Tom Bagby |
Abstract | Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be addressed using simple location-relative attention mechanisms that do away with content-based query/key comparisons. We compare two families of attention mechanisms: location-relative GMM-based mechanisms and additive energy-based mechanisms. We suggest simple modifications to GMM-based attention that allow it to align quickly and consistently during training, and introduce a new location-relative attention mechanism to the additive energy-based family, called Dynamic Convolution Attention (DCA). We compare the various mechanisms in terms of alignment speed and consistency during training, naturalness, and ability to generalize to long utterances, and conclude that GMM attention and DCA can generalize to very long utterances, while preserving naturalness for shorter, in-domain utterances. (A code sketch of GMM-based location-relative attention follows this table.) |
Tasks | Speech Synthesis |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10288v1 |
PDF | https://arxiv.org/pdf/1910.10288v1.pdf |
PWC | https://paperswithcode.com/paper/location-relative-attention-mechanisms-for |
Repo | |
Framework | |
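As a concrete illustration of the GMM-based, location-relative family: at each decoder step, the attention distribution over encoder frames is a mixture of Gaussians whose means can only move forward. A minimal sketch, assuming softplus-positive deltas and scales; the paper compares several parameterizations, so this matches the family rather than any specific variant.

```python
# A minimal sketch of location-relative GMM attention with K components whose
# means only move forward (softplus-positive deltas); exact variants differ.
import torch
import torch.nn.functional as F

def gmm_attention_step(params, prev_means, memory_len):
    """One decoder step.  params: (B, 3K) raw outputs -> weights, deltas, scales."""
    w_hat, delta_hat, sigma_hat = params.chunk(3, dim=-1)
    w = torch.softmax(w_hat, dim=-1)                  # mixture weights (B, K)
    delta = F.softplus(delta_hat)                     # forward-only movement
    sigma = F.softplus(sigma_hat) + 1e-5              # positive std-dev
    means = prev_means + delta                        # location-relative update
    j = torch.arange(memory_len).view(1, -1, 1)       # memory positions (1, T, 1)
    # Gaussian mixture over memory positions, summed over components -> (B, T)
    alpha = (w.unsqueeze(1) * torch.exp(
        -0.5 * ((j - means.unsqueeze(1)) / sigma.unsqueeze(1)) ** 2)).sum(-1)
    return alpha / alpha.sum(1, keepdim=True), means

# Toy usage: batch of 2, K=5 components, 100-frame memory.
params = torch.randn(2, 15)
alpha, means = gmm_attention_step(params, torch.zeros(2, 5), 100)
print(alpha.shape, means.shape)  # torch.Size([2, 100]) torch.Size([2, 5])
```

Because the alignment depends only on the (monotonically advancing) means and not on content-based query/key comparisons, it cannot skip backward or stall on repeated text, which is what makes this family robust on long-form input.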
MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
Title | MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies |
Authors | Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine |
Abstract | Humans are able to perform a myriad of sophisticated tasks by drawing upon skills acquired through prior experience. For autonomous agents to have this capability, they must be able to extract reusable skills from past experience that can be recombined in new ways for subsequent tasks. Furthermore, when controlling complex high-dimensional morphologies, such as humanoid bodies, tasks often require coordination of multiple skills simultaneously. Learning discrete primitives for every combination of skills quickly becomes prohibitive. Composable primitives that can be recombined to create a large variety of behaviors can be more suitable for modeling this combinatorial explosion. In this work, we propose multiplicative compositional policies (MCP), a method for learning reusable motor skills that can be composed to produce a range of complex behaviors. Our method factorizes an agent’s skills into a collection of primitives, where multiple primitives can be activated simultaneously via multiplicative composition. This flexibility allows the primitives to be transferred and recombined to elicit new behaviors as necessary for novel tasks. We demonstrate that MCP is able to extract composable skills for highly complex simulated characters from pre-training tasks, such as motion imitation, and then reuse these skills to solve challenging continuous control tasks, such as dribbling a soccer ball to a goal, and picking up an object and transporting it to a target location. (A code sketch of multiplicative Gaussian composition follows this table.) |
Tasks | Continuous Control |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09808v1 |
PDF | https://arxiv.org/pdf/1905.09808v1.pdf |
PWC | https://paperswithcode.com/paper/mcp-learning-composable-hierarchical-control |
Repo | |
Framework | |
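The multiplicative composition has a convenient closed form when primitives are Gaussian: the weighted product of Gaussians is again a Gaussian whose precision is the weight-scaled sum of the primitive precisions. A minimal sketch, with plain arrays standing in for the paper's gating and primitive networks:

```python
# A minimal sketch of multiplicative composition for Gaussian primitives:
# pi(a|s) is proportional to the product of N(a; mu_i, sigma_i^2)^{w_i}, which
# is itself Gaussian with precision sum_i w_i / sigma_i^2.
import numpy as np

def compose(mus, sigmas, weights):
    """mus, sigmas: (K, A) primitive means/std-devs; weights: (K,) gating >= 0."""
    prec = weights[:, None] / sigmas ** 2            # scaled precisions (K, A)
    total_prec = prec.sum(axis=0)                    # composite precision (A,)
    mu = (prec * mus).sum(axis=0) / total_prec       # precision-weighted mean
    return mu, 1.0 / np.sqrt(total_prec)             # composite mean and std

# Toy usage: 3 primitives over a 2-D action space; the gating blends the first
# two and switches the third off entirely.
mus = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
sigmas = np.full((3, 2), 0.5)
mu, sigma = compose(mus, sigmas, weights=np.array([0.7, 0.3, 0.0]))
print(mu, sigma)  # mean lies between the two active primitives
```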
ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions
Title | ExplaiNE: An Approach for Explaining Network Embedding-based Link Predictions |
Authors | Bo Kang, Jefrey Lijffijt, Tijl De Bie |
Abstract | Networks are powerful data structures, but are challenging to work with for conventional machine learning methods. Network Embedding (NE) methods attempt to resolve this by learning vector representations for the nodes, for subsequent use in downstream machine learning tasks. Link Prediction (LP) is one such downstream machine learning task that is an important use case and popular benchmark for NE methods. Unfortunately, while NE methods perform exceedingly well at this task, they are lacking in transparency as compared to simpler LP approaches. We introduce ExplaiNE, an approach to offer counterfactual explanations for NE-based LP methods, by identifying existing links in the network that explain the predicted links. ExplaiNE is applicable to a broad class of NE algorithms. An extensive empirical evaluation for the NE method 'Conditional Network Embedding' in particular demonstrates its accuracy and scalability. (A code sketch of the counterfactual idea follows this table.) |
Tasks | Link Prediction, Network Embedding |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.12694v1 |
PDF | http://arxiv.org/pdf/1904.12694v1.pdf |
PWC | https://paperswithcode.com/paper/190412694 |
Repo | |
Framework | |
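The counterfactual question ExplaiNE answers, namely which existing links explain a predicted link, can be illustrated with a brute-force stand-in: remove each existing edge, re-score the predicted link, and rank edges by the score drop. Note that ExplaiNE itself uses a closed-form gradient-based approximation rather than this loop, and the one-step-propagation "embedding" below is a toy placeholder for a real NE method.

```python
# A brute-force illustration of counterfactual link explanation; ExplaiNE uses
# a gradient-based approximation instead of re-scoring per removed edge.
import numpy as np

def link_score(adj, i, j):
    """Toy NE stand-in: score by dot product of rows of a smoothed adjacency."""
    emb = adj @ adj / max(adj.sum(), 1)               # one propagation step
    return emb[i] @ emb[j]

def explain(adj, i, j):
    """Rank existing edges by the score drop caused by their removal."""
    base = link_score(adj, i, j)
    effects = []
    for u, v in zip(*np.triu(adj).nonzero()):
        pert = adj.copy()
        pert[u, v] = pert[v, u] = 0                   # counterfactual: drop edge
        effects.append(((int(u), int(v)), base - link_score(pert, i, j)))
    return sorted(effects, key=lambda e: -e[1])       # biggest contributors first

# Toy usage: a 5-node graph; explain the predicted link (0, 4).
adj = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4)]:
    adj[u, v] = adj[v, u] = 1
print(explain(adj, 0, 4)[:3])
```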
Randomized Exploration for Non-Stationary Stochastic Linear Bandits
Title | Randomized Exploration for Non-Stationary Stochastic Linear Bandits |
Authors | Baekjin Kim, Ambuj Tewari |
Abstract | We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice. The first approach replaces optimism with simple randomization when using confidence sets. The second adds random perturbations to its current estimate before maximizing the expected reward. For non-stationary linear bandits, where each action is associated with a $d$-dimensional feature and the unknown parameter is time-varying with total variation $B_T$, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS), via the two perturbation approaches. We highlight the statistical-optimality versus computational-efficiency trade-off between them: the former asymptotically achieves the optimal dynamic regret $\tilde{\mathcal{O}}(d^{2/3} B_T^{1/3} T^{2/3})$, while the latter is oracle-efficient with an extra logarithmic gap in the number of arms compared to the minimax-optimal dynamic regret. In a simulation study, both empirically show outstanding performance in tackling the conservatism issue that Discounted LinUCB (D-LinUCB) struggles with. (A code sketch of the discounted, perturbed estimator follows this table.) |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05695v3 |
PDF | https://arxiv.org/pdf/1912.05695v3.pdf |
PWC | https://paperswithcode.com/paper/near-optimal-oracle-efficient-algorithms-for |
Repo | |
Framework | |
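The common recipe behind both algorithms can be sketched as a discounted least-squares estimator plus a perturbation before the argmax. The sketch below follows the D-LinTS flavor (perturb the estimate, then maximize); the way the design matrix is discounted, the noise scale, and the constants are simplified stand-ins for the paper's exact choices.

```python
# A minimal sketch of a discounted estimator with a D-LinTS-style perturbation;
# confidence radii and perturbation laws here are simplified assumptions.
import numpy as np

class DiscountedPerturbedLinearBandit:
    def __init__(self, d, gamma=0.99, reg=1.0, noise=0.5):
        self.gamma, self.noise = gamma, noise
        self.V = reg * np.eye(d)          # discounted design matrix
        self.b = np.zeros(d)              # discounted reward-weighted features

    def choose(self, arms, rng):
        theta_hat = np.linalg.solve(self.V, self.b)
        # Perturb the estimate before maximizing the expected reward.
        theta = rng.multivariate_normal(theta_hat, self.noise * np.linalg.inv(self.V))
        return int(np.argmax(arms @ theta))

    def update(self, x, reward):
        # Discount old evidence so the estimator can track a drifting theta.
        self.V = self.gamma * self.V + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x

# Toy usage: 10 arms in R^4 with a slowly drifting true parameter.
rng = np.random.default_rng(1)
bandit, theta_true = DiscountedPerturbedLinearBandit(4), rng.normal(size=4)
for t in range(200):
    arms = rng.normal(size=(10, 4))
    k = bandit.choose(arms, rng)
    bandit.update(arms[k], arms[k] @ theta_true + 0.1 * rng.normal())
    theta_true += 0.01 * rng.normal(size=4)   # non-stationarity
```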
Face Detection with Feature Pyramids and Landmarks
Title | Face Detection with Feature Pyramids and Landmarks |
Authors | Samuel W. F. Earp, Pavit Noinongyao, Justin A. Cairns, Ankush Ganguly |
Abstract | Accurate face detection and facial landmark localization are crucial to any face recognition system. We present a series of three single-stage RCNNs with different sized backbones (MobileNetV2-25, MobileNetV2-100, and ResNet101) and a six-layer feature pyramid trained exclusively on the WIDER FACE dataset. We compare the face detection and landmark accuracies using eight context module architectures, four proposed by previous research and four modified versions. We find no evidence that any of the proposed architectures significantly outperforms the others, and postulate that the random initialization of the additional layers is at least of equal importance. To show this, we present a model that achieves near state-of-the-art performance on WIDER FACE and also provides high-accuracy landmarks with a simple context module. We also present results using MobileNetV2 backbones, which achieve over 90% average precision on the WIDER FACE hard validation set while being able to run in real time. Comparing against other published results, we show that our models exceed the state of the art for similar-sized RCNNs and match the performance of much heavier networks. |
Tasks | Face Alignment, Face Detection, Face Recognition |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00596v2 |
PDF | https://arxiv.org/pdf/1912.00596v2.pdf |
PWC | https://paperswithcode.com/paper/face-detection-with-feature-pyramids-and |
Repo | |
Framework | |
Rarely-switching linear bandits: optimization of causal effects for the real world
Title | Rarely-switching linear bandits: optimization of causal effects for the real world |
Authors | Benjamin Lansdell, Sofia Triantafillou, Konrad Kording |
Abstract | Excessively changing policies in many real-world scenarios is difficult, unethical, or expensive. After all, doctor guidelines, tax codes, and price lists can only be reprinted so often. We may thus want to change a policy only when it is probable that the change is beneficial. In cases where a policy is a threshold on contextual variables, we can estimate treatment effects for populations lying at the threshold. This allows for a schedule of incremental policy updates that let us optimize a policy while making few detrimental changes. Using this idea, and the theory of linear contextual bandits, we present a conservative policy updating procedure which updates a deterministic policy only when justified. We extend the theory of linear bandits to this rarely-switching case, proving that such procedures share the same regret, up to constant scaling, as the common LinUCB algorithm. However, the algorithm makes far fewer changes to its policy and, of those changes, fewer are detrimental. We provide simulations and an analysis of an infant health and well-being causal inference dataset, showing the algorithm efficiently learns a good policy with few changes. Our approach allows efficiently solving problems where changes are to be avoided, with potential applications in medicine, economics, and beyond. (A code sketch of the switch-only-when-justified test follows this table.) |
Tasks | Causal Inference, Multi-Armed Bandits |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13121v2 |
PDF | https://arxiv.org/pdf/1905.13121v2.pdf |
PWC | https://paperswithcode.com/paper/rarely-switching-linear-bandits-optimization |
Repo | |
Framework | |
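The conservative updating idea can be sketched as a simple test: compute the greedy arm under the current estimate, but replace the deployed policy only when the estimated improvement over the incumbent exceeds the uncertainty of that gap. The threshold below is a simplified stand-in for the paper's confidence-bound criterion.

```python
# A minimal sketch of rarely-switching policy updates: keep the deployed arm
# unless the greedy arm's estimated advantage exceeds its uncertainty.
import numpy as np

def maybe_update_policy(V, b, current_arm, arms, beta=2.0):
    """Return the arm to deploy; switch only when clearly justified."""
    theta = np.linalg.solve(V, b)
    greedy = int(np.argmax(arms @ theta))
    if greedy == current_arm:
        return current_arm
    gap = arms[greedy] @ theta - arms[current_arm] @ theta
    diff = arms[greedy] - arms[current_arm]
    width = beta * np.sqrt(diff @ np.linalg.solve(V, diff))  # uncertainty of the gap
    return greedy if gap > width else current_arm            # otherwise stay put

# Toy usage: with only weak evidence, the gap is dominated by its uncertainty,
# so the incumbent policy (arm 0) is kept.
rng = np.random.default_rng(5)
arms = rng.normal(size=(6, 3))
V, b = np.eye(3), arms[0] * 0.1
print(maybe_update_policy(V, b, current_arm=0, arms=arms))  # prints 0
```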
Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective
Title | Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective |
Authors | Tom Everitt, Marcus Hutter |
Abstract | Can an arbitrarily intelligent reinforcement learning agent be kept under control by a human user? Or do agents with sufficient intelligence inevitably find ways to shortcut their reward signal? This question impacts how far reinforcement learning can be scaled, and whether alternative paradigms must be developed in order to build safe artificial general intelligence. In this paper, we use an intuitive yet precise graphical model called causal influence diagrams to formalize reward tampering problems. We also describe a number of modifications to the reinforcement learning objective that prevent incentives for reward tampering. We verify the solutions using recently developed graphical criteria for inferring agent incentives from causal influence diagrams. Along the way, we also compare corrigibility and self-preservation properties of the various solutions, and discuss how they can be combined into a single agent without reward tampering incentives. |
Tasks | |
Published | 2019-08-13 |
URL | https://arxiv.org/abs/1908.04734v3 |
PDF | https://arxiv.org/pdf/1908.04734v3.pdf |
PWC | https://paperswithcode.com/paper/reward-tampering-problems-and-solutions-in |
Repo | |
Framework | |
LPRNet: Lightweight Deep Network by Low-rank Pointwise Residual Convolution
Title | LPRNet: Lightweight Deep Network by Low-rank Pointwise Residual Convolution |
Authors | Bin Sun, Jun Li, Ming Shao, Yun Fu |
Abstract | Deep learning has become popular in recent years primarily due to powerful computing devices such as GPUs. However, deploying these deep models to end-user devices, smartphones, or embedded systems with limited resources is challenging. To reduce the computation and memory costs, we propose a novel lightweight deep learning module based on low-rank pointwise residual (LPR) convolution, called LPRNet. Essentially, LPR aims at using low-rank approximation in pointwise convolution to further reduce the module size, while keeping depthwise convolutions as the residual module to rectify the LPR module. This is critical when the low-rankness undermines the convolution process. We embody our design by replacing modules of identical input-output dimension in MobileNet and ShuffleNetV2. Experiments on visual recognition tasks including image classification and face alignment on popular benchmarks show that our LPRNet achieves competitive performance, but with a significant reduction in FLOPs and memory cost compared to state-of-the-art deep models focusing on model compression. (A code sketch of an LPR block follows this table.) |
Tasks | Face Alignment, Image Classification, Model Compression |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11853v3 |
PDF | https://arxiv.org/pdf/1910.11853v3.pdf |
PWC | https://paperswithcode.com/paper/lprnet-lightweight-deep-network-by-low-rank |
Repo | |
Framework | |
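A minimal sketch of an LPR-style block, assuming equal input and output channels so the depthwise branch can act as a residual (matching the "identical input-output dimension" restriction above); the kernel size, normalization, and activation follow common practice rather than the paper's exact configuration.

```python
# A sketch of a low-rank pointwise residual (LPR) block: a channels x channels
# pointwise convolution is factorized through rank r, and a depthwise branch
# rectifies what the low-rank path loses.  Details are illustrative.
import torch
import torch.nn as nn

class LPRBlock(nn.Module):
    def __init__(self, channels, rank):
        super().__init__()
        # Rank-r factorization of the pointwise convolution: C -> r -> C.
        self.down = nn.Conv2d(channels, rank, kernel_size=1, bias=False)
        self.up = nn.Conv2d(rank, channels, kernel_size=1, bias=False)
        # Depthwise residual branch, one 3x3 filter per channel.
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                            groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.up(self.down(x)) + self.dw(x)))

# Pointwise parameters drop from C*C to 2*C*r (plus 9*C for the depthwise branch).
block = LPRBlock(channels=128, rank=16)
print(sum(p.numel() for p in block.parameters()))
```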
The Fuzzy ROC
Title | The Fuzzy ROC |
Authors | Giovanni Parmigiani |
Abstract | The fuzzy ROC extends Receiver Operating Characteristic (ROC) curve visualization to the situation where some data points, falling in an indeterminacy region, are not classified. It addresses two challenges: the definition of sensitivity and specificity bounds under indeterminacy, and the visual summarization of the large number of possibilities arising from different choices of indeterminacy zones. (A code sketch of the sensitivity/specificity bounds follows this table.) |
Tasks | |
Published | 2019-03-04 |
URL | http://arxiv.org/abs/1903.01868v1 |
PDF | http://arxiv.org/pdf/1903.01868v1.pdf |
PWC | https://paperswithcode.com/paper/the-fuzzy-roc |
Repo | |
Framework | |
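One plausible reading of the construction, sketched below: scores falling in an indeterminacy zone [lo, hi] are left unclassified, and sensitivity/specificity bounds come from resolving the unclassified points worst-case versus best-case. The zone endpoints and the bound definitions are illustrative assumptions, not the paper's exact formulas.

```python
# A hedged sketch of sensitivity/specificity bounds under an indeterminacy
# zone: bounds come from counting zone points as all-wrong vs all-right.
import numpy as np

def fuzzy_bounds(scores, labels, lo, hi):
    scores, labels = np.asarray(scores), np.asarray(labels).astype(bool)
    pos_above = ((scores >= hi) & labels).sum()     # determinate true positives
    neg_below = ((scores <= lo) & ~labels).sum()    # determinate true negatives
    pos_in = ((scores > lo) & (scores < hi) & labels).sum()
    neg_in = ((scores > lo) & (scores < hi) & ~labels).sum()
    sens = (pos_above / labels.sum(),               # worst case: zone all wrong
            (pos_above + pos_in) / labels.sum())    # best case: zone all right
    spec = (neg_below / (~labels).sum(),
            (neg_below + neg_in) / (~labels).sum())
    return sens, spec

# Toy usage: indeterminacy zone between 0.4 and 0.6.
rng = np.random.default_rng(2)
labels = rng.integers(0, 2, 200)
scores = np.clip(0.5 * labels + rng.normal(0.25, 0.2, 200), 0, 1)
print(fuzzy_bounds(scores, labels, 0.4, 0.6))
```

Sweeping the classification threshold while holding a zone fixed traces a band rather than a curve, which is the visualization problem the paper addresses.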
Capsule Networks with Max-Min Normalization
Title | Capsule Networks with Max-Min Normalization |
Authors | Zhen Zhao, Ashley Kleinhans, Gursharan Sandhu, Ishan Patel, K. P. Unnikrishnan |
Abstract | Capsule Networks (CapsNet) use the Softmax function to convert the logits of the routing coefficients into a set of normalized values that signify the assignment probabilities between capsules in adjacent layers. We show that the use of Softmax prevents capsule layers from forming optimal couplings between lower and higher-level capsules. Softmax constrains the dynamic range of the routing coefficients and leads to probabilities that remain mostly uniform after several routing iterations. Instead, we propose the use of Max-Min normalization. Max-Min performs a scale-invariant normalization of the logits that allows each lower-level capsule to take on an independent value, constrained only by the bounds of normalization. Max-Min provides consistent improvement in test accuracy across five datasets and allows more routing iterations without a decrease in network performance. A single CapsNet trained using Max-Min achieves an improved test error of 0.20% on the MNIST dataset. With a simple 3-model majority vote, we achieve a test error of 0.17% on MNIST. (A code sketch of Max-Min normalization follows this table.) |
Tasks | |
Published | 2019-03-22 |
URL | http://arxiv.org/abs/1903.09662v1 |
PDF | http://arxiv.org/pdf/1903.09662v1.pdf |
PWC | https://paperswithcode.com/paper/capsule-networks-with-max-min-normalization |
Repo | |
Framework | |
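The contrast between the two normalizations is easy to see in code: softmax forces each lower-level capsule's routing coefficients to compete through a sum-to-one constraint, while max-min simply rescales them to [0, 1]. A minimal sketch:

```python
# Softmax vs max-min normalization of routing logits.  Near-uniform logits
# stay near-uniform under softmax but spread across [0, 1] under max-min.
import numpy as np

def softmax(logits, axis=-1):
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def max_min(logits, axis=-1, eps=1e-12):
    lo = logits.min(axis=axis, keepdims=True)
    hi = logits.max(axis=axis, keepdims=True)
    return (logits - lo) / (hi - lo + eps)   # range [0, 1], no sum-to-one coupling

# Toy usage: small differences in logits are washed out by softmax but fully
# separated by max-min.
logits = np.array([0.01, 0.02, 0.05, 0.03])
print(softmax(logits).round(3), max_min(logits).round(3))
```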
Truncated Cauchy Non-negative Matrix Factorization
Title | Truncated Cauchy Non-negative Matrix Factorization |
Authors | Naiyang Guan, Tongliang Liu, Yangmuzi Zhang, Dacheng Tao, Larry S. Davis |
Abstract | Non-negative matrix factorization (NMF) minimizes the Euclidean distance between the data matrix and its low-rank approximation, and it fails when applied to corrupted data because the loss function is sensitive to outliers. In this paper, we propose a Truncated Cauchy loss that handles outliers by truncating large errors, and develop Truncated CauchyNMF to robustly learn the subspace on noisy datasets contaminated by outliers. We theoretically analyze the robustness of Truncated CauchyNMF in comparison with competing models and prove that Truncated CauchyNMF has a generalization bound which converges at a rate of order $O(\sqrt{{\ln n}/{n}})$, where $n$ is the sample size. We evaluate Truncated CauchyNMF by image clustering on both simulated and real datasets. The experimental results on datasets containing gross corruptions validate the effectiveness and robustness of Truncated CauchyNMF for learning robust subspaces. (A code sketch of a truncated Cauchy loss follows this table.) |
Tasks | Image Clustering |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00495v1 |
PDF | https://arxiv.org/pdf/1906.00495v1.pdf |
PWC | https://paperswithcode.com/paper/190600495 |
Repo | |
Framework | |
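A minimal sketch of a truncated Cauchy loss: Cauchy-shaped for small residuals and capped beyond a truncation point, so gross outliers contribute only a constant. The scale `gamma` and truncation point `eps` are illustrative hyper-parameters, not the paper's settings.

```python
# Truncated Cauchy loss: log(1 + (e/gamma)^2) capped at the truncation point,
# so large residuals stop influencing the factorization.
import numpy as np

def truncated_cauchy_loss(residual, gamma=1.0, eps=3.0):
    e2 = (residual / gamma) ** 2
    loss = np.log1p(e2)                       # Cauchy loss, robust to outliers
    cap = np.log1p((eps / gamma) ** 2)
    return np.minimum(loss, cap)              # truncation: outliers get flat loss

# Toy usage: the squared loss explodes on the outlier; the truncated Cauchy
# loss saturates.
residuals = np.array([0.1, 0.5, 1.0, 50.0])
print((residuals ** 2).round(2), truncated_cauchy_loss(residuals).round(2))
```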
CroP: Color Constancy Benchmark Dataset Generator
Title | CroP: Color Constancy Benchmark Dataset Generator |
Authors | Nikola Banić, Karlo Koščević, Marko Subašić, Sven Lončarić |
Abstract | Implementing color constancy as a pre-processing step in contemporary digital cameras is of significant importance, as it removes the influence of scene illumination on object colors. Several benchmark color constancy datasets have been created for the purpose of developing and testing new color constancy methods. However, they all have numerous drawbacks, including a small number of images, erroneously extracted ground-truth illuminations, long histories of misuse, violations of their stated assumptions, etc. To overcome these and similar problems, this paper proposes a color constancy benchmark dataset generator. For a given camera sensor, it enables the generation of any number of realistic raw images taken in a subset of the real world, namely images of printed photographs. Datasets with such images share many positive features with other existing real-world datasets, while some of the negative features are completely eliminated. The generated images can be successfully used to train methods that afterward achieve high accuracy on real-world datasets. This opens the way for creating datasets large enough for advanced deep learning techniques. Experimental results are presented and discussed. The source code is available at http://www.fer.unizg.hr/ipg/resources/color_constancy/. |
Tasks | Color Constancy |
Published | 2019-03-29 |
URL | http://arxiv.org/abs/1903.12581v1 |
PDF | http://arxiv.org/pdf/1903.12581v1.pdf |
PWC | https://paperswithcode.com/paper/crop-color-constancy-benchmark-dataset |
Repo | |
Framework | |