April 1, 2020


Paper Group ANR 402



End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

Title End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification
Authors Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Yawen Xue, Kenji Nagamatsu
Abstract The most common approach to speaker diarization is clustering of speaker embeddings. However, the clustering-based approach has a number of problems: (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps. To solve these problems, we propose End-to-End Neural Diarization (EEND), in which a neural network directly outputs speaker diarization results given a multi-speaker recording. To realize such an end-to-end model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function to directly minimize diarization errors. Besides its end-to-end simplicity, the EEND method can explicitly handle speaker overlaps during training and inference. Just by feeding multi-speaker recordings with corresponding speaker segment labels, our model can easily be adapted to real conversations. We evaluated our method on simulated speech mixtures and real conversation datasets. The results showed that the EEND method outperformed the state-of-the-art x-vector clustering-based method while correctly handling speaker overlaps. We explored the neural network architecture for the EEND method and found that a self-attention-based neural network was the key to achieving excellent performance. In contrast to conditioning the network only on its previous and next hidden states, as is done in a bidirectional long short-term memory (BLSTM), self-attention is directly conditioned on all frames. By visualizing the attention weights, we show that self-attention captures global speaker characteristics in addition to local speech activity dynamics, making it especially suitable for the speaker diarization problem.
Tasks Multi-Label Classification, Speaker Diarization
Published 2020-02-24
URL https://arxiv.org/abs/2003.02966v1
PDF https://arxiv.org/pdf/2003.02966v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-neural-diarization-reformulating
Repo
Framework
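
The permutation-free objective described above can be sketched in a few lines: compute the frame-wise binary cross-entropy under every speaker permutation and keep the minimum. This is a minimal illustration of the idea, not the paper's batched implementation; the list-of-frames representation and the `bce` helper are assumptions made for clarity.

```python
import itertools
import math

def bce(p, y):
    # Binary cross-entropy for one speaker's activity in one frame.
    eps = 1e-9
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def permutation_free_loss(preds, labels):
    """Minimum over speaker permutations of the average frame-wise BCE.

    preds:  per-frame lists of speaker-activity probabilities
    labels: per-frame lists of 0/1 speaker activities
    """
    n_spk = len(labels[0])
    best = float("inf")
    for perm in itertools.permutations(range(n_spk)):
        loss = sum(
            bce(p[s], y[perm[s]])
            for p, y in zip(preds, labels)
            for s in range(n_spk)
        ) / (len(preds) * n_spk)
        best = min(best, loss)
    return best
```

Because the minimum runs over all permutations, a network that predicts the right activity pattern under any speaker ordering incurs a low loss.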

Self-supervised learning for audio-visual speaker diarization

Title Self-supervised learning for audio-visual speaker diarization
Authors Yifan Ding, Yong Xu, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang
Abstract Speaker diarization, the task of finding the speech segments of specific speakers, has been widely used in human-centered applications such as video conferencing and human-computer interaction systems. In this paper, we propose a self-supervised audio-video synchronization learning method to address the problem of speaker diarization without massive labeling effort. We improve on previous approaches by introducing two new loss functions: the dynamic triplet loss and the multinomial loss. We test them on a real-world human-computer interaction system, and the results show our best model yields a remarkable gain of +8% F1-score as well as a reduction in diarization error rate. Finally, we introduce a new large-scale audio-video corpus designed to fill the gap in Chinese audio-video datasets.
Tasks Speaker Diarization, Video Synchronization
Published 2020-02-13
URL https://arxiv.org/abs/2002.05314v1
PDF https://arxiv.org/pdf/2002.05314v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-learning-for-audio-visual
Repo
Framework
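
The paper's dynamic triplet loss builds on the standard triplet objective, which pulls an anchor embedding toward a positive (synchronized) example and pushes it away from a negative one. The sketch below shows only the vanilla form with a fixed margin; the dynamic variant, and the exact distance and margin schedule the authors use, is not reproduced here.

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Vanilla triplet loss on plain-list embeddings (squared L2 distance)."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, d2(anchor, positive) - d2(anchor, negative) + margin)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin.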

Phoneme Boundary Detection using Learnable Segmental Features

Title Phoneme Boundary Detection using Learnable Segmental Features
Authors Felix Kreuk, Yaniv Sheena, Joseph Keshet, Yossi Adi
Abstract Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc. In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. First, we evaluated our model when the spoken phonemes were not given as input. Results on the TIMIT and Buckeye corpora suggest that the proposed model is superior to the baseline models and reaches state-of-the-art performance in terms of F1 and R-value. We further explore the use of phonetic transcription as additional supervision and show this yields minor improvements in performance but substantially better convergence rates. We additionally evaluate the model on a Hebrew corpus and demonstrate that such phonetic supervision can be beneficial in a multi-lingual setting.
Tasks Boundary Detection, Keyword Spotting, Speaker Diarization
Published 2020-02-11
URL https://arxiv.org/abs/2002.04992v2
PDF https://arxiv.org/pdf/2002.04992v2.pdf
PWC https://paperswithcode.com/paper/phoneme-boundary-detection-using-learnable
Repo
Framework
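
At inference time, a boundary detector of this kind reduces to picking peaks in a per-frame boundary score. The toy peak-picker below is an assumption for illustration; the paper learns segmental representations with a structured loss rather than simple thresholding.

```python
def detect_boundaries(scores, threshold=0.5):
    """Mark frame t as a boundary if its score is a local maximum above threshold."""
    bounds = []
    for t in range(1, len(scores) - 1):
        if scores[t] > threshold and scores[t] >= scores[t - 1] and scores[t] >= scores[t + 1]:
            bounds.append(t)
    return bounds
```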

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

Title RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning
Authors Nan Jiang, Sheng Jin, Zhiyao Duan, Changshui Zhang
Abstract This paper presents a deep reinforcement learning algorithm for online accompaniment generation, with potential for real-time interactive human-machine duet improvisation. Different from offline music generation and harmonization, online music accompaniment requires the algorithm to respond to human input and generate the machine counterpart in a sequential order. We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state). The key to this algorithm is a well-functioning reward model. Instead of defining it using music composition rules, we learn this model from monophonic and polyphonic training data. This model considers the compatibility of the machine-generated note with both the machine-generated context and the human-generated context. Experiments show that this algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part. Subjective evaluations on preferences show that the proposed algorithm generates music pieces of higher quality than the baseline method.
Tasks Music Generation
Published 2020-02-08
URL https://arxiv.org/abs/2002.03082v1
PDF https://arxiv.org/pdf/2002.03082v1.pdf
PWC https://paperswithcode.com/paper/rl-duet-online-music-accompaniment-generation
Repo
Framework

Selfish Robustness and Equilibria in Multi-Player Bandits

Title Selfish Robustness and Equilibria in Multi-Player Bandits
Authors Etienne Boursier, Vianney Perchet
Abstract Motivated by cognitive radios, stochastic multi-player multi-armed bandits gained a lot of interest recently. In this class of problems, several players simultaneously pull arms and encounter a collision – with 0 reward – if some of them pull the same arm at the same time. While the cooperative case where players maximize the collective reward (obediently following some fixed protocol) has been mostly considered, robustness to malicious players is a crucial and challenging concern. Existing approaches consider only the case of adversarial jammers whose objective is to blindly minimize the collective reward. We shall consider instead the more natural class of selfish players whose incentives are to maximize their individual rewards, potentially at the expense of the social welfare. We provide the first algorithm robust to selfish players (a.k.a. Nash equilibrium) with a logarithmic regret, when the arm reward is observed. When collisions are also observed, Grim Trigger-type strategies enable implicit communication-based algorithms and we construct robust algorithms in two different settings: in the homogeneous case (with a regret comparable to the centralized optimal one) and in the heterogeneous case (for an adapted and relevant notion of regret). We also provide impossibility results when only the reward is observed or when arm means vary arbitrarily among players.
Tasks Multi-Armed Bandits
Published 2020-02-04
URL https://arxiv.org/abs/2002.01197v1
PDF https://arxiv.org/pdf/2002.01197v1.pdf
PWC https://paperswithcode.com/paper/selfish-robustness-and-equilibria-in-multi
Repo
Framework
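
The collision model at the heart of this setting is easy to state in code: players pulling the same arm in the same round all receive zero reward. A minimal Bernoulli-reward simulator follows; the names and structure are illustrative, not from the paper.

```python
import random

def pull_round(choices, means, rng):
    """One round of a multi-player bandit with Bernoulli arms.

    choices: arm index chosen by each player this round
    means:   success probability of each arm
    Colliding players (same arm, same round) get 0 reward.
    """
    rewards = []
    for arm in choices:
        if choices.count(arm) > 1:
            rewards.append(0.0)  # collision: everyone on this arm gets nothing
        else:
            rewards.append(1.0 if rng.random() < means[arm] else 0.0)
    return rewards
```

A selfish player can thus hurt others at no gain to itself by deliberately colliding, which is exactly the behavior robust algorithms must deter.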

TEAM: An Taylor Expansion-Based Method for Generating Adversarial Examples

Title TEAM: An Taylor Expansion-Based Method for Generating Adversarial Examples
Authors Ya-guan Qian, Xi-Ming Zhang, Wassim Swaileh, Li Wei, Bin Wang, Jian-Hai Chen, Wu-Jie Zhou, Jing-Sheng Lei
Abstract Although Deep Neural Networks (DNNs) have achieved successful applications in many fields, they are vulnerable to adversarial examples. Adversarial training is one of the most effective methods to improve the robustness of DNNs, and it is generally considered as solving a saddle point problem that minimizes risk and maximizes perturbation. Therefore, powerful adversarial examples can effectively replicate the situation of perturbation maximization to solve the saddle point problem. The method proposed in this paper approximates the output of DNNs in the input neighborhood by using the Taylor expansion, and then optimizes it by using the Lagrange multiplier method to generate adversarial examples. If it is used for adversarial training, the DNNs can be effectively regularized and the defects of the model can be mitigated.
Tasks
Published 2020-01-23
URL https://arxiv.org/abs/2001.08389v2
PDF https://arxiv.org/pdf/2001.08389v2.pdf
PWC https://paperswithcode.com/paper/towards-robust-dnns-an-taylor-expansion-based
Repo
Framework
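
To see why a Taylor approximation yields adversarial directions, note that to first order f(x + d) ≈ f(x) + ∇f(x)·d, and under an L∞ budget ‖d‖∞ ≤ ε the linearized loss is maximized by d = ε·sign(∇f(x)). The sketch below implements only this generic first-order step, not the paper's Lagrange-multiplier optimization:

```python
def sign(g):
    return 1.0 if g > 0 else (-1.0 if g < 0 else 0.0)

def first_order_adversarial(x, grad, epsilon):
    """Maximize the first-order Taylor model of the loss under an L-inf budget.

    x:    input as a flat list of floats
    grad: gradient of the loss with respect to x at x
    """
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
```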

Maximizing the Total Reward via Reward Tweaking

Title Maximizing the Total Reward via Reward Tweaking
Authors Chen Tessler, Shie Mannor
Abstract In reinforcement learning, the discount factor $\gamma$ controls the agent’s effective planning horizon. Traditionally, this parameter was considered part of the MDP; however, as deep reinforcement learning algorithms tend to become unstable when the effective planning horizon is long, recent works refer to $\gamma$ as a hyper-parameter. In this work, we focus on the finite-horizon setting and introduce \emph{reward tweaking}. Reward tweaking learns a surrogate reward function $\tilde r$ for the discounted setting, which induces an optimal (undiscounted) return in the original finite-horizon task. Theoretically, we show that there exists a surrogate reward which leads to optimality in the original task and discuss the robustness of our approach. Additionally, we perform experiments in a high-dimensional continuous control task and show that reward tweaking guides the agent towards better long-horizon returns when it plans for short horizons using the tweaked reward.
Tasks Continuous Control
Published 2020-02-09
URL https://arxiv.org/abs/2002.03327v1
PDF https://arxiv.org/pdf/2002.03327v1.pdf
PWC https://paperswithcode.com/paper/maximizing-the-total-reward-via-reward
Repo
Framework
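
The discount mismatch can be shown with a deliberately trivial surrogate (the paper learns its surrogate reward; dividing by gamma**t here is only an illustration of what a surrogate must achieve): with a small discount factor a myopic agent prefers a small immediate reward over a large delayed one, while the tweaked reward restores the undiscounted ordering.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards discounted by gamma**t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def tweak(rewards, gamma):
    """Trivial surrogate: scale the reward at step t by gamma**-t, so the
    discounted return of the surrogate equals the undiscounted return."""
    return [r / gamma ** t for t, r in enumerate(rewards)]
```

With gamma = 0.05, the delayed sequence [0, 10] scores 0.5 while [1, 0] scores 1.0, inverting the undiscounted preference; after tweaking, the delayed sequence scores 10 again.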

Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks

Title Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks
Authors Yu Yuan, Serge Sharoff
Abstract This paper explores the use of Deep Learning methods for automatic estimation of the quality of human translations. Automatic estimation can provide useful feedback for translation teaching, examination and quality control. Conventional methods for solving this task rely on manually engineered features and external knowledge. This paper presents an end-to-end neural model without feature engineering, incorporating a cross attention mechanism to detect which parts in sentence pairs are most relevant for assessing quality. Another contribution concerns the prediction of fine-grained scores for measuring different aspects of translation quality. Empirical results on a large human annotated dataset show that the neural model outperforms feature-based methods significantly. The dataset and the tools are available.
Tasks Feature Engineering
Published 2020-03-13
URL https://arxiv.org/abs/2003.06381v1
PDF https://arxiv.org/pdf/2003.06381v1.pdf
PWC https://paperswithcode.com/paper/sentence-level-human-translation-quality
Repo
Framework
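
The cross attention mechanism referenced above can be illustrated with plain dot-product attention: a target-side state (query) is scored against each source-side state (key) and the scores are softmax-normalized into weights. This is the generic mechanism, not the authors' exact parameterization.

```python
import math

def cross_attention_weights(query, keys):
    """Softmax over dot products of one query vector against a list of keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

High-weight source positions are the ones the model deems most relevant when judging the quality of the target word.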

Structure-Adaptive Sequential Testing for Online False Discovery Rate Control

Title Structure-Adaptive Sequential Testing for Online False Discovery Rate Control
Authors Bowen Gang, Wenguang Sun, Weinan Wang
Abstract Consider the online testing of a stream of hypotheses where a real-time decision must be made before the next data point arrives. The error rate is required to be controlled at all decision points. Conventional \emph{simultaneous testing rules} are no longer applicable due to the more stringent error constraints and the absence of future data. Moreover, the online decision-making process may come to a halt when the total error budget, or alpha-wealth, is exhausted. This work develops a new class of structure-adaptive sequential testing (SAST) rules for online false discovery rate (FDR) control. A key element in our proposal is a new alpha-investment algorithm that precisely characterizes the gains and losses in sequential decision making. SAST captures time-varying structures of the data stream, learns the optimal threshold adaptively in an ongoing manner and optimizes the alpha-wealth allocation across different time periods. We present theory and numerical results to show that the proposed method is valid for online FDR control and achieves substantial power gain over existing online testing rules.
Tasks Decision Making
Published 2020-02-28
URL https://arxiv.org/abs/2003.00113v1
PDF https://arxiv.org/pdf/2003.00113v1.pdf
PWC https://paperswithcode.com/paper/structure-adaptive-sequential-testing-for
Repo
Framework
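
The alpha-wealth bookkeeping behind such procedures can be sketched with a generic alpha-investing rule: each test spends a fraction of the remaining wealth, and each discovery earns some back. The spending and payout rules below are deliberately simplified placeholders, not the SAST allocation:

```python
def alpha_investing(pvalues, wealth=0.05, spend_frac=0.5, payout=0.05):
    """Generic alpha-investing sketch for online multiple testing."""
    decisions = []
    for p in pvalues:
        if wealth <= 0:            # alpha-wealth exhausted: testing halts
            decisions.append(False)
            continue
        alpha = spend_frac * wealth
        wealth -= alpha            # pay for running the test
        reject = p <= alpha
        if reject:
            wealth += payout       # earn wealth back on a discovery
        decisions.append(reject)
    return decisions
```

Discoveries replenish the budget, so a stream with early signal can sustain many more tests than one that spends its wealth on nulls.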

ENTMOOT: A Framework for Optimization over Ensemble Tree Models

Title ENTMOOT: A Framework for Optimization over Ensemble Tree Models
Authors Alexander Thebelt, Jan Kronqvist, Miten Mistry, Robert M. Lee, Nathan Sudermann-Merx, Ruth Misener
Abstract Gradient boosted trees and other regression tree models perform well in a wide range of real-world, industrial applications. These tree models (i) offer insight into important prediction features, (ii) effectively manage sparse data, and (iii) have excellent prediction capabilities. Despite their advantages, they are generally unpopular for decision-making tasks and black-box optimization, which is due to their difficult-to-optimize structure and the lack of a reliable uncertainty measure. ENTMOOT is our new framework for integrating (already trained) tree models into larger optimization problems. The contributions of ENTMOOT include: (i) explicitly introducing a reliable uncertainty measure that is compatible with tree models, (ii) solving the larger optimization problems that incorporate these uncertainty aware tree models, (iii) proving that the solutions are globally optimal, i.e. no better solution exists. In particular, we show how the ENTMOOT approach allows a simple integration of tree models into decision-making and black-box optimization, where it proves to be a strong competitor to commonly-used frameworks.
Tasks Decision Making
Published 2020-03-10
URL https://arxiv.org/abs/2003.04774v1
PDF https://arxiv.org/pdf/2003.04774v1.pdf
PWC https://paperswithcode.com/paper/entmoot-a-framework-for-optimization-over
Repo
Framework

Progressively-Growing AmbientGANs For Learning Stochastic Object Models From Imaging Measurements

Title Progressively-Growing AmbientGANs For Learning Stochastic Object Models From Imaging Measurements
Authors Weimin Zhou, Sayantan Bhadra, Frank J. Brooks, Hua Li, Mark A. Anastasio
Abstract The objective optimization of medical imaging systems requires full characterization of all sources of randomness in the measured data, which includes the variability within the ensemble of objects to-be-imaged. This can be accomplished by establishing a stochastic object model (SOM) that describes the variability in the class of objects to-be-imaged. Generative adversarial networks (GANs) can be potentially useful to establish SOMs because they hold great promise to learn generative models that describe the variability within an ensemble of training data. However, because medical imaging systems record imaging measurements that are noisy and indirect representations of object properties, GANs cannot be directly applied to establish stochastic models of objects to-be-imaged. To address this issue, an augmented GAN architecture named AmbientGAN was developed to establish SOMs from noisy and indirect measurement data. However, because the adversarial training can be unstable, the applicability of the AmbientGAN can be potentially limited. In this work, we propose a novel training strategy, Progressive Growing of AmbientGANs (ProAGAN), to stabilize the training of AmbientGANs for establishing SOMs from noisy and indirect imaging measurements. An idealized magnetic resonance (MR) imaging system and clinical MR brain images are considered. The proposed methodology is evaluated by comparing signal detection performance computed by use of ProAGAN-generated synthetic images and images that depict the true object properties.
Tasks
Published 2020-01-26
URL https://arxiv.org/abs/2001.09523v1
PDF https://arxiv.org/pdf/2001.09523v1.pdf
PWC https://paperswithcode.com/paper/progressively-growing-ambientgans-for
Repo
Framework

Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets

Title Discovering contemporaneous and lagged causal relations in autocorrelated nonlinear time series datasets
Authors Jakob Runge
Abstract We consider causal discovery from time series using conditional independence (CI) based network learning algorithms such as the PC algorithm. The PC algorithm is divided into a skeleton phase where adjacencies are determined based on efficiently selected CI tests and subsequent phases where links are oriented utilizing the Markov and Faithfulness assumptions. Here we show that autocorrelation makes the PC algorithm much less reliable with very low adjacency and orientation detection rates and inflated false positives. We propose a new algorithm, called PCMCI$^+$ that extends the PCMCI method from [Runge et al., 2019b] to also include discovery of contemporaneous links. It separates the skeleton phase for lagged and contemporaneous conditioning sets and modifies the conditioning sets for the individual CI tests. We show that this algorithm now benefits from increasing autocorrelation and yields much more adjacency detection power and especially more orientation recall for contemporaneous links while controlling false positives and having much shorter runtimes. Numerical experiments indicate that the algorithm can be of considerable use in many application scenarios for dozens of variables and large time delays.
Tasks Causal Discovery, Time Series
Published 2020-03-07
URL https://arxiv.org/abs/2003.03685v1
PDF https://arxiv.org/pdf/2003.03685v1.pdf
PWC https://paperswithcode.com/paper/discovering-contemporaneous-and-lagged-causal
Repo
Framework
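
The CI tests in the skeleton phase are often instantiated with partial correlation. For a single conditioning variable it has a closed form; the helper below is a generic illustration of that statistic (the actual PCMCI+ implementation handles arbitrary conditioning sets and lag structure):

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def partial_corr(xs, ys, zs):
    """Partial correlation of x and y given one conditioning series z."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

A near-zero partial correlation is taken as evidence for conditional independence, and the corresponding link is removed from the skeleton.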

Causality and Robust Optimization

Title Causality and Robust Optimization
Authors Akihiro Yabe
Abstract A decision-maker must consider confounding bias when attempting to apply machine learning prediction, and, while feature selection is widely recognized as an important process in data analysis, it can cause confounding bias. A causal Bayesian network is a standard tool for describing causal relationships, and if the relationships are known, then adjustment criteria can determine the features with which confounding bias disappears. A standard modification would thus utilize causal discovery algorithms for preventing confounding bias in feature selection. Causal discovery algorithms, however, essentially rely on the faithfulness assumption, which turns out to be easily violated in practical feature selection settings. In this paper, we propose a meta-algorithm that can remedy existing feature selection algorithms in terms of confounding bias. Our algorithm is induced from a novel adjustment criterion that requires, instead of faithfulness, an assumption that can be derived from the well-known assumption of causal sufficiency. We further prove that the features added through our modification convert confounding bias into prediction variance. With the aid of existing robust optimization technologies that regularize risky strategies with high variance, we are then able to improve the throughput performance of decision-making optimization, as shown in our experimental results.
Tasks Causal Discovery, Decision Making, Feature Selection
Published 2020-02-28
URL https://arxiv.org/abs/2002.12626v1
PDF https://arxiv.org/pdf/2002.12626v1.pdf
PWC https://paperswithcode.com/paper/causality-and-robust-optimization
Repo
Framework
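
The adjustment criteria mentioned above reduce, for a valid adjustment set Z, to the backdoor formula P(y | do(x)) = sum over z of P(y | x, z) P(z). A tiny discrete-variable sketch (the lookup-table representation is an assumption for illustration):

```python
def backdoor_adjust(p_y_given_xz, p_z, x):
    """P(y=1 | do(x)) via the backdoor formula over a discrete adjustment set.

    p_y_given_xz: dict mapping (x, z) -> P(y=1 | x, z)
    p_z:          dict mapping z -> P(z)
    """
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())
```

Averaging over the marginal of z, rather than its conditional given x, is what removes the confounding path.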

Injecting Domain Knowledge in Neural Networks: a Controlled Experiment on a Constrained Problem

Title Injecting Domain Knowledge in Neural Networks: a Controlled Experiment on a Constrained Problem
Authors Mattia Silvestri, Michele Lombardi, Michela Milano
Abstract Given enough data, Deep Neural Networks (DNNs) are capable of learning complex input-output relations with high accuracy. In several domains, however, data is scarce or expensive to retrieve, while a substantial amount of expert knowledge is available. It seems reasonable that if we can inject this additional information in the DNN, we could ease the learning process. One such case is that of Constraint Problems, for which declarative approaches exist and pure ML solutions have obtained mixed success. Using a classical constrained problem as a case study, we perform controlled experiments to probe the impact of progressively adding domain and empirical knowledge in the DNN. Our results are very encouraging, showing that (at least in our setup) embedding domain knowledge at training time can have a considerable effect and that a small amount of empirical knowledge is sufficient to obtain practically useful results.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.10742v1
PDF https://arxiv.org/pdf/2002.10742v1.pdf
PWC https://paperswithcode.com/paper/injecting-domain-knowledge-in-neural-networks
Repo
Framework
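
One common way to inject declarative knowledge at training time (used here only as an illustration; the paper compares several injection strategies) is to add a penalty proportional to constraint violation to the task loss:

```python
def penalized_loss(task_loss, violations, lam=1.0):
    """Total loss = task loss + lambda * sum of positive constraint violations.

    violations: per-constraint violation amounts; values <= 0 mean satisfied.
    """
    return task_loss + lam * sum(max(0.0, v) for v in violations)
```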

Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN)

Title Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN)
Authors Peter Sorrenson, Carsten Rother, Ullrich Köthe
Abstract A central question of representation learning asks under which conditions it is possible to reconstruct the true latent variables of an arbitrarily complex generative process. Recent breakthrough work by Khemakhem et al. (2019) on nonlinear ICA has answered this question for a broad class of conditional generative processes. We extend this important result in a direction relevant for application to real-world data. First, we generalize the theory to the case of unknown intrinsic problem dimension and prove that in some special (but not very restrictive) cases, informative latent variables will be automatically separated from noise by an estimating model. Furthermore, the recovered informative latent variables will be in one-to-one correspondence with the true latent variables of the generating process, up to a trivial component-wise transformation. Second, we introduce a modification of the RealNVP invertible neural network architecture (Dinh et al. (2016)) which is particularly suitable for this type of problem: the General Incompressible-flow Network (GIN). Experiments on artificial data and EMNIST demonstrate that theoretical predictions are indeed verified in practice. In particular, we provide a detailed set of exactly 22 informative latent variables extracted from EMNIST.
Tasks Representation Learning
Published 2020-01-14
URL https://arxiv.org/abs/2001.04872v1
PDF https://arxiv.org/pdf/2001.04872v1.pdf
PWC https://paperswithcode.com/paper/disentanglement-by-nonlinear-ica-with-general
Repo
Framework
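
The "incompressible" in GIN means the flow is volume-preserving: in an affine coupling layer the log-determinant of the Jacobian is the sum of the scale outputs, so constraining the scales to sum to zero makes it exactly zero. The toy coupling below uses a linear "network" and zero-mean scales as stand-ins (GIN instead fixes the last scale component; both choices are volume-preserving):

```python
import math

def scale_and_shift(x1, w):
    """Toy conditioner: linear scales normalized to zero mean, plus shifts."""
    s = [w * v for v in x1]
    mean = sum(s) / len(s)
    s = [v - mean for v in s]          # zero-sum scales -> log|det J| = 0
    t = [v + 1.0 for v in x1]
    return s, t

def coupling_forward(x1, x2, w=0.7):
    """Affine coupling: pass x1 through, transform x2 conditioned on x1."""
    s, t = scale_and_shift(x1, w)
    y2 = [math.exp(si) * xi + ti for si, xi, ti in zip(s, x2, t)]
    return x1, y2

def coupling_inverse(x1, y2, w=0.7):
    """Exact inverse of coupling_forward, recomputing s and t from x1."""
    s, t = scale_and_shift(x1, w)
    return [(yi - ti) * math.exp(-si) for si, yi, ti in zip(s, y2, t)]
```

Exact invertibility plus a zero log-determinant is what lets the latent variables be read off without any volume distortion.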