April 2, 2020

3238 words 16 mins read

Paper Group ANR 370

Towards Mixture Proportion Estimation without Irreducibility. Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection. SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation. An adaptive data-driven approach to solve real-world vehicle routing problems in logistics. Bridging the Gap …

Towards Mixture Proportion Estimation without Irreducibility

Title Towards Mixture Proportion Estimation without Irreducibility
Authors Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Gang Niu, Masashi Sugiyama, Dacheng Tao
Abstract \textit{Mixture proportion estimation} (MPE) is a fundamental problem of practical significance, in which we are given data from only a \textit{mixture} and one of its two \textit{components}, and must identify the proportion of each component. All existing distribution-independent MPE methods explicitly or implicitly rely on the \textit{irreducibility} assumption: the unobserved component is not itself a mixture containing the observable component. When this assumption is not satisfied, those methods incur a critical estimation bias. In this paper, we propose \textit{Regrouping-MPE}, which works without the irreducibility assumption: it builds a new, irreducible MPE problem and solves that problem instead. Changing the problem is worthwhile: we prove that if the assumption holds, our method changes nothing; if it does not, the bias introduced by changing the problem is smaller than the bias caused by violating the irreducibility assumption in the original problem. Experiments show that our method outperforms all state-of-the-art MPE methods on various real-world datasets.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03673v1
PDF https://arxiv.org/pdf/2002.03673v1.pdf
PWC https://paperswithcode.com/paper/towards-mixture-proportion-estimation-without
Repo
Framework
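
As a point of reference for what the paper improves on, the classical irreducibility-based estimator takes the proportion to be the minimum density ratio between mixture and component. A minimal histogram sketch (the function name, binning scheme, and toy data are illustrative assumptions, not from the paper; Regrouping-MPE itself is not reproduced here):

```python
import random

def mpe_histogram(mixture, component, bins=10, lo=0.0, hi=1.0):
    """Classical irreducibility-based MPE: estimate the proportion of
    `component` inside `mixture` as the minimum ratio of binned densities."""
    width = (hi - lo) / bins

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        return [c / len(xs) for c in counts]

    f, h = hist(mixture), hist(component)
    return min(f[i] / h[i] for i in range(bins) if h[i] > 0)

# Illustrative toy data: the mixture is 70% of the observable component
# U(0, 1) plus 30% of an unobserved component U(0.5, 1).
random.seed(0)
component = [random.random() for _ in range(20000)]
mixture = [random.random() if random.random() < 0.7
           else 0.5 + 0.5 * random.random() for _ in range(20000)]
kappa_hat = mpe_histogram(mixture, component)  # close to 0.7
```

Under irreducibility the minimum ratio recovers the true proportion; when the unobserved component itself contains the observable one, this estimator is biased, which is exactly the failure mode Regrouping-MPE addresses.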

Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection

Title Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection
Authors Qian Chen, Mengzhe Chen, Bo Li, Wen Wang
Abstract With the increased applications of automatic speech recognition (ASR) in recent years, it is essential to automatically insert punctuation marks and remove disfluencies in transcripts, to improve the readability of the transcripts as well as the performance of subsequent applications, such as machine translation, dialogue systems, and so forth. In this paper, we propose a Controllable Time-delay Transformer (CT-Transformer) model that jointly completes the punctuation prediction and disfluency detection tasks in real time. The CT-Transformer model facilitates freezing partial outputs with controllable time delay to fulfill the real-time constraints in partial decoding required by subsequent applications. We further propose a fast decoding strategy to minimize latency while maintaining competitive performance. Experimental results on the IWSLT2011 benchmark dataset and an in-house Chinese annotated dataset demonstrate that the proposed approach outperforms the previous state-of-the-art models on F-scores and achieves a competitive inference speed.
Tasks Machine Translation, Speech Recognition
Published 2020-03-03
URL https://arxiv.org/abs/2003.01309v1
PDF https://arxiv.org/pdf/2003.01309v1.pdf
PWC https://paperswithcode.com/paper/controllable-time-delay-transformer-for-real
Repo
Framework
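
The "controllable time delay" can be pictured as an attention mask that lets each position see a fixed number of future tokens. A hedged sketch (the function and its exact masking rule are illustrative; the paper's CT-Transformer may define the delay differently):

```python
def time_delay_mask(seq_len, look_ahead):
    """Boolean attention mask: position i may attend to positions
    j <= i + look_ahead. look_ahead = 0 gives a strictly causal mask;
    larger values trade real-time latency for more right context."""
    return [[j <= i + look_ahead for j in range(seq_len)]
            for i in range(seq_len)]

mask = time_delay_mask(5, 2)
```

With such a mask, outputs more than `look_ahead` tokens behind the current frontier can be frozen for partial decoding, since no later input can change them.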

SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation

Title SkinAugment: Auto-Encoding Speaker Conversions for Automatic Speech Translation
Authors Arya D. McCarthy, Liezl Puzon, Juan Pino
Abstract We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker’s voice. Our method compares favorably to SpecAugment on English$\to$French and English$\to$Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task. Further, in ablations, we show the benefits of both quantity and diversity in augmented data. Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English$\to$French AST task. Our method is sufficiently general that it can be applied to other speech generation and analysis tasks.
Tasks Data Augmentation, Speech Recognition
Published 2020-02-27
URL https://arxiv.org/abs/2002.12231v1
PDF https://arxiv.org/pdf/2002.12231v1.pdf
PWC https://paperswithcode.com/paper/skinaugment-auto-encoding-speaker-conversions
Repo
Framework

An adaptive data-driven approach to solve real-world vehicle routing problems in logistics

Title An adaptive data-driven approach to solve real-world vehicle routing problems in logistics
Authors Emir Zunic, Dzenana Donko, Emir Buza
Abstract Transportation accounts for roughly one-third of logistics costs, so transportation systems largely influence the performance of the logistics system. This work presents an adaptive, data-driven, modular approach for solving real-world Vehicle Routing Problems (VRP) in logistics. The work consists of two basic units: (i) an innovative multi-step algorithm for solving VRP problems in logistics with entirely feasible routes, and (ii) an adaptive approach for adjusting and setting the parameters and constants of the proposed algorithm. The proposed algorithm combines several data transformation approaches, heuristics, and Tabu search. Moreover, since the performance of the algorithm depends on a set of control parameters and constants, a predictive model that adaptively adjusts them according to historical data is proposed. The acquired results were compared using a Decision Support System with two predictive models: Generalized Linear Models (GLM) and Support Vector Machines (SVM). The algorithm, together with the control parameters acquired through the prediction method, was incorporated into a web-based enterprise system that is in use at several large distribution companies in Bosnia and Herzegovina. The results of the proposed algorithm were compared with a set of benchmark instances and validated on real benchmark instances as well. The feasibility of the generated routes in a real environment is also demonstrated.
Tasks
Published 2020-01-05
URL https://arxiv.org/abs/2001.02094v1
PDF https://arxiv.org/pdf/2001.02094v1.pdf
PWC https://paperswithcode.com/paper/an-adaptive-data-driven-approach-to-solve
Repo
Framework
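
The Tabu-search component the abstract mentions can be sketched on a toy TSP with 2-opt moves (all names, parameters, and the move-sampling scheme are illustrative assumptions; the paper's multi-step algorithm additionally handles real-world VRP constraints such as capacities and time windows):

```python
import math
import random

def tour_length(tour, pts):
    """Total length of a closed tour over points `pts`."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def tabu_search_tsp(pts, iters=300, tabu_len=15, sample=30, seed=0):
    """Minimal Tabu search: each iteration samples 2-opt moves, applies the
    best one that is not tabu (or that beats the best tour so far, the
    aspiration criterion), and records the move on a bounded tabu list."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    best, best_len = tour[:], tour_length(tour, pts)
    tabu = []
    for _ in range(iters):
        moves = []
        for _ in range(sample):
            i, j = sorted(rng.sample(range(n), 2))
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            moves.append((tour_length(cand, pts), (i, j), cand))
        for length, move, cand in sorted(moves, key=lambda m: m[0]):
            if move not in tabu or length < best_len:
                tour = cand
                tabu.append(move)
                if len(tabu) > tabu_len:
                    tabu.pop(0)
                if length < best_len:
                    best, best_len = cand[:], length
                break
    return best, best_len
```

The tabu list forbids recently used moves, which lets the search accept non-improving steps without cycling; the adaptive layer in the paper would then tune knobs like `tabu_len` from historical data.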

Bridging the Gap between Spatial and Spectral Domains: A Survey on Graph Neural Networks

Title Bridging the Gap between Spatial and Spectral Domains: A Survey on Graph Neural Networks
Authors Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, Chang-Tien Lu
Abstract The success of deep learning has been widely recognized across many machine learning tasks in recent decades, ranging from image classification and speech recognition to natural language understanding. As an extension of deep learning, graph neural networks (GNNs) are designed to solve non-Euclidean problems on graph-structured data, which can hardly be handled by general deep learning techniques. Existing GNNs build on various mechanisms, such as random walks, PageRank, graph convolution, and heat diffusion, and are designed for different types of graphs and problems, which makes it difficult to compare them directly. Previous GNN surveys focus on categorizing current models into independent groups, lacking analysis of their internal connections. This paper proposes a unified framework and provides a novel perspective that methodologically fits most existing GNNs into our framework. Specifically, we survey and categorize existing GNN models into the spatial and spectral domains, and reveal connections among the subcategories in each domain. Further analysis establishes a strong link across the spatial and spectral domains.
Tasks Image Classification, Speech Recognition
Published 2020-02-27
URL https://arxiv.org/abs/2002.11867v2
PDF https://arxiv.org/pdf/2002.11867v2.pdf
PWC https://paperswithcode.com/paper/bridging-the-gap-between-spatial-and-spectral
Repo
Framework
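
The spatial/spectral connection the survey draws can be illustrated with the standard GCN propagation rule, which is both a neighbourhood average (spatial view) and a first-order spectral filter on the graph Laplacian. A minimal plain-Python sketch with dense matrices, for illustration only:

```python
import math

def gcn_propagate(adj, feats):
    """One GCN propagation step: H' = D^{-1/2} (A + I) D^{-1/2} H.
    Spatial view: each node averages itself with its neighbours.
    Spectral view: a first-order polynomial filter of the Laplacian."""
    n = len(adj)
    # Add self-loops, then symmetrically normalize by degree.
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    return [[sum(norm[i][k] * feats[k][f] for k in range(n))
             for f in range(len(feats[0]))] for i in range(n)]
```

On a regular graph a constant signal passes through unchanged: it is the lowest-frequency eigenvector in the spectral view and a fixed point of neighbourhood averaging in the spatial view.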

The Fluidity of Concept Representations in Human Brain Signals

Title The Fluidity of Concept Representations in Human Brain Signals
Authors Eva Hendrikx, Lisa Beinborn
Abstract Cognitive theories of human language processing often distinguish between concrete and abstract concepts. In this work, we analyze the discriminability of concrete and abstract concepts in fMRI data using a range of analysis methods. We find that the distinction can be decoded from the signal with an accuracy significantly above chance, but it is not found to be a relevant structuring factor in clustering and relational analyses. From our detailed comparison, we obtain the impression that human concept representations are more fluid than dichotomous categories can capture. We argue that fluid concept representations lead to more realistic models of human language processing because they better capture the ambiguity and underspecification present in natural language use.
Tasks
Published 2020-02-20
URL https://arxiv.org/abs/2002.08880v1
PDF https://arxiv.org/pdf/2002.08880v1.pdf
PWC https://paperswithcode.com/paper/the-fluidity-of-concept-representations-in
Repo
Framework

Multimodal active speaker detection and virtual cinematography for video conferencing

Title Multimodal active speaker detection and virtual cinematography for video conferencing
Authors Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle
Abstract Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer’s footage significantly higher than unedited video. We describe a new automated ASD and VC system that performs within 0.3 MOS of an expert cinematographer, based on subjective ratings on a 1-5 scale. The system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD classifier using AdaBoost, which is very efficient and runs in real time. The VC is similarly trained with machine learning to optimize the subjective quality of the overall experience. To avoid distracting the room participants and to reduce switching latency, the system has no moving parts: the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned and evaluated using extensive crowdsourcing techniques and tested on a dataset of N=100 meetings, each 2-5 minutes in length.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03977v2
PDF https://arxiv.org/pdf/2002.03977v2.pdf
PWC https://paperswithcode.com/paper/multimodal-active-speaker-detection-and
Repo
Framework
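
The moving-parts-free VC described above reduces to computing a crop window over the 4K stream. A hedged sketch of such digital pan/tilt/zoom (the function and its clamping rule are illustrative assumptions, not the paper's implementation):

```python
def crop_rect(frame_w, frame_h, cx, cy, zoom, aspect=16 / 9):
    """Digital pan/tilt/zoom: a crop window of the wide-FOV frame centred
    on the active speaker at (cx, cy), clamped to the frame bounds.
    zoom > 1 narrows the view; zoom = 1 shows the whole frame."""
    w = frame_w / zoom
    h = w / aspect
    # Clamp so the crop never leaves the frame.
    x = min(max(cx - w / 2, 0), frame_w - w)
    y = min(max(cy - h / 2, 0), frame_h - h)
    return x, y, w, h
```

In a full system the (cx, cy) target would come from the ASD output, smoothed over time so the virtual camera does not jitter between speakers.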

Boosting Simple Learners

Title Boosting Simple Learners
Authors Noga Alon, Alon Gonen, Elad Hazan, Shay Moran
Abstract We consider boosting algorithms under the restriction that the weak learners come from a class of bounded VC dimension. In this setting, we focus on two main questions: (i) \underline{Oracle Complexity:} we show that restricting the complexity of the weak learner significantly reduces the number of calls to the weak learner. We describe a boosting procedure which makes only $\tilde O(1/\gamma)$ calls to the weak learner, where $\gamma$ denotes the weak learner’s advantage. This circumvents the lower bound of $\Omega(1/\gamma^2)$ due to Freund and Schapire (‘95, ‘12) for the general case. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, our method uses more complex aggregation rules, and we show this to be necessary. (ii) \underline{Expressivity:} we consider what can be learned by boosting weak hypotheses of bounded VC dimension. Towards this end, we identify a combinatorial-geometric parameter called the $\gamma$-VC dimension, which quantifies the expressivity of a class of weak hypotheses when used as part of a boosting procedure. We explore the limits of the $\gamma$-VC dimension and compute it for well-studied classes such as halfspaces and decision stumps. Along the way, we establish and exploit connections with {\it discrepancy theory}.
Tasks
Published 2020-01-31
URL https://arxiv.org/abs/2001.11704v1
PDF https://arxiv.org/pdf/2001.11704v1.pdf
PWC https://paperswithcode.com/paper/boosting-simple-learners
Repo
Framework
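
For contrast with the $\tilde O(1/\gamma)$ result, the classical baseline is AdaBoost over decision stumps, whose weighted-majority aggregation needs on the order of $1/\gamma^2$ calls to the weak learner. A plain-Python sketch of that baseline (illustrative only; the paper's procedure is not reproduced here):

```python
import math

def best_stump(xs, ys, w):
    """Weak learner: the threshold classifier sign(s * (x - t)) with the
    smallest weighted error; its gap below 1/2 is the advantage gamma."""
    best = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (s if xi >= t else -s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(xs, ys, rounds=10):
    """Classical AdaBoost on 1-D data: reweight examples each round and
    aggregate the stumps by a weighted majority vote."""
    n = len(xs)
    w = [1 / n] * n
    ensemble, calls = [], 0
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, w)
        calls += 1  # one oracle call to the weak learner per round
        if err >= 0.5:
            break
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Boost the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * (s if xi >= t else -s))
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]

    def predict(x):
        score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
        return 1 if score >= 0 else -1

    return predict, calls
```

The paper's point is that when the weak class has bounded VC dimension, a more complex aggregation rule than this weighted vote can cut the number of `best_stump`-style oracle calls roughly quadratically.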

A Developmental Neuro-Robotics Approach for Boosting the Recognition of Handwritten Digits

Title A Developmental Neuro-Robotics Approach for Boosting the Recognition of Handwritten Digits
Authors Alessandro Di Nuovo
Abstract Developmental psychology and neuroimaging research have identified a close link between numbers and fingers, which can boost initial number knowledge in children. Recent evidence shows that simulating children’s embodied strategies can improve machine intelligence too. This article explores the application of embodied strategies to convolutional neural network models in the context of developmental neuro-robotics, where training information is likely to be acquired gradually during operation rather than being abundant and fully available, as in classical machine learning scenarios. The experimental analyses show that proprioceptive information from the robot’s fingers can improve network accuracy in the recognition of handwritten Arabic digits when training examples and epochs are few. This result is comparable to findings from brain imaging and longitudinal studies with young children. In conclusion, these findings support the relevance of embodiment in training artificial agents and suggest a way to humanize the learning process, in which the robotic body expresses the internal processes of artificial intelligence, making them more understandable to humans.
Tasks
Published 2020-03-23
URL https://arxiv.org/abs/2003.10308v1
PDF https://arxiv.org/pdf/2003.10308v1.pdf
PWC https://paperswithcode.com/paper/a-developmental-neuro-robotics-approach-for
Repo
Framework

GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering

Title GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering
Authors Chuang Niu, Jun Zhang, Ge Wang, Jimin Liang
Abstract Deep clustering has achieved state-of-the-art results via joint representation learning and clustering, but it still performs poorly on real-scene images, e.g., those in ImageNet. With such images, deep clustering methods face several challenges, including extracting discriminative features, avoiding trivial solutions, capturing semantic information, and scaling to large image datasets. To address these problems, we propose a self-supervised attention network for image clustering (GATCluster). Rather than extracting intermediate features first and then applying a traditional clustering algorithm, GATCluster directly outputs semantic cluster labels that are more discriminative than intermediate features, with no further post-processing needed. To train GATCluster in a completely unsupervised manner, we design four learning tasks with the constraints of transformation invariance, separability maximization, entropy analysis, and attention mapping. Specifically, the transformation invariance and separability maximization tasks learn the relationships between sample pairs. The entropy analysis task aims to avoid trivial solutions. To capture object-oriented semantics, we design a self-supervised attention mechanism that includes a parameterized attention module and a soft-attention loss. All the guiding signals for clustering are self-generated during training. Moreover, we develop a two-step learning algorithm that is training-friendly and memory-efficient for processing large images. Extensive experiments demonstrate the superiority of our proposed method on state-of-the-art image clustering benchmarks.
Tasks Image Clustering, Representation Learning
Published 2020-02-27
URL https://arxiv.org/abs/2002.11863v1
PDF https://arxiv.org/pdf/2002.11863v1.pdf
PWC https://paperswithcode.com/paper/gatcluster-self-supervised-gaussian-attention
Repo
Framework
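
The entropy analysis task can be illustrated with one common form of anti-collapse loss: the negative entropy of the batch-mean cluster assignment. This is an assumed formulation for illustration; the paper's exact loss may differ:

```python
import math

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def entropy_analysis_loss(soft_assignments):
    """Negative entropy of the batch-mean cluster distribution. Minimizing
    it pushes the average assignment toward uniform, so the network cannot
    collapse every image into one cluster (the trivial solution)."""
    k = len(soft_assignments[0])
    n = len(soft_assignments)
    mean = [sum(row[c] for row in soft_assignments) / n for c in range(k)]
    return -entropy(mean)
```

A collapsed batch (all mass on one cluster) maximizes this loss, while a balanced batch minimizes it, so the term counteracts the degenerate solutions the other three tasks could otherwise drift toward.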

Fast Differentiable Sorting and Ranking

Title Fast Differentiable Sorting and Ranking
Authors Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga
Abstract The sorting operation is one of the most basic and commonly used building blocks in computer programming. In machine learning, it is commonly used for robust statistics. However, seen as a function, it is piecewise linear and as a result includes many kinks at which it is non-differentiable. More problematic is the related ranking operator, commonly used for order statistics and ranking metrics. It is a piecewise constant function, meaning that its derivatives are null or undefined. While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one would expect from sorting and ranking operations. In this paper, we propose the first differentiable sorting and ranking operators with $O(n \log n)$ time and $O(n)$ space complexity. Our proposal in addition enjoys exact computation and differentiation. We achieve this feat by constructing differentiable sorting and ranking operators as projections onto the permutahedron, the convex hull of permutations, and using a reduction to isotonic optimization. Empirically, we confirm that our approach is an order of magnitude faster than existing approaches and showcase two novel applications: differentiable Spearman’s rank correlation coefficient and soft least trimmed squares.
Tasks
Published 2020-02-20
URL https://arxiv.org/abs/2002.08871v1
PDF https://arxiv.org/pdf/2002.08871v1.pdf
PWC https://paperswithcode.com/paper/fast-differentiable-sorting-and-ranking
Repo
Framework
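
The reduction to isotonic optimization rests on the Pool Adjacent Violators (PAV) algorithm, the linear-time Euclidean projection onto non-decreasing sequences. A minimal plain-Python sketch (the paper composes a projection of this kind with an ordinary sort to obtain the full soft sorting and ranking operators):

```python
def isotonic_regression(y):
    """Pool Adjacent Violators: project y onto the set of non-decreasing
    sequences. Each block stores a running (sum, count), so pooled values
    are block averages; merging is amortized linear time."""
    sums, counts = [], []
    for v in y:
        sums.append(v)
        counts.append(1)
        # Merge blocks while the ordering constraint is violated.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    out = []
    for s, c in zip(sums, counts):
        out.extend([s / c] * c)
    return out
```

The $O(n \log n)$ total cost in the abstract comes from the initial sort; the projection step above adds only $O(n)$ time and space.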

Deep Image Clustering with Tensor Kernels and Unsupervised Companion Objectives

Title Deep Image Clustering with Tensor Kernels and Unsupervised Companion Objectives
Authors Daniel J. Trosten, Michael C. Kampffmeyer, Robert Jenssen
Abstract In this paper we develop a new model for deep image clustering, using convolutional neural networks and tensor kernels. The proposed Deep Tensor Kernel Clustering (DTKC) consists of a convolutional neural network (CNN), which is trained to reflect a common cluster structure at the output of its intermediate layers. Encouraging a consistent cluster structure throughout the network has the potential to guide it towards meaningful clusters, even though these clusters might appear to be nonlinear in the input space. The cluster structure is enforced through the idea of unsupervised companion objectives, where separate loss functions are attached to layers in the network. These unsupervised companion objectives are constructed based on a proposed generalization of the Cauchy-Schwarz (CS) divergence, from vectors to tensors of arbitrary rank. Generalizing the CS divergence to tensor-valued data is a crucial step, due to the tensorial nature of the intermediate representations in the CNN. Several experiments are conducted to thoroughly assess the performance of the proposed DTKC model. The results indicate that the model outperforms, or performs comparably to, a wide range of baseline algorithms. We also empirically demonstrate that our model does not suffer from objective function mismatch, which can be a problematic artifact in autoencoder-based clustering models.
Tasks Image Clustering
Published 2020-01-20
URL https://arxiv.org/abs/2001.07026v1
PDF https://arxiv.org/pdf/2001.07026v1.pdf
PWC https://paperswithcode.com/paper/deep-image-clustering-with-tensor-kernels-and
Repo
Framework
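
The vector-valued Cauchy-Schwarz divergence that the paper generalizes to tensors can be sketched empirically with a Gaussian kernel (a sample-based estimator; the names and the kernel choice are illustrative assumptions):

```python
import math

def gauss(x, y, sigma=1.0):
    """Gaussian kernel between two equal-length vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y))
                    / (2 * sigma ** 2))

def cs_divergence(X, Y, sigma=1.0):
    """Empirical Cauchy-Schwarz divergence between two samples: -log of
    the squared cosine between their kernel mean embeddings. It is 0 when
    the samples coincide and positive otherwise."""
    kxy = sum(gauss(x, y, sigma) for x in X for y in Y)
    kxx = sum(gauss(a, b, sigma) for a in X for b in X)
    kyy = sum(gauss(a, b, sigma) for a in Y for b in Y)
    return -math.log(kxy ** 2 / (kxx * kyy))
```

The companion objectives in DTKC need the analogous quantity for tensor-valued intermediate CNN activations, which is the generalization the paper contributes.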

Contextual Lensing of Universal Sentence Representations

Title Contextual Lensing of Universal Sentence Representations
Authors Jamie Kiros
Abstract What makes a universal sentence encoder universal? The notion of a generic encoder of text appears to be at odds with the inherent contextualization and non-permanence of language use in a dynamic world. However, mapping sentences into generic fixed-length vectors for downstream similarity and retrieval tasks has been fruitful, particularly for multilingual applications. How do we manage this dilemma? In this work we propose Contextual Lensing, a methodology for inducing context-oriented universal sentence vectors. We break the construction of universal sentence vectors into a core, variable-length sentence matrix representation equipped with an adaptable ‘lens’ from which fixed-length vectors can be induced as a function of the lens context. We show that it is possible to focus notions of language similarity into a small number of lens parameters given a core universal matrix representation. For example, we demonstrate the ability to encode translation similarity of sentences across several languages into a single weight matrix, even when the core encoder has not seen parallel data.
Tasks
Published 2020-02-20
URL https://arxiv.org/abs/2002.08866v1
PDF https://arxiv.org/pdf/2002.08866v1.pdf
PWC https://paperswithcode.com/paper/contextual-lensing-of-universal-sentence
Repo
Framework
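
One way to read the "lens" is as a small attention-pooling head that collapses the variable-length sentence matrix into a fixed-length vector. A hedged sketch (the attention form and parameterization are assumptions for illustration, not the paper's exact lens):

```python
import math

def lens_pool(matrix, lens_query):
    """Collapse a variable-length sentence matrix (one vector per token)
    into a fixed-length vector: score each token against a small 'lens'
    query, softmax the scores, and take the weighted sum of tokens."""
    scores = [sum(t * q for t, q in zip(tok, lens_query)) for tok in matrix]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(matrix[0])
    return [sum(w * tok[d] for w, tok in zip(weights, matrix))
            for d in range(dim)]
```

Under this reading, swapping lenses re-focuses the same frozen core matrix representation onto different notions of similarity, which is why so few parameters suffice.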

Hydra: Preserving Ensemble Diversity for Model Distillation

Title Hydra: Preserving Ensemble Diversity for Model Distillation
Authors Linh Tran, Bastiaan S. Veeling, Kevin Roth, Jakub Swiatkowski, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Sebastian Nowozin, Rodolphe Jenatton
Abstract Ensembles of models have been empirically shown to improve predictive performance and to yield robust measures of uncertainty. However, they are expensive in computation and memory. Therefore, recent research has focused on distilling ensembles into a single compact model, reducing the computational and memory burden of the ensemble while trying to preserve its predictive behavior. Most existing distillation formulations summarize the ensemble by capturing its average predictions. As a result, the diversity of the ensemble predictions, stemming from each individual member, is lost. Thus, the distilled model cannot provide a measure of uncertainty comparable to that of the original ensemble. To retain more faithfully the diversity of the ensemble, we propose a distillation method based on a single multi-headed neural network, which we refer to as Hydra. The shared body network learns a joint feature representation that enables each head to capture the predictive behavior of each ensemble member. We demonstrate that with a slight increase in parameter count, Hydra improves distillation performance on classification and regression settings while capturing the uncertainty behaviour of the original ensemble over both in-domain and out-of-distribution tasks.
Tasks
Published 2020-01-14
URL https://arxiv.org/abs/2001.04694v1
PDF https://arxiv.org/pdf/2001.04694v1.pdf
PWC https://paperswithcode.com/paper/hydra-preserving-ensemble-diversity-for-model-1
Repo
Framework

Brain tumor segmentation with missing modalities via latent multi-source correlation representation

Title Brain tumor segmentation with missing modalities via latent multi-source correlation representation
Authors Tongxue Zhou, Stephane Canu, Pierre Vera, Su Ruan
Abstract Multimodal MR images can provide complementary information for accurate brain tumor segmentation. However, missing imaging modalities are common in clinical practice. Since a strong correlation exists between modalities, we propose a novel correlation representation block to discover the latent multi-source correlation. Thanks to the obtained correlation representation, segmentation becomes more robust when a modality is missing. The model parameter estimation module first maps the individual representation produced by each encoder to a set of independent parameters; then, under these parameters, the correlation expression module transforms all the individual representations into a latent multi-source correlation representation. Finally, the correlation representations across modalities are fused via an attention mechanism into a shared representation that emphasizes the most important features for segmentation. We evaluate our model on the BraTS 2018 dataset, where it outperforms the current state-of-the-art method and produces robust results when one or more modalities are missing.
Tasks Brain Tumor Segmentation
Published 2020-03-19
URL https://arxiv.org/abs/2003.08870v1
PDF https://arxiv.org/pdf/2003.08870v1.pdf
PWC https://paperswithcode.com/paper/brain-tumor-segmentation-with-missing
Repo
Framework