January 25, 2020


Paper Group ANR 1685


No-Trick (Treat) Kernel Adaptive Filtering using Deterministic Features


Title	No-Trick (Treat) Kernel Adaptive Filtering using Deterministic Features
Authors	Kan Li, Jose C. Principe
Abstract	Kernel methods form a powerful, versatile, and theoretically-grounded unifying framework to solve nonlinear problems in signal processing and machine learning. The standard approach relies on the kernel trick to perform pairwise evaluations of a kernel function, which leads to scalability issues for large datasets due to its linear and superlinear growth with respect to the training data. A popular approach to tackle this problem, known as random Fourier features (RFFs), samples from a distribution to obtain the data-independent basis of a higher finite-dimensional feature space, whose dot product approximates the kernel function. Recently, deterministic, rather than random, construction has been shown to outperform RFFs by approximating the kernel in the frequency domain using Gaussian quadrature. In this paper, we view the dot product of these explicit mappings not as an approximation, but as an equivalent positive-definite kernel that induces a new finite-dimensional reproducing kernel Hilbert space (RKHS). This opens the door to no-trick (NT) online kernel adaptive filtering (KAF) that is scalable and robust. Random features are prone to large variances in performance, especially for smaller dimensions. Here, we focus on deterministic feature-map construction based on polynomial-exact solutions and show their superiority over random constructions. Without loss of generality, we apply this approach to classical adaptive filtering algorithms and validate the methodology to show that deterministic features are faster to generate and outperform state-of-the-art kernel methods based on random Fourier features.
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04530v1
PDF	https://arxiv.org/pdf/1912.04530v1.pdf
PWC	https://paperswithcode.com/paper/no-trick-treat-kernel-adaptive-filtering
Repo
Framework
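
The abstract above contrasts its deterministic features with random Fourier features (RFFs). As a rough illustration of the RFF baseline only (not the authors' quadrature-based construction), the sketch below builds an explicit random feature map whose dot product approximates a Gaussian kernel. The function names, the bandwidth `sigma`, and the test points are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff_map(dim, num_features, sigma=1.0, seed=0):
    """Return an explicit feature map phi with phi(x) . phi(y) ~ k(x, y).

    Frequencies are drawn from N(0, 1/sigma^2) per coordinate (the spectral
    density of the Gaussian kernel) and phases uniformly from [0, 2*pi).
    """
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return phi


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical test points; any pair of 2-D vectors would do.
phi = make_rff_map(dim=2, num_features=2000, sigma=1.0, seed=0)
x, y = [0.3, -0.5], [0.1, 0.4]
exact = gaussian_kernel(x, y)
approx = dot(phi(x), phi(y))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because the frequencies are sampled, the approximation error varies with the seed and shrinks only as the number of features grows, which is the variance issue the abstract attributes to random constructions.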

Inference on weighted average value function in high-dimensional state space


Title	Inference on weighted average value function in high-dimensional state space
Authors	Victor Chernozhukov, Whitney Newey, Vira Semenova
Abstract	This paper gives a consistent, asymptotically normal estimator of the expected value function when the state space is high-dimensional and the first-stage nuisance functions are estimated by modern machine learning tools. First, we show that the value function is orthogonal to the conditional choice probability; therefore, this nuisance function needs to be estimated only at the $n^{-1/4}$ rate. Second, we give a correction term for the transition density of the state variable. The resulting orthogonal moment is robust to misspecification of the transition density and does not require this nuisance function to be consistently estimated. Third, we generalize this result by considering the weighted expected value. In this case, the orthogonal moment is doubly robust in the transition density and additional second-stage nuisance functions entering the correction term. We complete the asymptotic theory by providing bounds on second-order asymptotic terms.
Tasks
Published	2019-08-24
URL	https://arxiv.org/abs/1908.09173v1
PDF	https://arxiv.org/pdf/1908.09173v1.pdf
PWC	https://paperswithcode.com/paper/inference-on-weighted-average-value-function
Repo
Framework

Multi-Interest Network with Dynamic Routing for Recommendation at Tmall


Title	Multi-Interest Network with Dynamic Routing for Recommendation at Tmall
Authors	Chao Li, Zhiyuan Liu, Mengmeng Wu, Yuchi Xu, Pipei Huang, Huan Zhao, Guoliang Kang, Qiwei Chen, Wei Li, Dik Lun Lee
Abstract	Industrial recommender systems usually consist of a matching stage and a ranking stage, in order to handle billion-scale users and items. The matching stage retrieves candidate items relevant to user interests, while the ranking stage sorts candidate items by user interests. Thus, the most critical ability is to model and represent user interests for either stage. Most of the existing deep learning-based models represent one user as a single vector, which is insufficient to capture the varying nature of a user’s interests. In this paper, we approach this problem from a different view, representing one user with multiple vectors that encode the different aspects of the user’s interests. We propose the Multi-Interest Network with Dynamic routing (MIND) for dealing with users’ diverse interests in the matching stage. Specifically, we design a multi-interest extractor layer based on the capsule routing mechanism, which is applicable for clustering historical behaviors and extracting diverse interests. Furthermore, we develop a technique named label-aware attention to help learn a user representation with multiple vectors. Through extensive experiments on several public benchmarks and one large-scale industrial dataset from Tmall, we demonstrate that MIND can achieve performance superior to that of state-of-the-art methods for recommendation. Currently, MIND has been deployed for handling major online traffic on the homepage of the Mobile Tmall App.
Tasks	Recommendation Systems
Published	2019-04-17
URL	http://arxiv.org/abs/1904.08030v1
PDF	http://arxiv.org/pdf/1904.08030v1.pdf
PWC	https://paperswithcode.com/paper/multi-interest-network-with-dynamic-routing
Repo
Framework

Boosting Supervision with Self-Supervision for Few-shot Learning


Title	Boosting Supervision with Self-Supervision for Few-shot Learning
Authors	Jong-Chyi Su, Subhransu Maji, Bharath Hariharan
Abstract	We present a technique to improve the transferability of deep representations learned on small labeled datasets by introducing self-supervised tasks as auxiliary loss functions. While recent approaches for self-supervised learning have shown the benefits of training on large unlabeled datasets, we find improvements in generalization even on small datasets and when combined with strong supervision. Learning representations with self-supervised losses reduces the relative error rate of a state-of-the-art meta-learner by 5-25% on several few-shot learning benchmarks, as well as off-the-shelf deep networks on standard classification tasks when training from scratch. We find the benefits of self-supervision increase with the difficulty of the task. Our approach utilizes the images within the dataset to construct self-supervised losses and hence is an effective way of learning transferable representations without relying on any external training data.
Tasks	Few-Shot Learning
Published	2019-06-17
URL	https://arxiv.org/abs/1906.07079v1
PDF	https://arxiv.org/pdf/1906.07079v1.pdf
PWC	https://paperswithcode.com/paper/boosting-supervision-with-self-supervision
Repo
Framework

CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization


Title	CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization
Authors	Jeroen Zegers, Hugo Van hamme
Abstract	In recent years there have been many deep learning approaches towards the multi-speaker source separation problem. Most use Long Short-Term Memory - Recurrent Neural Networks (LSTM-RNN) or Convolutional Neural Networks (CNN) to model the sequential behavior of speech. In this paper we propose a novel network for source separation using an encoder-decoder CNN and LSTM in parallel. Hyper parameters have to be chosen for both parts of the network and they are potentially mutually dependent. Since hyper parameter grid search has a high computational burden, random search is often preferred. However, when sampling a new point in the hyper parameter space, it can potentially be very close to a previously evaluated point and thus give little additional information. Furthermore, random sampling is as likely to sample in a promising area as in an area of the hyper parameter space dominated by poorly performing models. Therefore, we use a Bayesian hyper parameter optimization technique and find that the parallel CNN-LSTM outperforms the LSTM-only and CNN-only models.
Tasks	Multi-Speaker Source Separation
Published	2019-12-19
URL	https://arxiv.org/abs/1912.09254v1
PDF	https://arxiv.org/pdf/1912.09254v1.pdf
PWC	https://paperswithcode.com/paper/cnn-lstm-models-for-multi-speaker-source
Repo
Framework

A general-purpose deep learning approach to model time-varying audio effects


Title	A general-purpose deep learning approach to model time-varying audio effects
Authors	Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss
Abstract	Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation-based audio effects. Most existing methods for modeling these types of effect units are optimized for a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning architecture for generic black-box modeling of audio processors with long-term memory. We explore the capabilities of deep neural networks to learn such long temporal dependencies and we show the network modeling various linear and nonlinear, time-varying and time-invariant audio effects. In order to measure the performance of the model, we propose an objective metric based on the psychoacoustics of modulation frequency perception. We also analyze what the model is actually learning and how the given task is accomplished.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06148v2
PDF	https://arxiv.org/pdf/1905.06148v2.pdf
PWC	https://paperswithcode.com/paper/a-general-purpose-deep-learning-approach-to
Repo
Framework

Generalising Deep Learning MRI Reconstruction across Different Domains


Title	Generalising Deep Learning MRI Reconstruction across Different Domains
Authors	Cheng Ouyang, Jo Schlemper, Carlo Biffi, Gavin Seegoolam, Jose Caballero, Anthony N. Price, Joseph V. Hajnal, Daniel Rueckert
Abstract	We look into the robustness of deep learning based MRI reconstruction when tested on unseen contrasts and organs. We then propose to generalise the network by training with large publicly-available natural image datasets with synthesised phase information to achieve high cross-domain reconstruction performance that is competitive with domain-specific training. To explain its generalisation mechanism, we have also analysed patch sets for different training datasets.
Tasks
Published	2019-01-31
URL	http://arxiv.org/abs/1902.10815v1
PDF	http://arxiv.org/pdf/1902.10815v1.pdf
PWC	https://paperswithcode.com/paper/generalising-deep-learning-mri-reconstruction
Repo
Framework

Human Pose Estimation with Spatial Contextual Information


Title	Human Pose Estimation with Spatial Contextual Information
Authors	Hong Zhang, Hao Ouyang, Shu Liu, Xiaojuan Qi, Xiaoyong Shen, Ruigang Yang, Jiaya Jia
Abstract	We explore the importance of spatial contextual information in human pose estimation. Most state-of-the-art pose networks are trained in a multi-stage manner and produce several auxiliary predictions for deep supervision. With this principle, we present two conceptually simple yet computationally efficient modules, namely Cascade Prediction Fusion (CPF) and Pose Graph Neural Network (PGNN), to exploit underlying contextual information. Cascade prediction fusion accumulates prediction maps from previous stages to extract informative signals. The resulting maps also function as a prior to guide prediction at following stages. To promote spatial correlation among joints, our PGNN learns a structured representation of human pose as a graph. Direct message passing between different joints is enabled and spatial relation is captured. These two modules add very little computational cost. Experimental results demonstrate that our method consistently outperforms previous methods on the MPII and LSP benchmarks.
Tasks	Pose Estimation
Published	2019-01-07
URL	http://arxiv.org/abs/1901.01760v1
PDF	http://arxiv.org/pdf/1901.01760v1.pdf
PWC	https://paperswithcode.com/paper/human-pose-estimation-with-spatial-contextual
Repo
Framework

Multi-modal Probabilistic Prediction of Interactive Behavior via an Interpretable Model


Title	Multi-modal Probabilistic Prediction of Interactive Behavior via an Interpretable Model
Authors	Yeping Hu, Wei Zhan, Liting Sun, Masayoshi Tomizuka
Abstract	For autonomous agents to successfully operate in the real world, the ability to anticipate future motions of surrounding entities in the scene can greatly enhance their safety levels since potentially dangerous situations could be avoided in advance. While impressive results have been shown on predicting each agent’s behavior independently, we argue that it is not valid to consider road entities individually since transitions of vehicle states are highly coupled. Moreover, as the prediction horizon becomes longer, modeling prediction uncertainties and multi-modal distributions over future sequences becomes a more challenging task. In this paper, we address this challenge by presenting a multi-modal probabilistic prediction approach. The proposed method is based on a generative model and is capable of jointly predicting sequential motions of each pair of interacting agents. Most importantly, our model is interpretable: it can explain the underlying logic and is therefore more reliable to use in real applications. A complicated real-world roundabout scenario is utilized to implement and examine the proposed method.
Tasks
Published	2019-03-22
URL	https://arxiv.org/abs/1903.09381v2
PDF	https://arxiv.org/pdf/1903.09381v2.pdf
PWC	https://paperswithcode.com/paper/multi-modal-probabilistic-prediction-of
Repo
Framework

Anaphora Resolution in Dialogue Systems for South Asian Languages


Title	Anaphora Resolution in Dialogue Systems for South Asian Languages
Authors	Vinay Annam, Nikhil Koditala, Radhika Mamidi
Abstract	Anaphora resolution is a challenging task which has been the interest of NLP researchers for a long time. Traditional resolution techniques like eliminative constraints and weighted preferences were successful in many languages. However, they are ineffective in free word order languages like most South Asian languages. Heuristic and rule-based techniques were typical in these languages, which are constrained to context and domain. In this paper, we venture a new strategy using neural networks for resolving anaphora in human-human dialogues. The architecture chiefly consists of three components: a shallow parser for extracting features, a feature vector generator which produces the word embeddings, and a neural network model which predicts the antecedent mention of an anaphora. The system has been trained and tested on a Telugu conversation corpus we generated. Given the advantage of the semantic information in word embeddings and appending actor, gender, number, person and part of plural features, the model has reached an F1-score of 86.
Tasks	Word Embeddings
Published	2019-11-22
URL	https://arxiv.org/abs/1911.09994v1
PDF	https://arxiv.org/pdf/1911.09994v1.pdf
PWC	https://paperswithcode.com/paper/anaphora-resolution-in-dialogue-systems-for
Repo
Framework

Empirical Autopsy of Deep Video Captioning Frameworks


Title	Empirical Autopsy of Deep Video Captioning Frameworks
Authors	Nayyer Aafaq, Naveed Akhtar, Wei Liu, Ajmal Mian
Abstract	Contemporary deep learning based video captioning follows the encoder-decoder framework. In the encoder, visual features are extracted with 2D/3D Convolutional Neural Networks (CNNs) and a transformed version of those features is passed to the decoder. The decoder uses word embeddings and a language model to map visual features to natural language captions. Due to its composite nature, the encoder-decoder pipeline provides the freedom of multiple choices for each of its components, e.g., the choice of CNN models, feature transformations, word embeddings, and language models. Component selection can have drastic effects on the overall video captioning performance. However, the current literature is void of any systematic investigation in this regard. This article fills this gap by providing the first thorough empirical analysis of the role that each major component plays in a contemporary video captioning pipeline. We perform extensive experiments by varying the constituent components of the video captioning framework, and quantify the performance gains that are possible by mere component selection. We use the popular MSVD dataset as the test-bed, and demonstrate that substantial performance gains are possible by careful selection of the constituent components without major changes to the pipeline itself. These results are expected to provide guiding principles for future research in the fast growing direction of video captioning.
Tasks	Language Modelling, Video Captioning, Word Embeddings
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09345v1
PDF	https://arxiv.org/pdf/1911.09345v1.pdf
PWC	https://paperswithcode.com/paper/empirical-autopsy-of-deep-video-captioning
Repo
Framework

Bootstrapping NLU Models with Multi-task Learning


Title	Bootstrapping NLU Models with Multi-task Learning
Authors	Shubham Kapoor, Caglar Tirkaz
Abstract	Bootstrapping natural language understanding (NLU) systems with minimal training data is a fundamental challenge of extending digital assistants like Alexa and Siri to a new language. A common approach adopted in digital assistants when responding to a user query is to process the input in a pipeline manner, where the first task is to predict the domain, followed by the inference of intent and slots. However, this cascaded approach causes error propagation and prevents information sharing among these tasks. Further, the use of words as the atomic units of meaning, as done in many studies, might lead to coverage problems for morphologically rich languages such as German and French when data is limited. We address these issues by introducing a character-level unified neural architecture for joint modeling of the domain, intent, and slot classification. We compose word embeddings from characters and jointly optimize all classification tasks via multi-task learning. In our results, we show that the proposed architecture is an optimal choice for bootstrapping NLU systems in low-resource settings, thus saving time, cost and human effort.
Tasks	Multi-Task Learning, Word Embeddings
Published	2019-11-15
URL	https://arxiv.org/abs/1911.06673v1
PDF	https://arxiv.org/pdf/1911.06673v1.pdf
PWC	https://paperswithcode.com/paper/bootstrapping-nlu-models-with-multi-task
Repo
Framework

Radar-based Feature Design and Multiclass Classification for Road User Recognition


Title	Radar-based Feature Design and Multiclass Classification for Road User Recognition
Authors	Nicolas Scheiner, Nils Appenrodt, Jürgen Dickmann, Bernhard Sick
Abstract	The classification of individual traffic participants is a complex task, especially for challenging scenarios with multiple road users or under bad weather conditions. Radar sensors provide a way of measuring such scenes that is orthogonal to well-established camera systems. In order to gain accurate classification results, 50 different features are extracted from the measurement data and tested on their performance. From these features a suitable subset is chosen and passed to random forest and long short-term memory (LSTM) classifiers to obtain class predictions for the radar input. Moreover, it is shown why data imbalance is an inherent problem in automotive radar classification when the dataset is not sufficiently large. To overcome this issue, classifier binarization is used among other techniques in order to better account for underrepresented classes. A new method to couple the resulting probabilities is proposed and compared to others with great success. Final results show substantial improvements when compared to ordinary multiclass classification.
Tasks
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11256v1
PDF	https://arxiv.org/pdf/1905.11256v1.pdf
PWC	https://paperswithcode.com/paper/radar-based-feature-design-and-multiclass
Repo
Framework

Practical and Consistent Estimation of f-Divergences


Title	Practical and Consistent Estimation of f-Divergences
Authors	Paul K. Rubenstein, Olivier Bousquet, Josip Djolonga, Carlos Riquelme, Ilya Tolstikhin
Abstract	The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We consider the case of stronger structural assumptions that are commonly satisfied in modern machine learning, including representation learning and generative modelling with autoencoder architectures. Under these assumptions we propose and study an estimator that can be easily implemented, works well in high dimensions, and enjoys faster rates of convergence. We verify the behavior of our estimator empirically in both synthetic and real-data experiments, and discuss its direct implications for total correlation, entropy, and mutual information estimation.
Tasks	Representation Learning
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11112v2
PDF	https://arxiv.org/pdf/1905.11112v2.pdf
PWC	https://paperswithcode.com/paper/practical-and-consistent-estimation-of-f
Repo
Framework
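
For reference, the quantity the abstract above sets out to estimate can be written down explicitly. The block below gives the standard textbook definition of an f-divergence between distributions $P$ and $Q$ with densities $p$ and $q$, together with two familiar special cases; it is background notation, not the estimator proposed in the paper.

```latex
% f-divergence for a convex function f with f(1) = 0
D_f(P \,\|\, Q) = \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx

% Kullback-Leibler divergence: f(t) = t \log t
\mathrm{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx

% Total variation distance: f(t) = \tfrac{1}{2}\,|t - 1|
\mathrm{TV}(P, Q) = \frac{1}{2} \int |p(x) - q(x)|\, dx
```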

Automated Multiscale 3D Feature Learning for Vessels Segmentation in Thorax CT Images


Title	Automated Multiscale 3D Feature Learning for Vessels Segmentation in Thorax CT Images
Authors	Tomasz Konopczyński, Thorben Kröger, Lei Zheng, Christoph S. Garbe, Jürgen Hesser
Abstract	We address the vessel segmentation problem by building upon the multiscale feature learning method of Kiros et al., which achieves the current top score in the VESSEL12 MICCAI challenge. Following their idea of feature learning instead of hand-crafted filters, we have extended the method to learn 3D features. The features are learned in an unsupervised manner in a multi-scale scheme using dictionary learning via least angle regression. The 3D feature kernels are then convolved with the input volumes in order to create feature maps. Those maps are used to train a supervised classifier with the annotated voxels. In order to process the 3D data with a large number of filters, a parallel implementation has been developed. The algorithm has been applied to the example scans and annotations provided by the VESSEL12 challenge. We have compared our setup with Kiros et al. by running their implementation. Our current results show an improvement in accuracy over the slice-wise method from 96.66$\pm$1.10% to 97.24$\pm$0.90%.
Tasks	Dictionary Learning
Published	2019-01-06
URL	http://arxiv.org/abs/1901.01562v1
PDF	http://arxiv.org/pdf/1901.01562v1.pdf
PWC	https://paperswithcode.com/paper/automated-multiscale-3d-feature-learning-for
Repo
Framework