January 29, 2020

3282 words 16 mins read

Paper Group ANR 697

Addressing Ambiguity of Emotion Labels Through Meta-Learning

Title Addressing Ambiguity of Emotion Labels Through Meta-Learning
Authors Takuya Fujioka, Dario Bertero, Takeshi Homma, Kenji Nagamatsu
Abstract Emotion labels in emotion recognition corpora are highly noisy and ambiguous, due to the annotators’ subjective perception of emotions. Such ambiguity may introduce errors in automatic classification and affect the overall performance. We therefore propose a dynamic label correction and sample contribution weight estimation model. Our model is based on a standard BLSTM model with attention, extended with two sets of extra parameters. The first learns a new corrected label distribution and aims to fix the inaccurate labels from the dataset. The second estimates the contribution of each sample to the training process, and aims to down-weight the ambiguous and noisy samples while giving higher weight to the clear ones. We train our model through an alternating optimization method: in one epoch we update the neural network parameters, and in the next we keep them fixed and update the label correction and sample importance parameters. When training and evaluating our model on the IEMOCAP dataset, we obtained a weighted accuracy (WA) of 65.9% and an unweighted accuracy (UA) of 61.4%, absolute improvements of 2.5% and 2.7%, respectively, over a BLSTM-with-attention baseline trained on the corpus gold labels.
Tasks Emotion Recognition, Meta-Learning
Published 2019-11-06
URL https://arxiv.org/abs/1911.02216v2
PDF https://arxiv.org/pdf/1911.02216v2.pdf
PWC https://paperswithcode.com/paper/addressing-ambiguity-of-emotion-labels
Repo
Framework
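
To make the alternating optimization concrete, here is a minimal PyTorch sketch; it is an illustration, not the authors' code. The BLSTM-with-attention encoder is abstracted behind `model`, and `LabelCorrector` holds the two extra parameter sets (a corrected label distribution and a contribution weight per training sample); initialization, optimizers, and learning rates are assumptions.

```python
import torch
import torch.nn.functional as F

class LabelCorrector(torch.nn.Module):
    """Two extra parameter sets: a corrected label distribution and a
    contribution weight for every training sample."""
    def __init__(self, noisy_labels, n_classes):
        super().__init__()
        # start the corrected labels at the (one-hot) corpus gold labels
        init = torch.log(F.one_hot(noisy_labels, n_classes).float() + 1e-3)
        self.label_logits = torch.nn.Parameter(init)
        self.weight_logits = torch.nn.Parameter(torch.zeros(len(noisy_labels)))

    def forward(self, idx):
        q = F.softmax(self.label_logits[idx], dim=-1)  # corrected label distribution
        w = torch.sigmoid(self.weight_logits[idx])     # per-sample contribution weight
        return q, w

def train(model, corrector, loader, n_epochs):
    opt_net = torch.optim.Adam(model.parameters(), lr=1e-3)
    opt_meta = torch.optim.Adam(corrector.parameters(), lr=1e-2)
    for epoch in range(n_epochs):
        meta_turn = epoch % 2 == 1           # alternate which parameter set moves
        for x, idx in loader:                # idx: each sample's position in the corpus
            q, w = corrector(idx)
            logp = F.log_softmax(model(x), dim=-1)
            loss = (w * -(q * logp).sum(-1)).mean()   # weighted soft cross-entropy
            opt_net.zero_grad(); opt_meta.zero_grad()
            loss.backward()
            (opt_meta if meta_turn else opt_net).step()
```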

Efficient Approximate Inference with Walsh-Hadamard Variational Inference

Title Efficient Approximate Inference with Walsh-Hadamard Variational Inference
Authors Simone Rossi, Sebastien Marmin, Maurizio Filippone
Abstract Variational inference offers scalable and flexible tools to tackle intractable Bayesian inference of modern statistical models like Bayesian neural networks and Gaussian processes. For largely over-parameterized models, however, the over-regularization property of the variational objective makes the application of variational inference challenging. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference, which uses Walsh-Hadamard-based factorization strategies to reduce model parameterization, accelerate computations, and increase the expressiveness of the approximate posterior beyond fully factorized ones.
Tasks Bayesian Inference, Gaussian Processes
Published 2019-11-29
URL https://arxiv.org/abs/1912.00015v1
PDF https://arxiv.org/pdf/1912.00015v1.pdf
PWC https://paperswithcode.com/paper/efficient-approximate-inference-with-walsh
Repo
Framework
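
As an illustration of why Walsh-Hadamard factorizations help, the sketch below implements the O(d log d) fast Walsh-Hadamard transform and a structured weight-times-vector product of the form W = diag(s1) H diag(g) H diag(s2) with a Gaussian variational g. Treat the exact parameterization as a paraphrase of the paper's recipe rather than its implementation.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform of a length-2^k vector,
    computed in O(d log d) without materializing the d x d matrix H."""
    x = np.asarray(x, float).copy()
    d, h = x.size, 1
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)

def whvi_sample_matvec(v, s1, s2, g_mu, g_sig, rng):
    """One reparameterized sample of W v for W = diag(s1) H diag(g) H diag(s2)
    with q(g) Gaussian: the weight matrix is never formed, so the product
    costs O(d log d) time and O(d) variational parameters."""
    g = g_mu + g_sig * rng.standard_normal(g_mu.size)
    return s1 * fwht(g * fwht(s2 * v))
```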

Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits

Title Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits
Authors Lingda Wang, Huozhi Zhou, Bingcong Li, Lav R. Varshney, Zhizhen Zhao
Abstract Cascading bandit (CB) is a popular model for web search and online advertising, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during its interaction with a user. However, the stationary CB model may be too simple for real-world problems, where user preferences may change over time. Considering piecewise-stationary environments, two efficient algorithms, \texttt{GLRT-CascadeUCB} and \texttt{GLRT-CascadeKL-UCB}, are developed and shown to ensure regret upper bounds on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments and $T$ is the number of time slots. At the crux of the proposed algorithms is an almost parameter-free change-point detector, the generalized likelihood ratio test (GLRT). Compared with existing works, the GLRT-based algorithms: i) are free of change-point-dependent information for choosing parameters; ii) have fewer tuning parameters; and iii) improve at least the $L$ dependence in regret upper bounds. In addition, we show that the proposed algorithms are optimal (up to a logarithmic factor) in terms of regret by deriving a minimax lower bound on the order of $\Omega(\sqrt{NLT})$ for piecewise-stationary CB. The efficiency of the proposed algorithms relative to state-of-the-art approaches is validated through numerical experiments on both synthetic and real-world datasets.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.05886v5
PDF https://arxiv.org/pdf/1909.05886v5.pdf
PWC https://paperswithcode.com/paper/be-aware-of-non-stationarity-nearly-optimal
Repo
Framework
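
The GLRT change-point detector at the core of both algorithms can be sketched for Bernoulli click feedback as follows: scan every split point of the observed reward stream and compare the best split's likelihood-ratio statistic against a threshold. The threshold (tied to the horizon in the paper's analysis) is left as a free parameter here.

```python
import numpy as np

def bern_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def glrt_change_detected(rewards, threshold):
    """Generalized likelihood ratio test for a mean change in a Bernoulli
    stream: GLR(n) = max_s [ s*KL(mu_1, mu) + (n-s)*KL(mu_2, mu) ]."""
    x = np.asarray(rewards, float)
    n, mu = len(x), np.mean(rewards)
    best = 0.0
    for s in range(1, n):
        mu1, mu2 = x[:s].mean(), x[s:].mean()
        best = max(best, s * bern_kl(mu1, mu) + (n - s) * bern_kl(mu2, mu))
    return best > threshold
```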

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Title Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction
Authors Osama Makansi, Eddy Ilg, Özgün Çiçek, Thomas Brox
Abstract Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that predicts several samples of the future with a winner-takes-all loss and iteratively groups the samples into multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real-world scenario where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach produces good estimates of multimodal distributions and avoids mode collapse.
Tasks Future prediction
Published 2019-06-09
URL https://arxiv.org/abs/1906.03631v1
PDF https://arxiv.org/pdf/1906.03631v1.pdf
PWC https://paperswithcode.com/paper/overcoming-limitations-of-mixture-density-1
Repo
Framework
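
The winner-takes-all loss driving the sampling stage can be sketched as below; the subsequent fitting stage (grouping samples into mixture modes) is omitted, and the tensor shapes are illustrative assumptions.

```python
import torch

def wta_loss(hypotheses, target, top_k=1):
    """Winner-takes-all loss over K predicted hypotheses: only the hypothesis
    (or top-k hypotheses) closest to the ground truth receives gradient,
    which lets different heads specialize on different future modes.
    hypotheses: (batch, K, dim), target: (batch, dim)."""
    dists = (hypotheses - target.unsqueeze(1)).pow(2).sum(-1)  # (batch, K)
    best, _ = dists.topk(top_k, dim=1, largest=False)
    return best.mean()
```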

Conditional Mutual Information Neural Estimator

Title Conditional Mutual Information Neural Estimator
Authors Sina Molavipour, Germán Bassi, Mikael Skoglund
Abstract Several recent works in communication systems have proposed to leverage the power of neural networks in the design of encoders and decoders. In this approach, these blocks can be tailored to maximize the transmission rate based on aggregated samples from the channel. Motivated by the fact that, in many communication schemes, the achievable transmission rate is determined by a conditional mutual information term, this paper focuses on neural-based estimators for this information-theoretic quantity. Our results are based on variational bounds for the KL-divergence and, in contrast to some previous works, we provide a mathematically rigorous lower bound. However, additional challenges relative to the unconditional mutual information emerge due to the presence of a conditional density function, which we address here.
Tasks
Published 2019-11-06
URL https://arxiv.org/abs/1911.02277v2
PDF https://arxiv.org/pdf/1911.02277v2.pdf
PWC https://paperswithcode.com/paper/conditional-mutual-information-neural
Repo
Framework
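
A sketch of the kind of variational bound involved: the Donsker-Varadhan lower bound on a KL divergence, evaluated from a critic network's outputs. Treat it as illustrative rather than the paper's exact bound; for conditional MI, producing the product-of-conditionals samples (`t_product`) is the extra difficulty the abstract alludes to.

```python
import math
import torch

def dv_bound(t_joint, t_product):
    """Donsker-Varadhan lower bound KL(P||Q) >= E_P[T] - log E_Q[exp T].
    For I(X;Y|Z), P is the joint p(x,y,z) and Q the product
    p(x|z) p(y|z) p(z); t_* are 1-D tensors of critic outputs on samples."""
    log_mean_exp = torch.logsumexp(t_product, dim=0) - math.log(t_product.numel())
    return t_joint.mean() - log_mean_exp
```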

A tensorized logic programming language for large-scale data

Title A tensorized logic programming language for large-scale data
Authors Ryosuke Kojima, Taisuke Sato
Abstract We introduce a new logic programming language, T-PRISM, based on tensor embeddings. Our embedding scheme is a modification of the distribution semantics in PRISM, one of the state-of-the-art probabilistic logic programming languages, replacing distribution functions with multidimensional arrays, i.e., tensors. T-PRISM consists of two parts: a logic programming part and a numerical computation part. The former provides flexible and interpretable modeling at the level of first-order logic, and the latter provides scalable computation utilizing parallelization and hardware acceleration with GPUs. Combining these two parts provides a remarkably wide range of high-level declarative modeling, from symbolic reasoning to deep learning. To embody this programming language, we also introduce a new semantics, termed tensorized semantics, which combines the traditional least model semantics in logic programming with tensor embeddings. In T-PRISM, we first derive a set of equations related to tensors from a given program using logical inference, i.e., Prolog execution in a symbolic space, and then solve the derived equations in a continuous space with TensorFlow. Using our preliminary implementation of T-PRISM, we have successfully handled a wide range of models, including declarative modeling of real large-scale data. As an example, this paper presents a DistMult model for knowledge graphs using the FB15k and WN18 datasets.
Tasks Knowledge Graphs
Published 2019-01-20
URL http://arxiv.org/abs/1901.08548v1
PDF http://arxiv.org/pdf/1901.08548v1.pdf
PWC https://paperswithcode.com/paper/a-tensorized-logic-programming-language-for
Repo
Framework
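
For reference, the DistMult scoring function used in the experiments is simple enough to sketch directly; embeddings are assumed here to be learned NumPy vectors. In T-PRISM, scores of this tensorized form are what logical atoms compile into before the numerical layer evaluates them.

```python
import numpy as np

def distmult_score(head, rel, tail):
    """DistMult triple score <h, r, t> = sum_i h_i * r_i * t_i."""
    return float(np.sum(head * rel * tail))

def rank_tails(head, rel, all_tails):
    """Score one (head, relation) pair against every candidate tail entity.
    all_tails: (n_entities, dim) embedding matrix; returns (n_entities,)."""
    return all_tails @ (head * rel)
```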

ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data

Title ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data
Authors Anastasiia Kabeshova, Yiyang Yu, Bertrand Lukacs, Emmanuel Bacry, Stéphane Gaïffas
Abstract This paper considers the problems of modeling and predicting a long-term and “blurry” relapse that occurs after a medical act, such as a surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot explain easily, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a “blurry” fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events observed in a claims-only database. ZiMM ED is applied to a “non-clinical” claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than one where bedside clinical signals are available. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with histories between 1.5 and 5 years long. We consider a long-term (18-month) relapse (urination problems still occurring despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05346v2
PDF https://arxiv.org/pdf/1911.05346v2.pdf
PWC https://paperswithcode.com/paper/zimm-a-deep-learning-model-for-long-term
Repo
Framework
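
A hedged sketch of a zero-inflated mixture of multinomials log-likelihood, under the simplifying assumptions that the total count is observed and the zero-inflation places all mass on the all-zero count vector; the paper's exact ZiMM parameterization may differ.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def zimm_loglik(y, pi0, log_w, log_p):
    """log P(y) under a simple zero-inflated multinomial mixture:
    P(y) = pi0 * [y == 0] + (1 - pi0) * sum_k w_k Mult(y | n, p_k).
    y: (d,) integer drug counts, log_w: (K,) log mixture weights,
    log_p: (K, d) log category probabilities (rows normalized)."""
    n = y.sum()
    log_mult = gammaln(n + 1) - gammaln(y + 1).sum() + (y * log_p).sum(axis=1)
    log_mix = np.log1p(-pi0) + logsumexp(log_w + log_mult)
    if n == 0:
        # all-zero vector: mass from inflation plus the mixture
        return np.logaddexp(np.log(pi0), log_mix)
    return log_mix
```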

Distributed Learning in Non-Convex Environments – Part II: Polynomial Escape from Saddle-Points

Title Distributed Learning in Non-Convex Environments – Part II: Polynomial Escape from Saddle-Points
Authors Stefan Vlaski, Ali H. Sayed
Abstract The diffusion strategy for distributed learning from streaming data employs local stochastic gradient updates along with exchange of iterates over neighborhoods. In Part I [2] of this work we established that agents cluster around a network centroid and proceeded to study the dynamics of this point. We established expected descent in non-convex environments in the large-gradient regime and introduced a short-term model to examine the dynamics over finite-time horizons. Using this model, we establish in this work that the diffusion strategy is able to escape from strict saddle-points in $O(1/\mu)$ iterations; it is also able to return approximately second-order stationary points in a polynomial number of iterations. Relative to prior works on the polynomial escape from saddle-points, most of which focus on centralized perturbed or stochastic gradient descent, our approach requires less restrictive conditions on the gradient noise process.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.01849v1
PDF https://arxiv.org/pdf/1907.01849v1.pdf
PWC https://paperswithcode.com/paper/distributed-learning-in-non-convex
Repo
Framework
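
The diffusion strategy's adapt-then-combine update that the analysis studies can be sketched in a few lines; the combination-matrix convention (column `k` holds the weights agent `k` puts on its neighbors) is an assumption of this sketch.

```python
import numpy as np

def diffusion_step(w, grads, A, mu):
    """One adapt-then-combine diffusion update over N agents:
      adapt:   psi_k = w_k - mu * grad_k(w_k)        (local stochastic step)
      combine: w_k   = sum_l A[l, k] * psi_l         (neighborhood averaging)
    w: (N, d) stacked agent iterates, grads: callable (N, d) -> (N, d),
    A: (N, N) doubly stochastic combination matrix, mu: step size."""
    psi = w - mu * grads(w)
    return A.T @ psi
```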

Identifying Points of Interest and Similar Individuals from Raw GPS Data

Title Identifying Points of Interest and Similar Individuals from Raw GPS Data
Authors Thiago Andrade, João Gama
Abstract Smartphones and portable devices have become ubiquitous and part of everyone’s life. Owing to their portability, these devices are well suited to recording individuals’ traces and life-logs, generating vast amounts of data at low cost. These data are emerging as a new source for studies of human mobility patterns, and a growing number of research projects and techniques aim to analyze them and retrieve useful information. The aim of this paper is to explore raw GPS data from different individuals in a community and apply data mining algorithms to identify meaningful places in a region and to describe users’ profiles and their similarities. We evaluate the proposed method on a real-world dataset. The experimental results show that the steps performed to identify points of interest (POIs), and subsequently the similarity between users, are quite satisfactory, serving as a supplement for urban planning and social networks.
Tasks
Published 2019-04-19
URL http://arxiv.org/abs/1904.09357v1
PDF http://arxiv.org/pdf/1904.09357v1.pdf
PWC https://paperswithcode.com/paper/190409357
Repo
Framework
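
A minimal sketch of one way to identify POIs from raw GPS fixes, using DBSCAN with a haversine metric. The abstract does not name the specific clustering algorithm, so DBSCAN is an illustrative choice here, not necessarily the paper's.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_pois(latlon, eps_m=150, min_samples=10):
    """Cluster raw (lat, lon) fixes in degrees into candidate POIs and
    return one centroid per cluster. eps is given in meters and converted
    to the radian distance the haversine metric expects."""
    earth_r = 6_371_000.0                       # mean Earth radius in meters
    pts = np.asarray(latlon, float)
    db = DBSCAN(eps=eps_m / earth_r, min_samples=min_samples,
                metric="haversine").fit(np.radians(pts))
    return [pts[db.labels_ == c].mean(axis=0)   # centroid of each cluster
            for c in set(db.labels_) if c != -1]
```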

Personalized Student Stress Prediction with Deep Multitask Network

Title Personalized Student Stress Prediction with Deep Multitask Network
Authors Abhinav Shaw, Natcha Simsiri, Iman Deznabi, Madalina Fiterau, Tauhidur Rahman
Abstract With the growing popularity of wearable devices, the ability to utilize physiological data collected from these devices to predict the wearer’s mental state, such as mood and stress, suggests great clinical applications, yet such a task is extremely challenging. In this paper, we present a general platform for personalized predictive modeling of behavioural states such as students’ level of stress. Through the use of autoencoders and multitask learning, we extend the prediction of stress to both sequences of passive sensor data and high-level covariates. Our model outperforms the state-of-the-art in the prediction of stress level from mobile sensor data, obtaining a 45.6% improvement in F1 score on the StudentLife dataset.
Tasks
Published 2019-06-26
URL https://arxiv.org/abs/1906.11356v1
PDF https://arxiv.org/pdf/1906.11356v1.pdf
PWC https://paperswithcode.com/paper/personalized-student-stress-prediction-with
Repo
Framework
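
A sketch of the shared-encoder, per-student-head structure that multitask personalization suggests; the GRU encoder, the reconstruction decoder, and all sizes are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultitaskStressNet(nn.Module):
    """Shared sequence encoder with one classification head per student, so
    predictions are personalized while the representation is shared; a linear
    decoder adds an auto-encoding (reconstruction) objective."""
    def __init__(self, in_dim, hid, n_students, n_levels=5):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hid, batch_first=True)
        self.decoder = nn.Linear(hid, in_dim)    # reconstruct sensor features
        self.heads = nn.ModuleList(nn.Linear(hid, n_levels)
                                   for _ in range(n_students))

    def forward(self, x, student_id):
        """x: (batch, T, in_dim) sensor sequence for one student's batch."""
        h, _ = self.encoder(x)
        z = h[:, -1]                             # last hidden state as summary
        return self.heads[student_id](z), self.decoder(z)
```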

BoostGAN for Occlusive Profile Face Frontalization and Recognition

Title BoostGAN for Occlusive Profile Face Frontalization and Recognition
Authors Qingyan Duan, Lei Zhang
Abstract There are many factors affecting human face recognition, such as pose, occlusion, illumination, and age. First and foremost are the large-pose and occlusion problems, which can even result in more than 10% performance degradation. Pose-invariant feature representation and face frontalization with generative adversarial networks (GAN) have been widely used to solve the pose problem. However, the synthesis and recognition of occlusive but profile faces is still an uninvestigated problem. To address this issue, in this paper we contribute an effective solution for recognizing occlusive but profile faces, even with facial keypoint regions (e.g., eyes, nose) corrupted. Specifically, we propose a boosting Generative Adversarial Network (BoostGAN) for de-occlusion, frontalization, and recognition of faces. Under the assumption that facial occlusion is partial and incomplete, multiple patch-occluded images are fed as inputs for knowledge boosting, such as identity and texture information. A new aggregation structure, composed of a deep GAN for coarse face synthesis and a shallow boosting net for fine face generation, is further designed. Exhaustive experiments demonstrate that the proposed approach not only presents clear, perceptually photo-realistic results but also shows state-of-the-art recognition performance for occlusive but profile faces.
Tasks Face Generation, Face Recognition
Published 2019-02-26
URL http://arxiv.org/abs/1902.09782v1
PDF http://arxiv.org/pdf/1902.09782v1.pdf
PWC https://paperswithcode.com/paper/boostgan-for-occlusive-profile-face
Repo
Framework
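
The aggregation structure can be sketched schematically: a deep coarse generator applied to each multi-patch-occluded view, followed by a shallow boosting net that fuses the coarse outputs. All layer choices below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BoostAggregator(nn.Module):
    """Deep coarse synthesis + shallow boosting fusion: each occluded view is
    frontalized by the same deep generator, and a small convnet fuses the
    coarse results into one fine output."""
    def __init__(self, coarse_gen, n_views=4):
        super().__init__()
        self.coarse = coarse_gen                      # deep GAN generator
        self.boost = nn.Sequential(                   # shallow boosting net
            nn.Conv2d(3 * n_views, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, occluded_views):                # list of (B, 3, H, W)
        coarse = [self.coarse(v) for v in occluded_views]
        return self.boost(torch.cat(coarse, dim=1))
```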

Boltzmann machine learning and regularization methods for inferring evolutionary fields and couplings from a multiple sequence alignment

Title Boltzmann machine learning and regularization methods for inferring evolutionary fields and couplings from a multiple sequence alignment
Authors Sanzo Miyazawa
Abstract The inverse Potts problem, inferring the Boltzmann distribution of homologous protein sequences from their single-site and pairwise frequencies, has recently attracted a great deal of attention due to its capacity to accurately predict residue-residue contacts in a 3D protein complex. A Boltzmann machine for accurate estimation of the field and coupling interactions, which is required for other studies in protein evolution and folding, is studied here with respect to learning methods, regularization models, and a method for tuning regularization parameters, in order to infer interactions with reasonable characteristics. Using $L_2$ regularization for fields, group $L_1$ for couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and with $L_1$. The two regularization parameters, for fields and couplings, are tuned to yield equal values for the sample average and the ensemble average of evolutionary energies of natural proteins. Both averages change smoothly and converge along the learning process, but their profiles differ markedly between learning methods. Most per-parameter adaptive learning methods invented for machine learning cannot learn reasonable parameters for sparse-interaction systems. A modified Adam (ModAdam) method is introduced that makes the step size proportional to the partial derivative for sparse couplings and uses a soft thresholding function for group $L_1$. By first inferring interactions from protein sequences and then from Monte Carlo samples, it is shown that the fields and couplings can be well recovered with group $L_1$ and the ModAdam method. However, the distribution of evolutionary energies over natural proteins is shifted towards lower energies relative to that of Monte Carlo samples, indicating that there may be higher-order interactions that favor natural sequences.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.05006v1
PDF https://arxiv.org/pdf/1909.05006v1.pdf
PWC https://paperswithcode.com/paper/boltzmann-machine-learning-and-regularization
Repo
Framework
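
The group-L1 proximal (soft-thresholding) step for coupling blocks, which the ModAdam update applies, can be sketched as follows; the (L, L, q, q) coupling-tensor layout is an assumption of this sketch.

```python
import numpy as np

def group_soft_threshold(J, lam):
    """Proximal operator of the group-L1 penalty on Potts couplings: shrink
    each q x q coupling block J[i, j] toward zero by lam in Frobenius norm,
    zeroing entire blocks and thereby producing sparse couplings.
    J: (L, L, q, q) coupling tensor, lam: threshold."""
    norms = np.sqrt((J ** 2).sum(axis=(2, 3), keepdims=True))
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return J * scale
```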

EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching

Title EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching
Authors Behnam Ghavimi, Wolfgang Otto, Philipp Mayr
Abstract Citation matching is a challenging task due to problems such as the variety of citation styles, mistakes in reference strings, and the quality of identified reference segments. The classic citation-matching configuration used in this paper combines a blocking technique with a binary classifier. Three possible inputs (reference strings, reference segments, and a combination of reference strings and segments) were tested to find the most efficient strategy for citation matching. In the classification step, we describe the effect that the probabilities of reference segments can have on citation matching. Our evaluation on a manually curated gold standard showed that input data consisting of the combination of reference segments and reference strings leads to the best results. In addition, using the segmentation probabilities slightly improves the results.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04484v1
PDF https://arxiv.org/pdf/1906.04484v1.pdf
PWC https://paperswithcode.com/paper/exmatcher-combining-features-based-on
Repo
Framework
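
The blocking half of the classic blocking-plus-classifier configuration can be sketched as below; the blocking key (publication year plus an author-name prefix) is an illustrative assumption, and the binary match/non-match classifier is omitted.

```python
from collections import defaultdict

def candidate_pairs(references, key=lambda r: (r["year"], r["first_author"][:4])):
    """Blocking step: only reference pairs that share a cheap key are passed
    on to the binary classifier, avoiding the quadratic all-pairs comparison.
    references: list of dicts with at least 'year' and 'first_author'."""
    blocks = defaultdict(list)
    for ref in references:
        blocks[key(ref)].append(ref)
    return [(a, b) for group in blocks.values()
            for i, a in enumerate(group) for b in group[i + 1:]]
```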

Noisy Accelerated Power Method for Eigenproblems with Applications

Title Noisy Accelerated Power Method for Eigenproblems with Applications
Authors Vien V. Mai, Mikael Johansson
Abstract This paper introduces an efficient algorithm for finding the dominant generalized eigenvectors of a pair of symmetric matrices. Combining tools from approximation theory and convex optimization, we develop a simple scalable algorithm with strong theoretical performance guarantees. More precisely, the algorithm retains the simplicity of the well-known power method but enjoys the asymptotic iteration complexity of the powerful Lanczos method. Unlike these classic techniques, our algorithm is designed to decompose the overall problem into a series of subproblems that only need to be solved approximately. The combination of good initializations, fast iterative solvers, and appropriate error control in solving the subproblems leads to a running time linear in the input size, compared to the superlinear time of traditional methods. The improved running time immediately offers acceleration for several applications. As an example, we demonstrate how the proposed algorithm can be used to accelerate canonical correlation analysis, a fundamental statistical tool for learning a low-dimensional representation of high-dimensional objects. Numerical experiments on real-world datasets confirm that our approach yields significant improvements over the current state of the art.
Tasks
Published 2019-03-20
URL http://arxiv.org/abs/1903.08742v1
PDF http://arxiv.org/pdf/1903.08742v1.pdf
PWC https://paperswithcode.com/paper/noisy-accelerated-power-method-for
Repo
Framework
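
The simplest instance of the accelerated scheme, power iteration with heavy-ball momentum on a symmetric matrix, can be sketched as follows. The paper's algorithm handles the generalized pair (A, B) via approximately solved subproblems, which this sketch does not; the default momentum value is an assumption.

```python
import numpy as np

def momentum_power_method(A, iters=200, beta=0.1, seed=0):
    """Power iteration with heavy-ball momentum,
        x_{t+1} = A x_t - beta * x_{t-1},
    renormalizing both iterates by the same factor so the recursion keeps a
    consistent scale. Returns a unit eigenvector estimate and its Rayleigh
    quotient; the ideal beta is about (lambda_2 / 2)^2."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x_prev = np.zeros(n)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x_new = A @ x - beta * x_prev
        nrm = np.linalg.norm(x_new)
        x_prev, x = x / nrm, x_new / nrm
    return x, float(x @ A @ x)
```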

Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback

Title Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback
Authors Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama
Abstract We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we focus on the full-bandit setting, where we cannot observe the reward of each single arm, but only the sum of the rewards. Although we can regard the CPE with full-bandit feedback as a special case of pure exploration in linear bandits, an approach based on linear bandits is not computationally feasible since the number of super arms may be exponential. In this paper, we first propose a polynomial-time bandit algorithm for the CPE under general combinatorial constraints and provide an upper bound of the sample complexity. Second, we design an approximation algorithm for the 0-1 quadratic maximization problem, which arises in many bandit algorithms with confidence ellipsoids. Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time. Finally, we conduct experiments on synthetic and real-world datasets, and confirm the validity of our theoretical analysis in terms of both the computation time and the sample complexity.
Tasks
Published 2019-02-27
URL https://arxiv.org/abs/1902.10582v2
PDF https://arxiv.org/pdf/1902.10582v2.pdf
PWC https://paperswithcode.com/paper/polynomial-time-algorithms-for-combinatorial
Repo
Framework
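
The linear-bandit view of full-bandit feedback mentioned in the abstract can be illustrated by recovering single-arm means from observed super-arm sums with least squares; this is the statistical starting point of the paper, not its polynomial-time algorithm.

```python
import numpy as np

def estimate_arm_means(pulls, rewards, n_arms):
    """With full-bandit feedback only the sum of rewards of each pulled super
    arm is observed; single-arm means are recovered by least squares over the
    0/1 pull-indicator matrix (identifiable once the pulled super arms span
    all arms). pulls: list of index collections, rewards: observed sums."""
    X = np.zeros((len(pulls), n_arms))
    for t, super_arm in enumerate(pulls):
        X[t, list(super_arm)] = 1.0
    theta, *_ = np.linalg.lstsq(X, np.asarray(rewards, float), rcond=None)
    return theta
```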