January 29, 2020

3282 words 16 mins read

Paper Group ANR 697

Addressing Ambiguity of Emotion Labels Through Meta-Learning

Title Addressing Ambiguity of Emotion Labels Through Meta-Learning
Authors Takuya Fujioka, Dario Bertero, Takeshi Homma, Kenji Nagamatsu
Abstract Emotion labels in emotion recognition corpora are highly noisy and ambiguous, due to the annotators’ subjective perception of emotions. Such ambiguity may introduce errors in automatic classification and affect the overall performance. We therefore propose a dynamic label correction and sample contribution weight estimation model. Our model is based on a standard BLSTM model with attention, extended with two sets of extra parameters. The first learns a new corrected label distribution and aims to fix the inaccurate labels from the dataset. The second estimates the contribution of each sample to the training process, and aims to down-weight the ambiguous and noisy samples while giving higher weight to the clear ones. We train our model through an alternating optimization method: in one epoch we update the neural network parameters, and in the next we keep them fixed and update the label correction and sample importance parameters. When training and evaluating our model on the IEMOCAP dataset, we obtained a weighted accuracy (WA) of 65.9% and an unweighted accuracy (UA) of 61.4%, absolute improvements of 2.5% and 2.7%, respectively, over a BLSTM-with-attention baseline trained on the corpus gold labels.
Tasks Emotion Recognition, Meta-Learning
Published 2019-11-06
URL https://arxiv.org/abs/1911.02216v2
PDF https://arxiv.org/pdf/1911.02216v2.pdf
PWC https://paperswithcode.com/paper/addressing-ambiguity-of-emotion-labels
Repo
Framework
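
To make the alternating optimization concrete, here is a minimal PyTorch sketch; it is an illustration, not the authors' code. The BLSTM-with-attention encoder is abstracted behind `model`, and `LabelCorrector` holds the two extra parameter sets (a corrected label distribution and a contribution weight per training sample); initialization, optimizers, and learning rates are assumptions.

```python
import torch
import torch.nn.functional as F

class LabelCorrector(torch.nn.Module):
    """Two extra parameter sets: a corrected label distribution and a
    contribution weight for every training sample."""
    def __init__(self, noisy_labels, n_classes):
        super().__init__()
        # start the corrected labels at the (one-hot) corpus gold labels
        init = torch.log(F.one_hot(noisy_labels, n_classes).float() + 1e-3)
        self.label_logits = torch.nn.Parameter(init)
        self.weight_logits = torch.nn.Parameter(torch.zeros(len(noisy_labels)))

    def forward(self, idx):
        q = F.softmax(self.label_logits[idx], dim=-1)  # corrected label distribution
        w = torch.sigmoid(self.weight_logits[idx])     # per-sample contribution weight
        return q, w

def train(model, corrector, loader, n_epochs):
    opt_net = torch.optim.Adam(model.parameters(), lr=1e-3)
    opt_meta = torch.optim.Adam(corrector.parameters(), lr=1e-2)
    for epoch in range(n_epochs):
        meta_turn = epoch % 2 == 1           # alternate which parameter set moves
        for x, idx in loader:                # idx: each sample's position in the corpus
            q, w = corrector(idx)
            logp = F.log_softmax(model(x), dim=-1)
            loss = (w * -(q * logp).sum(-1)).mean()   # weighted soft cross-entropy
            opt_net.zero_grad(); opt_meta.zero_grad()
            loss.backward()
            (opt_meta if meta_turn else opt_net).step()
```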

Efficient Approximate Inference with Walsh-Hadamard Variational Inference

Title Efficient Approximate Inference with Walsh-Hadamard Variational Inference
Authors Simone Rossi, Sebastien Marmin, Maurizio Filippone
Abstract Variational inference offers scalable and flexible tools to tackle intractable Bayesian inference of modern statistical models like Bayesian neural networks and Gaussian processes. For largely over-parameterized models, however, the over-regularization property of the variational objective makes the application of variational inference challenging. Inspired by the literature on kernel methods, and in particular on structured approximations of distributions of random matrices, this paper proposes Walsh-Hadamard Variational Inference, which uses Walsh-Hadamard-based factorization strategies to reduce model parameterization, accelerate computations, and increase the expressiveness of the approximate posterior beyond fully factorized ones.
Tasks Bayesian Inference, Gaussian Processes
Published 2019-11-29
URL https://arxiv.org/abs/1912.00015v1
PDF https://arxiv.org/pdf/1912.00015v1.pdf
PWC https://paperswithcode.com/paper/efficient-approximate-inference-with-walsh
Repo
Framework
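
As an illustration of why Walsh-Hadamard factorizations help, the sketch below implements the O(d log d) fast Walsh-Hadamard transform and a structured weight-times-vector product of the form W = diag(s1) H diag(g) H diag(s2) with a Gaussian variational g. Treat the exact parameterization as a paraphrase of the paper's recipe rather than its implementation.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform of a length-2^k vector,
    computed in O(d log d) without materializing the d x d matrix H."""
    x = np.asarray(x, float).copy()
    d, h = x.size, 1
    while h < d:
        for i in range(0, d, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(d)

def whvi_sample_matvec(v, s1, s2, g_mu, g_sig, rng):
    """One reparameterized sample of W v for W = diag(s1) H diag(g) H diag(s2)
    with q(g) Gaussian: the weight matrix is never formed, so the product
    costs O(d log d) time and O(d) variational parameters."""
    g = g_mu + g_sig * rng.standard_normal(g_mu.size)
    return s1 * fwht(g * fwht(s2 * v))
```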

Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits

Title Nearly Optimal Algorithms for Piecewise-Stationary Cascading Bandits
Authors Lingda Wang, Huozhi Zhou, Bingcong Li, Lav R. Varshney, Zhizhen Zhao
Abstract Cascading bandit (CB) is a popular model for web search and online advertising, where an agent aims to learn the $K$ most attractive items out of a ground set of size $L$ during its interaction with a user. However, the stationary CB model may be too simple for real-world problems, where user preferences may change over time. Considering piecewise-stationary environments, two efficient algorithms, \texttt{GLRT-CascadeUCB} and \texttt{GLRT-CascadeKL-UCB}, are developed and shown to ensure regret upper bounds on the order of $\mathcal{O}(\sqrt{NLT\log{T}})$, where $N$ is the number of piecewise-stationary segments and $T$ is the number of time slots. At the crux of the proposed algorithms is an almost parameter-free change-point detector, the generalized likelihood ratio test (GLRT). Compared with existing works, the GLRT-based algorithms: i) are free of change-point-dependent information for choosing parameters; ii) have fewer tuning parameters; and iii) improve at least the $L$ dependence in regret upper bounds. In addition, we show that the proposed algorithms are optimal (up to a logarithmic factor) in terms of regret by deriving a minimax lower bound on the order of $\Omega(\sqrt{NLT})$ for piecewise-stationary CB. The efficiency of the proposed algorithms relative to state-of-the-art approaches is validated through numerical experiments on both synthetic and real-world datasets.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.05886v5
PDF https://arxiv.org/pdf/1909.05886v5.pdf
PWC https://paperswithcode.com/paper/be-aware-of-non-stationarity-nearly-optimal
Repo
Framework
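
The GLRT change-point detector at the core of both algorithms can be sketched for Bernoulli click feedback as follows: scan every split point of the observed reward stream and compare the best split's likelihood-ratio statistic against a threshold. The threshold (tied to the horizon in the paper's analysis) is left as a free parameter here.

```python
import numpy as np

def bern_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def glrt_change_detected(rewards, threshold):
    """Generalized likelihood ratio test for a mean change in a Bernoulli
    stream: GLR(n) = max_s [ s*KL(mu_1, mu) + (n-s)*KL(mu_2, mu) ]."""
    x = np.asarray(rewards, float)
    n, mu = len(x), np.mean(rewards)
    best = 0.0
    for s in range(1, n):
        mu1, mu2 = x[:s].mean(), x[s:].mean()
        best = max(best, s * bern_kl(mu1, mu) + (n - s) * bern_kl(mu2, mu))
    return best > threshold
```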

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Title Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction
Authors Osama Makansi, Eddy Ilg, Özgün Çiçek, Thomas Brox
Abstract Future prediction is a fundamental principle of intelligence that helps plan actions and avoid possible dangers. As the future is uncertain to a large extent, modeling the uncertainty and multimodality of the future states is of great relevance. Existing approaches are rather limited in this regard and mostly yield a single hypothesis of the future or, at best, strongly constrained mixture components that suffer from instabilities in training and mode collapse. In this work, we present an approach that predicts several samples of the future with a winner-takes-all loss and iteratively groups the samples into multiple modes. Moreover, we discuss how to evaluate predicted multimodal distributions, including the common real-world scenario where only a single sample from the ground-truth distribution is available for evaluation. We show on synthetic and real data that the proposed approach produces good estimates of multimodal distributions and avoids mode collapse.
Tasks Future prediction
Published 2019-06-09
URL https://arxiv.org/abs/1906.03631v1
PDF https://arxiv.org/pdf/1906.03631v1.pdf
PWC https://paperswithcode.com/paper/overcoming-limitations-of-mixture-density-1
Repo
Framework
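
The winner-takes-all loss driving the sampling stage can be sketched as below; the subsequent fitting stage (grouping samples into mixture modes) is omitted, and the tensor shapes are illustrative assumptions.

```python
import torch

def wta_loss(hypotheses, target, top_k=1):
    """Winner-takes-all loss over K predicted hypotheses: only the hypothesis
    (or top-k hypotheses) closest to the ground truth receives gradient,
    which lets different heads specialize on different future modes.
    hypotheses: (batch, K, dim), target: (batch, dim)."""
    dists = (hypotheses - target.unsqueeze(1)).pow(2).sum(-1)  # (batch, K)
    best, _ = dists.topk(top_k, dim=1, largest=False)
    return best.mean()
```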

Conditional Mutual Information Neural Estimator

Title Conditional Mutual Information Neural Estimator
Authors Sina Molavipour, Germán Bassi, Mikael Skoglund
Abstract Several recent works in communication systems have proposed to leverage the power of neural networks in the design of encoders and decoders. In this approach, these blocks can be tailored to maximize the transmission rate based on aggregated samples from the channel. Motivated by the fact that, in many communication schemes, the achievable transmission rate is determined by a conditional mutual information term, this paper focuses on neural-based estimators for this information-theoretic quantity. Our results are based on variational bounds for the KL-divergence and, in contrast to some previous works, we provide a mathematically rigorous lower bound. However, additional challenges relative to the unconditional mutual information emerge due to the presence of a conditional density function, which we address here.
Tasks
Published 2019-11-06
URL https://arxiv.org/abs/1911.02277v2
PDF https://arxiv.org/pdf/1911.02277v2.pdf
PWC https://paperswithcode.com/paper/conditional-mutual-information-neural
Repo
Framework
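
A sketch of the kind of variational bound involved: the Donsker-Varadhan lower bound on a KL divergence, evaluated from a critic network's outputs. Treat it as illustrative rather than the paper's exact bound; for conditional MI, producing the product-of-conditionals samples (`t_product`) is the extra difficulty the abstract alludes to.

```python
import math
import torch

def dv_bound(t_joint, t_product):
    """Donsker-Varadhan lower bound KL(P||Q) >= E_P[T] - log E_Q[exp T].
    For I(X;Y|Z), P is the joint p(x,y,z) and Q the product
    p(x|z) p(y|z) p(z); t_* are 1-D tensors of critic outputs on samples."""
    log_mean_exp = torch.logsumexp(t_product, dim=0) - math.log(t_product.numel())
    return t_joint.mean() - log_mean_exp
```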

A tensorized logic programming language for large-scale data

Title A tensorized logic programming language for large-scale data
Authors Ryosuke Kojima, Taisuke Sato
Abstract We introduce a new logic programming language, T-PRISM, based on tensor embeddings. Our embedding scheme is a modification of the distribution semantics in PRISM, one of the state-of-the-art probabilistic logic programming languages, replacing distribution functions with multidimensional arrays, i.e., tensors. T-PRISM consists of two parts: a logic programming part and a numerical computation part. The former provides flexible and interpretable modeling at the level of first-order logic, and the latter provides scalable computation utilizing parallelization and hardware acceleration with GPUs. Combining these two parts provides a remarkably wide range of high-level declarative modeling, from symbolic reasoning to deep learning. To embody this programming language, we also introduce a new semantics, termed tensorized semantics, which combines the traditional least model semantics in logic programming with tensor embeddings. In T-PRISM, we first derive a set of equations related to tensors from a given program using logical inference, i.e., Prolog execution in a symbolic space, and then solve the derived equations in a continuous space with TensorFlow. Using our preliminary implementation of T-PRISM, we have successfully handled a wide range of models, including declarative modeling of real large-scale data. As an example, this paper presents a DistMult model for knowledge graphs using the FB15k and WN18 datasets.
Tasks Knowledge Graphs
Published 2019-01-20
URL http://arxiv.org/abs/1901.08548v1
PDF http://arxiv.org/pdf/1901.08548v1.pdf
PWC https://paperswithcode.com/paper/a-tensorized-logic-programming-language-for
Repo
Framework
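
For reference, the DistMult scoring function used in the experiments is simple enough to sketch directly; embeddings are assumed here to be learned NumPy vectors. In T-PRISM, scores of this tensorized form are what logical atoms compile into before the numerical layer evaluates them.

```python
import numpy as np

def distmult_score(head, rel, tail):
    """DistMult triple score <h, r, t> = sum_i h_i * r_i * t_i."""
    return float(np.sum(head * rel * tail))

def rank_tails(head, rel, all_tails):
    """Score one (head, relation) pair against every candidate tail entity.
    all_tails: (n_entities, dim) embedding matrix; returns (n_entities,)."""
    return all_tails @ (head * rel)
```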

ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data

Title ZiMM: a deep learning model for long term and blurry relapses with non-clinical claims data
Authors Anastasiia Kabeshova, Yiyang Yu, Bertrand Lukacs, Emmanuel Bacry, Stéphane Gaïffas
Abstract This paper considers the problems of modeling and predicting a long-term and “blurry” relapse that occurs after a medical act, such as a surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot explain easily, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a “blurry” fashion, through longitudinal prescriptions of drugs over a long period of time after the medical act. We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions), to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture called ZiMM Encoder-Decoder (ZiMM ED) that can learn from the complex, irregular, highly heterogeneous and sparse patterns of health events observed in a claims-only database. ZiMM ED is applied to a “non-clinical” claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures and hospital diagnoses, the only available clinical feature being the age of the patient. This setting is more challenging than one where bedside clinical signals are available. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with histories between 1.5 and 5 years long. We consider a long-term (18-month) relapse (urination problems still occurring despite surgery), which is blurry since it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing.
Tasks
Published 2019-11-13
URL https://arxiv.org/abs/1911.05346v2
PDF https://arxiv.org/pdf/1911.05346v2.pdf
PWC https://paperswithcode.com/paper/zimm-a-deep-learning-model-for-long-term
Repo
Framework
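
A hedged sketch of a zero-inflated mixture of multinomials log-likelihood, under the simplifying assumptions that the total count is observed and the zero-inflation places all mass on the all-zero count vector; the paper's exact ZiMM parameterization may differ.

```python
import numpy as np
from scipy.special import gammaln, logsumexp

def zimm_loglik(y, pi0, log_w, log_p):
    """log P(y) under a simple zero-inflated multinomial mixture:
    P(y) = pi0 * [y == 0] + (1 - pi0) * sum_k w_k Mult(y | n, p_k).
    y: (d,) integer drug counts, log_w: (K,) log mixture weights,
    log_p: (K, d) log category probabilities (rows normalized)."""
    n = y.sum()
    log_mult = gammaln(n + 1) - gammaln(y + 1).sum() + (y * log_p).sum(axis=1)
    log_mix = np.log1p(-pi0) + logsumexp(log_w + log_mult)
    if n == 0:
        # all-zero vector: mass from inflation plus the mixture
        return np.logaddexp(np.log(pi0), log_mix)
    return log_mix
```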

Distributed Learning in Non-Convex Environments – Part II: Polynomial Escape from Saddle-Points

Title Distributed Learning in Non-Convex Environments – Part II: Polynomial Escape from Saddle-Points
Authors Stefan Vlaski, Ali H. Sayed
Abstract The diffusion strategy for distributed learning from streaming data employs local stochastic gradient updates along with exchange of iterates over neighborhoods. In Part I [2] of this work we established that agents cluster around a network centroid and proceeded to study the dynamics of this point. We established expected descent in non-convex environments in the large-gradient regime and introduced a short-term model to examine the dynamics over finite-time horizons. Using this model, we establish in this work that the diffusion strategy is able to escape from strict saddle-points in $O(1/\mu)$ iterations; it is also able to return approximately second-order stationary points in a polynomial number of iterations. Relative to prior works on the polynomial escape from saddle-points, most of which focus on centralized perturbed or stochastic gradient descent, our approach requires less restrictive conditions on the gradient noise process.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.01849v1
PDF https://arxiv.org/pdf/1907.01849v1.pdf
PWC https://paperswithcode.com/paper/distributed-learning-in-non-convex
Repo
Framework
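
The diffusion strategy's adapt-then-combine update that the analysis studies can be sketched in a few lines; the combination-matrix convention (column `k` holds the weights agent `k` puts on its neighbors) is an assumption of this sketch.

```python
import numpy as np

def diffusion_step(w, grads, A, mu):
    """One adapt-then-combine diffusion update over N agents:
      adapt:   psi_k = w_k - mu * grad_k(w_k)        (local stochastic step)
      combine: w_k   = sum_l A[l, k] * psi_l         (neighborhood averaging)
    w: (N, d) stacked agent iterates, grads: callable (N, d) -> (N, d),
    A: (N, N) doubly stochastic combination matrix, mu: step size."""
    psi = w - mu * grads(w)
    return A.T @ psi
```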

Identifying Points of Interest and Similar Individuals from Raw GPS Data

Title Identifying Points of Interest and Similar Individuals from Raw GPS Data
Authors Thiago Andrade, João Gama
Abstract Smartphones and portable devices have become ubiquitous and part of everyone’s life. Owing to their portability, these devices are well suited to recording individuals’ traces and life-logs, generating vast amounts of data at low cost. These data are emerging as a new source for studies of human mobility patterns, and a growing number of research projects and techniques aim to analyze them and retrieve useful information. The aim of this paper is to explore raw GPS data from different individuals in a community and apply data mining algorithms to identify meaningful places in a region and to describe users’ profiles and their similarities. We evaluate the proposed method on a real-world dataset. The experimental results show that the steps performed to identify points of interest (POIs), and subsequently the similarity between users, are quite satisfactory, serving as a supplement for urban planning and social networks.
Tasks
Published 2019-04-19
URL http://arxiv.org/abs/1904.09357v1
PDF http://arxiv.org/pdf/1904.09357v1.pdf
PWC https://paperswithcode.com/paper/190409357
Repo
Framework
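
A minimal sketch of one way to identify POIs from raw GPS fixes, using DBSCAN with a haversine metric. The abstract does not name the specific clustering algorithm, so DBSCAN is an illustrative choice here, not necessarily the paper's.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_pois(latlon, eps_m=150, min_samples=10):
    """Cluster raw (lat, lon) fixes in degrees into candidate POIs and
    return one centroid per cluster. eps is given in meters and converted
    to the radian distance the haversine metric expects."""
    earth_r = 6_371_000.0                       # mean Earth radius in meters
    pts = np.asarray(latlon, float)
    db = DBSCAN(eps=eps_m / earth_r, min_samples=min_samples,
                metric="haversine").fit(np.radians(pts))
    return [pts[db.labels_ == c].mean(axis=0)   # centroid of each cluster
            for c in set(db.labels_) if c != -1]
```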

Personalized Student Stress Prediction with Deep Multitask Network

Title Personalized Student Stress Prediction with Deep Multitask Network
Authors Abhinav Shaw, Natcha Simsiri, Iman Deznabi, Madalina Fiterau, Tauhidur Rahman
Abstract With the growing popularity of wearable devices, the ability to utilize physiological data collected from these devices to predict the wearer’s mental state, such as mood and stress, suggests great clinical applications, yet such a task is extremely challenging. In this paper, we present a general platform for personalized predictive modeling of behavioural states such as students’ level of stress. Through the use of autoencoders and multitask learning, we extend the prediction of stress to both sequences of passive sensor data and high-level covariates. Our model outperforms the state-of-the-art in the prediction of stress level from mobile sensor data, obtaining a 45.6% improvement in F1 score on the StudentLife dataset.
Tasks
Published 2019-06-26
URL https://arxiv.org/abs/1906.11356v1
PDF https://arxiv.org/pdf/1906.11356v1.pdf
PWC https://paperswithcode.com/paper/personalized-student-stress-prediction-with
Repo
Framework
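
A sketch of the shared-encoder, per-student-head structure that multitask personalization suggests; the GRU encoder, the reconstruction decoder, and all sizes are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class MultitaskStressNet(nn.Module):
    """Shared sequence encoder with one classification head per student, so
    predictions are personalized while the representation is shared; a linear
    decoder adds an auto-encoding (reconstruction) objective."""
    def __init__(self, in_dim, hid, n_students, n_levels=5):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hid, batch_first=True)
        self.decoder = nn.Linear(hid, in_dim)    # reconstruct sensor features
        self.heads = nn.ModuleList(nn.Linear(hid, n_levels)
                                   for _ in range(n_students))

    def forward(self, x, student_id):
        """x: (batch, T, in_dim) sensor sequence for one student's batch."""
        h, _ = self.encoder(x)
        z = h[:, -1]                             # last hidden state as summary
        return self.heads[student_id](z), self.decoder(z)
```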

BoostGAN for Occlusive Profile Face Frontalization and Recognition

Title BoostGAN for Occlusive Profile Face Frontalization and Recognition
Authors Qingyan Duan, Lei Zhang
Abstract There are many factors affecting human face recognition, such as pose, occlusion, illumination, and age. First and foremost are the large-pose and occlusion problems, which can even result in more than 10% performance degradation. Pose-invariant feature representation and face frontalization with generative adversarial networks (GAN) have been widely used to solve the pose problem. However, the synthesis and recognition of occlusive but profile faces is still an uninvestigated problem. To address this issue, in this paper we contribute an effective solution for recognizing occlusive but profile faces, even with facial keypoint regions (e.g., eyes, nose) corrupted. Specifically, we propose a boosting Generative Adversarial Network (BoostGAN) for de-occlusion, frontalization, and recognition of faces. Under the assumption that facial occlusion is partial and incomplete, multiple patch-occluded images are fed as inputs for knowledge boosting, such as identity and texture information. A new aggregation structure, composed of a deep GAN for coarse face synthesis and a shallow boosting net for fine face generation, is further designed. Exhaustive experiments demonstrate that the proposed approach not only presents clear, perceptually photo-realistic results but also shows state-of-the-art recognition performance for occlusive but profile faces.
Tasks Face Generation, Face Recognition
Published 2019-02-26
URL http://arxiv.org/abs/1902.09782v1
PDF http://arxiv.org/pdf/1902.09782v1.pdf
PWC https://paperswithcode.com/paper/boostgan-for-occlusive-profile-face
Repo
Framework
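
The aggregation structure can be sketched schematically: a deep coarse generator applied to each multi-patch-occluded view, followed by a shallow boosting net that fuses the coarse outputs. All layer choices below are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BoostAggregator(nn.Module):
    """Deep coarse synthesis + shallow boosting fusion: each occluded view is
    frontalized by the same deep generator, and a small convnet fuses the
    coarse results into one fine output."""
    def __init__(self, coarse_gen, n_views=4):
        super().__init__()
        self.coarse = coarse_gen                      # deep GAN generator
        self.boost = nn.Sequential(                   # shallow boosting net
            nn.Conv2d(3 * n_views, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())

    def forward(self, occluded_views):                # list of (B, 3, H, W)
        coarse = [self.coarse(v) for v in occluded_views]
        return self.boost(torch.cat(coarse, dim=1))
```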

Boltzmann machine learning and regularization methods for inferring evolutionary fields and couplings from a multiple sequence alignment

Title Boltzmann machine learning and regularization methods for inferring evolutionary fields and couplings from a multiple sequence alignment
Authors Sanzo Miyazawa
Abstract The inverse Potts problem, inferring the Boltzmann distribution of homologous protein sequences from their single-site and pairwise frequencies, has recently attracted a great deal of attention due to its capacity to accurately predict residue-residue contacts in a 3D protein complex. A Boltzmann machine for accurate estimation of the field and coupling interactions, which is required for other studies in protein evolution and folding, is studied here with respect to learning methods, regularization models, and a method for tuning regularization parameters, in order to infer interactions with reasonable characteristics. Using $L_2$ regularization for fields, group $L_1$ for couplings is shown to be very effective for sparse couplings in comparison with $L_2$ and with $L_1$. The two regularization parameters, for fields and couplings, are tuned to yield equal values for the sample average and the ensemble average of evolutionary energies of natural proteins. Both averages change smoothly and converge along the learning process, but their profiles differ markedly between learning methods. Most per-parameter adaptive learning methods invented for machine learning cannot learn reasonable parameters for sparse-interaction systems. A modified Adam (ModAdam) method is introduced that makes the step size proportional to the partial derivative for sparse couplings and uses a soft thresholding function for group $L_1$. By first inferring interactions from protein sequences and then from Monte Carlo samples, it is shown that the fields and couplings can be well recovered with group $L_1$ and the ModAdam method. However, the distribution of evolutionary energies over natural proteins is shifted towards lower energies relative to that of Monte Carlo samples, indicating that there may be higher-order interactions that favor natural sequences.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.05006v1
PDF https://arxiv.org/pdf/1909.05006v1.pdf
PWC https://paperswithcode.com/paper/boltzmann-machine-learning-and-regularization
Repo
Framework
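
The group-L1 proximal (soft-thresholding) step for coupling blocks, which the ModAdam update applies, can be sketched as follows; the (L, L, q, q) coupling-tensor layout is an assumption of this sketch.

```python
import numpy as np

def group_soft_threshold(J, lam):
    """Proximal operator of the group-L1 penalty on Potts couplings: shrink
    each q x q coupling block J[i, j] toward zero by lam in Frobenius norm,
    zeroing entire blocks and thereby producing sparse couplings.
    J: (L, L, q, q) coupling tensor, lam: threshold."""
    norms = np.sqrt((J ** 2).sum(axis=(2, 3), keepdims=True))
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return J * scale
```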

EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching

Title EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching
Authors Behnam Ghavimi, Wolfgang Otto, Philipp Mayr
Abstract Citation matching is a challenging task due to problems such as the variety of citation styles, mistakes in reference strings, and the quality of identified reference segments. The classic citation-matching configuration used in this paper combines a blocking technique with a binary classifier. Three possible inputs (reference strings, reference segments, and a combination of reference strings and segments) were tested to find the most efficient strategy for citation matching. In the classification step, we describe the effect that the probabilities of reference segments can have on citation matching. Our evaluation on a manually curated gold standard showed that input data consisting of the combination of reference segments and reference strings leads to the best results. In addition, using the segmentation probabilities slightly improves the results.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04484v1
PDF https://arxiv.org/pdf/1906.04484v1.pdf
PWC https://paperswithcode.com/paper/exmatcher-combining-features-based-on
Repo
Framework
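
The blocking half of the classic blocking-plus-classifier configuration can be sketched as below; the blocking key (publication year plus an author-name prefix) is an illustrative assumption, and the binary match/non-match classifier is omitted.

```python
from collections import defaultdict

def candidate_pairs(references, key=lambda r: (r["year"], r["first_author"][:4])):
    """Blocking step: only reference pairs that share a cheap key are passed
    on to the binary classifier, avoiding the quadratic all-pairs comparison.
    references: list of dicts with at least 'year' and 'first_author'."""
    blocks = defaultdict(list)
    for ref in references:
        blocks[key(ref)].append(ref)
    return [(a, b) for group in blocks.values()
            for i, a in enumerate(group) for b in group[i + 1:]]
```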

Noisy Accelerated Power Method for Eigenproblems with Applications

Title Noisy Accelerated Power Method for Eigenproblems with Applications
Authors Vien V. Mai, Mikael Johansson
Abstract This paper introduces an efficient algorithm for finding the dominant generalized eigenvectors of a pair of symmetric matrices. Combining tools from approximation theory and convex optimization, we develop a simple scalable algorithm with strong theoretical performance guarantees. More precisely, the algorithm retains the simplicity of the well-known power method but enjoys the asymptotic iteration complexity of the powerful Lanczos method. Unlike these classic techniques, our algorithm is designed to decompose the overall problem into a series of subproblems that only need to be solved approximately. The combination of good initializations, fast iterative solvers, and appropriate error control in solving the subproblems leads to a running time linear in the input size, compared to the superlinear time of traditional methods. The improved running time immediately offers acceleration for several applications. As an example, we demonstrate how the proposed algorithm can be used to accelerate canonical correlation analysis, a fundamental statistical tool for learning a low-dimensional representation of high-dimensional objects. Numerical experiments on real-world datasets confirm that our approach yields significant improvements over the current state of the art.
Tasks
Published 2019-03-20
URL http://arxiv.org/abs/1903.08742v1
PDF http://arxiv.org/pdf/1903.08742v1.pdf
PWC https://paperswithcode.com/paper/noisy-accelerated-power-method-for
Repo
Framework
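
The simplest instance of the accelerated scheme, power iteration with heavy-ball momentum on a symmetric matrix, can be sketched as follows. The paper's algorithm handles the generalized pair (A, B) via approximately solved subproblems, which this sketch does not; the default momentum value is an assumption.

```python
import numpy as np

def momentum_power_method(A, iters=200, beta=0.1, seed=0):
    """Power iteration with heavy-ball momentum,
        x_{t+1} = A x_t - beta * x_{t-1},
    renormalizing both iterates by the same factor so the recursion keeps a
    consistent scale. Returns a unit eigenvector estimate and its Rayleigh
    quotient; the ideal beta is about (lambda_2 / 2)^2."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x_prev = np.zeros(n)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x_new = A @ x - beta * x_prev
        nrm = np.linalg.norm(x_new)
        x_prev, x = x / nrm, x_new / nrm
    return x, float(x @ A @ x)
```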

Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback

Title Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback
Authors Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama
Abstract We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we focus on the full-bandit setting, where we cannot observe the reward of each single arm, but only the sum of the rewards. Although we can regard the CPE with full-bandit feedback as a special case of pure exploration in linear bandits, an approach based on linear bandits is not computationally feasible since the number of super arms may be exponential. In this paper, we first propose a polynomial-time bandit algorithm for the CPE under general combinatorial constraints and provide an upper bound of the sample complexity. Second, we design an approximation algorithm for the 0-1 quadratic maximization problem, which arises in many bandit algorithms with confidence ellipsoids. Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time. Finally, we conduct experiments on synthetic and real-world datasets, and confirm the validity of our theoretical analysis in terms of both the computation time and the sample complexity.
Tasks
Published 2019-02-27
URL https://arxiv.org/abs/1902.10582v2
PDF https://arxiv.org/pdf/1902.10582v2.pdf
PWC https://paperswithcode.com/paper/polynomial-time-algorithms-for-combinatorial
Repo
Framework
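
The linear-bandit view of full-bandit feedback mentioned in the abstract can be illustrated by recovering single-arm means from observed super-arm sums with least squares; this is the statistical starting point of the paper, not its polynomial-time algorithm.

```python
import numpy as np

def estimate_arm_means(pulls, rewards, n_arms):
    """With full-bandit feedback only the sum of rewards of each pulled super
    arm is observed; single-arm means are recovered by least squares over the
    0/1 pull-indicator matrix (identifiable once the pulled super arms span
    all arms). pulls: list of index collections, rewards: observed sums."""
    X = np.zeros((len(pulls), n_arms))
    for t, super_arm in enumerate(pulls):
        X[t, list(super_arm)] = 1.0
    theta, *_ = np.linalg.lstsq(X, np.asarray(rewards, float), rcond=None)
    return theta
```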