January 28, 2020

Paper Group ANR 1058

Learning from demonstration with model-based Gaussian process


Title	Learning from demonstration with model-based Gaussian process
Authors	Noémie Jaquier, David Ginsbourger, Sylvain Calinon
Abstract	In learning from demonstrations, it is often desirable to adapt the behavior of the robot as a function of the variability retrieved from human demonstrations and the (un)certainty encoded in different parts of the task. In this paper, we propose a novel multi-output Gaussian process (MOGP) based on Gaussian mixture regression (GMR). The proposed approach encapsulates the variability retrieved from the demonstrations in the covariance of the MOGP. Leveraging the generative nature of GP models, our approach can efficiently modulate trajectories towards new start-, via- or end-points defined by the task. Our framework allows the robot to precisely track via-points while being compliant in regions of high variability. We illustrate the proposed approach in simulated examples and validate it in a real-robot experiment.
Tasks
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05005v1
PDF	https://arxiv.org/pdf/1910.05005v1.pdf
PWC	https://paperswithcode.com/paper/learning-from-demonstration-with-model-based
Repo
Framework

Two-stream Spatiotemporal Feature for Video QA Task


Title	Two-stream Spatiotemporal Feature for Video QA Task
Authors	Chiwan Song, Woobin Im, Sung-eui Yoon
Abstract	Understanding the content of videos is one of the core techniques for developing helpful real-world applications, such as recognizing human actions for surveillance systems or analyzing customer behavior in an autonomous shop. However, understanding the content or story of a video remains a challenging problem due to the sheer amount of data and its temporal structure. In this paper, we propose a multi-channel neural network structure that adopts a two-stream network structure, which has shown high performance in the human action recognition field, and use it as a spatiotemporal video feature extractor for solving the video question answering task. We also add a squeeze-and-excitation structure to the two-stream network structure to obtain a channel-wise attended spatiotemporal feature. To jointly model the spatiotemporal features from the video and the textual features from the question, we design a context matching module with a level adjusting layer that removes the information gap between visual and textual features by applying an attention mechanism to the joint modeling. Finally, we adopt a scoring mechanism and a smoothed ranking loss objective function for selecting the correct answer from the answer candidates. We evaluate our model on the TVQA dataset; our approach shows an improved result in the text-only setting, while the result with visual features shows both the limitation and the possibility of our approach.
Tasks	Temporal Action Localization
Published	2019-07-11
URL	https://arxiv.org/abs/1907.05006v1
PDF	https://arxiv.org/pdf/1907.05006v1.pdf
PWC	https://paperswithcode.com/paper/two-stream-spatiotemporal-feature-for-video
Repo
Framework

A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering


Title	A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering
Authors	Gaël Beck, Tarn Duong, Mustapha Lebbah, Hanane Azzag, Christophe Cérin
Abstract	In this paper we target the class of modal clustering methods, where clusters are defined in terms of the local modes of the probability density function which generates the data. The most well-known modal clustering method is k-means clustering. Mean Shift clustering is a generalization of k-means clustering which computes arbitrarily shaped clusters, defined as the basins of attraction to the local modes created by the density gradient ascent paths. Despite its potential, the Mean Shift approach is a computationally expensive method for unsupervised learning. Thus, we introduce two contributions aiming to provide clustering algorithms with a linear time complexity, as opposed to the quadratic time complexity of exact Mean Shift clustering. First, we propose a scalable procedure to approximate the density gradient ascent. Second, we present a scalable cluster labeling technique. Both are based on Locality Sensitive Hashing (LSH) to approximate nearest neighbors. These two techniques may be used for moderate-sized datasets. Furthermore, we show that using our proposed approximations of the density gradient ascent as a pre-processing step in other clustering methods can also improve dedicated classification metrics. For the latter, a distributed implementation, written for the Spark/Scala ecosystem, is proposed. For all these considered clustering methods, we present experimental results illustrating their labeling accuracy and their potential to solve concrete problems.
Tasks
Published	2019-02-11
URL	http://arxiv.org/abs/1902.03833v1
PDF	http://arxiv.org/pdf/1902.03833v1.pdf
PWC	https://paperswithcode.com/paper/a-distributed-and-approximated-nearest
Repo
Framework
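
To make the quadratic cost mentioned in the abstract concrete, here is a minimal sketch of exact Mean Shift in plain Python. It is not the authors' Spark/Scala implementation and does not use LSH; the Gaussian kernel, the `bandwidth` and `merge_tol` parameters, and the toy data are all assumptions chosen for illustration.

```python
import math

def mean_shift_step(point, data, bandwidth):
    """Move `point` to the Gaussian-kernel weighted mean of all data points."""
    weights = [
        math.exp(-sum((p - q) ** 2 for p, q in zip(point, x)) / (2 * bandwidth ** 2))
        for x in data
    ]
    total = sum(weights)
    return tuple(
        sum(w * x[d] for w, x in zip(weights, data)) / total
        for d in range(len(point))
    )

def mean_shift(data, bandwidth, n_iter=50, merge_tol=0.5):
    """Exact Mean Shift: every point climbs the estimated density, and each
    step visits all n points, hence O(n^2) kernel evaluations per iteration."""
    modes, labels = [], []
    for x in data:
        y = tuple(x)
        for _ in range(n_iter):
            y = mean_shift_step(y, data, bandwidth)
        # Cluster labeling: points whose ascent paths end near the same mode
        # receive the same label (merge_tol is a hypothetical threshold).
        for k, m in enumerate(modes):
            if math.dist(y, m) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return labels, modes

# Toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, modes = mean_shift(data, bandwidth=1.0)
```

The inner loop over `data` in `mean_shift_step` is the part that an approximate nearest-neighbor search would shrink to a small candidate set.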

Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction


Title	Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction
Authors	Mohammed Senoussaoui, Patrick Cardinal, Alessandro Lameiras Koerich
Abstract	In this paper we present a novel approach for extracting a Bag-of-Words (BoW) representation based on a Neural Network codebook. The conventional BoW model is based on a dictionary (codebook) built from elementary representations which are selected randomly or by using a clustering algorithm on a training dataset. A metric is then used to assign unseen elementary representations to the closest dictionary entries in order to produce a histogram. In the proposed approach, an autoencoder (AE) takes on the role of both the dictionary creation and the assignment metric. The dimension of the encoded layer of the AE corresponds to the size of the dictionary, and the output of its neurons represents the assignment metric. Experimental results for the continuous emotion prediction task on the AVEC 2017 audio dataset show an improvement of the Concordance Correlation Coefficient (CCC) from 0.225 to 0.322 for the arousal dimension and from 0.244 to 0.368 for the valence dimension, relative to the conventional BoW version implemented in a baseline system.
Tasks
Published	2019-07-06
URL	https://arxiv.org/abs/1907.04928v1
PDF	https://arxiv.org/pdf/1907.04928v1.pdf
PWC	https://paperswithcode.com/paper/bag-of-audio-words-based-on-autoencoder
Repo
Framework
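
The conventional BoW step described in the abstract (assign each elementary representation to its closest dictionary entry, then build a histogram) can be sketched as follows. This covers only the baseline, not the autoencoder variant; the Euclidean metric, the tiny `codebook`, and the `frames` values are assumptions made up for the example.

```python
import math

def bow_histogram(frames, codebook):
    """Count how many frames fall closest to each codeword, then normalize."""
    counts = [0] * len(codebook)
    for frame in frames:
        # Assignment metric: Euclidean distance to each dictionary entry.
        nearest = min(
            range(len(codebook)), key=lambda k: math.dist(frame, codebook[k])
        )
        counts[nearest] += 1
    return [c / len(frames) for c in counts]

# Hypothetical 2-D codebook with three entries and four feature frames.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (3.5, 0.2)]
hist = bow_histogram(frames, codebook)
```

The histogram length equals the codebook size, which matches the abstract's remark that the dimension of the AE's encoded layer plays the role of the dictionary size.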

Proceedings 35th International Conference on Logic Programming (Technical Communications)


Title	Proceedings 35th International Conference on Logic Programming (Technical Communications)
Authors	Bart Bogaerts, Esra Erdem, Paul Fodor, Andrea Formisano, Giovambattista Ianni, Daniela Inclezan, German Vidal, Alicia Villanueva, Marina De Vos, Fangkai Yang
Abstract	Since the first conference held in Marseille in 1982, ICLP has been the premier international event for presenting research in logic programming. Contributions are sought in all areas of logic programming, including but not restricted to: Foundations: Semantics, Formalisms, Nonmonotonic reasoning, Knowledge representation. Languages: Concurrency, Objects, Coordination, Mobility, Higher Order, Types, Modes, Assertions, Modules, Meta-programming, Logic-based domain-specific languages, Programming Techniques. Declarative programming: Declarative program development, Analysis, Type and mode inference, Partial evaluation, Abstract interpretation, Transformation, Validation, Verification, Debugging, Profiling, Testing, Execution visualization. Implementation: Virtual machines, Compilation, Memory management, Parallel/distributed execution, Constraint handling rules, Tabling, Foreign interfaces, User interfaces. Related Paradigms and Synergies: Inductive and Co-inductive Logic Programming, Constraint Logic Programming, Answer Set Programming, Interaction with SAT, SMT and CSP solvers, Logic programming techniques for type inference and theorem proving, Argumentation, Probabilistic Logic Programming, Relations to object-oriented and Functional programming. Applications: Databases, Big Data, Data integration and federation, Software engineering, Natural language processing, Web and Semantic Web, Agents, Artificial intelligence, Computational life sciences, Education, Cybersecurity, and Robotics.
Tasks	Automated Theorem Proving
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07646v1
PDF	https://arxiv.org/pdf/1909.07646v1.pdf
PWC	https://paperswithcode.com/paper/proceedings-35th-international-conference-on
Repo
Framework

An interdisciplinary survey of network similarity methods


Title	An interdisciplinary survey of network similarity methods
Authors	Emily Evans, Marissa Graham
Abstract	Comparative graph and network analysis play an important role in both systems biology and pattern recognition, but existing surveys on the topic have historically ignored or underserved one or the other of these fields. We present an integrative introduction to the key objectives and methods of graph and network comparison in each field, with the intent of remaining accessible to relative novices in order to mitigate the barrier to interdisciplinary idea crossover. To guide our investigation, and to quantitatively justify our assertions about what the key objectives and methods of each field are, we have constructed a citation network containing 5,793 vertices from the full reference lists of over two hundred relevant papers, which we collected by searching Google Scholar for ten different network comparison-related search terms. We investigate its basic statistics and community structure, and frame our presentation around the papers found to have high importance according to five different standard centrality measures.
Tasks
Published	2019-05-15
URL	https://arxiv.org/abs/1905.06457v1
PDF	https://arxiv.org/pdf/1905.06457v1.pdf
PWC	https://paperswithcode.com/paper/an-interdisciplinary-survey-of-network
Repo
Framework

Training Modern Deep Neural Networks for Memory-Fault Robustness


Title	Training Modern Deep Neural Networks for Memory-Fault Robustness
Authors	Ghouthi Boukli Hacene, François Leduc-Primeau, Amal Ben Soussia, Vincent Gripon, François Gagnon
Abstract	Because deep neural networks (DNNs) rely on a large number of parameters and computations, their implementation in energy-constrained systems is challenging. In this paper, we investigate the solution of reducing the supply voltage of the memories used in the system, which results in bit-cell faults. We explore the robustness of state-of-the-art DNN architectures towards such defects and propose a regularizer meant to mitigate their effects on accuracy. Our experiments clearly demonstrate the interest of operating the system in a faulty regime to save energy without reducing accuracy.
Tasks
Published	2019-11-23
URL	https://arxiv.org/abs/1911.10287v1
PDF	https://arxiv.org/pdf/1911.10287v1.pdf
PWC	https://paperswithcode.com/paper/training-modern-deep-neural-networks-for
Repo
Framework
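
As a rough illustration of what bit-cell faults do to stored weights, the sketch below flips bits in quantized weight codes. The fault model (each bit flips independently with probability `fault_rate`), the unsigned 8-bit encoding, and the example values are assumptions for illustration; the abstract does not specify the paper's fault model or its regularizer, and neither is reproduced here.

```python
import random

def inject_bit_faults(weights_q, fault_rate, bits=8, rng=None):
    """Return a copy of the weight codes with each stored bit flipped
    independently with probability `fault_rate` (assumed fault model)."""
    rng = rng or random.Random()
    faulty = []
    for w in weights_q:
        for b in range(bits):
            if rng.random() < fault_rate:
                w ^= 1 << b  # flip bit b of the stored word
        faulty.append(w)
    return faulty

# Hypothetical unsigned 8-bit weight codes.
weights_q = [12, 200, 37, 255, 0]
clean = inject_bit_faults(weights_q, fault_rate=0.0, rng=random.Random(0))
noisy = inject_bit_faults(weights_q, fault_rate=0.05, rng=random.Random(0))
inverted = inject_bit_faults(weights_q, fault_rate=1.0, rng=random.Random(0))
```

Applying such a function to the weights during training or evaluation is one simple way to measure how accuracy degrades as the fault rate grows.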

Cost-Effective Incentive Allocation via Structured Counterfactual Inference


Title	Cost-Effective Incentive Allocation via Structured Counterfactual Inference
Authors	Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael I. Jordan, Yuan Qi, Le Song
Abstract	We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to traditional policy optimization frameworks, we take into account the additional reward structure and budget constraints common in this setting, and develop a new two-step method for solving this constrained counterfactual policy optimization problem. Our method first casts the reward estimation problem as a domain adaptation problem with supplementary structure, and then uses the estimators to optimize the policy under constraints. We also establish theoretical error bounds for our estimation procedure, and we empirically show that the approach leads to significant improvement on both synthetic and real datasets.
Tasks	Counterfactual Inference, Domain Adaptation
Published	2019-02-07
URL	https://arxiv.org/abs/1902.02495v3
PDF	https://arxiv.org/pdf/1902.02495v3.pdf
PWC	https://paperswithcode.com/paper/cost-effective-incentive-allocation-via
Repo
Framework

Fast Decomposable Submodular Function Minimization using Constrained Total Variation


Title	Fast Decomposable Submodular Function Minimization using Constrained Total Variation
Authors	K S Sesh Kumar, Francis Bach, Thomas Pock
Abstract	We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function. Most existing approaches reformulate the problem as the convex minimization of the sum of the corresponding Lovász extensions and the squared Euclidean norm, leading to algorithms requiring total variation oracles of the summand functions; without further assumptions, these more complex oracles require many calls to the simpler minimization oracles often available in practice. In this paper, we consider a modified convex problem requiring a constrained version of the total variation oracles that can be solved with significantly fewer calls to the simple minimization oracles. We support our claims by showing results on graph cuts for 2D and 3D graphs.
Tasks
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11327v1
PDF	https://arxiv.org/pdf/1905.11327v1.pdf
PWC	https://paperswithcode.com/paper/fast-decomposable-submodular-function
Repo
Framework

A Measure of Similarity in Textual Data Using Spearman’s Rank Correlation Coefficient


Title	A Measure of Similarity in Textual Data Using Spearman’s Rank Correlation Coefficient
Authors	Nino Arsov, Milan Dukovski, Blagoja Evkoski, Stefan Cvetkovski
Abstract	In the last decade, many diverse advances have occurred in the field of information extraction from data. Information extraction in its simplest form takes place in computing environments, where structured data can be extracted through a series of queries. The continuous expansion of quantities of data has therefore provided an opportunity for knowledge extraction (KE) from a textual document (TD). A typical problem of this kind is the extraction of common characteristics and knowledge from a group of TDs, with the possibility of grouping similar TDs in a process known as clustering. In this paper we present a technique for such KE among a group of TDs related by the common characteristics and meaning of their content. Our technique is based on Spearman’s Rank Correlation Coefficient (SRCC), which the conducted experiments have shown to be a comprehensive measure for achieving high-quality KE.
Tasks
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11750v1
PDF	https://arxiv.org/pdf/1911.11750v1.pdf
PWC	https://paperswithcode.com/paper/a-measure-of-similarity-in-textual-data-using
Repo
Framework
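
A minimal sketch of how Spearman's coefficient could score the similarity of two documents is given below. The abstract does not say how documents are turned into vectors, so the representation used here (term counts over the shared vocabulary, whitespace tokenization, average ranks for ties) is an assumption, and `document_similarity` is a hypothetical helper name.

```python
import math
from collections import Counter

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the two rank vectors.
    Undefined (division by zero) if either vector is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def document_similarity(doc_a, doc_b):
    """Compare term-count vectors of two documents over their joint vocabulary."""
    ca, cb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    return spearman([ca[t] for t in vocab], [cb[t] for t in vocab])

same = document_similarity("a a a b b c", "a a a b b c")
```

Because the coefficient depends only on ranks, two documents that order their terms by frequency in the same way score 1 even if the raw counts differ.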

Effectiveness of self-supervised pre-training for speech recognition


Title	Effectiveness of self-supervised pre-training for speech recognition
Authors	Alexei Baevski, Michael Auli, Abdelrahman Mohamed
Abstract	We compare self-supervised representation learning algorithms which either explicitly quantize the audio data or learn representations without quantization. We find the former to be more accurate since it builds a good vocabulary of the data through vq-wav2vec [1] to enable learning of effective representations in subsequent BERT training. Different to previous work, we directly fine-tune the pre-trained BERT models on transcribed speech using a Connectionist Temporal Classification (CTC) loss instead of feeding the representations into a task-specific model. We also propose a BERT-style model learning directly from the continuous audio data and compare pre-training on raw audio to spectral features. Fine-tuning a BERT model on 10 hours of labeled Librispeech data with a vq-wav2vec vocabulary is almost as good as the best known reported system trained on 100 hours of labeled data on test-clean, while achieving a 25% WER reduction on test-other. When using only 10 minutes of labeled data, WER is 25.2 on test-other and 16.3 on test-clean. This demonstrates that self-supervision can enable speech recognition systems trained on a near-zero amount of transcribed data.
Tasks	Language Modelling, Quantization, Representation Learning, Speech Recognition
Published	2019-11-10
URL	https://arxiv.org/abs/1911.03912v2
PDF	https://arxiv.org/pdf/1911.03912v2.pdf
PWC	https://paperswithcode.com/paper/effectiveness-of-self-supervised-pre-training
Repo
Framework
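
The results above are reported as word error rate (WER). As a reminder of what that number measures, here is a small sketch that computes WER as word-level edit distance divided by the reference length. The example sentences are made up, and the relative-reduction helper assumes the "25% WER reduction" is meant relatively, which the abstract does not state explicitly.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)

def relative_reduction(old_wer, new_wer):
    """Relative WER reduction, e.g. 20.0 -> 15.0 is a 25% reduction."""
    return (old_wer - new_wer) / old_wer

wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

WER can exceed 1.0 when the hypothesis contains many insertions, so it is an error rate rather than a bounded accuracy score.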

A Polynomial-time Solution for Robust Registration with Extreme Outlier Rates


Title	A Polynomial-time Solution for Robust Registration with Extreme Outlier Rates
Authors	Heng Yang, Luca Carlone
Abstract	We propose a robust approach for the registration of two sets of 3D points in the presence of a large amount of outliers. Our first contribution is to reformulate the registration problem using a Truncated Least Squares (TLS) cost that makes the estimation insensitive to a large fraction of spurious point-to-point correspondences. The second contribution is a general framework to decouple rotation, translation, and scale estimation, which allows solving in cascade for the three transformations. Since each subproblem (scale, rotation, and translation estimation) is still non-convex and combinatorial in nature, our third contribution is to show that (i) TLS scale and (component-wise) translation estimation can be solved exactly and in polynomial time via an adaptive voting scheme, (ii) TLS rotation estimation can be relaxed to a semidefinite program and the relaxation is tight in practice, even in the presence of an extreme amount of outliers. We validate the proposed algorithm, named TEASER (Truncated least squares Estimation And SEmidefinite Relaxation), in standard registration benchmarks showing that the algorithm outperforms RANSAC and robust local optimization techniques, and favorably compares with Branch-and-Bound methods, while being a polynomial-time algorithm. TEASER can tolerate up to 99% outliers and returns highly-accurate solutions.
Tasks
Published	2019-03-20
URL	https://arxiv.org/abs/1903.08588v2
PDF	https://arxiv.org/pdf/1903.08588v2.pdf
PWC	https://paperswithcode.com/paper/a-polynomial-time-solution-for-robust
Repo
Framework
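
To illustrate why a scalar TLS problem can be solved by enumeration rather than by local search, here is a minimal sketch of a voting-style estimator in one dimension. It is written from the abstract's description only and is not the TEASER implementation: the function names, the noise bounds `sigmas`, the threshold `c`, and the test data are assumptions. The idea is that the set of inliers only changes at interval endpoints, so it suffices to try one weighted mean per interval between consecutive endpoints and keep the candidate with the lowest TLS cost.

```python
def tls_cost(s, measurements, sigmas, c):
    """Truncated least squares: each squared residual is capped at c^2."""
    return sum(
        min(((s - m) / sg) ** 2, c * c) for m, sg in zip(measurements, sigmas)
    )

def tls_estimate_1d(measurements, sigmas, c=1.0):
    """Enumerate candidate inlier (consensus) sets and return (estimate, cost)."""
    # Measurement k counts as an inlier for s when |s - m_k| <= c * sigma_k.
    endpoints = sorted(
        e for m, sg in zip(measurements, sigmas) for e in (m - c * sg, m + c * sg)
    )
    best_s, best_cost = None, float("inf")
    for lo, hi in zip(endpoints, endpoints[1:]):
        mid = (lo + hi) / 2
        consensus = [
            (m, sg) for m, sg in zip(measurements, sigmas) if abs(mid - m) <= c * sg
        ]
        if not consensus:
            continue
        # Weighted least-squares estimate restricted to this consensus set.
        weights = [1 / sg ** 2 for _, sg in consensus]
        s = sum(w * m for w, (m, _) in zip(weights, consensus)) / sum(weights)
        cost = tls_cost(s, measurements, sigmas, c)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s, best_cost

# Four measurements near 2.0 plus three gross outliers (hypothetical data).
measurements = [2.0, 2.1, 1.9, 2.05, 50.0, -30.0, 100.0]
sigmas = [0.2] * len(measurements)
estimate, cost = tls_estimate_1d(measurements, sigmas, c=1.0)
```

With K measurements there are at most 2K - 1 intervals to try, so the search is polynomial; each outlier contributes only the constant c^2 to the cost, however far away it lies.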

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?


Title	Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Authors	Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang
Abstract	Modern deep learning methods provide effective means to learn good representations. However, is a good representation itself sufficient for sample efficient reinforcement learning? This question has largely been studied only with respect to (worst-case) approximation error, in the more classical approximate dynamic programming literature. With regard to the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit sample efficient reinforcement learning, with little understanding of what the necessary conditions for efficient reinforcement learning are. This work shows that, from the statistical viewpoint, the situation is far subtler than suggested by the more traditional approximation viewpoint, where the requirements on the representation that suffice for sample efficient RL are even more stringent. Our main results provide sharp thresholds for reinforcement learning methods, showing that there are hard limitations on what constitutes good function approximation (in terms of the dimensionality of the representation), where we focus on natural representational conditions relevant to value-based, model-based, and policy-based learning. These lower bounds highlight that having a good (value-based, model-based, or policy-based) representation in and of itself is insufficient for efficient reinforcement learning, unless the quality of this approximation passes certain hard thresholds. Furthermore, our lower bounds also imply exponential separations on the sample complexity between 1) value-based learning with perfect representation and value-based learning with a good-but-not-perfect representation, 2) value-based learning and policy-based learning, 3) policy-based learning and supervised learning, and 4) reinforcement learning and imitation learning.
Tasks	Imitation Learning
Published	2019-10-07
URL	https://arxiv.org/abs/1910.03016v4
PDF	https://arxiv.org/pdf/1910.03016v4.pdf
PWC	https://paperswithcode.com/paper/is-a-good-representation-sufficient-for
Repo
Framework

How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering


Title	How to Pre-Train Your Model? Comparison of Different Pre-Training Models for Biomedical Question Answering
Authors	Sanjay Kamath, Brigitte Grau, Yue Ma
Abstract	Using deep learning models on small-scale datasets tends to result in overfitting. To overcome this problem, the process of pre-training a model and fine-tuning it on the small-scale dataset has been used extensively in domains such as image processing. Similarly, for question answering, pre-training and fine-tuning can be done in several ways. Commonly, reading comprehension models are used for pre-training, but we show that other types of pre-training can work better. We compare two pre-training models, based on reading comprehension and open domain question answering models, and determine their performance when fine-tuned and tested on the BIOASQ question answering dataset. We find the open domain question answering model to be a better fit for this task than the reading comprehension model.
Tasks	Open-Domain Question Answering, Question Answering, Reading Comprehension
Published	2019-11-02
URL	https://arxiv.org/abs/1911.00712v1
PDF	https://arxiv.org/pdf/1911.00712v1.pdf
PWC	https://paperswithcode.com/paper/how-to-pre-train-your-model-comparison-of
Repo
Framework

Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis


Title	Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis
Authors	Heyi Li, Dongdong Chen, William H. Nailon, Mike E. Davies, David I. Laurenson
Abstract	Computer-aided breast cancer diagnosis in mammography is limited by inadequate data and the similarity between benign and cancerous masses. To address this, we propose a signed graph regularized deep neural network with adversarial augmentation, named DiagNet. Firstly, we use adversarial learning to generate positive and negative mass-contained mammograms for each mass class. After that, a signed similarity graph is built upon the expanded data to further highlight the discrimination. Finally, a deep convolutional neural network is trained by jointly optimizing the signed graph regularization and classification loss. Experiments show that the DiagNet framework outperforms the state-of-the-art in breast mass diagnosis in mammography.
Tasks
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00300v2
PDF	https://arxiv.org/pdf/1907.00300v2.pdf
PWC	https://paperswithcode.com/paper/signed-laplacian-deep-learning-with
Repo
Framework
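
For readers unfamiliar with signed graphs, the sketch below shows the standard signed Laplacian and the smoothness penalty it induces. This is the textbook construction, offered as background only: the abstract does not give DiagNet's exact regularizer, and the weight matrix `W` and the scalar embeddings `x` are hypothetical. A positive edge pulls two embeddings together, while a negative edge pushes one towards the negation of the other.

```python
def signed_laplacian(W):
    """L = D - W, where D is diagonal with D[i][i] = sum_j |W[i][j]|.
    W is assumed symmetric with a zero diagonal."""
    n = len(W)
    return [
        [(sum(abs(w) for w in W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]

def signed_smoothness(W, x):
    """x^T L x, which equals the sum over edges {i, j} of
    |W[i][j]| * (x[i] - sign(W[i][j]) * x[j]) ** 2."""
    L = signed_laplacian(W)
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

# Node 0 is similar to node 1 (weight +1) and dissimilar to node 2 (weight -1).
W = [
    [0.0, 1.0, -1.0],
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
]
consistent = signed_smoothness(W, [1.0, 1.0, -1.0])  # respects both edge signs
violating = signed_smoothness(W, [1.0, 1.0, 1.0])    # ignores the negative edge
```

Adding such a term to a classification loss penalizes feature assignments that place dissimilar samples close together, which is the general role a signed graph regularizer plays.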