October 16, 2019

3448 words 17 mins read

Paper Group ANR 1124

Lazy Modeling of Variants of Token Swapping Problem and Multi-agent Path Finding through Combination of Satisfiability Modulo Theories and Conflict-based Search. VideoCapsuleNet: A Simplified Network for Action Detection. Active Learning with Logged Data. Multi-Sensor Integration for Indoor 3D Reconstruction. Assigning a Grade: Accurate Measurement …

Lazy Modeling of Variants of Token Swapping Problem and Multi-agent Path Finding through Combination of Satisfiability Modulo Theories and Conflict-based Search

Title Lazy Modeling of Variants of Token Swapping Problem and Multi-agent Path Finding through Combination of Satisfiability Modulo Theories and Conflict-based Search
Authors Pavel Surynek
Abstract We address item relocation problems in graphs in this paper. We assume items are placed in vertices of an undirected graph with at most one item per vertex. Items can be moved across edges while various constraints depending on the type of relocation problem must be satisfied. We introduce a general problem formulation that encompasses known types of item relocation problems such as multi-agent path finding (MAPF) and token swapping (TSWAP). In this formulation we express two new types of relocation problems derived from token swapping that we call token rotation (TROT) and token permutation (TPERM). Our solving approach for item relocation combines satisfiability modulo theory (SMT) with conflict-based search (CBS). We interpret CBS in the SMT framework where we start with the basic model and refine the model with a collision resolution constraint whenever a collision between items occurs in the current solution. The key difference between the standard CBS and our SMT-based modification of CBS (SMT-CBS) is that the standard CBS branches the search to resolve the collision while in SMT-CBS we iteratively add a single disjunctive collision resolution constraint. Experimental evaluation on several benchmarks shows that the SMT-CBS algorithm significantly outperforms the standard CBS. We also compare SMT-CBS with a modification of the SAT-based MDD-SAT solver that uses an eager modeling of item relocation in which all potential collisions are eliminated by constraints in advance. Experiments show that the lazy approach in SMT-CBS produces fewer constraints than MDD-SAT and also achieves faster solving run-times.
Tasks Multi-Agent Path Finding
Published 2018-09-16
URL http://arxiv.org/abs/1809.05959v1
PDF http://arxiv.org/pdf/1809.05959v1.pdf
PWC https://paperswithcode.com/paper/lazy-modeling-of-variants-of-token-swapping
Repo
Framework
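For the entry above, the following is a minimal, illustrative sketch of the lazy refinement idea behind SMT-CBS, written against the z3-solver Python package. The toy instance (two agents on a 4-cycle), the Boolean encoding x[agent, vertex, time], and the restriction to vertex collisions are simplifications made here, not the authors' encoding or implementation.

```python
# Lazy collision refinement in the spirit of SMT-CBS (illustrative sketch, not the
# authors' implementation). Requires the z3-solver package: pip install z3-solver
from itertools import combinations
from z3 import Solver, Bool, Or, And, Not, Implies, sat, is_true

T = 2                                   # makespan (fixed for this toy instance)
vertices = [0, 1, 2, 3]                 # 4-cycle: 0-1-2-3-0
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
agents = {"a": (0, 2), "b": (2, 0)}     # agent: (start, goal)

x = {(a, v, t): Bool(f"x_{a}_{v}_{t}") for a in agents for v in vertices for t in range(T + 1)}

s = Solver()
for a, (start, goal) in agents.items():
    s.add(x[a, start, 0], x[a, goal, T])
    for t in range(T + 1):
        s.add(Or([x[a, v, t] for v in vertices]))                 # at least one vertex per step
        for u, v in combinations(vertices, 2):                    # at most one vertex per step
            s.add(Not(And(x[a, u, t], x[a, v, t])))
    for t in range(T):                                            # move along edges or wait
        for v in vertices:
            s.add(Implies(x[a, v, t], Or([x[a, u, t + 1] for u in neighbors[v] + [v]])))

# Lazy loop: no inter-agent constraints up front; add one disjunctive
# collision-resolution clause per detected vertex collision, then re-solve.
while s.check() == sat:
    m = s.model()
    plan = {a: [next(v for v in vertices if is_true(m.evaluate(x[a, v, t], model_completion=True)))
                for t in range(T + 1)] for a in agents}
    collisions = [(a1, a2, t) for a1, a2 in combinations(agents, 2)
                  for t in range(T + 1) if plan[a1][t] == plan[a2][t]]
    if not collisions:
        print("collision-free plan:", plan)
        break
    a1, a2, t = collisions[0]
    v = plan[a1][t]
    s.add(Or(Not(x[a1, v, t]), Not(x[a2, v, t])))
else:
    print("no plan with makespan", T)
```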

VideoCapsuleNet: A Simplified Network for Action Detection

Title VideoCapsuleNet: A Simplified Network for Action Detection
Authors Kevin Duarte, Yogesh S Rawat, Mubarak Shah
Abstract The recent advances in Deep Convolutional Neural Networks (DCNNs) have shown extremely good results for video human action classification; however, action detection is still a challenging problem. The current action detection approaches follow a complex pipeline which involves multiple tasks such as tube proposals, optical flow, and tube classification. In this work, we present a more elegant solution for action detection based on the recently developed capsule network. We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection which can jointly perform pixel-wise action segmentation along with action classification. The proposed network is a generalization of capsule network from 2D to 3D, which takes a sequence of video frames as input. The 3D generalization drastically increases the number of capsules in the network, making capsule routing computationally expensive. To address this issue, we introduce capsule-pooling in the convolutional capsule layer, which makes the voting algorithm tractable. The routing-by-agreement in the network inherently models the action representations and various action characteristics are captured by the predicted capsules. This inspired us to utilize the capsules for action localization, and the class-specific capsules predicted by the network are used to determine a pixel-wise localization of actions. The localization is further improved by parameterized skip connections with the convolutional capsule layers, and the network is trained end-to-end with a classification as well as a localization loss. The proposed network achieves state-of-the-art performance on multiple action detection datasets including UCF-Sports, J-HMDB, and UCF-101 (24 classes) with an impressive ~20% improvement on UCF-101 and ~15% improvement on J-HMDB in terms of v-mAP scores.
Tasks Action Classification, Action Detection, Action Localization, action segmentation, Multiple Action Detection, Optical Flow Estimation
Published 2018-05-21
URL http://arxiv.org/abs/1805.08162v1
PDF http://arxiv.org/pdf/1805.08162v1.pdf
PWC https://paperswithcode.com/paper/videocapsulenet-a-simplified-network-for
Repo
Framework
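A small numpy sketch of the capsule-pooling idea mentioned in the entry above: capsule poses are averaged over a receptive field per input capsule type, so routing sees one vote per type per window rather than one per spatial position. The tensor layout, window size, and the single shared voting matrix are illustrative assumptions, not the VideoCapsuleNet layer definition.

```python
# Illustrative "capsule-pooling": average capsule poses inside each 3D window per
# input capsule type before voting. Shapes, stride, and the voting matrix are assumptions.
import numpy as np

def capsule_pool(caps, window=(2, 2, 2)):
    """caps: (T, H, W, n_types, dim) capsule poses; non-overlapping 3D windows."""
    T, H, W, n, d = caps.shape
    kt, kh, kw = window
    caps = caps[: T - T % kt, : H - H % kh, : W - W % kw]          # crop to a multiple of the window
    caps = caps.reshape(T // kt, kt, H // kh, kh, W // kw, kw, n, d)
    return caps.mean(axis=(1, 3, 5))                               # (T', H', W', n_types, dim)

rng = np.random.default_rng(0)
in_caps = rng.normal(size=(8, 16, 16, 4, 8))       # toy video capsule grid
pooled = capsule_pool(in_caps)
# One vote per (window, input type, output type) via a shared transformation matrix.
W_vote = rng.normal(size=(4, 6, 8, 8))             # (in_types, out_types, dim_in, dim_out)
votes = np.einsum("thwid,iode->thwioe", pooled, W_vote)
print(pooled.shape, votes.shape)                   # (4, 8, 8, 4, 8) (4, 8, 8, 4, 6, 8)
```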

Active Learning with Logged Data

Title Active Learning with Logged Data
Authors Songbai Yan, Kamalika Chaudhuri, Tara Javidi
Abstract We consider active learning with logged data, where labeled examples are drawn conditioned on a predetermined logging policy, and the goal is to learn a classifier on the entire population, not just conditioned on the logging policy. Prior work addresses this problem either when only logged data is available, or purely in a controlled random experimentation setting where the logged data is ignored. In this work, we combine both approaches to provide an algorithm that uses logged data to bootstrap and inform experimentation, thus achieving the best of both worlds. Our work is inspired by a connection between controlled random experimentation and active learning, and modifies existing disagreement-based active learning algorithms to exploit logged data.
Tasks Active Learning
Published 2018-02-25
URL http://arxiv.org/abs/1802.09069v3
PDF http://arxiv.org/pdf/1802.09069v3.pdf
PWC https://paperswithcode.com/paper/active-learning-with-logged-data
Repo
Framework
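The paper's algorithm modifies disagreement-based active learning; as a hedged illustration of one ingredient that learning from logged data relies on, the sketch below shows how inverse-propensity weighting turns error estimates computed on policy-logged labels into unbiased estimates for the whole population. The logging policy, classifier family, and propensity clipping are toy assumptions, not the authors' setup.

```python
# Minimal sketch of inverse-propensity weighting of logged examples (a building
# block for learning from data labeled under a logging policy, not the paper's
# full algorithm). propensity(x) = probability the logging policy labeled x.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1, 1, size=n)
y = (x + 0.1 * rng.normal(size=n) > 0).astype(int)        # true labels
propensity = np.clip(0.9 - 0.8 * (x > 0), 0.1, 0.9)        # policy rarely labels x > 0
logged = rng.uniform(size=n) < propensity                   # which examples received labels

def iw_error(threshold):
    """Importance-weighted 0/1 error of the classifier 1[x > threshold],
    estimated from logged examples only, but unbiased for the whole population."""
    pred = (x > threshold).astype(int)
    wrong = (pred != y).astype(float)
    return np.mean(logged * wrong / propensity)

naive = np.mean((x[logged] > 0.3).astype(int) != y[logged])  # biased toward the labeled region
print(f"naive logged-data error at t=0.3: {naive:.3f}, IPW estimate: {iw_error(0.3):.3f}")
```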

Multi-Sensor Integration for Indoor 3D Reconstruction

Title Multi-Sensor Integration for Indoor 3D Reconstruction
Authors Jacky C. K. Chow
Abstract Outdoor maps and navigation information delivered by modern services and technologies like Google Maps and Garmin navigators have revolutionized the lifestyle of many people. Motivated by the desire for similar navigation systems for indoor usage from consumers, advertisers, emergency rescuers/responders, etc., many indoor environments such as shopping malls, museums, casinos, airports, transit stations, offices, and schools need to be mapped. Typically, the environment is first reconstructed by capturing many point clouds from various stations and defining their spatial relationships. Currently, there is a lack of an accurate, rigorous, and speedy method for relating point clouds in indoor, urban, satellite-denied environments. This thesis presents a novel and automatic way for fusing calibrated point clouds obtained using a terrestrial laser scanner and the Microsoft Kinect by integrating them with a low-cost inertial measurement unit. The developed system, titled the Scannect, is the first joint static-kinematic indoor 3D mapper.
Tasks 3D Reconstruction
Published 2018-02-22
URL http://arxiv.org/abs/1802.07866v1
PDF http://arxiv.org/pdf/1802.07866v1.pdf
PWC https://paperswithcode.com/paper/multi-sensor-integration-for-indoor-3d
Repo
Framework
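As context for "defining the spatial relationships" between point clouds captured from different stations in the entry above, here is a generic rigid-alignment step (Kabsch/SVD on corresponding points). It is a textbook building block, not the Scannect fusion pipeline, and the synthetic correspondences are assumed known.

```python
# Generic building block for relating point clouds from different stations:
# least-squares rigid alignment of corresponding points (Kabsch via SVD).
import numpy as np

def rigid_align(src, dst):
    """Return R, t minimizing ||R @ src_i + t - dst_i|| over proper rotations R."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                      # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

rng = np.random.default_rng(2)
cloud = rng.normal(size=(100, 3))                 # toy "station A" point cloud
angle = np.deg2rad(30)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
moved = cloud @ R_true.T + np.array([0.5, -1.0, 2.0])   # same scene seen from "station B"
R, t = rigid_align(cloud, moved)
print(np.allclose(R, R_true, atol=1e-6), np.round(t, 3))
```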

Assigning a Grade: Accurate Measurement of Road Quality Using Satellite Imagery

Title Assigning a Grade: Accurate Measurement of Road Quality Using Satellite Imagery
Authors Gabriel Cadamuro, Aggrey Muhebwa, Jay Taneja
Abstract Roads are critically important infrastructure to societal and economic development, with huge investments made by governments every year. However, methods for monitoring those investments tend to be time-consuming, laborious, and expensive, placing them out of reach for many developing regions. In this work, we develop a model for monitoring the quality of road infrastructure using satellite imagery. For this task, we harness two trends: the increasing availability of high-resolution, often-updated satellite imagery, and the enormous improvement in speed and accuracy of convolutional neural network-based methods for performing computer vision tasks. We employ a unique dataset of road quality information on 7000km of roads in Kenya combined with 50cm resolution satellite imagery. We create models for a binary classification task as well as a comprehensive 5-category classification task, with accuracy scores of 88 and 73 percent respectively. We also provide evidence of the robustness of our methods with challenging held-out scenarios, though we note some improvement is still required for confident analysis of a never before seen road. We believe these results are well-positioned to have substantial impact on a broad set of transport applications.
Tasks
Published 2018-12-01
URL http://arxiv.org/abs/1812.01699v2
PDF http://arxiv.org/pdf/1812.01699v2.pdf
PWC https://paperswithcode.com/paper/assigning-a-grade-accurate-measurement-of
Repo
Framework
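A hedged skeleton of the kind of convolutional classifier the task above calls for (5-category road-quality prediction from satellite image patches), using PyTorch. Layer sizes, patch size, and the random labels are stand-ins; the authors' architecture and training setup are described in the paper.

```python
# Generic small-CNN patch classifier for a 5-category task (illustrative stand-in,
# not the authors' model).
import torch
import torch.nn as nn

class RoadQualityCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                          # x: (batch, 3, H, W) image patches
        return self.classifier(self.features(x).flatten(1))

model = RoadQualityCNN()
patches = torch.randn(8, 3, 128, 128)              # e.g. 128x128 patches of 50 cm imagery
logits = model(patches)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 5, (8,)))   # toy labels
print(logits.shape, float(loss))
```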

A rotation-equivariant convolutional neural network model of primary visual cortex

Title A rotation-equivariant convolutional neural network model of primary visual cortex
Authors Alexander S. Ecker, Fabian H. Sinz, Emmanouil Froudarakis, Paul G. Fahey, Santiago A. Cadena, Edgar Y. Walker, Erick Cobos, Jacob Reimer, Andreas S. Tolias, Matthias Bethge
Abstract Classical models describe primary visual cortex (V1) as a filter bank of orientation-selective linear-nonlinear (LN) or energy models, but these models fail to predict neural responses to natural stimuli accurately. Recent work shows that models based on convolutional neural networks (CNNs) lead to much more accurate predictions, but it remains unclear which features are extracted by V1 neurons beyond orientation selectivity and phase invariance. Here we work towards systematically studying V1 computations by categorizing neurons into groups that perform similar computations. We present a framework to identify common features independent of individual neurons’ orientation selectivity by using a rotation-equivariant convolutional neural network, which automatically extracts every feature at multiple different orientations. We fit this model to responses of a population of 6000 neurons to natural images recorded in mouse primary visual cortex using two-photon imaging. We show that our rotation-equivariant network not only outperforms a regular CNN with the same number of feature maps, but also reveals a number of common features shared by many V1 neurons, which deviate from the typical textbook idea of V1 as a bank of Gabor filters. Our findings are a first step towards a powerful new tool to study the nonlinear computations in V1.
Tasks
Published 2018-09-27
URL http://arxiv.org/abs/1809.10504v1
PDF http://arxiv.org/pdf/1809.10504v1.pdf
PWC https://paperswithcode.com/paper/a-rotation-equivariant-convolutional-neural
Repo
Framework
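A minimal sketch of the weight-sharing idea in the entry above: each learned filter is applied at several orientations, so every feature is extracted at multiple orientations from a single set of weights. For simplicity the sketch uses exact 90-degree rotations via torch.rot90; the paper's network samples orientations more finely with rotation-equivariant convolutions.

```python
# Rotation weight sharing: one kernel per feature, applied at four orientations.
import torch
import torch.nn.functional as F

n_features, in_ch, k = 8, 1, 7
base = torch.nn.Parameter(torch.randn(n_features, in_ch, k, k))   # one kernel per feature

def rotated_bank(weights):
    """Stack the same kernels at 0/90/180/270 degrees: (n_features * 4, in_ch, k, k)."""
    rots = [torch.rot90(weights, r, dims=(2, 3)) for r in range(4)]
    return torch.cat(rots, dim=0)

images = torch.randn(2, 1, 32, 32)                 # e.g. natural-image stimuli
responses = F.conv2d(images, rotated_bank(base), padding=k // 2)
print(responses.shape)                             # (2, 32, 32, 32): channels = 8 features x 4 orientations
```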

Features Extraction Based on an Origami Representation of 3D Landmarks

Title Features Extraction Based on an Origami Representation of 3D Landmarks
Authors Juan Manuel Fernandez Montenegro, Mahdi Maktab Dar Oghaz, Athanasios Gkelias, Georgios Tzimiropoulos, Vasileios Argyriou
Abstract Feature extraction has been widely investigated in the computer vision community during the last decades due to the large range of possible applications. Significant work has been done in order to improve the performance of emotion detection methods. Classification algorithms have been refined, novel preprocessing techniques have been applied and novel representations from images and videos have been introduced. In this paper, we propose a preprocessing method and a novel facial landmarks’ representation aiming to improve the facial emotion detection accuracy. We apply our novel methodology on the extended Cohn-Kanade (CK+) dataset and other datasets for affect classification based on Action Units (AU). The performance evaluation demonstrates an improvement on facial emotion classification (accuracy and F1 score) that indicates the superiority of the proposed methodology.
Tasks Emotion Classification
Published 2018-12-12
URL http://arxiv.org/abs/1812.05082v1
PDF http://arxiv.org/pdf/1812.05082v1.pdf
PWC https://paperswithcode.com/paper/features-extraction-based-on-an-origami
Repo
Framework

Temporal graph-based clustering for historical record linkage

Title Temporal graph-based clustering for historical record linkage
Authors Charini Nanayakkara, Peter Christen, Thilina Ranbaduge
Abstract Research in the social sciences is increasingly based on large and complex data collections, where individual data sets from different domains are linked and integrated to allow advanced analytics. Popular types of data used in such a context are historical censuses, as well as birth, death, and marriage certificates. Individually, such data sets however limit the types of studies that can be conducted. Specifically, it is impossible to track individuals, families, or households over time. Once such data sets are linked and family trees spanning several decades are available, it is possible to, for example, investigate how education, health, mobility, employment, and social status influence each other and the lives of people over two or even more generations. A major challenge, however, is the accurate linkage of historical data sets, due to data quality issues and, commonly, the lack of available ground truth data. Unsupervised techniques need to be employed, which can be based on similarity graphs generated by comparing individual records. In this paper we present initial results from clustering birth records from Scotland, where we aim to identify all births by the same mother and group siblings into clusters. We extend an existing clustering technique for record linkage by incorporating temporal constraints that must hold between births by the same mother, and propose a novel greedy temporal clustering technique. Experimental results show improvements over non-temporal approaches; however, further work is needed to obtain links of high quality.
Tasks
Published 2018-07-06
URL http://arxiv.org/abs/1807.02262v1
PDF http://arxiv.org/pdf/1807.02262v1.pdf
PWC https://paperswithcode.com/paper/temporal-graph-based-clustering-for
Repo
Framework
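An illustrative sketch of greedy clustering under a temporal constraint, in the spirit of the approach described above: candidate links between birth records are merged best-first, but a merge is rejected if any two births in the resulting cluster would be implausibly close in time. The 270-day minimum gap, the records, and the similarity scores are made-up assumptions, not the paper's constraint or data.

```python
# Greedy temporal clustering sketch: merge most-similar record pairs first, subject
# to a temporal plausibility check over every pair inside the merged cluster.
from datetime import date
from itertools import combinations

records = {1: date(1861, 3, 1), 2: date(1862, 5, 10), 3: date(1861, 5, 20), 4: date(1864, 1, 2)}
similar_pairs = [(0.95, 1, 2), (0.90, 1, 3), (0.85, 2, 4)]   # (similarity, record, record)

MIN_GAP_DAYS = 270                                            # assumed minimum gap between births

def temporally_consistent(cluster):
    return all(abs((records[a] - records[b]).days) >= MIN_GAP_DAYS
               for a, b in combinations(cluster, 2))

cluster_of = {r: {r} for r in records}
for sim, a, b in sorted(similar_pairs, reverse=True):         # greedy, best link first
    merged = cluster_of[a] | cluster_of[b]
    if cluster_of[a] is not cluster_of[b] and temporally_consistent(merged):
        for r in merged:
            cluster_of[r] = merged

print({r: sorted(c) for r, c in cluster_of.items()})
# Records 1 and 3 are only ~80 days apart, so that link is rejected; 1-2 and 2-4 merge.
```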

Record Linkage to Match Customer Names: A Probabilistic Approach

Title Record Linkage to Match Customer Names: A Probabilistic Approach
Authors Bahare Fatemi, Seyed Mehran Kazemi, David Poole
Abstract Consider the following problem: given a database of records indexed by names (e.g., names of companies, restaurants, businesses, or universities) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is an instance of the record linkage problem and is challenging because people do not consistently use the official name, but use abbreviations, synonyms, different orderings of terms, different spellings of terms, short forms of terms, and the name can contain typos or spacing issues. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query and find the best record(s) with respect to the probabilities. Building on term-matching and translational approaches for search, our model addresses many of the aforementioned challenges and provides good results when existing baselines fail. Using the probabilities output by the model, we can automate the search process for the portion of queries whose desired documents get a probability higher than a trust threshold. We evaluate our model on a large real-world dataset from a telecommunications company and compare it to several state-of-the-art baselines. The obtained results show that our model is a promising probabilistic model for record linkage for names. We also test whether the knowledge learned by our model on one domain can be effectively transferred to a new domain. For this purpose, we test our model on an unseen test set from the business names of the secondString dataset. Promising results show that our model can be effectively applied to unseen datasets. Finally, we study the sensitivity of our model to the statistics of datasets.
Tasks
Published 2018-06-26
URL http://arxiv.org/abs/1806.10928v1
PDF http://arxiv.org/pdf/1806.10928v1.pdf
PWC https://paperswithcode.com/paper/record-linkage-to-match-customer-names-a
Repo
Framework
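A toy sketch of the probabilistic matching workflow described above: simple term-matching features are combined by a logistic model into a match probability per record, and the query is auto-resolved only when the best probability clears a trust threshold. The features and hand-set weights are hypothetical; the paper's model is a relational logistic regression with richer features.

```python
# Toy probabilistic name matching: logistic score over term-matching features,
# with a trust threshold deciding whether the search can be automated.
import math
from difflib import SequenceMatcher

database = ["International Business Machines", "Univ. of British Columbia", "Poole Plumbing Ltd"]

def features(query, record):
    q, r = query.lower().split(), record.lower().split()
    jaccard = len(set(q) & set(r)) / len(set(q) | set(r))
    char_ratio = SequenceMatcher(None, query.lower(), record.lower()).ratio()
    return [1.0, jaccard, char_ratio]                      # bias + two term-matching features

WEIGHTS = [-4.0, 4.5, 5.0]                                  # hypothetical learned weights

def match_probability(query, record):
    z = sum(w * f for w, f in zip(WEIGHTS, features(query, record)))
    return 1.0 / (1.0 + math.exp(-z))

query = "university of british columbia"
scored = sorted(((match_probability(query, r), r) for r in database), reverse=True)
best_p, best_record = scored[0]
print(best_record if best_p > 0.8 else "route to manual search", round(best_p, 3))
```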

Cycle-consistency training for end-to-end speech recognition

Title Cycle-consistency training for end-to-end speech recognition
Authors Takaaki Hori, Ramon Astudillo, Tomoki Hayashi, Yu Zhang, Shinji Watanabe, Jonathan Le Roux
Abstract This paper presents a method to train end-to-end automatic speech recognition (ASR) models using unpaired data. Although the end-to-end approach can eliminate the need for expert knowledge such as pronunciation dictionaries to build ASR systems, it still requires a large amount of paired data, i.e., speech utterances and their transcriptions. Cycle-consistency losses have been recently proposed as a way to mitigate the problem of limited paired data. These approaches compose a reverse operation with a given transformation, e.g., text-to-speech (TTS) with ASR, to build a loss that only requires unsupervised data, speech in this example. Applying cycle consistency to ASR models is not trivial since fundamental information, such as speaker traits, is lost in the intermediate text bottleneck. To solve this problem, this work presents a loss that is based on the speech encoder state sequence instead of the raw speech signal. This is achieved by training a Text-To-Encoder model and defining a loss based on the encoder reconstruction error. Experimental results on the LibriSpeech corpus show that the proposed cycle-consistency training reduced the word error rate by 14.7% from an initial model trained with 100 hours of paired data, using an additional 360 hours of audio data without transcriptions. We also investigate the use of text-only data mainly for language modeling to further improve the performance in the unpaired data training scenario.
Tasks End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2018-11-02
URL https://arxiv.org/abs/1811.01690v2
PDF https://arxiv.org/pdf/1811.01690v2.pdf
PWC https://paperswithcode.com/paper/cycle-consistency-training-for-end-to-end
Repo
Framework
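A hedged sketch of the encoder-level cycle-consistency loss described above: ASR produces a hypothesis from unpaired speech, a text-to-encoder (TTE) model maps that hypothesis back to predicted encoder states, and the loss is the reconstruction error against the original ASR encoder states. The tiny GRU modules, greedy decoding, and L1 distance are stand-ins; the paper also has to handle the non-differentiable hypothesis, which this sketch glosses over.

```python
# Cycle-consistency at the encoder-state level with toy stand-in models.
import torch
import torch.nn as nn

class ToyASR(nn.Module):
    def __init__(self, feat_dim=40, hid=64, vocab=30):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid, batch_first=True)
        self.decoder = nn.Linear(hid, vocab)
    def forward(self, speech):
        enc, _ = self.encoder(speech)               # (batch, T, hid) encoder states
        return enc, self.decoder(enc).argmax(-1)    # states + greedy token hypothesis

class ToyTTE(nn.Module):
    """Text-to-encoder: predicts ASR encoder states from a token sequence."""
    def __init__(self, vocab=30, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hid)
        self.rnn = nn.GRU(hid, hid, batch_first=True)
    def forward(self, tokens):
        out, _ = self.rnn(self.embed(tokens))
        return out                                   # (batch, T, hid)

asr, tte = ToyASR(), ToyTTE()
speech = torch.randn(4, 120, 40)                     # unpaired speech features, no transcript
enc_states, hypothesis = asr(speech)
cycle_loss = nn.L1Loss()(tte(hypothesis), enc_states)   # encoder reconstruction error
cycle_loss.backward()   # gradients reach the TTE and the ASR encoder; the hypothesis path is not differentiable here
print(float(cycle_loss))
```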

Improving Generalization for Abstract Reasoning Tasks Using Disentangled Feature Representations

Title Improving Generalization for Abstract Reasoning Tasks Using Disentangled Feature Representations
Authors Xander Steenbrugge, Sam Leroux, Tim Verbelen, Bart Dhoedt
Abstract In this work we explore the generalization characteristics of unsupervised representation learning by leveraging disentangled VAEs to learn a useful latent space on a set of relational reasoning problems derived from Raven Progressive Matrices. We show that the latent representations, learned by unsupervised training using the right objective function, significantly outperform the same architectures trained with purely supervised learning, especially when it comes to generalization.
Tasks Relational Reasoning, Representation Learning, Unsupervised Representation Learning
Published 2018-11-12
URL http://arxiv.org/abs/1811.04784v1
PDF http://arxiv.org/pdf/1811.04784v1.pdf
PWC https://paperswithcode.com/paper/improving-generalization-for-abstract
Repo
Framework
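A minimal beta-VAE objective of the kind used to learn the disentangled latents that a downstream reasoning model then consumes: reconstruction plus a beta-weighted KL term. Encoder/decoder sizes, beta, and the random stand-in panels are illustrative assumptions, not the paper's configuration.

```python
# Minimal beta-VAE: reconstruction loss + beta * KL divergence to the prior.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, in_dim=64 * 64, z_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256), nn.ReLU())
        self.mu, self.logvar = nn.Linear(256, z_dim), nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)      # reparameterization trick
        return self.dec(z).view_as(x), mu, logvar

def beta_vae_loss(x, recon, mu, logvar, beta=4.0):
    recon_term = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + beta * kl                                     # beta > 1 encourages disentangling

panels = torch.rand(16, 1, 64, 64)            # toy stand-in for Raven-style panels
recon, mu, logvar = TinyVAE()(panels)
print(float(beta_vae_loss(panels, recon, mu, logvar)))
```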

Speaker Diarization With Lexical Information

Title Speaker Diarization With Lexical Information
Authors Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou
Abstract This work presents a novel approach to leverage lexical information for speaker diarization. We introduce a speaker diarization system that can directly integrate lexical as well as acoustic information into a speaker clustering process. To this end, we propose an adjacency matrix integration technique that integrates word-level speaker turn probabilities with speaker embeddings in a comprehensive way. Our proposed method works without any reference transcript. Words and word boundary information are provided by an ASR system. We show that our proposed method improves a baseline speaker diarization system solely based on speaker embeddings, achieving a meaningful improvement on the CALLHOME American English Speech dataset.
Tasks Speaker Diarization
Published 2018-11-27
URL http://arxiv.org/abs/1811.10761v2
PDF http://arxiv.org/pdf/1811.10761v2.pdf
PWC https://paperswithcode.com/paper/speaker-diarization-with-lexical-information
Repo
Framework
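A sketch of integrating lexical and acoustic information at the adjacency-matrix level before clustering: one matrix from speaker-embedding similarity, one from word-level speaker-turn probabilities, combined and fed to spectral clustering. The weighted-sum combination, toy embeddings, and turn probabilities are assumptions; the paper defines its own integration technique.

```python
# Combine an acoustic similarity matrix with a lexical turn-probability matrix,
# then cluster segments with spectral clustering on the integrated adjacency.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(3)
# Toy segment embeddings: first 5 segments from speaker A, next 5 from speaker B.
emb = np.vstack([rng.normal(0, 0.1, size=(5, 16)) + 1.0,
                 rng.normal(0, 0.1, size=(5, 16)) - 1.0])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
acoustic = np.clip(emb @ emb.T, 0, 1)                      # cosine-similarity adjacency

# Word-level "same speaker continues" probabilities between adjacent segments
# (the kind of cue an ASR transcript would provide); other entries left at 0.
lexical = np.zeros((10, 10))
same_turn = [0.9, 0.9, 0.8, 0.9, 0.1, 0.9, 0.8, 0.9, 0.9]  # dip at the A -> B change
for i, p in enumerate(same_turn):
    lexical[i, i + 1] = lexical[i + 1, i] = p

adjacency = 0.7 * acoustic + 0.3 * lexical                  # assumed weighting
labels = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0).fit_predict(adjacency)
print(labels)           # expected: two blocks of identical labels, one per speaker
```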

Gradient-based Optimization for Regression in the Functional Tensor-Train Format

Title Gradient-based Optimization for Regression in the Functional Tensor-Train Format
Authors Alex A. Gorodetsky, John D. Jakeman
Abstract We consider the task of low-multilinear-rank functional regression, i.e., learning a low-rank parametric representation of functions from scattered real-valued data. Our first contribution is the development and analysis of an efficient gradient computation that enables gradient-based optimization procedures, including stochastic gradient descent and quasi-Newton methods, for learning the parameters of a functional tensor-train (FT). The functional tensor-train uses the tensor-train (TT) representation of low-rank arrays as an ansatz for a class of low-multilinear-rank functions. The FT is represented by a set of matrix-valued functions that contain a set of univariate functions, and the regression task is to learn the parameters of these univariate functions. Our second contribution demonstrates that using nonlinearly parameterized univariate functions, e.g., symmetric kernels with moving centers, within each core can outperform the standard approach of using a linear expansion of basis functions. Our final contributions are new rank adaptation and group-sparsity regularization procedures to minimize overfitting. We use several benchmark problems to demonstrate at least an order of magnitude lower error with gradient-based optimization methods than standard alternating least squares procedures in the low-sample-number regime. We also demonstrate an order of magnitude reduction in error on a test problem resulting from using nonlinear parameterizations over linear parameterizations. Finally we compare regression performance with 22 other nonparametric and parametric regression methods on 10 real-world data sets. We achieve top-five accuracy for seven of the data sets and best accuracy for two of the data sets. These rankings are the best amongst parametric models and competitive with the best non-parametric methods.
Tasks
Published 2018-01-03
URL http://arxiv.org/abs/1801.00885v2
PDF http://arxiv.org/pdf/1801.00885v2.pdf
PWC https://paperswithcode.com/paper/gradient-based-optimization-for-regression-in
Repo
Framework
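A compact sketch of a functional tensor-train and gradient-based fitting: f(x1, ..., xd) is a product of matrix-valued univariate functions whose entries are basis expansions, and the expansion coefficients are learned by stochastic gradient descent through autograd. The monomial basis, ranks, and optimizer settings are illustrative choices, not the paper's parameterization or its rank-adaptation and regularization procedures.

```python
# Functional tensor-train f(x) = A1(x1) A2(x2) ... Ad(xd), fitted by gradient descent.
import torch

d, rank, n_basis = 3, 2, 4                       # inputs, TT-rank, basis size per entry
ranks = [1] + [rank] * (d - 1) + [1]
# coeffs[k]: (r_{k-1}, r_k, n_basis) coefficients of each matrix entry's univariate expansion
coeffs = [torch.nn.Parameter(0.1 * torch.randn(ranks[k], ranks[k + 1], n_basis)) for k in range(d)]

def ft_eval(x):                                  # x: (batch, d)
    basis = torch.stack([x ** p for p in range(n_basis)], dim=-1)    # monomials 1, x, x^2, x^3
    out = None
    for k in range(d):
        core = torch.einsum("ijb,nb->nij", coeffs[k], basis[:, k])   # (batch, r_{k-1}, r_k)
        out = core if out is None else torch.bmm(out, core)
    return out.squeeze(-1).squeeze(-1)           # (batch,)

# Fit to scattered real-valued data from a target function.
x = torch.rand(512, d)
y = torch.sin(x[:, 0]) * (1 + x[:, 1]) + x[:, 2] ** 2
opt = torch.optim.Adam(coeffs, lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = torch.mean((ft_eval(x) - y) ** 2)
    loss.backward()
    opt.step()
print(float(loss))
```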

Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension

Title Joint Training of Candidate Extraction and Answer Selection for Reading Comprehension
Authors Zhen Wang, Jiachen Liu, Xinyan Xiao, Yajuan Lyu, Tian Wu
Abstract While sophisticated neural-based techniques have been developed in reading comprehension, most approaches model the answer in an independent manner, ignoring its relations with other answer candidates. This problem can be even worse in open-domain scenarios, where candidates from multiple passages should be combined to answer a single question. In this paper, we formulate reading comprehension as an extract-then-select two-stage procedure. We first extract answer candidates from passages, then select the final answer by combining information from all the candidates. Furthermore, we regard candidate extraction as a latent variable and train the two-stage process jointly with reinforcement learning. As a result, our approach has improved the state-of-the-art performance significantly on two challenging open-domain reading comprehension datasets. Further analysis demonstrates the effectiveness of our model components, especially the information fusion of all the candidates and the joint training of the extract-then-select procedure.
Tasks Answer Selection, Reading Comprehension
Published 2018-05-16
URL http://arxiv.org/abs/1805.06145v1
PDF http://arxiv.org/pdf/1805.06145v1.pdf
PWC https://paperswithcode.com/paper/joint-training-of-candidate-extraction-and
Repo
Framework
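A toy sketch of the extract-then-select structure with extraction treated as a latent variable: a candidate is sampled from the extractor's distribution, the selector is trained with a supervised loss over all candidates, and the extractor receives a REINFORCE update with a simple reward (here, whether the sampled candidate is the gold answer). The linear scorers, features, and reward definition are simplifications made here, not the paper's models or training objective.

```python
# Extract-then-select with a REINFORCE update on the latent candidate choice.
import torch
import torch.nn.functional as F

n_candidates, feat_dim = 5, 8
extractor = torch.nn.Linear(feat_dim, 1)           # scores candidate spans in passages
selector = torch.nn.Linear(feat_dim, 1)            # rescores candidates jointly
opt = torch.optim.Adam(list(extractor.parameters()) + list(selector.parameters()), lr=0.01)

for step in range(200):
    cand_feats = torch.randn(n_candidates, feat_dim)     # toy candidate features
    gold = torch.randint(0, n_candidates, (1,)).item()    # index of the correct answer
    cand_feats[gold] += 1.0                               # make the gold span identifiable

    extract_logp = F.log_softmax(extractor(cand_feats).squeeze(-1), dim=0)
    sampled = torch.multinomial(extract_logp.exp(), 1).item()   # latent candidate choice

    select_logits = selector(cand_feats).squeeze(-1)
    select_loss = F.cross_entropy(select_logits.unsqueeze(0), torch.tensor([gold]))
    reward = 1.0 if sampled == gold else 0.0
    extract_loss = -extract_logp[sampled] * reward         # REINFORCE on the extraction step

    opt.zero_grad()
    (select_loss + extract_loss).backward()
    opt.step()

print("final selector picks gold:", int(select_logits.argmax()) == gold)
```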

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

Title Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Authors Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Mohammad Ghavamzadeh, Tor Lattimore
Abstract We propose a bandit algorithm that explores by randomizing its history of rewards. Specifically, it pulls the arm with the highest mean reward in a non-parametric bootstrap sample of its history with pseudo rewards. We design the pseudo rewards such that the bootstrap mean is optimistic with a sufficiently high probability. We call our algorithm Giro, which stands for garbage in, reward out. We analyze Giro in a Bernoulli bandit and derive a $O(K \Delta^{-1} \log n)$ bound on its $n$-round regret, where $\Delta$ is the difference in the expected rewards of the optimal and the best suboptimal arms, and $K$ is the number of arms. The main advantage of our exploration design is that it easily generalizes to structured problems. To show this, we propose contextual Giro with an arbitrary reward generalization model. We evaluate Giro and its contextual variant on multiple synthetic and real-world problems, and observe that it performs well.
Tasks Multi-Armed Bandits
Published 2018-11-13
URL https://arxiv.org/abs/1811.05154v3
PDF https://arxiv.org/pdf/1811.05154v3.pdf
PWC https://paperswithcode.com/paper/garbage-in-reward-out-bootstrapping
Repo
Framework
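A small sketch of Giro in a Bernoulli bandit, following the description above: each arm keeps its observed rewards plus pseudo rewards, and at every round the arm with the highest mean in a non-parametric bootstrap resample of that history is pulled. Adding one 0 and one 1 pseudo reward per observation is an assumed setting of the pseudo-reward parameter; see the paper for the exact design and the regret analysis.

```python
# Giro-style bootstrapped exploration in a Bernoulli bandit (illustrative sketch).
import random

def giro(true_means, horizon=5000, seed=0):
    rng = random.Random(seed)
    history = [[] for _ in true_means]           # augmented reward histories, one per arm
    total = 0.0
    for t in range(horizon):
        scores = []
        for h in history:
            if not h:                            # force one initial pull per arm
                scores.append(float("inf"))
            else:
                boot = [rng.choice(h) for _ in h]           # non-parametric bootstrap resample
                scores.append(sum(boot) / len(boot))
        arm = max(range(len(true_means)), key=lambda a: scores[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        total += reward
        history[arm] += [reward, 0.0, 1.0]       # real reward plus pseudo rewards (assumed a = 1)
    return total

means = [0.3, 0.5, 0.7]
print("average reward:", giro(means) / 5000, "(best arm mean is 0.7)")
```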