October 16, 2019

Paper Group ANR 1064

An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction


Title	An Interdisciplinary Comparison of Sequence Modeling Methods for Next-Element Prediction
Authors	Niek Tax, Irene Teinemaa, Sebastiaan J. van Zelst
Abstract	Data of sequential nature arise in many application domains in the form of, e.g., textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) in the machine learning field, methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks, (ii) in process mining, process discovery techniques aim to generate human-interpretable descriptive models, and (iii) in the grammar inference field, the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately describes the behavior in the underlying data. Those sequence models are generative, i.e., they can predict what elements are likely to occur after a given unfinished sequence. So far, these fields have developed mainly in isolation from each other and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling techniques on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning techniques, which generally do not aim at interpretability, outperform in terms of accuracy the techniques from the process mining and grammar inference fields that aim to yield interpretable models.
Tasks
Published	2018-10-31
URL	http://arxiv.org/abs/1811.00062v1
PDF	http://arxiv.org/pdf/1811.00062v1.pdf
PWC	https://paperswithcode.com/paper/an-interdisciplinary-comparison-of-sequence
Repo
Framework
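As a rough illustration of the next-element prediction task described in the abstract above, the sketch below fits a first-order Markov model, one of the simplest members of the machine learning family the abstract mentions. It is not the paper's experimental setup: the function names `fit_first_order` and `predict_next` and the toy traces are assumptions made for this example.

```python
from collections import Counter, defaultdict


def fit_first_order(sequences):
    """Count how often each element directly follows each other element."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, prefix):
    """Return the most frequent successor of the prefix's last element.

    Returns None when the prefix is empty or its last element was never
    observed with a successor during fitting.
    """
    if not prefix or prefix[-1] not in transitions:
        return None
    return transitions[prefix[-1]].most_common(1)[0][0]


# Toy traces (hypothetical data, not taken from the paper).
traces = [list("abcd"), list("abce"), list("abcd")]
model = fit_first_order(traces)
prediction = predict_next(model, list("abc"))
```

A first-order model only looks at the last element of the unfinished sequence; higher-order Markov models or recurrent networks condition on a longer history at the cost of more parameters.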

Counting in Language with RNNs


Title	Counting in Language with RNNs
Authors	Heng xin Fun, Sergiy V Bokhnyak, Francesco Saverio Zuppichini
Abstract	In this paper we examine a possible reason for the LSTM outperforming the GRU on language modeling and more specifically machine translation. We hypothesize that this has to do with counting. This is a consistent theme across the literature of long term dependence, counting, and language modeling for RNNs. Using the simplified forms of language – Context-Free and Context-Sensitive Languages – we show how exactly the LSTM performs its counting based on their cell states during inference and why the GRU cannot perform as well.
Tasks	Language Modelling, Machine Translation
Published	2018-10-29
URL	http://arxiv.org/abs/1810.12411v2
PDF	http://arxiv.org/pdf/1810.12411v2.pdf
PWC	https://paperswithcode.com/paper/counting-in-language-with-rnns
Repo
Framework
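To make the notion of "counting" in the abstract above concrete, here is a hand-written recognizer for the language a^n b^n, a standard context-free example; the abstract does not state which languages the paper actually uses, so that choice is an assumption. This is not a neural network: it only shows the single integer counter that a recurrent model would have to emulate in its state.

```python
def accepts_anbn(string):
    """Recognize a^n b^n (n >= 1) using one integer counter."""
    count = 0
    seen_b = False
    for symbol in string:
        if symbol == "a":
            if seen_b:
                # An 'a' after any 'b' breaks the a...ab...b shape.
                return False
            count += 1
        elif symbol == "b":
            seen_b = True
            count -= 1
            if count < 0:
                # More b's than a's so far.
                return False
        else:
            return False
    # Accept only if at least one 'b' was read and the counts match.
    return seen_b and count == 0


examples = {s: accepts_anbn(s) for s in ["ab", "aabb", "aab", "abab", ""]}
```

The counter goes up on every `a` and down on every `b`, so the string is accepted exactly when the counter returns to zero at the end.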

PSICA: decision trees for probabilistic subgroup identification with categorical treatments


Title	PSICA: decision trees for probabilistic subgroup identification with categorical treatments
Authors	Oleg Sysoev, Krzysztof Bartoszek, Eva-Charlotte Ekstrom, Katarina Ekholm Selling
Abstract	Personalized medicine aims at identifying best treatments for a patient with given characteristics. It has been shown in the literature that these methods can lead to great improvements in medicine compared to traditional methods prescribing the same treatment to all patients. Subgroup identification is a branch of personalized medicine which aims at finding subgroups of the patients with similar characteristics for which some of the investigated treatments have a better effect than the other treatments. A number of approaches based on decision trees have been proposed to identify such subgroups, but most of them focus on two-arm trials (control/treatment) while a few methods consider quantitative treatments (defined by the dose). However, no subgroup identification method exists that can predict the best treatments in a scenario with a categorical set of treatments. We propose a novel method for subgroup identification in categorical treatment scenarios. This method outputs a decision tree showing the probabilities of a given treatment being the best for a given group of patients as well as labels showing the possible best treatments. The method is implemented in an R package psica available at CRAN. In addition to numerical simulations based on artificial data, we present an analysis of a community-based nutrition intervention trial that justifies the validity of our method.
Tasks
Published	2018-11-22
URL	http://arxiv.org/abs/1811.09065v1
PDF	http://arxiv.org/pdf/1811.09065v1.pdf
PWC	https://paperswithcode.com/paper/psica-decision-trees-for-probabilistic
Repo
Framework

Parallel and Streaming Algorithms for K-Core Decomposition


Title	Parallel and Streaming Algorithms for K-Core Decomposition
Authors	Hossein Esfandiari, Silvio Lattanzi, Vahab Mirrokni
Abstract	The $k$-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate $k$-core decomposition with provable guarantees. Our algorithms achieve rigorous bounds on space complexity while bounding the number of passes or number of rounds of computation. We do so by presenting a new powerful sketching technique for $k$-core decomposition, and then by showing it can be computed efficiently in both streaming and MapReduce models. Finally, we confirm the effectiveness of our sketching technique empirically on a number of publicly available graphs.
Tasks
Published	2018-08-07
URL	http://arxiv.org/abs/1808.02546v2
PDF	http://arxiv.org/pdf/1808.02546v2.pdf
PWC	https://paperswithcode.com/paper/parallel-and-streaming-algorithms-for-k-core
Repo
Framework
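For readers unfamiliar with the primitive named in the abstract above, the following sketch computes the exact core number of every vertex by the classical sequential peeling procedure. It is not the approximate sketching, streaming, or MapReduce algorithm of the paper; the function name `core_numbers` and the toy graph are assumptions for illustration, and the simple minimum search makes it quadratic in the number of vertices.

```python
def core_numbers(adjacency):
    """Exact core number of every vertex via repeated minimum-degree peeling.

    adjacency maps each vertex to the set of its neighbors; the graph must be
    undirected (symmetric) and every vertex must appear as a key.
    """
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel a vertex of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases along the peeling order.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
toy_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
toy_cores = core_numbers(toy_graph)
```

A vertex has core number k if it belongs to a subgraph in which every vertex has degree at least k, but to no such subgraph for k + 1; in the toy graph the triangle forms the 2-core and the pendant vertex only reaches the 1-core.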

RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation


Title	RoboTurk: A Crowdsourcing Platform for Robotic Skill Learning through Imitation
Authors	Ajay Mandlekar, Yuke Zhu, Animesh Garg, Jonathan Booher, Max Spero, Albert Tung, Julian Gao, John Emmons, Anchit Gupta, Emre Orbay, Silvio Savarese, Li Fei-Fei
Abstract	Imitation Learning has empowered recent advances in learning robotic manipulation tasks by addressing shortcomings of Reinforcement Learning such as exploration and reward specification. However, research in this area has been limited to modest-sized datasets due to the difficulty of collecting large quantities of task demonstrations through existing mechanisms. This work introduces RoboTurk to address this challenge. RoboTurk is a crowdsourcing platform for high quality 6-DoF trajectory based teleoperation through the use of widely available mobile devices (e.g. iPhone). We evaluate RoboTurk on three manipulation tasks of varying timescales (15-120s) and observe that our user interface is statistically similar to special purpose hardware such as virtual reality controllers in terms of task completion times. Furthermore, we observe that poor network conditions, such as low bandwidth and high delay links, do not substantially affect the remote users’ ability to perform task demonstrations successfully on RoboTurk. Lastly, we demonstrate the efficacy of RoboTurk through the collection of a pilot dataset; using RoboTurk, we collected 137.5 hours of manipulation data from remote workers, amounting to over 2200 successful task demonstrations in 22 hours of total system usage. We show that the data obtained through RoboTurk enables policy learning on multi-step manipulation tasks with sparse rewards and that using larger quantities of demonstrations during policy learning provides benefits in terms of both learning consistency and final performance. For additional results, videos, and to download our pilot dataset, visit http://roboturk.stanford.edu/
Tasks	Imitation Learning
Published	2018-11-07
URL	http://arxiv.org/abs/1811.02790v1
PDF	http://arxiv.org/pdf/1811.02790v1.pdf
PWC	https://paperswithcode.com/paper/roboturk-a-crowdsourcing-platform-for-robotic
Repo
Framework

Visual Reasoning with Multi-hop Feature Modulation


Title	Visual Reasoning with Multi-hop Feature Modulation
Authors	Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin
Abstract	Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach is better able to scale to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation achieves state-of-the-art for the short input sequence task ReferIt — on-par with single-hop FiLM generation — while also significantly outperforming prior state-of-the-art and single-hop FiLM generation on the GuessWhat?! visual dialogue task.
Tasks	Question Answering, Visual Dialog, Visual Question Answering, Visual Reasoning
Published	2018-08-03
URL	http://arxiv.org/abs/1808.04446v2
PDF	http://arxiv.org/pdf/1808.04446v2.pdf
PWC	https://paperswithcode.com/paper/visual-reasoning-with-multi-hop-feature
Repo
Framework
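The abstract above describes FiLM as per-channel scaling and shifting. A minimal pure-Python sketch of that single operation follows; in the paper the scale and shift parameters are generated from the language input, whereas here `gamma` and `beta` are fixed toy constants, and the function name `film` and the nested-list layout (channels, rows, columns) are assumptions for illustration.

```python
def film(feature_maps, gamma, beta):
    """Apply per-channel scaling and shifting: out[c] = gamma[c] * x[c] + beta[c].

    feature_maps is a list of channels, each a list of rows of floats.
    """
    if not (len(feature_maps) == len(gamma) == len(beta)):
        raise ValueError("need one (gamma, beta) pair per channel")
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(feature_maps, gamma, beta)
    ]


# Two channels of a 2x2 feature map (toy values).
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
y = film(x, gamma=[2.0, -1.0], beta=[0.5, 1.0])
```

Every spatial position within a channel shares the same scale and shift, which is what lets a small conditioning vector modulate an entire convolutional feature map.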


Seeing Voices and Hearing Faces: Cross-modal biometric matching


Title	Seeing Voices and Hearing Faces: Cross-modal biometric matching
Authors	Arsha Nagrani, Samuel Albanie, Andrew Zisserman
Abstract	We introduce a seemingly impossible task: given only an audio clip of someone speaking, decide which of two face images is the speaker. In this paper we study this, and a number of related cross-modal tasks, aimed at answering the question: how much can we infer from the voice about the face and vice versa? We study this task “in the wild”, employing the datasets that are now publicly available for face recognition from static images (VGGFace) and speaker identification from audio (VoxCeleb). These provide training and testing scenarios for both static and dynamic testing of cross-modal matching. We make the following contributions: (i) we introduce CNN architectures for both binary and multi-way cross-modal face and audio matching, (ii) we compare dynamic testing (where video information is available, but the audio is not from the same video) with static testing (where only a single still image is available), and (iii) we use human testing as a baseline to calibrate the difficulty of the task. We show that a CNN can indeed be trained to solve this task in both the static and dynamic scenarios, and is even well above chance on 10-way classification of the face given the voice. The CNN matches human performance on easy examples (e.g. different gender across faces) but exceeds human performance on more challenging examples (e.g. faces with the same gender, age and nationality).
Tasks	Face Recognition, Speaker Identification
Published	2018-04-01
URL	http://arxiv.org/abs/1804.00326v2
PDF	http://arxiv.org/pdf/1804.00326v2.pdf
PWC	https://paperswithcode.com/paper/seeing-voices-and-hearing-faces-cross-modal
Repo
Framework

Object-Level Representation Learning for Few-Shot Image Classification


Title	Object-Level Representation Learning for Few-Shot Image Classification
Authors	Liangqu Long, Wei Wang, Jun Wen, Meihui Zhang, Qian Lin, Beng Chin Ooi
Abstract	Few-shot learning that trains image classifiers over few labeled examples per category is a challenging task. In this paper, we propose to exploit an additional big dataset with different categories to improve the accuracy of few-shot learning over our target dataset. Our approach is based on the observation that images can be decomposed into objects, which may appear in images from both the additional dataset and our target dataset. We use the object-level relation learned from the additional dataset to infer the similarity of images in our target dataset with unseen categories. Nearest neighbor search is applied to do image classification, which is a non-parametric model and thus does not need fine-tuning. We evaluate our algorithm on two popular datasets, namely Omniglot and MiniImagenet. We obtain 8.5% and 2.7% absolute improvements for 5-way 1-shot and 5-way 5-shot experiments on MiniImagenet, respectively. Source code will be published upon acceptance.
Tasks	Few-Shot Image Classification, Few-Shot Learning, Image Classification, Omniglot, Representation Learning
Published	2018-05-28
URL	http://arxiv.org/abs/1805.10777v1
PDF	http://arxiv.org/pdf/1805.10777v1.pdf
PWC	https://paperswithcode.com/paper/object-level-representation-learning-for-few
Repo
Framework

Bilinear Parameterization For Differentiable Rank-Regularization


Title	Bilinear Parameterization For Differentiable Rank-Regularization
Authors	Marcus Valtonen Örnhag, Carl Olsson, Anders Heyden
Abstract	Low rank approximation is a commonly occurring problem in many computer vision and machine learning applications. There are two common ways of optimizing the resulting models. Either the set of matrices with a given rank can be explicitly parametrized using a bilinear factorization, or low rank can be implicitly enforced using regularization terms penalizing non-zero singular values. While the former approach results in differentiable problems that can be efficiently optimized using local quadratic approximation, the latter is typically not differentiable (sometimes even discontinuous) and requires first order subgradient or splitting methods. It is well known that gradient based methods exhibit slow convergence for ill-conditioned problems. In this paper we show how many non-differentiable regularization methods can be reformulated into smooth objectives using bilinear parameterization. This allows us to use standard second order methods, such as Levenberg–Marquardt (LM) and Variable Projection (VarPro), to achieve accurate solutions for ill-conditioned cases. We show on several real and synthetic experiments that our second order formulation converges to substantially more accurate solutions than competing state-of-the-art methods.
Tasks
Published	2018-11-27
URL	https://arxiv.org/abs/1811.11088v3
PDF	https://arxiv.org/pdf/1811.11088v3.pdf
PWC	https://paperswithcode.com/paper/bilinear-parameterization-for-differentiable
Repo
Framework
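One well-known instance of the reformulation idea in the abstract above is the variational form of the nuclear norm, which replaces a non-differentiable penalty on singular values with a smooth penalty on bilinear factors. The abstract does not say which regularizers the paper treats, so this identity is given only as a familiar example, with $X = BC^{\top}$ as the assumed factorization:

```latex
\|X\|_{*} \;=\; \sum_{i} \sigma_i(X)
\;=\; \min_{B,\,C \,:\, X = B C^{\top}} \; \tfrac{1}{2}\left( \|B\|_F^{2} + \|C\|_F^{2} \right)
```

The right-hand side is differentiable in $B$ and $C$, which is what makes second order methods such as those named in the abstract applicable.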

Asynchronous decentralized accelerated stochastic gradient descent


Title	Asynchronous decentralized accelerated stochastic gradient descent
Authors	Guanghui Lan, Yi Zhou
Abstract	In this work, we introduce an asynchronous decentralized accelerated stochastic gradient descent type of method for decentralized stochastic optimization, considering communication and synchronization are the major bottlenecks. We establish $\mathcal{O}(1/\epsilon)$ (resp., $\mathcal{O}(1/\sqrt{\epsilon})$) communication complexity and $\mathcal{O}(1/\epsilon^2)$ (resp., $\mathcal{O}(1/\epsilon)$) sampling complexity for solving general convex (resp., strongly convex) problems.
Tasks	Stochastic Optimization
Published	2018-09-24
URL	http://arxiv.org/abs/1809.09258v1
PDF	http://arxiv.org/pdf/1809.09258v1.pdf
PWC	https://paperswithcode.com/paper/asynchronous-decentralized-accelerated
Repo
Framework

Unsupervised Domain Adaptation using Regularized Hyper-graph Matching


Title	Unsupervised Domain Adaptation using Regularized Hyper-graph Matching
Authors	Debasmit Das, C. S. George Lee
Abstract	Domain adaptation (DA) addresses the real-world image classification problem of discrepancy between training (source) and testing (target) data distributions. We propose an unsupervised DA method that considers the presence of only unlabelled data in the target domain. Our approach centers on finding matches between samples of the source and target domains. The matches are obtained by treating the source and target domains as hyper-graphs and carrying out a class-regularized hyper-graph matching using first-, second- and third-order similarities between the graphs. We have also developed a computationally efficient algorithm by initially selecting a subset of the samples to construct a graph and then developing a customized optimization routine for graph-matching based on Conditional Gradient and Alternating Direction Multiplier Method. This allows the proposed method to be used widely. We also performed a set of experiments on standard object recognition datasets to validate the effectiveness of our framework over state-of-the-art approaches.
Tasks	Domain Adaptation, Graph Matching, Image Classification, Object Recognition, Unsupervised Domain Adaptation
Published	2018-05-22
URL	http://arxiv.org/abs/1805.08874v2
PDF	http://arxiv.org/pdf/1805.08874v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-domain-adaptation-using-2
Repo
Framework

Fusion of ANN and SVM Classifiers for Network Attack Detection


Title	Fusion of ANN and SVM Classifiers for Network Attack Detection
Authors	Takwa Omrani, Adel Dallali, Bilgacem Chibani Rhaimi, Jaouhar Fattahi
Abstract	With the progressive increase of network applications and electronic devices (computers, mobile phones, Android, etc.), attack and intrusion detection has become a very challenging task in the cybercrime detection area. In this context, most existing approaches to attack detection rely mainly on a finite set of attacks. These solutions are vulnerable, that is, they fail to detect some attacks when the sources of information are ambiguous or imperfect. However, few approaches have started investigating this direction. This paper investigates the role of machine learning approaches (ANN, SVM) in classifying TCP connection traffic as normal or suspicious. However, using ANN and SVM individually is expensive. In this paper, a combination of the two classifiers is proposed, in which an artificial neural network (ANN) classifier and a support vector machine (SVM) are both employed. Additionally, our proposed solution allows the obtained classification results to be visualized. The accuracy of the proposed solution has been compared with the results of other classifiers. Experiments have been conducted with different network connections selected from the NSL-KDD DARPA dataset. Empirical results show that combining ANN and SVM techniques for attack detection is a promising direction.
Tasks	Intrusion Detection
Published	2018-01-09
URL	http://arxiv.org/abs/1801.02746v2
PDF	http://arxiv.org/pdf/1801.02746v2.pdf
PWC	https://paperswithcode.com/paper/fusion-of-ann-and-svm-classifiers-for-network
Repo
Framework

Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions


Title	Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions
Authors	Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein
Abstract	In the recent literature the important role of depth in deep learning has been emphasized. In this paper we argue that sufficient width of a feedforward network is equally important by answering the simple question of under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks having a pyramidal structure, that is, no layer has more hidden units than the input dimension, necessarily produce connected decision regions. This implies that a sufficiently wide hidden layer is necessary to guarantee that the network can produce disconnected decision regions. We discuss the implications of this result for the construction of neural networks, in particular the relation to the problem of adversarial manipulation of classifiers.
Tasks
Published	2018-02-28
URL	http://arxiv.org/abs/1803.00094v3
PDF	http://arxiv.org/pdf/1803.00094v3.pdf
PWC	https://paperswithcode.com/paper/neural-networks-should-be-wide-enough-to
Repo
Framework

A Scalable Heuristic for Fastest-Path Computation on Very Large Road Maps


Title	A Scalable Heuristic for Fastest-Path Computation on Very Large Road Maps
Authors	Renjie Chen, Craig Gotsman
Abstract	Fastest-path queries between two points in a very large road map are an increasingly important primitive in modern transportation and navigation systems, so very efficient computation of these paths is critical for system performance and throughput. We present a method to compute an effective heuristic for the fastest path travel time between two points on a road map, which can be used to significantly accelerate the classical A* algorithm when computing fastest paths. Our method is based on two hierarchical sets of separators of the map represented by two binary trees. A preprocessing step computes a short vector of values per road junction based on the separator trees, which is then stored with the map and used to efficiently compute the heuristic at the online query stage. We demonstrate experimentally that this method scales well to any map size, providing a better quality heuristic, and thus more efficient A* search, for fastest path queries between points at all distances, especially small and medium range, relative to other known heuristics.
Tasks
Published	2018-12-18
URL	http://arxiv.org/abs/1812.07441v1
PDF	http://arxiv.org/pdf/1812.07441v1.pdf
PWC	https://paperswithcode.com/paper/a-scalable-heuristic-for-fastest-path
Repo
Framework
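The abstract above concerns a heuristic that plugs into the classical A* algorithm. The sketch below shows only the generic A* loop and where such a heuristic enters; the separator-tree heuristic of the paper is not implemented. The function name `a_star`, the toy road graph, and the hand-made lower bounds are assumptions for illustration, and the heuristic is assumed to be admissible (never overestimating) with value 0 at the goal.

```python
import heapq


def a_star(graph, heuristic, start, goal):
    """Return the cost of a cheapest start-goal path, or None if unreachable.

    graph maps a node to a list of (neighbor, travel_time) pairs.
    heuristic(node) must never overestimate the remaining travel time to goal.
    """
    best = {start: 0.0}  # cheapest known travel time from start to each node
    frontier = [(heuristic(start), start)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            return best[node]
        for neighbor, travel_time in graph.get(node, []):
            candidate = best[node] + travel_time
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                # Priority = cost so far + estimated remaining cost.
                heapq.heappush(frontier, (candidate + heuristic(neighbor), neighbor))
    return None


# Toy road graph with travel times (hypothetical data).
roads = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}
# Hand-made lower bounds on the travel time to "D".
lower_bound = {"A": 2.0, "B": 2.0, "C": 1.0, "D": 0.0}
fastest = a_star(roads, lower_bound.get, "A", "D")
```

With a heuristic that is identically zero the loop reduces to Dijkstra's algorithm; a tighter lower bound lets the search skip nodes that cannot lie on a fastest path, which is the effect a better quality heuristic aims for.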

Robust and Efficient Graph Correspondence Transfer for Person Re-identification


Title	Robust and Efficient Graph Correspondence Transfer for Person Re-identification
Authors	Qin Zhou, Heng Fan, Hua Yang, Hang Su, Shibao Zheng, Shuang Wu, Haibin Ling
Abstract	Spatial misalignment caused by variations in poses and viewpoints is one of the most critical issues that hinders the performance improvement in existing person re-identification (Re-ID) algorithms. To address this problem, in this paper, we present a robust and efficient graph correspondence transfer (REGCT) approach for explicit spatial alignment in Re-ID. Specifically, we propose to establish the patch-wise correspondences of positive training pairs via graph matching. By exploiting both spatial and visual contexts of human appearance in graph matching, meaningful semantic correspondences can be obtained. To circumvent the cumbersome on-line graph matching in testing phase, we propose to transfer the off-line learned patch-wise correspondences from the positive training pairs to test pairs. In detail, for each test pair, the training pairs with similar pose-pair configurations are selected as references. The matching patterns (i.e., the correspondences) of the selected references are then utilized to calculate the patch-wise feature distances of this test pair. To enhance the robustness of correspondence transfer, we design a novel pose context descriptor to accurately model human body configurations, and present an approach to measure the similarity between a pair of pose context descriptors. Meanwhile, to improve testing efficiency, we propose a correspondence template ensemble method using the voting mechanism, which significantly reduces the amount of patch-wise matchings involved in distance calculation. With aforementioned strategies, the REGCT model can effectively and efficiently handle the spatial misalignment problem in Re-ID. Extensive experiments on five challenging benchmarks, including VIPeR, Road, PRID450S, 3DPES and CUHK01, evidence the superior performance of REGCT over other state-of-the-art approaches.
Tasks	Graph Matching, Person Re-Identification
Published	2018-05-15
URL	http://arxiv.org/abs/1805.06323v1
PDF	http://arxiv.org/pdf/1805.06323v1.pdf
PWC	https://paperswithcode.com/paper/robust-and-efficient-graph-correspondence
Repo
Framework