May 6, 2019

Paper Group ANR 381

Multiple-Instance Logistic Regression with LASSO Penalty


Title	Multiple-Instance Logistic Regression with LASSO Penalty
Authors	Ray-Bing Chen, Kuang-Hung Cheng, Sheng-Mao Chang, Shuen-Lin Jeng, Ping-Yang Chen, Chun-Hao Yang, Chi-Chun Hsia
Abstract	In this work, we consider a manufacturing process that can be described by a multiple-instance logistic regression model. To compute the maximum likelihood estimates of the unknown coefficients, an expectation-maximization algorithm is proposed, and the proposed modeling approach can be extended to identify the important covariates by adding a coefficient penalty term to the likelihood function. In addition to essential technical details, we demonstrate the usefulness of the proposed method by simulations and real examples.
Tasks
Published	2016-07-13
URL	http://arxiv.org/abs/1607.03615v1
PDF	http://arxiv.org/pdf/1607.03615v1.pdf
PWC	https://paperswithcode.com/paper/multiple-instance-logistic-regression-with
Repo
Framework
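
The abstract above does not spell out the model, so the following is only a minimal sketch under the standard multiple-instance assumption that a bag is labelled positive when at least one of its instances is positive. The function names, the bag layout (a list of feature vectors) and the choice to penalise every coefficient are illustrative assumptions, and the sketch only evaluates the LASSO-penalised log-likelihood; it does not implement the expectation-maximization algorithm proposed in the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bag_positive_prob(bag, beta):
    """P(bag label = 1) = 1 - prod_j (1 - p_j), with p_j = sigmoid(x_j . beta).

    Assumes a bag is positive when at least one instance is positive.
    """
    prob_all_negative = 1.0
    for x in bag:
        p = sigmoid(sum(xi * bi for xi, bi in zip(x, beta)))
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative

def penalized_log_likelihood(bags, labels, beta, lam):
    """Bag-level Bernoulli log-likelihood minus an L1 (LASSO) penalty.

    Assumption: all coefficients are penalised (no separate intercept).
    """
    eps = 1e-12  # guards against log(0)
    ll = 0.0
    for bag, y in zip(bags, labels):
        q = min(max(bag_positive_prob(bag, beta), eps), 1.0 - eps)
        ll += y * math.log(q) + (1 - y) * math.log(1.0 - q)
    return ll - lam * sum(abs(b) for b in beta)
```

With all coefficients at zero, every instance has probability 0.5, so a bag of two instances is positive with probability 1 - 0.25 = 0.75.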

The THUMOS Challenge on Action Recognition for Videos “in the Wild”


Title	The THUMOS Challenge on Action Recognition for Videos “in the Wild”
Authors	Haroon Idrees, Amir R. Zamir, Yu-Gang Jiang, Alex Gorban, Ivan Laptev, Rahul Sukthankar, Mubarak Shah
Abstract	Automatically recognizing and localizing wide ranges of human actions has crucial importance for video understanding. Towards this goal, the THUMOS challenge was introduced in 2013 to serve as a benchmark for action recognition. Until then, video action recognition, including the THUMOS challenge, had focused primarily on the classification of pre-segmented (i.e., trimmed) videos, which is an artificial task. In THUMOS 2014, we elevated action recognition to a more practical level by introducing temporally untrimmed videos. These also include ‘background videos’ which share similar scenes and backgrounds as action videos, but are devoid of the specific actions. The three editions of the challenge organized in 2013–2015 have made THUMOS a common benchmark for action classification and detection and the annual challenge is widely attended by teams from around the world. In this paper we describe the THUMOS benchmark in detail and give an overview of data collection and annotation procedures. We present the evaluation protocols used to quantify results in the two THUMOS tasks of action classification and temporal detection. We also present results of submissions to the THUMOS 2015 challenge and review the participating approaches. Additionally, we include a comprehensive empirical study evaluating the differences in action recognition between trimmed and untrimmed videos, and how well methods trained on trimmed videos generalize to untrimmed videos. We conclude by proposing several directions and improvements for future THUMOS challenges.
Tasks	Action Classification, Temporal Action Localization, Video Understanding
Published	2016-04-21
URL	http://arxiv.org/abs/1604.06182v1
PDF	http://arxiv.org/pdf/1604.06182v1.pdf
PWC	https://paperswithcode.com/paper/the-thumos-challenge-on-action-recognition
Repo
Framework
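
The abstract above mentions an evaluation protocol for temporal detection without giving details. Temporal detection benchmarks commonly match a predicted segment to a ground-truth segment by their temporal intersection-over-union, so the sketch below shows that overlap measure. The function name and the `(start, end)` tuple layout are assumptions made for illustration, not the challenge's official evaluation code.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, segments spanning 0 to 10 s and 5 to 15 s overlap for 5 s out of a 15 s union, giving an overlap of 1/3.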

Implicit LOD using points ordering for processing and visualisation in Point Cloud Servers


Title	Implicit LOD using points ordering for processing and visualisation in Point Cloud Servers
Authors	Rémi Cura, Julien Perret, Nicolas Paparoditis
Abstract	Lidar datasets now commonly reach billions of points and are very dense. Using these point clouds becomes challenging, as the high number of points is intractable for most applications and for visualisation. In this work we propose a new paradigm to easily get a portable geometric Level Of Details (LOD) inside a Point Cloud Server. The main idea is to not store the LOD information in an external additional file, but instead to store it implicitly by exploiting the order of the points. The point cloud is divided into groups (patches). These patches are ordered so that their order gradually provides more and more details on the patch. We demonstrate the interest of our method with several classical uses of LOD, such as visualisation of massive point clouds, algorithm acceleration, fast density peak detection and correction.
Tasks
Published	2016-02-22
URL	http://arxiv.org/abs/1602.06920v3
PDF	http://arxiv.org/pdf/1602.06920v3.pdf
PWC	https://paperswithcode.com/paper/implicit-lod-using-points-ordering-for
Repo
Framework
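
To illustrate the idea of storing LOD implicitly in the point order, here is a rough sketch that reorders the points of one patch so that any prefix of the list is a coarse-to-fine sample. The octree-style rule used here (at each subdivision level, pick the unpicked point nearest to the centre of each occupied cell) is an assumption chosen for illustration; the abstract does not say which ordering the authors use, and `lod_order` and `max_level` are hypothetical names.

```python
def lod_order(points, max_level=4):
    """Reorder points (tuples of floats) so that prefixes give coarser LODs."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    size = max(h - l for l, h in zip(lo, hi)) or 1.0  # cubic bounding box
    remaining = list(range(len(points)))
    ordered = []
    for level in range(max_level + 1):
        if not remaining:
            break
        cells = 2 ** level            # cells per axis at this level
        cell_size = size / cells
        best = {}                     # cell index -> (squared distance, point index)
        for i in remaining:
            p = points[i]
            idx = tuple(min(int((p[d] - lo[d]) / cell_size), cells - 1)
                        for d in range(dims))
            dist = sum((p[d] - (lo[d] + (idx[d] + 0.5) * cell_size)) ** 2
                       for d in range(dims))
            if idx not in best or dist < best[idx][0]:
                best[idx] = (dist, i)
        chosen = {i for _, i in best.values()}
        ordered.extend(points[i] for _, i in best.values())
        remaining = [i for i in remaining if i not in chosen]
    ordered.extend(points[i] for i in remaining)  # leftovers keep input order
    return ordered
```

Reading only the first k points of the returned list then acts as a lower level of detail, with no side file needed to say which points belong to which level.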

Using Sentence-Level LSTM Language Models for Script Inference


Title	Using Sentence-Level LSTM Language Models for Script Inference
Authors	Karl Pichotta, Raymond J. Mooney
Abstract	There is a small but growing body of research on statistical scripts, models of event sequences that allow probabilistic inference of implicit events from documents. These systems operate on structured verb-argument events produced by an NLP pipeline. We compare these systems with recent Recurrent Neural Net models that directly operate on raw tokens to predict sentences, finding the latter to be roughly comparable to the former in terms of predicting missing events in documents.
Tasks
Published	2016-04-11
URL	http://arxiv.org/abs/1604.02993v2
PDF	http://arxiv.org/pdf/1604.02993v2.pdf
PWC	https://paperswithcode.com/paper/using-sentence-level-lstm-language-models-for
Repo
Framework

Compartmental analysis of dynamic nuclear medicine data: regularization procedure and application to physiology


Title	Compartmental analysis of dynamic nuclear medicine data: regularization procedure and application to physiology
Authors	Delbary Fabrice, Garbarino Sara
Abstract	Compartmental models based on tracer mass balance are extensively used in clinical and pre-clinical nuclear medicine in order to obtain quantitative information on tracer metabolism in the biological tissue. This paper is the second of a series of two that deal with the problem of tracer coefficient estimation via compartmental modelling in an inverse problem framework. While the previous work was devoted to the discussion of identifiability issues for 2, 3 and n-dimension compartmental systems, here we discuss the problem of numerically determining the tracer coefficients by means of a general regularized Multivariate Gauss Newton scheme. In this paper, applications concerning cerebral, hepatic and renal functions are considered, involving experimental measurements on FDG-PET data on different sets of murine models.
Tasks
Published	2016-08-05
URL	http://arxiv.org/abs/1608.01825v1
PDF	http://arxiv.org/pdf/1608.01825v1.pdf
PWC	https://paperswithcode.com/paper/compartmental-analysis-of-dynamic-nuclear
Repo
Framework

Learning to generalize to new compositions in image understanding


Title	Learning to generalize to new compositions in image understanding
Authors	Yuval Atzmon, Jonathan Berant, Vahid Kezami, Amir Globerson, Gal Chechik
Abstract	Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to scenes that were not observed during training, possibly depending too strongly on the statistics of the text in the training data. Here we propose to describe images using short structured representations, aiming to capture the crux of a description. These structured representations allow us to tease-out and evaluate separately two types of generalization: standard generalization to new images with similar scenes, and generalization to new combinations of known entities. We compare two learning approaches on the MS-COCO dataset: a state-of-the-art recurrent network based on an LSTM (Show, Attend and Tell), and a simple structured prediction model on top of a deep network. We find that the structured model generalizes to new compositions substantially better than the LSTM, ~7 times the accuracy of predicting structured representations. By providing a concrete method to quantify generalization for unseen combinations, we argue that structured representations and compositional splits are a useful benchmark for image captioning, and advocate compositional models that capture linguistic and visual structure.
Tasks	Image Captioning, Structured Prediction
Published	2016-08-27
URL	http://arxiv.org/abs/1608.07639v1
PDF	http://arxiv.org/pdf/1608.07639v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-generalize-to-new-compositions-in
Repo
Framework

Improvements in Sub-optimal Solving of the $(N^2-1)$-Puzzle via Joint Relocation of Pebbles and its Applications to Rule-based Cooperative Path-Finding


Title	Improvements in Sub-optimal Solving of the $(N^2-1)$-Puzzle via Joint Relocation of Pebbles and its Applications to Rule-based Cooperative Path-Finding
Authors	Pavel Surynek, Petr Michalík
Abstract	The problem of solving the $(n^2-1)$-puzzle and cooperative path-finding (CPF) sub-optimally by rule-based algorithms is addressed in this manuscript. The task in the puzzle is to rearrange $n^2-1$ pebbles on a square grid of size $n \times n$ using one vacant position to a desired goal configuration. An improvement to the existing polynomial-time algorithm is proposed and experimentally analyzed. The improved algorithm tries to move pebbles in a more efficient way than the original algorithm by grouping them into so-called snakes and moving them jointly within the snake. An experimental evaluation showed that the algorithm using snakes produces solutions that are 8% to 9% shorter than solutions generated by the original algorithm. The snake-based relocation has also been integrated into rule-based algorithms for solving the CPF problem sub-optimally, which is a closely related task. The task in CPF is to relocate a group of abstract robots that move over an undirected graph to given goal vertices. Robots can move to unoccupied neighboring vertices and at most one robot can be placed in each vertex. The $(n^2-1)$-puzzle is a special case of CPF where the underlying graph is represented by a 4-connected grid and there is only one vacant vertex. Two major rule-based algorithms for CPF were included in our study - BIBOX and PUSH-and-SWAP (PUSH-and-ROTATE). Improvements gained by using snakes in the BIBOX algorithm were stable around 30% in $(n^2-1)$-puzzle solving and up to 50% in CPFs over bi-connected graphs with various ear decompositions and multiple vacant vertices. In the case of the PUSH-and-SWAP algorithm the improvement achieved by snakes was around 5% to 8%. However, the improvement was unstable and hardly predictable in the case of PUSH-and-SWAP.
Tasks
Published	2016-10-17
URL	http://arxiv.org/abs/1610.04964v1
PDF	http://arxiv.org/pdf/1610.04964v1.pdf
PWC	https://paperswithcode.com/paper/improvements-in-sub-optimal-solving-of-the-n2
Repo
Framework

Fully Convolutional Crowd Counting On Highly Congested Scenes


Title	Fully Convolutional Crowd Counting On Highly Congested Scenes
Authors	Mark Marsden, Kevin McGuinness, Suzanne Little, Noel E. O’Connor
Abstract	In this paper we advance the state-of-the-art for crowd counting in high density scenes by further exploring the idea of a fully convolutional crowd counting model introduced by (Zhang et al., 2016). Producing an accurate and robust crowd count estimator using computer vision techniques has attracted significant research interest in recent years. Applications for crowd counting systems exist in many diverse areas including city planning, retail, and of course general public safety. Developing a highly generalised counting model that can be deployed in any surveillance scenario with any camera perspective is the key objective for research in this area. Techniques developed in the past have generally performed poorly in highly congested scenes with several thousands of people in frame (Rodriguez et al., 2011). Our approach, influenced by the work of (Zhang et al., 2016), consists of the following contributions: (1) A training set augmentation scheme that minimises redundancy among training samples to improve model generalisation and overall counting performance; (2) a deep, single column, fully convolutional network (FCN) architecture; (3) a multi-scale averaging step during inference. The developed technique can analyse images of any resolution or aspect ratio and achieves state-of-the-art counting performance on the Shanghaitech Part B and UCF CC 50 datasets as well as competitive performance on Shanghaitech Part A.
Tasks	Crowd Counting
Published	2016-12-01
URL	http://arxiv.org/abs/1612.00220v2
PDF	http://arxiv.org/pdf/1612.00220v2.pdf
PWC	https://paperswithcode.com/paper/fully-convolutional-crowd-counting-on-highly
Repo
Framework

Distributed Probabilistic Bisection Search using Social Learning


Title	Distributed Probabilistic Bisection Search using Social Learning
Authors	Athanasios Tsiligkaridis, Theodoros Tsiligkaridis
Abstract	We present a novel distributed probabilistic bisection algorithm using social learning with application to target localization. Each agent in the network first constructs a query about the target based on its local information and obtains a noisy response. Agents then perform a Bayesian update of their beliefs followed by an averaging of the log beliefs over local neighborhoods. This two stage algorithm consisting of repeated querying and averaging runs until convergence. We derive bounds on the rate of convergence of the beliefs at the correct target location. Numerical simulations show that our method outperforms current state of the art methods.
Tasks
Published	2016-08-21
URL	http://arxiv.org/abs/1608.06007v2
PDF	http://arxiv.org/pdf/1608.06007v2.pdf
PWC	https://paperswithcode.com/paper/distributed-probabilistic-bisection-search
Repo
Framework
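
The two-stage loop described in the abstract above (query, Bayesian update, then averaging of log beliefs over neighbourhoods) can be sketched roughly as follows. Several details are assumptions made for illustration and are not taken from the paper: the target lives on a discrete grid of bins, each agent asks whether the target lies at or below the median of its current belief, each answer is flipped with a known per-agent error probability `eps` strictly between 0 and 0.5, and neighbours are averaged with uniform weights. All function names are hypothetical.

```python
import math
import random

def median_index(belief):
    """Smallest index at which the cumulative belief reaches 0.5."""
    cum = 0.0
    for i, b in enumerate(belief):
        cum += b
        if cum >= 0.5:
            return i
    return len(belief) - 1

def bayes_update(belief, query_idx, answer_left, eps):
    """Reweight bins by whether they agree with the noisy answer.

    answer_left=True means the oracle reported target index <= query_idx.
    """
    post = [b * ((1.0 - eps) if (i <= query_idx) == answer_left else eps)
            for i, b in enumerate(belief)]
    total = sum(post)
    return [p / total for p in post]

def average_log_beliefs(beliefs, neighbours):
    """Each agent averages log beliefs over itself and its neighbours."""
    new_beliefs = []
    for agent, nbrs in enumerate(neighbours):
        group = [agent] + list(nbrs)
        logs = [sum(math.log(beliefs[j][i]) for j in group) / len(group)
                for i in range(len(beliefs[agent]))]
        top = max(logs)  # subtract the max for numerical stability
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        new_beliefs.append([w / total for w in weights])
    return new_beliefs

def run(target_idx, n_bins, neighbours, eps_per_agent, rounds, seed=0):
    rng = random.Random(seed)
    beliefs = [[1.0 / n_bins] * n_bins for _ in neighbours]
    for _ in range(rounds):
        updated = []
        for belief, eps in zip(beliefs, eps_per_agent):
            q = median_index(belief)
            truth = target_idx <= q
            answer = truth if rng.random() > eps else (not truth)
            updated.append(bayes_update(belief, q, answer, eps))
        beliefs = average_log_beliefs(updated, neighbours)
    return beliefs
```

A call such as `run(20, 32, [[1], [0, 2], [1]], [0.1, 0.2, 0.1], 40)` simulates three agents on a line graph; averaging in the log domain amounts to taking a normalised geometric mean of the neighbours' beliefs.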

Neural networks based EEG-Speech Models


Title	Neural networks based EEG-Speech Models
Authors	Pengfei Sun, Jun Qin
Abstract	In this paper, we propose an end-to-end neural network (NN) based EEG-speech (NES) modeling framework, in which three network structures are developed to map imagined EEG signals to phonemes. The proposed NES models incorporate a language model based EEG feature extraction layer, an acoustic feature mapping layer, and a restricted Boltzmann machine (RBM) based feature learning layer. The NES models can jointly realize the representation of multichannel EEG signals and the projection of acoustic speech signals. Among the three proposed NES models, two augmented networks utilize spoken EEG signals as either bias or gate information to strengthen the feature learning and translation of imagined EEG signals. Experimental results show that all three proposed NES models outperform the baseline support vector machine (SVM) method on EEG-speech classification. With respect to binary classification, our approach achieves comparable results relative to the deep belief network approach.
Tasks	EEG, Language Modelling
Published	2016-12-16
URL	http://arxiv.org/abs/1612.05369v2
PDF	http://arxiv.org/pdf/1612.05369v2.pdf
PWC	https://paperswithcode.com/paper/neural-networks-based-eeg-speech-models
Repo
Framework

A Semidefinite Program for Structured Blockmodels


Title	A Semidefinite Program for Structured Blockmodels
Authors	David Choi
Abstract	Semidefinite programs have recently been developed for the problem of community detection, which may be viewed as a special case of the stochastic blockmodel. Here, we develop a semidefinite program that can be tailored to other instances of the blockmodel, such as non-assortative networks and overlapping communities. We establish label recovery in sparse settings, with conditions that are analogous to recent results for community detection. In settings where the data is not generated by a blockmodel, we give an oracle inequality that bounds excess risk relative to the best blockmodel approximation. Simulations are presented for community detection, for overlapping communities, and for latent space models.
Tasks	Community Detection
Published	2016-11-16
URL	http://arxiv.org/abs/1611.05407v1
PDF	http://arxiv.org/pdf/1611.05407v1.pdf
PWC	https://paperswithcode.com/paper/a-semidefinite-program-for-structured
Repo
Framework

Scene Invariant Crowd Segmentation and Counting Using Scale-Normalized Histogram of Moving Gradients (HoMG)


Title	Scene Invariant Crowd Segmentation and Counting Using Scale-Normalized Histogram of Moving Gradients (HoMG)
Authors	Parthipan Siva, Mohammad Javad Shafiee, Mike Jamieson, Alexander Wong
Abstract	The problem of automated crowd segmentation and counting has garnered significant interest in the field of video surveillance. This paper proposes a novel scene invariant crowd segmentation and counting algorithm designed with high accuracy yet low computational complexity in mind, which is key for widespread industrial adoption. A novel low-complexity, scale-normalized feature called Histogram of Moving Gradients (HoMG) is introduced for highly effective spatiotemporal representation of individuals and crowds within a video. Real-time crowd segmentation is achieved via a boosted cascade of weak classifiers based on sliding-window HoMG features, while linear SVM regression of crowd-region HoMG features is employed for real-time crowd counting. Experimental results using multi-camera crowd datasets show that the proposed algorithm significantly outperforms state-of-the-art crowd counting algorithms and achieves very promising crowd segmentation results, thus demonstrating the efficacy of the proposed method for highly accurate, real-time video-driven crowd analysis.
Tasks	Crowd Counting
Published	2016-02-01
URL	http://arxiv.org/abs/1602.00386v1
PDF	http://arxiv.org/pdf/1602.00386v1.pdf
PWC	https://paperswithcode.com/paper/scene-invariant-crowd-segmentation-and
Repo
Framework

Reducing the Model Order of Deep Neural Networks Using Information Theory


Title	Reducing the Model Order of Deep Neural Networks Using Information Theory
Authors	Ming Tu, Visar Berisha, Yu Cao, Jae-sun Seo
Abstract	Deep neural networks are typically represented by a much larger number of parameters than shallow models, making them prohibitive for small footprint devices. Recent research shows that there is considerable redundancy in the parameter space of deep neural networks. In this paper, we propose a method to compress deep neural networks by using the Fisher Information metric, which we estimate through a stochastic optimization method that keeps track of second-order information in the network. We first remove unimportant parameters and then use non-uniform fixed point quantization to assign more bits to parameters with higher Fisher Information estimates. We evaluate our method on a classification task with a convolutional neural network trained on the MNIST data set. Experimental results show that our method outperforms existing methods for both network pruning and quantization.
Tasks	Network Pruning, Quantization, Stochastic Optimization
Published	2016-05-16
URL	http://arxiv.org/abs/1605.04859v1
PDF	http://arxiv.org/pdf/1605.04859v1.pdf
PWC	https://paperswithcode.com/paper/reducing-the-model-order-of-deep-neural
Repo
Framework
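
The prune-then-quantize pipeline in the abstract above can be illustrated with a small sketch. Everything specific here is an assumption rather than the authors' procedure: the diagonal Fisher Information is approximated by the mean squared gradient over samples, pruning keeps a fixed fraction of parameters, and the kept parameters are split into equal-sized rank buckets that receive 8, 4 or 2 bits. The function names and default values are hypothetical.

```python
def diagonal_fisher(grad_samples):
    """Approximate the diagonal Fisher Information as the mean squared gradient."""
    n = len(grad_samples)
    dim = len(grad_samples[0])
    return [sum(g[k] ** 2 for g in grad_samples) / n for k in range(dim)]

def prune_and_allocate_bits(fisher, keep_fraction=0.5, bit_choices=(2, 4, 8)):
    """Return bits per parameter; 0 means pruned, larger Fisher gets more bits."""
    order = sorted(range(len(fisher)), key=lambda k: fisher[k], reverse=True)
    n_keep = max(1, int(round(keep_fraction * len(fisher))))
    bits = [0] * len(fisher)
    for rank, k in enumerate(order[:n_keep]):
        bucket = rank * len(bit_choices) // n_keep   # 0 = most important
        bits[k] = bit_choices[len(bit_choices) - 1 - bucket]
    return bits

def quantize(w, n_bits, w_max):
    """Symmetric signed fixed-point rounding; needs n_bits == 0 or n_bits >= 2."""
    if n_bits == 0:
        return 0.0                                   # pruned parameter
    levels = 2 ** (n_bits - 1) - 1
    step = w_max / levels
    return max(-w_max, min(w_max, round(w / step) * step))
```

The bucket rule is the simplest way to make bit width non-uniform; any monotone mapping from Fisher estimate to bit width would fit the description in the abstract equally well.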

Decentralized Collaborative Learning of Personalized Models over Networks


Title	Decentralized Collaborative Learning of Personalized Models over Networks
Authors	Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi
Abstract	We consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired by label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. To optimize this challenging objective, our decentralized algorithm is based on ADMM.
Tasks
Published	2016-10-17
URL	http://arxiv.org/abs/1610.05202v2
PDF	http://arxiv.org/pdf/1610.05202v2.pdf
PWC	https://paperswithcode.com/paper/decentralized-collaborative-learning-of
Repo
Framework

Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search


Title	Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search
Authors	Ali Yahya, Adrian Li, Mrinal Kalakrishnan, Yevgen Chebotar, Sergey Levine
Abstract	In principle, reinforcement learning and policy search methods can enable robots to learn highly complex and general skills that may allow them to function amid the complexity and diversity of the real world. However, training a policy that generalizes well across a wide range of real-world conditions requires far greater quantity and diversity of experience than is practical to collect with a single robot. Fortunately, it is possible for multiple robots to share their experience with one another, and thereby, learn a policy collectively. In this work, we explore distributed and asynchronous policy learning as a means to achieve generalization and improved training times on challenging, real-world manipulation tasks. We propose a distributed and asynchronous version of Guided Policy Search and use it to demonstrate collective policy learning on a vision-based door opening task using four robots. We show that it achieves better generalization, utilization, and training times than the single robot alternative.
Tasks
Published	2016-10-03
URL	http://arxiv.org/abs/1610.00673v1
PDF	http://arxiv.org/pdf/1610.00673v1.pdf
PWC	https://paperswithcode.com/paper/collective-robot-reinforcement-learning-with
Repo
Framework