July 29, 2019

3036 words 15 mins read

Paper Group AWR 166

Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN. Two-Stream Convolutional Networks for Dynamic Texture Synthesis. Learning Texture Manifolds with the Periodic Spatial GAN. ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks. Robust Video-Based Eye Tracking Using …

Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN

Title Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN
Authors Cheng-Bin Jin, Shengzhe Li, Hakil Kim
Abstract When we say a person is texting, can you tell whether the person is walking or sitting? Emphatically, no. To solve this incomplete-representation problem, this paper presents a sub-action descriptor for detailed action detection. The sub-action descriptor consists of three levels: the posture, the locomotion, and the gesture level. The three levels give three sub-action categories for one action to address the representation problem. The proposed action detection model simultaneously localizes and recognizes the actions of multiple individuals in video surveillance using appearance-based temporal features with multi-CNN. The proposed approach achieved a mean average precision (mAP) of 76.6% under frame-based and 83.5% under video-based measurement on the new large-scale ICVL video surveillance dataset, which the authors introduce and make available to the community with this paper. Extensive experiments on the benchmark KTH dataset demonstrate that the proposed approach achieves better performance, which in turn boosts action recognition performance over the state-of-the-art. The action detection model runs at around 25 fps on the ICVL dataset and at more than 80 fps on the KTH dataset, making it suitable for real-time surveillance applications.
Tasks Action Detection, Temporal Action Localization
Published 2017-10-10
URL http://arxiv.org/abs/1710.03383v1
PDF http://arxiv.org/pdf/1710.03383v1.pdf
PWC https://paperswithcode.com/paper/real-time-action-detection-in-video
Repo https://github.com/ChengBinJin/ActionViewer
Framework none
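
The three-level descriptor suggests a natural structure: a shared feature extractor with one classifier head per sub-action level. Below is a minimal sketch of that idea, assuming a toy backbone and illustrative class counts (this is not the authors' released model):

```python
import torch
import torch.nn as nn

class SubActionNet(nn.Module):
    """Shared features, three heads: posture, locomotion, gesture."""
    def __init__(self, n_posture=3, n_locomotion=4, n_gesture=8):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for the paper's multi-CNN features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.posture = nn.Linear(32, n_posture)        # e.g. standing / sitting / lying
        self.locomotion = nn.Linear(32, n_locomotion)  # e.g. still / walking / running
        self.gesture = nn.Linear(32, n_gesture)        # e.g. texting / phoning / none

    def forward(self, x):
        f = self.backbone(x)
        return self.posture(f), self.locomotion(f), self.gesture(f)
```

An action such as "texting while walking" is then read off as the triple of head predictions rather than as one flat class, which is what resolves the incomplete-representation problem the abstract describes.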

Two-Stream Convolutional Networks for Dynamic Texture Synthesis

Title Two-Stream Convolutional Networks for Dynamic Texture Synthesis
Authors Matthew Tesfaldet, Marcus A. Brubaker, Konstantinos G. Derpanis
Abstract We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high-quality samples that match both the frame-wise appearance and temporal evolution of the input texture. Finally, we quantitatively evaluate our texture synthesis approach with a thorough user study.
Tasks Object Recognition, Optical Flow Estimation, Style Transfer, Texture Synthesis
Published 2017-06-21
URL http://arxiv.org/abs/1706.06982v4
PDF http://arxiv.org/pdf/1706.06982v4.pdf
PWC https://paperswithcode.com/paper/two-stream-convolutional-networks-for-dynamic
Repo https://github.com/ryersonvisionlab/two-stream-dyntex-synth
Framework tf
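
The core mechanic is matching second-order feature statistics per stream, in the Gram-matrix style of image style transfer. A sketch of that loss, assuming `gen_feats` and `ref_feats` are lists of (channels, positions) activation maps taken from either the appearance or the flow ConvNet:

```python
import torch

def gram(features):
    # features: (C, N) activations -> (C, C) second-order statistics
    c, n = features.shape
    return features @ features.t() / n

def stream_loss(gen_feats, ref_feats):
    # Squared Gram-matrix mismatch, summed over the chosen layers of one stream
    return sum(((gram(g) - gram(r)) ** 2).sum()
               for g, r in zip(gen_feats, ref_feats))
```

The full synthesis objective would combine one such loss per stream; mixing appearance and dynamics amounts to taking the appearance targets from one video and the flow targets from another.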

Learning Texture Manifolds with the Periodic Spatial GAN

Title Learning Texture Manifolds with the Periodic Spatial GAN
Authors Urs Bergmann, Nikolay Jetchev, Roland Vollgraf
Abstract This paper introduces a novel approach to texture synthesis based on generative adversarial networks (GAN) (Goodfellow et al., 2014). We extend the structure of the input noise distribution by constructing tensors with different types of dimensions. We call this technique the Periodic Spatial GAN (PSGAN). The PSGAN has several novel abilities that surpass the current state of the art in texture synthesis. First, we can learn multiple textures from datasets of one or more complex large images. Second, we show that image generation with PSGANs has the properties of a texture manifold: we can smoothly interpolate between samples in the structured noise space and generate novel samples, which lie perceptually between the textures of the original dataset. In addition, we can also accurately learn periodic textures. We conduct multiple experiments which show that PSGANs can flexibly handle diverse texture and image data sources. Our method is highly scalable and can generate output images of arbitrarily large size.
Tasks Image Generation, Texture Synthesis
Published 2017-05-18
URL http://arxiv.org/abs/1705.06566v2
PDF http://arxiv.org/pdf/1705.06566v2.pdf
PWC https://paperswithcode.com/paper/learning-texture-manifolds-with-the-periodic
Repo https://github.com/zalandoresearch/psgan
Framework pytorch
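
The "tensors with different types of dimensions" can be made concrete: at each spatial location the generator input concatenates independent local noise, a spatially broadcast global code, and periodic (sinusoidal) coordinates. A hedged sketch with illustrative dimension sizes and fixed frequencies (in PSGAN itself the periodic frequencies are learned from the global code):

```python
import numpy as np

def psgan_noise(h, w, d_local=20, d_global=20, d_periodic=4, rng=np.random):
    z_local = rng.uniform(-1, 1, (d_local, h, w))               # fresh noise per location
    z_global = np.tile(rng.uniform(-1, 1, (d_global, 1, 1)), (1, h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    waves = []
    for i in range(d_periodic // 2):
        freq = 2 * np.pi / 2 ** (i + 2)                          # hypothetical frequencies
        waves += [np.sin(freq * xs), np.sin(freq * ys)]
    z_periodic = np.stack(waves)
    return np.concatenate([z_local, z_global, z_periodic], axis=0)
```

Interpolating the global code between two samples is what traverses the texture manifold described in the abstract.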

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks

Title ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
Authors Denis A. Gudovskiy, Luca Rigazio
Abstract In this paper we introduce ShiftCNN, a generalized low-precision architecture for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN is based on a power-of-two weight representation and, as a result, performs only shift and addition operations. Furthermore, ShiftCNN substantially reduces the computational cost of convolutional layers by precomputing convolution terms. Such an optimization can be applied to any CNN architecture with a relatively small codebook of weights and allows one to decrease the number of product operations by at least two orders of magnitude. The proposed architecture targets custom inference accelerators and can be realized on FPGAs or ASICs. Extensive evaluation on ImageNet shows that state-of-the-art CNNs can be converted without retraining into ShiftCNN with less than a 1% drop in accuracy when the proposed quantization algorithm is employed. RTL simulations, targeting modern FPGAs, show that the power consumption of convolutional layers is reduced by a factor of 4 compared to conventional 8-bit fixed-point architectures.
Tasks Quantization
Published 2017-06-07
URL http://arxiv.org/abs/1706.02393v1
PDF http://arxiv.org/pdf/1706.02393v1.pdf
PWC https://paperswithcode.com/paper/shiftcnn-generalized-low-precision
Repo https://github.com/gudovskiy/ShiftCNN
Framework caffe2
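
A toy version of the power-of-two idea: snap each (normalized) weight to a signed power of two, so that multiplying by it reduces to a bit shift. This is a simplification of the paper's quantizer, which uses a small codebook of such terms:

```python
import numpy as np

def quantize_pow2(w, min_exp=-7):
    """Map weights in [-1, 1] to signed powers of two (or zero)."""
    sign = np.sign(w)
    exp = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), min_exp, 0)
    q = sign * 2.0 ** exp
    q[np.abs(w) < 2.0 ** (min_exp - 1)] = 0.0  # too small to represent: underflow to zero
    return q

print(quantize_pow2(np.array([0.31, -0.07, 0.002, 0.9])))
# [ 0.25   -0.0625  0.      1.    ]
```

With every weight of this form, a convolution becomes a sum of shifted inputs, which is why the architecture needs no hardware multipliers.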

Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics

Title Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics
Authors Terence Brouns
Abstract Video-based eye tracking is a valuable technique in various research fields. Numerous open-source eye tracking algorithms have been developed in recent years, primarily designed for general application with many different camera types. These algorithms do not, however, capitalize on the high frame rate of eye tracking cameras often employed in psychophysical studies. We present a pupil detection method that utilizes this high-speed property to obtain reliable predictions through recursive estimation about certain pupil characteristics in successive camera frames. These predictions are subsequently used to carry out novel image segmentation and classification routines to improve pupil detection performance. Based on results from hand-labelled eye images, our approach was found to have a greater detection rate, accuracy and speed compared to other recently published open-source pupil detection algorithms. The program’s source code, together with a graphical user interface, can be downloaded at https://github.com/tbrouns/eyestalker
Tasks Eye Tracking, Semantic Segmentation
Published 2017-06-25
URL http://arxiv.org/abs/1706.08189v2
PDF http://arxiv.org/pdf/1706.08189v2.pdf
PWC https://paperswithcode.com/paper/robust-video-based-eye-tracking-using
Repo https://github.com/tbrouns/eyestalker
Framework none
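
The recursive element can be pictured as a simple per-frame predictor: estimates of pupil position and size from frame t constrain where, and for what, the detector searches in frame t+1. A minimal sketch of that idea (the released eyestalker code is considerably more involved):

```python
class PupilTracker:
    """Exponentially smoothed prediction of (x, y, radius) across frames."""
    def __init__(self, alpha=0.4):          # smoothing gain: hypothetical value
        self.alpha = alpha
        self.state = None                   # (x, y, radius)

    def predict(self):
        return self.state                   # constant-characteristics prediction

    def update(self, measurement):
        if self.state is None:
            self.state = measurement
        else:
            a = self.alpha
            self.state = tuple(a * m + (1 - a) * s
                               for m, s in zip(measurement, self.state))
        return self.state
```

At the high frame rates of psychophysics cameras the pupil changes little between frames, which is what keeps such predictions reliable.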

From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets

Title From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets
Authors Pedro H. P. Savarese, Mayank Kakodkar, Bruno Ribeiro
Abstract We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and caps the number of Markov chain steps at K (referred to as MCLV-K). We present an MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondence and differences between LVS-K and Contrastive Divergence (CD-K), with LVS-K significantly outperforming CD-K at training RBMs on the MNIST dataset, indicating MCLV to be a promising direction in learning generative models.
Tasks
Published 2017-11-22
URL http://arxiv.org/abs/1711.08442v1
PDF http://arxiv.org/pdf/1711.08442v1.pdf
PWC https://paperswithcode.com/paper/from-monte-carlo-to-las-vegas-improving
Repo https://github.com/PurdueMINDS/MCLV-RBM
Framework pytorch
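
The Las Vegas element is a chain that certifies its own convergence by returning to a stopping set. A hedged sketch of the MCLV-K loop, where `gibbs_step` and `in_stopping_set` are placeholders for the RBM Gibbs kernel and the training-data-derived stopping set:

```python
def mclv_k(x0, gibbs_step, in_stopping_set, K):
    """Run the chain for at most K steps; stop early on hitting the stopping set."""
    x = x0
    for t in range(1, K + 1):
        x = gibbs_step(x)
        if in_stopping_set(x):
            return x, t, True    # tour completed: estimator comes with guarantees
    return x, K, False           # budget exhausted: falls back to CD-K-like behavior
```

The random running time in the abstract is exactly the random tour length t; the guarantee applies whenever the tour completes within the budget.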

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

Title Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations
Authors Weinan E, Jiequn Han, Arnulf Jentzen
Abstract We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives.
Tasks
Published 2017-06-15
URL http://arxiv.org/abs/1706.04702v1
PDF http://arxiv.org/pdf/1706.04702v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-numerical-methods-for
Repo https://github.com/yiliu1/Machine-Learning-in-Finance
Framework none
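
The scheme can be sketched in a few lines: simulate the forward diffusion with Euler-Maruyama, propagate Y using a network standing in for the gradient of the solution (the "policy"), and train on the terminal mismatch. This simplified sketch shares one gradient network across time steps and drops the drift term, whereas the paper uses a subnetwork per step:

```python
import torch

def bsde_loss(y0, grad_net, f, g, x0, dt, n_steps, sigma=1.0):
    """y0: learnable initial value u(0, x0); f: BSDE driver; g: terminal condition."""
    x, y = x0.clone(), y0.expand(x0.shape[0], 1)
    for _ in range(n_steps):
        z = grad_net(x)                                   # approximates sigma^T grad u(t, x)
        dw = torch.randn_like(x) * dt ** 0.5              # Brownian increments
        y = y - f(x, y, z) * dt + (z * dw).sum(1, keepdim=True)
        x = x + sigma * dw                                # driftless diffusion, for brevity
    return ((y - g(x)) ** 2).mean()                       # error at the terminal condition
```

Minimizing this loss over y0 and the network weights yields the PDE solution at the starting point as the learned y0 itself.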

PixelNet: Representation of the pixels, by the pixels, and for the pixels

Title PixelNet: Representation of the pixels, by the pixels, and for the pixels
Authors Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
Abstract We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationally efficient, we point out that such approaches are not statistically efficient during learning precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that stratified sampling of pixels allows one to (1) add diversity during batch updates, speeding up learning; (2) explore complex nonlinear predictors, improving accuracy; and (3) efficiently train state-of-the-art models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our single architecture produces state-of-the-art results for semantic segmentation on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset, and edge detection on BSDS.
Tasks Edge Detection, Semantic Segmentation
Published 2017-02-21
URL http://arxiv.org/abs/1702.06506v1
PDF http://arxiv.org/pdf/1702.06506v1.pdf
PWC https://paperswithcode.com/paper/pixelnet-representation-of-the-pixels-by-the
Repo https://github.com/bdecost/pixelnet
Framework tf
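
The statistical-efficiency argument suggests a concrete sampler: draw a few thousand pixels from each image rather than backpropagating through every pixel of a few images. A sketch of such stratified sampling (the index layout and per-image count are illustrative):

```python
import numpy as np

def sample_pixels(batch_size, h, w, n_per_image=2000, rng=np.random):
    """Return (image, row, col) index triples, n_per_image drawn from each image."""
    idx = []
    for b in range(batch_size):
        flat = rng.choice(h * w, size=n_per_image, replace=False)
        idx.append(np.stack([np.full(n_per_image, b), flat // w, flat % w], axis=1))
    return np.concatenate(idx)
```

Gathering hypercolumn features only at these locations decorrelates the gradient contributions within a batch and makes the nonlinear per-pixel predictors affordable to train.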

Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling

Title Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling
Authors Sungjoon Choi, Kyungjae Lee, Sungbin Lim, Songhwai Oh
Abstract In this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors. The proposed uncertainty acquisition can be done with a single forward pass, without Monte Carlo sampling, and is suitable for real-time robotics applications. The properties of the proposed uncertainty measure are analyzed through three different synthetic scenarios: absence of data, heavy measurement noise, and composition of functions. We show that each case can be distinguished using the proposed uncertainty measure and present an uncertainty-aware learning from demonstration method for autonomous driving that exploits this property. The proposed uncertainty-aware learning from demonstration method outperforms the other compared methods in terms of safety on a complex real-world driving dataset.
Tasks Autonomous Driving
Published 2017-09-03
URL http://arxiv.org/abs/1709.02249v2
PDF http://arxiv.org/pdf/1709.02249v2.pdf
PWC https://paperswithcode.com/paper/uncertainty-aware-learning-from-demonstration
Repo https://github.com/MayarLotfy/MCDO
Framework tf
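
The sampling-free variance of a mixture density network follows from the law of total variance, Var[y] = E[Var[y | k]] + Var[E[y | k]], with both terms computable in closed form from one forward pass. A sketch for a one-dimensional output; reading the two terms as within-component noise versus disagreement between components is one common interpretation, stated here as an assumption rather than the paper's exact decomposition:

```python
import numpy as np

def mdn_uncertainty(pi, mu, sigma2):
    """pi: (K,) mixture weights; mu: (K,) means; sigma2: (K,) variances."""
    mean = np.sum(pi * mu)
    within = np.sum(pi * sigma2)               # expected within-component variance
    between = np.sum(pi * (mu - mean) ** 2)    # spread of the component means
    return mean, within + between, within, between
```

Because no Monte Carlo sampling is involved, the measure is cheap enough for the real-time driving setting the abstract targets.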

Network classification with applications to brain connectomics

Title Network classification with applications to brain connectomics
Authors Jesús D. Arroyo-Relión, Daniel Kessler, Elizaveta Levina, Stephan F. Taylor
Abstract While statistical analysis of a single network has received a lot of attention in recent years, with a focus on social networks, analysis of a sample of networks presents its own challenges which require a different set of analytic tools. Here we study the problem of classification of networks with labeled nodes, motivated by applications in neuroimaging. Brain networks are constructed from imaging data to represent functional connectivity between regions of the brain, and previous work has shown the potential of such networks to distinguish between various brain disorders, giving rise to a network classification problem. Existing approaches tend to either treat all edge weights as a long vector, ignoring the network structure, or focus on graph topology as represented by summary measures while ignoring the edge weights. Our goal is to design a classification method that uses both the individual edge information and the network structure of the data in a computationally efficient way, and that can produce a parsimonious and interpretable representation of differences in brain connectivity patterns between classes. We propose a graph classification method that uses edge weights as predictors but incorporates the network nature of the data via penalties that promote sparsity in the number of nodes, in addition to the usual sparsity penalties that encourage selection of edges. We implement the method via efficient convex optimization and provide a detailed analysis of data from two fMRI studies of schizophrenia.
Tasks Graph Classification
Published 2017-01-27
URL http://arxiv.org/abs/1701.08140v3
PDF http://arxiv.org/pdf/1701.08140v3.pdf
PWC https://paperswithcode.com/paper/network-classification-with-applications-to
Repo https://github.com/jesusdaniel/graphclass
Framework none
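
The penalty structure described in the abstract, edge-level sparsity plus node-level group sparsity, can be sketched directly; `lam` and `rho` are generic tuning-parameter names assumed here, not taken from the paper:

```python
import numpy as np

def penalty(B, lam, rho):
    """B: (n, n) symmetric matrix of edge coefficients for the linear classifier."""
    node_groups = np.linalg.norm(B, axis=1).sum()   # l2 norm over each node's incident edges
    edge_l1 = np.abs(B).sum()                       # plain sparsity over individual edges
    return lam * (node_groups + rho * edge_l1)
```

Because the group term covers all edges incident to a node, the optimizer can zero out entire brain regions at once, which is what yields the parsimonious, interpretable node selection.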

Universal Reinforcement Learning Algorithms: Survey and Experiments

Title Universal Reinforcement Learning Algorithms: Survey and Experiments
Authors John Aslanides, Jan Leike, Marcus Hutter
Abstract Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with experimental results that qualitatively illustrate properties of the resulting policies and their relative performance on partially observable gridworld environments. We also present an open-source reference implementation of the algorithms, which we hope will facilitate further understanding of, and experimentation with, these ideas.
Tasks
Published 2017-05-30
URL http://arxiv.org/abs/1705.10557v1
PDF http://arxiv.org/pdf/1705.10557v1.pdf
PWC https://paperswithcode.com/paper/universal-reinforcement-learning-algorithms
Repo https://github.com/aslanides/aixijs
Framework none

Genetic CNN

Title Genetic CNN
Authors Lingxi Xie, Alan Yuille
Abstract The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following basic principles such as increasing depth and constructing highway connections, researchers have manually designed many fixed network structures and verified their effectiveness. In this paper, we discuss the possibility of learning deep network structures automatically. Note that the number of possible network structures increases exponentially with the number of layers in the network, which inspires us to adopt the genetic algorithm to efficiently traverse this large search space. We first propose an encoding method to represent each network structure as a fixed-length binary string, and initialize the genetic algorithm by generating a set of randomized individuals. In each generation, we define standard genetic operations, e.g., selection, mutation and crossover, to eliminate weak individuals and then generate more competitive ones. The competitiveness of each individual is defined as its recognition accuracy, which is obtained by training the network from scratch and evaluating it on a validation set. We run the genetic process on two small datasets, i.e., MNIST and CIFAR10, demonstrating its ability to evolve and find high-quality structures that have received little prior study. These structures are also transferable to the large-scale ILSVRC2012 dataset.
Tasks Object Recognition
Published 2017-03-04
URL http://arxiv.org/abs/1703.01513v1
PDF http://arxiv.org/pdf/1703.01513v1.pdf
PWC https://paperswithcode.com/paper/genetic-cnn
Repo https://github.com/gmontamat/gentun
Framework tf
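
The fixed-length encoding is easy to make concrete: within a stage of K ordered nodes, one bit per ordered pair (i, j) with i < j says whether node j takes input from node i, giving K(K-1)/2 bits per stage. A sketch of the encoding and the two variation operators (stage sizes here are illustrative):

```python
import random

def random_individual(nodes_per_stage=(4, 5)):
    length = sum(k * (k - 1) // 2 for k in nodes_per_stage)
    return [random.randint(0, 1) for _ in range(length)]

def mutate(bits, p=0.05):
    return [b ^ (random.random() < p) for b in bits]   # flip each bit with probability p

def crossover(a, b):
    cut = random.randrange(1, len(a))                  # single-point crossover
    return a[:cut] + b[cut:], b[:cut] + a[cut:]
```

Selection then retains individuals in proportion to fitness, i.e. the validation accuracy obtained after training the decoded network from scratch.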

Deep generative models of genetic variation capture mutation effects

Title Deep generative models of genetic variation capture mutation effects
Authors Adam J. Riesselman, John B. Ingraham, Debora S. Marks
Abstract The functions of proteins and RNAs are determined by a myriad of interactions between their constituent residues, but most quantitative models of how molecular phenotype depends on genotype must approximate this by simple additive effects. While recent models have relaxed this constraint to also account for pairwise interactions, these approaches do not provide a tractable path towards modeling higher-order dependencies. Here, we show how latent variable models with nonlinear dependencies can be applied to capture beyond-pairwise constraints in biomolecules. We present a new probabilistic model for sequence families, DeepSequence, that can predict the effects of mutations across a variety of deep mutational scanning experiments significantly better than site-independent or pairwise models that are based on the same evolutionary data. The model, learned in an unsupervised manner solely from sequence information, is grounded with biologically motivated priors, reveals latent organization of sequence families, and can be used to extrapolate to new parts of sequence space.
Tasks Latent Variable Models
Published 2017-12-18
URL http://arxiv.org/abs/1712.06527v1
PDF http://arxiv.org/pdf/1712.06527v1.pdf
PWC https://paperswithcode.com/paper/deep-generative-models-of-genetic-variation
Repo https://github.com/samsinai/VAE_protein_function
Framework none
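
Scoring a mutation with such a model reduces to a log-likelihood ratio, with the variational lower bound standing in for the intractable likelihood. A minimal sketch, assuming a hypothetical `elbo(sequence)` function that returns one Monte Carlo estimate of the model's bound:

```python
def mutation_effect(elbo, wildtype, mutant, n_samples=200):
    """Approximate log p(mutant) - log p(wildtype) by averaging ELBO samples."""
    delta = sum(elbo(mutant) - elbo(wildtype) for _ in range(n_samples))
    return delta / n_samples
```

A positive score means the mutant sequence is more probable under the family model, i.e. the mutation is predicted to be tolerated.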

Learning Convolutional Text Representations for Visual Question Answering

Title Learning Convolutional Text Representations for Visual Question Answering
Authors Zhengyang Wang, Shuiwang Ji
Abstract Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our “CNN Inception + Gate” model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering.
Tasks Visual Question Answering
Published 2017-05-18
URL http://arxiv.org/abs/1705.06824v2
PDF http://arxiv.org/pdf/1705.06824v2.pdf
PWC https://paperswithcode.com/paper/learning-convolutional-text-representations
Repo https://github.com/divelab/vqa-text
Framework caffe2
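
A hedged sketch of a gated convolutional text encoder in the spirit of the "CNN Inception + Gate" model: parallel convolutions of several widths (the inception part), each modulated by a learned sigmoid gate. Filter counts and widths are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class GatedConvText(nn.Module):
    def __init__(self, emb_dim=300, n_filters=128, widths=(1, 3, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, w, padding=w // 2) for w in widths)
        self.gates = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, w, padding=w // 2) for w in widths)

    def forward(self, x):                    # x: (batch, emb_dim, seq_len)
        outs = [conv(x) * torch.sigmoid(gate(x))
                for conv, gate in zip(self.convs, self.gates)]
        h = torch.cat(outs, dim=1)           # concatenate branches, inception-style
        return h.max(dim=2).values           # max-pool over time -> question vector
```

The resulting question vector would replace the recurrent encoder in a standard VQA pipeline.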

Is This a Joke? Detecting Humor in Spanish Tweets

Title Is This a Joke? Detecting Humor in Spanish Tweets
Authors Santiago Castro, Matías Cubero, Diego Garat, Guillermo Moncecchi
Abstract While humor has been historically studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective is an area yet to be explored in Computational Linguistics. Some previous work exists, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to their humor value, letting the annotators subjectively decide which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%.
Tasks Humor Detection
Published 2017-03-28
URL http://arxiv.org/abs/1703.09527v1
PDF http://arxiv.org/pdf/1703.09527v1.pdf
PWC https://paperswithcode.com/paper/is-this-a-joke-detecting-humor-in-spanish
Repo https://github.com/pln-fing-udelar/pghumor
Framework none
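
For context, a generic supervised text-classification pipeline of the kind such a system starts from; the paper's classifier relies on hand-crafted humor features for Spanish tweets, so the bag-of-words baseline below is only a stand-in with placeholder data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["ejemplo de tweet gracioso", "noticia seria"]  # placeholder data
labels = [1, 0]                                          # 1 = humorous

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["otro tweet de ejemplo"]))
```

Swapping the TF-IDF features for the corpus's humor-specific annotations is where the reported precision/recall trade-off comes from.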