July 29, 2019


Paper Group AWR 166

Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN. Two-Stream Convolutional Networks for Dynamic Texture Synthesis. Learning Texture Manifolds with the Periodic Spatial GAN. ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks. Robust Video-Based Eye Tracking Using …

Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN


Title	Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN
Authors	Cheng-Bin Jin, Shengzhe Li, Hakil Kim
Abstract	When we say a person is texting, can you tell the person is walking or sitting? Emphatically, no. In order to solve this incomplete representation problem, this paper presents a sub-action descriptor for detailed action detection. The sub-action descriptor consists of three levels: the posture, the locomotion, and the gesture level. The three levels give three sub-action categories for one action to address the representation problem. The proposed action detection model simultaneously localizes and recognizes the actions of multiple individuals in video surveillance using appearance-based temporal features with multi-CNN. The proposed approach achieved a mean average precision (mAP) of 76.6% at the frame-based and 83.5% at the video-based measurement on the new large-scale ICVL video surveillance dataset that the authors introduce and make available to the community with this paper. Extensive experiments on the benchmark KTH dataset demonstrate that the proposed approach achieved better performance, which in turn boosts the action recognition performance over the state-of-the-art. The action detection model can run at around 25 fps on the ICVL and more than 80 fps on the KTH dataset, which is suitable for real-time surveillance applications.
Tasks	Action Detection, Temporal Action Localization
Published	2017-10-10
URL	http://arxiv.org/abs/1710.03383v1
PDF	http://arxiv.org/pdf/1710.03383v1.pdf
PWC	https://paperswithcode.com/paper/real-time-action-detection-in-video
Repo	https://github.com/ChengBinJin/ActionViewer
Framework	none

Two-Stream Convolutional Networks for Dynamic Texture Synthesis


Title	Two-Stream Convolutional Networks for Dynamic Texture Synthesis
Authors	Matthew Tesfaldet, Marcus A. Brubaker, Konstantinos G. Derpanis
Abstract	We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high quality samples that match both the framewise appearance and temporal evolution of input texture. Finally, we quantitatively evaluate our texture synthesis approach with a thorough user study.
Tasks	Object Recognition, Optical Flow Estimation, Style Transfer, Texture Synthesis
Published	2017-06-21
URL	http://arxiv.org/abs/1706.06982v4
PDF	http://arxiv.org/pdf/1706.06982v4.pdf
PWC	https://paperswithcode.com/paper/two-stream-convolutional-networks-for-dynamic
Repo	https://github.com/ryersonvisionlab/two-stream-dyntex-synth
Framework	tf

Learning Texture Manifolds with the Periodic Spatial GAN


Title	Learning Texture Manifolds with the Periodic Spatial GAN
Authors	Urs Bergmann, Nikolay Jetchev, Roland Vollgraf
Abstract	This paper introduces a novel approach to texture synthesis based on generative adversarial networks (GAN) (Goodfellow et al., 2014). We extend the structure of the input noise distribution by constructing tensors with different types of dimensions. We call this technique Periodic Spatial GAN (PSGAN). The PSGAN has several novel abilities which surpass the current state of the art in texture synthesis. First, we can learn multiple textures from datasets of one or more complex large images. Second, we show that the image generation with PSGANs has properties of a texture manifold: we can smoothly interpolate between samples in the structured noise space and generate novel samples, which lie perceptually between the textures of the original dataset. In addition, we can also accurately learn periodical textures. We make multiple experiments which show that PSGANs can flexibly handle diverse texture and image data sources. Our method is highly scalable and it can generate output images of arbitrary large size.
Tasks	Image Generation, Texture Synthesis
Published	2017-05-18
URL	http://arxiv.org/abs/1705.06566v2
PDF	http://arxiv.org/pdf/1705.06566v2.pdf
PWC	https://paperswithcode.com/paper/learning-texture-manifolds-with-the-periodic
Repo	https://github.com/zalandoresearch/psgan
Framework	pytorch

ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks


Title	ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
Authors	Denis A. Gudovskiy, Luca Rigazio
Abstract	In this paper we introduce ShiftCNN, a generalized low-precision architecture for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN is based on a power-of-two weight representation and, as a result, performs only shift and addition operations. Furthermore, ShiftCNN substantially reduces computational cost of convolutional layers by precomputing convolution terms. Such an optimization can be applied to any CNN architecture with a relatively small codebook of weights and allows to decrease the number of product operations by at least two orders of magnitude. The proposed architecture targets custom inference accelerators and can be realized on FPGAs or ASICs. Extensive evaluation on ImageNet shows that the state-of-the-art CNNs can be converted without retraining into ShiftCNN with less than 1% drop in accuracy when the proposed quantization algorithm is employed. RTL simulations, targeting modern FPGAs, show that power consumption of convolutional layers is reduced by a factor of 4 compared to conventional 8-bit fixed-point architectures.
Tasks	Quantization
Published	2017-06-07
URL	http://arxiv.org/abs/1706.02393v1
PDF	http://arxiv.org/pdf/1706.02393v1.pdf
PWC	https://paperswithcode.com/paper/shiftcnn-generalized-low-precision
Repo	https://github.com/gudovskiy/ShiftCNN
Framework	caffe2
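
The power-of-two weight representation mentioned in the ShiftCNN abstract can be illustrated with a small sketch. This is a simplified, single-term version written for this listing, not the authors' quantization algorithm: the function names `quantize_pow2` and `shift_multiply`, the exponent range, and the log-domain rounding rule are all assumptions.

```python
import math


def quantize_pow2(w, min_exp=-8, max_exp=0):
    """Round a weight to a signed power of two (hypothetical helper).

    Rounding is done on log2(|w|), and the exponent is clamped to
    [min_exp, max_exp]; both choices are assumptions for illustration.
    """
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, w)


def shift_multiply(x, exp):
    """Multiply an integer activation by 2**exp using only bit shifts."""
    return x << exp if exp >= 0 else x >> -exp
```

Once a weight is stored as an exponent, multiplying an integer activation by it needs no multiplier: `shift_multiply(40, -2)` shifts right by two bits, which is the same as multiplying 40 by 0.25.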

Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics


Title	Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics
Authors	Terence Brouns
Abstract	Video-based eye tracking is a valuable technique in various research fields. Numerous open-source eye tracking algorithms have been developed in recent years, primarily designed for general application with many different camera types. These algorithms do not, however, capitalize on the high frame rate of eye tracking cameras often employed in psychophysical studies. We present a pupil detection method that utilizes this high-speed property to obtain reliable predictions through recursive estimation about certain pupil characteristics in successive camera frames. These predictions are subsequently used to carry out novel image segmentation and classification routines to improve pupil detection performance. Based on results from hand-labelled eye images, our approach was found to have a greater detection rate, accuracy and speed compared to other recently published open-source pupil detection algorithms. The program’s source code, together with a graphical user interface, can be downloaded at https://github.com/tbrouns/eyestalker
Tasks	Eye Tracking, Semantic Segmentation
Published	2017-06-25
URL	http://arxiv.org/abs/1706.08189v2
PDF	http://arxiv.org/pdf/1706.08189v2.pdf
PWC	https://paperswithcode.com/paper/robust-video-based-eye-tracking-using
Repo	https://github.com/tbrouns/eyestalker
Framework	none

From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets


Title	From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets
Authors	Pedro H. P. Savarese, Mayank Kakodkar, Bruno Ribeiro
Abstract	We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and has maximum number of Markov chain steps K (referred as MCLV-K). We present a MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondence and differences between LVS-K and Contrastive Divergence (CD-K), with LVS-K significantly outperforming CD-K training RBMs over the MNIST dataset, indicating MCLV to be a promising direction in learning generative models.
Tasks
Published	2017-11-22
URL	http://arxiv.org/abs/1711.08442v1
PDF	http://arxiv.org/pdf/1711.08442v1.pdf
PWC	https://paperswithcode.com/paper/from-monte-carlo-to-las-vegas-improving
Repo	https://github.com/PurdueMINDS/MCLV-RBM
Framework	pytorch

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations


Title	Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations
Authors	Weinan E, Jiequn Han, Arnulf Jentzen
Abstract	We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives.
Tasks
Published	2017-06-15
URL	http://arxiv.org/abs/1706.04702v1
PDF	http://arxiv.org/pdf/1706.04702v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-based-numerical-methods-for
Repo	https://github.com/yiliu1/Machine-Learning-in-Finance
Framework	none

PixelNet: Representation of the pixels, by the pixels, and for the pixels


Title	PixelNet: Representation of the pixels, by the pixels, and for the pixels
Authors	Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
Abstract	We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationally efficient, we point out that such approaches are not statistically efficient during learning precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that stratified sampling of pixels allows one to (1) add diversity during batch updates, speeding up learning; (2) explore complex nonlinear predictors, improving accuracy; and (3) efficiently train state-of-the-art models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our single architecture produces state-of-the-art results for semantic segmentation on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset, and edge detection on BSDS.
Tasks	Edge Detection, Semantic Segmentation
Published	2017-02-21
URL	http://arxiv.org/abs/1702.06506v1
PDF	http://arxiv.org/pdf/1702.06506v1.pdf
PWC	https://paperswithcode.com/paper/pixelnet-representation-of-the-pixels-by-the
Repo	https://github.com/bdecost/pixelnet
Framework	tf

Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling


Title	Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling
Authors	Sungjoon Choi, Kyungjae Lee, Sungbin Lim, Songhwai Oh
Abstract	In this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors. The proposed uncertainty acquisition can be done with a single forward path without Monte Carlo sampling and is suitable for real-time robotics applications. The properties of the proposed uncertainty measure are analyzed through three different synthetic examples, absence of data, heavy measurement noise, and composition of functions scenarios. We show that each case can be distinguished using the proposed uncertainty measure and presented an uncertainty-aware learning from demonstration method of an autonomous driving using this property. The proposed uncertainty-aware learning from demonstration method outperforms other compared methods in terms of safety using a complex real-world driving dataset.
Tasks	Autonomous Driving
Published	2017-09-03
URL	http://arxiv.org/abs/1709.02249v2
PDF	http://arxiv.org/pdf/1709.02249v2.pdf
PWC	https://paperswithcode.com/paper/uncertainty-aware-learning-from-demonstration
Repo	https://github.com/MayarLotfy/MCDO
Framework	tf
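
The abstract does not give the variance formula, so the following is only a sketch of one standard way to get a variance from mixture density network outputs without sampling: the law of total variance for a one-dimensional Gaussian mixture. The function name `mixture_variance` and the split into "within" and "between" terms are assumptions made for illustration.

```python
def mixture_variance(pis, mus, sigmas):
    """Closed-form mean and variance terms of a 1-D Gaussian mixture.

    pis:    mixture weights (assumed to sum to 1)
    mus:    component means
    sigmas: component standard deviations
    Returns (mean, within, between), where within + between is the
    total predictive variance by the law of total variance.
    """
    mean = sum(p * m for p, m in zip(pis, mus))
    # Average of the component variances.
    within = sum(p * s ** 2 for p, s in zip(pis, sigmas))
    # Spread of the component means around the mixture mean.
    between = sum(p * (m - mean) ** 2 for p, m in zip(pis, mus))
    return mean, within, between
```

Because every term is a weighted sum over the mixture parameters, the whole computation takes a single forward pass of the network, which matches the sampling-free property the abstract describes.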

Network classification with applications to brain connectomics


Title	Network classification with applications to brain connectomics
Authors	Jesús D. Arroyo-Relión, Daniel Kessler, Elizaveta Levina, Stephan F. Taylor
Abstract	While statistical analysis of a single network has received a lot of attention in recent years, with a focus on social networks, analysis of a sample of networks presents its own challenges which require a different set of analytic tools. Here we study the problem of classification of networks with labeled nodes, motivated by applications in neuroimaging. Brain networks are constructed from imaging data to represent functional connectivity between regions of the brain, and previous work has shown the potential of such networks to distinguish between various brain disorders, giving rise to a network classification problem. Existing approaches tend to either treat all edge weights as a long vector, ignoring the network structure, or focus on graph topology as represented by summary measures while ignoring the edge weights. Our goal is to design a classification method that uses both the individual edge information and the network structure of the data in a computationally efficient way, and that can produce a parsimonious and interpretable representation of differences in brain connectivity patterns between classes. We propose a graph classification method that uses edge weights as predictors but incorporates the network nature of the data via penalties that promote sparsity in the number of nodes, in addition to the usual sparsity penalties that encourage selection of edges. We implement the method via efficient convex optimization and provide a detailed analysis of data from two fMRI studies of schizophrenia.
Tasks	Graph Classification
Published	2017-01-27
URL	http://arxiv.org/abs/1701.08140v3
PDF	http://arxiv.org/pdf/1701.08140v3.pdf
PWC	https://paperswithcode.com/paper/network-classification-with-applications-to
Repo	https://github.com/jesusdaniel/graphclass
Framework	none

Universal Reinforcement Learning Algorithms: Survey and Experiments


Title	Universal Reinforcement Learning Algorithms: Survey and Experiments
Authors	John Aslanides, Jan Leike, Marcus Hutter
Abstract	Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas.
Tasks
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10557v1
PDF	http://arxiv.org/pdf/1705.10557v1.pdf
PWC	https://paperswithcode.com/paper/universal-reinforcement-learning-algorithms
Repo	https://github.com/aslanides/aixijs
Framework	none

Genetic CNN


Title	Genetic CNN
Authors	Lingxi Xie, Alan Yuille
Abstract	The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following basic principles such as increasing the depth and constructing highway connections, researchers have manually designed a lot of fixed network structures and verified their effectiveness. In this paper, we discuss the possibility of learning deep network structures automatically. Note that the number of possible network structures increases exponentially with the number of layers in the network, which inspires us to adopt the genetic algorithm to efficiently traverse this large search space. We first propose an encoding method to represent each network structure in a fixed-length binary string, and initialize the genetic algorithm by generating a set of randomized individuals. In each generation, we define standard genetic operations, e.g., selection, mutation and crossover, to eliminate weak individuals and then generate more competitive ones. The competitiveness of each individual is defined as its recognition accuracy, which is obtained via training the network from scratch and evaluating it on a validation set. We run the genetic process on two small datasets, i.e., MNIST and CIFAR10, demonstrating its ability to evolve and find high-quality structures which are little studied before. These structures are also transferrable to the large-scale ILSVRC2012 dataset.
Tasks	Object Recognition
Published	2017-03-04
URL	http://arxiv.org/abs/1703.01513v1
PDF	http://arxiv.org/pdf/1703.01513v1.pdf
PWC	https://paperswithcode.com/paper/genetic-cnn
Repo	https://github.com/gmontamat/gentun
Framework	tf
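
The genetic operations named in the Genetic CNN abstract (selection, mutation and crossover over fixed-length binary strings) can be sketched as follows. This is a toy illustration, not the authors' implementation: single-point crossover, fitness-proportional selection and the helper names are assumptions, and the fitness function is left to the caller, whereas the paper defines fitness as validation accuracy after training the encoded network.

```python
import random


def random_individual(length, rng):
    """Create a random fixed-length bit string (one encoded network)."""
    return [rng.randint(0, 1) for _ in range(length)]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def crossover(a, b, rng):
    """Single-point crossover (an assumed variant); returns two children."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def select(population, fitness, rng):
    """Fitness-proportional selection with replacement.

    `fitness` must return a positive number for every individual.
    """
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=len(population))
```

A generation would apply `select`, then `crossover` and `mutate` to the survivors. With a placeholder fitness such as `lambda ind: sum(ind) + 1`, the loop can be tested without training any network.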

Deep generative models of genetic variation capture mutation effects


Title	Deep generative models of genetic variation capture mutation effects
Authors	Adam J. Riesselman, John B. Ingraham, Debora S. Marks
Abstract	The functions of proteins and RNAs are determined by a myriad of interactions between their constituent residues, but most quantitative models of how molecular phenotype depends on genotype must approximate this by simple additive effects. While recent models have relaxed this constraint to also account for pairwise interactions, these approaches do not provide a tractable path towards modeling higher-order dependencies. Here, we show how latent variable models with nonlinear dependencies can be applied to capture beyond-pairwise constraints in biomolecules. We present a new probabilistic model for sequence families, DeepSequence, that can predict the effects of mutations across a variety of deep mutational scanning experiments significantly better than site independent or pairwise models that are based on the same evolutionary data. The model, learned in an unsupervised manner solely from sequence information, is grounded with biologically motivated priors, reveals latent organization of sequence families, and can be used to extrapolate to new parts of sequence space.
Tasks	Latent Variable Models
Published	2017-12-18
URL	http://arxiv.org/abs/1712.06527v1
PDF	http://arxiv.org/pdf/1712.06527v1.pdf
PWC	https://paperswithcode.com/paper/deep-generative-models-of-genetic-variation
Repo	https://github.com/samsinai/VAE_protein_function
Framework	none

Learning Convolutional Text Representations for Visual Question Answering


Title	Learning Convolutional Text Representations for Visual Question Answering
Authors	Zhengyang Wang, Shuiwang Ji
Abstract	Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our “CNN Inception + Gate” model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering.
Tasks	Visual Question Answering
Published	2017-05-18
URL	http://arxiv.org/abs/1705.06824v2
PDF	http://arxiv.org/pdf/1705.06824v2.pdf
PWC	https://paperswithcode.com/paper/learning-convolutional-text-representations
Repo	https://github.com/divelab/vqa-text
Framework	caffe2

Is This a Joke? Detecting Humor in Spanish Tweets


Title	Is This a Joke? Detecting Humor in Spanish Tweets
Authors	Santiago Castro, Matías Cubero, Diego Garat, Guillermo Moncecchi
Abstract	While humor has been historically studied from a psychological, cognitive and linguistic standpoint, its study from a computational perspective is an area yet to be explored in Computational Linguistics. There exist some previous works, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to its humor value, letting the annotators subjectively decide which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%.
Tasks	Humor Detection
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09527v1
PDF	http://arxiv.org/pdf/1703.09527v1.pdf
PWC	https://paperswithcode.com/paper/is-this-a-joke-detecting-humor-in-spanish
Repo	https://github.com/pln-fing-udelar/pghumor
Framework	none
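
The humor detection abstract reports precision and recall separately. As a reading aid, the sketch below combines the two into the usual F1 score (their harmonic mean). The F1 value is derived here from the two reported figures and is not a number stated in the abstract; the function name `f1_score` is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
reported_f1 = f1_score(0.84, 0.69)
```

With the reported 84% precision and 69% recall, this gives an F1 of roughly 0.76.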