Paper Group AWR 166
Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN. Two-Stream Convolutional Networks for Dynamic Texture Synthesis. Learning Texture Manifolds with the Periodic Spatial GAN. ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks. Robust Video-Based Eye Tracking Using …
Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN
Title | Real-Time Action Detection in Video Surveillance using Sub-Action Descriptor with Multi-CNN |
Authors | Cheng-Bin Jin, Shengzhe Li, Hakil Kim |
Abstract | When we say a person is texting, can you tell whether the person is walking or sitting? Emphatically, no. In order to solve this incomplete representation problem, this paper presents a sub-action descriptor for detailed action detection. The sub-action descriptor consists of three levels: the posture, the locomotion, and the gesture level. The three levels give three sub-action categories for one action to address the representation problem. The proposed action detection model simultaneously localizes and recognizes the actions of multiple individuals in video surveillance using appearance-based temporal features with multi-CNN. The proposed approach achieved a mean average precision (mAP) of 76.6% for frame-based and 83.5% for video-based measurement on the new large-scale ICVL video surveillance dataset that the authors introduce and make available to the community with this paper. Extensive experiments on the benchmark KTH dataset demonstrate that the proposed approach improves action recognition performance over the state of the art. The action detection model can run at around 25 fps on the ICVL and more than 80 fps on the KTH dataset, which is suitable for real-time surveillance applications. |
Tasks | Action Detection, Temporal Action Localization |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03383v1, http://arxiv.org/pdf/1710.03383v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-action-detection-in-video |
Repo | https://github.com/ChengBinJin/ActionViewer |
Framework | none |
Two-Stream Convolutional Networks for Dynamic Texture Synthesis
Title | Two-Stream Convolutional Networks for Dynamic Texture Synthesis |
Authors | Matthew Tesfaldet, Marcus A. Brubaker, Konstantinos G. Derpanis |
Abstract | We introduce a two-stream model for dynamic texture synthesis. Our model is based on pre-trained convolutional networks (ConvNets) that target two independent tasks: (i) object recognition, and (ii) optical flow prediction. Given an input dynamic texture, statistics of filter responses from the object recognition ConvNet encapsulate the per-frame appearance of the input texture, while statistics of filter responses from the optical flow ConvNet model its dynamics. To generate a novel texture, a randomly initialized input sequence is optimized to match the feature statistics from each stream of an example texture. Inspired by recent work on image style transfer and enabled by the two-stream model, we also apply the synthesis approach to combine the texture appearance from one texture with the dynamics of another to generate entirely novel dynamic textures. We show that our approach generates novel, high quality samples that match both the framewise appearance and temporal evolution of input texture. Finally, we quantitatively evaluate our texture synthesis approach with a thorough user study. |
Tasks | Object Recognition, Optical Flow Estimation, Style Transfer, Texture Synthesis |
Published | 2017-06-21 |
URL | http://arxiv.org/abs/1706.06982v4, http://arxiv.org/pdf/1706.06982v4.pdf |
PWC | https://paperswithcode.com/paper/two-stream-convolutional-networks-for-dynamic |
Repo | https://github.com/ryersonvisionlab/two-stream-dyntex-synth |
Framework | tf |
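The abstract above describes matching "statistics of filter responses" between an example texture and a synthesized one. A minimal sketch of one common choice for such statistics is a Gram matrix of channel responses, as used in image style transfer. Whether this matches the paper's exact statistic is an assumption, and `gram_matrix` and `gram_loss` are hypothetical helper names.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of N filter responses.

    Returns the C x C matrix of channel inner products, averaged over N.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def gram_loss(feats_a, feats_b):
    """Sum of squared differences between two Gram matrices."""
    ga, gb = gram_matrix(feats_a), gram_matrix(feats_b)
    return sum((x - y) ** 2 for ra, rb in zip(ga, gb) for x, y in zip(ra, rb))


# Toy responses: 2 channels, 2 spatial positions each (values are illustrative).
target = [[1.0, 2.0], [3.0, 4.0]]
print(gram_matrix(target))        # [[2.5, 5.5], [5.5, 12.5]]
print(gram_loss(target, target))  # 0.0
```

In a synthesis loop, a loss of this kind would be minimized with respect to the pixels of a randomly initialized sequence, one term per stream.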
Learning Texture Manifolds with the Periodic Spatial GAN
Title | Learning Texture Manifolds with the Periodic Spatial GAN |
Authors | Urs Bergmann, Nikolay Jetchev, Roland Vollgraf |
Abstract | This paper introduces a novel approach to texture synthesis based on generative adversarial networks (GAN) (Goodfellow et al., 2014). We extend the structure of the input noise distribution by constructing tensors with different types of dimensions. We call this technique Periodic Spatial GAN (PSGAN). The PSGAN has several novel abilities which surpass the current state of the art in texture synthesis. First, we can learn multiple textures from datasets of one or more complex large images. Second, we show that the image generation with PSGANs has properties of a texture manifold: we can smoothly interpolate between samples in the structured noise space and generate novel samples, which lie perceptually between the textures of the original dataset. In addition, we can also accurately learn periodic textures. We conduct multiple experiments which show that PSGANs can flexibly handle diverse texture and image data sources. Our method is highly scalable and it can generate output images of arbitrarily large size. |
Tasks | Image Generation, Texture Synthesis |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06566v2, http://arxiv.org/pdf/1705.06566v2.pdf |
PWC | https://paperswithcode.com/paper/learning-texture-manifolds-with-the-periodic |
Repo | https://github.com/zalandoresearch/psgan |
Framework | pytorch |
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks
Title | ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks |
Authors | Denis A. Gudovskiy, Luca Rigazio |
Abstract | In this paper we introduce ShiftCNN, a generalized low-precision architecture for inference of multiplierless convolutional neural networks (CNNs). ShiftCNN is based on a power-of-two weight representation and, as a result, performs only shift and addition operations. Furthermore, ShiftCNN substantially reduces the computational cost of convolutional layers by precomputing convolution terms. Such an optimization can be applied to any CNN architecture with a relatively small codebook of weights and makes it possible to decrease the number of product operations by at least two orders of magnitude. The proposed architecture targets custom inference accelerators and can be realized on FPGAs or ASICs. Extensive evaluation on ImageNet shows that the state-of-the-art CNNs can be converted without retraining into ShiftCNN with less than 1% drop in accuracy when the proposed quantization algorithm is employed. RTL simulations, targeting modern FPGAs, show that power consumption of convolutional layers is reduced by a factor of 4 compared to conventional 8-bit fixed-point architectures. |
Tasks | Quantization |
Published | 2017-06-07 |
URL | http://arxiv.org/abs/1706.02393v1, http://arxiv.org/pdf/1706.02393v1.pdf |
PWC | https://paperswithcode.com/paper/shiftcnn-generalized-low-precision |
Repo | https://github.com/gudovskiy/ShiftCNN |
Framework | caffe2 |
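The power-of-two weight representation mentioned in the ShiftCNN abstract can be illustrated with a simplified sketch. It rounds each weight to a single term of the form sign * 2^(-n) and replaces the multiplication with a bit shift. This is not the paper's quantization algorithm or its precomputation scheme; the single-term codebook, the `max_shift` limit, and all function names are assumptions made for illustration.

```python
import math


def quantize_pow2(w, max_shift=7):
    """Round w (assumed |w| <= 1) to sign * 2**(-n), rounding in the log domain.

    Returns (sign, n). Weights too small to represent become (0, 0).
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    n = max(0, round(-math.log2(abs(w))))
    if n > max_shift:
        return 0, 0
    return sign, n


def shift_mul(x, sign, n):
    """Multiply a non-negative integer activation x by sign * 2**(-n) using a shift."""
    return sign * (x >> n)


def shift_dot(xs, ws):
    """Dot product that uses only shifts and additions after quantization."""
    acc = 0
    for x, w in zip(xs, ws):
        sign, n = quantize_pow2(w)
        acc += shift_mul(x, sign, n)
    return acc


print(quantize_pow2(0.3))                  # (1, 2): 0.3 is approximated by 2**-2
print(shift_dot([64, 32], [0.5, -0.25]))   # 24, same as 64*0.5 + 32*(-0.25)
```

The sketch keeps activations as integers so that the right shift is exact for the example values; a hardware design would also have to choose rounding and bit widths.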
Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics
Title | Robust Video-Based Eye Tracking Using Recursive Estimation of Pupil Characteristics |
Authors | Terence Brouns |
Abstract | Video-based eye tracking is a valuable technique in various research fields. Numerous open-source eye tracking algorithms have been developed in recent years, primarily designed for general application with many different camera types. These algorithms do not, however, capitalize on the high frame rate of eye tracking cameras often employed in psychophysical studies. We present a pupil detection method that utilizes this high-speed property to obtain reliable predictions through recursive estimation about certain pupil characteristics in successive camera frames. These predictions are subsequently used to carry out novel image segmentation and classification routines to improve pupil detection performance. Based on results from hand-labelled eye images, our approach was found to have a greater detection rate, accuracy and speed compared to other recently published open-source pupil detection algorithms. The program’s source code, together with a graphical user interface, can be downloaded at https://github.com/tbrouns/eyestalker |
Tasks | Eye Tracking, Semantic Segmentation |
Published | 2017-06-25 |
URL | http://arxiv.org/abs/1706.08189v2, http://arxiv.org/pdf/1706.08189v2.pdf |
PWC | https://paperswithcode.com/paper/robust-video-based-eye-tracking-using |
Repo | https://github.com/tbrouns/eyestalker |
Framework | none |
From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets
Title | From Monte Carlo to Las Vegas: Improving Restricted Boltzmann Machine Training Through Stopping Sets |
Authors | Pedro H. P. Savarese, Mayank Kakodkar, Bruno Ribeiro |
Abstract | We propose a Las Vegas transformation of Markov Chain Monte Carlo (MCMC) estimators of Restricted Boltzmann Machines (RBMs). We denote our approach Markov Chain Las Vegas (MCLV). MCLV gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and has a maximum number of Markov chain steps K (referred to as MCLV-K). We present an MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondence and differences between LVS-K and Contrastive Divergence (CD-K), with LVS-K significantly outperforming CD-K when training RBMs on the MNIST dataset, indicating MCLV to be a promising direction in learning generative models. |
Tasks | |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08442v1, http://arxiv.org/pdf/1711.08442v1.pdf |
PWC | https://paperswithcode.com/paper/from-monte-carlo-to-las-vegas-improving |
Repo | https://github.com/PurdueMINDS/MCLV-RBM |
Framework | pytorch |
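The MCLV abstract above combines two ingredients: a stopping set and a cap of K Markov chain steps, which together make the running time random but bounded. The sketch below shows only that control flow for a generic chain. It is not the paper's RBM gradient estimator; `run_until_stop`, the toy ring chain, and the example stopping set are hypothetical.

```python
import random


def run_until_stop(step, start, stopping_set, max_steps, rng):
    """Advance a Markov chain until it enters stopping_set or max_steps is reached.

    Returns (final_state, steps_taken, stopped_early).
    """
    state = start
    for t in range(1, max_steps + 1):
        state = step(state, rng)
        if state in stopping_set:
            return state, t, True
    return state, max_steps, False


def ring_walk(state, rng):
    """Toy transition: a random walk on a ring of 10 states."""
    return (state + rng.choice([-1, 1])) % 10


rng = random.Random(0)
state, steps, stopped = run_until_stop(ring_walk, 0, {3, 7}, max_steps=50, rng=rng)
print(state, steps, stopped)
```

With K fixed and no stopping set, the loop reduces to running the chain for exactly K steps, which is the shape of a CD-K style chain.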
Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations
Title | Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations |
Authors | Weinan E, Jiequn Han, Arnulf Jentzen |
Abstract | We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE. The policy function is then approximated by a neural network, as is done in deep reinforcement learning. Numerical results using TensorFlow illustrate the efficiency and accuracy of the proposed algorithms for several 100-dimensional nonlinear PDEs from physics and finance such as the Allen-Cahn equation, the Hamilton-Jacobi-Bellman equation, and a nonlinear pricing model for financial derivatives. |
Tasks | |
Published | 2017-06-15 |
URL | http://arxiv.org/abs/1706.04702v1, http://arxiv.org/pdf/1706.04702v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-based-numerical-methods-for |
Repo | https://github.com/yiliu1/Machine-Learning-in-Finance |
Framework | none |
PixelNet: Representation of the pixels, by the pixels, and for the pixels
Title | PixelNet: Representation of the pixels, by the pixels, and for the pixels |
Authors | Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan |
Abstract | We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationally efficient, we point out that such approaches are not statistically efficient during learning precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that stratified sampling of pixels allows one to (1) add diversity during batch updates, speeding up learning; (2) explore complex nonlinear predictors, improving accuracy; and (3) efficiently train state-of-the-art models tabula rasa (i.e., “from scratch”) for diverse pixel-labeling tasks. Our single architecture produces state-of-the-art results for semantic segmentation on PASCAL-Context dataset, surface normal estimation on NYUDv2 depth dataset, and edge detection on BSDS. |
Tasks | Edge Detection, Semantic Segmentation |
Published | 2017-02-21 |
URL | http://arxiv.org/abs/1702.06506v1, http://arxiv.org/pdf/1702.06506v1.pdf |
PWC | https://paperswithcode.com/paper/pixelnet-representation-of-the-pixels-by-the |
Repo | https://github.com/bdecost/pixelnet |
Framework | tf |
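The PixelNet abstract argues for sampling pixels rather than training on every pixel of every image. A minimal sketch of that idea is to draw a small, fixed number of pixel locations from each of several images and pool them into one batch. The exact sampling scheme used in the paper is not given in the abstract, so the per-image uniform sampling, the function name, and the parameter values here are assumptions.

```python
import random


def sample_pixel_batch(image_shapes, pixels_per_image, seed=0):
    """Draw pixels_per_image distinct pixel locations from each image.

    image_shapes: list of (height, width) pairs.
    Returns a list of (image_index, row, col) tuples.
    """
    rng = random.Random(seed)
    batch = []
    for img_idx, (h, w) in enumerate(image_shapes):
        # Sample flat indices without replacement, then convert to (row, col).
        flat = rng.sample(range(h * w), pixels_per_image)
        batch.extend((img_idx, i // w, i % w) for i in flat)
    return batch


# Four pixels from each of two small images gives a batch of eight samples.
batch = sample_pixel_batch([(4, 5), (3, 3)], pixels_per_image=4)
print(len(batch))  # 8
```

Pooling a few pixels from many images, instead of many neighboring pixels from one image, is what adds diversity to each batch update.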
Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling
Title | Uncertainty-Aware Learning from Demonstration using Mixture Density Networks with Sampling-Free Variance Modeling |
Authors | Sungjoon Choi, Kyungjae Lee, Sungbin Lim, Songhwai Oh |
Abstract | In this paper, we propose an uncertainty-aware learning from demonstration method by presenting a novel uncertainty estimation method utilizing a mixture density network appropriate for modeling complex and noisy human behaviors. The proposed uncertainty acquisition can be done with a single forward path without Monte Carlo sampling and is suitable for real-time robotics applications. The properties of the proposed uncertainty measure are analyzed through three different synthetic examples: absence of data, heavy measurement noise, and composition of functions scenarios. We show that each case can be distinguished using the proposed uncertainty measure and present an uncertainty-aware learning from demonstration method for autonomous driving using this property. The proposed uncertainty-aware learning from demonstration method outperforms other compared methods in terms of safety using a complex real-world driving dataset. |
Tasks | Autonomous Driving |
Published | 2017-09-03 |
URL | http://arxiv.org/abs/1709.02249v2, http://arxiv.org/pdf/1709.02249v2.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-aware-learning-from-demonstration |
Repo | https://github.com/MayarLotfy/MCDO |
Framework | tf |
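A mixture density network outputs mixture weights, means, and variances, so a variance can be computed in closed form without sampling. The sketch below applies the law of total variance to a one-dimensional Gaussian mixture, splitting the total variance into a within-component and a between-component part. Treating these two parts as the uncertainty measure of the paper above is an assumption; the abstract does not state the formula, and `mixture_variance` is a hypothetical name.

```python
def mixture_variance(weights, means, variances):
    """Closed-form mean and variance decomposition of a 1-D Gaussian mixture.

    Returns (mean, within, between), where total variance = within + between.
    """
    mean = sum(p * m for p, m in zip(weights, means))
    # Average of the component variances.
    within = sum(p * v for p, v in zip(weights, variances))
    # Spread of the component means around the mixture mean.
    between = sum(p * (m - mean) ** 2 for p, m in zip(weights, means))
    return mean, within, between


# Two equally weighted components centered at -1 and +1.
print(mixture_variance([0.5, 0.5], [-1.0, 1.0], [0.25, 0.25]))  # (0.0, 0.25, 1.0)
```

In this example most of the variance comes from disagreement between the component means rather than from the noise inside each component.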
Network classification with applications to brain connectomics
Title | Network classification with applications to brain connectomics |
Authors | Jesús D. Arroyo-Relión, Daniel Kessler, Elizaveta Levina, Stephan F. Taylor |
Abstract | While statistical analysis of a single network has received a lot of attention in recent years, with a focus on social networks, analysis of a sample of networks presents its own challenges which require a different set of analytic tools. Here we study the problem of classification of networks with labeled nodes, motivated by applications in neuroimaging. Brain networks are constructed from imaging data to represent functional connectivity between regions of the brain, and previous work has shown the potential of such networks to distinguish between various brain disorders, giving rise to a network classification problem. Existing approaches tend to either treat all edge weights as a long vector, ignoring the network structure, or focus on graph topology as represented by summary measures while ignoring the edge weights. Our goal is to design a classification method that uses both the individual edge information and the network structure of the data in a computationally efficient way, and that can produce a parsimonious and interpretable representation of differences in brain connectivity patterns between classes. We propose a graph classification method that uses edge weights as predictors but incorporates the network nature of the data via penalties that promote sparsity in the number of nodes, in addition to the usual sparsity penalties that encourage selection of edges. We implement the method via efficient convex optimization and provide a detailed analysis of data from two fMRI studies of schizophrenia. |
Tasks | Graph Classification |
Published | 2017-01-27 |
URL | http://arxiv.org/abs/1701.08140v3, http://arxiv.org/pdf/1701.08140v3.pdf |
PWC | https://paperswithcode.com/paper/network-classification-with-applications-to |
Repo | https://github.com/jesusdaniel/graphclass |
Framework | none |
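The network classification abstract above describes penalties that promote sparsity in the number of nodes in addition to the usual edge sparsity. One way to sketch such a penalty is a group-lasso term over the coefficients of all edges incident to each node, plus an L1 term over individual edges. The exact form and weighting used in the paper are not given in the abstract, so the function below, its parameters `lam` and `rho`, and the way the two terms are combined are assumptions.

```python
import math


def node_sparsity_penalty(B, lam=1.0, rho=0.5):
    """Group-lasso plus L1 penalty on a symmetric edge-coefficient matrix B.

    B[i][j] is the coefficient of the edge between nodes i and j.
    The group term sums the L2 norm of each node's row, so a whole node
    (all of its edges) can be shrunk to zero together.
    """
    group = sum(math.sqrt(sum(b * b for b in row)) for row in B)
    l1 = sum(abs(b) for row in B for b in row)
    return lam * (group + rho * l1)


# Node 0 is connected to nodes 1 and 2; nodes 1 and 2 are not connected.
B = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
]
print(node_sparsity_penalty(B))  # 19.0 = (5 + 3 + 4) + 0.5 * 14
```

Setting an entire row to zero removes that node's contribution to both terms, which is why the group term favors solutions that use few nodes.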
Universal Reinforcement Learning Algorithms: Survey and Experiments
Title | Universal Reinforcement Learning Algorithms: Survey and Experiments |
Authors | John Aslanides, Jan Leike, Marcus Hutter |
Abstract | Many state-of-the-art reinforcement learning (RL) algorithms typically assume that the environment is an ergodic Markov Decision Process (MDP). In contrast, the field of universal reinforcement learning (URL) is concerned with algorithms that make as few assumptions as possible about the environment. The universal Bayesian agent AIXI and a family of related URL algorithms have been developed in this setting. While numerous theoretical optimality results have been proven for these agents, there has been no empirical investigation of their behavior to date. We present a short and accessible survey of these URL algorithms under a unified notation and framework, along with results of some experiments that qualitatively illustrate some properties of the resulting policies, and their relative performance on partially-observable gridworld environments. We also present an open-source reference implementation of the algorithms which we hope will facilitate further understanding of, and experimentation with, these ideas. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10557v1, http://arxiv.org/pdf/1705.10557v1.pdf |
PWC | https://paperswithcode.com/paper/universal-reinforcement-learning-algorithms |
Repo | https://github.com/aslanides/aixijs |
Framework | none |
Genetic CNN
Title | Genetic CNN |
Authors | Lingxi Xie, Alan Yuille |
Abstract | The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following basic principles such as increasing the depth and constructing highway connections, researchers have manually designed a lot of fixed network structures and verified their effectiveness. In this paper, we discuss the possibility of learning deep network structures automatically. Note that the number of possible network structures increases exponentially with the number of layers in the network, which inspires us to adopt the genetic algorithm to efficiently traverse this large search space. We first propose an encoding method to represent each network structure in a fixed-length binary string, and initialize the genetic algorithm by generating a set of randomized individuals. In each generation, we define standard genetic operations, e.g., selection, mutation and crossover, to eliminate weak individuals and then generate more competitive ones. The competitiveness of each individual is defined as its recognition accuracy, which is obtained via training the network from scratch and evaluating it on a validation set. We run the genetic process on two small datasets, i.e., MNIST and CIFAR10, demonstrating its ability to evolve and find high-quality structures that have been little studied before. These structures are also transferable to the large-scale ILSVRC2012 dataset. |
Tasks | Object Recognition |
Published | 2017-03-04 |
URL | http://arxiv.org/abs/1703.01513v1, http://arxiv.org/pdf/1703.01513v1.pdf |
PWC | https://paperswithcode.com/paper/genetic-cnn |
Repo | https://github.com/gmontamat/gentun |
Framework | tf |
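The Genetic CNN abstract describes a loop over fixed-length binary strings with selection, crossover, and mutation. The sketch below shows that loop on plain bit lists. It is a simplification under stated assumptions: the fitness function is a caller-supplied stand-in (the paper trains a network and uses validation accuracy), selection simply keeps the fitter half, and `evolve` and its default parameters are hypothetical.

```python
import random


def evolve(fitness, n_bits=12, pop_size=10, generations=20, p_mut=0.05, seed=0):
    """Evolve fixed-length binary strings and return the fittest individual."""
    rng = random.Random(seed)
    # Initialization: a set of randomized individuals.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (eliminates weak individuals).
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: splice two parents at a random cut point.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit independently with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy fitness: number of ones in the string, in place of recognition accuracy.
best = evolve(fitness=sum)
print(best, sum(best))
```

Because the surviving parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.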
Deep generative models of genetic variation capture mutation effects
Title | Deep generative models of genetic variation capture mutation effects |
Authors | Adam J. Riesselman, John B. Ingraham, Debora S. Marks |
Abstract | The functions of proteins and RNAs are determined by a myriad of interactions between their constituent residues, but most quantitative models of how molecular phenotype depends on genotype must approximate this by simple additive effects. While recent models have relaxed this constraint to also account for pairwise interactions, these approaches do not provide a tractable path towards modeling higher-order dependencies. Here, we show how latent variable models with nonlinear dependencies can be applied to capture beyond-pairwise constraints in biomolecules. We present a new probabilistic model for sequence families, DeepSequence, that can predict the effects of mutations across a variety of deep mutational scanning experiments significantly better than site independent or pairwise models that are based on the same evolutionary data. The model, learned in an unsupervised manner solely from sequence information, is grounded with biologically motivated priors, reveals latent organization of sequence families, and can be used to extrapolate to new parts of sequence space. |
Tasks | Latent Variable Models |
Published | 2017-12-18 |
URL | http://arxiv.org/abs/1712.06527v1, http://arxiv.org/pdf/1712.06527v1.pdf |
PWC | https://paperswithcode.com/paper/deep-generative-models-of-genetic-variation |
Repo | https://github.com/samsinai/VAE_protein_function |
Framework | none |
Learning Convolutional Text Representations for Visual Question Answering
Title | Learning Convolutional Text Representations for Visual Question Answering |
Authors | Zhengyang Wang, Shuiwang Ji |
Abstract | Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classification, visual question answering raises a different need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our “CNN Inception + Gate” model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a better task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classification, are not suitable in visual question answering. |
Tasks | Visual Question Answering |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06824v2, http://arxiv.org/pdf/1705.06824v2.pdf |
PWC | https://paperswithcode.com/paper/learning-convolutional-text-representations |
Repo | https://github.com/divelab/vqa-text |
Framework | caffe2 |
Is This a Joke? Detecting Humor in Spanish Tweets
Title | Is This a Joke? Detecting Humor in Spanish Tweets |
Authors | Santiago Castro, Matías Cubero, Diego Garat, Guillermo Moncecchi |
Abstract | While humor has been historically studied from a psychological, cognitive and linguistic standpoint, its study from a computational perspective is an area yet to be explored in Computational Linguistics. There exist some previous works, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to their humor value, letting the annotators subjectively decide which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%. |
Tasks | Humor Detection |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09527v1, http://arxiv.org/pdf/1703.09527v1.pdf |
PWC | https://paperswithcode.com/paper/is-this-a-joke-detecting-humor-in-spanish |
Repo | https://github.com/pln-fing-udelar/pghumor |
Framework | none |
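The humor detection abstract above reports precision and recall but no combined score. As a worked example, the F1 score (the harmonic mean of the two) can be derived from those figures. The value below is computed here from the stated 84% and 69% and is not a number reported in the abstract; `f1_score` is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 0.84 and recall 0.69 are the figures stated in the abstract.
print(round(f1_score(0.84, 0.69), 3))  # 0.758
```

The harmonic mean sits closer to the lower of the two values, so the derived F1 is nearer to the recall than to the precision.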