Paper Group AWR 88
Semi-Supervised Learning via Sparse Label Propagation
Title | Semi-Supervised Learning via Sparse Label Propagation |
Authors | Alexander Jung, Alfred O. Hero III, Alexandru Mara, Saeed Jahromi |
Abstract | This work proposes a novel method for semi-supervised learning from partially labeled massive network-structured datasets, i.e., big data over networks. We model the underlying hypothesis, which relates data points to labels, as a graph signal, defined over some graph (network) structure intrinsic to the dataset. Following the key principle of supervised learning, i.e., similar inputs yield similar outputs, we require the graph signals induced by labels to have small total variation. Accordingly, we formulate the problem of learning the labels of data points as a non-smooth convex optimization problem which amounts to balancing between the empirical loss, i.e., the discrepancy with some partially available label information, and the smoothness quantified by the total variation of the learned graph signal. We solve this optimization problem by appealing to a recently proposed preconditioned variant of the popular primal-dual method by Pock and Chambolle, which results in a sparse label propagation algorithm. This learning algorithm allows for a highly scalable implementation as message passing over the underlying data graph. By applying concepts of compressed sensing to the learning problem, we are also able to provide a transparent sufficient condition on the underlying network structure such that accurate learning of the labels is possible. We also present an implementation of the message passing formulation that allows for highly scalable deployment in big data frameworks. |
Tasks | |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01414v4 |
PDF | http://arxiv.org/pdf/1612.01414v4.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-via-sparse-label |
Repo | https://github.com/oleksii-a/sparse_label_propagation |
Framework | none |
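The optimization problem in the abstract above is total-variation-regularized recovery of a graph signal, solved with a primal-dual method. Below is a minimal, hypothetical numpy sketch of that idea on a toy chain graph, using a plain (unpreconditioned) Chambolle-Pock-style iteration and a squared-error data term; the graph, step sizes, and loss are illustrative assumptions, not the paper's exact preconditioned message-passing algorithm.

```python
# Minimal sketch: TV-regularized semi-supervised labeling on a toy graph
# via a basic primal-dual (Chambolle-Pock-style) iteration.
import numpy as np

# Toy chain graph with 6 nodes; labels observed only at nodes 0 and 5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
n, m = 6, len(edges)
y = np.array([1.0, 0, 0, 0, 0, -1.0])      # observed values (0 = unknown)
mask = np.array([1.0, 0, 0, 0, 0, 1.0])    # 1 where a label is available

# Graph incidence matrix D: (Dx)_e = x_i - x_j, so ||Dx||_1 is the TV of x.
D = np.zeros((m, n))
for e, (i, j) in enumerate(edges):
    D[e, i], D[e, j] = 1.0, -1.0

lam, tau, sigma = 0.2, 0.2, 0.2   # TV weight and primal/dual step sizes
x = np.zeros(n)                   # primal: the graph signal (labels)
p = np.zeros(m)                   # dual: one variable per edge
x_bar = x.copy()

for _ in range(500):
    # Dual ascent, then projection onto the l_inf ball of radius lam.
    p = np.clip(p + sigma * D @ x_bar, -lam, lam)
    # Primal descent, then prox of the quadratic loss on labeled nodes.
    x_new = (x - tau * D.T @ p + tau * mask * y) / (1.0 + tau * mask)
    x_bar = 2 * x_new - x         # over-relaxation step
    x = x_new

print(np.round(x, 2))  # values lie between the seeds, with small total variation
```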
Online Real-time Multiple Spatiotemporal Action Localisation and Prediction
Title | Online Real-time Multiple Spatiotemporal Action Localisation and Prediction |
Authors | Gurkirt Singh, Suman Saha, Michael Sapienza, Philip Torr, Fabio Cuzzolin |
Abstract | We present a deep-learning framework for real-time multiple spatio-temporal (S/T) action localisation, classification and early prediction. Current state-of-the-art approaches work offline and are too slow to be useful in real-world settings. To overcome their limitations we introduce two major developments. Firstly, we adopt real-time SSD (Single Shot MultiBox Detector) convolutional neural networks to regress and classify detection boxes in each video frame potentially containing an action of interest. Secondly, we design an original and efficient online algorithm to incrementally construct and label 'action tubes' from the SSD frame-level detections. As a result, our system is not only capable of performing S/T detection in real time, but can also perform early action prediction in an online fashion. We achieve new state-of-the-art results in both S/T action localisation and early action prediction on the challenging UCF101-24 and J-HMDB-21 benchmarks, even when compared to the top offline competitors. To the best of our knowledge, ours is the first real-time (up to 40fps) system able to perform online S/T action localisation and early action prediction on the untrimmed videos of UCF101-24. |
Tasks | |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08563v6 |
PDF | http://arxiv.org/pdf/1611.08563v6.pdf |
PWC | https://paperswithcode.com/paper/online-real-time-multiple-spatiotemporal |
Repo | https://github.com/gurkirt/corrected-UCF101-Annots |
Framework | none |
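To illustrate the online tube-building step described above, here is a hypothetical sketch that greedily extends each live tube with the best-overlapping detection in the new frame. The IoU threshold, scoring rule, and data layout are illustrative assumptions, not the paper's exact incremental algorithm.

```python
# Minimal sketch of online 'action tube' building: greedily extend each tube
# with the highest-scoring, sufficiently-overlapping detection per frame.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def update_tubes(tubes, detections, iou_thr=0.3):
    """Extend live tubes with this frame's (box, score) detections, online."""
    used = set()
    for tube in tubes:
        # Pick the best unused detection overlapping the tube's last box.
        best, best_val = None, -1.0
        for k, (box, score) in enumerate(detections):
            if k in used or iou(tube["boxes"][-1], box) < iou_thr:
                continue
            if score > best_val:
                best, best_val = k, score
        if best is not None:
            used.add(best)
            box, score = detections[best]
            tube["boxes"].append(box)
            tube["scores"].append(score)
    # Unmatched detections start new tubes.
    for k, (box, score) in enumerate(detections):
        if k not in used:
            tubes.append({"boxes": [box], "scores": [score]})
    return tubes
```

Calling `update_tubes` once per frame keeps the whole procedure online; an early action prediction can be read off at any time from the running mean of a tube's scores.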
Uncovering Causality from Multivariate Hawkes Integrated Cumulants
Title | Uncovering Causality from Multivariate Hawkes Integrated Cumulants |
Authors | Massil Achab, Emmanuel Bacry, Stéphane Gaïffas, Iacopo Mastromatteo, Jean-Francois Muzy |
Abstract | We design a new nonparametric method that allows one to estimate the matrix of integrated kernels of a multivariate Hawkes process. This matrix not only encodes the mutual influences of each node of the process, but also disentangles the causality relationships between them. Our approach is the first that leads to an estimation of this matrix without any parametric modeling and estimation of the kernels themselves. A consequence is that it can give an estimation of causality relationships between nodes (or users), based on their activity timestamps (on a social network for instance), without knowing or estimating the shape of the activities' lifetime. For that purpose, we introduce a moment matching method that fits the third-order integrated cumulants of the process. We show in numerical experiments that our approach is indeed very robust to the shape of the kernels, and gives appealing results on the MemeTracker database. |
Tasks | |
Published | 2016-07-21 |
URL | http://arxiv.org/abs/1607.06333v3 |
PDF | http://arxiv.org/pdf/1607.06333v3.pdf |
PWC | https://paperswithcode.com/paper/uncovering-causality-from-multivariate-hawkes |
Repo | https://github.com/achab/nphc |
Framework | none |
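As a rough illustration of the cumulant-matching idea, the sketch below estimates the first- and second-order integrated cumulants of a multivariate point process directly from timestamps (the method additionally fits third-order cumulants, omitted here). The windowed-count estimator and window width are illustrative assumptions.

```python
# Minimal sketch: empirical integrated cumulants from event timestamps.
import numpy as np

def integrated_cumulants(events, T, width=5.0):
    """events: list of 1-d arrays of event times per node, on [0, T]."""
    lam = np.array([len(t) / T for t in events])        # mean intensities
    bins = np.arange(0.0, T + width, width)
    counts = np.stack([np.histogram(t, bins=bins)[0] for t in events])
    # Integrated covariance: C_ij ~ Cov(N_i(W), N_j(W)) / |W| for wide W.
    C = np.cov(counts) / width
    return lam, C

# With R = (I - G)^{-1}, where G stacks the integrated kernels, the model
# predicts lam = R @ mu and C = R @ diag(lam) @ R.T; the method inverts
# these (plus third-order) relations to recover G without kernel shapes.
```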
Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks
Title | Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks |
Authors | Chuan Li, Michael Wand |
Abstract | This paper proposes Markovian Generative Adversarial Networks (MGANs), a method for training generative neural networks for efficient texture synthesis. While deep neural network approaches have recently demonstrated remarkable results in terms of synthesis quality, they still come at considerable computational costs (minutes of run-time for low-res images). Our paper addresses this efficiency issue. Instead of the numerical deconvolution used in previous work, we precompute a feed-forward, strided convolutional network that captures the feature statistics of Markovian patches and is able to directly generate outputs of arbitrary dimensions. Such a network can directly decode brown noise to realistic texture, or photos to artistic paintings. With adversarial training, we obtain quality comparable to recent neural texture synthesis methods. As no optimization is required any longer at generation time, our run-time performance (0.25M pixel images at 25Hz) surpasses previous neural texture synthesizers by a significant margin (at least 500 times faster). We apply this idea to texture synthesis, style transfer, and video stylization. |
Tasks | Style Transfer, Texture Synthesis |
Published | 2016-04-15 |
URL | http://arxiv.org/abs/1604.04382v1 |
PDF | http://arxiv.org/pdf/1604.04382v1.pdf |
PWC | https://paperswithcode.com/paper/precomputed-real-time-texture-synthesis-with |
Repo | https://github.com/chuanli11/MGANs |
Framework | torch |
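The "Markovian" aspect amounts to an adversarial loss on local patch statistics. Below is a minimal sketch of a patch-based, fully convolutional discriminator that emits one real/fake score per receptive-field patch; layer sizes are illustrative, and the paper applies the discriminator to neural (VGG) feature patches rather than raw pixels.

```python
# Minimal sketch of a Markovian (patch-based) discriminator: the output is a
# grid of scores, one per local patch, so the adversarial loss acts on
# Markovian patch statistics rather than on the whole image.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 2, 1, 4, stride=1, padding=1),  # score map
        )

    def forward(self, x):
        # Each output cell scores one receptive-field patch of the input.
        return self.net(x)

scores = PatchDiscriminator()(torch.randn(1, 3, 128, 128))
print(scores.shape)  # torch.Size([1, 1, 31, 31]) -- one score per patch
```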
Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features
Title | Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features |
Authors | Arulkumar Subramaniam, Vismay Patel, Ashish Mishra, Prashanth Balasubramanian, Anurag Mittal |
Abstract | We propose a novel approach for First Impressions Recognition in terms of the Big Five personality traits from short videos. The Big Five personality-traits model describes human personality using five broad categories: Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. We train two bi-modal end-to-end deep neural network architectures using temporally ordered audio and novel stochastic visual features from a few frames, without over-fitting. We empirically show that the trained models perform exceptionally well, even when trained on small sub-portions of the inputs. Our method is evaluated in the ChaLearn LAP 2016 Apparent Personality Analysis (APA) competition using the ChaLearn LAP APA2016 dataset and achieves excellent performance. |
Tasks | |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10048v1 |
PDF | http://arxiv.org/pdf/1610.10048v1.pdf |
PWC | https://paperswithcode.com/paper/bi-modal-first-impressions-recognition-using |
Repo | https://github.com/InnovArul/first-impressions |
Framework | torch |
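A plausible reading of the "stochastic visual features from a few frames" idea is random temporal sampling with prediction averaging, sketched below. The segment-based sampler and the frame-level predictor interface are hypothetical stand-ins, not the paper's exact pipeline.

```python
# Minimal sketch: average Big Five predictions over randomly sampled frames.
import numpy as np

def sample_frames(num_frames, k=6, rng=None):
    """Pick one random frame from each of k equal temporal segments."""
    rng = rng or np.random.default_rng()
    bounds = np.linspace(0, num_frames, k + 1).astype(int)  # assumes num_frames >= k
    return [int(rng.integers(lo, hi)) for lo, hi in zip(bounds[:-1], bounds[1:])]

def predict_traits(video_frames, frame_model, k=6):
    """frame_model: hypothetical per-frame predictor returning 5 trait scores."""
    idx = sample_frames(len(video_frames), k)
    preds = np.stack([frame_model(video_frames[i]) for i in idx])
    return preds.mean(axis=0)  # one averaged score per trait
```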
PoseTrack: Joint Multi-Person Pose Estimation and Tracking
Title | PoseTrack: Joint Multi-Person Pose Estimation and Tracking |
Authors | Umar Iqbal, Anton Milan, Juergen Gall |
Abstract | In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it also requires solving the problem of person association over time in addition to the pose estimation for each person. We therefore propose a novel method that jointly models multi-person pose estimation and tracking in a single formulation. To this end, we represent body joint detections in a video by a spatio-temporal graph and solve an integer linear program to partition the graph into sub-graphs that correspond to plausible body pose trajectories for each person. The proposed approach implicitly handles occlusion and truncation of persons. Since the problem has not been addressed quantitatively in the literature, we introduce a challenging “Multi-Person PoseTrack” dataset, and also propose a completely unconstrained evaluation protocol that does not make any assumptions about the scale, size, location or the number of persons. Finally, we evaluate the proposed approach and several baseline methods on our new dataset. |
Tasks | Multi-Person Pose Estimation, Multi-Person Pose Estimation and Tracking, Pose Estimation |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07727v3 |
PDF | http://arxiv.org/pdf/1611.07727v3.pdf |
PWC | https://paperswithcode.com/paper/posetrack-joint-multi-person-pose-estimation |
Repo | https://github.com/iqbalu/PoseTrack-CVPR2017 |
Framework | none |
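The paper partitions a spatio-temporal graph with an integer linear program; as a simplified stand-in, the sketch below associates poses between consecutive frames by minimizing total keypoint distance with the Hungarian algorithm. The cost function and distance threshold are illustrative assumptions.

```python
# Minimal sketch of person association over time: match poses between
# consecutive frames by minimizing mean keypoint distance. A greedy
# frame-by-frame stand-in for the paper's global spatio-temporal ILP.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(poses_prev, poses_next, max_dist=50.0):
    """poses_*: arrays of shape (num_people, num_joints, 2). Returns matches."""
    cost = np.zeros((len(poses_prev), len(poses_next)))
    for i, p in enumerate(poses_prev):
        for j, q in enumerate(poses_next):
            cost[i, j] = np.linalg.norm(p - q, axis=-1).mean()  # mean joint dist
    rows, cols = linear_sum_assignment(cost)
    # Keep only plausible matches; unmatched poses start or end a track.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < max_dist]
```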
Semantic Scene Completion from a Single Depth Image
Title | Semantic Scene Completion from a Single Depth Image |
Authors | Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser |
Abstract | This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion and semantic labeling of depth maps separately. However, we observe that these two problems are tightly intertwined. To leverage the coupled nature of these two tasks, we introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum. Our network uses a dilation-based 3D context module to efficiently expand the receptive field and enable 3D context learning. To train our network, we construct SUNCG - a manually created large-scale dataset of synthetic 3D scenes with dense volumetric annotations. Our experiments demonstrate that the joint model outperforms methods addressing each task in isolation and outperforms alternative approaches on the semantic scene completion task. |
Tasks | |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.08974v1 |
PDF | http://arxiv.org/pdf/1611.08974v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-scene-completion-from-a-single-depth |
Repo | https://github.com/facebookresearch/House3D |
Framework | none |
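The dilation-based 3D context module can be pictured as parallel 3D convolutions with growing dilation rates, fused back together, as in the hypothetical torch sketch below; channel counts and rates are illustrative, not SSCNet's exact configuration.

```python
# Minimal sketch of a dilation-based 3D context module: parallel 3-d
# convolutions with increasing dilation enlarge the receptive field over the
# voxel grid without growing the kernels themselves.
import torch
import torch.nn as nn

class DilatedContext3D(nn.Module):
    def __init__(self, ch=16, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(ch, ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv3d(ch * len(rates), ch, kernel_size=1)

    def forward(self, x):
        # Concatenate multi-scale 3D context, then fuse back to ch channels.
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))

vox = torch.randn(1, 16, 32, 32, 32)     # (batch, channels, D, H, W) voxels
print(DilatedContext3D()(vox).shape)     # torch.Size([1, 16, 32, 32, 32])
```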
Neural Machine Translation in Linear Time
Title | Neural Machine Translation in Linear Time |
Authors | Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu |
Abstract | We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing lengths of the source and the target, we introduce an efficient mechanism by which the decoder is dynamically unfolded over the representation of the encoder. The ByteNet uses dilation in the convolutional layers to increase its receptive field. The resulting network has two core properties: it runs in time that is linear in the length of the sequences and it sidesteps the need for excessive memorization. The ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms the previous best results obtained with recurrent networks. The ByteNet also achieves state-of-the-art performance on character-to-character machine translation on the English-to-German WMT translation task, surpassing comparable neural translation models that are based on recurrent networks with attentional pooling and run in quadratic time. We find that the latent alignment structure contained in the representations reflects the expected alignment between the tokens. |
Tasks | Language Modelling, Machine Translation |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10099v2 |
PDF | http://arxiv.org/pdf/1610.10099v2.pdf |
PWC | https://paperswithcode.com/paper/neural-machine-translation-in-linear-time |
Repo | https://github.com/paarthneekhara/byteNet-tensorflow |
Framework | tf |
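The linear-time property comes from dilated convolutions rather than recurrence. Below is a minimal sketch of the decoder's core: a stack of causal (left-padded) dilated 1-d convolutions whose receptive field doubles per layer. Depth, width, and the residual form are illustrative assumptions rather than the exact ByteNet block.

```python
# Minimal sketch: causal dilated 1-d convolutions. Position t sees only
# tokens <= t, and run time stays linear in sequence length.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedStack(nn.Module):
    def __init__(self, ch=64, layers=4, kernel=3):
        super().__init__()
        self.kernel = kernel
        self.convs = nn.ModuleList(
            nn.Conv1d(ch, ch, kernel, dilation=2 ** i) for i in range(layers)
        )

    def forward(self, x):                  # x: (batch, channels, time)
        for conv in self.convs:
            pad = (self.kernel - 1) * conv.dilation[0]
            h = conv(F.pad(x, (pad, 0)))   # left-pad only => causal
            x = x + torch.relu(h)          # residual connection
        return x

out = CausalDilatedStack()(torch.randn(1, 64, 100))
print(out.shape)  # torch.Size([1, 64, 100])
```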
Estimating individual treatment effect: generalization bounds and algorithms
Title | Estimating individual treatment effect: generalization bounds and algorithms |
Authors | Uri Shalit, Fredrik D. Johansson, David Sontag |
Abstract | There is intense interest in applying machine learning to problems of causal inference in fields such as healthcare, economics and education. In particular, individual-level causal inference has important applications such as precision medicine. We give a new theoretical analysis and family of algorithms for predicting individual treatment effect (ITE) from observational data, under the assumption known as strong ignorability. The algorithms learn a “balanced” representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art. |
Tasks | Causal Inference |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.03976v5 |
PDF | http://arxiv.org/pdf/1606.03976v5.pdf |
PWC | https://paperswithcode.com/paper/estimating-individual-treatment-effect |
Repo | https://github.com/clinicalml/cfrnet |
Framework | tf |
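The balance term in the bound is an Integral Probability Metric between the treated and control representation distributions. The sketch below computes a (biased) RBF-kernel MMD estimate between the two groups, one of the two IPMs the paper instantiates; the bandwidth and estimator variant are illustrative choices.

```python
# Minimal sketch of the representation-balance penalty: RBF-kernel MMD
# between treated and control representations.
import numpy as np

def mmd_rbf(phi_t, phi_c, bandwidth=1.0):
    """phi_t, phi_c: (n_t, d) and (n_c, d) arrays of learned representations."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(phi_t, phi_t).mean() + k(phi_c, phi_c).mean() - 2 * k(phi_t, phi_c).mean()

# Training would minimize: factual loss + alpha * mmd_rbf(phi[treated], phi[control])
```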
Practical Black-Box Attacks against Machine Learning
Title | Practical Black-Box Attacks against Machine Learning |
Authors | Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami |
Abstract | Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder. |
Tasks | |
Published | 2016-02-08 |
URL | http://arxiv.org/abs/1602.02697v4 |
PDF | http://arxiv.org/pdf/1602.02697v4.pdf |
PWC | https://paperswithcode.com/paper/practical-black-box-attacks-against-machine |
Repo | https://github.com/adrian-botta/understanding_adversarial_examples |
Framework | none |
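The attack loop can be summarized as: query the oracle for labels, fit a local substitute, augment the data along the substitute's input gradients, and finally craft adversarial examples on the substitute. Below is a heavily simplified sketch with a logistic-regression substitute (as in the paper's Amazon/Google experiments) and an FGSM-style crafting step; the round count, step sizes, and oracle interface are illustrative assumptions.

```python
# Minimal sketch of black-box substitute training with Jacobian-based
# dataset augmentation, followed by FGSM-style crafting on the substitute.
import numpy as np
from sklearn.linear_model import LogisticRegression

def class_grads(sub):
    """Per-class input gradients of the substitute's linear logits."""
    W = sub.coef_
    return W if W.shape[0] > 1 else np.vstack([-W, W])  # binary -> two rows

def substitute_attack(oracle, X0, rounds=3, lam=0.1, eps=0.3):
    """oracle: black-box function mapping an (n, d) array to class labels."""
    X = X0.copy()
    for _ in range(rounds):
        y = oracle(X)                                  # only capability: labels
        sub = LogisticRegression(max_iter=1000).fit(X, y)
        G = class_grads(sub)[np.searchsorted(sub.classes_, y)]
        # Jacobian-based augmentation: step along the gradient sign to map
        # out the oracle's decision boundary, then re-query and re-fit.
        X = np.vstack([X, X + lam * np.sign(G)])
    y = oracle(X)
    sub = LogisticRegression(max_iter=1000).fit(X, y)
    G = class_grads(sub)[np.searchsorted(sub.classes_, y)]
    # FGSM-style crafting on the substitute; these examples often transfer.
    return X - eps * np.sign(G)
```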
Language Modeling with Gated Convolutional Networks
Title | Language Modeling with Gated Convolutional Networks |
Authors | Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier |
Abstract | The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach is competitive with strong recurrent models on these large scale language tasks. |
Tasks | Language Modelling |
Published | 2016-12-23 |
URL | http://arxiv.org/abs/1612.08083v3 |
PDF | http://arxiv.org/pdf/1612.08083v3.pdf |
PWC | https://paperswithcode.com/paper/language-modeling-with-gated-convolutional |
Repo | https://github.com/ifrit98/layer-glu |
Framework | none |
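The simplified gating mechanism is the gated linear unit (GLU): a convolution produces two feature maps A and B, and the layer outputs A ⊗ σ(B), letting the gate select which inputs flow up the stack. A minimal torch sketch follows, with causal left-padding so no future tokens leak in; widths and kernel size are illustrative.

```python
# Minimal sketch of a gated convolutional layer with GLU gating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv1d(nn.Module):
    def __init__(self, ch=64, kernel=5):
        super().__init__()
        self.pad = kernel - 1                      # left-pad => no future leak
        self.conv = nn.Conv1d(ch, 2 * ch, kernel)  # produces A and B stacked

    def forward(self, x):                          # x: (batch, channels, time)
        a, b = self.conv(F.pad(x, (self.pad, 0))).chunk(2, dim=1)
        return a * torch.sigmoid(b)                # GLU: A * sigmoid(B)

out = GatedConv1d()(torch.randn(1, 64, 50))
print(out.shape)  # torch.Size([1, 64, 50])
```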
Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles
Title | Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles |
Authors | James Cross, Liang Huang |
Abstract | Parsing accuracy using efficient greedy transition systems has improved dramatically in recent years thanks to neural networks. Despite striking results in dependency parsing, however, neural models have not surpassed state-of-the-art approaches in constituency parsing. To remedy this, we introduce a new shift-reduce system whose stack contains merely sentence spans, represented by a bare minimum of LSTM features. We also design the first provably optimal dynamic oracle for constituency parsing, which runs in amortized O(1) time, compared to O(n^3) oracles for standard dependency parsing. Training with this oracle, we achieve the best F1 scores on both English and French of any parser that does not use reranking or external data. |
Tasks | Constituency Parsing, Dependency Parsing |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06475v1 |
PDF | http://arxiv.org/pdf/1612.06475v1.pdf |
PWC | https://paperswithcode.com/paper/span-based-constituency-parsing-with-a |
Repo | https://github.com/jhcross/span-parser |
Framework | none |
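A minimal sketch of why the label part of the oracle runs in amortized O(1): with the gold tree's brackets in a hash map, the optimal label action for any span is a single lookup. This is a simplified view under that assumption; the structural (shift/combine) half of the oracle is omitted.

```python
# Minimal sketch: constant-time label oracle via a gold-span hash map.
def make_label_oracle(gold_tree_spans):
    """gold_tree_spans: dict mapping (i, j) spans to gold nonterminal labels."""
    def oracle_label(i, j):
        # Gold label if (i, j) is a gold constituent, else no label action.
        return gold_tree_spans.get((i, j), "")
    return oracle_label

oracle = make_label_oracle({(0, 5): "S", (0, 2): "NP", (2, 5): "VP"})
print(oracle(2, 5), repr(oracle(1, 4)))  # VP ''
```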
Operational Calculus for Differentiable Programming
Title | Operational Calculus for Differentiable Programming |
Authors | Žiga Sajovic, Martin Vuk |
Abstract | In this work we present a theoretical model for differentiable programming. We construct an algebraic language that encapsulates formal semantics of differentiable programs by way of Operational Calculus. The algebraic nature of Operational Calculus can alter the properties of the programs that are expressed within the language and transform them into their solutions. In our model, programs are elements of programming spaces and viewed as maps from the virtual memory space to itself. Virtual memory space is an algebra of programs, an algebraic data structure one can calculate with. We define the operator of differentiation ($\partial$) on programming spaces and, using its powers, implement the general shift operator and the operator of program composition. We provide the formula for the expansion of a differentiable program into an infinite tensor series in terms of the powers of $\partial$. We express the operator of program composition in terms of the generalized shift operator and $\partial$, which implements a differentiable composition in the language. Such operators serve as abstractions over the tensor series algebra and as the main actors in our language. We demonstrate our model's usefulness in differentiable programming by using it to analyse iterators, deriving fractional iterations and their iterating velocities, and explicitly solving the special case of ReduceSum. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07690v6 |
PDF | http://arxiv.org/pdf/1610.07690v6.pdf |
PWC | https://paperswithcode.com/paper/operational-calculus-for-differentiable |
Repo | https://github.com/zigasajovic/dCpp |
Framework | none |
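For intuition about a differentiation operator acting on programs, the sketch below implements the simplest concrete instance: first-order forward-mode differentiation with dual numbers, where an ordinary program is evaluated under an algebra that carries derivatives along. This is an illustrative analogy only, not the paper's operational-calculus construction or its higher powers of $\partial$.

```python
# Minimal sketch: a derivative 'operator' applied to ordinary programs via
# dual numbers (first-order forward-mode differentiation).
class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps          # value and derivative part

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)

    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.eps + self.eps * o.val)  # product rule

    __rmul__ = __mul__

def d(program, x):
    """Apply the derivative 'operator' to a program at the point x."""
    return program(Dual(x, 1.0)).eps

f = lambda x: x * x * x + 2 * x                 # an ordinary program
print(d(f, 3.0))                                # 3*3^2 + 2 = 29.0
```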
Neural Photo Editing with Introspective Adversarial Networks
Title | Neural Photo Editing with Introspective Adversarial Networks |
Authors | Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston |
Abstract | The increasingly photorealistic sample quality of generative image models suggests their feasibility in applications beyond image generation. We present the Neural Photo Editor, an interface that leverages the power of generative neural networks to make large, semantically coherent changes to existing images. To tackle the challenge of achieving accurate reconstructions without loss of feature quality, we introduce the Introspective Adversarial Network, a novel hybridization of the VAE and GAN. Our model efficiently captures long-range dependencies through use of a computational block based on weight-shared dilated convolutions, and improves generalization performance with Orthogonal Regularization, a novel weight regularization method. We validate our contributions on CelebA, SVHN, and CIFAR-100, and produce samples and reconstructions with high visual fidelity. |
Tasks | Image Generation |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07093v3 |
PDF | http://arxiv.org/pdf/1609.07093v3.pdf |
PWC | https://paperswithcode.com/paper/neural-photo-editing-with-introspective |
Repo | https://github.com/ajbrock/Neural-Photo-Editor |
Framework | tf |
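Orthogonal Regularization has a compact form: penalize each weight matrix's deviation from orthogonality via ||W Wᵀ − I||², summed over layers. A minimal torch sketch follows; the coefficient and the choice to apply it to every matrix-shaped parameter are illustrative assumptions.

```python
# Minimal sketch of Orthogonal Regularization as an added loss term.
import torch

def orthogonal_regularization(model, coef=1e-4):
    penalty = torch.zeros(())
    for p in model.parameters():
        if p.ndim < 2:
            continue                              # skip biases
        W = p.reshape(p.shape[0], -1)             # flatten conv kernels
        eye = torch.eye(W.shape[0], device=W.device)
        penalty = penalty + ((W @ W.T - eye) ** 2).sum()
    return coef * penalty

# Add to the task loss each step: loss = task_loss + orthogonal_regularization(net)
```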
Full Resolution Image Compression with Recurrent Neural Networks
Title | Full Resolution Image Compression with Recurrent Neural Networks |
Authors | George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell |
Abstract | This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study “one-shot” versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding. |
Tasks | Image Compression |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05148v2 |
PDF | http://arxiv.org/pdf/1608.05148v2.pdf |
PWC | https://paperswithcode.com/paper/full-resolution-image-compression-with |
Repo | https://github.com/SimonTsungHanKuo/ImageCompzByGRU |
Framework | pytorch |
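The additive-reconstruction idea can be demonstrated with a toy codec: each iteration binarizes the current residual and adds a decoded correction to the running reconstruction, so quality improves with every extra bit. The sign()-and-halving codec below is a hypothetical stand-in for the paper's learned RNN encoder, binarizer, and decoder.

```python
# Minimal sketch of additive reconstruction: iteratively encode the residual
# with a 1-bit binarizer and accumulate decoded corrections.
import numpy as np

def progressive_compress(image, iterations=8, step=0.5):
    recon = np.zeros_like(image)
    bits_per_iter = []
    for _ in range(iterations):
        residual = image - recon            # what is still missing
        code = np.sign(residual)            # binarizer: 1 bit per element
        recon = recon + step * code         # additive reconstruction
        step *= 0.5                         # toy stand-in for learned decoding
        bits_per_iter.append(code)
    return recon, bits_per_iter

img = np.random.rand(4, 4)
rec, bits = progressive_compress(img)
print(float(np.abs(img - rec).mean()))      # error shrinks with iterations
```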