Paper Group AWR 88
Semi-Supervised Learning via Sparse Label Propagation
Title | Semi-Supervised Learning via Sparse Label Propagation |
Authors | Alexander Jung, Alfred O. Hero III, Alexandru Mara, Saeed Jahromi |
Abstract | This work proposes a novel method for semi-supervised learning from partially labeled massive network-structured datasets, i.e., big data over networks. We model the underlying hypothesis, which relates data points to labels, as a graph signal, defined over some graph (network) structure intrinsic to the dataset. Following the key principle of supervised learning, i.e., similar inputs yield similar outputs, we require the graph signals induced by labels to have small total variation. Accordingly, we formulate the problem of learning the labels of data points as a non-smooth convex optimization problem which amounts to balancing between the empirical loss, i.e., the discrepancy with some partially available label information, and the smoothness quantified by the total variation of the learned graph signal. We solve this optimization problem by appealing to a recently proposed preconditioned variant of the popular primal-dual method by Pock and Chambolle, which results in a sparse label propagation algorithm. This learning algorithm allows for a highly scalable implementation as message passing over the underlying data graph. By applying concepts of compressed sensing to the learning problem, we are also able to provide a transparent sufficient condition on the underlying network structure such that accurate learning of the labels is possible. We also present an implementation of the message passing formulation that allows for highly scalable deployment in big data frameworks. |
Tasks | |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01414v4 |
PDF | http://arxiv.org/pdf/1612.01414v4.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-learning-via-sparse-label |
Repo | https://github.com/oleksii-a/sparse_label_propagation |
Framework | none |
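The optimization problem in the abstract above is total-variation-regularized recovery of a graph signal, solved with a primal-dual method. Below is a minimal, hypothetical numpy sketch of that idea on a toy chain graph, using a plain (unpreconditioned) Chambolle-Pock-style iteration and a squared-error data term; the graph, step sizes, and loss are illustrative assumptions, not the paper's exact preconditioned message-passing algorithm.

```python
# Minimal sketch: TV-regularized semi-supervised labeling on a toy graph
# via a basic primal-dual (Chambolle-Pock-style) iteration.
import numpy as np

# Toy chain graph with 6 nodes; labels observed only at nodes 0 and 5.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
n, m = 6, len(edges)
y = np.array([1.0, 0, 0, 0, 0, -1.0])      # observed values (0 = unknown)
mask = np.array([1.0, 0, 0, 0, 0, 1.0])    # 1 where a label is available

# Graph incidence matrix D: (Dx)_e = x_i - x_j, so ||Dx||_1 is the TV of x.
D = np.zeros((m, n))
for e, (i, j) in enumerate(edges):
    D[e, i], D[e, j] = 1.0, -1.0

lam, tau, sigma = 0.2, 0.2, 0.2   # TV weight and primal/dual step sizes
x = np.zeros(n)                   # primal: the graph signal (labels)
p = np.zeros(m)                   # dual: one variable per edge
x_bar = x.copy()

for _ in range(500):
    # Dual ascent, then projection onto the l_inf ball of radius lam.
    p = np.clip(p + sigma * D @ x_bar, -lam, lam)
    # Primal descent, then prox of the quadratic loss on labeled nodes.
    x_new = (x - tau * D.T @ p + tau * mask * y) / (1.0 + tau * mask)
    x_bar = 2 * x_new - x         # over-relaxation step
    x = x_new

print(np.round(x, 2))  # values lie between the seeds, with small total variation
```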
Online Real-time Multiple Spatiotemporal Action Localisation and Prediction
Title | Online Real-time Multiple Spatiotemporal Action Localisation and Prediction |
Authors | Gurkirt Singh, Suman Saha, Michael Sapienza, Philip Torr, Fabio Cuzzolin |
Abstract | We present a deep-learning framework for real-time multiple spatio-temporal (S/T) action localisation, classification and early prediction. Current state-of-the-art approaches work offline and are too slow to be useful in real-world settings. To overcome their limitations we introduce two major developments. Firstly, we adopt real-time SSD (Single Shot MultiBox Detector) convolutional neural networks to regress and classify detection boxes in each video frame potentially containing an action of interest. Secondly, we design an original and efficient online algorithm to incrementally construct and label 'action tubes' from the SSD frame-level detections. As a result, our system is not only capable of performing S/T detection in real time, but can also perform early action prediction in an online fashion. We achieve new state-of-the-art results in both S/T action localisation and early action prediction on the challenging UCF101-24 and J-HMDB-21 benchmarks, even when compared to the top offline competitors. To the best of our knowledge, ours is the first real-time (up to 40fps) system able to perform online S/T action localisation and early action prediction on the untrimmed videos of UCF101-24. |
Tasks | |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08563v6 |
PDF | http://arxiv.org/pdf/1611.08563v6.pdf |
PWC | https://paperswithcode.com/paper/online-real-time-multiple-spatiotemporal |
Repo | https://github.com/gurkirt/corrected-UCF101-Annots |
Framework | none |
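To illustrate the online tube-building step described above, here is a hypothetical sketch that greedily extends each live tube with the best-overlapping detection in the new frame. The IoU threshold, scoring rule, and data layout are illustrative assumptions, not the paper's exact incremental algorithm.

```python
# Minimal sketch of online 'action tube' building: greedily extend each tube
# with the highest-scoring, sufficiently-overlapping detection per frame.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def update_tubes(tubes, detections, iou_thr=0.3):
    """Extend live tubes with this frame's (box, score) detections, online."""
    used = set()
    for tube in tubes:
        # Pick the best unused detection overlapping the tube's last box.
        best, best_val = None, -1.0
        for k, (box, score) in enumerate(detections):
            if k in used or iou(tube["boxes"][-1], box) < iou_thr:
                continue
            if score > best_val:
                best, best_val = k, score
        if best is not None:
            used.add(best)
            box, score = detections[best]
            tube["boxes"].append(box)
            tube["scores"].append(score)
    # Unmatched detections start new tubes.
    for k, (box, score) in enumerate(detections):
        if k not in used:
            tubes.append({"boxes": [box], "scores": [score]})
    return tubes
```

Calling `update_tubes` once per frame keeps the whole procedure online; an early action prediction can be read off at any time from the running mean of a tube's scores.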
Uncovering Causality from Multivariate Hawkes Integrated Cumulants
Title | Uncovering Causality from Multivariate Hawkes Integrated Cumulants |
Authors | Massil Achab, Emmanuel Bacry, Stéphane Gaïffas, Iacopo Mastromatteo, Jean-Francois Muzy |
Abstract | We design a new nonparametric method that allows one to estimate the matrix of integrated kernels of a multivariate Hawkes process. This matrix not only encodes the mutual influences of each node of the process, but also disentangles the causality relationships between them. Our approach is the first that leads to an estimation of this matrix without any parametric modeling and estimation of the kernels themselves. A consequence is that it can give an estimation of causality relationships between nodes (or users), based on their activity timestamps (on a social network for instance), without knowing or estimating the shape of the activities' lifetime. For that purpose, we introduce a moment matching method that fits the third-order integrated cumulants of the process. We show in numerical experiments that our approach is indeed very robust to the shape of the kernels, and gives appealing results on the MemeTracker database. |
Tasks | |
Published | 2016-07-21 |
URL | http://arxiv.org/abs/1607.06333v3 |
PDF | http://arxiv.org/pdf/1607.06333v3.pdf |
PWC | https://paperswithcode.com/paper/uncovering-causality-from-multivariate-hawkes |
Repo | https://github.com/achab/nphc |
Framework | none |
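As a rough illustration of the cumulant-matching idea, the sketch below estimates the first- and second-order integrated cumulants of a multivariate point process directly from timestamps (the method additionally fits third-order cumulants, omitted here). The windowed-count estimator and window width are illustrative assumptions.

```python
# Minimal sketch: empirical integrated cumulants from event timestamps.
import numpy as np

def integrated_cumulants(events, T, width=5.0):
    """events: list of 1-d arrays of event times per node, on [0, T]."""
    lam = np.array([len(t) / T for t in events])        # mean intensities
    bins = np.arange(0.0, T + width, width)
    counts = np.stack([np.histogram(t, bins=bins)[0] for t in events])
    # Integrated covariance: C_ij ~ Cov(N_i(W), N_j(W)) / |W| for wide W.
    C = np.cov(counts) / width
    return lam, C

# With R = (I - G)^{-1}, where G stacks the integrated kernels, the model
# predicts lam = R @ mu and C = R @ diag(lam) @ R.T; the method inverts
# these (plus third-order) relations to recover G without kernel shapes.
```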
Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks
Title | Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks |
Authors | Chuan Li, Michael Wand |
Abstract | This paper proposes Markovian Generative Adversarial Networks (MGANs), a method for training generative neural networks for efficient texture synthesis. While deep neural network approaches have recently demonstrated remarkable results in terms of synthesis quality, they still come at considerable computational costs (minutes of run-time for low-res images). Our paper addresses this efficiency issue. Instead of the numerical deconvolution used in previous work, we precompute a feed-forward, strided convolutional network that captures the feature statistics of Markovian patches and is able to directly generate outputs of arbitrary dimensions. Such a network can directly decode brown noise to realistic texture, or photos to artistic paintings. With adversarial training, we obtain quality comparable to recent neural texture synthesis methods. As no optimization is required any longer at generation time, our run-time performance (0.25M pixel images at 25Hz) surpasses previous neural texture synthesizers by a significant margin (at least 500 times faster). We apply this idea to texture synthesis, style transfer, and video stylization. |
Tasks | Style Transfer, Texture Synthesis |
Published | 2016-04-15 |
URL | http://arxiv.org/abs/1604.04382v1 |
PDF | http://arxiv.org/pdf/1604.04382v1.pdf |
PWC | https://paperswithcode.com/paper/precomputed-real-time-texture-synthesis-with |
Repo | https://github.com/chuanli11/MGANs |
Framework | torch |
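The "Markovian" aspect amounts to an adversarial loss on local patch statistics. Below is a minimal sketch of a patch-based, fully convolutional discriminator that emits one real/fake score per receptive-field patch; layer sizes are illustrative, and the paper applies the discriminator to neural (VGG) feature patches rather than raw pixels.

```python
# Minimal sketch of a Markovian (patch-based) discriminator: the output is a
# grid of scores, one per local patch, so the adversarial loss acts on
# Markovian patch statistics rather than on the whole image.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width * 2, 1, 4, stride=1, padding=1),  # score map
        )

    def forward(self, x):
        # Each output cell scores one receptive-field patch of the input.
        return self.net(x)

scores = PatchDiscriminator()(torch.randn(1, 3, 128, 128))
print(scores.shape)  # torch.Size([1, 1, 31, 31]) -- one score per patch
```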
Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features
Title | Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features |
Authors | Arulkumar Subramaniam, Vismay Patel, Ashish Mishra, Prashanth Balasubramanian, Anurag Mittal |
Abstract | We propose a novel approach for First Impressions Recognition in terms of the Big Five personality traits from short videos. The Big Five personality-traits model describes human personality using five broad categories: Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. We train two bi-modal end-to-end deep neural network architectures using temporally ordered audio and novel stochastic visual features from a few frames, without over-fitting. We empirically show that the trained models perform exceptionally well, even when trained on small sub-portions of the inputs. Our method is evaluated in the ChaLearn LAP 2016 Apparent Personality Analysis (APA) competition using the ChaLearn LAP APA2016 dataset and achieves excellent performance. |
Tasks | |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10048v1 |
PDF | http://arxiv.org/pdf/1610.10048v1.pdf |
PWC | https://paperswithcode.com/paper/bi-modal-first-impressions-recognition-using |
Repo | https://github.com/InnovArul/first-impressions |
Framework | torch |
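A plausible reading of the "stochastic visual features from a few frames" idea is random temporal sampling with prediction averaging, sketched below. The segment-based sampler and the frame-level predictor interface are hypothetical stand-ins, not the paper's exact pipeline.

```python
# Minimal sketch: average Big Five predictions over randomly sampled frames.
import numpy as np

def sample_frames(num_frames, k=6, rng=None):
    """Pick one random frame from each of k equal temporal segments."""
    rng = rng or np.random.default_rng()
    bounds = np.linspace(0, num_frames, k + 1).astype(int)  # assumes num_frames >= k
    return [int(rng.integers(lo, hi)) for lo, hi in zip(bounds[:-1], bounds[1:])]

def predict_traits(video_frames, frame_model, k=6):
    """frame_model: hypothetical per-frame predictor returning 5 trait scores."""
    idx = sample_frames(len(video_frames), k)
    preds = np.stack([frame_model(video_frames[i]) for i in idx])
    return preds.mean(axis=0)  # one averaged score per trait
```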
PoseTrack: Joint Multi-Person Pose Estimation and Tracking
Title | PoseTrack: Joint Multi-Person Pose Estimation and Tracking |
Authors | Umar Iqbal, Anton Milan, Juergen Gall |
Abstract | In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it also requires solving the problem of person association over time in addition to the pose estimation for each person. We therefore propose a novel method that jointly models multi-person pose estimation and tracking in a single formulation. To this end, we represent body joint detections in a video by a spatio-temporal graph and solve an integer linear program to partition the graph into sub-graphs that correspond to plausible body pose trajectories for each person. The proposed approach implicitly handles occlusion and truncation of persons. Since the problem has not been addressed quantitatively in the literature, we introduce a challenging “Multi-Person PoseTrack” dataset, and also propose a completely unconstrained evaluation protocol that does not make any assumptions about the scale, size, location or the number of persons. Finally, we evaluate the proposed approach and several baseline methods on our new dataset. |
Tasks | Multi-Person Pose Estimation, Multi-Person Pose Estimation and Tracking, Pose Estimation |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07727v3 |
PDF | http://arxiv.org/pdf/1611.07727v3.pdf |
PWC | https://paperswithcode.com/paper/posetrack-joint-multi-person-pose-estimation |
Repo | https://github.com/iqbalu/PoseTrack-CVPR2017 |
Framework | none |
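The paper partitions a spatio-temporal graph with an integer linear program; as a simplified stand-in, the sketch below associates poses between consecutive frames by minimizing total keypoint distance with the Hungarian algorithm. The cost function and distance threshold are illustrative assumptions.

```python
# Minimal sketch of person association over time: match poses between
# consecutive frames by minimizing mean keypoint distance. A greedy
# frame-by-frame stand-in for the paper's global spatio-temporal ILP.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(poses_prev, poses_next, max_dist=50.0):
    """poses_*: arrays of shape (num_people, num_joints, 2). Returns matches."""
    cost = np.zeros((len(poses_prev), len(poses_next)))
    for i, p in enumerate(poses_prev):
        for j, q in enumerate(poses_next):
            cost[i, j] = np.linalg.norm(p - q, axis=-1).mean()  # mean joint dist
    rows, cols = linear_sum_assignment(cost)
    # Keep only plausible matches; unmatched poses start or end a track.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < max_dist]
```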
Semantic Scene Completion from a Single Depth Image
Title | Semantic Scene Completion from a Single Depth Image |
Authors | Shuran Song, Fisher Yu, Andy Zeng, Angel X. Chang, Manolis Savva, Thomas Funkhouser |
Abstract | This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion and semantic labeling of depth maps separately. However, we observe that these two problems are tightly intertwined. To leverage the coupled nature of these two tasks, we introduce the semantic scene completion network (SSCNet), an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum. Our network uses a dilation-based 3D context module to efficiently expand the receptive field and enable 3D context learning. To train our network, we construct SUNCG - a manually created large-scale dataset of synthetic 3D scenes with dense volumetric annotations. Our experiments demonstrate that the joint model outperforms methods addressing each task in isolation and outperforms alternative approaches on the semantic scene completion task. |
Tasks | |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.08974v1 |
PDF | http://arxiv.org/pdf/1611.08974v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-scene-completion-from-a-single-depth |
Repo | https://github.com/facebookresearch/House3D |
Framework | none |
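The dilation-based 3D context module can be pictured as parallel 3D convolutions with growing dilation rates, fused back together, as in the hypothetical torch sketch below; channel counts and rates are illustrative, not SSCNet's exact configuration.

```python
# Minimal sketch of a dilation-based 3D context module: parallel 3-d
# convolutions with increasing dilation enlarge the receptive field over the
# voxel grid without growing the kernels themselves.
import torch
import torch.nn as nn

class DilatedContext3D(nn.Module):
    def __init__(self, ch=16, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(ch, ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv3d(ch * len(rates), ch, kernel_size=1)

    def forward(self, x):
        # Concatenate multi-scale 3D context, then fuse back to ch channels.
        return self.fuse(torch.cat([torch.relu(b(x)) for b in self.branches], dim=1))

vox = torch.randn(1, 16, 32, 32, 32)     # (batch, channels, D, H, W) voxels
print(DilatedContext3D()(vox).shape)     # torch.Size([1, 16, 32, 32, 32])
```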
Neural Machine Translation in Linear Time
Title | Neural Machine Translation in Linear Time |
Authors | Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, Koray Kavukcuoglu |
Abstract | We present a novel neural network for processing sequences. The ByteNet is a one-dimensional convolutional neural network that is composed of two parts, one to encode the source sequence and the other to decode the target sequence. The two network parts are connected by stacking the decoder on top of the encoder and preserving the temporal resolution of the sequences. To address the differing lengths of the source and the target, we introduce an efficient mechanism by which the decoder is dynamically unfolded over the representation of the encoder. The ByteNet uses dilation in the convolutional layers to increase its receptive field. The resulting network has two core properties: it runs in time that is linear in the length of the sequences and it sidesteps the need for excessive memorization. The ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms the previous best results obtained with recurrent networks. The ByteNet also achieves state-of-the-art performance on character-to-character machine translation on the English-to-German WMT translation task, surpassing comparable neural translation models that are based on recurrent networks with attentional pooling and run in quadratic time. We find that the latent alignment structure contained in the representations reflects the expected alignment between the tokens. |
Tasks | Language Modelling, Machine Translation |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.10099v2 |
PDF | http://arxiv.org/pdf/1610.10099v2.pdf |
PWC | https://paperswithcode.com/paper/neural-machine-translation-in-linear-time |
Repo | https://github.com/paarthneekhara/byteNet-tensorflow |
Framework | tf |
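The linear-time property comes from dilated convolutions rather than recurrence. Below is a minimal sketch of the decoder's core: a stack of causal (left-padded) dilated 1-d convolutions whose receptive field doubles per layer. Depth, width, and the residual form are illustrative assumptions rather than the exact ByteNet block.

```python
# Minimal sketch: causal dilated 1-d convolutions. Position t sees only
# tokens <= t, and run time stays linear in sequence length.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedStack(nn.Module):
    def __init__(self, ch=64, layers=4, kernel=3):
        super().__init__()
        self.kernel = kernel
        self.convs = nn.ModuleList(
            nn.Conv1d(ch, ch, kernel, dilation=2 ** i) for i in range(layers)
        )

    def forward(self, x):                  # x: (batch, channels, time)
        for conv in self.convs:
            pad = (self.kernel - 1) * conv.dilation[0]
            h = conv(F.pad(x, (pad, 0)))   # left-pad only => causal
            x = x + torch.relu(h)          # residual connection
        return x

out = CausalDilatedStack()(torch.randn(1, 64, 100))
print(out.shape)  # torch.Size([1, 64, 100])
```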
Estimating individual treatment effect: generalization bounds and algorithms
Title | Estimating individual treatment effect: generalization bounds and algorithms |
Authors | Uri Shalit, Fredrik D. Johansson, David Sontag |
Abstract | There is intense interest in applying machine learning to problems of causal inference in fields such as healthcare, economics and education. In particular, individual-level causal inference has important applications such as precision medicine. We give a new theoretical analysis and family of algorithms for predicting individual treatment effect (ITE) from observational data, under the assumption known as strong ignorability. The algorithms learn a “balanced” representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art. |
Tasks | Causal Inference |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.03976v5 |
PDF | http://arxiv.org/pdf/1606.03976v5.pdf |
PWC | https://paperswithcode.com/paper/estimating-individual-treatment-effect |
Repo | https://github.com/clinicalml/cfrnet |
Framework | tf |
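The balance term in the bound is an Integral Probability Metric between the treated and control representation distributions. The sketch below computes a (biased) RBF-kernel MMD estimate between the two groups, one of the two IPMs the paper instantiates; the bandwidth and estimator variant are illustrative choices.

```python
# Minimal sketch of the representation-balance penalty: RBF-kernel MMD
# between treated and control representations.
import numpy as np

def mmd_rbf(phi_t, phi_c, bandwidth=1.0):
    """phi_t, phi_c: (n_t, d) and (n_c, d) arrays of learned representations."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    return k(phi_t, phi_t).mean() + k(phi_c, phi_c).mean() - 2 * k(phi_t, phi_c).mean()

# Training would minimize: factual loss + alpha * mmd_rbf(phi[treated], phi[control])
```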
Practical Black-Box Attacks against Machine Learning
Title | Practical Black-Box Attacks against Machine Learning |
Authors | Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami |
Abstract | Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder. |
Tasks | |
Published | 2016-02-08 |
URL | http://arxiv.org/abs/1602.02697v4 |
PDF | http://arxiv.org/pdf/1602.02697v4.pdf |
PWC | https://paperswithcode.com/paper/practical-black-box-attacks-against-machine |
Repo | https://github.com/adrian-botta/understanding_adversarial_examples |
Framework | none |
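The attack loop can be summarized as: query the oracle for labels, fit a local substitute, augment the data along the substitute's input gradients, and finally craft adversarial examples on the substitute. Below is a heavily simplified sketch with a logistic-regression substitute (as in the paper's Amazon/Google experiments) and an FGSM-style crafting step; the round count, step sizes, and oracle interface are illustrative assumptions.

```python
# Minimal sketch of black-box substitute training with Jacobian-based
# dataset augmentation, followed by FGSM-style crafting on the substitute.
import numpy as np
from sklearn.linear_model import LogisticRegression

def class_grads(sub):
    """Per-class input gradients of the substitute's linear logits."""
    W = sub.coef_
    return W if W.shape[0] > 1 else np.vstack([-W, W])  # binary -> two rows

def substitute_attack(oracle, X0, rounds=3, lam=0.1, eps=0.3):
    """oracle: black-box function mapping an (n, d) array to class labels."""
    X = X0.copy()
    for _ in range(rounds):
        y = oracle(X)                                  # only capability: labels
        sub = LogisticRegression(max_iter=1000).fit(X, y)
        G = class_grads(sub)[np.searchsorted(sub.classes_, y)]
        # Jacobian-based augmentation: step along the gradient sign to map
        # out the oracle's decision boundary, then re-query and re-fit.
        X = np.vstack([X, X + lam * np.sign(G)])
    y = oracle(X)
    sub = LogisticRegression(max_iter=1000).fit(X, y)
    G = class_grads(sub)[np.searchsorted(sub.classes_, y)]
    # FGSM-style crafting on the substitute; these examples often transfer.
    return X - eps * np.sign(G)
```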
Language Modeling with Gated Convolutional Networks
Title | Language Modeling with Gated Convolutional Networks |
Authors | Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier |
Abstract | The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach is competitive with strong recurrent models on these large scale language tasks. |
Tasks | Language Modelling |
Published | 2016-12-23 |
URL | http://arxiv.org/abs/1612.08083v3 |
PDF | http://arxiv.org/pdf/1612.08083v3.pdf |
PWC | https://paperswithcode.com/paper/language-modeling-with-gated-convolutional |
Repo | https://github.com/ifrit98/layer-glu |
Framework | none |
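The simplified gating mechanism is the gated linear unit (GLU): a convolution produces two feature maps A and B, and the layer outputs A ⊗ σ(B), letting the gate select which inputs flow up the stack. A minimal torch sketch follows, with causal left-padding so no future tokens leak in; widths and kernel size are illustrative.

```python
# Minimal sketch of a gated convolutional layer with GLU gating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv1d(nn.Module):
    def __init__(self, ch=64, kernel=5):
        super().__init__()
        self.pad = kernel - 1                      # left-pad => no future leak
        self.conv = nn.Conv1d(ch, 2 * ch, kernel)  # produces A and B stacked

    def forward(self, x):                          # x: (batch, channels, time)
        a, b = self.conv(F.pad(x, (self.pad, 0))).chunk(2, dim=1)
        return a * torch.sigmoid(b)                # GLU: A * sigmoid(B)

out = GatedConv1d()(torch.randn(1, 64, 50))
print(out.shape)  # torch.Size([1, 64, 50])
```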
Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles
Title | Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles |
Authors | James Cross, Liang Huang |
Abstract | Parsing accuracy using efficient greedy transition systems has improved dramatically in recent years thanks to neural networks. Despite striking results in dependency parsing, however, neural models have not surpassed state-of-the-art approaches in constituency parsing. To remedy this, we introduce a new shift-reduce system whose stack contains merely sentence spans, represented by a bare minimum of LSTM features. We also design the first provably optimal dynamic oracle for constituency parsing, which runs in amortized O(1) time, compared to O(n^3) oracles for standard dependency parsing. Training with this oracle, we achieve the best F1 scores on both English and French of any parser that does not use reranking or external data. |
Tasks | Constituency Parsing, Dependency Parsing |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06475v1 |
PDF | http://arxiv.org/pdf/1612.06475v1.pdf |
PWC | https://paperswithcode.com/paper/span-based-constituency-parsing-with-a |
Repo | https://github.com/jhcross/span-parser |
Framework | none |
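A minimal sketch of why the label part of the oracle runs in amortized O(1): with the gold tree's brackets in a hash map, the optimal label action for any span is a single lookup. This is a simplified view under that assumption; the structural (shift/combine) half of the oracle is omitted.

```python
# Minimal sketch: constant-time label oracle via a gold-span hash map.
def make_label_oracle(gold_tree_spans):
    """gold_tree_spans: dict mapping (i, j) spans to gold nonterminal labels."""
    def oracle_label(i, j):
        # Gold label if (i, j) is a gold constituent, else no label action.
        return gold_tree_spans.get((i, j), "")
    return oracle_label

oracle = make_label_oracle({(0, 5): "S", (0, 2): "NP", (2, 5): "VP"})
print(oracle(2, 5), repr(oracle(1, 4)))  # VP ''
```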
Operational Calculus for Differentiable Programming
Title | Operational Calculus for Differentiable Programming |
Authors | Žiga Sajovic, Martin Vuk |
Abstract | In this work we present a theoretical model for differentiable programming. We construct an algebraic language that encapsulates formal semantics of differentiable programs by way of Operational Calculus. The algebraic nature of Operational Calculus can alter the properties of the programs that are expressed within the language and transform them into their solutions. In our model, programs are elements of programming spaces and viewed as maps from the virtual memory space to itself. Virtual memory space is an algebra of programs, an algebraic data structure one can calculate with. We define the operator of differentiation ($\partial$) on programming spaces and, using its powers, implement the general shift operator and the operator of program composition. We provide the formula for the expansion of a differentiable program into an infinite tensor series in terms of the powers of $\partial$. We express the operator of program composition in terms of the generalized shift operator and $\partial$, which implements a differentiable composition in the language. Such operators serve as abstractions over the tensor series algebra and as the main actors in our language. We demonstrate our model's usefulness in differentiable programming by using it to analyse iterators, deriving fractional iterations and their iterating velocities, and explicitly solving the special case of ReduceSum. |
Tasks | |
Published | 2016-10-25 |
URL | http://arxiv.org/abs/1610.07690v6 |
PDF | http://arxiv.org/pdf/1610.07690v6.pdf |
PWC | https://paperswithcode.com/paper/operational-calculus-for-differentiable |
Repo | https://github.com/zigasajovic/dCpp |
Framework | none |
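For intuition about a differentiation operator acting on programs, the sketch below implements the simplest concrete instance: first-order forward-mode differentiation with dual numbers, where an ordinary program is evaluated under an algebra that carries derivatives along. This is an illustrative analogy only, not the paper's operational-calculus construction or its higher powers of $\partial$.

```python
# Minimal sketch: a derivative 'operator' applied to ordinary programs via
# dual numbers (first-order forward-mode differentiation).
class Dual:
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps          # value and derivative part

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)

    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.val * o.eps + self.eps * o.val)  # product rule

    __rmul__ = __mul__

def d(program, x):
    """Apply the derivative 'operator' to a program at the point x."""
    return program(Dual(x, 1.0)).eps

f = lambda x: x * x * x + 2 * x                 # an ordinary program
print(d(f, 3.0))                                # 3*3^2 + 2 = 29.0
```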
Neural Photo Editing with Introspective Adversarial Networks
Title | Neural Photo Editing with Introspective Adversarial Networks |
Authors | Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston |
Abstract | The increasingly photorealistic sample quality of generative image models suggests their feasibility in applications beyond image generation. We present the Neural Photo Editor, an interface that leverages the power of generative neural networks to make large, semantically coherent changes to existing images. To tackle the challenge of achieving accurate reconstructions without loss of feature quality, we introduce the Introspective Adversarial Network, a novel hybridization of the VAE and GAN. Our model efficiently captures long-range dependencies through use of a computational block based on weight-shared dilated convolutions, and improves generalization performance with Orthogonal Regularization, a novel weight regularization method. We validate our contributions on CelebA, SVHN, and CIFAR-100, and produce samples and reconstructions with high visual fidelity. |
Tasks | Image Generation |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.07093v3 |
PDF | http://arxiv.org/pdf/1609.07093v3.pdf |
PWC | https://paperswithcode.com/paper/neural-photo-editing-with-introspective |
Repo | https://github.com/ajbrock/Neural-Photo-Editor |
Framework | tf |
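Orthogonal Regularization has a compact form: penalize each weight matrix's deviation from orthogonality via ||W Wᵀ − I||², summed over layers. A minimal torch sketch follows; the coefficient and the choice to apply it to every matrix-shaped parameter are illustrative assumptions.

```python
# Minimal sketch of Orthogonal Regularization as an added loss term.
import torch

def orthogonal_regularization(model, coef=1e-4):
    penalty = torch.zeros(())
    for p in model.parameters():
        if p.ndim < 2:
            continue                              # skip biases
        W = p.reshape(p.shape[0], -1)             # flatten conv kernels
        eye = torch.eye(W.shape[0], device=W.device)
        penalty = penalty + ((W @ W.T - eye) ** 2).sum()
    return coef * penalty

# Add to the task loss each step: loss = task_loss + orthogonal_regularization(net)
```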
Full Resolution Image Compression with Recurrent Neural Networks
Title | Full Resolution Image Compression with Recurrent Neural Networks |
Authors | George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, David Minnen, Joel Shor, Michele Covell |
Abstract | This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study “one-shot” versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%-8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding. |
Tasks | Image Compression |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05148v2 |
PDF | http://arxiv.org/pdf/1608.05148v2.pdf |
PWC | https://paperswithcode.com/paper/full-resolution-image-compression-with |
Repo | https://github.com/SimonTsungHanKuo/ImageCompzByGRU |
Framework | pytorch |
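The additive-reconstruction idea can be demonstrated with a toy codec: each iteration binarizes the current residual and adds a decoded correction to the running reconstruction, so quality improves with every extra bit. The sign()-and-halving codec below is a hypothetical stand-in for the paper's learned RNN encoder, binarizer, and decoder.

```python
# Minimal sketch of additive reconstruction: iteratively encode the residual
# with a 1-bit binarizer and accumulate decoded corrections.
import numpy as np

def progressive_compress(image, iterations=8, step=0.5):
    recon = np.zeros_like(image)
    bits_per_iter = []
    for _ in range(iterations):
        residual = image - recon            # what is still missing
        code = np.sign(residual)            # binarizer: 1 bit per element
        recon = recon + step * code         # additive reconstruction
        step *= 0.5                         # toy stand-in for learned decoding
        bits_per_iter.append(code)
    return recon, bits_per_iter

img = np.random.rand(4, 4)
rec, bits = progressive_compress(img)
print(float(np.abs(img - rec).mean()))      # error shrinks with iterations
```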