July 29, 2019

Paper Group AWR 153

Augmentor: An Image Augmentation Library for Machine Learning. Weakly Supervised Action Localization by Sparse Temporal Pooling Network. Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset. Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks. Analyzing First-Person Stori …

Augmentor: An Image Augmentation Library for Machine Learning

Title Augmentor: An Image Augmentation Library for Machine Learning
Authors Marcus D. Bloice, Christof Stocker, Andreas Holzinger
Abstract The generation of artificial data based on existing observations, known as data augmentation, is a technique used in machine learning to improve model accuracy, generalisation, and to control overfitting. Augmentor is a software package, available in both Python and Julia versions, that provides a high level API for the expansion of image data using a stochastic, pipeline-based approach which effectively allows for images to be sampled from a distribution of augmented images at runtime. Augmentor provides methods for most standard augmentation practices as well as several advanced features such as label-preserving, randomised elastic distortions, and provides many helper functions for typical augmentation tasks used in machine learning.
Tasks Data Augmentation, Image Augmentation
Published 2017-08-11
URL http://arxiv.org/abs/1708.04680v1
PDF http://arxiv.org/pdf/1708.04680v1.pdf
PWC https://paperswithcode.com/paper/augmentor-an-image-augmentation-library-for
Repo https://github.com/JunHahn/image-augmentation-workspace
Framework none
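
A minimal usage sketch of the pipeline-based approach the abstract describes, based on the Python package’s documented interface; the directory path and all parameter values here are illustrative, not taken from the paper:

```python
# Build a stochastic augmentation pipeline and sample from it.
# The source directory and the probabilities/magnitudes below are placeholders.
import Augmentor

p = Augmentor.Pipeline("images/")  # directory containing the original images
p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
p.zoom(probability=0.5, min_factor=1.1, max_factor=1.5)
p.random_distortion(probability=0.3, grid_width=4, grid_height=4, magnitude=8)  # label-preserving elastic distortion
p.sample(1000)  # draw 1,000 augmented images from the pipeline's distribution
```

Each operation fires with its own probability as an image passes through the pipeline, which is what lets images be sampled from a distribution of augmented images at runtime.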

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

Title Weakly Supervised Action Localization by Sparse Temporal Pooling Network
Authors Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han
Abstract We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function comprises two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
Tasks Action Classification, Action Localization, Temporal Action Localization, Temporal Localization, Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization
Published 2017-12-14
URL http://arxiv.org/abs/1712.05080v2
PDF http://arxiv.org/pdf/1712.05080v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-action-localization-by
Repo https://github.com/demianzhang/weakly-action-localization
Framework none
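
The two-term objective in the abstract can be sketched as attention-weighted (adaptive) temporal pooling of segment features, a video-level classification loss, and a sparsity penalty on the attention weights. The snippet below is a rough PyTorch sketch of that idea, not the authors’ implementation; the feature dimension, attention head, and loss weight are assumptions.

```python
# Sketch of attention-based adaptive temporal pooling with a sparsity penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTemporalPooling(nn.Module):
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                       nn.Linear(256, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, segments):                  # segments: (T, feat_dim)
        att = self.attention(segments)            # (T, 1) per-segment attention
        pooled = (att * segments).sum(0) / (att.sum() + 1e-8)  # adaptive pooling
        return self.classifier(pooled), att

model = SparseTemporalPooling()
segments = torch.randn(400, 1024)                 # pre-extracted segment features
video_labels = torch.zeros(20); video_labels[3] = 1.0  # video-level class labels
logits, att = model(segments)
# Classification term + L1 sparsity term on the attention weights.
loss = F.binary_cross_entropy_with_logits(logits, video_labels) + 1e-4 * att.abs().mean()
```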

Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset

Title Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset
Authors Seil Na, Youngjae Yu, Sangho Lee, Jisung Kim, Gunhee Kim
Abstract YouTube-8M is the largest video dataset for multi-label video classification. In order to tackle the multi-label classification on this challenging dataset, it is necessary to solve several issues such as temporal modeling of videos, label imbalances, and correlations between labels. We develop a deep neural network model, which consists of four components: the frame encoder, the classification layer, the label processing layer, and the loss function. We introduce our newly proposed methods and discuss how existing models operate in the YouTube-8M Classification Task, what insights they offer, and why they succeed (or fail) in achieving good performance. Most of the models we propose score well above the baseline models, and our ensemble of models ranked 8th in the Kaggle competition.
Tasks Multi-Label Classification, Video Classification
Published 2017-06-24
URL http://arxiv.org/abs/1706.07960v2
PDF http://arxiv.org/pdf/1706.07960v2.pdf
PWC https://paperswithcode.com/paper/encoding-video-and-label-priors-for-multi
Repo https://github.com/seilna/youtube-8m
Framework tf

Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks

Title Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks
Authors Ke Yan, Le Lu, Ronald M. Summers
Abstract Automatic body part recognition for CT slices can benefit various medical image applications. Recent deep learning methods demonstrate promising performance, with the requirement of large amounts of labeled images for training. The intrinsic structural or superior-inferior slice ordering information in CT volumes is not fully exploited. In this paper, we propose a convolutional neural network (CNN) based Unsupervised Body part Regression (UBR) algorithm to address this problem. A novel unsupervised learning method and two inter-sample CNN loss functions are presented. Distinct from previous work, UBR builds a coordinate system for the human body and outputs a continuous score for each axial slice, representing the normalized position of the body part in the slice. The training process of UBR resembles a self-organization process: slice scores are learned from inter-slice relationships. The training samples are unlabeled CT volumes that are abundant, thus no extra annotation effort is needed. UBR is simple, fast, and accurate. Quantitative and qualitative experiments validate its effectiveness. In addition, we show two applications of UBR in network initialization and anomaly detection.
Tasks Anomaly Detection
Published 2017-07-12
URL http://arxiv.org/abs/1707.03891v2
PDF http://arxiv.org/pdf/1707.03891v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-body-part-regression-via
Repo https://github.com/Gabsha/ssbr
Framework none
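
The self-ordering idea can be sketched as follows: sample equidistant axial slices from an unlabeled CT volume and train the network so that slice scores increase in the superior-inferior direction and grow by roughly equal amounts between equally spaced slices. The snippet below is a hedged PyTorch sketch of such inter-slice losses, not the authors’ code; the exact loss forms and constants are assumptions.

```python
# Order loss + equidistance loss over scores of equally spaced slices.
import torch
import torch.nn.functional as F

def ubr_losses(scores):
    """scores: (m,) CNN outputs for m equidistant slices, ordered head-to-feet."""
    diffs = scores[1:] - scores[:-1]                     # consecutive score increments
    order_loss = F.softplus(-diffs).mean()               # push every increment to be positive
    dist_loss = F.smooth_l1_loss(diffs[1:], diffs[:-1])  # push increments to be equal
    return order_loss + dist_loss

scores = torch.randn(8, requires_grad=True)  # placeholder for CNN slice scores
loss = ubr_losses(scores)
loss.backward()
```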

Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns

Title Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns
Authors Pedro Herruzo, Laura Portell, Alberto Soto, Beatriz Remeseiro
Abstract First-person stories can be analyzed by means of egocentric pictures acquired throughout the whole active day with wearable cameras. This manuscript presents an egocentric dataset with more than 45,000 pictures from four people in different environments such as working or studying. All the images were manually labeled to identify three patterns of interest regarding people’s lifestyle: socializing, eating and sedentary. Additionally, two different approaches are proposed to classify egocentric images into one of the 12 target categories defined to characterize these three patterns. The approaches are based on machine learning and deep learning techniques, including traditional classifiers and state-of-the-art convolutional neural networks. The experimental results obtained when applying these methods to the egocentric dataset demonstrated their adequacy for the problem at hand.
Tasks
Published 2017-07-25
URL http://arxiv.org/abs/1707.07863v1
PDF http://arxiv.org/pdf/1707.07863v1.pdf
PWC https://paperswithcode.com/paper/analyzing-first-person-stories-based-on
Repo https://github.com/alsoba13/LAP-Annotation-Tool
Framework none

On Inductive Abilities of Latent Factor Models for Relational Learning

Title On Inductive Abilities of Latent Factor Models for Relational Learning
Authors Théo Trouillon, Éric Gaussier, Christopher R. Dance, Guillaume Bouchard
Abstract Latent factor models are increasingly popular for modeling multi-relational knowledge graphs. By their vectorial nature, it is not only hard to interpret why this class of models works so well, but also to understand where they fail and how they might be improved. We conduct an experimental survey of state-of-the-art models, not towards a purely comparative end, but as a means to get insight about their inductive abilities. To assess the strengths and weaknesses of each model, we create simple tasks that exhibit first, atomic properties of binary relations, and then, common inter-relational inference through synthetic genealogies. Based on these experimental results, we propose new research directions to improve on existing models.
Tasks Knowledge Graphs, Relational Reasoning
Published 2017-09-17
URL http://arxiv.org/abs/1709.05666v1
PDF http://arxiv.org/pdf/1709.05666v1.pdf
PWC https://paperswithcode.com/paper/on-inductive-abilities-of-latent-factor
Repo https://github.com/ttrouill/induction_experiments
Framework none
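
For readers unfamiliar with the model family being surveyed, a latent factor model scores a triple (subject, relation, object) from learned embeddings; one representative example is ComplEx, shown here as standard background rather than a result of the paper:

```latex
% ComplEx scoring function: complex-valued embeddings, trilinear product, real part.
\phi(s, r, o) = \operatorname{Re}\!\left( \sum_{k=1}^{K} w_{r,k}\, e_{s,k}\, \bar{e}_{o,k} \right),
\qquad e_s,\, e_o,\, w_r \in \mathbb{C}^{K}
```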

Non-Stationary Spectral Kernels

Title Non-Stationary Spectral Kernels
Authors Sami Remes, Markus Heinonen, Samuel Kaski
Abstract We propose non-stationary spectral kernels for Gaussian process regression. We propose to model the spectral density of a non-stationary kernel function as a mixture of input-dependent Gaussian process frequency density surfaces. We solve the generalised Fourier transform with such a model, and present a family of non-stationary and non-monotonic kernels that can learn input-dependent and potentially long-range, non-monotonic covariances between inputs. We derive efficient inference using model whitening and marginalized posterior, and show with case studies that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics.
Tasks Time Series
Published 2017-05-24
URL http://arxiv.org/abs/1705.08736v1
PDF http://arxiv.org/pdf/1705.08736v1.pdf
PWC https://paperswithcode.com/paper/non-stationary-spectral-kernels
Repo https://github.com/sremes/nonstationary-spectral-kernels
Framework none
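
For context, the stationary spectral mixture kernel that this work generalizes models the spectral density as a Gaussian mixture, and by Bochner’s theorem the kernel is the Fourier transform of that density. The one-dimensional form below is standard background, not the paper’s non-stationary construction, which replaces the fixed mixture parameters with input-dependent Gaussian process surfaces:

```latex
% Bochner's theorem: a stationary kernel is the Fourier transform of its spectral density.
k(\tau) = \int S(s)\, e^{2\pi i s \tau}\, ds
% Stationary spectral mixture kernel (Gaussian mixture spectral density, 1-D):
k(\tau) = \sum_{q=1}^{Q} w_q \exp\!\left(-2\pi^2 \tau^2 \sigma_q^2\right) \cos\!\left(2\pi \tau \mu_q\right)
```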

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching

Title ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching
Authors Chunyuan Li, Hao Liu, Changyou Chen, Yunchen Pu, Liqun Chen, Ricardo Henao, Lawrence Carin
Abstract We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.
Tasks
Published 2017-09-05
URL http://arxiv.org/abs/1709.01215v2
PDF http://arxiv.org/pdf/1709.01215v2.pdf
PWC https://paperswithcode.com/paper/alice-towards-understanding-adversarial
Repo https://github.com/zhenxuan00/graphical-gan
Framework tf

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Title Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Authors Wei-Ning Hsu, Yu Zhang, James Glass
Abstract We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
Tasks Speaker Verification, Speech Recognition
Published 2017-09-22
URL http://arxiv.org/abs/1709.07902v1
PDF http://arxiv.org/pdf/1709.07902v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-disentangled-and
Repo https://github.com/wnhsu/ScalableFHVAE
Framework tf

CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication

Title CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
Authors Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh
Abstract In this work, we propose a goal-driven collaborative task that combines language, perception, and action. Specifically, we develop a Collaborative image-Drawing game between two agents, called CoDraw. Our game is grounded in a virtual world that contains movable clip art objects. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces. The two players communicate with each other using natural language. We collect the CoDraw dataset of ~10K dialogs consisting of ~138K messages exchanged between human players. We define protocols and metrics to evaluate learned agents in this testbed, highlighting the need for a novel “crosstalk” evaluation condition which pairs agents trained independently on disjoint subsets of the training data. We present models for our task and benchmark them using both fully automated evaluation and by having them play the game live with humans.
Tasks Imitation Learning
Published 2017-12-15
URL https://arxiv.org/abs/1712.05558v3
PDF https://arxiv.org/pdf/1712.05558v3.pdf
PWC https://paperswithcode.com/paper/codraw-collaborative-drawing-as-a-testbed-for
Repo https://github.com/facebookresearch/codraw-models
Framework pytorch

THAP: A Matlab Toolkit for Learning with Hawkes Processes

Title THAP: A Matlab Toolkit for Learning with Hawkes Processes
Authors Hongteng Xu, Hongyuan Zha
Abstract As a powerful tool for asynchronous event sequence analysis, point processes have been studied for a long time and have achieved numerous successes in different fields. Among various point process models, the Hawkes process and its variants have attracted many researchers in statistics and computer science in recent years because they capture the self- and mutually-triggering patterns between different events in complicated sequences explicitly and quantitatively and are broadly applicable to many practical problems. In this paper, we describe an open-source toolkit implementing many learning algorithms and analysis tools for the Hawkes process model and its variants. Our toolkit systematically summarizes recent state-of-the-art algorithms as well as the most classic algorithms for Hawkes processes, which is beneficial for both academic education and research. Source code can be downloaded from https://github.com/HongtengXu/Hawkes-Process-Toolkit.
Tasks Point Processes
Published 2017-08-28
URL http://arxiv.org/abs/1708.09252v1
PDF http://arxiv.org/pdf/1708.09252v1.pdf
PWC https://paperswithcode.com/paper/thap-a-matlab-toolkit-for-learning-with
Repo https://github.com/HongtengXu/Hawkes-Process-Toolkit
Framework none
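
The self- and mutually-triggering behaviour mentioned in the abstract is captured by the conditional intensity of a (multivariate) Hawkes process, which is the object the toolkit’s algorithms estimate; the exponential kernel below is one common parameterization, shown as standard background:

```latex
% Conditional intensity of event type u: a base rate plus excitation from past events.
\lambda_u(t) = \mu_u + \sum_{t_i < t} \phi_{u, u_i}(t - t_i),
\qquad \phi_{u u'}(t) = a_{u u'}\, e^{-w t} \quad (t > 0)
```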

Failures of Gradient-Based Deep Learning

Title Failures of Gradient-Based Deep Learning
Authors Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
Abstract In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming the state of the art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four types of simple problems for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments and provide theoretical insights explaining their source and how they might be remedied.
Tasks
Published 2017-03-23
URL http://arxiv.org/abs/1703.07950v2
PDF http://arxiv.org/pdf/1703.07950v2.pdf
PWC https://paperswithcode.com/paper/failures-of-gradient-based-deep-learning
Repo https://github.com/shakedshammah/failures_of_DL
Framework tf

Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition

Title Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition
Authors Pavel Izmailov, Alexander Novikov, Dmitry Kropotov
Abstract We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows for training kernels based on deep neural networks without any modifications to the underlying GP model. A neural network learns a multidimensional embedding for the data, which is used by the GP to make the final prediction. We train GP and neural network parameters end-to-end without pretraining, through maximization of GP marginal likelihood. We show the efficiency of the proposed approach on several regression and classification benchmark datasets including MNIST, CIFAR-10, and Airline.
Tasks Gaussian Processes
Published 2017-10-19
URL http://arxiv.org/abs/1710.07324v2
PDF http://arxiv.org/pdf/1710.07324v2.pdf
PWC https://paperswithcode.com/paper/scalable-gaussian-processes-with-billions-of
Repo https://github.com/izmailovpavel/TTGP
Framework tf
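
The key storage trick is to keep the variational parameters defined over an exponentially large grid of inducing inputs in Tensor Train format, where a d-dimensional tensor is represented by a product of small matrices. The generic TT decomposition below is standard background, with the ranks chosen as a modelling assumption:

```latex
% Tensor Train format: each entry is a product of matrices G_k[i_k] of size r_{k-1} x r_k,
% with r_0 = r_d = 1, so storage grows linearly in d instead of exponentially.
\mu(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d]
```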

Option Pricing and Hedging for Discrete Time Autoregressive Hidden Markov Model

Title Option Pricing and Hedging for Discrete Time Autoregressive Hidden Markov Model
Authors Massimo Caccia, Bruno Rémillard
Abstract In this paper we solve the discrete time mean-variance hedging problem when asset returns follow a multivariate autoregressive hidden Markov model. Time-dependent volatility and serial dependence are well established properties of financial time series and our model covers both. To illustrate the relevance of our proposed methodology, we first compare the proposed model with the well-known hidden Markov model via likelihood ratio tests and a novel goodness-of-fit test on the S&P 500 daily returns. Second, we present out-of-sample hedging results on S&P 500 vanilla options, as well as a trading strategy based on theoretical prices, which we compare to simpler models including the classical Black-Scholes delta-hedging approach.
Tasks Time Series
Published 2017-07-07
URL http://arxiv.org/abs/1707.02019v1
PDF http://arxiv.org/pdf/1707.02019v1.pdf
PWC https://paperswithcode.com/paper/option-pricing-and-hedging-for-discrete-time
Repo https://github.com/optimass/Optimal_hedging_ARHMM
Framework none
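
As a one-dimensional illustration of the model class (the paper works with the multivariate case), an autoregressive hidden Markov model lets both the mean dynamics and the volatility of returns switch with an unobserved Markov regime:

```latex
% s_t is a hidden Markov chain over K regimes with transition matrix P;
% returns follow a regime-dependent AR(1) with regime-dependent volatility.
r_t = \mu_{s_t} + \phi_{s_t}\, r_{t-1} + \sigma_{s_t}\, \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, 1),\quad s_t \in \{1, \ldots, K\}
```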

No Fuss Distance Metric Learning using Proxies

Title No Fuss Distance Metric Learning using Proxies
Authors Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh
Abstract We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship – an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size, hard or semi-hard triplet mining, etc. Even with these tricks, the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy-loss improves on state-of-the-art results for three standard zero-shot learning datasets, by up to 15 percentage points, while converging three times as fast as other triplet-based losses.
Tasks Metric Learning, Semantic Similarity, Semantic Textual Similarity, Zero-Shot Learning
Published 2017-03-21
URL http://arxiv.org/abs/1703.07464v3
PDF http://arxiv.org/pdf/1703.07464v3.pdf
PWC https://paperswithcode.com/paper/no-fuss-distance-metric-learning-using
Repo https://github.com/Confusezius/Deep-Metric-Learning-Baselines
Framework pytorch
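
The proxy idea can be sketched in a few lines of PyTorch: each class gets one learned proxy, and every embedded anchor is pulled toward its own class proxy and pushed away from all other proxies, so no triplet mining is required. This is a hedged sketch, not the authors’ implementation; the normalization, squared distances, and cross-entropy form (which keeps the positive proxy in the denominator) are simplifying assumptions.

```python
# Proxy-based metric learning loss: anchors vs. learned per-class proxies.
import torch
import torch.nn.functional as F

num_classes, embed_dim = 100, 64
proxies = torch.nn.Parameter(torch.randn(num_classes, embed_dim))  # one proxy per class

def proxy_loss(embeddings, labels):
    """embeddings: (B, D) anchor embeddings; labels: (B,) class indices."""
    e = F.normalize(embeddings, dim=1)
    p = F.normalize(proxies, dim=1)
    dists = torch.cdist(e, p) ** 2      # squared distance from each anchor to every proxy
    # NCA-style objective: softmax over negative distances, maximize the own-proxy probability.
    return F.cross_entropy(-dists, labels)

embeddings = torch.randn(32, embed_dim, requires_grad=True)  # placeholder network outputs
labels = torch.randint(0, num_classes, (32,))
loss = proxy_loss(embeddings, labels)
loss.backward()
```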