July 29, 2019

Paper Group AWR 153

Augmentor: An Image Augmentation Library for Machine Learning. Weakly Supervised Action Localization by Sparse Temporal Pooling Network. Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset. Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks. Analyzing First-Person Stori …

Augmentor: An Image Augmentation Library for Machine Learning

Title Augmentor: An Image Augmentation Library for Machine Learning
Authors Marcus D. Bloice, Christof Stocker, Andreas Holzinger
Abstract The generation of artificial data based on existing observations, known as data augmentation, is a technique used in machine learning to improve model accuracy, generalisation, and to control overfitting. Augmentor is a software package, available in both Python and Julia versions, that provides a high level API for the expansion of image data using a stochastic, pipeline-based approach which effectively allows for images to be sampled from a distribution of augmented images at runtime. Augmentor provides methods for most standard augmentation practices as well as several advanced features such as label-preserving, randomised elastic distortions, and provides many helper functions for typical augmentation tasks used in machine learning.
Tasks Data Augmentation, Image Augmentation
Published 2017-08-11
URL http://arxiv.org/abs/1708.04680v1
PDF http://arxiv.org/pdf/1708.04680v1.pdf
PWC https://paperswithcode.com/paper/augmentor-an-image-augmentation-library-for
Repo https://github.com/JunHahn/image-augmentation-workspace
Framework none
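
A minimal usage sketch of the pipeline-based approach the abstract describes, based on the Python package’s documented interface; the directory path and all parameter values here are illustrative, not taken from the paper:

```python
# Build a stochastic augmentation pipeline and sample from it.
# The source directory and the probabilities/magnitudes below are placeholders.
import Augmentor

p = Augmentor.Pipeline("images/")  # directory containing the original images
p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
p.zoom(probability=0.5, min_factor=1.1, max_factor=1.5)
p.random_distortion(probability=0.3, grid_width=4, grid_height=4, magnitude=8)  # label-preserving elastic distortion
p.sample(1000)  # draw 1,000 augmented images from the pipeline's distribution
```

Each operation fires with its own probability as an image passes through the pipeline, which is what lets images be sampled from a distribution of augmented images at runtime.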

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

Title Weakly Supervised Action Localization by Sparse Temporal Pooling Network
Authors Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han
Abstract We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function comprises two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
Tasks Action Classification, Action Localization, Temporal Action Localization, Temporal Localization, Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization
Published 2017-12-14
URL http://arxiv.org/abs/1712.05080v2
PDF http://arxiv.org/pdf/1712.05080v2.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-action-localization-by
Repo https://github.com/demianzhang/weakly-action-localization
Framework none
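
The two-term objective in the abstract can be sketched as attention-weighted (adaptive) temporal pooling of segment features, a video-level classification loss, and a sparsity penalty on the attention weights. The snippet below is a rough PyTorch sketch of that idea, not the authors’ implementation; the feature dimension, attention head, and loss weight are assumptions.

```python
# Sketch of attention-based adaptive temporal pooling with a sparsity penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseTemporalPooling(nn.Module):
    def __init__(self, feat_dim=1024, num_classes=20):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                       nn.Linear(256, 1), nn.Sigmoid())
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, segments):                  # segments: (T, feat_dim)
        att = self.attention(segments)            # (T, 1) per-segment attention
        pooled = (att * segments).sum(0) / (att.sum() + 1e-8)  # adaptive pooling
        return self.classifier(pooled), att

model = SparseTemporalPooling()
segments = torch.randn(400, 1024)                 # pre-extracted segment features
video_labels = torch.zeros(20); video_labels[3] = 1.0  # video-level class labels
logits, att = model(segments)
# Classification term + L1 sparsity term on the attention weights.
loss = F.binary_cross_entropy_with_logits(logits, video_labels) + 1e-4 * att.abs().mean()
```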

Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset

Title Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset
Authors Seil Na, Youngjae Yu, Sangho Lee, Jisung Kim, Gunhee Kim
Abstract YouTube-8M is the largest video dataset for multi-label video classification. In order to tackle the multi-label classification on this challenging dataset, it is necessary to solve several issues such as temporal modeling of videos, label imbalances, and correlations between labels. We develop a deep neural network model, which consists of four components: the frame encoder, the classification layer, the label processing layer, and the loss function. We introduce our newly proposed methods and discuss how existing models operate in the YouTube-8M Classification Task, what insights they offer, and why they succeed (or fail) in achieving good performance. Most of the models we propose score well above the baseline models, and our ensemble of models ranked 8th in the Kaggle competition.
Tasks Multi-Label Classification, Video Classification
Published 2017-06-24
URL http://arxiv.org/abs/1706.07960v2
PDF http://arxiv.org/pdf/1706.07960v2.pdf
PWC https://paperswithcode.com/paper/encoding-video-and-label-priors-for-multi
Repo https://github.com/seilna/youtube-8m
Framework tf

Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks

Title Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks
Authors Ke Yan, Le Lu, Ronald M. Summers
Abstract Automatic body part recognition for CT slices can benefit various medical image applications. Recent deep learning methods demonstrate promising performance, with the requirement of large amounts of labeled images for training. The intrinsic structural or superior-inferior slice ordering information in CT volumes is not fully exploited. In this paper, we propose a convolutional neural network (CNN) based Unsupervised Body part Regression (UBR) algorithm to address this problem. A novel unsupervised learning method and two inter-sample CNN loss functions are presented. Distinct from previous work, UBR builds a coordinate system for the human body and outputs a continuous score for each axial slice, representing the normalized position of the body part in the slice. The training process of UBR resembles a self-organization process: slice scores are learned from inter-slice relationships. The training samples are unlabeled CT volumes that are abundant, thus no extra annotation effort is needed. UBR is simple, fast, and accurate. Quantitative and qualitative experiments validate its effectiveness. In addition, we show two applications of UBR in network initialization and anomaly detection.
Tasks Anomaly Detection
Published 2017-07-12
URL http://arxiv.org/abs/1707.03891v2
PDF http://arxiv.org/pdf/1707.03891v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-body-part-regression-via
Repo https://github.com/Gabsha/ssbr
Framework none
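
The self-ordering idea can be sketched as follows: sample equidistant axial slices from an unlabeled CT volume and train the network so that slice scores increase in the superior-inferior direction and grow by roughly equal amounts between equally spaced slices. The snippet below is a hedged PyTorch sketch of such inter-slice losses, not the authors’ code; the exact loss forms and constants are assumptions.

```python
# Order loss + equidistance loss over scores of equally spaced slices.
import torch
import torch.nn.functional as F

def ubr_losses(scores):
    """scores: (m,) CNN outputs for m equidistant slices, ordered head-to-feet."""
    diffs = scores[1:] - scores[:-1]                     # consecutive score increments
    order_loss = F.softplus(-diffs).mean()               # push every increment to be positive
    dist_loss = F.smooth_l1_loss(diffs[1:], diffs[:-1])  # push increments to be equal
    return order_loss + dist_loss

scores = torch.randn(8, requires_grad=True)  # placeholder for CNN slice scores
loss = ubr_losses(scores)
loss.backward()
```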

Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns

Title Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns
Authors Pedro Herruzo, Laura Portell, Alberto Soto, Beatriz Remeseiro
Abstract First-person stories can be analyzed by means of egocentric pictures acquired throughout the whole active day with wearable cameras. This manuscript presents an egocentric dataset with more than 45,000 pictures from four people in different environments such as working or studying. All the images were manually labeled to identify three patterns of interest regarding people’s lifestyle: socializing, eating and sedentary. Additionally, two different approaches are proposed to classify egocentric images into one of the 12 target categories defined to characterize these three patterns. The approaches are based on machine learning and deep learning techniques, including traditional classifiers and state-of-the-art convolutional neural networks. The experimental results obtained when applying these methods to the egocentric dataset demonstrated their adequacy for the problem at hand.
Tasks
Published 2017-07-25
URL http://arxiv.org/abs/1707.07863v1
PDF http://arxiv.org/pdf/1707.07863v1.pdf
PWC https://paperswithcode.com/paper/analyzing-first-person-stories-based-on
Repo https://github.com/alsoba13/LAP-Annotation-Tool
Framework none

On Inductive Abilities of Latent Factor Models for Relational Learning

Title On Inductive Abilities of Latent Factor Models for Relational Learning
Authors Théo Trouillon, Éric Gaussier, Christopher R. Dance, Guillaume Bouchard
Abstract Latent factor models are increasingly popular for modeling multi-relational knowledge graphs. By their vectorial nature, it is not only hard to interpret why this class of models works so well, but also to understand where they fail and how they might be improved. We conduct an experimental survey of state-of-the-art models, not towards a purely comparative end, but as a means to get insight about their inductive abilities. To assess the strengths and weaknesses of each model, we create simple tasks that exhibit first, atomic properties of binary relations, and then, common inter-relational inference through synthetic genealogies. Based on these experimental results, we propose new research directions to improve on existing models.
Tasks Knowledge Graphs, Relational Reasoning
Published 2017-09-17
URL http://arxiv.org/abs/1709.05666v1
PDF http://arxiv.org/pdf/1709.05666v1.pdf
PWC https://paperswithcode.com/paper/on-inductive-abilities-of-latent-factor
Repo https://github.com/ttrouill/induction_experiments
Framework none
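
For readers unfamiliar with the model family being surveyed, a latent factor model scores a triple (subject, relation, object) from learned embeddings; one representative example is ComplEx, shown here as standard background rather than a result of the paper:

```latex
% ComplEx scoring function: complex-valued embeddings, trilinear product, real part.
\phi(s, r, o) = \operatorname{Re}\!\left( \sum_{k=1}^{K} w_{r,k}\, e_{s,k}\, \bar{e}_{o,k} \right),
\qquad e_s,\, e_o,\, w_r \in \mathbb{C}^{K}
```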

Non-Stationary Spectral Kernels

Title Non-Stationary Spectral Kernels
Authors Sami Remes, Markus Heinonen, Samuel Kaski
Abstract We propose non-stationary spectral kernels for Gaussian process regression. We propose to model the spectral density of a non-stationary kernel function as a mixture of input-dependent Gaussian process frequency density surfaces. We solve the generalised Fourier transform with such a model, and present a family of non-stationary and non-monotonic kernels that can learn input-dependent and potentially long-range, non-monotonic covariances between inputs. We derive efficient inference using model whitening and marginalized posterior, and show with case studies that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics.
Tasks Time Series
Published 2017-05-24
URL http://arxiv.org/abs/1705.08736v1
PDF http://arxiv.org/pdf/1705.08736v1.pdf
PWC https://paperswithcode.com/paper/non-stationary-spectral-kernels
Repo https://github.com/sremes/nonstationary-spectral-kernels
Framework none
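
For context, the stationary spectral mixture kernel that this work generalizes models the spectral density as a Gaussian mixture, and by Bochner’s theorem the kernel is the Fourier transform of that density. The one-dimensional form below is standard background, not the paper’s non-stationary construction, which replaces the fixed mixture parameters with input-dependent Gaussian process surfaces:

```latex
% Bochner's theorem: a stationary kernel is the Fourier transform of its spectral density.
k(\tau) = \int S(s)\, e^{2\pi i s \tau}\, ds
% Stationary spectral mixture kernel (Gaussian mixture spectral density, 1-D):
k(\tau) = \sum_{q=1}^{Q} w_q \exp\!\left(-2\pi^2 \tau^2 \sigma_q^2\right) \cos\!\left(2\pi \tau \mu_q\right)
```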

ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching

Title ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching
Authors Chunyuan Li, Hao Liu, Changyou Chen, Yunchen Pu, Liqun Chen, Ricardo Henao, Lawrence Carin
Abstract We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications.
Tasks
Published 2017-09-05
URL http://arxiv.org/abs/1709.01215v2
PDF http://arxiv.org/pdf/1709.01215v2.pdf
PWC https://paperswithcode.com/paper/alice-towards-understanding-adversarial
Repo https://github.com/zhenxuan00/graphical-gan
Framework tf

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Title Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Authors Wei-Ning Hsu, Yu Zhang, James Glass
Abstract We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
Tasks Speaker Verification, Speech Recognition
Published 2017-09-22
URL http://arxiv.org/abs/1709.07902v1
PDF http://arxiv.org/pdf/1709.07902v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-disentangled-and
Repo https://github.com/wnhsu/ScalableFHVAE
Framework tf

CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication

Title CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
Authors Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh
Abstract In this work, we propose a goal-driven collaborative task that combines language, perception, and action. Specifically, we develop a Collaborative image-Drawing game between two agents, called CoDraw. Our game is grounded in a virtual world that contains movable clip art objects. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces. The two players communicate with each other using natural language. We collect the CoDraw dataset of ~10K dialogs consisting of ~138K messages exchanged between human players. We define protocols and metrics to evaluate learned agents in this testbed, highlighting the need for a novel “crosstalk” evaluation condition which pairs agents trained independently on disjoint subsets of the training data. We present models for our task and benchmark them using both fully automated evaluation and by having them play the game live with humans.
Tasks Imitation Learning
Published 2017-12-15
URL https://arxiv.org/abs/1712.05558v3
PDF https://arxiv.org/pdf/1712.05558v3.pdf
PWC https://paperswithcode.com/paper/codraw-collaborative-drawing-as-a-testbed-for
Repo https://github.com/facebookresearch/codraw-models
Framework pytorch

THAP: A Matlab Toolkit for Learning with Hawkes Processes

Title THAP: A Matlab Toolkit for Learning with Hawkes Processes
Authors Hongteng Xu, Hongyuan Zha
Abstract As a powerful tool for asynchronous event sequence analysis, point processes have been studied for a long time and have achieved numerous successes in different fields. Among various point process models, the Hawkes process and its variants have attracted many researchers in statistics and computer science in recent years because they capture the self- and mutually-triggering patterns between different events in complicated sequences explicitly and quantitatively and are broadly applicable to many practical problems. In this paper, we describe an open-source toolkit implementing many learning algorithms and analysis tools for the Hawkes process model and its variants. Our toolkit systematically summarizes recent state-of-the-art algorithms as well as the most classic algorithms for Hawkes processes, which is beneficial for both academic education and research. Source code can be downloaded from https://github.com/HongtengXu/Hawkes-Process-Toolkit.
Tasks Point Processes
Published 2017-08-28
URL http://arxiv.org/abs/1708.09252v1
PDF http://arxiv.org/pdf/1708.09252v1.pdf
PWC https://paperswithcode.com/paper/thap-a-matlab-toolkit-for-learning-with
Repo https://github.com/HongtengXu/Hawkes-Process-Toolkit
Framework none
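
The self- and mutually-triggering behaviour mentioned in the abstract is captured by the conditional intensity of a (multivariate) Hawkes process, which is the object the toolkit’s algorithms estimate; the exponential kernel below is one common parameterization, shown as standard background:

```latex
% Conditional intensity of event type u: a base rate plus excitation from past events.
\lambda_u(t) = \mu_u + \sum_{t_i < t} \phi_{u, u_i}(t - t_i),
\qquad \phi_{u u'}(t) = a_{u u'}\, e^{-w t} \quad (t > 0)
```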

Failures of Gradient-Based Deep Learning

Title Failures of Gradient-Based Deep Learning
Authors Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah
Abstract In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming the state of the art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four types of simple problems for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments and provide theoretical insights explaining their source and how they might be remedied.
Tasks
Published 2017-03-23
URL http://arxiv.org/abs/1703.07950v2
PDF http://arxiv.org/pdf/1703.07950v2.pdf
PWC https://paperswithcode.com/paper/failures-of-gradient-based-deep-learning
Repo https://github.com/shakedshammah/failures_of_DL
Framework tf

Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition

Title Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition
Authors Pavel Izmailov, Alexander Novikov, Dmitry Kropotov
Abstract We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows for training kernels based on deep neural networks without any modifications to the underlying GP model. A neural network learns a multidimensional embedding for the data, which is used by the GP to make the final prediction. We train GP and neural network parameters end-to-end without pretraining, through maximization of GP marginal likelihood. We show the efficiency of the proposed approach on several regression and classification benchmark datasets including MNIST, CIFAR-10, and Airline.
Tasks Gaussian Processes
Published 2017-10-19
URL http://arxiv.org/abs/1710.07324v2
PDF http://arxiv.org/pdf/1710.07324v2.pdf
PWC https://paperswithcode.com/paper/scalable-gaussian-processes-with-billions-of
Repo https://github.com/izmailovpavel/TTGP
Framework tf
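
The key storage trick is to keep the variational parameters defined over an exponentially large grid of inducing inputs in Tensor Train format, where a d-dimensional tensor is represented by a product of small matrices. The generic TT decomposition below is standard background, with the ranks chosen as a modelling assumption:

```latex
% Tensor Train format: each entry is a product of matrices G_k[i_k] of size r_{k-1} x r_k,
% with r_0 = r_d = 1, so storage grows linearly in d instead of exponentially.
\mu(i_1, i_2, \ldots, i_d) = G_1[i_1]\, G_2[i_2] \cdots G_d[i_d]
```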

Option Pricing and Hedging for Discrete Time Autoregressive Hidden Markov Model

Title Option Pricing and Hedging for Discrete Time Autoregressive Hidden Markov Model
Authors Massimo Caccia, Bruno Rémillard
Abstract In this paper we solve the discrete time mean-variance hedging problem when asset returns follow a multivariate autoregressive hidden Markov model. Time-dependent volatility and serial dependence are well established properties of financial time series and our model covers both. To illustrate the relevance of our proposed methodology, we first compare the proposed model with the well-known hidden Markov model via likelihood ratio tests and a novel goodness-of-fit test on the S&P 500 daily returns. Second, we present out-of-sample hedging results on S&P 500 vanilla options, as well as a trading strategy based on theoretical prices, which we compare to simpler models including the classical Black-Scholes delta-hedging approach.
Tasks Time Series
Published 2017-07-07
URL http://arxiv.org/abs/1707.02019v1
PDF http://arxiv.org/pdf/1707.02019v1.pdf
PWC https://paperswithcode.com/paper/option-pricing-and-hedging-for-discrete-time
Repo https://github.com/optimass/Optimal_hedging_ARHMM
Framework none
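
As a one-dimensional illustration of the model class (the paper works with the multivariate case), an autoregressive hidden Markov model lets both the mean dynamics and the volatility of returns switch with an unobserved Markov regime:

```latex
% s_t is a hidden Markov chain over K regimes with transition matrix P;
% returns follow a regime-dependent AR(1) with regime-dependent volatility.
r_t = \mu_{s_t} + \phi_{s_t}\, r_{t-1} + \sigma_{s_t}\, \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, 1),\quad s_t \in \{1, \ldots, K\}
```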

No Fuss Distance Metric Learning using Proxies

Title No Fuss Distance Metric Learning using Proxies
Authors Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh
Abstract We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship – an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size, hard or semi-hard triplet mining, etc. Even with these tricks, the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy-loss improves on state-of-the-art results for three standard zero-shot learning datasets, by up to 15 percentage points, while converging three times as fast as other triplet-based losses.
Tasks Metric Learning, Semantic Similarity, Semantic Textual Similarity, Zero-Shot Learning
Published 2017-03-21
URL http://arxiv.org/abs/1703.07464v3
PDF http://arxiv.org/pdf/1703.07464v3.pdf
PWC https://paperswithcode.com/paper/no-fuss-distance-metric-learning-using
Repo https://github.com/Confusezius/Deep-Metric-Learning-Baselines
Framework pytorch
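
The proxy idea can be sketched in a few lines of PyTorch: each class gets one learned proxy, and every embedded anchor is pulled toward its own class proxy and pushed away from all other proxies, so no triplet mining is required. This is a hedged sketch, not the authors’ implementation; the normalization, squared distances, and cross-entropy form (which keeps the positive proxy in the denominator) are simplifying assumptions.

```python
# Proxy-based metric learning loss: anchors vs. learned per-class proxies.
import torch
import torch.nn.functional as F

num_classes, embed_dim = 100, 64
proxies = torch.nn.Parameter(torch.randn(num_classes, embed_dim))  # one proxy per class

def proxy_loss(embeddings, labels):
    """embeddings: (B, D) anchor embeddings; labels: (B,) class indices."""
    e = F.normalize(embeddings, dim=1)
    p = F.normalize(proxies, dim=1)
    dists = torch.cdist(e, p) ** 2      # squared distance from each anchor to every proxy
    # NCA-style objective: softmax over negative distances, maximize the own-proxy probability.
    return F.cross_entropy(-dists, labels)

embeddings = torch.randn(32, embed_dim, requires_grad=True)  # placeholder network outputs
labels = torch.randint(0, num_classes, (32,))
loss = proxy_loss(embeddings, labels)
loss.backward()
```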