Paper Group AWR 153
Augmentor: An Image Augmentation Library for Machine Learning. Weakly Supervised Action Localization by Sparse Temporal Pooling Network. Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset. Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks. Analyzing First-Person Stori …
Augmentor: An Image Augmentation Library for Machine Learning
Title | Augmentor: An Image Augmentation Library for Machine Learning |
Authors | Marcus D. Bloice, Christof Stocker, Andreas Holzinger |
Abstract | The generation of artificial data based on existing observations, known as data augmentation, is a technique used in machine learning to improve model accuracy and generalisation, and to control overfitting. Augmentor is a software package, available in both Python and Julia versions, that provides a high-level API for expanding image data using a stochastic, pipeline-based approach, which effectively allows images to be sampled from a distribution of augmented images at runtime. Augmentor provides methods for most standard augmentation practices as well as several advanced features such as label-preserving, randomised elastic distortions, along with many helper functions for typical augmentation tasks used in machine learning. |
Tasks | Data Augmentation, Image Augmentation |
Published | 2017-08-11 |
URL | http://arxiv.org/abs/1708.04680v1 |
http://arxiv.org/pdf/1708.04680v1.pdf | |
PWC | https://paperswithcode.com/paper/augmentor-an-image-augmentation-library-for |
Repo | https://github.com/JunHahn/image-augmentation-workspace |
Framework | none |
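The Augmentor abstract above describes a stochastic, pipeline-based API in which augmented images are sampled at runtime. Below is a minimal usage sketch assuming the Augmentor Python package from PyPI; the source directory, operation parameters, and sample count are illustrative, not taken from the paper.

```python
# Minimal sketch of Augmentor's pipeline-based API (parameters are illustrative).
import Augmentor

p = Augmentor.Pipeline("images/")  # directory of source images (hypothetical path)
p.rotate(probability=0.7, max_left_rotation=10, max_right_rotation=10)
p.flip_left_right(probability=0.5)
p.random_distortion(probability=0.3, grid_width=4, grid_height=4, magnitude=8)  # elastic distortion
p.sample(1000)  # draw 1000 augmented images from the pipeline's distribution
```

Each operation is attached with its own probability, so every pass through the pipeline yields a different augmented sample, which is the "sampling from a distribution of augmented images" behaviour the abstract refers to.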
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
Title | Weakly Supervised Action Localization by Sparse Temporal Pooling Network |
Authors | Phuc Nguyen, Ting Liu, Gautam Prasad, Bohyung Han |
Abstract | We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function is comprised of two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision. |
Tasks | Action Classification, Action Localization, Temporal Action Localization, Temporal Localization, Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05080v2 |
http://arxiv.org/pdf/1712.05080v2.pdf | |
PWC | https://paperswithcode.com/paper/weakly-supervised-action-localization-by |
Repo | https://github.com/demianzhang/weakly-action-localization |
Framework | none |
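The loss described in the abstract combines a video-level classification error with a sparsity term on the attention weights used for temporal pooling. The NumPy sketch below is a hedged illustration of that structure; the pooling normalisation, the sigmoid cross-entropy form, and the weight `beta` are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def stpn_video_scores(segment_features, attention, W):
    """Attention-weighted temporal pooling of segment features (T x D) followed
    by a linear classifier W (D x C), giving video-level class scores."""
    pooled = (attention[:, None] * segment_features).sum(axis=0) / (attention.sum() + 1e-8)
    return pooled @ W

def stpn_loss(scores, labels, attention, beta=1e-4):
    """Video-level sigmoid cross-entropy plus an L1 sparsity penalty on the
    class-agnostic attention weights (beta is an assumed trade-off weight)."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    cls = -(labels * np.log(probs + 1e-8) + (1 - labels) * np.log(1 - probs + 1e-8)).mean()
    return cls + beta * np.abs(attention).mean()
```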
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset
Title | Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset |
Authors | Seil Na, Youngjae Yu, Sangho Lee, Jisung Kim, Gunhee Kim |
Abstract | YouTube-8M is the largest video dataset for multi-label video classification. In order to tackle multi-label classification on this challenging dataset, it is necessary to solve several issues such as temporal modeling of videos, label imbalances, and correlations between labels. We develop a deep neural network model, which consists of four components: the frame encoder, the classification layer, the label processing layer, and the loss function. We introduce our newly proposed methods and discuss how existing models operate on the YouTube-8M classification task, what insights they offer, and why they succeed (or fail) to achieve good performance. Most of the models we propose perform substantially better than the baseline models, and our ensemble of these models placed 8th in the Kaggle competition. |
Tasks | Multi-Label Classification, Video Classification |
Published | 2017-06-24 |
URL | http://arxiv.org/abs/1706.07960v2 |
http://arxiv.org/pdf/1706.07960v2.pdf | |
PWC | https://paperswithcode.com/paper/encoding-video-and-label-priors-for-multi |
Repo | https://github.com/seilna/youtube-8m |
Framework | tf |
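The abstract names four components (frame encoder, classification layer, label processing layer, loss function). The sketch below only mirrors that decomposition with deliberately simple stand-ins, namely mean pooling, a linear layer, and a label-correlation re-scoring step; none of these are claimed to be the authors' actual choices.

```python
import numpy as np

def multilabel_video_probs(frame_features, W_cls, label_corr=None):
    """frame_features: T x D per-frame features; W_cls: D x K classifier weights;
    label_corr: optional K x K label co-occurrence matrix used as a crude
    label-processing step. Returns per-class probabilities (independent sigmoids)."""
    video_repr = frame_features.mean(axis=0)      # frame encoder: temporal mean pooling
    logits = video_repr @ W_cls                   # classification layer
    if label_corr is not None:
        logits = logits + logits @ label_corr     # label processing: mix in label priors
    return 1.0 / (1.0 + np.exp(-logits))          # multi-label output
```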
Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks
Title | Unsupervised Body Part Regression via Spatially Self-ordering Convolutional Neural Networks |
Authors | Ke Yan, Le Lu, Ronald M. Summers |
Abstract | Automatic body part recognition for CT slices can benefit various medical image applications. Recent deep learning methods demonstrate promising performance but require large amounts of labeled images for training. The intrinsic structural or superior-inferior slice ordering information in CT volumes is not fully exploited. In this paper, we propose a convolutional neural network (CNN) based Unsupervised Body part Regression (UBR) algorithm to address this problem. A novel unsupervised learning method and two inter-sample CNN loss functions are presented. Distinct from previous work, UBR builds a coordinate system for the human body and outputs a continuous score for each axial slice, representing the normalized position of the body part in the slice. The training process of UBR resembles a self-organization process: slice scores are learned from inter-slice relationships. The training samples are unlabeled CT volumes that are abundant, thus no extra annotation effort is needed. UBR is simple, fast, and accurate. Quantitative and qualitative experiments validate its effectiveness. In addition, we show two applications of UBR in network initialization and anomaly detection. |
Tasks | Anomaly Detection |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03891v2 |
http://arxiv.org/pdf/1707.03891v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-body-part-regression-via |
Repo | https://github.com/Gabsha/ssbr |
Framework | none |
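To make "slice scores are learned from inter-slice relationships" concrete, here is a hedged sketch of two inter-sample losses in the spirit of UBR: an ordering term and an equal-spacing term. The exact penalties used in the paper may differ; this is only meant to show the self-ordering idea.

```python
import numpy as np

def ordering_loss(scores):
    """Consecutive slices sampled superior-to-inferior from one CT volume should
    receive monotonically increasing scores; penalise non-increasing pairs with
    a log-sigmoid on score differences."""
    diffs = np.diff(scores)
    return -np.log(1.0 / (1.0 + np.exp(-diffs)) + 1e-8).mean()

def spacing_loss(scores):
    """Equally spaced slices should have roughly equal score increments; an L1
    penalty on second differences is used here as a simple stand-in."""
    return np.abs(np.diff(np.diff(scores))).mean()
```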
Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns
Title | Analyzing First-Person Stories Based on Socializing, Eating and Sedentary Patterns |
Authors | Pedro Herruzo, Laura Portell, Alberto Soto, Beatriz Remeseiro |
Abstract | First-person stories can be analyzed by means of egocentric pictures acquired throughout the whole active day with wearable cameras. This manuscript presents an egocentric dataset with more than 45,000 pictures from four people in different environments such as working or studying. All the images were manually labeled to identify three patterns of interest regarding people’s lifestyle: socializing, eating and sedentary. Additionally, two different approaches are proposed to classify egocentric images into one of the 12 target categories defined to characterize these three patterns. The approaches are based on machine learning and deep learning techniques, including traditional classifiers and state-of-the-art convolutional neural networks. The experimental results obtained when applying these methods to the egocentric dataset demonstrated their adequacy for the problem at hand. |
Tasks | |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.07863v1 |
http://arxiv.org/pdf/1707.07863v1.pdf | |
PWC | https://paperswithcode.com/paper/analyzing-first-person-stories-based-on |
Repo | https://github.com/alsoba13/LAP-Annotation-Tool |
Framework | none |
On Inductive Abilities of Latent Factor Models for Relational Learning
Title | On Inductive Abilities of Latent Factor Models for Relational Learning |
Authors | Théo Trouillon, Éric Gaussier, Christopher R. Dance, Guillaume Bouchard |
Abstract | Latent factor models are increasingly popular for modeling multi-relational knowledge graphs. By their vectorial nature, it is not only hard to interpret why this class of models works so well, but also to understand where they fail and how they might be improved. We conduct an experimental survey of state-of-the-art models, not towards a purely comparative end, but as a means to get insight about their inductive abilities. To assess the strengths and weaknesses of each model, we create simple tasks that exhibit first, atomic properties of binary relations, and then, common inter-relational inference through synthetic genealogies. Based on these experimental results, we propose new research directions to improve on existing models. |
Tasks | Knowledge Graphs, Relational Reasoning |
Published | 2017-09-17 |
URL | http://arxiv.org/abs/1709.05666v1 |
http://arxiv.org/pdf/1709.05666v1.pdf | |
PWC | https://paperswithcode.com/paper/on-inductive-abilities-of-latent-factor |
Repo | https://github.com/ttrouill/induction_experiments |
Framework | none |
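For readers unfamiliar with the model class the survey studies, the sketch below scores a single knowledge-graph triple with a DistMult-style trilinear product. It is one representative latent factor model, not the survey's full set of models.

```python
import numpy as np

def distmult_score(e_subject, w_relation, e_object):
    """Score of a triple (s, r, o) under a DistMult-style latent factor model:
    the trilinear product of subject, relation and object embedding vectors."""
    return float(np.sum(e_subject * w_relation * e_object))

# Example: higher score means the model believes the triple is more plausible.
score = distmult_score(np.array([0.1, 0.9]), np.array([1.0, 0.5]), np.array([0.2, 0.8]))
```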
Non-Stationary Spectral Kernels
Title | Non-Stationary Spectral Kernels |
Authors | Sami Remes, Markus Heinonen, Samuel Kaski |
Abstract | We propose non-stationary spectral kernels for Gaussian process regression. We propose to model the spectral density of a non-stationary kernel function as a mixture of input-dependent Gaussian process frequency density surfaces. We solve the generalised Fourier transform with such a model, and present a family of non-stationary and non-monotonic kernels that can learn input-dependent and potentially long-range, non-monotonic covariances between inputs. We derive efficient inference using model whitening and marginalized posterior, and show with case studies that these kernels are necessary when modelling even rather simple time series, image or geospatial data with non-stationary characteristics. |
Tasks | Time Series |
Published | 2017-05-24 |
URL | http://arxiv.org/abs/1705.08736v1 |
http://arxiv.org/pdf/1705.08736v1.pdf | |
PWC | https://paperswithcode.com/paper/non-stationary-spectral-kernels |
Repo | https://github.com/sremes/nonstationary-spectral-kernels |
Framework | none |
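As background for the abstract, the stationary single-component spectral mixture kernel below is the baseline that the paper generalises by letting the weight, frequency, and length-scale become input-dependent Gaussian process surfaces. The parameter values are arbitrary and the paper's non-stationary kernels are not reproduced here.

```python
import numpy as np

def spectral_mixture_kernel(x1, x2, w=1.0, mu=0.5, sigma=0.2):
    """Stationary, single-component spectral mixture kernel
    k(tau) = w * exp(-2 * pi^2 * tau^2 * sigma^2) * cos(2 * pi * mu * tau),
    with tau = x1 - x2. Shown only as the stationary starting point."""
    tau = x1 - x2
    return w * np.exp(-2.0 * np.pi**2 * tau**2 * sigma**2) * np.cos(2.0 * np.pi * mu * tau)
```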
ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching
Title | ALICE: Towards Understanding Adversarial Learning for Joint Distribution Matching |
Authors | Chunyuan Li, Hao Liu, Changyou Chen, Yunchen Pu, Liqun Chen, Ricardo Henao, Lawrence Carin |
Abstract | We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching. Within a framework of conditional entropy, we propose both adversarial and non-adversarial approaches to learn desirable matched joint distributions for unsupervised and supervised tasks. We unify a broad family of adversarial models as joint distribution matching problems. Our approach stabilizes learning of unsupervised bidirectional adversarial learning methods. Further, we introduce an extension for semi-supervised learning tasks. Theoretical results are validated in synthetic data and real-world applications. |
Tasks | |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01215v2 |
http://arxiv.org/pdf/1709.01215v2.pdf | |
PWC | https://paperswithcode.com/paper/alice-towards-understanding-adversarial |
Repo | https://github.com/zhenxuan00/graphical-gan |
Framework | tf |
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Title | Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data |
Authors | Wei-Ning Hsu, Yu Zhang, James Glass |
Abstract | We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent and sequence-independent priors on different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks. |
Tasks | Speaker Verification, Speech Recognition |
Published | 2017-09-22 |
URL | http://arxiv.org/abs/1709.07902v1 |
http://arxiv.org/pdf/1709.07902v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-disentangled-and |
Repo | https://github.com/wnhsu/ScalableFHVAE |
Framework | tf |
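The latent structure described in the abstract, a sequence-level latent with a sequence-dependent prior and per-segment latents with a shared prior, can be sketched as below. The dimensions, names, and prior scale are assumptions, and the decoder that maps latents back to observations is omitted.

```python
import numpy as np

def sample_factorized_latents(mu2, num_segments, d1=16, rng=np.random.default_rng(0)):
    """mu2: 1-D NumPy array, the sequence-specific prior mean for the
    sequence-level latent z2. Each of the num_segments segments gets its own
    z1 from a shared standard-normal (sequence-independent) prior."""
    z2 = mu2 + 0.5 * rng.standard_normal(mu2.shape)   # sequence-dependent prior
    z1 = rng.standard_normal((num_segments, d1))      # sequence-independent prior
    return z1, z2
```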
CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication
Title | CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication |
Authors | Jin-Hwa Kim, Nikita Kitaev, Xinlei Chen, Marcus Rohrbach, Byoung-Tak Zhang, Yuandong Tian, Dhruv Batra, Devi Parikh |
Abstract | In this work, we propose a goal-driven collaborative task that combines language, perception, and action. Specifically, we develop a Collaborative image-Drawing game between two agents, called CoDraw. Our game is grounded in a virtual world that contains movable clip art objects. The game involves two players: a Teller and a Drawer. The Teller sees an abstract scene containing multiple clip art pieces in a semantically meaningful configuration, while the Drawer tries to reconstruct the scene on an empty canvas using available clip art pieces. The two players communicate with each other using natural language. We collect the CoDraw dataset of ~10K dialogs consisting of ~138K messages exchanged between human players. We define protocols and metrics to evaluate learned agents in this testbed, highlighting the need for a novel “crosstalk” evaluation condition which pairs agents trained independently on disjoint subsets of the training data. We present models for our task and benchmark them using both fully automated evaluation and by having them play the game live with humans. |
Tasks | Imitation Learning |
Published | 2017-12-15 |
URL | https://arxiv.org/abs/1712.05558v3 |
https://arxiv.org/pdf/1712.05558v3.pdf | |
PWC | https://paperswithcode.com/paper/codraw-collaborative-drawing-as-a-testbed-for |
Repo | https://github.com/facebookresearch/codraw-models |
Framework | pytorch |
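The "crosstalk" condition mentioned in the abstract pairs agents that never trained together, so evaluation cannot reward a private code co-learned by a specific Teller–Drawer pair. A minimal sketch of that pairing rule follows; the dictionary field names are assumptions, not the repository's data structures.

```python
def crosstalk_pairs(tellers, drawers):
    """Pair each Teller only with Drawers trained on a different data split,
    so the pair cannot rely on a shared, co-trained private protocol.
    Agents are represented here as dicts with a 'split' field (an assumption)."""
    return [(t, d) for t in tellers for d in drawers if t["split"] != d["split"]]

# Example: a Teller trained on split "A" is scored only against the Drawer from split "B".
pairs = crosstalk_pairs([{"name": "teller_A", "split": "A"}],
                        [{"name": "drawer_A", "split": "A"}, {"name": "drawer_B", "split": "B"}])
```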
THAP: A Matlab Toolkit for Learning with Hawkes Processes
Title | THAP: A Matlab Toolkit for Learning with Hawkes Processes |
Authors | Hongteng Xu, Hongyuan Zha |
Abstract | As a powerful tool for asynchronous event sequence analysis, point processes have been studied for a long time and have achieved numerous successes in different fields. Among various point process models, the Hawkes process and its variants have attracted many researchers in statistics and computer science in recent years because they explicitly and quantitatively capture the self- and mutually-triggering patterns between different events in complicated sequences, and are broadly applicable to many practical problems. In this paper, we describe an open-source toolkit implementing many learning algorithms and analysis tools for the Hawkes process model and its variants. Our toolkit systematically summarizes recent state-of-the-art algorithms as well as most classic algorithms for Hawkes processes, which is beneficial for both academic education and research. Source code can be downloaded from https://github.com/HongtengXu/Hawkes-Process-Toolkit. |
Tasks | Point Processes |
Published | 2017-08-28 |
URL | http://arxiv.org/abs/1708.09252v1 |
http://arxiv.org/pdf/1708.09252v1.pdf | |
PWC | https://paperswithcode.com/paper/thap-a-matlab-toolkit-for-learning-with |
Repo | https://github.com/HongtengXu/Hawkes-Process-Toolkit |
Framework | none |
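THAP itself is a MATLAB toolkit; to illustrate the model family it targets, here is the standard univariate Hawkes conditional intensity with an exponential excitation kernel, written in Python. This is not code from the toolkit and the parameter values are arbitrary.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    of a univariate Hawkes process: past events self-excite future arrivals."""
    past = np.asarray(event_times)
    past = past[past < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()
```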
Failures of Gradient-Based Deep Learning
Title | Failures of Gradient-Based Deep Learning |
Authors | Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah |
Abstract | In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming the state of the art. However, it is important, for both theoreticians and practitioners, to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. We describe four types of simple problems for which the gradient-based algorithms commonly used in deep learning either fail or suffer from significant difficulties. We illustrate the failures through practical experiments and provide theoretical insights explaining their source and how they might be remedied. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.07950v2 |
http://arxiv.org/pdf/1703.07950v2.pdf | |
PWC | https://paperswithcode.com/paper/failures-of-gradient-based-deep-learning |
Repo | https://github.com/shakedshammah/failures_of_DL |
Framework | tf |
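One of the failure cases the paper analyses is learning parities of random bits, where the gradient carries almost no usable signal. The snippet below only generates such a dataset so the phenomenon can be reproduced empirically; the hidden subset size and sampling are illustrative choices, not the paper's exact experimental setup.

```python
import numpy as np

def parity_dataset(n, d, rng=np.random.default_rng(0)):
    """x in {-1, +1}^d, y = parity (product) of a hidden subset of coordinates.
    Training a standard network on (x, y) with gradient methods is one of the
    hard cases discussed in the paper; this helper only builds the data."""
    x = rng.choice([-1.0, 1.0], size=(n, d))
    support = rng.choice(d, size=d // 2, replace=False)   # hidden parity support
    y = np.prod(x[:, support], axis=1)
    return x, y
```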
Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition
Title | Scalable Gaussian Processes with Billions of Inducing Inputs via Tensor Train Decomposition |
Authors | Pavel Izmailov, Alexander Novikov, Dmitry Kropotov |
Abstract | We propose a method (TT-GP) for approximate inference in Gaussian Process (GP) models. We build on previous scalable GP research including stochastic variational inference based on inducing inputs, kernel interpolation, and structure exploiting algebra. The key idea of our method is to use Tensor Train decomposition for variational parameters, which allows us to train GPs with billions of inducing inputs and achieve state-of-the-art results on several benchmarks. Further, our approach allows for training kernels based on deep neural networks without any modifications to the underlying GP model. A neural network learns a multidimensional embedding for the data, which is used by the GP to make the final prediction. We train GP and neural network parameters end-to-end without pretraining, through maximization of GP marginal likelihood. We show the efficiency of the proposed approach on several regression and classification benchmark datasets including MNIST, CIFAR-10, and Airline. |
Tasks | Gaussian Processes |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07324v2 |
http://arxiv.org/pdf/1710.07324v2.pdf | |
PWC | https://paperswithcode.com/paper/scalable-gaussian-processes-with-billions-of |
Repo | https://github.com/izmailovpavel/TTGP |
Framework | tf |
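The key compression idea, representing variational parameters over an exponentially large grid of inducing inputs with a tensor-train factorisation, is easiest to see in the degenerate rank-1 case below. Real TT-GP uses higher TT-ranks and TensorFlow, so this is only a toy illustration of why the grid never has to be stored explicitly.

```python
import numpy as np

def tt_rank1_grid_mean(cores):
    """A rank-1 tensor train is just an outer product of small per-dimension
    cores: e.g. three cores of length 100 implicitly define a mean vector over
    a 100^3 grid of inducing inputs. It is materialised here only to show the
    resulting shape; a TT-based method would keep it factorised."""
    mean = np.asarray(cores[0])
    for core in cores[1:]:
        mean = np.multiply.outer(mean, np.asarray(core))
    return mean  # shape = (len(cores[0]), ..., len(cores[-1]))
```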
Option Pricing and Hedging for Discrete Time Autoregressive Hidden Markov Model
Title | Option Pricing and Hedging for Discrete Time Autoregressive Hidden Markov Model |
Authors | Massimo Caccia, Bruno Rémillard |
Abstract | In this paper we solve the discrete time mean-variance hedging problem when asset returns follow a multivariate autoregressive hidden Markov model. Time dependent volatility and serial dependence are well established properties of financial time series and our model covers both. To illustrate the relevance of our proposed methodology, we first compare the proposed model with the well-known hidden Markov model via likelihood ratio tests and a novel goodness-of-fit test on the S&P 500 daily returns. Second, we present out-of-sample hedging results on S&P 500 vanilla options as well as a trading strategy based on theoretical prices, which we compare to simpler models including the classical Black-Scholes delta-hedging approach. |
Tasks | Time Series |
Published | 2017-07-07 |
URL | http://arxiv.org/abs/1707.02019v1 |
http://arxiv.org/pdf/1707.02019v1.pdf | |
PWC | https://paperswithcode.com/paper/option-pricing-and-hedging-for-discrete-time |
Repo | https://github.com/optimass/Optimal_hedging_ARHMM |
Framework | none |
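A hedged, univariate toy version of the return dynamics named in the abstract, a regime-switching autoregression driven by a hidden Markov chain, is sketched below. The paper's model is multivariate and is estimated from data rather than simulated, so treat this only as an illustration of the model class.

```python
import numpy as np

def simulate_arhmm_returns(T, P, mu, phi, sigma, rng=np.random.default_rng(0)):
    """Simulate r_t = mu[s_t] + phi[s_t] * r_{t-1} + sigma[s_t] * eps_t, where the
    regime s_t follows a Markov chain with transition matrix P (rows sum to 1).
    Univariate toy parameterisation, not the paper's specification."""
    n_regimes = len(mu)
    s, r = 0, 0.0
    returns = np.empty(T)
    for t in range(T):
        s = rng.choice(n_regimes, p=P[s])                      # hidden regime switch
        r = mu[s] + phi[s] * r + sigma[s] * rng.standard_normal()  # regime-dependent AR(1)
        returns[t] = r
    return returns
```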
No Fuss Distance Metric Learning using Proxies
Title | No Fuss Distance Metric Learning using Proxies |
Authors | Yair Movshovitz-Attias, Alexander Toshev, Thomas K. Leung, Sergey Ioffe, Saurabh Singh |
Abstract | We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, for this problem supervision is expressed in the form of sets of points that follow an ordinal relationship – an anchor point $x$ is similar to a set of positive points $Y$, and dissimilar to a set of negative points $Z$, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size, hard or semi-hard triplet mining, etc. Even with these tricks, the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points which are learned as well. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy loss improves on state-of-the-art results for three standard zero-shot learning datasets by up to 15 percentage points, while converging three times as fast as other triplet-based losses. |
Tasks | Metric Learning, Semantic Similarity, Semantic Textual Similarity, Zero-Shot Learning |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07464v3 |
http://arxiv.org/pdf/1703.07464v3.pdf | |
PWC | https://paperswithcode.com/paper/no-fuss-distance-metric-learning-using |
Repo | https://github.com/Confusezius/Deep-Metric-Learning-Baselines |
Framework | pytorch |
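The core trick, replacing mined data-point triplets with learned class proxies, can be sketched as a softmax over distances to proxies. In the sketch below, the L2 normalisation and squared Euclidean distance are assumptions rather than the paper's exact formulation; the proxies would be trained jointly with the embedding network.

```python
import numpy as np

def proxy_loss(anchor, proxies, label):
    """Pull the anchor embedding toward its own class proxy and push it away
    from every other class proxy, so no triplet mining over raw data points is
    needed. anchor: (D,), proxies: (C, D), label: int class index."""
    anchor = anchor / np.linalg.norm(anchor)
    proxies = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    d = ((anchor - proxies) ** 2).sum(axis=1)        # squared distance to each class proxy
    negatives = np.delete(d, label)
    # -log( exp(-d_pos) / sum_neg exp(-d_neg) ) = d_pos + log sum_neg exp(-d_neg)
    return float(d[label] + np.log(np.exp(-negatives).sum()))
```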