February 1, 2020

2800 words · 14 min read

Paper Group AWR 346

Harry Potter and the Action Prediction Challenge from Natural Language

Title Harry Potter and the Action Prediction Challenge from Natural Language
Authors David Vilares, Carlos Gómez-Rodríguez
Abstract We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case study, we consider the world of the Harry Potter fantasy novels and infer what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. ‘Alohomora’ to open a door) and denote a response to the environment. This idea is used to automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We then evaluate different baselines. Among the tested models, an LSTM-based approach obtains the best performance for frequent actions and large scene descriptions, while approaches such as logistic regression behave well on infrequent actions.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11037v1
PDF https://arxiv.org/pdf/1905.11037v1.pdf
PWC https://paperswithcode.com/paper/harry-potter-and-the-action-prediction
Repo https://github.com/aghie/hpac
Framework tf
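
The abstract’s LSTM baseline maps a scene snippet to one of 85 spell classes. Here is a minimal sketch of that idea; the repo uses TensorFlow, but this illustration is in PyTorch, and all sizes (vocabulary, embedding and hidden dimensions) are placeholder assumptions, not the paper’s settings.

```python
import torch
import torch.nn as nn

class SpellPredictor(nn.Module):
    """Minimal LSTM baseline: read a scene snippet, predict the next spell."""
    def __init__(self, vocab_size, num_actions=85, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_actions)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded scene description
        emb = self.embed(token_ids)
        _, (h_n, _) = self.lstm(emb)
        return self.head(h_n[-1])  # (batch, num_actions) spell logits

model = SpellPredictor(vocab_size=20000)
logits = model(torch.randint(1, 20000, (4, 50)))  # 4 snippets of 50 tokens
```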

Multi-Modal Fusion for End-to-End RGB-T Tracking

Title Multi-Modal Fusion for End-to-End RGB-T Tracking
Authors Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan
Abstract We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e., the feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on the VOT-RGBT2019 and RGBT210 datasets, evaluating each type of modality fusion on each model component. The results show that the proposed fusion mechanisms improve the performance of the single-modality counterparts. We obtain our best results when fusing at the feature-level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on the VOT-RGBT2019 dataset. With this fusion mechanism we achieve state-of-the-art performance on the RGBT210 dataset.
Tasks Image-to-Image Translation, Rgb-T Tracking
Published 2019-08-30
URL https://arxiv.org/abs/1908.11714v1
PDF https://arxiv.org/pdf/1908.11714v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-fusion-for-end-to-end-rgb-t
Repo https://github.com/zhanglichao/end2end_rgbt_tracking
Framework pytorch
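
Of the fusion levels the abstract lists, feature-level fusion is where the authors report their best results. Below is a minimal sketch of the general idea, assuming channel concatenation followed by a 1x1 convolution; the paper’s exact fusion modules may differ.

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Fuse RGB and TIR feature maps: concatenate channels, mix with 1x1 conv."""
    def __init__(self, channels=256):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_rgb, feat_tir):
        # both inputs: (batch, channels, H, W) from modality-specific backbones
        return self.fuse(torch.cat([feat_rgb, feat_tir], dim=1))

fusion = FeatureLevelFusion()
fused = fusion(torch.randn(1, 256, 18, 18), torch.randn(1, 256, 18, 18))
```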

ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning

Title ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning
Authors Xudong Sun, Jiali Lin, Bernd Bischl
Abstract A machine learning pipeline potentially consists of several stages of operations, such as data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which can become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete conditional hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm which combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state-of-the-art methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search.
Tasks Feature Engineering
Published 2019-04-10
URL http://arxiv.org/abs/1904.05381v1
PDF http://arxiv.org/pdf/1904.05381v1.pdf
PWC https://paperswithcode.com/paper/reinbo-machine-learning-pipeline-search-and
Repo https://github.com/smilesun/reinbo
Framework none
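
The “hierarchical conditional hyper-parameter space” can be pictured as a nested search space in which an unselected operation’s hyper-parameters become irrelevant. A toy sketch follows, with hypothetical stages and parameter grids that are not ReinBo’s actual space; in ReinBo an RL agent would pick the operations and Bayesian Optimization would tune the surviving hyper-parameters, whereas this sketch just samples uniformly.

```python
import random

# Toy hierarchical conditional pipeline space: each stage picks one operation,
# and only the chosen operation's hyper-parameters remain relevant.
SPACE = {
    "preprocess": {"none": {}, "pca": {"n_components": [2, 5, 10]}},
    "model": {
        "svm": {"C": [0.1, 1.0, 10.0]},
        "random_forest": {"n_estimators": [50, 100, 200]},
    },
}

def sample_pipeline(space):
    config = {}
    for stage, ops in space.items():
        op = random.choice(list(ops))  # discrete choice (the RL agent's job)
        # conditional hyper-parameters (Bayesian Optimization's job)
        params = {k: random.choice(v) for k, v in ops[op].items()}
        config[stage] = (op, params)
    return config

print(sample_pipeline(SPACE))
```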

Real-world attack on MTCNN face detection system

Title Real-world attack on MTCNN face detection system
Authors Edgar Kaziakhmedov, Klim Kireev, Grigorii Melnikov, Mikhail Pautov, Aleksandr Petiushko
Abstract Recent studies have proved that deep learning approaches achieve remarkable results on the face detection task. On the other hand, these advances give rise to a new problem associated with the security of deep convolutional neural network models, unveiling potential risks of DCNN-based applications. Even minor input changes in the digital domain can result in the network being fooled. It has also been shown that some deep learning-based face detectors are prone to adversarial attacks not only in the digital domain but also in the real world. In this paper, we investigate the security of the well-known cascaded CNN face detection system MTCNN and introduce an easily reproducible and robust way to attack it. We propose different face attributes printed on an ordinary black-and-white printer and attached either to a medical face mask or to the face directly. Our approach is capable of breaking the MTCNN detector in a real-world scenario.
Tasks Face Detection
Published 2019-10-14
URL https://arxiv.org/abs/1910.06261v1
PDF https://arxiv.org/pdf/1910.06261v1.pdf
PWC https://paperswithcode.com/paper/real-world-attack-on-mtcnn-face-detection
Repo https://github.com/edosedgar/mtcnnattack
Framework tf
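
A generic gradient-based patch attack conveys the flavor of the approach: optimize the pixels inside a mask so a detector’s face confidence collapses. The sketch below assumes a hypothetical differentiable `detector` module; the actual attack targets MTCNN’s cascade and adds real-world constraints (printability, placement on a mask or face) not modeled here.

```python
import torch

def attack_patch(detector, face_batch, mask, steps=200, lr=0.01):
    """Sketch of a patch attack: optimize pixels inside `mask` so the
    detector's face confidence drops. `detector` is assumed to be a
    differentiable module returning one face probability per image."""
    patch = torch.rand_like(face_batch[0], requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        # paste the (clamped) patch into every face image
        patched = face_batch * (1 - mask) + patch.clamp(0, 1) * mask
        loss = detector(patched).mean()  # push face confidence toward zero
        opt.zero_grad()
        loss.backward()
        opt.step()
    return patch.detach().clamp(0, 1)
```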

Interpretable machine learning: definitions, methods, and applications

Title Interpretable machine learning: definitions, methods, and applications
Authors W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, Bin Yu
Abstract Official code for using / reproducing ACD (ICLR 2019) from the paper “Hierarchical interpretations for neural network predictions” https://arxiv.org/abs/1806.05337
Tasks Feature Importance, Interpretable Machine Learning
Published 2019-01-14
URL http://arxiv.org/abs/1901.04592v1
PDF http://arxiv.org/pdf/1901.04592v1.pdf
PWC https://paperswithcode.com/paper/interpretable-machine-learning-definitions
Repo https://github.com/sumbose/iRF
Framework none

DocBERT: BERT for Document Classification

Title DocBERT: BERT for Document Classification
Authors Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin
Abstract We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content categories, documents can often be longer than typical BERT input, and documents often have multiple labels. Nevertheless, we show that a straightforward classification model using BERT is able to achieve the state of the art across four popular datasets. To address the computational expense associated with BERT inference, we distill knowledge from BERT-large to small bidirectional LSTMs, reaching BERT-base parity on multiple datasets using 30x fewer parameters. The primary contribution of our paper is improved baselines that can provide the foundation for future work.
Tasks Document Classification, Sentiment Analysis
Published 2019-04-17
URL https://arxiv.org/abs/1904.08398v3
PDF https://arxiv.org/pdf/1904.08398v3.pdf
PWC https://paperswithcode.com/paper/docbert-bert-for-document-classification
Repo https://github.com/castorini/hedwig
Framework pytorch
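
The distillation step can follow the standard soft-label recipe: match the student LSTM’s softened output distribution to BERT’s while still fitting the hard labels. This is a common formulation given as a sketch, not necessarily the paper’s exact objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label distillation sketch: blend KL to the teacher's softened
    distribution with ordinary cross-entropy on the hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```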

Spike-based primitives for graph algorithms

Title Spike-based primitives for graph algorithms
Authors Kathleen E. Hamilton, Tiffany M. Mintz, Catherine D. Schuman
Abstract In this paper we consider graph algorithms and graphical analysis as a new application for neuromorphic computing platforms. We demonstrate how the nonlinear dynamics of spiking neurons can be used to implement low-level graph operations. Our results are hardware agnostic, and we present multiple versions of routines that can utilize static synapses or require synapse plasticity.
Tasks
Published 2019-03-25
URL http://arxiv.org/abs/1903.10574v1
PDF http://arxiv.org/pdf/1903.10574v1.pdf
PWC https://paperswithcode.com/paper/spike-based-primitives-for-graph-algorithms
Repo https://github.com/abasak24/ece594Neuromorphic
Framework none
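
To see how spike dynamics can implement a low-level graph operation, consider a toy wavefront traversal: a neuron fires once, when a neighbor fired on the previous tick, so first-spike times equal hop distances. This NumPy illustration is hypothetical and not one of the paper’s hardware routines.

```python
import numpy as np

def spiking_bfs(adj, source):
    """Toy spiking wavefront: each vertex is an integrate-and-fire neuron
    that spikes once when any neighbor spiked on the previous tick. The
    tick of a neuron's first spike equals its hop distance from the source."""
    n = len(adj)
    fired = np.full(n, -1)   # first-spike time per neuron, -1 = never fired
    fired[source] = 0
    frontier = {source}
    t = 0
    while frontier:
        t += 1
        frontier = {v for u in frontier for v in np.nonzero(adj[u])[0]
                    if fired[v] < 0}
        for v in frontier:
            fired[v] = t
    return fired  # -1 marks unreachable vertices

adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(spiking_bfs(adj, 0))  # [0 1 2 3]
```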

DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences

Title DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences
Authors Amirreza Farnoosh, Behnaz Rezaei, Sarah Ostadabbas
Abstract This paper presents a novel unsupervised probabilistic model estimation of visual background in video sequences using a variational autoencoder framework. Due to the redundant nature of the backgrounds in surveillance videos, visual information of the background can be compressed into a low-dimensional subspace in the encoder part of the variational autoencoder, while the highly variant information of its moving foreground gets filtered throughout its encoding-decoding process. Our deep probabilistic background model (DeepPBM) estimation approach is enabled by the power of deep neural networks in learning compressed representations of video frames and reconstructing them back to the original domain. We evaluated the performance of our DeepPBM in background subtraction on 9 surveillance videos from the background model challenge (BMC2012) dataset, and compared it with a standard subspace learning technique, robust principal component analysis (RPCA), which similarly estimates a deterministic low-dimensional representation of the background in videos and is widely used for this application. Our method outperforms RPCA on the BMC2012 dataset by 23% on average in F-measure score, while background subtraction with the trained model runs more than 10 times faster.
Tasks
Published 2019-02-03
URL http://arxiv.org/abs/1902.00820v1
PDF http://arxiv.org/pdf/1902.00820v1.pdf
PWC https://paperswithcode.com/paper/deeppbm-deep-probabilistic-background-model
Repo https://github.com/ostadabbas/DeepPBM
Framework pytorch
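
A minimal VAE sketch of the DeepPBM idea: the low-dimensional bottleneck retains the redundant background while high-variance foreground motion is filtered out during encoding-decoding. Architecture sizes here are placeholder assumptions; the paper uses a deeper model on video frames.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Sketch of DeepPBM: compress frames into a low-dimensional code; the
    decoder's reconstruction keeps the redundant background and filters out
    high-variance foreground motion."""
    def __init__(self, frame_dim=64 * 64, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_dim), nn.Sigmoid())

    def forward(self, x):
        # x: (batch, frame_dim) flattened grayscale frames in [0, 1]
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar  # reconstruction ~ background estimate
```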

Constrained domain adaptation for segmentation

Title Constrained domain adaptation for segmentation
Authors Mathilde Bateson, Jose Dolz, Hoel Kervadec, Hervé Lombaert, Ismail Ben Ayed
Abstract We propose to adapt segmentation networks with a constrained formulation, which embeds domain-invariant prior knowledge about the segmentation regions. Such knowledge may take the form of simple anatomical information, e.g., structure size or shape, estimated from source samples or known a priori. Our method imposes domain-invariant inequality constraints on the network outputs of unlabeled target samples. It implicitly matches prediction statistics between target and source domains with permitted uncertainty of prior knowledge. We address our constrained problem with a differentiable penalty, fully suited for standard stochastic gradient descent approaches, removing the need for computationally expensive Lagrangian optimization with dual projections. Unlike current two-step adversarial training, our formulation is based on a single loss in a single network, which simplifies adaptation by avoiding extra adversarial steps, while improving convergence and quality of training. The comparison of our approach with state-of-the-art adversarial methods reveals substantially better performance on the challenging task of adapting spine segmentation across different MRI modalities. Our results also show a robustness to imprecision of size priors, approaching the accuracy of a fully supervised model trained directly in the target domain. Our method can be readily used for various constraints and segmentation problems.
Tasks Domain Adaptation
Published 2019-08-08
URL https://arxiv.org/abs/1908.02996v1
PDF https://arxiv.org/pdf/1908.02996v1.pdf
PWC https://paperswithcode.com/paper/constrained-domain-adaptation-for
Repo https://github.com/CDAMICCAI2019/CDA
Framework pytorch
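
The differentiable penalty can be illustrated with a size prior: if the predicted structure size must lie in [lower, upper], violations are penalized quadratically and the loss plugs straight into SGD. A minimal sketch under those assumptions, not the paper’s exact formulation:

```python
import torch

def size_penalty(probs, lower, upper):
    """Differentiable penalty for an inequality size constraint: the
    predicted region size (sum of per-pixel foreground probabilities)
    must fall inside [lower, upper]; violations are penalized quadratically.
    probs: (batch, H, W) softmax/sigmoid probabilities for the structure."""
    size = probs.sum(dim=(1, 2))        # (batch,) predicted size in pixels
    under = torch.relu(lower - size)    # positive only when size < lower
    over = torch.relu(size - upper)     # positive only when size > upper
    return (under ** 2 + over ** 2).mean()
```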

Cross-task weakly supervised learning from instructional videos

Title Cross-task weakly supervised learning from instructional videos
Authors Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic
Abstract In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: ‘pour egg’ should be trained jointly with other tasks involving ‘pour’ and ‘egg’. We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Past data does not permit a systematic study of sharing, and so we also gather a new dataset, CrossTask, aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level, and that our component model can parse previously unseen tasks by virtue of its compositionality.
Tasks
Published 2019-03-19
URL http://arxiv.org/abs/1903.08225v2
PDF http://arxiv.org/pdf/1903.08225v2.pdf
PWC https://paperswithcode.com/paper/cross-task-weakly-supervised-learning-from
Repo https://github.com/DmZhukov/CrossTask
Framework pytorch
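
The component model can be sketched as follows: a step such as ‘pour egg’ is scored by composing classifiers for its shared components (‘pour’, ‘egg’), so parameters learned on one task transfer to steps of other tasks. Shapes and names below are hypothetical.

```python
import torch
import torch.nn as nn

class ComponentStepModel(nn.Module):
    """Sketch of component sharing: each step is scored by summing linear
    classifiers for its components, so 'pour egg' reuses parameters also
    trained by other steps involving 'pour' or 'egg'."""
    def __init__(self, num_components, feat_dim):
        super().__init__()
        self.component_w = nn.Parameter(0.01 * torch.randn(num_components, feat_dim))

    def forward(self, features, step_components):
        # features: (T, feat_dim) per-frame video features
        # step_components: list of component ids making up the step
        w = self.component_w[step_components].sum(dim=0)  # compose the step
        return features @ w  # (T,) per-frame score for this step

model = ComponentStepModel(num_components=100, feat_dim=64)
scores = model(torch.randn(30, 64), step_components=[4, 17])  # e.g. 'pour'+'egg'
```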

Deep Random Splines for Point Process Intensity Estimation of Neural Population Data

Title Deep Random Splines for Point Process Intensity Estimation of Neural Population Data
Authors Gabriel Loaiza-Ganem, Sean M. Perkins, Karen E. Schroeder, Mark M. Churchland, John P. Cunningham
Abstract Gaussian processes are the leading class of distributions on random functions, but they suffer from well known issues including difficulty scaling and inflexibility with respect to certain shape constraints (such as nonnegativity). Here we propose Deep Random Splines, a flexible class of random functions obtained by transforming Gaussian noise through a deep neural network whose outputs are the parameters of a spline. Unlike Gaussian processes, Deep Random Splines allow us to readily enforce shape constraints while inheriting the richness and tractability of deep generative models. We also present an observational model for point process data which uses Deep Random Splines to model the intensity function of each point process, and apply it to neural population data to obtain a low-dimensional representation of spiking activity. Inference is performed via a variational autoencoder that uses a novel recurrent encoder architecture capable of handling multiple point processes as input. We use a newly collected dataset where a primate completes a pedaling task, and observe better dimensionality reduction with our model than with competing alternatives.
Tasks Dimensionality Reduction, Gaussian Processes, Point Processes
Published 2019-03-06
URL https://arxiv.org/abs/1903.02610v6
PDF https://arxiv.org/pdf/1903.02610v6.pdf
PWC https://paperswithcode.com/paper/deep-random-splines-for-point-process
Repo https://github.com/gabloa/drs
Framework tf
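
The core construction passes Gaussian noise through a network whose outputs parameterize a nonnegative function used as a point-process intensity. The sketch below substitutes a piecewise-constant function for the paper’s splines and uses softplus to enforce nonnegativity; all sizes are assumptions.

```python
import torch
import torch.nn as nn

class DeepRandomIntensity(nn.Module):
    """Sketch of the Deep Random Splines idea: transform Gaussian noise
    through a network whose outputs parameterize a nonnegative function
    serving as a point-process intensity. Here a piecewise-constant
    function stands in for the paper's splines."""
    def __init__(self, noise_dim=8, num_bins=20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_bins))

    def sample_intensity(self, batch=1):
        eps = torch.randn(batch, self.net[0].in_features)  # Gaussian noise
        return nn.functional.softplus(self.net(eps))  # (batch, num_bins) >= 0

sampler = DeepRandomIntensity()
print(sampler.sample_intensity(batch=2).shape)  # torch.Size([2, 20])
```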

Mixed-curvature Variational Autoencoders

Title Mixed-curvature Variational Autoencoders
Authors Ondrej Skopek, Octavian-Eugen Ganea, Gary Bécigneul
Abstract Euclidean geometry has historically been the typical “workhorse” for machine learning applications due to its power and simplicity. However, it has recently been shown that geometric spaces with constant non-zero curvature improve representations and performance on a variety of data types and downstream tasks. Consequently, generative models like Variational Autoencoders (VAEs) have been successfully generalized to elliptical and hyperbolic latent spaces. While these approaches work well on data with particular kinds of biases, e.g., tree-like data for a hyperbolic VAE, there exists no generic approach unifying and leveraging all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant curvature Riemannian manifolds, where the per-component curvature is fixed or learnable. This generalizes the Euclidean VAE to curved latent spaces and recovers it when curvatures of all latent space components go to 0.
Tasks Latent Variable Models
Published 2019-11-19
URL https://arxiv.org/abs/1911.08411v2
PDF https://arxiv.org/pdf/1911.08411v2.pdf
PWC https://paperswithcode.com/paper/mixed-curvature-variational-autoencoders-1
Repo https://github.com/oskopek/mvae
Framework pytorch
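
Schematically, the latent space is a product of constant-curvature components, with the geometry of each component determined by the sign of its curvature. A sketch of the construction, with notation assumed rather than taken from the paper:

```latex
% Product latent space: each component is a constant-curvature manifold
% chosen by the sign of its (fixed or learnable) curvature c_i.
\mathcal{Z} = \prod_{i=1}^{k} \mathcal{M}_{c_i}^{n_i}, \qquad
\mathcal{M}_{c}^{n} =
\begin{cases}
\mathbb{H}_{c}^{n} & c < 0 \ \text{(hyperbolic)}\\
\mathbb{R}^{n}     & c = 0 \ \text{(Euclidean)}\\
\mathbb{S}_{c}^{n} & c > 0 \ \text{(spherical)}
\end{cases}
```

Letting every curvature go to 0 recovers the ordinary Euclidean VAE, as the abstract notes.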

MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks

Title MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks
Authors Zhulin Zhang, Dong Li, Jinhua Wu, Yunda Sun, Li Zhang
Abstract In this paper, we present a novel dataset named MVB (Multi View Baggage) for the baggage ReID task, which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset that contains 4519 baggage identities and 22660 annotated baggage images, as well as their surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, in order to capture the 3D information of the baggage surface as completely as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity: different bags can have very similar appearance, while the data is collected in two real airport environments whose imaging conditions vary significantly. Moreover, we propose a merged Siamese network as a baseline model and evaluate its performance. Experiments and a case study are conducted on MVB.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11366v1
PDF https://arxiv.org/pdf/1907.11366v1.pdf
PWC https://paperswithcode.com/paper/mvb-a-large-scale-dataset-for-baggage-re
Repo https://github.com/qq326823564/LSDNN
Framework pytorch
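
A merged Siamese baseline can be sketched as a shared encoder whose two embeddings are merged and scored by a small head. The input resolution and layer sizes below are placeholder assumptions, not the paper’s architecture.

```python
import torch
import torch.nn as nn

class MergedSiamese(nn.Module):
    """Sketch of a merged Siamese baseline: a shared encoder embeds both
    baggage images, the embeddings are merged, and a small head scores
    whether they show the same bag."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # assume 64x64 RGB crops for this sketch
        self.encoder = nn.Sequential(nn.Flatten(),
                                     nn.Linear(3 * 64 * 64, feat_dim), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, img_a, img_b):
        fa, fb = self.encoder(img_a), self.encoder(img_b)  # shared weights
        return self.head(torch.cat([fa, fb], dim=1))  # same-identity logit

model = MergedSiamese()
logit = model(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```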

Importance Weighted Hierarchical Variational Inference

Title Importance Weighted Hierarchical Variational Inference
Authors Artem Sobolev, Dmitry Vetrov
Abstract Variational Inference is a powerful tool in the Bayesian modeling toolkit; however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log density in the case of hierarchical models (also known as latent variable models). We then give an upper bound on the Kullback-Leibler divergence and derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference, can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments.
Tasks Latent Variable Models
Published 2019-05-08
URL https://arxiv.org/abs/1905.03290v1
PDF https://arxiv.org/pdf/1905.03290v1.pdf
PWC https://paperswithcode.com/paper/190503290
Repo https://github.com/artsobolev/IWHVI
Framework tf
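
For intuition, here is the multi-sample lower bound from the Semi-Implicit Variational Inference special case the abstract mentions (the paper’s new contribution, the upper bound, is not reproduced here). For a hierarchical q(z) = E_psi[q(z|psi)], Jensen’s inequality gives log q(z) >= E log((1/K) sum_k q(z|psi_k)); `sample_psi` and `log_q_z_given_psi` are hypothetical callables supplied by the user.

```python
import torch

def semi_implicit_log_density_lower_bound(z, sample_psi, log_q_z_given_psi, K=64):
    """Monte Carlo sketch of the SIVI-style multi-sample lower bound on
    log q(z) for a hierarchical q(z) = E_psi[q(z|psi)]:
        log q(z) >= E_{psi_1..K ~ q(psi)} log((1/K) * sum_k q(z|psi_k)),
    which holds by Jensen's inequality."""
    psi = sample_psi(K)                    # (K, ...) samples from q(psi)
    log_terms = log_q_z_given_psi(z, psi)  # (K,) values of log q(z|psi_k)
    # log-mean-exp for numerical stability
    return torch.logsumexp(log_terms, dim=0) - torch.log(torch.tensor(float(K)))
```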

Low-Resource Response Generation with Template Prior

Title Low-Resource Response Generation with Template Prior
Authors Ze Yang, Wei Wu, Jian Yang, Can Xu, Zhoujun Li
Abstract We study open-domain response generation with limited message-response pairs. The problem exists in real-world applications but is less explored by existing work. Since the paired data alone is no longer enough to train a neural generation model, we consider leveraging large-scale unpaired data, which is much easier to obtain, and propose response generation with both paired and unpaired data. The generation model is defined by an encoder-decoder architecture with templates as prior, where the templates are estimated from the unpaired data as a neural hidden semi-Markov model. By this means, response generation learned from the small paired data can be aided by the semantic and syntactic knowledge in the large unpaired data. To balance the effect of the prior and the input message on response generation, we propose learning the whole generation model with an adversarial approach. Empirical studies on question response generation and sentiment response generation indicate that when only a few pairs are available, our model can significantly outperform several state-of-the-art response generation models in terms of both automatic and human evaluation.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.11968v3
PDF https://arxiv.org/pdf/1909.11968v3.pdf
PWC https://paperswithcode.com/paper/low-resource-response-generation-with
Repo https://github.com/TobeyYang/S2S_Temp
Framework pytorch