Paper Group AWR 346
Harry Potter and the Action Prediction Challenge from Natural Language
Title | Harry Potter and the Action Prediction Challenge from Natural Language |
Authors | David Vilares, Carlos Gómez-Rodríguez |
Abstract | We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case study, we consider the world of the Harry Potter fantasy novels and the task of inferring what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. ‘Alohomora’ to open a door) and denote a response to the environment. This idea is used to automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We then evaluate different baselines. Among the tested models, an LSTM-based approach obtains the best performance for frequent actions and large scene descriptions, but approaches such as logistic regression behave well on infrequent actions. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11037v1 |
https://arxiv.org/pdf/1905.11037v1.pdf | |
PWC | https://paperswithcode.com/paper/harry-potter-and-the-action-prediction |
Repo | https://github.com/aghie/hpac |
Framework | tf |
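
To make the LSTM baseline concrete, here is a minimal PyTorch sketch of a recurrent classifier mapping a tokenized scene description to scores over the 85 spell actions. All sizes (vocabulary, embedding, and hidden dimensions) are illustrative assumptions, and the authors' released code is in TensorFlow, so this is a sketch of the idea rather than their implementation.

```python
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Illustrative LSTM baseline: scene tokens -> scores over 85 actions."""
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128, n_actions=85):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_actions)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded scene description
        emb = self.embed(token_ids)
        _, (h_n, _) = self.lstm(emb)   # final hidden state summarizes the scene
        return self.out(h_n[-1])       # unnormalized scores over the 85 spells

model = ActionPredictor()
logits = model(torch.randint(1, 20000, (4, 50)))  # toy batch of 4 scenes
print(logits.shape)  # torch.Size([4, 85])
```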
Multi-Modal Fusion for End-to-End RGB-T Tracking
Title | Multi-Modal Fusion for End-to-End RGB-T Tracking |
Authors | Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan |
Abstract | We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on the VOT-RGBT2019 and RGBT210 datasets, evaluating each type of modality fusion on each model component. The results show that the proposed fusion mechanisms improve the performance of their single-modality counterparts. We obtain our best results when fusing at the feature level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on the VOT-RGBT2019 dataset. With this fusion mechanism we achieve state-of-the-art performance on the RGBT210 dataset. |
Tasks | Image-to-Image Translation, Rgb-T Tracking |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11714v1 |
https://arxiv.org/pdf/1908.11714v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-fusion-for-end-to-end-rgb-t |
Repo | https://github.com/zhanglichao/end2end_rgbt_tracking |
Framework | pytorch |
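
As a rough illustration of feature-level fusion, the sketch below concatenates RGB and TIR feature maps and mixes them with a 1x1 convolution before they would be passed to downstream components (IoU-Net, model predictor). The single-convolution "backbones" are stand-ins for real feature extractors; everything here is an assumption for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Fuse RGB and TIR feature maps by concatenation + 1x1 conv (sketch)."""
    def __init__(self, channels=256):
        super().__init__()
        # single convs stand in for real RGB / TIR backbones
        self.rgb_backbone = nn.Conv2d(3, channels, 3, padding=1)
        self.tir_backbone = nn.Conv2d(1, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb, tir):
        f_rgb = self.rgb_backbone(rgb)
        f_tir = self.tir_backbone(tir)
        # fused map would feed the IoU-Net / model predictor downstream
        return self.fuse(torch.cat([f_rgb, f_tir], dim=1))

fusion = FeatureLevelFusion()
out = fusion(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 256, 224, 224])
```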
ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning
Title | ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning |
Authors | Xudong Sun, Jiali Lin, Bernd Bischl |
Abstract | A machine learning pipeline potentially consists of several stages of operations, such as data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which can become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete conditional hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm which combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state-of-the-art methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search. |
Tasks | Feature Engineering |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05381v1 |
http://arxiv.org/pdf/1904.05381v1.pdf | |
PWC | https://paperswithcode.com/paper/reinbo-machine-learning-pipeline-search-and |
Repo | https://github.com/smilesun/reinbo |
Framework | none |
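
The hierarchical conditional space can be pictured as below: selecting an operation per stage (the discrete decision ReinBo makes with reinforcement learning) activates only that operation's hyper-parameters, which would then be tuned (with Bayesian Optimization in the paper). The stages, operations, and ranges here are made up for illustration, and random sampling stands in for the learned policy.

```python
import random

# Toy sketch of a hierarchical conditional hyper-parameter space: picking a
# pipeline activates only the hyper-parameters of the chosen operations.
# Stages, operations, and ranges are illustrative assumptions.
PIPELINE_SPACE = {
    "scaler":     {"none": {}, "standard": {}},
    "feature":    {"none": {}, "pca": {"n_components": (2, 20)}},
    "classifier": {"knn": {"n_neighbors": (1, 30)}, "svm": {"C": (1e-3, 1e3)}},
}

def sample_pipeline():
    """Random stand-in for the RL policy: pick one operation per stage."""
    choice = {stage: random.choice(list(ops)) for stage, ops in PIPELINE_SPACE.items()}
    # Only the hyper-parameters of the selected ops remain active (conditionality);
    # these are what Bayesian Optimization would tune in the paper.
    active = {stage: PIPELINE_SPACE[stage][op] for stage, op in choice.items()}
    return choice, active

choice, active = sample_pipeline()
print(choice)  # e.g. {'scaler': 'standard', 'feature': 'pca', 'classifier': 'svm'}
print(active)  # only the chosen ops' hyper-parameter ranges remain to be tuned
```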
Real-world attack on MTCNN face detection system
Title | Real-world attack on MTCNN face detection system |
Authors | Edgar Kaziakhmedov, Klim Kireev, Grigorii Melnikov, Mikhail Pautov, Aleksandr Petiushko |
Abstract | Recent studies proved that deep learning approaches achieve remarkable results on the face detection task. On the other hand, these advances gave rise to a new problem associated with the security of deep convolutional neural network models, unveiling potential risks of DCNN-based applications. Even minor input changes in the digital domain can result in the network being fooled. It was then shown that some deep learning-based face detectors are prone to adversarial attacks not only in the digital domain but also in the real world. In this paper, we investigate the security of the well-known cascade CNN face detection system MTCNN and introduce an easily reproducible and robust way to attack it. We propose different face attributes printed on an ordinary black-and-white printer and attached either to a medical face mask or to the face directly. Our approach is capable of breaking the MTCNN detector in a real-world scenario. |
Tasks | Face Detection |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06261v1 |
https://arxiv.org/pdf/1910.06261v1.pdf | |
PWC | https://paperswithcode.com/paper/real-world-attack-on-mtcnn-face-detection |
Repo | https://github.com/edosedgar/mtcnnattack |
Framework | tf |
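
The sketch below shows the generic adversarial-patch loop underlying attacks of this kind: optimize a printable patch so that a detector's confidence on the patched face drops. The `toy_detector` is a deliberate placeholder, not MTCNN's P-Net/R-Net/O-Net cascade, and the patch size and placement are assumptions.

```python
import torch

# Generic adversarial-patch loop (sketch). `toy_detector` is a placeholder
# for a detector's face-confidence output, NOT MTCNN's cascade; the patch
# size and placement are assumptions.
def toy_detector(img):
    return torch.sigmoid(img.mean())  # fake "face confidence" in (0, 1)

patch = torch.zeros(3, 32, 32, requires_grad=True)  # printable patch to optimize
opt = torch.optim.Adam([patch], lr=0.05)
face = torch.rand(3, 128, 128)                      # stand-in face image

for _ in range(100):
    attacked = face.clone()
    attacked[:, 48:80, 48:80] = patch.clamp(0, 1)   # paste patch onto the face
    loss = toy_detector(attacked)                   # minimize detection confidence
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))  # confidence after the attack
```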
Interpretable machine learning: definitions, methods, and applications
Title | Interpretable machine learning: definitions, methods, and applications |
Authors | W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, Bin Yu |
Abstract | Machine learning models have achieved great predictive success, and interpreting what a model has learned is receiving increasing attention, along with considerable confusion about what interpretability means. This paper defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, with three desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, where relevancy is judged relative to a human audience. Existing interpretation methods are categorized into model-based and post hoc categories, with subcategories including sparsity, modularity, and simulatability, and the framework is illustrated with real-world examples. |
Tasks | Feature Importance, Interpretable Machine Learning |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04592v1 |
http://arxiv.org/pdf/1901.04592v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-machine-learning-definitions |
Repo | https://github.com/sumbose/iRF |
Framework | none |
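
As one concrete instance of the post hoc methods the paper surveys, the snippet below computes permutation feature importance with scikit-learn; the dataset and model are arbitrary choices for illustration, not taken from the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Post hoc interpretation example (illustrative, not the paper's own code):
# permutation feature importance on an arbitrary dataset and model.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean.argsort()[::-1][:5])  # 5 most important features
```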
DocBERT: BERT for Document Classification
Title | DocBERT: BERT for Document Classification |
Authors | Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin |
Abstract | We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content categories, documents can often be longer than typical BERT input, and documents often have multiple labels. Nevertheless, we show that a straightforward classification model using BERT is able to achieve the state of the art across four popular datasets. To address the computational expense associated with BERT inference, we distill knowledge from BERT-large to small bidirectional LSTMs, reaching BERT-base parity on multiple datasets using 30x fewer parameters. The primary contribution of our paper is improved baselines that can provide the foundation for future work. |
Tasks | Document Classification, Sentiment Analysis |
Published | 2019-04-17 |
URL | https://arxiv.org/abs/1904.08398v3 |
https://arxiv.org/pdf/1904.08398v3.pdf | |
PWC | https://paperswithcode.com/paper/docbert-bert-for-document-classification |
Repo | https://github.com/castorini/hedwig |
Framework | pytorch |
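
A minimal Hugging Face sketch of the BERT-for-document-classification setup follows. It is illustrative rather than the authors' implementation (their code lives in the Hedwig repo); the classifier head here is randomly initialized and untrained, and the four labels are an assumed example.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative BERT document classifier. The classification head is randomly
# initialized (it would need fine-tuning); num_labels=4 is an assumption.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4)

doc = "Long document text ..."  # truncated to BERT's 512-token limit below
inputs = tok(doc, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # class probabilities over the 4 document labels
```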
Spike-based primitives for graph algorithms
Title | Spike-based primitives for graph algorithms |
Authors | Kathleen E. Hamilton, Tiffany M. Mintz, Catherine D. Schuman |
Abstract | In this paper we consider graph algorithms and graphical analysis as a new application for neuromorphic computing platforms. We demonstrate how the nonlinear dynamics of spiking neurons can be used to implement low-level graph operations. Our results are hardware agnostic, and we present multiple versions of routines that can utilize static synapses or require synapse plasticity. |
Tasks | |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10574v1 |
http://arxiv.org/pdf/1903.10574v1.pdf | |
PWC | https://paperswithcode.com/paper/spike-based-primitives-for-graph-algorithms |
Repo | https://github.com/abasak24/ece594Neuromorphic |
Framework | none |
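
One low-level graph operation is easy to picture: a spike wavefront started at a source vertex computes hop distances (breadth-first search). The toy simulation below mimics this with threshold-1 integrate-and-fire dynamics in plain Python; it is an illustration of the principle, not code for any neuromorphic platform.

```python
# Toy sketch: vertices as neurons, edges as synapses; a spike wavefront from
# the source computes single-source hop distances (BFS). Illustrative only,
# not code for a neuromorphic platform.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
threshold = 1.0

potential = {v: 0.0 for v in graph}
fired_at = {}
spiking = {0}                      # inject a spike at the source vertex
for t in range(len(graph)):
    if not spiking:
        break
    next_spiking = set()
    for v in spiking:
        fired_at.setdefault(v, t)  # first spike time = hop distance from source
        for u in graph[v]:
            if u not in fired_at:
                potential[u] += 1.0
                if potential[u] >= threshold:
                    next_spiking.add(u)
    spiking = next_spiking
print(fired_at)  # {0: 0, 1: 1, 2: 1, 3: 2}
```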
DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences
Title | DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences |
Authors | Amirreza Farnoosh, Behnaz Rezaei, Sarah Ostadabbas |
Abstract | This paper presents a novel unsupervised probabilistic model estimation of visual background in video sequences using a variational autoencoder framework. Due to the redundant nature of the backgrounds in surveillance videos, visual information of the background can be compressed into a low-dimensional subspace in the encoder part of the variational autoencoder, while the highly variant information of its moving foreground gets filtered throughout its encoding-decoding process. Our deep probabilistic background model (DeepPBM) estimation approach is enabled by the power of deep neural networks in learning compressed representations of video frames and reconstructing them back to the original domain. We evaluated the performance of our DeepPBM in background subtraction on 9 surveillance videos from the background model challenge (BMC2012) dataset, and compared it with a standard subspace learning technique, robust principal component analysis (RPCA), which similarly estimates a deterministic low-dimensional representation of the background in videos and is widely used for this application. Our method outperforms RPCA on the BMC2012 dataset by 23% on average in F-measure score, while background subtraction with the trained model runs more than 10 times faster. |
Tasks | |
Published | 2019-02-03 |
URL | http://arxiv.org/abs/1902.00820v1 |
http://arxiv.org/pdf/1902.00820v1.pdf | |
PWC | https://paperswithcode.com/paper/deeppbm-deep-probabilistic-background-model |
Repo | https://github.com/ostadabbas/DeepPBM |
Framework | pytorch |
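
A hedged sketch of the core mechanism: a small VAE compresses flattened frames into a low-dimensional code, so reconstructions retain the redundant background while the fast-changing foreground is filtered out. Layer sizes and the fully connected architecture are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """Minimal VAE over flattened frames (sizes are illustrative assumptions)."""
    def __init__(self, frame_dim=64 * 64, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

vae = FrameVAE()
frames = torch.rand(8, 64 * 64)          # 8 flattened toy video frames
recon, mu, logvar = vae(frames)          # recon ~ background estimate
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
loss = nn.functional.binary_cross_entropy(recon, frames, reduction="sum") + kl
print(float(loss))
```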
Constrained domain adaptation for segmentation
Title | Constrained domain adaptation for segmentation |
Authors | Mathilde Bateson, Jose Dolz, Hoel Kervadec, Hervé Lombaert, Ismail Ben Ayed |
Abstract | We propose to adapt segmentation networks with a constrained formulation, which embeds domain-invariant prior knowledge about the segmentation regions. Such knowledge may take the form of simple anatomical information, e.g., structure size or shape, estimated from source samples or known a priori. Our method imposes domain-invariant inequality constraints on the network outputs of unlabeled target samples. It implicitly matches prediction statistics between target and source domains within the permitted uncertainty of the prior knowledge. We address our constrained problem with a differentiable penalty, fully suited for standard stochastic gradient descent approaches, removing the need for computationally expensive Lagrangian optimization with dual projections. Unlike current two-step adversarial training, our formulation is based on a single loss in a single network, which simplifies adaptation by avoiding extra adversarial steps, while improving convergence and quality of training. The comparison of our approach with state-of-the-art adversarial methods reveals substantially better performance on the challenging task of adapting spine segmentation across different MRI modalities. Our results also show a robustness to imprecision of size priors, approaching the accuracy of a fully supervised model trained directly in a target domain. Our method can be readily used for various constraints and segmentation problems. |
Tasks | Domain Adaptation |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.02996v1 |
https://arxiv.org/pdf/1908.02996v1.pdf | |
PWC | https://paperswithcode.com/paper/constrained-domain-adaptation-for |
Repo | https://github.com/CDAMICCAI2019/CDA |
Framework | pytorch |
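
The differentiable penalty can be illustrated with a size prior: penalize quadratically whenever the predicted foreground area leaves an interval [lo, hi] estimated a priori. The bounds and tensor shapes below are illustrative, and the function name `size_penalty` is ours, not the authors'.

```python
import torch

# Illustrative size-prior inequality penalty: quadratic cost whenever the
# predicted foreground area leaves [lo, hi]. Bounds and shapes are assumptions.
def size_penalty(probs, lo, hi):
    size = probs[:, 1].sum(dim=(1, 2))          # predicted foreground area per image
    under = torch.clamp(lo - size, min=0) ** 2  # penalize being below the interval
    over = torch.clamp(size - hi, min=0) ** 2   # ... and above it
    return (under + over).mean()

logits = torch.randn(4, 2, 64, 64, requires_grad=True)  # toy network outputs
probs = torch.softmax(logits, dim=1)            # per-pixel class probabilities
loss = size_penalty(probs, lo=200.0, hi=800.0)  # added to the target-domain loss
loss.backward()                                 # plain SGD-compatible, no Lagrangian duals
print(float(loss))
```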
Cross-task weakly supervised learning from instructional videos
Title | Cross-task weakly supervised learning from instructional videos |
Authors | Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic |
Abstract | In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: ‘pour egg’ should be trained jointly with other tasks involving ‘pour’ and ‘egg’. We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Past data does not permit systematic studying of sharing and so we also gather a new dataset, CrossTask, aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level and that our component model can parse previously unseen tasks by virtue of its compositionality. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.08225v2 |
http://arxiv.org/pdf/1903.08225v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-task-weakly-supervised-learning-from |
Repo | https://github.com/DmZhukov/CrossTask |
Framework | pytorch |
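
The component-sharing idea can be sketched as follows: each step score is a sum of scores of shared components, so "pour egg" and "pour milk" train the same "pour" classifier. The vocabulary, the linear component scorer, and all sizes are invented for illustration.

```python
import torch
import torch.nn as nn

# Toy sketch of component sharing: a step scores as the sum of its shared
# component scores, so components are trained jointly across steps/tasks.
# Vocabulary and sizes are illustrative assumptions.
components = ["pour", "egg", "whisk", "milk"]
steps = {"pour egg": [0, 1], "pour milk": [0, 3], "whisk egg": [2, 1]}

feat_dim = 128
comp_scorer = nn.Linear(feat_dim, len(components))  # one score per component

def step_scores(video_feat):
    c = comp_scorer(video_feat)                     # (batch, n_components)
    return {s: c[:, idx].sum(dim=1) for s, idx in steps.items()}

scores = step_scores(torch.randn(2, feat_dim))
print({s: v.shape for s, v in scores.items()})      # one score per clip per step
```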
Deep Random Splines for Point Process Intensity Estimation of Neural Population Data
Title | Deep Random Splines for Point Process Intensity Estimation of Neural Population Data |
Authors | Gabriel Loaiza-Ganem, Sean M. Perkins, Karen E. Schroeder, Mark M. Churchland, John P. Cunningham |
Abstract | Gaussian processes are the leading class of distributions on random functions, but they suffer from well known issues including difficulty scaling and inflexibility with respect to certain shape constraints (such as nonnegativity). Here we propose Deep Random Splines, a flexible class of random functions obtained by transforming Gaussian noise through a deep neural network whose outputs are the parameters of a spline. Unlike Gaussian processes, Deep Random Splines allow us to readily enforce shape constraints while inheriting the richness and tractability of deep generative models. We also present an observational model for point process data which uses Deep Random Splines to model the intensity function of each point process and apply it to neural population data to obtain a low-dimensional representation of spiking activity. Inference is performed via a variational autoencoder that uses a novel recurrent encoder architecture that can handle multiple point processes as input. We use a newly collected dataset where a primate completes a pedaling task, and observe better dimensionality reduction with our model than with competing alternatives. |
Tasks | Dimensionality Reduction, Gaussian Processes, Point Processes |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02610v6 |
https://arxiv.org/pdf/1903.02610v6.pdf | |
PWC | https://paperswithcode.com/paper/deep-random-splines-for-point-process |
Repo | https://github.com/gabloa/drs |
Framework | tf |
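
A hedged sketch of a Deep Random Spline: Gaussian noise is passed through a neural network whose outputs parameterize a function over time, with a softplus enforcing the nonnegativity a point-process intensity requires. For brevity the "spline" below is piecewise-linear interpolation between knot values; the paper uses proper constrained splines, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

# Sketch: Gaussian noise -> neural net -> nonnegative "spline" values on [0, 1].
# Piecewise-linear interpolation stands in for the paper's constrained splines;
# all dimensions are illustrative assumptions.
n_knots = 10
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_knots))

def sample_intensity(t):
    z = torch.randn(4)                          # latent Gaussian noise
    knot_vals = nn.functional.softplus(net(z))  # nonnegative values at the knots
    # linear interpolation between knots at query times t in [0, 1]
    pos = t * (n_knots - 1)
    left = pos.floor().long().clamp(max=n_knots - 2)
    w = pos - left.float()
    return (1 - w) * knot_vals[left] + w * knot_vals[left + 1]

t = torch.linspace(0, 1, 100)
lam = sample_intensity(t)          # one random intensity function
print(bool(lam.min() >= 0))        # True: valid point-process intensity
```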
Mixed-curvature Variational Autoencoders
Title | Mixed-curvature Variational Autoencoders |
Authors | Ondrej Skopek, Octavian-Eugen Ganea, Gary Bécigneul |
Abstract | Euclidean geometry has historically been the typical “workhorse” for machine learning applications due to its power and simplicity. However, it has recently been shown that geometric spaces with constant non-zero curvature improve representations and performance on a variety of data types and downstream tasks. Consequently, generative models like Variational Autoencoders (VAEs) have been successfully generalized to elliptical and hyperbolic latent spaces. While these approaches work well on data with particular kinds of biases, e.g., tree-like data for a hyperbolic VAE, there exists no generic approach unifying and leveraging all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant curvature Riemannian manifolds, where the per-component curvature is fixed or learnable. This generalizes the Euclidean VAE to curved latent spaces and recovers it when curvatures of all latent space components go to 0. |
Tasks | Latent Variable Models |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08411v2 |
https://arxiv.org/pdf/1911.08411v2.pdf | |
PWC | https://paperswithcode.com/paper/mixed-curvature-variational-autoencoders-1 |
Repo | https://github.com/oskopek/mvae |
Framework | pytorch |
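
The product-manifold latent space can be illustrated by how distances combine: the latent code splits into components living in spaces of different curvature, and the product distance is the l2 norm of the per-component distances. The curvature values, dimensions, and points below are arbitrary; the paper additionally learns curvatures and builds the full VAE machinery on these spaces.

```python
import torch

# Sketch of distances in a product of constant-curvature spaces. Curvatures,
# dimensions, and points are arbitrary illustrative choices.
def poincare_dist(x, y, c=1.0):
    # distance in the Poincare ball of curvature -c
    diff = (x - y).pow(2).sum(-1)
    denom = (1 - c * x.pow(2).sum(-1)) * (1 - c * y.pow(2).sum(-1))
    return torch.acosh(1 + 2 * c * diff / denom) / c ** 0.5

def sphere_dist(x, y, c=1.0):
    # great-circle distance on the sphere of curvature +c
    inner = (x * y).sum(-1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(inner) / c ** 0.5

def product_dist(parts_x, parts_y, dists):
    per_comp = torch.stack([d(a, b) for d, a, b in zip(dists, parts_x, parts_y)])
    return per_comp.pow(2).sum().sqrt()  # l2 combination of component distances

# latent code = (hyperbolic 2D, spherical 3D, Euclidean 2D) components
x = [torch.tensor([0.1, 0.2]), torch.tensor([0., 0., 1.]), torch.tensor([1., 2.])]
y = [torch.tensor([0.0, 0.3]), torch.tensor([0., 1., 0.]), torch.tensor([0., 1.])]
dists = [poincare_dist, sphere_dist, lambda a, b: (a - b).norm()]
print(product_dist(x, y, dists))
```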
MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks
Title | MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks |
Authors | Zhulin Zhang, Dong Li, Jinhua Wu, Yunda Sun, Li Zhang |
Abstract | In this paper, we present a novel dataset named MVB (Multi View Baggage) for the baggage ReID task, which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset that contains 4519 baggage identities and 22660 annotated baggage images, together with surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, in order to obtain the 3D information of the baggage surface as completely as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity, considering the fact that baggage might have very similar appearance while the data is collected in two real airport environments, where imaging factors vary significantly from each other. Moreover, we propose a merged Siamese network as a baseline model and evaluate its performance. Experiments and a case study are conducted on MVB. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11366v1 |
https://arxiv.org/pdf/1907.11366v1.pdf | |
PWC | https://paperswithcode.com/paper/mvb-a-large-scale-dataset-for-baggage-re |
Repo | https://github.com/qq326823564/LSDNN |
Framework | pytorch |
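
A minimal Siamese baseline for re-identification is sketched below: a weight-shared backbone embeds two baggage images and their embedding distance decides whether they show the same bag. The tiny convolutional backbone and the plain (non-"merged") design are simplifying assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Plain Siamese baseline (sketch): shared backbone + embedding distance."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, a, b):
        ea, eb = self.backbone(a), self.backbone(b)  # shared weights: Siamese
        return nn.functional.pairwise_distance(ea, eb)

net = SiameseNet()
d = net(torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128))
print(d)  # one distance per image pair; thresholded to decide same/different bag
```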
Importance Weighted Hierarchical Variational Inference
Title | Importance Weighted Hierarchical Variational Inference |
Authors | Artem Sobolev, Dmitry Vetrov |
Abstract | Variational Inference is a powerful tool in the Bayesian modeling toolkit; however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log density in the case of hierarchical models (also known as latent variable models). We then give an upper bound on the Kullback-Leibler divergence and derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments. |
Tasks | Latent Variable Models |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03290v1 |
https://arxiv.org/pdf/1905.03290v1.pdf | |
PWC | https://paperswithcode.com/paper/190503290 |
Repo | https://github.com/artsobolev/IWHVI |
Framework | tf |
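
For orientation, the snippet below estimates the standard multi-sample (IWAE-style) lower bound on log p(x), the simplest relative of the family of increasingly tight bounds the paper develops for hierarchical variational distributions. The toy 1-D Gaussians are purely for demonstration, not the paper's estimator.

```python
import math
import torch
from torch.distributions import Normal

# Toy 1-D Gaussian model, purely for demonstration.
p_z = Normal(0.0, 1.0)       # prior p(z)
q_z = Normal(0.3, 0.8)       # approximate posterior q(z|x)

def lik(z):                  # likelihood p(x|z)
    return Normal(z, 0.5)

x = torch.tensor(0.7)
K = 64
z = q_z.sample((K,))         # K importance samples from q
log_w = p_z.log_prob(z) + lik(z).log_prob(x) - q_z.log_prob(z)
bound = torch.logsumexp(log_w, dim=0) - math.log(K)
print(bound)                 # tightens toward log p(x) as K grows
```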
Low-Resource Response Generation with Template Prior
Title | Low-Resource Response Generation with Template Prior |
Authors | Ze Yang, Wei Wu, Jian Yang, Can Xu, Zhoujun Li |
Abstract | We study open domain response generation with limited message-response pairs. The problem exists in real-world applications but is less explored by the existing work. Since the paired data alone is no longer enough to train a neural generation model, we consider leveraging large-scale unpaired data that are much easier to obtain, and propose response generation with both paired and unpaired data. The generation model is defined by an encoder-decoder architecture with templates as prior, where the templates are estimated from the unpaired data as a neural hidden semi-Markov model. By this means, response generation learned from the small paired data can be aided by the semantic and syntactic knowledge in the large unpaired data. To balance the effect of the prior and the input message on response generation, we propose learning the whole generation model with an adversarial approach. Empirical studies on question response generation and sentiment response generation indicate that when only a few pairs are available, our model can significantly outperform several state-of-the-art response generation models in terms of both automatic and human evaluation. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11968v3 |
https://arxiv.org/pdf/1909.11968v3.pdf | |
PWC | https://paperswithcode.com/paper/low-resource-response-generation-with |
Repo | https://github.com/TobeyYang/S2S_Temp |
Framework | pytorch |
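
A hedged sketch of decoding with a template prior: the decoder state is initialized from both the encoded message and a template representation, here reduced to an embedding of a template id. In the paper the templates come from a neural hidden semi-Markov model estimated on unpaired data, and training is adversarial; all names and sizes below are illustrative.

```python
import torch
import torch.nn as nn

# Sketch of an encoder-decoder whose decoder is conditioned on a template
# prior (here just a template-id embedding). Sizes and names are assumptions.
vocab, dim, n_templates = 1000, 64, 32
embed = nn.Embedding(vocab, dim)
template_embed = nn.Embedding(n_templates, dim)
encoder = nn.GRU(dim, dim, batch_first=True)
decoder_cell = nn.GRUCell(dim, dim)
out = nn.Linear(dim, vocab)

message = torch.randint(0, vocab, (1, 10))
_, h = encoder(embed(message))                 # encode the input message
h = h[0] + template_embed(torch.tensor([3]))   # inject the template prior

token = torch.tensor([1])                      # BOS token id (assumed)
for _ in range(5):                             # greedy decoding, 5 steps
    h = decoder_cell(embed(token), h)
    token = out(h).argmax(-1)
    print(token.item())
```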