Paper Group AWR 346
Harry Potter and the Action Prediction Challenge from Natural Language
Title | Harry Potter and the Action Prediction Challenge from Natural Language |
Authors | David Vilares, Carlos Gómez-Rodríguez |
Abstract | We explore the challenge of action prediction from textual descriptions of scenes, a testbed to approximate whether text inference can be used to predict upcoming actions. As a case study, we consider the world of the Harry Potter fantasy novels and the task of inferring what spell will be cast next given a fragment of a story. Spells act as keywords that abstract actions (e.g. ‘Alohomora’ to open a door) and denote a response to the environment. This idea is used to automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We then evaluate different baselines. Among the tested models, an LSTM-based approach obtains the best performance for frequent actions and large scene descriptions, but approaches such as logistic regression behave well on infrequent actions. |
Tasks | |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11037v1 |
https://arxiv.org/pdf/1905.11037v1.pdf | |
PWC | https://paperswithcode.com/paper/harry-potter-and-the-action-prediction |
Repo | https://github.com/aghie/hpac |
Framework | tf |
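
To make the LSTM baseline concrete, here is a minimal PyTorch sketch of a recurrent classifier mapping a tokenized scene description to scores over the 85 spell actions. All sizes (vocabulary, embedding, and hidden dimensions) are illustrative assumptions, and the authors' released code is in TensorFlow, so this is a sketch of the idea rather than their implementation.

```python
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Illustrative LSTM baseline: scene tokens -> scores over 85 actions."""
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128, n_actions=85):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_actions)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer-encoded scene description
        emb = self.embed(token_ids)
        _, (h_n, _) = self.lstm(emb)   # final hidden state summarizes the scene
        return self.out(h_n[-1])       # unnormalized scores over the 85 spells

model = ActionPredictor()
logits = model(torch.randint(1, 20000, (4, 50)))  # toy batch of 4 scenes
print(logits.shape)  # torch.Size([4, 85])
```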
Multi-Modal Fusion for End-to-End RGB-T Tracking
Title | Multi-Modal Fusion for End-to-End RGB-T Tracking |
Authors | Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan |
Abstract | We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation approach. We perform extensive experiments on the VOT-RGBT2019 and RGBT210 datasets, evaluating each type of modality fusion on each model component. The results show that the proposed fusion mechanisms improve the performance of their single-modality counterparts. We obtain our best results when fusing at the feature level on both the IoU-Net and the model predictor, obtaining an EAO score of 0.391 on the VOT-RGBT2019 dataset. With this fusion mechanism we achieve state-of-the-art performance on the RGBT210 dataset. |
Tasks | Image-to-Image Translation, Rgb-T Tracking |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11714v1 |
https://arxiv.org/pdf/1908.11714v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-fusion-for-end-to-end-rgb-t |
Repo | https://github.com/zhanglichao/end2end_rgbt_tracking |
Framework | pytorch |
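
As a rough illustration of feature-level fusion, the sketch below concatenates RGB and TIR feature maps and mixes them with a 1x1 convolution before they would be passed to downstream components (IoU-Net, model predictor). The single-convolution "backbones" are stand-ins for real feature extractors; everything here is an assumption for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Fuse RGB and TIR feature maps by concatenation + 1x1 conv (sketch)."""
    def __init__(self, channels=256):
        super().__init__()
        # single convs stand in for real RGB / TIR backbones
        self.rgb_backbone = nn.Conv2d(3, channels, 3, padding=1)
        self.tir_backbone = nn.Conv2d(1, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb, tir):
        f_rgb = self.rgb_backbone(rgb)
        f_tir = self.tir_backbone(tir)
        # fused map would feed the IoU-Net / model predictor downstream
        return self.fuse(torch.cat([f_rgb, f_tir], dim=1))

fusion = FeatureLevelFusion()
out = fusion(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 256, 224, 224])
```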
ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning
Title | ReinBo: Machine Learning pipeline search and configuration with Bayesian Optimization embedded Reinforcement Learning |
Authors | Xudong Sun, Jiali Lin, Bernd Bischl |
Abstract | A machine learning pipeline potentially consists of several stages of operations, such as data preprocessing, feature engineering and machine learning model training. Each operation has a set of hyper-parameters, which can become irrelevant for the pipeline when the operation is not selected. This gives rise to a hierarchical conditional hyper-parameter space. To optimize this mixed continuous and discrete conditional hierarchical hyper-parameter space, we propose an efficient pipeline search and configuration algorithm which combines the power of Reinforcement Learning and Bayesian Optimization. Empirical results show that our method performs favorably compared to state-of-the-art methods like Auto-sklearn, TPOT, Tree Parzen Window, and Random Search. |
Tasks | Feature Engineering |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05381v1 |
http://arxiv.org/pdf/1904.05381v1.pdf | |
PWC | https://paperswithcode.com/paper/reinbo-machine-learning-pipeline-search-and |
Repo | https://github.com/smilesun/reinbo |
Framework | none |
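
The hierarchical conditional space can be pictured as below: selecting an operation per stage (the discrete decision ReinBo makes with reinforcement learning) activates only that operation's hyper-parameters, which would then be tuned (with Bayesian Optimization in the paper). The stages, operations, and ranges here are made up for illustration, and random sampling stands in for the learned policy.

```python
import random

# Toy sketch of a hierarchical conditional hyper-parameter space: picking a
# pipeline activates only the hyper-parameters of the chosen operations.
# Stages, operations, and ranges are illustrative assumptions.
PIPELINE_SPACE = {
    "scaler":     {"none": {}, "standard": {}},
    "feature":    {"none": {}, "pca": {"n_components": (2, 20)}},
    "classifier": {"knn": {"n_neighbors": (1, 30)}, "svm": {"C": (1e-3, 1e3)}},
}

def sample_pipeline():
    """Random stand-in for the RL policy: pick one operation per stage."""
    choice = {stage: random.choice(list(ops)) for stage, ops in PIPELINE_SPACE.items()}
    # Only the hyper-parameters of the selected ops remain active (conditionality);
    # these are what Bayesian Optimization would tune in the paper.
    active = {stage: PIPELINE_SPACE[stage][op] for stage, op in choice.items()}
    return choice, active

choice, active = sample_pipeline()
print(choice)  # e.g. {'scaler': 'standard', 'feature': 'pca', 'classifier': 'svm'}
print(active)  # only the chosen ops' hyper-parameter ranges remain to be tuned
```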
Real-world attack on MTCNN face detection system
Title | Real-world attack on MTCNN face detection system |
Authors | Edgar Kaziakhmedov, Klim Kireev, Grigorii Melnikov, Mikhail Pautov, Aleksandr Petiushko |
Abstract | Recent studies proved that deep learning approaches achieve remarkable results on the face detection task. On the other hand, these advances gave rise to a new problem associated with the security of deep convolutional neural network models, unveiling potential risks of DCNN-based applications. Even minor input changes in the digital domain can result in the network being fooled. It was then shown that some deep learning-based face detectors are prone to adversarial attacks not only in the digital domain but also in the real world. In this paper, we investigate the security of the well-known cascade CNN face detection system MTCNN and introduce an easily reproducible and robust way to attack it. We propose different face attributes printed on an ordinary black-and-white printer and attached either to a medical face mask or to the face directly. Our approach is capable of breaking the MTCNN detector in a real-world scenario. |
Tasks | Face Detection |
Published | 2019-10-14 |
URL | https://arxiv.org/abs/1910.06261v1 |
https://arxiv.org/pdf/1910.06261v1.pdf | |
PWC | https://paperswithcode.com/paper/real-world-attack-on-mtcnn-face-detection |
Repo | https://github.com/edosedgar/mtcnnattack |
Framework | tf |
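
The sketch below shows the generic adversarial-patch loop underlying attacks of this kind: optimize a printable patch so that a detector's confidence on the patched face drops. The `toy_detector` is a deliberate placeholder, not MTCNN's P-Net/R-Net/O-Net cascade, and the patch size and placement are assumptions.

```python
import torch

# Generic adversarial-patch loop (sketch). `toy_detector` is a placeholder
# for a detector's face-confidence output, NOT MTCNN's cascade; the patch
# size and placement are assumptions.
def toy_detector(img):
    return torch.sigmoid(img.mean())  # fake "face confidence" in (0, 1)

patch = torch.zeros(3, 32, 32, requires_grad=True)  # printable patch to optimize
opt = torch.optim.Adam([patch], lr=0.05)
face = torch.rand(3, 128, 128)                      # stand-in face image

for _ in range(100):
    attacked = face.clone()
    attacked[:, 48:80, 48:80] = patch.clamp(0, 1)   # paste patch onto the face
    loss = toy_detector(attacked)                   # minimize detection confidence
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))  # confidence after the attack
```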
Interpretable machine learning: definitions, methods, and applications
Title | Interpretable machine learning: definitions, methods, and applications |
Authors | W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, Bin Yu |
Abstract | Machine learning models have achieved great predictive success, and interpreting what a model has learned is receiving increasing attention, along with considerable confusion about what interpretability means. This paper defines interpretability in the context of machine learning and introduces the predictive, descriptive, relevant (PDR) framework for discussing interpretations, with three desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, where relevancy is judged relative to a human audience. Existing interpretation methods are categorized into model-based and post hoc categories, with subcategories including sparsity, modularity, and simulatability, and the framework is illustrated with real-world examples. |
Tasks | Feature Importance, Interpretable Machine Learning |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04592v1 |
http://arxiv.org/pdf/1901.04592v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-machine-learning-definitions |
Repo | https://github.com/sumbose/iRF |
Framework | none |
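
As one concrete instance of the post hoc methods the paper surveys, the snippet below computes permutation feature importance with scikit-learn; the dataset and model are arbitrary choices for illustration, not taken from the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Post hoc interpretation example (illustrative, not the paper's own code):
# permutation feature importance on an arbitrary dataset and model.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean.argsort()[::-1][:5])  # 5 most important features
```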
DocBERT: BERT for Document Classification
Title | DocBERT: BERT for Document Classification |
Authors | Ashutosh Adhikari, Achyudh Ram, Raphael Tang, Jimmy Lin |
Abstract | We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content categories, documents can often be longer than typical BERT input, and documents often have multiple labels. Nevertheless, we show that a straightforward classification model using BERT is able to achieve the state of the art across four popular datasets. To address the computational expense associated with BERT inference, we distill knowledge from BERT-large to small bidirectional LSTMs, reaching BERT-base parity on multiple datasets using 30x fewer parameters. The primary contribution of our paper is improved baselines that can provide the foundation for future work. |
Tasks | Document Classification, Sentiment Analysis |
Published | 2019-04-17 |
URL | https://arxiv.org/abs/1904.08398v3 |
https://arxiv.org/pdf/1904.08398v3.pdf | |
PWC | https://paperswithcode.com/paper/docbert-bert-for-document-classification |
Repo | https://github.com/castorini/hedwig |
Framework | pytorch |
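
A minimal Hugging Face sketch of the BERT-for-document-classification setup follows. It is illustrative rather than the authors' implementation (their code lives in the Hedwig repo); the classifier head here is randomly initialized and untrained, and the four labels are an assumed example.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative BERT document classifier. The classification head is randomly
# initialized (it would need fine-tuning); num_labels=4 is an assumption.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4)

doc = "Long document text ..."  # truncated to BERT's 512-token limit below
inputs = tok(doc, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(-1))  # class probabilities over the 4 document labels
```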
Spike-based primitives for graph algorithms
Title | Spike-based primitives for graph algorithms |
Authors | Kathleen E. Hamilton, Tiffany M. Mintz, Catherine D. Schuman |
Abstract | In this paper we consider graph algorithms and graphical analysis as a new application for neuromorphic computing platforms. We demonstrate how the nonlinear dynamics of spiking neurons can be used to implement low-level graph operations. Our results are hardware agnostic, and we present multiple versions of routines that can utilize static synapses or require synapse plasticity. |
Tasks | |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10574v1 |
http://arxiv.org/pdf/1903.10574v1.pdf | |
PWC | https://paperswithcode.com/paper/spike-based-primitives-for-graph-algorithms |
Repo | https://github.com/abasak24/ece594Neuromorphic |
Framework | none |
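
One low-level graph operation is easy to picture: a spike wavefront started at a source vertex computes hop distances (breadth-first search). The toy simulation below mimics this with threshold-1 integrate-and-fire dynamics in plain Python; it is an illustration of the principle, not code for any neuromorphic platform.

```python
# Toy sketch: vertices as neurons, edges as synapses; a spike wavefront from
# the source computes single-source hop distances (BFS). Illustrative only,
# not code for a neuromorphic platform.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
threshold = 1.0

potential = {v: 0.0 for v in graph}
fired_at = {}
spiking = {0}                      # inject a spike at the source vertex
for t in range(len(graph)):
    if not spiking:
        break
    next_spiking = set()
    for v in spiking:
        fired_at.setdefault(v, t)  # first spike time = hop distance from source
        for u in graph[v]:
            if u not in fired_at:
                potential[u] += 1.0
                if potential[u] >= threshold:
                    next_spiking.add(u)
    spiking = next_spiking
print(fired_at)  # {0: 0, 1: 1, 2: 1, 3: 2}
```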
DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences
Title | DeepPBM: Deep Probabilistic Background Model Estimation from Video Sequences |
Authors | Amirreza Farnoosh, Behnaz Rezaei, Sarah Ostadabbas |
Abstract | This paper presents a novel unsupervised probabilistic model estimation of visual background in video sequences using a variational autoencoder framework. Due to the redundant nature of the backgrounds in surveillance videos, visual information of the background can be compressed into a low-dimensional subspace in the encoder part of the variational autoencoder, while the highly variant information of its moving foreground gets filtered throughout its encoding-decoding process. Our deep probabilistic background model (DeepPBM) estimation approach is enabled by the power of deep neural networks in learning compressed representations of video frames and reconstructing them back to the original domain. We evaluated the performance of our DeepPBM in background subtraction on 9 surveillance videos from the background model challenge (BMC2012) dataset, and compared it with a standard subspace learning technique, robust principal component analysis (RPCA), which similarly estimates a deterministic low-dimensional representation of the background in videos and is widely used for this application. Our method outperforms RPCA on the BMC2012 dataset by 23% on average in F-measure score, while background subtraction with the trained model runs more than 10 times faster. |
Tasks | |
Published | 2019-02-03 |
URL | http://arxiv.org/abs/1902.00820v1 |
http://arxiv.org/pdf/1902.00820v1.pdf | |
PWC | https://paperswithcode.com/paper/deeppbm-deep-probabilistic-background-model |
Repo | https://github.com/ostadabbas/DeepPBM |
Framework | pytorch |
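
A hedged sketch of the core mechanism: a small VAE compresses flattened frames into a low-dimensional code, so reconstructions retain the redundant background while the fast-changing foreground is filtered out. Layer sizes and the fully connected architecture are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """Minimal VAE over flattened frames (sizes are illustrative assumptions)."""
    def __init__(self, frame_dim=64 * 64, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, frame_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

vae = FrameVAE()
frames = torch.rand(8, 64 * 64)          # 8 flattened toy video frames
recon, mu, logvar = vae(frames)          # recon ~ background estimate
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()
loss = nn.functional.binary_cross_entropy(recon, frames, reduction="sum") + kl
print(float(loss))
```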
Constrained domain adaptation for segmentation
Title | Constrained domain adaptation for segmentation |
Authors | Mathilde Bateson, Jose Dolz, Hoel Kervadec, Hervé Lombaert, Ismail Ben Ayed |
Abstract | We propose to adapt segmentation networks with a constrained formulation, which embeds domain-invariant prior knowledge about the segmentation regions. Such knowledge may take the form of simple anatomical information, e.g., structure size or shape, estimated from source samples or known a priori. Our method imposes domain-invariant inequality constraints on the network outputs of unlabeled target samples. It implicitly matches prediction statistics between target and source domains within the permitted uncertainty of the prior knowledge. We address our constrained problem with a differentiable penalty, fully suited for standard stochastic gradient descent approaches, removing the need for computationally expensive Lagrangian optimization with dual projections. Unlike current two-step adversarial training, our formulation is based on a single loss in a single network, which simplifies adaptation by avoiding extra adversarial steps, while improving convergence and quality of training. The comparison of our approach with state-of-the-art adversarial methods reveals substantially better performance on the challenging task of adapting spine segmentation across different MRI modalities. Our results also show a robustness to imprecision of size priors, approaching the accuracy of a fully supervised model trained directly in a target domain. Our method can be readily used for various constraints and segmentation problems. |
Tasks | Domain Adaptation |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.02996v1 |
https://arxiv.org/pdf/1908.02996v1.pdf | |
PWC | https://paperswithcode.com/paper/constrained-domain-adaptation-for |
Repo | https://github.com/CDAMICCAI2019/CDA |
Framework | pytorch |
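
The differentiable penalty can be illustrated with a size prior: penalize quadratically whenever the predicted foreground area leaves an interval [lo, hi] estimated a priori. The bounds and tensor shapes below are illustrative, and the function name `size_penalty` is ours, not the authors'.

```python
import torch

# Illustrative size-prior inequality penalty: quadratic cost whenever the
# predicted foreground area leaves [lo, hi]. Bounds and shapes are assumptions.
def size_penalty(probs, lo, hi):
    size = probs[:, 1].sum(dim=(1, 2))          # predicted foreground area per image
    under = torch.clamp(lo - size, min=0) ** 2  # penalize being below the interval
    over = torch.clamp(size - hi, min=0) ** 2   # ... and above it
    return (under + over).mean()

logits = torch.randn(4, 2, 64, 64, requires_grad=True)  # toy network outputs
probs = torch.softmax(logits, dim=1)            # per-pixel class probabilities
loss = size_penalty(probs, lo=200.0, hi=800.0)  # added to the target-domain loss
loss.backward()                                 # plain SGD-compatible, no Lagrangian duals
print(float(loss))
```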
Cross-task weakly supervised learning from instructional videos
Title | Cross-task weakly supervised learning from instructional videos |
Authors | Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic |
Abstract | In this paper we investigate learning visual models for the steps of ordinary tasks using weak supervision via instructional narrations and an ordered list of steps instead of strong supervision via temporal annotations. At the heart of our approach is the observation that weakly supervised learning may be easier if a model shares components while learning different steps: ‘pour egg’ should be trained jointly with other tasks involving ‘pour’ and ‘egg’. We formalize this in a component model for recognizing steps and a weakly supervised learning framework that can learn this model under temporal constraints from narration and the list of steps. Past data does not permit systematic studying of sharing and so we also gather a new dataset, CrossTask, aimed at assessing cross-task sharing. Our experiments demonstrate that sharing across tasks improves performance, especially when done at the component level and that our component model can parse previously unseen tasks by virtue of its compositionality. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.08225v2 |
http://arxiv.org/pdf/1903.08225v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-task-weakly-supervised-learning-from |
Repo | https://github.com/DmZhukov/CrossTask |
Framework | pytorch |
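
The component-sharing idea can be sketched as follows: each step score is a sum of scores of shared components, so "pour egg" and "pour milk" train the same "pour" classifier. The vocabulary, the linear component scorer, and all sizes are invented for illustration.

```python
import torch
import torch.nn as nn

# Toy sketch of component sharing: a step scores as the sum of its shared
# component scores, so components are trained jointly across steps/tasks.
# Vocabulary and sizes are illustrative assumptions.
components = ["pour", "egg", "whisk", "milk"]
steps = {"pour egg": [0, 1], "pour milk": [0, 3], "whisk egg": [2, 1]}

feat_dim = 128
comp_scorer = nn.Linear(feat_dim, len(components))  # one score per component

def step_scores(video_feat):
    c = comp_scorer(video_feat)                     # (batch, n_components)
    return {s: c[:, idx].sum(dim=1) for s, idx in steps.items()}

scores = step_scores(torch.randn(2, feat_dim))
print({s: v.shape for s, v in scores.items()})      # one score per clip per step
```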
Deep Random Splines for Point Process Intensity Estimation of Neural Population Data
Title | Deep Random Splines for Point Process Intensity Estimation of Neural Population Data |
Authors | Gabriel Loaiza-Ganem, Sean M. Perkins, Karen E. Schroeder, Mark M. Churchland, John P. Cunningham |
Abstract | Gaussian processes are the leading class of distributions on random functions, but they suffer from well known issues including difficulty scaling and inflexibility with respect to certain shape constraints (such as nonnegativity). Here we propose Deep Random Splines, a flexible class of random functions obtained by transforming Gaussian noise through a deep neural network whose outputs are the parameters of a spline. Unlike Gaussian processes, Deep Random Splines allow us to readily enforce shape constraints while inheriting the richness and tractability of deep generative models. We also present an observational model for point process data which uses Deep Random Splines to model the intensity function of each point process and apply it to neural population data to obtain a low-dimensional representation of spiking activity. Inference is performed via a variational autoencoder that uses a novel recurrent encoder architecture that can handle multiple point processes as input. We use a newly collected dataset where a primate completes a pedaling task, and observe better dimensionality reduction with our model than with competing alternatives. |
Tasks | Dimensionality Reduction, Gaussian Processes, Point Processes |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02610v6 |
https://arxiv.org/pdf/1903.02610v6.pdf | |
PWC | https://paperswithcode.com/paper/deep-random-splines-for-point-process |
Repo | https://github.com/gabloa/drs |
Framework | tf |
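
A hedged sketch of a Deep Random Spline: Gaussian noise is passed through a neural network whose outputs parameterize a function over time, with a softplus enforcing the nonnegativity a point-process intensity requires. For brevity the "spline" below is piecewise-linear interpolation between knot values; the paper uses proper constrained splines, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

# Sketch: Gaussian noise -> neural net -> nonnegative "spline" values on [0, 1].
# Piecewise-linear interpolation stands in for the paper's constrained splines;
# all dimensions are illustrative assumptions.
n_knots = 10
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, n_knots))

def sample_intensity(t):
    z = torch.randn(4)                          # latent Gaussian noise
    knot_vals = nn.functional.softplus(net(z))  # nonnegative values at the knots
    # linear interpolation between knots at query times t in [0, 1]
    pos = t * (n_knots - 1)
    left = pos.floor().long().clamp(max=n_knots - 2)
    w = pos - left.float()
    return (1 - w) * knot_vals[left] + w * knot_vals[left + 1]

t = torch.linspace(0, 1, 100)
lam = sample_intensity(t)          # one random intensity function
print(bool(lam.min() >= 0))        # True: valid point-process intensity
```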
Mixed-curvature Variational Autoencoders
Title | Mixed-curvature Variational Autoencoders |
Authors | Ondrej Skopek, Octavian-Eugen Ganea, Gary Bécigneul |
Abstract | Euclidean geometry has historically been the typical “workhorse” for machine learning applications due to its power and simplicity. However, it has recently been shown that geometric spaces with constant non-zero curvature improve representations and performance on a variety of data types and downstream tasks. Consequently, generative models like Variational Autoencoders (VAEs) have been successfully generalized to elliptical and hyperbolic latent spaces. While these approaches work well on data with particular kinds of biases, e.g., tree-like data for a hyperbolic VAE, there exists no generic approach unifying and leveraging all three models. We develop a Mixed-curvature Variational Autoencoder, an efficient way to train a VAE whose latent space is a product of constant curvature Riemannian manifolds, where the per-component curvature is fixed or learnable. This generalizes the Euclidean VAE to curved latent spaces and recovers it when curvatures of all latent space components go to 0. |
Tasks | Latent Variable Models |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08411v2 |
https://arxiv.org/pdf/1911.08411v2.pdf | |
PWC | https://paperswithcode.com/paper/mixed-curvature-variational-autoencoders-1 |
Repo | https://github.com/oskopek/mvae |
Framework | pytorch |
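
The product-manifold latent space can be illustrated by how distances combine: the latent code splits into components living in spaces of different curvature, and the product distance is the l2 norm of the per-component distances. The curvature values, dimensions, and points below are arbitrary; the paper additionally learns curvatures and builds the full VAE machinery on these spaces.

```python
import torch

# Sketch of distances in a product of constant-curvature spaces. Curvatures,
# dimensions, and points are arbitrary illustrative choices.
def poincare_dist(x, y, c=1.0):
    # distance in the Poincare ball of curvature -c
    diff = (x - y).pow(2).sum(-1)
    denom = (1 - c * x.pow(2).sum(-1)) * (1 - c * y.pow(2).sum(-1))
    return torch.acosh(1 + 2 * c * diff / denom) / c ** 0.5

def sphere_dist(x, y, c=1.0):
    # great-circle distance on the sphere of curvature +c
    inner = (x * y).sum(-1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(inner) / c ** 0.5

def product_dist(parts_x, parts_y, dists):
    per_comp = torch.stack([d(a, b) for d, a, b in zip(dists, parts_x, parts_y)])
    return per_comp.pow(2).sum().sqrt()  # l2 combination of component distances

# latent code = (hyperbolic 2D, spherical 3D, Euclidean 2D) components
x = [torch.tensor([0.1, 0.2]), torch.tensor([0., 0., 1.]), torch.tensor([1., 2.])]
y = [torch.tensor([0.0, 0.3]), torch.tensor([0., 1., 0.]), torch.tensor([0., 1.])]
dists = [poincare_dist, sphere_dist, lambda a, b: (a - b).norm()]
print(product_dist(x, y, dists))
```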
MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks
Title | MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks |
Authors | Zhulin Zhang, Dong Li, Jinhua Wu, Yunda Sun, Li Zhang |
Abstract | In this paper, we present a novel dataset named MVB (Multi View Baggage) for the baggage ReID task, which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset that contains 4519 baggage identities and 22660 annotated baggage images, together with surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, in order to obtain the 3D information of the baggage surface as completely as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity, considering the fact that baggage might have very similar appearance while the data is collected in two real airport environments, where imaging factors vary significantly from each other. Moreover, we propose a merged Siamese network as a baseline model and evaluate its performance. Experiments and a case study are conducted on MVB. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11366v1 |
https://arxiv.org/pdf/1907.11366v1.pdf | |
PWC | https://paperswithcode.com/paper/mvb-a-large-scale-dataset-for-baggage-re |
Repo | https://github.com/qq326823564/LSDNN |
Framework | pytorch |
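
A minimal Siamese baseline for re-identification is sketched below: a weight-shared backbone embeds two baggage images and their embedding distance decides whether they show the same bag. The tiny convolutional backbone and the plain (non-"merged") design are simplifying assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Plain Siamese baseline (sketch): shared backbone + embedding distance."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, a, b):
        ea, eb = self.backbone(a), self.backbone(b)  # shared weights: Siamese
        return nn.functional.pairwise_distance(ea, eb)

net = SiameseNet()
d = net(torch.randn(2, 3, 128, 128), torch.randn(2, 3, 128, 128))
print(d)  # one distance per image pair; thresholded to decide same/different bag
```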
Importance Weighted Hierarchical Variational Inference
Title | Importance Weighted Hierarchical Variational Inference |
Authors | Artem Sobolev, Dmitry Vetrov |
Abstract | Variational Inference is a powerful tool in the Bayesian modeling toolkit; however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log density in the case of hierarchical models (also known as latent variable models). We then give an upper bound on the Kullback-Leibler divergence and derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments. |
Tasks | Latent Variable Models |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03290v1 |
https://arxiv.org/pdf/1905.03290v1.pdf | |
PWC | https://paperswithcode.com/paper/190503290 |
Repo | https://github.com/artsobolev/IWHVI |
Framework | tf |
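
For orientation, the snippet below estimates the standard multi-sample (IWAE-style) lower bound on log p(x), the simplest relative of the family of increasingly tight bounds the paper develops for hierarchical variational distributions. The toy 1-D Gaussians are purely for demonstration, not the paper's estimator.

```python
import math
import torch
from torch.distributions import Normal

# Toy 1-D Gaussian model, purely for demonstration.
p_z = Normal(0.0, 1.0)       # prior p(z)
q_z = Normal(0.3, 0.8)       # approximate posterior q(z|x)

def lik(z):                  # likelihood p(x|z)
    return Normal(z, 0.5)

x = torch.tensor(0.7)
K = 64
z = q_z.sample((K,))         # K importance samples from q
log_w = p_z.log_prob(z) + lik(z).log_prob(x) - q_z.log_prob(z)
bound = torch.logsumexp(log_w, dim=0) - math.log(K)
print(bound)                 # tightens toward log p(x) as K grows
```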
Low-Resource Response Generation with Template Prior
Title | Low-Resource Response Generation with Template Prior |
Authors | Ze Yang, Wei Wu, Jian Yang, Can Xu, Zhoujun Li |
Abstract | We study open domain response generation with limited message-response pairs. The problem exists in real-world applications but is less explored by the existing work. Since the paired data alone is no longer enough to train a neural generation model, we consider leveraging large-scale unpaired data that are much easier to obtain, and propose response generation with both paired and unpaired data. The generation model is defined by an encoder-decoder architecture with templates as prior, where the templates are estimated from the unpaired data as a neural hidden semi-Markov model. By this means, response generation learned from the small paired data can be aided by the semantic and syntactic knowledge in the large unpaired data. To balance the effect of the prior and the input message on response generation, we propose learning the whole generation model with an adversarial approach. Empirical studies on question response generation and sentiment response generation indicate that when only a few pairs are available, our model can significantly outperform several state-of-the-art response generation models in terms of both automatic and human evaluation. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11968v3 |
https://arxiv.org/pdf/1909.11968v3.pdf | |
PWC | https://paperswithcode.com/paper/low-resource-response-generation-with |
Repo | https://github.com/TobeyYang/S2S_Temp |
Framework | pytorch |
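
A hedged sketch of decoding with a template prior: the decoder state is initialized from both the encoded message and a template representation, here reduced to an embedding of a template id. In the paper the templates come from a neural hidden semi-Markov model estimated on unpaired data, and training is adversarial; all names and sizes below are illustrative.

```python
import torch
import torch.nn as nn

# Sketch of an encoder-decoder whose decoder is conditioned on a template
# prior (here just a template-id embedding). Sizes and names are assumptions.
vocab, dim, n_templates = 1000, 64, 32
embed = nn.Embedding(vocab, dim)
template_embed = nn.Embedding(n_templates, dim)
encoder = nn.GRU(dim, dim, batch_first=True)
decoder_cell = nn.GRUCell(dim, dim)
out = nn.Linear(dim, vocab)

message = torch.randint(0, vocab, (1, 10))
_, h = encoder(embed(message))                 # encode the input message
h = h[0] + template_embed(torch.tensor([3]))   # inject the template prior

token = torch.tensor([1])                      # BOS token id (assumed)
for _ in range(5):                             # greedy decoding, 5 steps
    h = decoder_cell(embed(token), h)
    token = out(h).argmax(-1)
    print(token.item())
```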