July 29, 2019

2922 words 14 mins read

Paper Group AWR 103

One-Shot Learning for Semantic Segmentation. RED: Reinforced Encoder-Decoder Networks for Action Anticipation. End-to-end Learning of Deterministic Decision Trees. Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2. Detecting and Recognizing Human-Object Interactions. Interpretability Beyond Feature Attribution: Quantitative …

One-Shot Learning for Semantic Segmentation

Title One-Shot Learning for Semantic Segmentation
Authors Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, Byron Boots
Abstract Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster.
Tasks Image Classification, One-Shot Learning, One-Shot Segmentation, Semantic Segmentation
Published 2017-09-11
URL http://arxiv.org/abs/1709.03410v1
PDF http://arxiv.org/pdf/1709.03410v1.pdf
PWC https://paperswithcode.com/paper/one-shot-learning-for-semantic-segmentation
Repo https://github.com/vamsirk/OneShotSemanticSegmentation
Framework none
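
The core mechanism is easy to sketch: a conditioning branch turns a single annotated support image into the parameters of a per-pixel classifier that is then applied to dense query features. The PyTorch sketch below uses masked average pooling and illustrative layer sizes as a stand-in for the authors' VGG-based branches; it is not their exact architecture.

```python
# A minimal sketch of the two-branch one-shot segmentation idea: a conditioning
# branch maps a (support image, mask) pair to weights of a 1x1 classifier that
# is applied to dense query features. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneShotSegSketch(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        # Shared backbone standing in for the paper's FCN feature extractor.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Conditioning branch: pooled support features -> classifier weights + bias.
        self.to_params = nn.Linear(feat_dim, feat_dim + 1)

    def forward(self, support_img, support_mask, query_img):
        s = self.backbone(support_img)                      # (B, C, H, W)
        m = F.interpolate(support_mask, size=s.shape[-2:])  # align mask to features
        # Masked average pooling over the annotated foreground.
        z = (s * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1e-6)
        params = self.to_params(z)                          # (B, C+1)
        w, b = params[:, :-1], params[:, -1]
        q = self.backbone(query_img)                        # dense query features
        # Apply the generated classifier as a per-pixel dot product.
        logits = torch.einsum("bchw,bc->bhw", q, w) + b[:, None, None]
        return logits                                       # foreground score map

model = OneShotSegSketch()
out = model(torch.randn(1, 3, 64, 64), torch.ones(1, 1, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 64, 64])
```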

RED: Reinforced Encoder-Decoder Networks for Action Anticipation

Title RED: Reinforced Encoder-Decoder Networks for Action Anticipation
Authors Jiyang Gao, Zhenheng Yang, Ram Nevatia
Abstract Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance depend on this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations into actions. However, their anticipation is based on a single past frame’s representation, which ignores the history trend, and they can only anticipate a fixed time into the future. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module provides sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on the TVSeries, THUMOS-14, and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all three.
Tasks
Published 2017-07-16
URL http://arxiv.org/abs/1707.04818v1
PDF http://arxiv.org/pdf/1707.04818v1.pdf
PWC https://paperswithcode.com/paper/red-reinforced-encoder-decoder-networks-for
Repo https://github.com/rajskar/CS763Project
Framework pytorch
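
The encoder-decoder rollout and the earliness-oriented reward can be sketched briefly. In the sketch below, the GRU sizes and the 1/t reward shape for the first correct step are illustrative assumptions, not the paper's exact formulation; in training, such a reward would feed a policy-gradient term alongside the usual classification loss.

```python
# A hedged sketch of RED's structure: an encoder summarizes the history trend,
# a decoder rolls out future representations, and a reward favors making the
# correct call early. The 1/t reward shape is an illustrative assumption.
import torch
import torch.nn as nn

class REDSketch(nn.Module):
    def __init__(self, feat_dim=128, num_classes=10, horizon=4):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.decoder = nn.GRUCell(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.horizon = horizon

    def forward(self, history):                 # (B, T, C) past frame features
        _, h = self.encoder(history)            # summarize the history trend
        h, inp = h.squeeze(0), history[:, -1]
        futures = []
        for _ in range(self.horizon):           # roll out future representations
            h = self.decoder(inp, h)
            inp = h
            futures.append(self.classifier(h))  # action scores per future step
        return torch.stack(futures, dim=1)      # (B, horizon, num_classes)

def early_reward(scores, labels):
    """Reward 1/t for the first anticipation step whose argmax is correct."""
    preds = scores.argmax(dim=-1)               # (B, horizon)
    rewards = torch.zeros(scores.shape[0])
    for b in range(scores.shape[0]):
        hits = (preds[b] == labels[b]).nonzero()
        if len(hits) > 0:
            rewards[b] = 1.0 / (hits[0].item() + 1)
    return rewards

scores = REDSketch()(torch.randn(2, 8, 128))
print(early_reward(scores, torch.tensor([3, 7])))  # sequence-level reward per sample
```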

End-to-end Learning of Deterministic Decision Trees

Title End-to-end Learning of Deterministic Decision Trees
Authors Thomas Hehn, Fred A. Hamprecht
Abstract Conventional decision trees have a number of favorable properties, including interpretability, a small computational footprint, and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: being end-to-end trainable and learning from scratch the features that best solve a given supervised learning problem. Recent work (Kontschieder 2015) has addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along only a small subset of tree nodes. We here propose a model and Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time but, after a deterministic annealing process, become deterministic at test time. We also analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split node. In summary, we present the first end-to-end learning scheme for deterministic decision trees and report results on par with or superior to published standard oblique decision tree algorithms.
Tasks
Published 2017-12-07
URL http://arxiv.org/abs/1712.02743v1
PDF http://arxiv.org/pdf/1712.02743v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-deterministic-decision
Repo https://github.com/tomsal/endtoenddecisiontrees
Framework pytorch
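
The probabilistic-to-deterministic routing is the heart of the method and fits in a short sketch: each inner node routes with a sigmoid of an oblique split, and a steepness parameter raised during training anneals routing toward a hard step. The depth-2 tree and parameter shapes below are illustrative, not the authors' exact design.

```python
# A minimal sketch of soft routing with deterministic annealing: sigmoid splits
# at inner nodes, a steepness knob that hardens them, and leaf class mixtures.
import torch
import torch.nn as nn

class SoftTreeSketch(nn.Module):
    def __init__(self, in_dim, num_classes, depth=2):
        super().__init__()
        self.depth = depth
        self.splits = nn.Linear(in_dim, 2 ** depth - 1)   # oblique hyperplanes
        self.leaf_dist = nn.Parameter(torch.zeros(2 ** depth, num_classes))

    def forward(self, x, steepness=1.0):
        # Raising `steepness` over training anneals the sigmoid toward a hard
        # step, so test-time routing becomes deterministic.
        p_right = torch.sigmoid(steepness * self.splits(x))
        mu = x.new_ones(x.shape[0], 1)                    # prob. of reaching root
        for level in range(self.depth):
            n = 2 ** level
            p = p_right[:, n - 1: 2 * n - 1]              # this level's nodes
            mu = torch.stack([mu * (1 - p), mu * p], dim=-1).reshape(-1, 2 * n)
        return mu @ self.leaf_dist.softmax(dim=-1)        # mixture over leaves

tree = SoftTreeSketch(in_dim=4, num_classes=3)
probs = tree(torch.randn(8, 4), steepness=10.0)   # late training: near-hard routing
print(probs.shape, probs.sum(dim=1))              # (8, 3); rows sum to 1
```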

Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2

Title Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2
Authors Federico Baldassarre, Diego González Morín, Lucas Rodés-Guirao
Abstract We review some of the most recent approaches to colorizing gray-scale images using deep learning methods. Inspired by these, we propose a model that combines a deep Convolutional Neural Network trained from scratch with high-level features extracted from the pre-trained Inception-ResNet-v2 model. Thanks to its fully convolutional architecture, our encoder-decoder model can process images of any size and aspect ratio. In addition to presenting the training results, we assess the “public acceptance” of the generated images by means of a user study. Finally, we present a carousel of applications on different types of images, such as historical photographs.
Tasks Colorization
Published 2017-12-09
URL http://arxiv.org/abs/1712.03400v1
PDF http://arxiv.org/pdf/1712.03400v1.pdf
PWC https://paperswithcode.com/paper/deep-koalarization-image-colorization-using
Repo https://github.com/sukkritsharmaofficial/Colourization-of-B-W-photos
Framework none
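
The distinguishing step is the fusion layer: a global Inception-ResNet-v2 embedding is tiled over the spatial grid of the encoder features and concatenated before the decoder predicts the chrominance channels. The dimensions in this sketch are illustrative, and the embedding here is a stand-in vector rather than an actual Inception forward pass.

```python
# A hedged sketch of the fusion step: tile a global image embedding at every
# spatial location of the encoder features, fuse with a 1x1 conv, and decode
# to the two a*b* chrominance channels. Channel sizes are assumptions.
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, enc_ch=256, embed_dim=1000):
        super().__init__()
        self.fuse = nn.Conv2d(enc_ch + embed_dim, enc_ch, kernel_size=1)
        self.decode = nn.Sequential(
            nn.Conv2d(enc_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1), nn.Tanh(),   # a*, b* in [-1, 1]
        )

    def forward(self, enc_feats, embedding):
        b, _, h, w = enc_feats.shape
        # Tile the global embedding over the spatial grid, then fuse.
        tiled = embedding[:, :, None, None].expand(b, -1, h, w)
        x = self.fuse(torch.cat([enc_feats, tiled], dim=1))
        return self.decode(x)                            # predicted chrominance

enc = torch.randn(1, 256, 28, 28)      # encoder output for the L channel
emb = torch.randn(1, 1000)             # stand-in for the Inception embedding
print(FusionSketch()(enc, emb).shape)  # torch.Size([1, 2, 28, 28])
```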

Detecting and Recognizing Human-Object Interactions

Title Detecting and Recognizing Human-Object Interactions
Authors Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He
Abstract To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting <human, verb, object> triplets in challenging everyday photos. We propose a novel model that is driven by a human-centric approach. Our hypothesis is that the appearance of a person (their pose, clothing, action) is a powerful cue for localizing the objects they are interacting with. To exploit this cue, our model learns to predict an action-specific density over target object locations based on the appearance of a detected person. Our model also jointly learns to detect people and objects, and by fusing these predictions it efficiently infers interaction triplets in a clean, jointly trained end-to-end system we call InteractNet. We validate our approach on the recently introduced Verbs in COCO (V-COCO) and HICO-DET datasets, where we show quantitatively compelling results.
Tasks Human-Object Interaction Detection
Published 2017-04-24
URL http://arxiv.org/abs/1704.07333v3
PDF http://arxiv.org/pdf/1704.07333v3.pdf
PWC https://paperswithcode.com/paper/detecting-and-recognizing-human-object
Repo https://github.com/facebookresearch/detectron
Framework pytorch
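
The human-centric cue can be sketched as a small head: from a detected person's features, predict a per-action mean location for the target object and score candidate boxes under a Gaussian density around it. Everything below (feature size, action count, the unit-variance Gaussian) is an illustrative assumption, not InteractNet's exact parameterization.

```python
# A rough sketch of the action-specific target density: per-action 4-d offsets
# (dx, dy, dw, dh) predicted from person features, with candidate object boxes
# scored by an unnormalized unit-variance Gaussian. Shapes are assumptions.
import torch
import torch.nn as nn

class TargetDensitySketch(nn.Module):
    def __init__(self, feat_dim=1024, num_actions=26):
        super().__init__()
        self.mu = nn.Linear(feat_dim, num_actions * 4)  # per-action mean offset
        self.num_actions = num_actions

    def forward(self, person_feat, candidate_offsets):
        # person_feat: (F,); candidate_offsets: (N, 4) encoded object locations.
        mu = self.mu(person_feat).view(self.num_actions, 4)
        diff = candidate_offsets[None, :, :] - mu[:, None, :]   # (A, N, 4)
        # Gaussian score per (action, candidate); fused with detection scores.
        return torch.exp(-0.5 * (diff ** 2).sum(-1))            # (A, N)

scores = TargetDensitySketch()(torch.randn(1024), torch.randn(5, 4))
print(scores.shape)  # torch.Size([26, 5]): density over 5 candidates per action
```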

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

Title Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Authors Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres
Abstract The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result; for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.
Tasks Image Classification
Published 2017-11-30
URL http://arxiv.org/abs/1711.11279v5
PDF http://arxiv.org/pdf/1711.11279v5.pdf
PWC https://paperswithcode.com/paper/interpretability-beyond-feature-attribution
Repo https://github.com/maragraziani/iMIMIC-RCVs
Framework tf
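
TCAV is compact enough to sketch end to end: train a linear classifier to separate concept activations from random activations at one layer (its weight vector is the CAV), then count how often the class logit increases along the CAV direction. The toy linear "head" and finite-difference step below are assumptions standing in for a real network's layer and gradient.

```python
# A compact sketch of TCAV: fit a CAV at one layer, then score the fraction of
# examples whose class logit grows along the CAV (a finite-difference stand-in
# for the directional derivative). The toy head below is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit concept-vs-random at one layer; the boundary normal is the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(acts, class_logit_fn, cav, eps=1e-2):
    """Fraction of inputs whose class logit increases along the CAV."""
    return float(np.mean(class_logit_fn(acts + eps * cav) - class_logit_fn(acts) > 0))

rng = np.random.default_rng(0)
concept, random_ex = rng.normal(1, 1, (50, 16)), rng.normal(0, 1, (50, 16))
cav = compute_cav(concept, random_ex)
head = rng.normal(size=16) + 2 * cav          # class logit sensitive to the concept
score = tcav_score(rng.normal(size=(100, 16)), lambda a: a @ head, cav)
print(f"TCAV score: {score:.2f}")             # close to 1.0 for this toy head
```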

Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition

Title Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition
Authors Chunyu Xie, Ce Li, Baochang Zhang, Chen Chen, Jungong Han
Abstract Gesture recognition is a challenging problem in the field of biometrics. In this paper, we integrate the Fisher criterion into the Bidirectional Long Short-Term Memory (BLSTM) network and the Bidirectional Gated Recurrent Unit (BGRU), leading to two new deep models termed F-BLSTM and F-BGRU. Both Fisher discriminative deep models effectively classify gestures by analyzing the acceleration and angular velocity data of human gestures. Moreover, we collect a large Mobile Gesture Database (MGD) of accelerations and angular velocities containing 5547 sequences of 12 gestures. Extensive experiments validate the superior performance of the proposed networks compared to the state-of-the-art BLSTM and BGRU on the MGD database and two benchmark databases (i.e., the BUAA mobile gesture and SmartWatch gesture databases).
Tasks Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition
Published 2017-07-12
URL http://arxiv.org/abs/1707.03692v1
PDF http://arxiv.org/pdf/1707.03692v1.pdf
PWC https://paperswithcode.com/paper/deep-fisher-discriminant-learning-for-mobile
Repo https://github.com/chriswegmann/drone_steering
Framework none
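
The Fisher criterion amounts to an extra loss term on the recurrent features: penalize within-class scatter, reward between-class scatter. The sketch below shows that term alone; the 0.1 weight and the BLSTM feature extractor in the usage comment are assumptions.

```python
# A hedged sketch of a Fisher discriminant loss on batch features: within-class
# variance minus between-class variance, added to the usual cross-entropy.
import torch

def fisher_loss(features, labels):
    """Within-class scatter minus between-class scatter over a batch."""
    classes = labels.unique()
    global_mean = features.mean(dim=0)
    within, between = 0.0, 0.0
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(dim=0)
        within = within + ((fc - mu_c) ** 2).sum(dim=1).mean()
        between = between + ((mu_c - global_mean) ** 2).sum()
    return within / len(classes) - between / len(classes)

# Usage inside a training step (BLSTM/BGRU feature extractor assumed):
#   feats = blstm(accel_and_gyro_seq)          # (B, D) sequence embedding
#   loss = F.cross_entropy(head(feats), y) + 0.1 * fisher_loss(feats, y)
feats, y = torch.randn(32, 64), torch.randint(0, 12, (32,))
print(fisher_loss(feats, y).item())
```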

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

Title Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Authors Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
Abstract Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These gains also yield improvements over the current state of the art on several well-studied tasks.
Tasks Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging, Transfer Learning
Published 2017-03-18
URL http://arxiv.org/abs/1703.06345v1
PDF http://arxiv.org/pdf/1703.06345v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-sequence-tagging-with
Repo https://github.com/jiesutd/NCRFpp
Framework pytorch
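
The sharing scheme can be sketched as a shared encoder with task-specific output layers: the source and target taggers reuse the embedding and recurrent layers, and alternating training steps move representation knowledge across tasks. Vocabulary and hidden sizes below are illustrative, and the paper's character-level layers and CRF outputs are omitted.

```python
# A minimal sketch of cross-task parameter sharing for sequence tagging:
# shared embeddings and BiLSTM, per-task linear heads. Sizes are assumptions.
import torch
import torch.nn as nn

class SharedTaggerSketch(nn.Module):
    def __init__(self, vocab=10000, emb=100, hidden=200,
                 tags={"pos": 45, "ner": 9}):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)                  # shared
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)             # shared
        self.heads = nn.ModuleDict(
            {task: nn.Linear(2 * hidden, n) for task, n in tags.items()})

    def forward(self, tokens, task):
        h, _ = self.encoder(self.embed(tokens))
        return self.heads[task](h)           # per-token tag logits for this task

model = SharedTaggerSketch()
# Training alternates tasks; gradients through embed/encoder carry the transfer.
pos_logits = model(torch.randint(0, 10000, (2, 7)), task="pos")
ner_logits = model(torch.randint(0, 10000, (2, 7)), task="ner")
print(pos_logits.shape, ner_logits.shape)    # (2, 7, 45) (2, 7, 9)
```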

Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders

Title Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders
Authors Momchil Peychev, Petar Veličković, Pietro Liò
Abstract The notion of disentangled autoencoders was proposed as an extension to the variational autoencoder by introducing a disentanglement parameter $\beta$, which controls the learning pressure put on the possible underlying latent representations. For certain values of $\beta$, such autoencoders are capable of encoding independent input generative factors in separate elements of the code, leading to more interpretable and predictable model behaviour. In this paper we quantify the effects of the parameter $\beta$ on model performance and disentanglement. After training multiple models with the same value of $\beta$, we establish the existence of consistent variance in one of the disentanglement measures proposed in the literature. We also assess the negative consequences of disentanglement for the autoencoder’s discriminative ability while varying the number of examples available during training.
Tasks
Published 2017-11-24
URL http://arxiv.org/abs/1711.09159v1
PDF http://arxiv.org/pdf/1711.09159v1.pdf
PWC https://paperswithcode.com/paper/quantifying-the-effects-of-enforcing
Repo https://github.com/mpeychev/disentangled-autoencoders
Framework tf
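
The $\beta$-weighted objective the abstract studies is short enough to state directly: the standard VAE reconstruction term plus $\beta$ times the KL to the unit-Gaussian prior, where $\beta = 1$ recovers the plain VAE and larger $\beta$ applies more disentanglement pressure. A minimal sketch, assuming a Bernoulli decoder:

```python
# The beta-VAE objective as a short sketch: reconstruction + beta * KL.
import torch
import torch.nn.functional as F

def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    # Reconstruction term for a Bernoulli decoder -> binary cross-entropy.
    recon_term = F.binary_cross_entropy(recon, x, reduction="sum")
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + beta * kl

x = torch.rand(8, 784)
mu, logvar = torch.zeros(8, 10), torch.zeros(8, 10)
print(beta_vae_loss(x, x, mu, logvar).item())  # KL term is 0 at the prior
```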

Robust SfM with Little Image Overlap

Title Robust SfM with Little Image Overlap
Authors Yohann Salaun, Renaud Marlet, Pascal Monasse
Abstract Usual Structure-from-Motion (SfM) techniques require at least trifocal overlaps to calibrate cameras and reconstruct a scene. We consider here scenarios of reduced image sets with little overlap, possibly with as few as two images seeing the same part of the scene. We propose a new method, based on line coplanarity hypotheses, for estimating the relative scale of two independent bifocal calibrations sharing a camera, without the need for any trifocal information or Manhattan-world assumption. We use it to compute SfM in a chain of up-to-scale relative motions. For accuracy, however, we also make use of trifocal information for line and/or point features when present, relaxing the usual trifocal constraints. For robustness to wrong assumptions and mismatches, we embed all constraints in a parameterless RANSAC-like approach. Experiments show that we can calibrate datasets that previously could not be calibrated, and that this wider applicability does not come at the cost of accuracy.
Tasks
Published 2017-03-23
URL http://arxiv.org/abs/1703.07957v2
PDF http://arxiv.org/pdf/1703.07957v2.pdf
PWC https://paperswithcode.com/paper/robust-sfm-with-little-image-overlap
Repo https://github.com/ySalaun/LineSfM
Framework none
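
One ingredient lends itself to a short sketch: choosing the relative scale between two bifocal calibrations by consensus over per-hypothesis scale ratios (e.g., from coplanar line pairs). The sketch below is a plain 1-D RANSAC-style vote with a fixed tolerance, not the authors' parameterless formulation; the `tol` threshold and synthetic ratios are assumptions.

```python
# A loose sketch of scale selection by consensus: sample a scale hypothesis,
# count inliers in log-space, refit on the best inlier set.
import numpy as np

def consensus_scale(ratios, tol=0.02, iters=200, seed=0):
    """ratios: candidate scale hypotheses -> scale with the largest inlier set."""
    rng = np.random.default_rng(seed)
    best_scale, best_count = None, -1
    for _ in range(iters):
        s = ratios[rng.integers(len(ratios))]          # sample one hypothesis
        inliers = np.abs(np.log(ratios) - np.log(s)) < tol
        if inliers.sum() > best_count:
            best_count = inliers.sum()
            best_scale = np.exp(np.log(ratios[inliers]).mean())  # refit
    return best_scale

ratios = np.concatenate([np.random.default_rng(1).normal(2.0, 0.01, 40),  # true scale
                         np.random.default_rng(2).uniform(0.5, 5.0, 20)]) # outliers
print(consensus_scale(ratios))  # ~2.0: the scale linking the two calibrations
```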

Wasserstein GAN

Title Wasserstein GAN
Authors Martin Arjovsky, Soumith Chintala, Léon Bottou
Abstract We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions.
Tasks Image Generation
Published 2017-01-26
URL http://arxiv.org/abs/1701.07875v3
PDF http://arxiv.org/pdf/1701.07875v3.pdf
PWC https://paperswithcode.com/paper/wasserstein-gan
Repo https://github.com/karl-hajjar/Generative-Adversarial-Networks
Framework pytorch
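
The original WGAN algorithm is simple enough to reproduce in miniature: train the critic several steps per generator step to maximize critic(real) - critic(fake), and clip its weights to keep it approximately 1-Lipschitz. Network sizes, learning rate, and the toy Gaussian data below are illustrative; RMSProp and the 0.01 clip value follow the paper.

```python
# A brief sketch of WGAN training with weight clipping on toy 2-D data.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

for step in range(100):
    for _ in range(5):                       # n_critic updates per generator step
        real = torch.randn(64, 2) + 3.0      # toy data distribution
        fake = generator(torch.randn(64, 8)).detach()
        loss_c = critic(fake).mean() - critic(real).mean()  # -Wasserstein estimate
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():        # clip to enforce the Lipschitz bound
            p.data.clamp_(-0.01, 0.01)
    loss_g = -critic(generator(torch.randn(64, 8))).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(f"final critic loss: {loss_c.item():.3f}")
```

The negated critic loss tracks (an estimate of) the Wasserstein distance, which is what gives WGAN its meaningful learning curves.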

Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search

Title Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search
Authors Luigi Acerbi, Wei Ji Ma
Abstract Computational models in fields such as computational neuroscience are often evaluated via stochastic simulation or numerical approximation. Fitting these models implies a difficult optimization problem over complex, possibly noisy parameter landscapes. Bayesian optimization (BO) has been successfully applied to solving expensive black-box problems in engineering and machine learning. Here we explore whether BO can be applied as a general tool for model fitting. First, we present a novel hybrid BO algorithm, Bayesian adaptive direct search (BADS), that achieves competitive performance with an affordable computational overhead for the running time of typical models. We then perform an extensive benchmark of BADS vs. many common and state-of-the-art nonconvex, derivative-free optimizers, on a set of model-fitting problems with real data and models from six studies in behavioral, cognitive, and computational neuroscience. With default settings, BADS consistently finds comparable or better solutions than other methods, including ‘vanilla’ BO, showing great promise for advanced BO techniques, and BADS in particular, as a general model-fitting tool.
Tasks
Published 2017-05-11
URL http://arxiv.org/abs/1705.04405v2
PDF http://arxiv.org/pdf/1705.04405v2.pdf
PWC https://paperswithcode.com/paper/practical-bayesian-optimization-for-model
Repo https://github.com/lacerbi/bads
Framework none
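
The direct-search backbone that BADS builds on can be sketched in a few lines: poll the objective at mesh points around the incumbent, expand the mesh on success, and shrink it on failure. The actual BADS interleaves this with a Gaussian-process surrogate, which is omitted here; the reference implementation in the repo above is in MATLAB, so this Python sketch is an illustration, not its API.

```python
# A stripped-down coordinate poll direct search (no GP surrogate): expand the
# mesh when a poll improves the incumbent, otherwise shrink it.
import numpy as np

def poll_direct_search(f, x0, mesh=1.0, min_mesh=1e-6, max_iter=500):
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if mesh < min_mesh:
            break
        improved = False
        for i in range(len(x)):                # poll +/- along each coordinate
            for sign in (+1.0, -1.0):
                cand = x.copy()
                cand[i] += sign * mesh
                fc = f(cand)
                if fc < fx:                    # accept improvements greedily
                    x, fx, improved = cand, fc, True
        mesh = mesh * 2.0 if improved else mesh / 2.0
    return x, fx

# Toy usage on a noiseless quadratic (BADS itself targets noisy model fits).
x_opt, f_opt = poll_direct_search(lambda z: ((z - 3.0) ** 2).sum(), [0.0, 0.0])
print(x_opt, f_opt)  # near [3, 3], objective near 0
```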

Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity

Title Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Authors Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande
Abstract Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction.
Tasks Drug Discovery
Published 2017-03-30
URL http://arxiv.org/abs/1703.10603v1
PDF http://arxiv.org/pdf/1703.10603v1.pdf
PWC https://paperswithcode.com/paper/atomic-convolutional-networks-for-predicting
Repo https://github.com/deepchem/deepchem
Framework tf
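
The radial featurization behind atomic convolutions lends itself to a short numpy sketch: for each atom, pairwise distances to neighbors are pooled through Gaussian radial filters with a hard distance cutoff, yielding per-atom interaction features that downstream layers map to energies. Filter centers, width, and cutoff below are illustrative choices, not the paper's exact settings (the DeepChem repo above has the full model).

```python
# A hedged sketch of per-atom radial features: distance matrix -> Gaussian
# radial filter responses -> sum-pool over neighbors within a cutoff.
import numpy as np

def atomic_conv_features(coords, r_centers, sigma=0.5, cutoff=8.0):
    """coords: (N, 3) atom positions -> (N, len(r_centers)) radial features."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                 # (N, N) distances
    np.fill_diagonal(dist, np.inf)                       # exclude self-pairs
    within = dist < cutoff                               # hard distance cutoff
    # Gaussian radial filter response for each (atom, neighbor, filter).
    resp = np.exp(-((dist[..., None] - r_centers) ** 2) / (2 * sigma ** 2))
    return (resp * within[..., None]).sum(axis=1)        # pool over neighbors

atoms = np.random.default_rng(0).uniform(0, 10, size=(20, 3))
feats = atomic_conv_features(atoms, r_centers=np.arange(1.0, 8.0, 1.0))
print(feats.shape)  # (20, 7); binding energy is then E(complex) - E(protein) - E(ligand)
```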

Gradient Estimators for Implicit Models

Title Gradient Estimators for Implicit Models
Authors Yingzhen Li, Richard E. Turner
Abstract Implicit models, which allow for the generation of samples but not for point-wise evaluation of probabilities, are omnipresent in real-world problems tackled by machine learning and a hot topic of current research. Some examples include data simulators that are widely used in engineering and scientific research, generative adversarial networks (GANs) for image synthesis, and hot-off-the-press approximate inference techniques relying on implicit distributions. The majority of existing approaches to learning implicit models rely on approximating the intractable distribution or optimisation objective for gradient-based optimisation, which is liable to produce inaccurate updates and thus poor models. This paper alleviates the need for such approximations by proposing the Stein gradient estimator, which directly estimates the score function of the implicitly defined distribution. The efficacy of the proposed estimator is empirically demonstrated by examples that include meta-learning for approximate inference, and entropy regularised GANs that provide improved sample diversity.
Tasks Image Generation, Meta-Learning
Published 2017-05-19
URL http://arxiv.org/abs/1705.07107v5
PDF http://arxiv.org/pdf/1705.07107v5.pdf
PWC https://paperswithcode.com/paper/gradient-estimators-for-implicit-models
Repo https://github.com/YingzhenLi/SteinGrad
Framework tf
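
The estimator itself is a single linear solve. Under my reading of the abstract: given samples from an implicit $q$, Stein's identity yields a kernel system whose ridge-regularized solution estimates the score $\nabla_x \log q$ at the samples. A numpy sketch with an RBF kernel; the bandwidth and ridge coefficient are assumptions (the repo above has the authors' version).

```python
# A numpy sketch of the Stein gradient estimator with an RBF kernel:
# G = -(K + eta*I)^{-1} B, where B_j = sum_i grad_{x_i} k(x_i, x_j).
import numpy as np

def stein_gradient_estimator(X, sigma=1.0, eta=0.01):
    """X: (n, d) samples -> (n, d) estimated grad log q at each sample."""
    diff = X[:, None, :] - X[None, :, :]                 # (n, n, d)
    K = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))  # RBF Gram matrix
    # Closed form of sum_i grad_{x_i} k(x_i, x_j) for the RBF kernel.
    grad_K = (K.sum(0)[:, None] * X - K.T @ X) / sigma ** 2
    n = X.shape[0]
    return -np.linalg.solve(K + eta * np.eye(n), grad_K)

# Sanity check on a standard normal, where grad log q(x) = -x.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
G = stein_gradient_estimator(X)
print(np.mean(np.abs(G - (-X))))  # roughly recovers -x on average
```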

Hierarchical 3D fully convolutional networks for multi-organ segmentation

Title Hierarchical 3D fully convolutional networks for multi-organ segmentation
Authors Holger R. Roth, Hirohisa Oda, Yuichiro Hayashi, Masahiro Oda, Natsuki Shimizu, Michitaka Fujiwara, Kazunari Misawa, Kensaku Mori
Abstract Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of full volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of seven abdominal structures (artery, vein, liver, spleen, stomach, gallbladder, and pancreas) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training organ-specific models. To this end, we propose a two-stage, coarse-to-fine approach that trains an FCN model to roughly delineate the organs of interest in the first stage (seeing $\sim$40% of the voxels within a simple, automatically generated binary mask of the patient’s body). We then use these predictions of the first-stage FCN to define a candidate region that will be used to train a second FCN. This step reduces the number of voxels the FCN has to classify to $\sim$10% while maintaining a high recall of $>$99%. This second-stage FCN can now focus on more detailed segmentation of the organs. We utilize training and validation sets consisting of 281 and 50 clinical CT images, respectively. Our hierarchical approach improves the Dice score by 7.5 percentage points per organ on average in our validation set. We furthermore test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans with three anatomical labels (liver, spleen, and pancreas). In such challenging organs as the pancreas, our hierarchical approach improves the mean Dice score from 68.5% to 82.2%, achieving the highest reported average score on this dataset.
Tasks
Published 2017-04-21
URL http://arxiv.org/abs/1704.06382v1
PDF http://arxiv.org/pdf/1704.06382v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-3d-fully-convolutional-networks
Repo https://github.com/holgerroth/3Dunet_abdomen_cascade
Framework none
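
The coarse-to-fine pipeline reduces to a simple pattern: the first-stage prediction's bounding box (plus a margin) defines the candidate region, and the second stage segments only that crop. In the sketch below, both stages are stand-in callables and the margin is an illustrative choice, not the authors' 3D FCNs.

```python
# A short sketch of two-stage, coarse-to-fine volumetric segmentation.
import numpy as np

def candidate_region(coarse_mask, margin=8):
    """Bounding box of the positive voxels, expanded by a safety margin."""
    idx = np.argwhere(coarse_mask > 0)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, coarse_mask.shape)
    return tuple(slice(l, h) for l, h in zip(lo, hi))

def coarse_to_fine(volume, stage1, stage2):
    region = candidate_region(stage1(volume))       # ~10% of voxels remain
    fine = np.zeros(volume.shape, dtype=np.int64)
    fine[region] = stage2(volume[region])           # detailed labels in the crop
    return fine

# Toy usage with dummy predictors on a 64^3 CT-like volume.
vol = np.random.rand(64, 64, 64)
stage1 = lambda v: (v > 0.95).astype(np.int64)      # rough foreground proposal
stage2 = lambda v: (v > 0.5).astype(np.int64)       # refined labels
print(coarse_to_fine(vol, stage1, stage2).shape)    # (64, 64, 64)
```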