Paper Group AWR 103
One-Shot Learning for Semantic Segmentation. RED: Reinforced Encoder-Decoder Networks for Action Anticipation. End-to-end Learning of Deterministic Decision Trees. Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2. Detecting and Recognizing Human-Object Interactions. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition. Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks. Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders. Robust SfM with Little Image Overlap. Wasserstein GAN. Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search. Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity. Gradient Estimators for Implicit Models. Hierarchical 3D fully convolutional networks for multi-organ segmentation.
One-Shot Learning for Semantic Segmentation
Title | One-Shot Learning for Semantic Segmentation |
Authors | Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, Byron Boots |
Abstract | Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster. |
Tasks | Image Classification, One-Shot Learning, One-Shot Segmentation, Semantic Segmentation |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03410v1 |
PDF | http://arxiv.org/pdf/1709.03410v1.pdf
PWC | https://paperswithcode.com/paper/one-shot-learning-for-semantic-segmentation |
Repo | https://github.com/vamsirk/OneShotSemanticSegmentation |
Framework | none |
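The core mechanism here is a conditioning branch that emits the parameters of a pixel-level classifier applied to the query image. A minimal PyTorch sketch of that idea follows; the `OneShotSegHead` name, the masked average pooling, and the layer sizes are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneShotSegHead(nn.Module):
    """Conditioning branch: turns one annotated support image into the
    (weight, bias) of a 1x1 'FCN' classifier applied to query features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Maps a masked support embedding to parameters of a 1x1 conv.
        self.param_head = nn.Linear(feat_dim, feat_dim + 1)

    def forward(self, support_feat, support_mask, query_feat):
        # support_feat: (B, C, H, W); support_mask: (B, 1, H, W) in {0, 1}
        mask = F.interpolate(support_mask, size=support_feat.shape[-2:])
        pooled = (support_feat * mask).sum((2, 3)) / mask.sum((2, 3)).clamp(min=1)
        params = self.param_head(pooled)               # (B, C+1)
        w, b = params[:, :-1], params[:, -1]           # generated classifier
        # Apply the generated classifier densely over the query feature map.
        logits = torch.einsum('bchw,bc->bhw', query_feat, w) + b[:, None, None]
        return logits  # per-pixel foreground score for the novel class
```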
RED: Reinforced Encoder-Decoder Networks for Action Anticipation
Title | RED: Reinforced Encoder-Decoder Networks for Action Anticipation |
Authors | Jiyang Gao, Zhenheng Yang, Ram Nevatia |
Abstract | Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance depend on this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations into actions. However, the anticipation is based on a single past frame’s representation, which ignores the history trend, and it can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on the TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all datasets. |
Tasks | |
Published | 2017-07-16 |
URL | http://arxiv.org/abs/1707.04818v1 |
PDF | http://arxiv.org/pdf/1707.04818v1.pdf
PWC | https://paperswithcode.com/paper/red-reinforced-encoder-decoder-networks-for |
Repo | https://github.com/rajskar/CS763Project |
Framework | pytorch |
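RED's central component is an encoder-decoder over feature sequences: the encoder summarizes the observed history, and the decoder rolls out anticipated future representations, each fed to an action classifier. The sketch below captures that loop in PyTorch under assumed dimensions; the reinforcement module providing the sequence-level reward is omitted here.

```python
import torch
import torch.nn as nn

class REDSketch(nn.Module):
    # Encoder consumes a history of frame features; the decoder anticipates
    # a sequence of future features, each classified into an action.
    def __init__(self, feat_dim=1024, hidden=512, n_classes=21, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(feat_dim, hidden)
        self.to_feat = nn.Linear(hidden, feat_dim)     # anticipated representation
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, history):                # history: (B, T, feat_dim)
        _, (h, c) = self.encoder(history)
        h, c = h[0], c[0]
        x = history[:, -1]                     # seed decoder with last observation
        feats, logits = [], []
        for _ in range(self.horizon):          # anticipate one step at a time
            h, c = self.decoder(x, (h, c))
            x = self.to_feat(h)
            feats.append(x)
            logits.append(self.classifier(x))
        return torch.stack(feats, 1), torch.stack(logits, 1)
```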
End-to-end Learning of Deterministic Decision Trees
Title | End-to-end Learning of Deterministic Decision Trees |
Authors | Thomas Hehn, Fred A. Hamprecht |
Abstract | Conventional decision trees have a number of favorable properties, including interpretability, a small computational footprint and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable, learning from scratch the features that best solve a given supervised learning problem. Recent work (Kontschieder 2015) has addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along a small subset of tree nodes only. Here we propose a model and an Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time, but become deterministic at test time after a deterministic annealing process. We also analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split node. In summary, we present the first end-to-end learning scheme for deterministic decision trees and report results on par with or superior to published standard oblique decision tree algorithms. |
Tasks | |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02743v1 |
PDF | http://arxiv.org/pdf/1712.02743v1.pdf
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-deterministic-decision |
Repo | https://github.com/tomsal/endtoenddecisiontrees |
Framework | pytorch |
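The train-time/test-time distinction rests on soft routing at each split: a sigmoid of an oblique split function gives the probability of going left, and annealing its steepness drives the routing toward a hard decision. A toy sketch, with a made-up split and an illustrative annealing schedule:

```python
import numpy as np

def route_left_prob(x, w, b, steepness):
    # Soft routing at one oblique split: P(go left) = sigmoid(s * (w.x + b)).
    # As the annealed steepness s grows, routing hardens into a deterministic
    # decision, which is the test-time behaviour the paper targets.
    return 1.0 / (1.0 + np.exp(-steepness * (x @ w + b)))

rng = np.random.default_rng(0)
x, w, b = rng.normal(size=5), rng.normal(size=5), 0.1
for s in (1.0, 10.0, 100.0):           # illustrative annealing schedule
    print(f"steepness={s:>5}: P(left) = {route_left_prob(x, w, b, s):.4f}")
```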
Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2
Title | Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2 |
Authors | Federico Baldassarre, Diego González Morín, Lucas Rodés-Guirao |
Abstract | We review some of the most recent approaches to colorizing gray-scale images using deep learning methods. Inspired by these, we propose a model which combines a deep Convolutional Neural Network trained from scratch with high-level features extracted from the pre-trained Inception-ResNet-v2 model. Thanks to its fully convolutional architecture, our encoder-decoder model can process images of any size and aspect ratio. In addition to presenting the training results, we assess the “public acceptance” of the generated images by means of a user study. Finally, we present a carousel of applications on different types of images, such as historical photographs. |
Tasks | Colorization |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.03400v1 |
PDF | http://arxiv.org/pdf/1712.03400v1.pdf
PWC | https://paperswithcode.com/paper/deep-koalarization-image-colorization-using |
Repo | https://github.com/sukkritsharmaofficial/Colourization-of-B-W-photos |
Framework | none |
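The fusion step tiles a global Inception-ResNet-v2 embedding across the encoder's spatial grid and mixes the two with a 1x1 convolution. A hedged PyTorch sketch, with channel counts chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    # Tiles a global image embedding (e.g. from Inception-ResNet-v2) over the
    # encoder's spatial grid and mixes the two with a 1x1 convolution.
    def __init__(self, enc_ch=256, embed_dim=1001, out_ch=256):
        super().__init__()
        self.mix = nn.Conv2d(enc_ch + embed_dim, out_ch, kernel_size=1)

    def forward(self, enc_feat, embedding):
        # enc_feat: (B, enc_ch, H, W); embedding: (B, embed_dim)
        B, _, H, W = enc_feat.shape
        tiled = embedding[:, :, None, None].expand(B, -1, H, W)
        return torch.relu(self.mix(torch.cat([enc_feat, tiled], dim=1)))
```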
Detecting and Recognizing Human-Object Interactions
Title | Detecting and Recognizing Human-Object Interactions |
Authors | Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He |
Abstract | To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting <human, verb, object> triplets in challenging everyday photos. We propose a novel model that is driven by a human-centric approach. Our hypothesis is that the appearance of a person – their pose, clothing, action – is a powerful cue for localizing the objects they are interacting with. To exploit this cue, our model learns to predict an action-specific density over target object locations based on the appearance of a detected person. Our model also jointly learns to detect people and objects, and by fusing these predictions it efficiently infers interaction triplets in a clean, jointly trained end-to-end system we call InteractNet. We validate our approach on the recently introduced Verbs in COCO (V-COCO) and HICO-DET datasets, where we show quantitatively compelling results. |
Tasks | Human-Object Interaction Detection |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07333v3 |
PDF | http://arxiv.org/pdf/1704.07333v3.pdf
PWC | https://paperswithcode.com/paper/detecting-and-recognizing-human-object |
Repo | https://github.com/facebookresearch/detectron |
Framework | pytorch |
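The human-centric branch can be read as a Gaussian scoring rule: from a detected person's appearance, an action-specific regressor predicts a mean offset to the target object, and candidate objects are scored by a density over that offset. A toy numeric sketch of the scoring (the `interaction_score` helper and its parameters are hypothetical):

```python
import numpy as np

def interaction_score(human_box, object_box, mu_offset, sigma=20.0):
    # An action-specific branch predicts a mean offset mu from the person to
    # the target object; candidate objects are scored by a Gaussian density
    # over their actual offset relative to the person.
    def center(b):
        return np.array([(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0])
    z = (center(object_box) - center(human_box) - mu_offset) / sigma
    return float(np.exp(-0.5 * z @ z))     # unnormalised density

# An object near the predicted offset scores far higher than a distant one.
human = (0, 0, 100, 200)
print(interaction_score(human, (60, 80, 100, 120), mu_offset=np.array([30.0, 0.0])))
print(interaction_score(human, (300, 80, 340, 120), mu_offset=np.array([30.0, 0.0])))
```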
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Title | Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) |
Authors | Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres |
Abstract | The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result – for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application. |
Tasks | Image Classification |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11279v5 |
PDF | http://arxiv.org/pdf/1711.11279v5.pdf
PWC | https://paperswithcode.com/paper/interpretability-beyond-feature-attribution |
Repo | https://github.com/maragraziani/iMIMIC-RCVs |
Framework | tf |
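TCAV is compact enough to sketch directly: a CAV is the normal of a linear classifier separating a concept's activations from random activations at a chosen layer, and the TCAV score is the fraction of examples whose class-logit gradient has a positive component along that vector. A sketch assuming activations and gradients have been computed elsewhere:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cav(concept_acts, random_acts):
    # A CAV is the normal to a linear boundary separating activations of
    # concept examples from random examples at a chosen layer.
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    v = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(grads, cav_vec):
    # grads: per-example gradients of the class logit w.r.t. the layer's
    # activations. The TCAV score is the fraction of positive directional
    # derivatives along the CAV.
    return float(np.mean(grads @ cav_vec > 0))
```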
Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition
Title | Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition |
Authors | Chunyu Xie, Ce Li, Baochang Zhang, Chen Chen, Jungong Han |
Abstract | Gesture recognition is a challenging problem in the field of biometrics. In this paper, we integrate the Fisher criterion into Bidirectional Long Short-Term Memory (BLSTM) and Bidirectional Gated Recurrent Unit (BGRU) networks, leading to two new deep models termed F-BLSTM and F-BGRU. Both Fisher discriminative deep models can effectively classify gestures by analyzing the acceleration and angular-velocity data of human gestures. Moreover, we collect a large Mobile Gesture Database (MGD) based on accelerations and angular velocities, containing 5547 sequences of 12 gestures. Extensive experiments validate the superior performance of the proposed networks compared to the state-of-the-art BLSTM and BGRU on the MGD database and two benchmark databases (i.e., BUAA mobile gesture and SmartWatch gesture). |
Tasks | Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03692v1 |
PDF | http://arxiv.org/pdf/1707.03692v1.pdf
PWC | https://paperswithcode.com/paper/deep-fisher-discriminant-learning-for-mobile |
Repo | https://github.com/chriswegmann/drone_steering |
Framework | none |
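The Fisher criterion added to the BLSTM/BGRU training pulls features toward their class mean while pushing class means apart. A minimal PyTorch sketch of such a discriminative term (the paper's exact formulation and loss weighting may differ; the batch must contain at least two classes):

```python
import torch

def fisher_loss(features, labels, eps=1e-6):
    # Fisher-style criterion on sequence embeddings: minimise within-class
    # scatter relative to between-class scatter. A sketch of the term only.
    classes = labels.unique()
    means = torch.stack([features[labels == c].mean(0) for c in classes])
    within = torch.stack([
        ((features[labels == c] - means[i]) ** 2).sum(1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    between = torch.pdist(means).pow(2).mean()   # pairwise class-mean spread
    return within / (between + eps)
```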
Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Title | Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks |
Authors | Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen |
Abstract | Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These gains translate into improvements over the current state of the art on several well-studied tasks. |
Tasks | Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging, Transfer Learning |
Published | 2017-03-18 |
URL | http://arxiv.org/abs/1703.06345v1 |
PDF | http://arxiv.org/pdf/1703.06345v1.pdf
PWC | https://paperswithcode.com/paper/transfer-learning-for-sequence-tagging-with |
Repo | https://github.com/jiesutd/NCRFpp |
Framework | pytorch |
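Architecturally, the transfer setup shares the recurrent encoder across tasks while keeping task-specific output layers. A condensed PyTorch sketch; the sizes are placeholders and a softmax head stands in for the CRF layer used in the paper:

```python
import torch.nn as nn

class SharedTagger(nn.Module):
    # Hierarchical recurrent tagger: the embedding and BiLSTM encoder are
    # shared (and transferred) across tasks, while each task keeps its own
    # output layer.
    def __init__(self, vocab=10000, emb=100, hidden=200, task_labels=(45, 9)):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)        # shared parameters
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden, n) for n in task_labels)

    def forward(self, tokens, task_id):
        h, _ = self.encoder(self.embed(tokens))
        return self.heads[task_id](h)      # per-token label scores
```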
Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders
Title | Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders |
Authors | Momchil Peychev, Petar Veličković, Pietro Liò |
Abstract | The notion of disentangled autoencoders was proposed as an extension to the variational autoencoder by introducing a disentanglement parameter $\beta$, controlling the learning pressure put on the possible underlying latent representations. For certain values of $\beta$ this kind of autoencoder is capable of encoding independent input generative factors in separate elements of the code, leading to more interpretable and predictable model behaviour. In this paper we quantify the effects of the parameter $\beta$ on the model performance and degree of disentanglement. After training multiple models with the same value of $\beta$, we establish the existence of consistent variance in one of the disentanglement measures proposed in the literature. The negative consequences of disentanglement for the autoencoder’s discriminative ability are also demonstrated while varying the amount of examples available during training. |
Tasks | |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09159v1 |
PDF | http://arxiv.org/pdf/1711.09159v1.pdf
PWC | https://paperswithcode.com/paper/quantifying-the-effects-of-enforcing |
Repo | https://github.com/mpeychev/disentangled-autoencoders |
Framework | tf |
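The disentanglement parameter $\beta$ enters the objective as a multiplier on the KL term, so the loss is easy to state concretely. A standard sketch (with a Bernoulli reconstruction term; $\beta = 1$ recovers the plain VAE):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    # beta-VAE objective: reconstruction term plus a KL term scaled by the
    # disentanglement parameter beta. Larger beta increases the pressure
    # toward factorised (disentangled) latent codes.
    recon_term = F.binary_cross_entropy(recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + beta * kl
```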
Robust SfM with Little Image Overlap
Title | Robust SfM with Little Image Overlap |
Authors | Yohann Salaün, Renaud Marlet, Pascal Monasse |
Abstract | Usual Structure-from-Motion (SfM) techniques require at least trifocal overlaps to calibrate cameras and reconstruct a scene. We consider here scenarios of reduced image sets with little overlap, possibly as low as two images at most seeing the same part of the scene. We propose a new method, based on line coplanarity hypotheses, for estimating the relative scale of two independent bifocal calibrations sharing a camera, without the need for any trifocal information or Manhattan-world assumption. We use it to compute SfM in a chain of up-to-scale relative motions. For accuracy, however, we also make use of trifocal information for line and/or point features, when present, relaxing the usual trifocal constraints. For robustness to wrong assumptions and mismatches, we embed all constraints in a parameterless RANSAC-like approach. Experiments show that we can calibrate datasets that previously could not be, and that this wider applicability does not come at the cost of inaccuracy. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.07957v2 |
PDF | http://arxiv.org/pdf/1703.07957v2.pdf
PWC | https://paperswithcode.com/paper/robust-sfm-with-little-image-overlap |
Repo | https://github.com/ySalaun/LineSfM |
Framework | none |
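Once each bifocal pair's relative scale has been estimated from line coplanarity, the reconstruction reduces to composing a chain of scaled relative motions. A small sketch of that composition step (the scale estimation itself is not shown):

```python
import numpy as np

def chain_motions(motions, scales):
    # Composes a chain of up-to-scale relative motions (R, t), once each
    # link's relative scale has been estimated (e.g. from the paper's line
    # coplanarity hypotheses). Poses are expressed in the first camera frame.
    R_acc, t_acc = np.eye(3), np.zeros(3)
    poses = [(R_acc.copy(), t_acc.copy())]
    for (R, t), s in zip(motions, scales):
        t_acc = t_acc + R_acc @ (s * t)    # scaled translation, then rotate on
        R_acc = R_acc @ R
        poses.append((R_acc.copy(), t_acc.copy()))
    return poses
```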
Wasserstein GAN
Title | Wasserstein GAN |
Authors | Martin Arjovsky, Soumith Chintala, Léon Bottou |
Abstract | We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions. |
Tasks | Image Generation |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07875v3 |
PDF | http://arxiv.org/pdf/1701.07875v3.pdf
PWC | https://paperswithcode.com/paper/wasserstein-gan |
Repo | https://github.com/karl-hajjar/Generative-Adversarial-Networks |
Framework | pytorch |
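The training recipe is short enough to state in code: the critic maximizes the difference of its mean outputs on real and generated samples, and weight clipping enforces the Lipschitz constraint. A minimal PyTorch sketch of those two pieces:

```python
import torch

def critic_loss(critic, real, fake):
    # WGAN critic objective: maximise E[f(real)] - E[f(fake)], written here
    # as a loss to minimise.
    return critic(fake).mean() - critic(real).mean()

def clip_weights(critic, c=0.01):
    # Weight clipping keeps the critic (approximately) Lipschitz, as in the
    # original WGAN algorithm.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
```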
Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search
Title | Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search |
Authors | Luigi Acerbi, Wei Ji Ma |
Abstract | Computational models in fields such as computational neuroscience are often evaluated via stochastic simulation or numerical approximation. Fitting these models implies a difficult optimization problem over complex, possibly noisy parameter landscapes. Bayesian optimization (BO) has been successfully applied to solving expensive black-box problems in engineering and machine learning. Here we explore whether BO can be applied as a general tool for model fitting. First, we present a novel hybrid BO algorithm, Bayesian adaptive direct search (BADS), that achieves competitive performance with an affordable computational overhead for the running time of typical models. We then perform an extensive benchmark of BADS vs. many common and state-of-the-art nonconvex, derivative-free optimizers, on a set of model-fitting problems with real data and models from six studies in behavioral, cognitive, and computational neuroscience. With default settings, BADS consistently finds comparable or better solutions than other methods, including “vanilla” BO, showing great promise for advanced BO techniques, and BADS in particular, as a general model-fitting tool. |
Tasks | |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04405v2 |
PDF | http://arxiv.org/pdf/1705.04405v2.pdf
PWC | https://paperswithcode.com/paper/practical-bayesian-optimization-for-model |
Repo | https://github.com/lacerbi/bads |
Framework | none |
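BADS alternates a mesh-adaptive direct search poll with a Bayesian-optimization search step over a local Gaussian-process surrogate. The toy sketch below shows only the poll half, with a simplified expand/shrink rule; it is a conceptual illustration in Python, not the released MATLAB implementation:

```python
import numpy as np

def poll_step(f, x, mesh_size, f_x):
    # One poll stage of a mesh-adaptive direct search (the non-Bayesian half
    # of BADS): evaluate points on a mesh around the incumbent and accept any
    # improvement; BADS interleaves this with a GP-based search step.
    d = len(x)
    for step in np.vstack([np.eye(d), -np.eye(d)]) * mesh_size:
        candidate = x + step
        f_c = f(candidate)
        if f_c < f_x:
            return candidate, f_c, mesh_size * 2   # success: expand the mesh
    return x, f_x, mesh_size / 2                   # failure: shrink the mesh

# toy usage on a quadratic
f = lambda z: float(np.sum(z ** 2))
x = np.array([1.0, -2.0])
fx, m = f(x), 0.5
for _ in range(20):
    x, fx, m = poll_step(f, x, m, fx)
print(x, fx)
```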
Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Title | Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity |
Authors | Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande |
Abstract | Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction. |
Tasks | Drug Discovery |
Published | 2017-03-30 |
URL | http://arxiv.org/abs/1703.10603v1 |
PDF | http://arxiv.org/pdf/1703.10603v1.pdf
PWC | https://paperswithcode.com/paper/atomic-convolutional-networks-for-predicting |
Repo | https://github.com/deepchem/deepchem |
Framework | tf |
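The "direct calculation of the energy" is a thermodynamic-cycle decomposition: the same learned energy model scores the complex, the protein, and the ligand, and the predicted affinity is the difference. A sketch with a toy stand-in for the learned energy (the real model is the atomic convolutional network):

```python
import numpy as np

def binding_energy(energy_fn, complex_xyz, protein_xyz, ligand_xyz):
    # Thermodynamic-cycle decomposition: one energy model scores the complex
    # and the isolated protein and ligand; the affinity is the difference.
    return energy_fn(complex_xyz) - energy_fn(protein_xyz) - energy_fn(ligand_xyz)

def toy_energy(xyz):
    # Toy pairwise energy over atomic coordinates (NOT the ACNN): sum of
    # inverse interatomic distances, just to make the sketch runnable.
    d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
    return float(np.sum(1.0 / d[np.triu_indices(len(xyz), k=1)]))
```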
Gradient Estimators for Implicit Models
Title | Gradient Estimators for Implicit Models |
Authors | Yingzhen Li, Richard E. Turner |
Abstract | Implicit models, which allow for the generation of samples but not for point-wise evaluation of probabilities, are omnipresent in real-world problems tackled by machine learning and a hot topic of current research. Some examples include data simulators that are widely used in engineering and scientific research, generative adversarial networks (GANs) for image synthesis, and hot-off-the-press approximate inference techniques relying on implicit distributions. The majority of existing approaches to learning implicit models rely on approximating the intractable distribution or optimisation objective for gradient-based optimisation, which is liable to produce inaccurate updates and thus poor models. This paper alleviates the need for such approximations by proposing the Stein gradient estimator, which directly estimates the score function of the implicitly defined distribution. The efficacy of the proposed estimator is empirically demonstrated by examples that include meta-learning for approximate inference, and entropy regularised GANs that provide improved sample diversity. |
Tasks | Image Generation, Meta-Learning |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07107v5 |
PDF | http://arxiv.org/pdf/1705.07107v5.pdf
PWC | https://paperswithcode.com/paper/gradient-estimators-for-implicit-models |
Repo | https://github.com/YingzhenLi/SteinGrad |
Framework | tf |
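With an RBF kernel the Stein gradient estimator has a closed form: it solves a kernel ridge system to approximate the score function $\nabla_x \log q(x)$ at samples from the implicit distribution $q$. A numpy sketch; the fixed bandwidth and ridge constant are illustrative (a median heuristic for the bandwidth is common in practice):

```python
import numpy as np

def stein_gradient_estimator(X, eta=1e-3, sigma=1.0):
    # Approximates grad log q(x) at samples X (n, d) drawn from an implicit
    # distribution q, using an RBF kernel K(x, x') = exp(-||x - x'||^2 / 2s^2).
    n, _ = X.shape
    diff = X[:, None, :] - X[None, :, :]             # (n, n, d): x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))
    # sum_j grad_{x_j} K(x_i, x_j) for the RBF kernel
    grad_K = (K[:, :, None] * diff / sigma ** 2).sum(1)   # (n, d)
    # Ridge-regularised solve gives the estimated scores at the samples.
    return -np.linalg.solve(K + eta * np.eye(n), grad_K)
```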
Hierarchical 3D fully convolutional networks for multi-organ segmentation
Title | Hierarchical 3D fully convolutional networks for multi-organ segmentation |
Authors | Holger R. Roth, Hirohisa Oda, Yuichiro Hayashi, Masahiro Oda, Natsuki Shimizu, Michitaka Fujiwara, Kazunari Misawa, Kensaku Mori |
Abstract | Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of full volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of seven abdominal structures (artery, vein, liver, spleen, stomach, gallbladder, and pancreas) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training organ-specific models. To this end, we propose a two-stage, coarse-to-fine approach that trains an FCN model to roughly delineate the organs of interest in the first stage (seeing $\sim$40% of the voxels within a simple, automatically generated binary mask of the patient’s body). We then use these predictions of the first-stage FCN to define a candidate region that will be used to train a second FCN. This step reduces the number of voxels the FCN has to classify to $\sim$10% while maintaining a high recall of $>$99%. This second-stage FCN can now focus on more detailed segmentation of the organs. We respectively utilize training and validation sets consisting of 281 and 50 clinical CT images. Our hierarchical approach provides an improved Dice score of 7.5 percentage points per organ on average in our validation set. We furthermore test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans with three anatomical labels (liver, spleen, and pancreas). In such challenging organs as the pancreas, our hierarchical approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. |
Tasks | |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06382v1 |
PDF | http://arxiv.org/pdf/1704.06382v1.pdf
PWC | https://paperswithcode.com/paper/hierarchical-3d-fully-convolutional-networks |
Repo | https://github.com/holgerroth/3Dunet_abdomen_cascade |
Framework | none |
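The coarse-to-fine hand-off is essentially a cropping rule: threshold the first-stage probability map, bound the surviving voxels, and pass only that sub-volume to the second FCN. A numpy sketch of the hand-off (the threshold and margin are illustrative, and it assumes the coarse mask is non-empty):

```python
import numpy as np

def candidate_region(coarse_prob, threshold=0.5, margin=8):
    # Stage 1 -> stage 2 hand-off: threshold the coarse FCN's probability
    # map, take the bounding box of the candidate voxels plus a margin, and
    # crop so the second FCN sees only a small fraction of the volume.
    mask = coarse_prob > threshold
    idx = np.argwhere(mask)                       # coordinates of candidates
    lo = np.maximum(idx.min(0) - margin, 0)
    hi = np.minimum(idx.max(0) + margin + 1, mask.shape)
    return tuple(slice(l, h) for l, h in zip(lo, hi))

# usage: crop = candidate_region(stage1_output); fine_input = volume[crop]
```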