Paper Group AWR 103
One-Shot Learning for Semantic Segmentation. RED: Reinforced Encoder-Decoder Networks for Action Anticipation. End-to-end Learning of Deterministic Decision Trees. Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2. Detecting and Recognizing Human-Object Interactions. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition. Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks. Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders. Robust SfM with Little Image Overlap. Wasserstein GAN. Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search. Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity. Gradient Estimators for Implicit Models. Hierarchical 3D fully convolutional networks for multi-organ segmentation.
One-Shot Learning for Semantic Segmentation
Title | One-Shot Learning for Semantic Segmentation |
Authors | Amirreza Shaban, Shray Bansal, Zhen Liu, Irfan Essa, Byron Boots |
Abstract | Low-shot learning methods for image classification support learning from sparse data. We extend these techniques to support dense semantic image segmentation. Specifically, we train a network that, given a small set of annotated images, produces parameters for a Fully Convolutional Network (FCN). We use this FCN to perform dense pixel-level prediction on a test image for the new semantic class. Our architecture shows a 25% relative meanIoU improvement compared to the best baseline methods for one-shot segmentation on unseen classes in the PASCAL VOC 2012 dataset and is at least 3 times faster. |
Tasks | Image Classification, One-Shot Learning, One-Shot Segmentation, Semantic Segmentation |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03410v1 |
PDF | http://arxiv.org/pdf/1709.03410v1.pdf
PWC | https://paperswithcode.com/paper/one-shot-learning-for-semantic-segmentation |
Repo | https://github.com/vamsirk/OneShotSemanticSegmentation |
Framework | none |
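The core mechanism here is a conditioning branch that emits the parameters of a pixel-level classifier applied to the query image. A minimal PyTorch sketch of that idea follows; the `OneShotSegHead` name, the masked average pooling, and the layer sizes are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneShotSegHead(nn.Module):
    """Conditioning branch: turns one annotated support image into the
    (weight, bias) of a 1x1 'FCN' classifier applied to query features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # Maps a masked support embedding to parameters of a 1x1 conv.
        self.param_head = nn.Linear(feat_dim, feat_dim + 1)

    def forward(self, support_feat, support_mask, query_feat):
        # support_feat: (B, C, H, W); support_mask: (B, 1, H, W) in {0, 1}
        mask = F.interpolate(support_mask, size=support_feat.shape[-2:])
        pooled = (support_feat * mask).sum((2, 3)) / mask.sum((2, 3)).clamp(min=1)
        params = self.param_head(pooled)               # (B, C+1)
        w, b = params[:, :-1], params[:, -1]           # generated classifier
        # Apply the generated classifier densely over the query feature map.
        logits = torch.einsum('bchw,bc->bhw', query_feat, w) + b[:, None, None]
        return logits  # per-pixel foreground score for the novel class
```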
RED: Reinforced Encoder-Decoder Networks for Action Anticipation
Title | RED: Reinforced Encoder-Decoder Networks for Action Anticipation |
Authors | Jiyang Gao, Zhenheng Yang, Ram Nevatia |
Abstract | Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance depend on this predictive capability. Current methods address this problem by first anticipating visual representations of future frames and then categorizing the anticipated representations into actions. However, the anticipation is based on a single past frame’s representation, which ignores the history trend, and it can only anticipate a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on the TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all datasets. |
Tasks | |
Published | 2017-07-16 |
URL | http://arxiv.org/abs/1707.04818v1 |
PDF | http://arxiv.org/pdf/1707.04818v1.pdf
PWC | https://paperswithcode.com/paper/red-reinforced-encoder-decoder-networks-for |
Repo | https://github.com/rajskar/CS763Project |
Framework | pytorch |
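RED's central component is an encoder-decoder over feature sequences: the encoder summarizes the observed history, and the decoder rolls out anticipated future representations, each fed to an action classifier. The sketch below captures that loop in PyTorch under assumed dimensions; the reinforcement module providing the sequence-level reward is omitted here.

```python
import torch
import torch.nn as nn

class REDSketch(nn.Module):
    # Encoder consumes a history of frame features; the decoder anticipates
    # a sequence of future features, each classified into an action.
    def __init__(self, feat_dim=1024, hidden=512, n_classes=21, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.LSTMCell(feat_dim, hidden)
        self.to_feat = nn.Linear(hidden, feat_dim)     # anticipated representation
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, history):                # history: (B, T, feat_dim)
        _, (h, c) = self.encoder(history)
        h, c = h[0], c[0]
        x = history[:, -1]                     # seed decoder with last observation
        feats, logits = [], []
        for _ in range(self.horizon):          # anticipate one step at a time
            h, c = self.decoder(x, (h, c))
            x = self.to_feat(h)
            feats.append(x)
            logits.append(self.classifier(x))
        return torch.stack(feats, 1), torch.stack(logits, 1)
```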
End-to-end Learning of Deterministic Decision Trees
Title | End-to-end Learning of Deterministic Decision Trees |
Authors | Thomas Hehn, Fred A. Hamprecht |
Abstract | Conventional decision trees have a number of favorable properties, including interpretability, a small computational footprint and the ability to learn from little training data. However, they lack a key quality that has helped fuel the deep learning revolution: that of being end-to-end trainable, learning from scratch the features that best solve a given supervised learning problem. Recent work (Kontschieder 2015) has addressed this deficit, but at the cost of losing a main attractive trait of decision trees: the fact that each sample is routed along a small subset of tree nodes only. Here we propose a model and an Expectation-Maximization training scheme for decision trees that are fully probabilistic at train time, but become deterministic at test time after a deterministic annealing process. We also analyze the learned oblique split parameters on image datasets and show that neural networks can be trained at each split node. In summary, we present the first end-to-end learning scheme for deterministic decision trees and report results on par with or superior to published standard oblique decision tree algorithms. |
Tasks | |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02743v1 |
PDF | http://arxiv.org/pdf/1712.02743v1.pdf
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-deterministic-decision |
Repo | https://github.com/tomsal/endtoenddecisiontrees |
Framework | pytorch |
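The train-time/test-time distinction rests on soft routing at each split: a sigmoid of an oblique split function gives the probability of going left, and annealing its steepness drives the routing toward a hard decision. A toy sketch, with a made-up split and an illustrative annealing schedule:

```python
import numpy as np

def route_left_prob(x, w, b, steepness):
    # Soft routing at one oblique split: P(go left) = sigmoid(s * (w.x + b)).
    # As the annealed steepness s grows, routing hardens into a deterministic
    # decision, which is the test-time behaviour the paper targets.
    return 1.0 / (1.0 + np.exp(-steepness * (x @ w + b)))

rng = np.random.default_rng(0)
x, w, b = rng.normal(size=5), rng.normal(size=5), 0.1
for s in (1.0, 10.0, 100.0):           # illustrative annealing schedule
    print(f"steepness={s:>5}: P(left) = {route_left_prob(x, w, b, s):.4f}")
```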
Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2
Title | Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2 |
Authors | Federico Baldassarre, Diego González Morín, Lucas Rodés-Guirao |
Abstract | We review some of the most recent approaches to colorizing gray-scale images using deep learning methods. Inspired by these, we propose a model which combines a deep Convolutional Neural Network trained from scratch with high-level features extracted from the pre-trained Inception-ResNet-v2 model. Thanks to its fully convolutional architecture, our encoder-decoder model can process images of any size and aspect ratio. In addition to presenting the training results, we assess the “public acceptance” of the generated images by means of a user study. Finally, we present a carousel of applications on different types of images, such as historical photographs. |
Tasks | Colorization |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.03400v1 |
PDF | http://arxiv.org/pdf/1712.03400v1.pdf
PWC | https://paperswithcode.com/paper/deep-koalarization-image-colorization-using |
Repo | https://github.com/sukkritsharmaofficial/Colourization-of-B-W-photos |
Framework | none |
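The fusion step tiles a global Inception-ResNet-v2 embedding across the encoder's spatial grid and mixes the two with a 1x1 convolution. A hedged PyTorch sketch, with channel counts chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    # Tiles a global image embedding (e.g. from Inception-ResNet-v2) over the
    # encoder's spatial grid and mixes the two with a 1x1 convolution.
    def __init__(self, enc_ch=256, embed_dim=1001, out_ch=256):
        super().__init__()
        self.mix = nn.Conv2d(enc_ch + embed_dim, out_ch, kernel_size=1)

    def forward(self, enc_feat, embedding):
        # enc_feat: (B, enc_ch, H, W); embedding: (B, embed_dim)
        B, _, H, W = enc_feat.shape
        tiled = embedding[:, :, None, None].expand(B, -1, H, W)
        return torch.relu(self.mix(torch.cat([enc_feat, tiled], dim=1)))
```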
Detecting and Recognizing Human-Object Interactions
Title | Detecting and Recognizing Human-Object Interactions |
Authors | Georgia Gkioxari, Ross Girshick, Piotr Dollár, Kaiming He |
Abstract | To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Humans are often at the center of such interactions and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting <human, verb, object> triplets in challenging everyday photos. We propose a novel model that is driven by a human-centric approach. Our hypothesis is that the appearance of a person – their pose, clothing, action – is a powerful cue for localizing the objects they are interacting with. To exploit this cue, our model learns to predict an action-specific density over target object locations based on the appearance of a detected person. Our model also jointly learns to detect people and objects, and by fusing these predictions it efficiently infers interaction triplets in a clean, jointly trained end-to-end system we call InteractNet. We validate our approach on the recently introduced Verbs in COCO (V-COCO) and HICO-DET datasets, where we show quantitatively compelling results. |
Tasks | Human-Object Interaction Detection |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07333v3 |
PDF | http://arxiv.org/pdf/1704.07333v3.pdf
PWC | https://paperswithcode.com/paper/detecting-and-recognizing-human-object |
Repo | https://github.com/facebookresearch/detectron |
Framework | pytorch |
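The human-centric branch can be read as a Gaussian scoring rule: from a detected person's appearance, an action-specific regressor predicts a mean offset to the target object, and candidate objects are scored by a density over that offset. A toy numeric sketch of the scoring (the `interaction_score` helper and its parameters are hypothetical):

```python
import numpy as np

def interaction_score(human_box, object_box, mu_offset, sigma=20.0):
    # An action-specific branch predicts a mean offset mu from the person to
    # the target object; candidate objects are scored by a Gaussian density
    # over their actual offset relative to the person.
    def center(b):
        return np.array([(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0])
    z = (center(object_box) - center(human_box) - mu_offset) / sigma
    return float(np.exp(-0.5 * z @ z))     # unnormalised density

# An object near the predicted offset scores far higher than a distant one.
human = (0, 0, 100, 200)
print(interaction_score(human, (60, 80, 100, 120), mu_offset=np.array([30.0, 0.0])))
print(interaction_score(human, (300, 80, 340, 120), mu_offset=np.array([30.0, 0.0])))
```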
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Title | Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) |
Authors | Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres |
Abstract | The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result – for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application. |
Tasks | Image Classification |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11279v5 |
PDF | http://arxiv.org/pdf/1711.11279v5.pdf
PWC | https://paperswithcode.com/paper/interpretability-beyond-feature-attribution |
Repo | https://github.com/maragraziani/iMIMIC-RCVs |
Framework | tf |
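TCAV is compact enough to sketch directly: a CAV is the normal of a linear classifier separating a concept's activations from random activations at a chosen layer, and the TCAV score is the fraction of examples whose class-logit gradient has a positive component along that vector. A sketch assuming activations and gradients have been computed elsewhere:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cav(concept_acts, random_acts):
    # A CAV is the normal to a linear boundary separating activations of
    # concept examples from random examples at a chosen layer.
    X = np.vstack([concept_acts, random_acts])
    y = np.array([1] * len(concept_acts) + [0] * len(random_acts))
    v = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(grads, cav_vec):
    # grads: per-example gradients of the class logit w.r.t. the layer's
    # activations. The TCAV score is the fraction of positive directional
    # derivatives along the CAV.
    return float(np.mean(grads @ cav_vec > 0))
```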
Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition
Title | Deep Fisher Discriminant Learning for Mobile Hand Gesture Recognition |
Authors | Chunyu Xie, Ce Li, Baochang Zhang, Chen Chen, Jungong Han |
Abstract | Gesture recognition is a challenging problem in the field of biometrics. In this paper, we integrate the Fisher criterion into Bidirectional Long Short-Term Memory (BLSTM) and Bidirectional Gated Recurrent Unit (BGRU) networks, leading to two new deep models termed F-BLSTM and F-BGRU. Both Fisher discriminative deep models can effectively classify gestures by analyzing the acceleration and angular-velocity data of human gestures. Moreover, we collect a large Mobile Gesture Database (MGD) based on accelerations and angular velocities, containing 5547 sequences of 12 gestures. Extensive experiments validate the superior performance of the proposed networks compared to the state-of-the-art BLSTM and BGRU on the MGD database and two benchmark databases (i.e., BUAA mobile gesture and SmartWatch gesture). |
Tasks | Gesture Recognition, Hand Gesture Recognition, Hand-Gesture Recognition |
Published | 2017-07-12 |
URL | http://arxiv.org/abs/1707.03692v1 |
PDF | http://arxiv.org/pdf/1707.03692v1.pdf
PWC | https://paperswithcode.com/paper/deep-fisher-discriminant-learning-for-mobile |
Repo | https://github.com/chriswegmann/drone_steering |
Framework | none |
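The Fisher criterion added to the BLSTM/BGRU training pulls features toward their class mean while pushing class means apart. A minimal PyTorch sketch of such a discriminative term (the paper's exact formulation and loss weighting may differ; the batch must contain at least two classes):

```python
import torch

def fisher_loss(features, labels, eps=1e-6):
    # Fisher-style criterion on sequence embeddings: minimise within-class
    # scatter relative to between-class scatter. A sketch of the term only.
    classes = labels.unique()
    means = torch.stack([features[labels == c].mean(0) for c in classes])
    within = torch.stack([
        ((features[labels == c] - means[i]) ** 2).sum(1).mean()
        for i, c in enumerate(classes)
    ]).mean()
    between = torch.pdist(means).pow(2).mean()   # pairwise class-mean spread
    return within / (between + eps)
```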
Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks
Title | Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks |
Authors | Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen |
Abstract | Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks. One appealing property of such systems is their generality, as excellent performance can be achieved with a unified architecture and without task-specific feature engineering. However, it is unclear if such systems can be used for tasks without large amounts of training data. In this paper we explore the problem of transfer learning for neural sequence taggers, where a source task with plentiful annotations (e.g., POS tagging on Penn Treebank) is used to improve performance on a target task with fewer available annotations (e.g., POS tagging for microblogs). We examine the effects of transfer learning for deep hierarchical recurrent networks across domains, applications, and languages, and show that significant improvement can often be obtained. These gains translate into improvements over the current state of the art on several well-studied tasks. |
Tasks | Feature Engineering, Named Entity Recognition, Part-Of-Speech Tagging, Transfer Learning |
Published | 2017-03-18 |
URL | http://arxiv.org/abs/1703.06345v1 |
PDF | http://arxiv.org/pdf/1703.06345v1.pdf
PWC | https://paperswithcode.com/paper/transfer-learning-for-sequence-tagging-with |
Repo | https://github.com/jiesutd/NCRFpp |
Framework | pytorch |
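Architecturally, the transfer setup shares the recurrent encoder across tasks while keeping task-specific output layers. A condensed PyTorch sketch; the sizes are placeholders and a softmax head stands in for the CRF layer used in the paper:

```python
import torch.nn as nn

class SharedTagger(nn.Module):
    # Hierarchical recurrent tagger: the embedding and BiLSTM encoder are
    # shared (and transferred) across tasks, while each task keeps its own
    # output layer.
    def __init__(self, vocab=10000, emb=100, hidden=200, task_labels=(45, 9)):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)        # shared parameters
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden, n) for n in task_labels)

    def forward(self, tokens, task_id):
        h, _ = self.encoder(self.embed(tokens))
        return self.heads[task_id](h)      # per-token label scores
```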
Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders
Title | Quantifying the Effects of Enforcing Disentanglement on Variational Autoencoders |
Authors | Momchil Peychev, Petar Veličković, Pietro Liò |
Abstract | The notion of disentangled autoencoders was proposed as an extension to the variational autoencoder by introducing a disentanglement parameter $\beta$, controlling the learning pressure put on the possible underlying latent representations. For certain values of $\beta$ this kind of autoencoder is capable of encoding independent input generative factors in separate elements of the code, leading to more interpretable and predictable model behaviour. In this paper we quantify the effects of the parameter $\beta$ on the model performance and degree of disentanglement. After training multiple models with the same value of $\beta$, we establish the existence of consistent variance in one of the disentanglement measures proposed in the literature. The negative consequences of disentanglement for the autoencoder’s discriminative ability are also demonstrated while varying the amount of examples available during training. |
Tasks | |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09159v1 |
PDF | http://arxiv.org/pdf/1711.09159v1.pdf
PWC | https://paperswithcode.com/paper/quantifying-the-effects-of-enforcing |
Repo | https://github.com/mpeychev/disentangled-autoencoders |
Framework | tf |
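The disentanglement parameter $\beta$ enters the objective as a multiplier on the KL term, so the loss is easy to state concretely. A standard sketch (with a Bernoulli reconstruction term; $\beta = 1$ recovers the plain VAE):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(recon, x, mu, logvar, beta=4.0):
    # beta-VAE objective: reconstruction term plus a KL term scaled by the
    # disentanglement parameter beta. Larger beta increases the pressure
    # toward factorised (disentangled) latent codes.
    recon_term = F.binary_cross_entropy(recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + beta * kl
```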
Robust SfM with Little Image Overlap
Title | Robust SfM with Little Image Overlap |
Authors | Yohann Salaün, Renaud Marlet, Pascal Monasse |
Abstract | Usual Structure-from-Motion (SfM) techniques require at least trifocal overlaps to calibrate cameras and reconstruct a scene. We consider here scenarios of reduced image sets with little overlap, possibly as low as two images at most seeing the same part of the scene. We propose a new method, based on line coplanarity hypotheses, for estimating the relative scale of two independent bifocal calibrations sharing a camera, without the need for any trifocal information or Manhattan-world assumption. We use it to compute SfM in a chain of up-to-scale relative motions. For accuracy, however, we also make use of trifocal information for line and/or point features, when present, relaxing the usual trifocal constraints. For robustness to wrong assumptions and mismatches, we embed all constraints in a parameterless RANSAC-like approach. Experiments show that we can calibrate datasets that previously could not be, and that this wider applicability does not come at the cost of inaccuracy. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.07957v2 |
PDF | http://arxiv.org/pdf/1703.07957v2.pdf
PWC | https://paperswithcode.com/paper/robust-sfm-with-little-image-overlap |
Repo | https://github.com/ySalaun/LineSfM |
Framework | none |
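Once each bifocal pair's relative scale has been estimated from line coplanarity, the reconstruction reduces to composing a chain of scaled relative motions. A small sketch of that composition step (the scale estimation itself is not shown):

```python
import numpy as np

def chain_motions(motions, scales):
    # Composes a chain of up-to-scale relative motions (R, t), once each
    # link's relative scale has been estimated (e.g. from the paper's line
    # coplanarity hypotheses). Poses are expressed in the first camera frame.
    R_acc, t_acc = np.eye(3), np.zeros(3)
    poses = [(R_acc.copy(), t_acc.copy())]
    for (R, t), s in zip(motions, scales):
        t_acc = t_acc + R_acc @ (s * t)    # scaled translation, then rotate on
        R_acc = R_acc @ R
        poses.append((R_acc.copy(), t_acc.copy()))
    return poses
```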
Wasserstein GAN
Title | Wasserstein GAN |
Authors | Martin Arjovsky, Soumith Chintala, Léon Bottou |
Abstract | We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to other distances between distributions. |
Tasks | Image Generation |
Published | 2017-01-26 |
URL | http://arxiv.org/abs/1701.07875v3 |
PDF | http://arxiv.org/pdf/1701.07875v3.pdf
PWC | https://paperswithcode.com/paper/wasserstein-gan |
Repo | https://github.com/karl-hajjar/Generative-Adversarial-Networks |
Framework | pytorch |
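The training recipe is short enough to state in code: the critic maximizes the difference of its mean outputs on real and generated samples, and weight clipping enforces the Lipschitz constraint. A minimal PyTorch sketch of those two pieces:

```python
import torch

def critic_loss(critic, real, fake):
    # WGAN critic objective: maximise E[f(real)] - E[f(fake)], written here
    # as a loss to minimise.
    return critic(fake).mean() - critic(real).mean()

def clip_weights(critic, c=0.01):
    # Weight clipping keeps the critic (approximately) Lipschitz, as in the
    # original WGAN algorithm.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-c, c)
```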
Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search
Title | Practical Bayesian Optimization for Model Fitting with Bayesian Adaptive Direct Search |
Authors | Luigi Acerbi, Wei Ji Ma |
Abstract | Computational models in fields such as computational neuroscience are often evaluated via stochastic simulation or numerical approximation. Fitting these models implies a difficult optimization problem over complex, possibly noisy parameter landscapes. Bayesian optimization (BO) has been successfully applied to solving expensive black-box problems in engineering and machine learning. Here we explore whether BO can be applied as a general tool for model fitting. First, we present a novel hybrid BO algorithm, Bayesian adaptive direct search (BADS), that achieves competitive performance with an affordable computational overhead for the running time of typical models. We then perform an extensive benchmark of BADS vs. many common and state-of-the-art nonconvex, derivative-free optimizers, on a set of model-fitting problems with real data and models from six studies in behavioral, cognitive, and computational neuroscience. With default settings, BADS consistently finds comparable or better solutions than other methods, including “vanilla” BO, showing great promise for advanced BO techniques, and BADS in particular, as a general model-fitting tool. |
Tasks | |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04405v2 |
PDF | http://arxiv.org/pdf/1705.04405v2.pdf
PWC | https://paperswithcode.com/paper/practical-bayesian-optimization-for-model |
Repo | https://github.com/lacerbi/bads |
Framework | none |
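BADS alternates a mesh-adaptive direct search poll with a Bayesian-optimization search step over a local Gaussian-process surrogate. The toy sketch below shows only the poll half, with a simplified expand/shrink rule; it is a conceptual illustration in Python, not the released MATLAB implementation:

```python
import numpy as np

def poll_step(f, x, mesh_size, f_x):
    # One poll stage of a mesh-adaptive direct search (the non-Bayesian half
    # of BADS): evaluate points on a mesh around the incumbent and accept any
    # improvement; BADS interleaves this with a GP-based search step.
    d = len(x)
    for step in np.vstack([np.eye(d), -np.eye(d)]) * mesh_size:
        candidate = x + step
        f_c = f(candidate)
        if f_c < f_x:
            return candidate, f_c, mesh_size * 2   # success: expand the mesh
    return x, f_x, mesh_size / 2                   # failure: shrink the mesh

# toy usage on a quadratic
f = lambda z: float(np.sum(z ** 2))
x = np.array([1.0, -2.0])
fx, m = f(x), 0.5
for _ in range(20):
    x, fx, m = poll_step(f, x, m, fx)
print(x, fx)
```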
Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity
Title | Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity |
Authors | Joseph Gomes, Bharath Ramsundar, Evan N. Feinberg, Vijay S. Pande |
Abstract | Empirical scoring functions based on either molecular force fields or cheminformatics descriptors are widely used, in conjunction with molecular docking, during the early stages of drug discovery to predict potency and binding affinity of a drug-like molecule to a given target. These models require expert-level knowledge of physical chemistry and biology to be encoded as hand-tuned parameters or features rather than allowing the underlying model to select features in a data-driven procedure. Here, we develop a general 3-dimensional spatial convolution operation for learning atomic-level chemical interactions directly from atomic coordinates and demonstrate its application to structure-based bioactivity prediction. The atomic convolutional neural network is trained to predict the experimentally determined binding affinity of a protein-ligand complex by direct calculation of the energy associated with the complex, protein, and ligand given the crystal structure of the binding pose. Non-covalent interactions present in the complex that are absent in the protein-ligand sub-structures are identified and the model learns the interaction strength associated with these features. We test our model by predicting the binding free energy of a subset of protein-ligand complexes found in the PDBBind dataset and compare with state-of-the-art cheminformatics and machine learning-based approaches. We find that all methods achieve experimental accuracy and that atomic convolutional networks either outperform or perform competitively with the cheminformatics based methods. Unlike all previous protein-ligand prediction systems, atomic convolutional networks are end-to-end and fully-differentiable. They represent a new data-driven, physics-based deep learning model paradigm that offers a strong foundation for future improvements in structure-based bioactivity prediction. |
Tasks | Drug Discovery |
Published | 2017-03-30 |
URL | http://arxiv.org/abs/1703.10603v1 |
PDF | http://arxiv.org/pdf/1703.10603v1.pdf
PWC | https://paperswithcode.com/paper/atomic-convolutional-networks-for-predicting |
Repo | https://github.com/deepchem/deepchem |
Framework | tf |
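The "direct calculation of the energy" is a thermodynamic-cycle decomposition: the same learned energy model scores the complex, the protein, and the ligand, and the predicted affinity is the difference. A sketch with a toy stand-in for the learned energy (the real model is the atomic convolutional network):

```python
import numpy as np

def binding_energy(energy_fn, complex_xyz, protein_xyz, ligand_xyz):
    # Thermodynamic-cycle decomposition: one energy model scores the complex
    # and the isolated protein and ligand; the affinity is the difference.
    return energy_fn(complex_xyz) - energy_fn(protein_xyz) - energy_fn(ligand_xyz)

def toy_energy(xyz):
    # Toy pairwise energy over atomic coordinates (NOT the ACNN): sum of
    # inverse interatomic distances, just to make the sketch runnable.
    d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
    return float(np.sum(1.0 / d[np.triu_indices(len(xyz), k=1)]))
```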
Gradient Estimators for Implicit Models
Title | Gradient Estimators for Implicit Models |
Authors | Yingzhen Li, Richard E. Turner |
Abstract | Implicit models, which allow for the generation of samples but not for point-wise evaluation of probabilities, are omnipresent in real-world problems tackled by machine learning and a hot topic of current research. Some examples include data simulators that are widely used in engineering and scientific research, generative adversarial networks (GANs) for image synthesis, and hot-off-the-press approximate inference techniques relying on implicit distributions. The majority of existing approaches to learning implicit models rely on approximating the intractable distribution or optimisation objective for gradient-based optimisation, which is liable to produce inaccurate updates and thus poor models. This paper alleviates the need for such approximations by proposing the Stein gradient estimator, which directly estimates the score function of the implicitly defined distribution. The efficacy of the proposed estimator is empirically demonstrated by examples that include meta-learning for approximate inference, and entropy regularised GANs that provide improved sample diversity. |
Tasks | Image Generation, Meta-Learning |
Published | 2017-05-19 |
URL | http://arxiv.org/abs/1705.07107v5 |
PDF | http://arxiv.org/pdf/1705.07107v5.pdf
PWC | https://paperswithcode.com/paper/gradient-estimators-for-implicit-models |
Repo | https://github.com/YingzhenLi/SteinGrad |
Framework | tf |
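With an RBF kernel the Stein gradient estimator has a closed form: it solves a kernel ridge system to approximate the score function $\nabla_x \log q(x)$ at samples from the implicit distribution $q$. A numpy sketch; the fixed bandwidth and ridge constant are illustrative (a median heuristic for the bandwidth is common in practice):

```python
import numpy as np

def stein_gradient_estimator(X, eta=1e-3, sigma=1.0):
    # Approximates grad log q(x) at samples X (n, d) drawn from an implicit
    # distribution q, using an RBF kernel K(x, x') = exp(-||x - x'||^2 / 2s^2).
    n, _ = X.shape
    diff = X[:, None, :] - X[None, :, :]             # (n, n, d): x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))
    # sum_j grad_{x_j} K(x_i, x_j) for the RBF kernel
    grad_K = (K[:, :, None] * diff / sigma ** 2).sum(1)   # (n, d)
    # Ridge-regularised solve gives the estimated scores at the samples.
    return -np.linalg.solve(K + eta * np.eye(n), grad_K)
```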
Hierarchical 3D fully convolutional networks for multi-organ segmentation
Title | Hierarchical 3D fully convolutional networks for multi-organ segmentation |
Authors | Holger R. Roth, Hirohisa Oda, Yuichiro Hayashi, Masahiro Oda, Natsuki Shimizu, Michitaka Fujiwara, Kazunari Misawa, Kensaku Mori |
Abstract | Recent advances in 3D fully convolutional networks (FCN) have made it feasible to produce dense voxel-wise predictions of full volumetric images. In this work, we show that a multi-class 3D FCN trained on manually labeled CT scans of seven abdominal structures (artery, vein, liver, spleen, stomach, gallbladder, and pancreas) can achieve competitive segmentation results, while avoiding the need for handcrafting features or training organ-specific models. To this end, we propose a two-stage, coarse-to-fine approach that trains an FCN model to roughly delineate the organs of interest in the first stage (seeing $\sim$40% of the voxels within a simple, automatically generated binary mask of the patient’s body). We then use these predictions of the first-stage FCN to define a candidate region that will be used to train a second FCN. This step reduces the number of voxels the FCN has to classify to $\sim$10% while maintaining a high recall of $>$99%. This second-stage FCN can now focus on more detailed segmentation of the organs. We respectively utilize training and validation sets consisting of 281 and 50 clinical CT images. Our hierarchical approach provides an improved Dice score of 7.5 percentage points per organ on average in our validation set. We furthermore test our models on a completely unseen data collection acquired at a different hospital that includes 150 CT scans with three anatomical labels (liver, spleen, and pancreas). In such challenging organs as the pancreas, our hierarchical approach improves the mean Dice score from 68.5 to 82.2%, achieving the highest reported average score on this dataset. |
Tasks | |
Published | 2017-04-21 |
URL | http://arxiv.org/abs/1704.06382v1 |
PDF | http://arxiv.org/pdf/1704.06382v1.pdf
PWC | https://paperswithcode.com/paper/hierarchical-3d-fully-convolutional-networks |
Repo | https://github.com/holgerroth/3Dunet_abdomen_cascade |
Framework | none |
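The coarse-to-fine hand-off is essentially a cropping rule: threshold the first-stage probability map, bound the surviving voxels, and pass only that sub-volume to the second FCN. A numpy sketch of the hand-off (the threshold and margin are illustrative, and it assumes the coarse mask is non-empty):

```python
import numpy as np

def candidate_region(coarse_prob, threshold=0.5, margin=8):
    # Stage 1 -> stage 2 hand-off: threshold the coarse FCN's probability
    # map, take the bounding box of the candidate voxels plus a margin, and
    # crop so the second FCN sees only a small fraction of the volume.
    mask = coarse_prob > threshold
    idx = np.argwhere(mask)                       # coordinates of candidates
    lo = np.maximum(idx.min(0) - margin, 0)
    hi = np.minimum(idx.max(0) + margin + 1, mask.shape)
    return tuple(slice(l, h) for l, h in zip(lo, hi))

# usage: crop = candidate_region(stage1_output); fine_input = volume[crop]
```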