Paper Group ANR 758
Knowledge Transfer with Jacobian Matching
Title | Knowledge Transfer with Jacobian Matching |
Authors | Suraj Srinivas, Francois Fleuret |
Abstract | Classical distillation methods transfer representations from a “teacher” neural network to a “student” network by matching their output activations. Recent methods also match the Jacobians, i.e., the gradients of output activations with respect to the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing the equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning. |
Tasks | Transfer Learning |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00443v1 |
http://arxiv.org/pdf/1803.00443v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-transfer-with-jacobian-matching |
Repo | |
Framework | |
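The core penalty in the abstract above is simple to state: make the student's input-output Jacobian match the teacher's. As a rough illustration (not the paper's networks or loss weighting), the sketch below computes a squared-error Jacobian-matching penalty for two toy vector-valued maps via finite differences; both functions are invented for the example.

```python
# Toy sketch of a Jacobian-matching penalty, using finite differences.
# The "teacher" and "student" are illustrative stand-ins, not the paper's CNNs.

def teacher(x):      # stand-in teacher: a smooth nonlinear map R^2 -> R^2
    return [x[0] * x[0] + x[1], 3.0 * x[0] - x[1] * x[1]]

def student(x):      # stand-in student with slightly perturbed weights
    return [0.9 * x[0] * x[0] + x[1], 2.8 * x[0] - 1.1 * x[1] * x[1]]

def jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian J[i][j] = d f_i / d x_j at input x."""
    fx = f(x)
    J = []
    for i in range(len(fx)):
        row = []
        for j in range(len(x)):
            xp = list(x)
            xp[j] += eps
            row.append((f(xp)[i] - fx[i]) / eps)
        J.append(row)
    return J

def jacobian_match_loss(f, g, x):
    """Squared-error Jacobian-matching penalty between f and g at input x."""
    Jf, Jg = jacobian(f, x), jacobian(g, x)
    return sum((a - b) ** 2
               for rf, rg in zip(Jf, Jg)
               for a, b in zip(rf, rg))

x0 = [1.0, 2.0]
loss = jacobian_match_loss(teacher, student, x0)   # positive: Jacobians differ
```

In the paper's analysis this penalty is connected to matching outputs on noise-perturbed inputs; the sketch only shows the penalty itself.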
Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective
Title | Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective |
Authors | Filip Tronarp, Hans Kersting, Simo Särkkä, Philipp Hennig |
Abstract | We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consist of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP—which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a non-linear Bayesian filtering problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers that are formulated in terms of generating synthetic measurements of the gradient field come out as specific approximations. Based on the non-linear Bayesian filtering problem posed in this paper, we develop novel Gaussian solvers for which we establish favourable stability properties. Additionally, non-Gaussian approximations to the filtering problem are derived by the particle filter approach. The resulting solvers are compared with other probabilistic solvers in illustrative experiments. |
Tasks | |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03440v4 |
http://arxiv.org/pdf/1810.03440v4.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-solutions-to-ordinary |
Repo | |
Framework | |
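The reduction described above can be made concrete with a small sketch: model the state (x, x′) with a once-integrated Wiener process prior and treat the residual x′ − f(x) as a measurement observed to be zero, then run an extended Kalman filter. The test ODE x′ = −x, the step size, and the diffusion constant q below are illustrative choices, not the paper's.

```python
# Minimal sketch of solving an ODE by non-linear Kalman filtering, in the
# spirit of the abstract. State mean m = (x, x'); prior: integrated Wiener
# process; pseudo-measurement: 0 = x' - f(x). Constants are illustrative.
import math

def f(x):          # test ODE x' = -x, true solution exp(-t)
    return -x

def fprime(x):     # derivative of the vector field, for the EKF linearisation
    return -1.0

def solve(x0, t_end, h=0.01, q=1.0):
    m = [x0, f(x0)]                       # filter mean (x, x')
    P = [[0.0, 0.0], [0.0, 0.0]]          # filter covariance
    A = [[1.0, h], [0.0, 1.0]]            # IWP(1) transition matrix
    Q = [[q * h**3 / 3, q * h**2 / 2],    # IWP(1) process noise
         [q * h**2 / 2, q * h]]
    for _ in range(round(t_end / h)):
        # predict: m <- A m, P <- A P A^T + Q
        m = [m[0] + h * m[1], m[1]]
        AP = [[A[i][0] * P[0][j] + A[i][1] * P[1][j] for j in range(2)]
              for i in range(2)]
        P = [[AP[i][0] * A[j][0] + AP[i][1] * A[j][1] + Q[i][j]
              for j in range(2)] for i in range(2)]
        # update with pseudo-measurement 0 = x' - f(x), linearised at m
        H = [-fprime(m[0]), 1.0]
        r = -(m[1] - f(m[0]))             # residual
        PH = [P[0][0] * H[0] + P[0][1] * H[1],
              P[1][0] * H[0] + P[1][1] * H[1]]
        S = H[0] * PH[0] + H[1] * PH[1]   # innovation variance
        K = [PH[0] / S, PH[1] / S]        # Kalman gain
        m = [m[0] + K[0] * r, m[1] + K[1] * r]
        P = [[P[i][j] - K[i] * PH[j] for j in range(2)] for i in range(2)]
    return m[0]

x1 = solve(1.0, 1.0)   # should land near exp(-1)
```

Because f is linear here the EKF linearisation is exact; the point of the paper is that the same filtering machinery applies to non-linear vector fields and to other approximate filters.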
Predict and Constrain: Modeling Cardinality in Deep Structured Prediction
Title | Predict and Constrain: Modeling Cardinality in Deep Structured Prediction |
Authors | Nataly Brukhim, Amir Globerson |
Abstract | Many machine learning problems require the prediction of multi-dimensional labels. Such structured prediction models can benefit from modeling dependencies between labels. Recently, several deep learning approaches to structured prediction have been proposed. Here we focus on capturing cardinality constraints in such models. Namely, constraining the number of non-zero labels that the model outputs. Such constraints have proven very useful in previous structured prediction approaches, but it is a challenge to introduce them into a deep learning framework. Here we show how to do this via a novel deep architecture. Our approach outperforms strong baselines, achieving state-of-the-art results on multi-label classification benchmarks. |
Tasks | Multi-Label Classification, Structured Prediction |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04721v1 |
http://arxiv.org/pdf/1802.04721v1.pdf | |
PWC | https://paperswithcode.com/paper/predict-and-constrain-modeling-cardinality-in |
Repo | |
Framework | |
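The cardinality constraint in the abstract, stripped of the differentiable machinery the paper builds around it, is just a projection onto labellings with exactly k non-zero entries. A minimal hard-projection sketch (variable names are ours):

```python
# Hard top-k projection: the discrete constraint that the paper's deep
# architecture enforces in a differentiable way.

def project_cardinality(scores, k):
    """Return a 0/1 labelling with exactly k ones, placed on the top-k scores."""
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(scores))]

labels = project_cardinality([0.9, 0.1, 0.7, 0.3], k=2)   # -> [1, 0, 1, 0]
```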
Clipped Matrix Completion: A Remedy for Ceiling Effects
Title | Clipped Matrix Completion: A Remedy for Ceiling Effects |
Authors | Takeshi Teshima, Miao Xu, Issei Sato, Masashi Sugiyama |
Abstract | We consider the problem of recovering a low-rank matrix from its clipped observations. Clipping arises in many scientific areas and obstructs statistical analyses. On the other hand, matrix completion (MC) methods can recover a low-rank matrix from various information deficits by using the principle of low-rank completion. However, the current theoretical guarantees for low-rank MC do not apply to clipped matrices, as the deficit depends on the underlying values. Therefore, the feasibility of clipped matrix completion (CMC) is not trivial. In this paper, we first provide a theoretical guarantee for the exact recovery of CMC by using a trace-norm minimization algorithm. Furthermore, we propose practical CMC algorithms by extending ordinary MC methods. Our extension is to use the squared hinge loss in place of the squared loss, reducing the penalty for over-estimation on clipped entries. We also propose a novel regularization term tailored for CMC. It is a combination of two trace-norm terms, and we theoretically bound the recovery error under the regularization. We demonstrate the effectiveness of the proposed methods through experiments using both synthetic and benchmark data for recommendation systems. |
Tasks | Matrix Completion, Recommendation Systems |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04997v3 |
http://arxiv.org/pdf/1809.04997v3.pdf | |
PWC | https://paperswithcode.com/paper/clipped-matrix-completion-a-remedy-for |
Repo | |
Framework | |
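The loss modification the abstract proposes can be shown in a few lines: ordinary entries keep the squared loss, while clipped entries (observed at the ceiling c) get a squared hinge, so predicting above the ceiling costs nothing. A sketch with invented numbers:

```python
# Squared loss on ordinary entries, squared hinge on clipped entries:
# the one-line extension the abstract describes for CMC objectives.

def cmc_loss(pred, obs, clipped, c):
    """Sum the squared loss on ordinary entries and squared hinge on clipped ones."""
    total = 0.0
    for p, o, is_clipped in zip(pred, obs, clipped):
        if is_clipped:                     # observation hit the ceiling c
            total += max(0.0, c - p) ** 2  # no penalty for predicting p >= c
        else:
            total += (p - o) ** 2
    return total

# predicting above the ceiling on a clipped entry is free; the second
# (ordinary) entry contributes (2.0 - 2.5)^2 = 0.25
loss = cmc_loss(pred=[5.7, 2.0], obs=[5.0, 2.5], clipped=[True, False], c=5.0)
```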
A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing
Title | A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing |
Authors | Daniel Fernández-González, Carlos Gómez-Rodríguez |
Abstract | We propose an efficient dynamic oracle for training the 2-Planar transition-based parser, a linear-time parser with over 99% coverage on non-projective syntactic corpora. This novel approach outperforms the static training strategy in the vast majority of languages tested and scored better on most datasets than the arc-hybrid parser enhanced with the SWAP transition, which can handle unrestricted non-projectivity. |
Tasks | Dependency Parsing |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05202v2 |
http://arxiv.org/pdf/1805.05202v2.pdf | |
PWC | https://paperswithcode.com/paper/a-dynamic-oracle-for-linear-time-2-planar-1 |
Repo | |
Framework | |
Annotating shadows, highlights and faces: the contribution of a ‘human in the loop’ for digital art history
Title | Annotating shadows, highlights and faces: the contribution of a ‘human in the loop’ for digital art history |
Authors | Maarten W. A. Wijntjes |
Abstract | While automatic computational techniques appear to reveal novel insights in digital art history, a complementary approach seems to get less attention: that of human annotation. We argue and exemplify that a ‘human in the loop’ can reveal insights that may be difficult to detect automatically. Specifically, we focussed on perceptual aspects within pictorial art. Using rather simple annotation tasks (e.g. delineating human lengths, indicating highlights and classifying gaze direction) we could both replicate earlier findings and reveal novel insights into pictorial conventions. We found that Canaletto depicted human figures in rather accurate perspective, varied viewpoint elevation between approximately 3 and 9 meters, and highly preferred light directions parallel to the projection plane. Furthermore, we found that averaging the images of leftward-looking faces reveals a woman, while averaging rightward-looking faces reveals a man, confirming earlier accounts of a lateral gender bias in pictorial art. Lastly, we confirmed and refined the well-known light-from-the-left bias. Together, the annotations, analyses and results exemplify how human annotation can contribute to and complement technical and digital art history. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03539v1 |
http://arxiv.org/pdf/1809.03539v1.pdf | |
PWC | https://paperswithcode.com/paper/annotating-shadows-highlights-and-faces-the |
Repo | |
Framework | |
Meta-learning: searching in the model space
Title | Meta-learning: searching in the model space |
Authors | Włodzisław Duch, Karol Grudziński |
Abstract | There is no free lunch: no single learning algorithm will outperform all others on all data. In practice, different approaches are tried and the best algorithm is selected. An alternative solution is to build new algorithms on demand by creating a framework that accommodates many algorithms. Here, the best combination of parameters and procedures is sought in the space of all possible models belonging to the framework of Similarity-Based Methods (SBMs). Such a meta-learning approach offers a chance to find the best method in all cases. Issues related to meta-learning and first tests of this approach are presented. |
Tasks | Meta-Learning |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06207v1 |
http://arxiv.org/pdf/1806.06207v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-searching-in-the-model-space |
Repo | |
Framework | |
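A toy rendering of the search described above: take a deliberately tiny "model space" (number of neighbours × distance metric for a nearest-neighbour classifier, one small corner of the SBM framework), score each candidate model by leave-one-out accuracy, and keep the best. The data and the two metrics are stand-ins for the much richer space the paper considers.

```python
# Searching a miniature Similarity-Based-Methods model space:
# candidate models = (k, distance metric), scored by leave-one-out accuracy.
import itertools

def manhattan(a, b): return sum(abs(x - y) for x, y in zip(a, b))
def euclid(a, b):    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# two well-separated synthetic clusters
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]

def loo_accuracy(k, dist):
    """Leave-one-out accuracy of a k-NN classifier under the given metric."""
    hits = 0
    for i in range(len(X)):
        rest = [(dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i]
        votes = [label for _, label in sorted(rest)[:k]]
        hits += max(set(votes), key=votes.count) == y[i]
    return hits / len(X)

# the "meta-learning" step: exhaustive search over the model space
best = max(itertools.product([1, 3, 5], [manhattan, euclid]),
           key=lambda cfg: loo_accuracy(*cfg))
```

Exhaustive search only works because this space is tiny; the framework paper is about organising a much larger space so that the search stays tractable.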
Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning
Title | Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning |
Authors | Shun-Yao Shih, Heng-Yu Chi |
Abstract | Songs can be well arranged by professional music curators to form a riveting playlist that creates engaging listening experiences. However, it is time-consuming for curators to rearrange these playlists promptly to fit future trends. By exploiting the techniques of deep learning and reinforcement learning, in this paper we consider music playlist generation as a language modeling problem and solve it with the proposed attention language model with policy gradient. We develop a systematic and interactive approach so that the resulting playlists can be tuned flexibly according to user preferences. Considering a playlist as a sequence of words, we first train our attention RNN language model on baseline recommended playlists. By optimizing suitably imposed reward functions, the model is then refined for the corresponding preferences. The experimental results demonstrate that our approach not only generates coherent playlists automatically but is also able to flexibly recommend personalized playlists for diversity, novelty and freshness. |
Tasks | Language Modelling |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04214v1 |
http://arxiv.org/pdf/1809.04214v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-personalized-and-flexible-playlist |
Repo | |
Framework | |
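The training loop the abstract outlines (sample a playlist, score it with an imposed reward, update with policy gradient) can be sketched with REINFORCE on a much simpler policy: a plain softmax over songs instead of an attention RNN. The "freshness" reward, song count, and learning rate below are all invented for the example.

```python
# REINFORCE with a running-average baseline on a toy softmax "playlist policy".
# Reward: fraction of sampled songs from an invented "fresh release" set.
import math, random

random.seed(0)

N_SONGS, LENGTH, LR = 5, 4, 0.5
FRESH = {3, 4}                       # hypothetical fresh-song ids (our choice)
logits = [0.0] * N_SONGS             # policy parameters: softmax over songs

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sample_playlist():
    probs = softmax(logits)
    return [random.choices(range(N_SONGS), weights=probs)[0]
            for _ in range(LENGTH)]

baseline = 0.0
for _ in range(500):
    pl = sample_playlist()
    reward = sum(s in FRESH for s in pl) / LENGTH   # imposed "freshness" reward
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward        # running-average baseline
    probs = softmax(logits)
    for song in pl:                  # grad of log pi(song) = onehot - probs
        for j in range(N_SONGS):
            logits[j] += LR * advantage * ((1.0 if j == song else 0.0) - probs[j])

probs = softmax(logits)              # mass should have shifted toward FRESH
```

Swapping the reward function (diversity, novelty, user feedback) retargets the same loop, which is the flexibility the abstract emphasises.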
Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration
Title | Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration |
Authors | Sreecharan Sankaranarayanan, Raghuram Mandyam Annasamy, Katia Sycara, Carolyn Penstein Rosé |
Abstract | Q-Ensembles are a model-free approach where input images are fed into different Q-networks and exploration is driven by the assumption that uncertainty is proportional to the variance of the output Q-values obtained. They have been shown to perform relatively well compared to other exploration strategies. Further, model-based approaches, such as encoder-decoder models have been used successfully for next frame prediction given previous frames. This paper proposes to integrate the model-free Q-ensembles and model-based approaches with the hope of compounding the benefits of both and achieving superior exploration as a result. Results show that a model-based trajectory memory approach when combined with Q-ensembles produces superior performance when compared to only using Q-ensembles. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04552v1 |
http://arxiv.org/pdf/1806.04552v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-model-free-q-ensembles-and-model |
Repo | |
Framework | |
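The exploration signal described above is easy to isolate: with several independent Q-estimates per action, disagreement (variance) marks under-explored actions. A toy sketch, with hand-made Q-values standing in for trained networks:

```python
# Variance-driven exploration over a Q-ensemble: pick the action with the
# best mean estimate plus a disagreement bonus. Numbers are invented.
import statistics

q_ensemble = {            # 3 independent Q-estimates for each of 4 actions
    0: [1.0, 1.1, 0.9],   # well-explored: low disagreement
    1: [0.2, 2.5, -0.5],  # poorly explored: high disagreement
    2: [1.5, 1.4, 1.6],
    3: [0.0, 0.1, -0.1],
}

def explore_action(q, beta=1.0):
    """Choose the action maximising mean Q plus a variance exploration bonus."""
    def score(a):
        return statistics.mean(q[a]) + beta * statistics.variance(q[a])
    return max(q, key=score)

a = explore_action(q_ensemble)   # the high-disagreement action wins
```

With beta = 0 the rule collapses to pure exploitation (highest mean), which is the contrast the paper's model-based additions are meant to improve on.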
Improved Techniques For Weakly-Supervised Object Localization
Title | Improved Techniques For Weakly-Supervised Object Localization |
Authors | Junsuk Choe, Joo Hyun Park, Hyunjung Shim |
Abstract | We propose an improved technique for weakly-supervised object localization. Conventional methods are limited in that they focus only on the most discriminative parts of the target objects. A recent study addressed this limitation by augmenting the training data for less discriminative parts. To this end, we employ an effective data augmentation for improving the accuracy of the object localization. In addition, we introduce improved learning techniques by optimizing Convolutional Neural Networks (CNNs) based on the state-of-the-art model. Based on extensive experiments, we evaluate the effectiveness of the proposed approach both qualitatively and quantitatively. In particular, we observe that our method improves the Top-1 localization accuracy by 21.4% to 37.3% depending on the configuration, compared to the current state-of-the-art technique for weakly-supervised object localization. |
Tasks | Data Augmentation, Object Localization, Weakly-Supervised Object Localization |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.07888v2 |
http://arxiv.org/pdf/1802.07888v2.pdf | |
PWC | https://paperswithcode.com/paper/improved-techniques-for-weakly-supervised |
Repo | |
Framework | |
Guided Dropout
Title | Guided Dropout |
Authors | Rohit Keshari, Richa Singh, Mayank Vatsa |
Abstract | Dropout is often used in deep neural networks to prevent over-fitting. Conventionally, dropout training invokes a random drop of nodes from the hidden layers of a neural network. Our hypothesis is that a guided selection of nodes for intelligent dropout can lead to better generalization than traditional dropout. In this research, we propose “guided dropout” for training deep neural networks, which drops nodes by measuring the strength of each node. We also demonstrate that conventional dropout is a specific case of the proposed guided dropout. Experimental evaluation on multiple datasets, including MNIST, CIFAR10, CIFAR100, SVHN, and Tiny ImageNet, demonstrates the efficacy of the proposed guided dropout. |
Tasks | |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03965v1 |
http://arxiv.org/pdf/1812.03965v1.pdf | |
PWC | https://paperswithcode.com/paper/guided-dropout |
Repo | |
Framework | |
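One reading of the mechanism in the abstract, reduced to a sketch: rank nodes by a strength measure and drop a fixed fraction guided by that ranking rather than uniformly at random. Dropping the weakest nodes and treating the strengths as given are our illustrative choices, not necessarily the paper's exact policy.

```python
# Strength-guided dropout sketch: drop the weakest nodes instead of a
# uniformly random subset, with inverted-dropout rescaling of survivors.

def guided_dropout(activations, strengths, drop_fraction=0.5):
    """Zero out the weakest drop_fraction of nodes, ranked by strength."""
    n_drop = int(len(activations) * drop_fraction)
    order = sorted(range(len(activations)), key=lambda i: strengths[i])
    dropped = set(order[:n_drop])         # lowest-strength node indices
    keep = 1.0 - drop_fraction            # inverted-dropout rescaling factor
    return [0.0 if i in dropped else a / keep
            for i, a in enumerate(activations)]

out = guided_dropout([0.5, 2.0, 0.1, 1.0], strengths=[0.2, 0.9, 0.1, 0.6])
# the two weakest nodes (indices 2 and 0) are zeroed; survivors are doubled
```

Replacing the strength ranking with a uniformly random one recovers conventional dropout, which mirrors the paper's claim that random dropout is a special case.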
Bridging Knowledge Gaps in Neural Entailment via Symbolic Models
Title | Bridging Knowledge Gaps in Neural Entailment via Symbolic Models |
Authors | Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark |
Abstract | Most textual entailment models focus on lexical gaps between the premise text and the hypothesis, but rarely on knowledge gaps. We focus on filling these knowledge gaps in the Science Entailment task by leveraging an external structured knowledge base (KB) of science facts. Our new architecture combines standard neural entailment models with a knowledge lookup module. To facilitate this lookup, we propose a fact-level decomposition of the hypothesis and verify the resulting sub-facts against both the textual premise and the structured KB. Our model, NSnet, learns to aggregate predictions from these heterogeneous data formats. On the SciTail dataset, NSnet outperforms a simpler combination of the two predictions by 3% and the base entailment model by 5%. |
Tasks | Natural Language Inference |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09333v2 |
http://arxiv.org/pdf/1808.09333v2.pdf | |
PWC | https://paperswithcode.com/paper/bridging-knowledge-gaps-in-neural-entailment |
Repo | |
Framework | |
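A deliberately tiny sketch of the lookup-and-aggregate idea above: decompose the hypothesis into sub-facts, check each against both the premise text and a structured KB, and accept when every sub-fact is supported by either source. The token-overlap "verifier" and single-fact KB are trivial stand-ins for NSnet's learned neural modules.

```python
# Sub-fact verification against two heterogeneous sources (text + KB),
# with a hard "either source suffices" aggregation replacing NSnet's
# learned aggregator. KB content is invented for the example.

KB = {("copper", "conducts", "electricity")}

def verify(subfact, premise_tokens):
    """A sub-fact holds if the premise mentions all its tokens or the KB has it."""
    text_hit = all(tok in premise_tokens for tok in subfact)
    kb_hit = subfact in KB
    return text_hit or kb_hit

def entails(premise, subfacts):
    """Entailed only if every sub-fact is supported by at least one source."""
    tokens = set(premise.lower().split())
    return all(verify(f, tokens) for f in subfacts)

# the premise alone lacks "conducts electricity", but the KB fills the gap
ok = entails("metals like copper are good conductors",
             [("copper", "conducts", "electricity")])
```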
Physical Primitive Decomposition
Title | Physical Primitive Decomposition |
Authors | Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu |
Abstract | Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Developing such a distributed, physical, interpretable representation of objects will facilitate intelligent agents to better explore and interact with the world. In this paper, we study physical primitive decomposition—understanding an object through its components, each with physical and geometric attributes. As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object’s appearance and its behaviors in physical events. Our model performs well on block towers and tools in both synthetic and real scenarios; we also demonstrate that visual and physical observations often provide complementary signals. We further present ablation and behavioral studies to better understand our model and contrast it with human performance. |
Tasks | |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.05070v1 |
http://arxiv.org/pdf/1809.05070v1.pdf | |
PWC | https://paperswithcode.com/paper/physical-primitive-decomposition |
Repo | |
Framework | |
Gradient Descent Happens in a Tiny Subspace
Title | Gradient Descent Happens in a Tiny Subspace |
Authors | Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer |
Abstract | We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04754v1 |
http://arxiv.org/pdf/1812.04754v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-happens-in-a-tiny-subspace |
Repo | |
Framework | |
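The quantity at the heart of the abstract is the fraction of the gradient's squared norm that lies in the span of the top Hessian eigenvectors. In the sketch below the Hessian is assumed diagonal so its eigenvectors are the coordinate axes; real measurements on deep networks need iterative eigensolvers.

```python
# Fraction of gradient norm captured by the top-k Hessian eigendirections.
# Diagonal-Hessian toy: eigenvectors are coordinate axes, eigenvalues are
# the diagonal entries, so the projection is just an index selection.

def top_subspace_fraction(grad, hessian_diag, k):
    """Return ||P_k g||^2 / ||g||^2, P_k projecting onto the top-k eigendirections."""
    top = sorted(range(len(grad)), key=lambda i: hessian_diag[i], reverse=True)[:k]
    num = sum(grad[i] ** 2 for i in top)
    den = sum(g ** 2 for g in grad)
    return num / den

# a gradient concentrated on the two largest-curvature directions
frac = top_subspace_fraction(grad=[3.0, 4.0, 0.1, 0.2],
                             hessian_diag=[10.0, 8.0, 0.5, 0.1], k=2)
```

A fraction near 1 with k equal to the number of classes is the paper's empirical finding; here the numbers are chosen to make that regime visible.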
Iterative Attention Mining for Weakly Supervised Thoracic Disease Pattern Localization in Chest X-Rays
Title | Iterative Attention Mining for Weakly Supervised Thoracic Disease Pattern Localization in Chest X-Rays |
Authors | Jinzheng Cai, Le Lu, Adam P. Harrison, Xiaoshuang Shi, Pingjun Chen, Lin Yang |
Abstract | Given image labels as the only supervisory signal, we focus on harvesting, or mining, thoracic disease localizations from chest X-ray images. Harvesting such localizations from existing datasets allows for the creation of improved data sources for computer-aided diagnosis and retrospective analyses. We train a convolutional neural network (CNN) for image classification and propose an attention mining (AM) strategy to improve the model’s sensitivity or saliency to disease patterns. The intuition of AM is that once the most salient disease area is blocked or hidden from the CNN model, it will pay attention to alternative image regions, while still attempting to make correct predictions. However, the model must be properly constrained during AM; otherwise, it may overfit to uncorrelated image parts and forget the valuable knowledge that it has learned from the original image classification task. To alleviate such side effects, we then design a knowledge preservation (KP) loss, which minimizes the discrepancy between responses for X-ray images from the original and the updated networks. Furthermore, we modify the CNN model to include multi-scale aggregation (MSA), improving its localization ability on small-scale disease findings, e.g., lung nodules. We experimentally validate our method on the publicly-available ChestX-ray14 dataset, outperforming a class activation map (CAM)-based approach, and demonstrating the value of our novel framework for mining disease locations. |
Tasks | Image Classification |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.00958v1 |
http://arxiv.org/pdf/1807.00958v1.pdf | |
PWC | https://paperswithcode.com/paper/iterative-attention-mining-for-weakly |
Repo | |
Framework | |
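Two ingredients from the abstract can be sketched directly: the AM step that blocks the most salient region so attention must move elsewhere, and the KP loss that penalises drift between the original and updated responses. The grids and response vectors below are toy numbers, not network outputs.

```python
# Attention-mining sketch: mask the peak of a saliency map, plus a
# knowledge-preservation (KP) penalty between old and new responses.

def block_most_salient(saliency):
    """Zero out the single highest-saliency cell of a 2D grid."""
    r, c = max(((i, j) for i in range(len(saliency))
                for j in range(len(saliency[0]))),
               key=lambda rc: saliency[rc[0]][rc[1]])
    masked = [row[:] for row in saliency]
    masked[r][c] = 0.0          # the model must now attend elsewhere
    return masked

def kp_loss(old_resp, new_resp):
    """Squared discrepancy between original and updated network responses."""
    return sum((a - b) ** 2 for a, b in zip(old_resp, new_resp))

masked = block_most_salient([[0.1, 0.9], [0.4, 0.2]])
loss = kp_loss([0.8, 0.2], [0.7, 0.3])
```

In the paper the masking is iterated and the KP term keeps the updated network close to the original classifier; the sketch shows one masking step and one penalty evaluation.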