Paper Group ANR 758
Knowledge Transfer with Jacobian Matching
Title | Knowledge Transfer with Jacobian Matching |
Authors | Suraj Srinivas, Francois Fleuret |
Abstract | Classical distillation methods transfer representations from a “teacher” neural network to a “student” network by matching their output activations. Recent methods also match the Jacobians, i.e., the gradients of output activations with respect to the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing the equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning. |
Tasks | Transfer Learning |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00443v1 |
http://arxiv.org/pdf/1803.00443v1.pdf | |
PWC | https://paperswithcode.com/paper/knowledge-transfer-with-jacobian-matching |
Repo | |
Framework | |
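The core penalty in the abstract above is simple to state: make the student's input-output Jacobian match the teacher's. As a rough illustration (not the paper's networks or loss weighting), the sketch below computes a squared-error Jacobian-matching penalty for two toy vector-valued maps via finite differences; both functions are invented for the example.

```python
# Toy sketch of a Jacobian-matching penalty, using finite differences.
# The "teacher" and "student" are illustrative stand-ins, not the paper's CNNs.

def teacher(x):      # stand-in teacher: a smooth nonlinear map R^2 -> R^2
    return [x[0] * x[0] + x[1], 3.0 * x[0] - x[1] * x[1]]

def student(x):      # stand-in student with slightly perturbed weights
    return [0.9 * x[0] * x[0] + x[1], 2.8 * x[0] - 1.1 * x[1] * x[1]]

def jacobian(f, x, eps=1e-5):
    """Finite-difference Jacobian J[i][j] = d f_i / d x_j at input x."""
    fx = f(x)
    J = []
    for i in range(len(fx)):
        row = []
        for j in range(len(x)):
            xp = list(x)
            xp[j] += eps
            row.append((f(xp)[i] - fx[i]) / eps)
        J.append(row)
    return J

def jacobian_match_loss(f, g, x):
    """Squared-error Jacobian-matching penalty between f and g at input x."""
    Jf, Jg = jacobian(f, x), jacobian(g, x)
    return sum((a - b) ** 2
               for rf, rg in zip(Jf, Jg)
               for a, b in zip(rf, rg))

x0 = [1.0, 2.0]
loss = jacobian_match_loss(teacher, student, x0)   # positive: Jacobians differ
```

In the paper's analysis this penalty is connected to matching outputs on noise-perturbed inputs; the sketch only shows the penalty itself.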
Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective
Title | Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective |
Authors | Filip Tronarp, Hans Kersting, Simo Särkkä, Philipp Hennig |
Abstract | We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consist of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP—which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a non-linear Bayesian filtering problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers that are formulated in terms of generating synthetic measurements of the gradient field come out as specific approximations. Based on the non-linear Bayesian filtering problem posed in this paper, we develop novel Gaussian solvers for which we establish favourable stability properties. Additionally, non-Gaussian approximations to the filtering problem are derived by the particle filter approach. The resulting solvers are compared with other probabilistic solvers in illustrative experiments. |
Tasks | |
Published | 2018-10-08 |
URL | http://arxiv.org/abs/1810.03440v4 |
http://arxiv.org/pdf/1810.03440v4.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-solutions-to-ordinary |
Repo | |
Framework | |
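The reduction described above can be made concrete with a small sketch: model the state (x, x′) with a once-integrated Wiener process prior and treat the residual x′ − f(x) as a measurement observed to be zero, then run an extended Kalman filter. The test ODE x′ = −x, the step size, and the diffusion constant q below are illustrative choices, not the paper's.

```python
# Minimal sketch of solving an ODE by non-linear Kalman filtering, in the
# spirit of the abstract. State mean m = (x, x'); prior: integrated Wiener
# process; pseudo-measurement: 0 = x' - f(x). Constants are illustrative.
import math

def f(x):          # test ODE x' = -x, true solution exp(-t)
    return -x

def fprime(x):     # derivative of the vector field, for the EKF linearisation
    return -1.0

def solve(x0, t_end, h=0.01, q=1.0):
    m = [x0, f(x0)]                       # filter mean (x, x')
    P = [[0.0, 0.0], [0.0, 0.0]]          # filter covariance
    A = [[1.0, h], [0.0, 1.0]]            # IWP(1) transition matrix
    Q = [[q * h**3 / 3, q * h**2 / 2],    # IWP(1) process noise
         [q * h**2 / 2, q * h]]
    for _ in range(round(t_end / h)):
        # predict: m <- A m, P <- A P A^T + Q
        m = [m[0] + h * m[1], m[1]]
        AP = [[A[i][0] * P[0][j] + A[i][1] * P[1][j] for j in range(2)]
              for i in range(2)]
        P = [[AP[i][0] * A[j][0] + AP[i][1] * A[j][1] + Q[i][j]
              for j in range(2)] for i in range(2)]
        # update with pseudo-measurement 0 = x' - f(x), linearised at m
        H = [-fprime(m[0]), 1.0]
        r = -(m[1] - f(m[0]))             # residual
        PH = [P[0][0] * H[0] + P[0][1] * H[1],
              P[1][0] * H[0] + P[1][1] * H[1]]
        S = H[0] * PH[0] + H[1] * PH[1]   # innovation variance
        K = [PH[0] / S, PH[1] / S]        # Kalman gain
        m = [m[0] + K[0] * r, m[1] + K[1] * r]
        P = [[P[i][j] - K[i] * PH[j] for j in range(2)] for i in range(2)]
    return m[0]

x1 = solve(1.0, 1.0)   # should land near exp(-1)
```

Because f is linear here the EKF linearisation is exact; the point of the paper is that the same filtering machinery applies to non-linear vector fields and to other approximate filters.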
Predict and Constrain: Modeling Cardinality in Deep Structured Prediction
Title | Predict and Constrain: Modeling Cardinality in Deep Structured Prediction |
Authors | Nataly Brukhim, Amir Globerson |
Abstract | Many machine learning problems require the prediction of multi-dimensional labels. Such structured prediction models can benefit from modeling dependencies between labels. Recently, several deep learning approaches to structured prediction have been proposed. Here we focus on capturing cardinality constraints in such models. Namely, constraining the number of non-zero labels that the model outputs. Such constraints have proven very useful in previous structured prediction approaches, but it is a challenge to introduce them into a deep learning framework. Here we show how to do this via a novel deep architecture. Our approach outperforms strong baselines, achieving state-of-the-art results on multi-label classification benchmarks. |
Tasks | Multi-Label Classification, Structured Prediction |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04721v1 |
http://arxiv.org/pdf/1802.04721v1.pdf | |
PWC | https://paperswithcode.com/paper/predict-and-constrain-modeling-cardinality-in |
Repo | |
Framework | |
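The cardinality constraint in the abstract, stripped of the differentiable machinery the paper builds around it, is just a projection onto labellings with exactly k non-zero entries. A minimal hard-projection sketch (variable names are ours):

```python
# Hard top-k projection: the discrete constraint that the paper's deep
# architecture enforces in a differentiable way.

def project_cardinality(scores, k):
    """Return a 0/1 labelling with exactly k ones, placed on the top-k scores."""
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(scores))]

labels = project_cardinality([0.9, 0.1, 0.7, 0.3], k=2)   # -> [1, 0, 1, 0]
```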
Clipped Matrix Completion: A Remedy for Ceiling Effects
Title | Clipped Matrix Completion: A Remedy for Ceiling Effects |
Authors | Takeshi Teshima, Miao Xu, Issei Sato, Masashi Sugiyama |
Abstract | We consider the problem of recovering a low-rank matrix from its clipped observations. Clipping arises in many scientific areas and obstructs statistical analyses. On the other hand, matrix completion (MC) methods can recover a low-rank matrix from various information deficits by using the principle of low-rank completion. However, the current theoretical guarantees for low-rank MC do not apply to clipped matrices, as the deficit depends on the underlying values. Therefore, the feasibility of clipped matrix completion (CMC) is not trivial. In this paper, we first provide a theoretical guarantee for the exact recovery of CMC by using a trace-norm minimization algorithm. Furthermore, we propose practical CMC algorithms by extending ordinary MC methods. Our extension is to use the squared hinge loss in place of the squared loss, reducing the penalty for over-estimation on clipped entries. We also propose a novel regularization term tailored for CMC. It is a combination of two trace-norm terms, and we theoretically bound the recovery error under the regularization. We demonstrate the effectiveness of the proposed methods through experiments using both synthetic and benchmark data for recommendation systems. |
Tasks | Matrix Completion, Recommendation Systems |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.04997v3 |
http://arxiv.org/pdf/1809.04997v3.pdf | |
PWC | https://paperswithcode.com/paper/clipped-matrix-completion-a-remedy-for |
Repo | |
Framework | |
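The loss modification the abstract proposes can be shown in a few lines: ordinary entries keep the squared loss, while clipped entries (observed at the ceiling c) get a squared hinge, so predicting above the ceiling costs nothing. A sketch with invented numbers:

```python
# Squared loss on ordinary entries, squared hinge on clipped entries:
# the one-line extension the abstract describes for CMC objectives.

def cmc_loss(pred, obs, clipped, c):
    """Sum the squared loss on ordinary entries and squared hinge on clipped ones."""
    total = 0.0
    for p, o, is_clipped in zip(pred, obs, clipped):
        if is_clipped:                     # observation hit the ceiling c
            total += max(0.0, c - p) ** 2  # no penalty for predicting p >= c
        else:
            total += (p - o) ** 2
    return total

# predicting above the ceiling on a clipped entry is free; the second
# (ordinary) entry contributes (2.0 - 2.5)^2 = 0.25
loss = cmc_loss(pred=[5.7, 2.0], obs=[5.0, 2.5], clipped=[True, False], c=5.0)
```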
A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing
Title | A Dynamic Oracle for Linear-Time 2-Planar Dependency Parsing |
Authors | Daniel Fernández-González, Carlos Gómez-Rodríguez |
Abstract | We propose an efficient dynamic oracle for training the 2-Planar transition-based parser, a linear-time parser with over 99% coverage on non-projective syntactic corpora. This novel approach outperforms the static training strategy in the vast majority of languages tested and scored better on most datasets than the arc-hybrid parser enhanced with the SWAP transition, which can handle unrestricted non-projectivity. |
Tasks | Dependency Parsing |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05202v2 |
http://arxiv.org/pdf/1805.05202v2.pdf | |
PWC | https://paperswithcode.com/paper/a-dynamic-oracle-for-linear-time-2-planar-1 |
Repo | |
Framework | |
Annotating shadows, highlights and faces: the contribution of a ‘human in the loop’ for digital art history
Title | Annotating shadows, highlights and faces: the contribution of a ‘human in the loop’ for digital art history |
Authors | Maarten W. A. Wijntjes |
Abstract | While automatic computational techniques appear to reveal novel insights in digital art history, a complementary approach seems to get less attention: that of human annotation. We argue and exemplify that a ‘human in the loop’ can reveal insights that may be difficult to detect automatically. Specifically, we focussed on perceptual aspects within pictorial art. Using rather simple annotation tasks (e.g. delineating human lengths, indicating highlights and classifying gaze direction) we could both replicate earlier findings and reveal novel insights into pictorial conventions. We found that Canaletto depicted human figures in rather accurate perspective, varied viewpoint elevation between approximately 3 and 9 meters, and highly preferred light directions parallel to the projection plane. Furthermore, we found that averaging the images of leftward-looking faces reveals a woman, while averaging rightward-looking faces reveals a man, confirming earlier accounts of a lateral gender bias in pictorial art. Lastly, we confirmed and refined the well-known light-from-the-left bias. Together, the annotations, analyses and results exemplify how human annotation can contribute to and complement technical and digital art history. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03539v1 |
http://arxiv.org/pdf/1809.03539v1.pdf | |
PWC | https://paperswithcode.com/paper/annotating-shadows-highlights-and-faces-the |
Repo | |
Framework | |
Meta-learning: searching in the model space
Title | Meta-learning: searching in the model space |
Authors | Włodzisław Duch, Karol Grudziński |
Abstract | There is no free lunch: no single learning algorithm will outperform all others on all data. In practice, different approaches are tried and the best algorithm is selected. An alternative solution is to build new algorithms on demand by creating a framework that accommodates many algorithms. Here, the best combination of parameters and procedures is sought in the space of all possible models belonging to the framework of Similarity-Based Methods (SBMs). Such a meta-learning approach offers a chance to find the best method in all cases. Issues related to meta-learning and first tests of this approach are presented. |
Tasks | Meta-Learning |
Published | 2018-06-16 |
URL | http://arxiv.org/abs/1806.06207v1 |
http://arxiv.org/pdf/1806.06207v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-searching-in-the-model-space |
Repo | |
Framework | |
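A toy rendering of the search described above: take a deliberately tiny "model space" (number of neighbours × distance metric for a nearest-neighbour classifier, one small corner of the SBM framework), score each candidate model by leave-one-out accuracy, and keep the best. The data and the two metrics are stand-ins for the much richer space the paper considers.

```python
# Searching a miniature Similarity-Based-Methods model space:
# candidate models = (k, distance metric), scored by leave-one-out accuracy.
import itertools

def manhattan(a, b): return sum(abs(x - y) for x, y in zip(a, b))
def euclid(a, b):    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# two well-separated synthetic clusters
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]

def loo_accuracy(k, dist):
    """Leave-one-out accuracy of a k-NN classifier under the given metric."""
    hits = 0
    for i in range(len(X)):
        rest = [(dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i]
        votes = [label for _, label in sorted(rest)[:k]]
        hits += max(set(votes), key=votes.count) == y[i]
    return hits / len(X)

# the "meta-learning" step: exhaustive search over the model space
best = max(itertools.product([1, 3, 5], [manhattan, euclid]),
           key=lambda cfg: loo_accuracy(*cfg))
```

Exhaustive search only works because this space is tiny; the framework paper is about organising a much larger space so that the search stays tractable.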
Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning
Title | Automatic, Personalized, and Flexible Playlist Generation using Reinforcement Learning |
Authors | Shun-Yao Shih, Heng-Yu Chi |
Abstract | Songs can be well arranged by professional music curators to form a riveting playlist that creates engaging listening experiences. However, it is time-consuming for curators to rearrange these playlists promptly to fit future trends. By exploiting the techniques of deep learning and reinforcement learning, in this paper we consider music playlist generation as a language modeling problem and solve it with the proposed attention language model with policy gradient. We develop a systematic and interactive approach so that the resulting playlists can be tuned flexibly according to user preferences. Considering a playlist as a sequence of words, we first train our attention RNN language model on baseline recommended playlists. By optimizing suitably imposed reward functions, the model is then refined for the corresponding preferences. The experimental results demonstrate that our approach not only generates coherent playlists automatically but is also able to flexibly recommend personalized playlists for diversity, novelty and freshness. |
Tasks | Language Modelling |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04214v1 |
http://arxiv.org/pdf/1809.04214v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-personalized-and-flexible-playlist |
Repo | |
Framework | |
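The training loop the abstract outlines (sample a playlist, score it with an imposed reward, update with policy gradient) can be sketched with REINFORCE on a much simpler policy: a plain softmax over songs instead of an attention RNN. The "freshness" reward, song count, and learning rate below are all invented for the example.

```python
# REINFORCE with a running-average baseline on a toy softmax "playlist policy".
# Reward: fraction of sampled songs from an invented "fresh release" set.
import math, random

random.seed(0)

N_SONGS, LENGTH, LR = 5, 4, 0.5
FRESH = {3, 4}                       # hypothetical fresh-song ids (our choice)
logits = [0.0] * N_SONGS             # policy parameters: softmax over songs

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sample_playlist():
    probs = softmax(logits)
    return [random.choices(range(N_SONGS), weights=probs)[0]
            for _ in range(LENGTH)]

baseline = 0.0
for _ in range(500):
    pl = sample_playlist()
    reward = sum(s in FRESH for s in pl) / LENGTH   # imposed "freshness" reward
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward        # running-average baseline
    probs = softmax(logits)
    for song in pl:                  # grad of log pi(song) = onehot - probs
        for j in range(N_SONGS):
            logits[j] += LR * advantage * ((1.0 if j == song else 0.0) - probs[j])

probs = softmax(logits)              # mass should have shifted toward FRESH
```

Swapping the reward function (diversity, novelty, user feedback) retargets the same loop, which is the flexibility the abstract emphasises.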
Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration
Title | Combining Model-Free Q-Ensembles and Model-Based Approaches for Informed Exploration |
Authors | Sreecharan Sankaranarayanan, Raghuram Mandyam Annasamy, Katia Sycara, Carolyn Penstein Rosé |
Abstract | Q-Ensembles are a model-free approach where input images are fed into different Q-networks and exploration is driven by the assumption that uncertainty is proportional to the variance of the output Q-values obtained. They have been shown to perform relatively well compared to other exploration strategies. Further, model-based approaches, such as encoder-decoder models have been used successfully for next frame prediction given previous frames. This paper proposes to integrate the model-free Q-ensembles and model-based approaches with the hope of compounding the benefits of both and achieving superior exploration as a result. Results show that a model-based trajectory memory approach when combined with Q-ensembles produces superior performance when compared to only using Q-ensembles. |
Tasks | |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04552v1 |
http://arxiv.org/pdf/1806.04552v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-model-free-q-ensembles-and-model |
Repo | |
Framework | |
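The exploration signal described above is easy to isolate: with several independent Q-estimates per action, disagreement (variance) marks under-explored actions. A toy sketch, with hand-made Q-values standing in for trained networks:

```python
# Variance-driven exploration over a Q-ensemble: pick the action with the
# best mean estimate plus a disagreement bonus. Numbers are invented.
import statistics

q_ensemble = {            # 3 independent Q-estimates for each of 4 actions
    0: [1.0, 1.1, 0.9],   # well-explored: low disagreement
    1: [0.2, 2.5, -0.5],  # poorly explored: high disagreement
    2: [1.5, 1.4, 1.6],
    3: [0.0, 0.1, -0.1],
}

def explore_action(q, beta=1.0):
    """Choose the action maximising mean Q plus a variance exploration bonus."""
    def score(a):
        return statistics.mean(q[a]) + beta * statistics.variance(q[a])
    return max(q, key=score)

a = explore_action(q_ensemble)   # the high-disagreement action wins
```

With beta = 0 the rule collapses to pure exploitation (highest mean), which is the contrast the paper's model-based additions are meant to improve on.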
Improved Techniques For Weakly-Supervised Object Localization
Title | Improved Techniques For Weakly-Supervised Object Localization |
Authors | Junsuk Choe, Joo Hyun Park, Hyunjung Shim |
Abstract | We propose an improved technique for weakly-supervised object localization. Conventional methods are limited in that they focus only on the most discriminative parts of the target objects. A recent study addressed this limitation by augmenting the training data for less discriminative parts. To this end, we employ an effective data augmentation for improving the accuracy of the object localization. In addition, we introduce improved learning techniques by optimizing Convolutional Neural Networks (CNNs) based on the state-of-the-art model. Based on extensive experiments, we evaluate the effectiveness of the proposed approach both qualitatively and quantitatively. In particular, we observe that our method improves the Top-1 localization accuracy by 21.4% to 37.3% depending on the configuration, compared to the current state-of-the-art technique for weakly-supervised object localization. |
Tasks | Data Augmentation, Object Localization, Weakly-Supervised Object Localization |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.07888v2 |
http://arxiv.org/pdf/1802.07888v2.pdf | |
PWC | https://paperswithcode.com/paper/improved-techniques-for-weakly-supervised |
Repo | |
Framework | |
Guided Dropout
Title | Guided Dropout |
Authors | Rohit Keshari, Richa Singh, Mayank Vatsa |
Abstract | Dropout is often used in deep neural networks to prevent over-fitting. Conventionally, dropout training invokes a random drop of nodes from the hidden layers of a neural network. Our hypothesis is that a guided selection of nodes for intelligent dropout can lead to better generalization than traditional dropout. In this research, we propose “guided dropout” for training deep neural networks, which drops nodes by measuring the strength of each node. We also demonstrate that conventional dropout is a specific case of the proposed guided dropout. Experimental evaluation on multiple datasets, including MNIST, CIFAR10, CIFAR100, SVHN, and Tiny ImageNet, demonstrates the efficacy of the proposed guided dropout. |
Tasks | |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03965v1 |
http://arxiv.org/pdf/1812.03965v1.pdf | |
PWC | https://paperswithcode.com/paper/guided-dropout |
Repo | |
Framework | |
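One reading of the mechanism in the abstract, reduced to a sketch: rank nodes by a strength measure and drop a fixed fraction guided by that ranking rather than uniformly at random. Dropping the weakest nodes and treating the strengths as given are our illustrative choices, not necessarily the paper's exact policy.

```python
# Strength-guided dropout sketch: drop the weakest nodes instead of a
# uniformly random subset, with inverted-dropout rescaling of survivors.

def guided_dropout(activations, strengths, drop_fraction=0.5):
    """Zero out the weakest drop_fraction of nodes, ranked by strength."""
    n_drop = int(len(activations) * drop_fraction)
    order = sorted(range(len(activations)), key=lambda i: strengths[i])
    dropped = set(order[:n_drop])         # lowest-strength node indices
    keep = 1.0 - drop_fraction            # inverted-dropout rescaling factor
    return [0.0 if i in dropped else a / keep
            for i, a in enumerate(activations)]

out = guided_dropout([0.5, 2.0, 0.1, 1.0], strengths=[0.2, 0.9, 0.1, 0.6])
# the two weakest nodes (indices 2 and 0) are zeroed; survivors are doubled
```

Replacing the strength ranking with a uniformly random one recovers conventional dropout, which mirrors the paper's claim that random dropout is a special case.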
Bridging Knowledge Gaps in Neural Entailment via Symbolic Models
Title | Bridging Knowledge Gaps in Neural Entailment via Symbolic Models |
Authors | Dongyeop Kang, Tushar Khot, Ashish Sabharwal, Peter Clark |
Abstract | Most textual entailment models focus on lexical gaps between the premise text and the hypothesis, but rarely on knowledge gaps. We focus on filling these knowledge gaps in the Science Entailment task by leveraging an external structured knowledge base (KB) of science facts. Our new architecture combines standard neural entailment models with a knowledge lookup module. To facilitate this lookup, we propose a fact-level decomposition of the hypothesis and verify the resulting sub-facts against both the textual premise and the structured KB. Our model, NSnet, learns to aggregate predictions from these heterogeneous data formats. On the SciTail dataset, NSnet outperforms a simpler combination of the two predictions by 3% and the base entailment model by 5%. |
Tasks | Natural Language Inference |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1808.09333v2 |
http://arxiv.org/pdf/1808.09333v2.pdf | |
PWC | https://paperswithcode.com/paper/bridging-knowledge-gaps-in-neural-entailment |
Repo | |
Framework | |
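A deliberately tiny sketch of the lookup-and-aggregate idea above: decompose the hypothesis into sub-facts, check each against both the premise text and a structured KB, and accept when every sub-fact is supported by either source. The token-overlap "verifier" and single-fact KB are trivial stand-ins for NSnet's learned neural modules.

```python
# Sub-fact verification against two heterogeneous sources (text + KB),
# with a hard "either source suffices" aggregation replacing NSnet's
# learned aggregator. KB content is invented for the example.

KB = {("copper", "conducts", "electricity")}

def verify(subfact, premise_tokens):
    """A sub-fact holds if the premise mentions all its tokens or the KB has it."""
    text_hit = all(tok in premise_tokens for tok in subfact)
    kb_hit = subfact in KB
    return text_hit or kb_hit

def entails(premise, subfacts):
    """Entailed only if every sub-fact is supported by at least one source."""
    tokens = set(premise.lower().split())
    return all(verify(f, tokens) for f in subfacts)

# the premise alone lacks "conducts electricity", but the KB fills the gap
ok = entails("metals like copper are good conductors",
             [("copper", "conducts", "electricity")])
```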
Physical Primitive Decomposition
Title | Physical Primitive Decomposition |
Authors | Zhijian Liu, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu |
Abstract | Objects are made of parts, each with distinct geometry, physics, functionality, and affordances. Developing such a distributed, physical, interpretable representation of objects will facilitate intelligent agents to better explore and interact with the world. In this paper, we study physical primitive decomposition—understanding an object through its components, each with physical and geometric attributes. As annotated data for object parts and physics are rare, we propose a novel formulation that learns physical primitives by explaining both an object’s appearance and its behaviors in physical events. Our model performs well on block towers and tools in both synthetic and real scenarios; we also demonstrate that visual and physical observations often provide complementary signals. We further present ablation and behavioral studies to better understand our model and contrast it with human performance. |
Tasks | |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.05070v1 |
http://arxiv.org/pdf/1809.05070v1.pdf | |
PWC | https://paperswithcode.com/paper/physical-primitive-decomposition |
Repo | |
Framework | |
Gradient Descent Happens in a Tiny Subspace
Title | Gradient Descent Happens in a Tiny Subspace |
Authors | Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer |
Abstract | We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training. The subspace is spanned by a few top eigenvectors of the Hessian (equal to the number of classes in the dataset), and is mostly preserved over long periods of training. A simple argument then suggests that gradient descent may happen mostly in this subspace. We give an example of this effect in a solvable model of classification, and we comment on possible implications for optimization and learning. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04754v1 |
http://arxiv.org/pdf/1812.04754v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-happens-in-a-tiny-subspace |
Repo | |
Framework | |
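The quantity at the heart of the abstract is the fraction of the gradient's squared norm that lies in the span of the top Hessian eigenvectors. In the sketch below the Hessian is assumed diagonal so its eigenvectors are the coordinate axes; real measurements on deep networks need iterative eigensolvers.

```python
# Fraction of gradient norm captured by the top-k Hessian eigendirections.
# Diagonal-Hessian toy: eigenvectors are coordinate axes, eigenvalues are
# the diagonal entries, so the projection is just an index selection.

def top_subspace_fraction(grad, hessian_diag, k):
    """Return ||P_k g||^2 / ||g||^2, P_k projecting onto the top-k eigendirections."""
    top = sorted(range(len(grad)), key=lambda i: hessian_diag[i], reverse=True)[:k]
    num = sum(grad[i] ** 2 for i in top)
    den = sum(g ** 2 for g in grad)
    return num / den

# a gradient concentrated on the two largest-curvature directions
frac = top_subspace_fraction(grad=[3.0, 4.0, 0.1, 0.2],
                             hessian_diag=[10.0, 8.0, 0.5, 0.1], k=2)
```

A fraction near 1 with k equal to the number of classes is the paper's empirical finding; here the numbers are chosen to make that regime visible.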
Iterative Attention Mining for Weakly Supervised Thoracic Disease Pattern Localization in Chest X-Rays
Title | Iterative Attention Mining for Weakly Supervised Thoracic Disease Pattern Localization in Chest X-Rays |
Authors | Jinzheng Cai, Le Lu, Adam P. Harrison, Xiaoshuang Shi, Pingjun Chen, Lin Yang |
Abstract | Given image labels as the only supervisory signal, we focus on harvesting, or mining, thoracic disease localizations from chest X-ray images. Harvesting such localizations from existing datasets allows for the creation of improved data sources for computer-aided diagnosis and retrospective analyses. We train a convolutional neural network (CNN) for image classification and propose an attention mining (AM) strategy to improve the model’s sensitivity or saliency to disease patterns. The intuition of AM is that once the most salient disease area is blocked or hidden from the CNN model, it will pay attention to alternative image regions, while still attempting to make correct predictions. However, the model must be properly constrained during AM; otherwise, it may overfit to uncorrelated image parts and forget the valuable knowledge that it has learned from the original image classification task. To alleviate such side effects, we then design a knowledge preservation (KP) loss, which minimizes the discrepancy between responses for X-ray images from the original and the updated networks. Furthermore, we modify the CNN model to include multi-scale aggregation (MSA), improving its localization ability on small-scale disease findings, e.g., lung nodules. We experimentally validate our method on the publicly-available ChestX-ray14 dataset, outperforming a class activation map (CAM)-based approach, and demonstrating the value of our novel framework for mining disease locations. |
Tasks | Image Classification |
Published | 2018-07-03 |
URL | http://arxiv.org/abs/1807.00958v1 |
http://arxiv.org/pdf/1807.00958v1.pdf | |
PWC | https://paperswithcode.com/paper/iterative-attention-mining-for-weakly |
Repo | |
Framework | |
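Two ingredients from the abstract can be sketched directly: the AM step that blocks the most salient region so attention must move elsewhere, and the KP loss that penalises drift between the original and updated responses. The grids and response vectors below are toy numbers, not network outputs.

```python
# Attention-mining sketch: mask the peak of a saliency map, plus a
# knowledge-preservation (KP) penalty between old and new responses.

def block_most_salient(saliency):
    """Zero out the single highest-saliency cell of a 2D grid."""
    r, c = max(((i, j) for i in range(len(saliency))
                for j in range(len(saliency[0]))),
               key=lambda rc: saliency[rc[0]][rc[1]])
    masked = [row[:] for row in saliency]
    masked[r][c] = 0.0          # the model must now attend elsewhere
    return masked

def kp_loss(old_resp, new_resp):
    """Squared discrepancy between original and updated network responses."""
    return sum((a - b) ** 2 for a, b in zip(old_resp, new_resp))

masked = block_most_salient([[0.1, 0.9], [0.4, 0.2]])
loss = kp_loss([0.8, 0.2], [0.7, 0.3])
```

In the paper the masking is iterated and the KP term keeps the updated network close to the original classifier; the sketch shows one masking step and one penalty evaluation.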