Paper Group NANR 39
Projected Canonical Decomposition for Knowledge Base Completion
Title | Projected Canonical Decomposition for Knowledge Base Completion |
Authors | Anonymous |
Abstract | The leading approaches to tensor completion and link prediction are based on the canonical polyadic (CP) decomposition of tensors. While these approaches were originally motivated by low-rank approximations, the best performance is usually obtained for ranks as high as permitted by computation constraints. For large-scale factorization problems where the factor dimensions have to be kept small, the performance of these approaches tends to drop drastically. The other main tensor factorization model, Tucker decomposition, is more flexible than CP for fixed factor dimensions, so we expect Tucker-based approaches to yield better performance under strong constraints on the number of parameters. However, as we show in this paper through experiments on standard benchmarks of link prediction in knowledge bases, ComplEx, a variant of CP, achieves performance similar to recent Tucker-based approaches at all operating points in terms of the number of parameters. In a control experiment, we show that one problem in the practical application of Tucker decomposition to large-scale tensor completion comes from the adaptive optimization algorithms based on diagonal rescaling, such as Adagrad. We present a new algorithm for a constrained version of Tucker which implicitly applies Adagrad to a CP-based model with an additional projection of the embeddings onto a fixed lower-dimensional subspace. The resulting Tucker-style extension of ComplEx matches the best performance of ComplEx, with substantial gains on some datasets under constraints on the number of parameters. |
Tasks | Knowledge Base Completion, Link Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByeAK1BKPB |
PDF | https://openreview.net/pdf?id=ByeAK1BKPB |
PWC | https://paperswithcode.com/paper/projected-canonical-decomposition-for |
Repo | |
Framework | |
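The comparison above rests on the standard ComplEx scoring function. As a point of reference, here is a minimal NumPy sketch of that score; the dimensions and random embeddings are illustrative, and the paper's Tucker-style extension, which additionally projects embeddings onto a fixed lower-dimensional subspace before scoring, is not shown.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """Standard ComplEx score: Re(<e_s, w_r, conj(e_o)>) for complex embeddings."""
    return np.real(np.sum(e_s * w_r * np.conj(e_o)))

rng = np.random.default_rng(0)
d = 4  # illustrative embedding dimension
# Random complex embeddings for one subject entity, relation, and object entity.
e_s = rng.normal(size=d) + 1j * rng.normal(size=d)
w_r = rng.normal(size=d) + 1j * rng.normal(size=d)
e_o = rng.normal(size=d) + 1j * rng.normal(size=d)
print(complex_score(e_s, w_r, e_o))
```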
Learning a Spatio-Temporal Embedding for Video Instance Segmentation
Title | Learning a Spatio-Temporal Embedding for Video Instance Segmentation |
Authors | Anonymous |
Abstract | Understanding object motion is one of the core problems in computer vision. It requires segmenting and tracking objects over time. Significant progress has been made in instance segmentation, but such models cannot track objects, and more crucially, they are unable to reason in both 3D space and time. We propose a new spatio-temporal embedding loss on videos that generates temporally consistent video instance segmentation. Our model includes a temporal network that learns to model temporal context and motion, which is essential to produce smooth embeddings over time. Further, our model also estimates monocular depth, with a self-supervised loss, as the relative distance to an object effectively constrains where it can be next, ensuring a time-consistent embedding. Finally, we show that our model can accurately track and segment instances, even with occlusions and missed detections, advancing the state-of-the-art on the KITTI Multi-Object and Tracking Dataset. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxTJxrtvr |
PDF | https://openreview.net/pdf?id=HyxTJxrtvr |
PWC | https://paperswithcode.com/paper/learning-a-spatio-temporal-embedding-for |
Repo | |
Framework | |
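The abstract does not spell out the embedding loss. As a rough illustration of the general pull/push family of instance-embedding losses that such models typically build on (not necessarily the paper's exact formulation; margins, dimensions, and the toy data are assumptions), consider:

```python
import numpy as np

def instance_embedding_loss(embeddings, instance_ids, d_pull=0.5, d_push=1.5):
    """Toy pull/push loss: pull pixel embeddings toward their instance centroid,
    push different instance centroids apart (margins are assumed values)."""
    ids = np.unique(instance_ids)
    centroids = np.stack([embeddings[instance_ids == i].mean(axis=0) for i in ids])
    # Pull term: hinge on distance of each pixel embedding to its own centroid.
    pull = 0.0
    for k, i in enumerate(ids):
        dist = np.linalg.norm(embeddings[instance_ids == i] - centroids[k], axis=1)
        pull += np.mean(np.maximum(dist - d_pull, 0.0) ** 2)
    pull /= len(ids)
    # Push term: hinge on pairwise centroid distances.
    push = 0.0
    if len(ids) > 1:
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                d = np.linalg.norm(centroids[a] - centroids[b])
                push += np.maximum(d_push - d, 0.0) ** 2
        push /= len(ids) * (len(ids) - 1) / 2
    return pull + push

emb = np.random.default_rng(0).normal(size=(100, 8))      # 100 pixels, 8-dim embeddings
ids = np.random.default_rng(1).integers(0, 3, size=100)   # 3 toy instances
print(instance_embedding_loss(emb, ids))
```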
A Stochastic Derivative Free Optimization Method with Momentum
Title | A Stochastic Derivative Free Optimization Method with Momentum |
Authors | Anonymous |
Abstract | We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^d$ in the setting where only function evaluations are possible. We propose and analyze a stochastic zeroth-order method with heavy-ball momentum. In particular, we propose SMTP, a momentum version of the stochastic three-point method (STP) of Bergou et al. (2019). We show new complexity results for non-convex, convex and strongly convex functions. We test our method on a collection of continuous control tasks on several MuJoCo (Todorov et al., 2012) environments with varying difficulty and compare against STP, other state-of-the-art derivative-free optimization algorithms, and policy gradient methods. SMTP significantly outperforms STP and all other methods that we considered in our numerical experiments. Our second contribution is SMTP with importance sampling, which we call SMTP_IS. We provide a convergence analysis of this method for non-convex, convex and strongly convex objectives. |
Tasks | Continuous Control, Policy Gradient Methods |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylAoJSKvH |
PDF | https://openreview.net/pdf?id=HylAoJSKvH |
PWC | https://paperswithcode.com/paper/a-stochastic-derivative-free-optimization |
Repo | |
Framework | |
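To make the three-point-with-momentum idea concrete, here is a simplified derivative-free step in the spirit of SMTP; the candidate set, step size, and momentum handling are illustrative assumptions rather than the paper's exact update rule.

```python
import numpy as np

def smtp_like_step(f, x, v, step=0.1, beta=0.5, rng=None):
    """One simplified derivative-free step with momentum: try moving along
    +/- a random direction combined with the momentum buffer, keep the best.
    (Illustrative variant; the actual SMTP update differs in details.)"""
    rng = rng or np.random.default_rng()
    s = rng.normal(size=x.shape)
    s /= np.linalg.norm(s)
    candidates = [
        (x, v),                                      # stay put, keep momentum
        (x - step * (beta * v + s), beta * v + s),   # momentum plus direction
        (x - step * (beta * v - s), beta * v - s),   # momentum minus direction
    ]
    return min(candidates, key=lambda c: f(c[0]))    # pick lowest objective value

f = lambda x: np.sum(x ** 2)            # toy smooth objective
x, v = np.ones(10), np.zeros(10)
rng = np.random.default_rng(0)
for _ in range(200):
    x, v = smtp_like_step(f, x, v, rng=rng)
print(f(x))
```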
Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks
Title | Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks |
Authors | Anonymous |
Abstract | High-performance Deep Neural Networks (DNNs) are increasingly deployed in many real-world applications, e.g., cloud prediction APIs. Recent advances in model functionality stealing attacks via black-box access (i.e., inputs in, predictions out) threaten the business model of such applications, which require a lot of time, money, and effort to develop. Existing defenses take a passive role against stealing attacks, such as by truncating predicted information. We find such passive defenses ineffective against DNN stealing attacks. In this paper, we propose the first defense which actively perturbs predictions targeted at poisoning the training objective of the attacker. We find our defense effective across a wide range of challenging datasets and DNN model stealing attacks, and show that it additionally outperforms existing defenses. Our defense is the first that can withstand highly accurate model stealing attacks for tens of thousands of queries, amplifying the attacker’s error rate up to a factor of 85$\times$ with minimal impact on the utility for benign users. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyevYxHtDB |
PDF | https://openreview.net/pdf?id=SyevYxHtDB |
PWC | https://paperswithcode.com/paper/prediction-poisoning-towards-defenses-against |
Repo | |
Framework | |
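As a toy illustration only of the active-defense idea (the paper's actual perturbation objective, which targets the attacker's training signal, is not reproduced here), one could perturb each returned probability vector within a small budget while preserving its top-1 label:

```python
import numpy as np

def perturb_predictions(p, eps=0.2, rng=None):
    """Toy defense: add bounded noise to a probability vector, renormalize,
    and reject perturbations that change the argmax (to preserve utility).
    A schematic stand-in, not a real poisoning objective."""
    rng = rng or np.random.default_rng()
    top1 = np.argmax(p)
    for _ in range(100):                       # rejection sampling under assumed budget eps
        noise = rng.uniform(-eps, eps, size=p.shape)
        q = np.clip(p + noise, 1e-6, None)
        q /= q.sum()
        if np.argmax(q) == top1:
            return q
    return p                                   # fall back to the clean prediction

p = np.array([0.7, 0.2, 0.1])
print(perturb_predictions(p, rng=np.random.default_rng(0)))
```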
Non-linear System Identification from Partial Observations via Iterative Smoothing and Learning
Title | Non-linear System Identification from Partial Observations via Iterative Smoothing and Learning |
Authors | Anonymous |
Abstract | System identification is the process of building a mathematical model of an unknown system from measurements of its inputs and outputs. It is a key step for model-based control, estimator design, and output prediction. This work presents an algorithm for non-linear offline system identification from partial observations, i.e., situations in which the system’s full state is not directly observable. The algorithm presented, called SISL, iteratively infers the system’s full state through non-linear optimization and then updates the model parameters. We test our algorithm on a simulated system of coupled Lorenz attractors, showing our algorithm’s ability to identify high-dimensional systems that prove intractable for particle-based approaches. We also use SISL to identify the dynamics of an aerobatic helicopter. By augmenting the state with unobserved fluid states, we learn a model that predicts the acceleration of the helicopter better than state-of-the-art approaches. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1gR3ANFPS |
PDF | https://openreview.net/pdf?id=B1gR3ANFPS |
PWC | https://paperswithcode.com/paper/non-linear-system-identification-from-partial |
Repo | |
Framework | |
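The alternation described above, inferring the latent state trajectory and then refitting model parameters, can be sketched on a toy scalar linear system; the equal noise weights and closed-form parameter update below are simplifying assumptions, whereas SISL itself targets non-linear dynamics.

```python
import numpy as np
from scipy.optimize import minimize

# Toy partially observed system: x_{t+1} = a * x_t + w_t,  y_t = x_t + v_t.
rng = np.random.default_rng(0)
a_true, T = 0.9, 50
x = np.zeros(T)
for t in range(1, T):
    x[t] = a_true * x[t - 1] + 0.1 * rng.normal()
y = x + 0.1 * rng.normal(size=T)

def smoothing_cost(x_hat, a):
    dyn = x_hat[1:] - a * x_hat[:-1]   # dynamics residuals
    obs = y - x_hat                    # observation residuals
    return np.sum(dyn ** 2) + np.sum(obs ** 2)

a_hat, x_hat = 0.5, y.copy()
for _ in range(10):
    # Smoothing step: infer the full state trajectory given current parameters.
    x_hat = minimize(smoothing_cost, x_hat, args=(a_hat,)).x
    # Learning step: refit the dynamics parameter given the smoothed states.
    a_hat = float(np.sum(x_hat[1:] * x_hat[:-1]) / np.sum(x_hat[:-1] ** 2))
print(a_hat)  # should move toward a_true = 0.9
```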
Continuous Graph Flow
Title | Continuous Graph Flow |
Authors | Anonymous |
Abstract | In this paper, we propose Continuous Graph Flow, a generative continuous-flow-based method that aims to model complex distributions of graph-structured data. Once learned, the model can be applied to an arbitrary graph, defining a probability density over the random variables represented by the graph. It is formulated as an ordinary differential equation system with shared and reusable functions that operate over the graphs. This leads to a new type of neural graph message passing scheme that performs continuous message passing over time. This class of models offers several advantages: a flexible representation that can generalize to variable data dimensions; the ability to model dependencies in complex data distributions; reversibility and memory efficiency; and exact and efficient computation of the likelihood of the data. We demonstrate the effectiveness of our model on a diverse set of generation tasks across different domains: graph generation, image puzzle generation, and layout generation from scene graphs. Our proposed model achieves significantly better performance compared to state-of-the-art models. |
Tasks | Graph Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkgZSCEtvr |
PDF | https://openreview.net/pdf?id=BkgZSCEtvr |
PWC | https://paperswithcode.com/paper/continuous-graph-flow |
Repo | |
Framework | |
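A minimal sketch of continuous message passing as an ODE, integrated with a plain Euler scheme; the toy dynamics, random weights, and the omission of the likelihood (change-of-variables) computation are all simplifications relative to the actual model.

```python
import numpy as np

def continuous_message_passing(x0, adj, W, t_end=1.0, dt=0.05):
    """Euler-integrate dx/dt = tanh(A @ x @ W): node states evolve under
    continuous message passing from their neighbours (toy dynamics)."""
    x = x0.copy()
    for _ in range(int(t_end / dt)):
        messages = adj @ x @ W           # aggregate neighbour features
        x = x + dt * np.tanh(messages)   # Euler step of the ODE
    return x

rng = np.random.default_rng(0)
n_nodes, d = 5, 3
adj = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)  # toy random graph
np.fill_diagonal(adj, 0.0)
x0 = rng.normal(size=(n_nodes, d))
W = rng.normal(size=(d, d)) * 0.5
print(continuous_message_passing(x0, adj, W))
```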
Multichannel Generative Language Models
Title | Multichannel Generative Language Models |
Authors | Anonymous |
Abstract | A channel corresponds to a viewpoint or transformation of an underlying meaning. A pair of parallel sentences in English and French express the same underlying meaning but through two separate channels corresponding to their languages. In this work, we present Multichannel Generative Language Models (MGLM), which models the joint distribution over multiple channels, and all its decompositions, using a single neural network. MGLM can be trained by feeding it k-way parallel data, bilingual data, or monolingual data across pre-determined channels. MGLM is capable of both conditional generation and unconditional sampling. For conditional generation, the model is given a fully observed channel, and generates the k-1 channels in parallel. In the case of machine translation, this is akin to giving it one source, and the model generates k-1 targets. MGLM can also do partial conditional sampling, where the channels are seeded with prespecified words, and the model is asked to infill the rest. Finally, we can sample from MGLM unconditionally over all k channels. Our experiments on the Multi30K dataset containing English, French, Czech, and German languages suggest that the multitask training with the joint objective leads to improvements in bilingual translations. We provide a quantitative analysis of the quality-diversity trade-offs for different variants of the multichannel model for conditional generation, and a measurement of self-consistency during unconditional generation. We provide qualitative examples for parallel greedy decoding across languages and sampling from the joint distribution of the 4 languages. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1xQNlBYPS |
PDF | https://openreview.net/pdf?id=r1xQNlBYPS |
PWC | https://paperswithcode.com/paper/multichannel-generative-language-models |
Repo | |
Framework | |
Training Neural Networks for and by Interpolation
Title | Training Neural Networks for and by Interpolation |
Authors | Anonymous |
Abstract | In modern supervised learning, many deep neural networks are able to interpolate the data: the empirical loss can be driven to near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning. Specifically, we use it to compute an adaptive learning-rate in closed form at each iteration. This results in the Adaptive Learning-rates for Interpolation with Gradients (ALI-G) algorithm. ALI-G retains the main advantage of SGD which is a low computational cost per iteration. But unlike SGD, the learning-rate of ALI-G uses a single constant hyper-parameter and does not require a decay schedule, which makes it considerably easier to tune. We provide convergence guarantees of ALI-G in the stochastic convex setting. Notably, all our convergence results tackle the realistic case where the interpolation property is satisfied up to some tolerance. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. ALI-G produces state-of-the-art results among adaptive methods, and even yields comparable performance with SGD, which requires manually tuned learning-rate schedules. Furthermore, ALI-G is simple to implement in any standard deep learning framework and can be used as a drop-in replacement in existing code. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJevJCVYvB |
PDF | https://openreview.net/pdf?id=BJevJCVYvB |
PWC | https://paperswithcode.com/paper/training-neural-networks-for-and-by-1 |
Repo | |
Framework | |
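The closed-form learning rate described above is, in spirit, a Polyak-style step clipped at a maximal value. Here is a small sketch under the interpolation assumption that the optimal loss is approximately zero; `max_lr` and `eps` stand in for the single hyper-parameter and a numerical constant.

```python
import numpy as np

def alig_step(w, loss, grad, max_lr=0.1, eps=1e-5):
    """ALI-G-style update: step size = min(loss / (||grad||^2 + eps), max_lr),
    exploiting the interpolation assumption that the optimal loss is ~0."""
    lr = min(loss / (np.sum(grad ** 2) + eps), max_lr)
    return w - lr * grad

# Toy quadratic: loss = 0.5 * ||w||^2, grad = w (optimum at 0 with zero loss).
w = np.ones(5)
for _ in range(50):
    loss, grad = 0.5 * np.sum(w ** 2), w
    w = alig_step(w, loss, grad)
print(0.5 * np.sum(w ** 2))  # final loss, close to zero
```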
Learning Neural Causal Models from Unknown Interventions
Title | Learning Neural Causal Models from Unknown Interventions |
Authors | Anonymous |
Abstract | Meta-learning over a set of distributions can be interpreted as learning different types of parameters corresponding to short-term vs long-term aspects of the mechanisms underlying the generation of data. These are respectively captured by quickly-changing \textit{parameters} and slowly-changing \textit{meta-parameters}. We present a new framework for meta-learning causal models where the relationship between each variable and its parents is modeled by a neural network, modulated by structural meta-parameters which capture the overall topology of a directed graphical model. Our approach avoids a discrete search over models in favour of a continuous optimization procedure. We study a setting where interventional distributions are induced as a result of a random intervention on a single unknown variable of an unknown ground truth causal model, and the observations arising after such an intervention constitute one meta-example. To disentangle the slow-changing aspects of each conditional from the fast-changing adaptations to each intervention, we parametrize the neural network into fast parameters and slow meta-parameters. We introduce a meta-learning objective that favours solutions \textit{robust} to frequent but sparse interventional distribution change, and which generalize well to previously unseen interventions. Optimizing this objective is shown experimentally to recover the structure of the causal graph. Finally, we find that when the learner is unaware of the intervention variable, it is able to infer that information, improving results further and focusing the parameter and meta-parameter updates where needed. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1gN6kSFwS |
PDF | https://openreview.net/pdf?id=H1gN6kSFwS |
PWC | https://paperswithcode.com/paper/learning-neural-causal-models-from-unknown |
Repo | |
Framework | |
Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations
Title | Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations |
Authors | Anonymous |
Abstract | Deep neural networks suffer from the inability to preserve the learned data representation (i.e., catastrophic forgetting) in domains where the input data distribution is non-stationary, and it changes during training. Various selective synaptic plasticity approaches have been recently proposed to preserve network parameters, which are crucial for previously learned tasks while learning new tasks. We explore such selective synaptic plasticity approaches through a unifying lens of memory replay and show the close relationship between methods like Elastic Weight Consolidation (EWC) and Memory-Aware-Synapses (MAS). We then propose a fundamentally different class of preservation methods that aim at preserving the distribution of internal neural representations for previous tasks while learning a new one. We propose the sliced Cramér distance as a suitable choice for such preservation and evaluate our Sliced Cramer Preservation (SCP) algorithm through extensive empirical investigations on various network architectures in both supervised and unsupervised learning settings. We show that SCP consistently utilizes the learning capacity of the network better than online-EWC and MAS methods on various incremental learning tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJge3TNKwH |
PDF | https://openreview.net/pdf?id=BJge3TNKwH |
PWC | https://paperswithcode.com/paper/sliced-cramer-synaptic-consolidation-for |
Repo | |
Framework | |
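The sliced Cramér distance mentioned above can be approximated by projecting representations onto random directions and comparing empirical CDFs along each slice. The grid-based sketch below is a generic Monte-Carlo estimate, not SCP's exact regularizer; the sample sizes and dimensions are illustrative.

```python
import numpy as np

def sliced_cramer_distance(X, Y, n_slices=50, rng=None):
    """Monte-Carlo sliced Cramér(-2) distance between two sets of representations:
    project onto random unit directions and integrate the squared difference of
    empirical CDFs along each slice (grid-based approximation)."""
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    total = 0.0
    for _ in range(n_slices):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        x_proj, y_proj = X @ theta, Y @ theta
        grid = np.linspace(min(x_proj.min(), y_proj.min()),
                           max(x_proj.max(), y_proj.max()), 200)
        Fx = np.mean(x_proj[:, None] <= grid[None, :], axis=0)  # empirical CDFs
        Fy = np.mean(y_proj[:, None] <= grid[None, :], axis=0)
        total += np.trapz((Fx - Fy) ** 2, grid)
    return total / n_slices

rng = np.random.default_rng(0)
old_acts = rng.normal(size=(256, 16))           # representations before the new task
new_acts = rng.normal(loc=0.3, size=(256, 16))  # representations after drifting
print(sliced_cramer_distance(old_acts, new_acts, rng=rng))
```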
Hierarchical Disentangle Network for Object Representation Learning
Title | Hierarchical Disentangle Network for Object Representation Learning |
Authors | Anonymous |
Abstract | An object can be described as the combination of primary visual attributes. Disentangling such underlying primitives is a long-standing objective of representation learning. It is observed that categories have natural multi-granularity or hierarchical characteristics, i.e. any two objects can share some common primitives in a particular category granularity while they may possess their unique ones in another granularity. However, previous works usually operate in a flat manner (i.e. in a particular granularity) to disentangle the representations of objects. Though they may obtain the primitives that constitute objects as the categories in that granularity, their results are neither efficient nor complete. In this paper, we propose the hierarchical disentangle network (HDN) to exploit the rich hierarchical characteristics among categories to divide the disentangling process in a coarse-to-fine manner, such that each level only focuses on learning the specific representations in its granularity and finally the common and unique representations in all granularities jointly constitute the raw object. Specifically, HDN is designed based on an encoder-decoder architecture. To simultaneously ensure the disentanglement and interpretability of the encoded representations, a novel hierarchical generative adversarial network (GAN) is elaborately designed. Quantitative and qualitative evaluations on four object datasets validate the effectiveness of our method. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkg8xTEtvB |
PDF | https://openreview.net/pdf?id=rkg8xTEtvB |
PWC | https://paperswithcode.com/paper/hierarchical-disentangle-network-for-object |
Repo | |
Framework | |
In-Domain Representation Learning For Remote Sensing
Title | In-Domain Representation Learning For Remote Sensing |
Authors | Anonymous |
Abstract | Given the importance of remote sensing, surprisingly little attention has been paid to it by the representation learning community. To address it and to speed up innovation in this domain, we provide simplified access to 5 diverse remote sensing datasets in a standardized form. We specifically explore in-domain representation learning and address the question of “what characteristics should a dataset have to be a good source for remote sensing representation learning”. The established baselines achieve state-of-the-art performance on these datasets. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx_JAVKDB |
PDF | https://openreview.net/pdf?id=BJx_JAVKDB |
PWC | https://paperswithcode.com/paper/in-domain-representation-learning-for-remote |
Repo | |
Framework | |
Knowledge Hypergraphs: Prediction Beyond Binary Relations
Title | Knowledge Hypergraphs: Prediction Beyond Binary Relations |
Authors | Anonymous |
Abstract | A Knowledge Hypergraph is a knowledge base where relations are defined on two or more entities. In this work, we introduce two embedding-based models that perform link prediction in knowledge hypergraphs: (1) HSimplE is a shift-based method that is inspired by an existing model operating on knowledge graphs, in which the representation of an entity is a function of its position in the relation, and (2) HypE is a convolution-based method which disentangles the representation of an entity from its position in the relation. We test our models on two new knowledge hypergraph datasets that we obtain from Freebase, and show that both HSimplE and HypE are more effective in predicting links in knowledge hypergraphs than the proposed baselines and existing methods. Our experiments show that HypE outperforms HSimplE when trained with fewer parameters and when tested on samples that contain at least one entity in a position never encountered during training. |
Tasks | Knowledge Graphs, Link Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxIZR4tvS |
PDF | https://openreview.net/pdf?id=ryxIZR4tvS |
PWC | https://paperswithcode.com/paper/knowledge-hypergraphs-prediction-beyond |
Repo | |
Framework | |
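To illustrate the shift-based scoring idea behind HSimplE, here is a toy score for a single hyperedge; the position-dependent shift schedule used below is an assumption for illustration, not the paper's specification.

```python
import numpy as np

def hsimple_like_score(r_emb, entity_embs):
    """Shift-based score for a hyperedge: circularly shift each entity's embedding
    by an amount depending on its position in the relation, take the elementwise
    product with the relation embedding, and sum.
    (The per-position shift of d // k is an assumed schedule.)"""
    d, k = r_emb.shape[0], len(entity_embs)
    prod = r_emb.copy()
    for pos, e in enumerate(entity_embs):
        prod = prod * np.roll(e, pos * (d // k))   # position-dependent circular shift
    return float(np.sum(prod))

rng = np.random.default_rng(0)
d = 6
r = rng.normal(size=d)
entities = [rng.normal(size=d) for _ in range(3)]  # a ternary relation instance
print(hsimple_like_score(r, entities))
```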
MaskConvNet: Training Efficient ConvNets from Scratch via Budget-constrained Filter Pruning
Title | MaskConvNet: Training Efficient ConvNets from Scratch via Budget-constrained Filter Pruning |
Authors | Raden Mu’az Mun’im, Jie Lin, Vijay Chandrasekhar, Koichi Shinoda |
Abstract | In this paper, we propose a framework, called MaskConvNet, for ConvNet filter pruning. MaskConvNet provides elegant support for training budget-aware pruned networks from scratch, by adding a simple mask module to a ConvNet architecture. MaskConvNet enjoys several advantages - (1) Flexible, the mask module can be integrated with any ConvNets in a plug-and-play manner. (2) Simple, the mask module is implemented by a hard Sigmoid function with a small number of trainable mask variables, adding negligible memory and computational overheads to the networks during training. (3) Effective, it is able to achieve a competitive pruning rate while maintaining comparable accuracy with the baseline ConvNets without pruning, regardless of the datasets and ConvNet architectures used. (4) Fast, it is observed that the number of training epochs required by MaskConvNet is close to training a baseline without pruning. (5) Budget-aware, with a sparsity budget on a target metric (e.g. model size and FLOPs), MaskConvNet is able to train in a way that the optimizer can adaptively sparsify the network and automatically maintain the sparsity level, till the pruned network produces good accuracy and fulfills the budget constraint simultaneously. Results on CIFAR-10 and ImageNet with several ConvNet architectures show that MaskConvNet works competitively well compared to previous pruning methods, with the budget constraint well respected. Code is available at https://www.dropbox.com/s/c4zi3n7h1bexl12/maskconv-iclr-code.zip?dl=0. We hope MaskConvNet, as a simple and general pruning framework, can address the gaps in the existing literature and advance future studies to push the boundaries of neural network pruning. |
Tasks | Network Pruning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1gyl6Vtvr |
PDF | https://openreview.net/pdf?id=S1gyl6Vtvr |
PWC | https://paperswithcode.com/paper/maskconvnet-training-efficient-convnets-from |
Repo | |
Framework | |
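The mask module described above can be sketched as a per-filter hard-sigmoid gate plus a budget penalty; the gate's slope/offset, the penalty form, and the tensor sizes below are assumptions for illustration.

```python
import numpy as np

def hard_sigmoid(v):
    """Hard sigmoid gate clipped to [0, 1]; the slope/offset here is an assumed choice."""
    return np.clip(0.2 * v + 0.5, 0.0, 1.0)

def masked_filter_outputs(feature_maps, mask_vars):
    """Scale each filter's feature map by its (near-binary) gate value;
    maps gated to ~0 correspond to pruned filters."""
    m = hard_sigmoid(mask_vars)                 # one scalar gate per filter
    return feature_maps * m[None, :, None, None], m

def budget_penalty(m, budget=0.5):
    """Sparsity regularizer pushing the fraction of active filters toward the budget."""
    return (np.mean(m) - budget) ** 2

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 8, 4, 4))           # (batch, filters, H, W), toy sizes
mask_vars = rng.normal(size=8)                  # trainable mask variables (illustrative)
gated, m = masked_filter_outputs(feats, mask_vars)
print(gated.shape, budget_penalty(m))
```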
The Intriguing Effects of Focal Loss on the Calibration of Deep Neural Networks
Title | The Intriguing Effects of Focal Loss on the Calibration of Deep Neural Networks |
Authors | Anonymous |
Abstract | Miscalibration – a mismatch between a model’s confidence and its correctness – of Deep Neural Networks (DNNs) makes their predictions hard for downstream components to trust. Ideally, we want networks to be accurate, calibrated and confident. Temperature scaling, the most popular calibration approach, will calibrate a DNN without affecting its accuracy, but it will also make its correct predictions under-confident. In this paper, we show that replacing the widely used cross-entropy loss with focal loss allows us to learn models that are already very well calibrated. When combined with temperature scaling, focal loss, whilst preserving accuracy and yielding state-of-the-art calibrated models, also preserves the confidence of the model’s correct predictions, which is extremely desirable for downstream tasks. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to theoretically justify the empirically excellent performance of focal loss. We perform extensive experiments on a variety of computer vision (CIFAR-10/100) and NLP (SST, 20 Newsgroup) datasets, and with a wide variety of different network architectures, and show that our approach achieves state-of-the-art accuracy and calibration in almost all cases. |
Tasks | Calibration |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxTZeHFPH |
PDF | https://openreview.net/pdf?id=SJxTZeHFPH |
PWC | https://paperswithcode.com/paper/the-intriguing-effects-of-focal-loss-on-the |
Repo | |
Framework | |
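For reference, the standard focal loss discussed above has a simple closed form; the sketch below is generic (gamma is the usual focusing hyper-parameter) and does not include the temperature-scaling step mentioned in the abstract.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Multi-class focal loss: -(1 - p_t)^gamma * log(p_t), averaged over samples.
    gamma = 0 recovers standard cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

probs = np.array([[0.9, 0.05, 0.05],
                  [0.3, 0.6, 0.1]])
labels = np.array([0, 1])
print(focal_loss(probs, labels, gamma=0.0))  # equals cross-entropy
print(focal_loss(probs, labels, gamma=2.0))  # down-weights confident examples
```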