Paper Group NANR 58
Incremental RNN: A Dynamical View. Image-guided Neural Object Rendering. SGD Learns One-Layer Networks in WGANs. Geom-GCN: Geometric Graph Convolutional Networks. AutoLR: A Method for Automatic Tuning of Learning Rate. Deep Imitative Models for Flexible Inference, Planning, and Control. Large-scale Pretraining for Neural Machine Translation with T …
Incremental RNN: A Dynamical View.
Title | Incremental RNN: A Dynamical View. |
Authors | Anonymous |
Abstract | Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate. While a number of works attempt to mitigate this effect through gated recurrent units, skip connections, parametric constraints and design choices, we propose a novel incremental RNN (iRNN), where hidden state vectors keep track of incremental changes and, as such, approximate state-vector increments of Rosenblatt’s (1962) continuous-time RNNs. iRNN exhibits identity gradients and is able to account for long-term dependencies (LTD). We show that our method is computationally efficient, overcoming the overheads of many existing methods that attempt to improve RNN training, while suffering no performance degradation. We demonstrate the utility of our approach with extensive experiments and show competitive performance against standard LSTMs on LTD and other non-LTD tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HylpqA4FwS |
https://openreview.net/pdf?id=HylpqA4FwS | |
PWC | https://paperswithcode.com/paper/incremental-rnn-a-dynamical-view |
Repo | |
Framework | |
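The iRNN abstract above describes hidden states that accumulate incremental changes, approximating state-vector increments of a continuous-time RNN. Below is a minimal sketch of such an additive, Euler-style update with a plain tanh cell; the class name, `step_size` parameter, and weight initialization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class IncrementalRNNCell:
    """Sketch of an RNN cell whose hidden state changes by small increments:
    h_{t+1} = h_t + eta * tanh(W x_t + U h_t + b), an Euler-like step of a
    continuous-time RNN. Illustrative only."""

    def __init__(self, input_dim, hidden_dim, step_size=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(input_dim), size=(hidden_dim, input_dim))
        self.U = rng.normal(scale=1.0 / np.sqrt(hidden_dim), size=(hidden_dim, hidden_dim))
        self.b = np.zeros(hidden_dim)
        self.step_size = step_size

    def step(self, x_t, h_t):
        increment = np.tanh(self.W @ x_t + self.U @ h_t + self.b)
        return h_t + self.step_size * increment  # hidden state is updated incrementally

cell = IncrementalRNNCell(input_dim=4, hidden_dim=8)
h = np.zeros(8)
for x in np.random.default_rng(1).normal(size=(20, 4)):
    h = cell.step(x, h)
```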
Image-guided Neural Object Rendering
Title | Image-guided Neural Object Rendering |
Authors | Justus Thies, Michael Zollhöfer, Christian Theobalt, Marc Stamminger, Matthias Nießner |
Abstract | We propose a learned image-guided rendering technique that combines the benefits of image-based rendering and GAN-based image synthesis. The goal of our method is to generate photo-realistic re-renderings of reconstructed objects for virtual and augmented reality applications (e.g., virtual showrooms, virtual tours and sightseeing, the digital inspection of historical artifacts). A core component of our work is the handling of view-dependent effects. Specifically, we directly train an object-specific deep neural network to synthesize the view-dependent appearance of an object. As input data we use an RGB video of the object. This video is used to reconstruct a proxy geometry of the object via multi-view stereo. Based on this 3D proxy, the appearance of a captured view can be warped into a new target view as in classical image-based rendering. This warping assumes diffuse surfaces; in the case of view-dependent effects, such as specular highlights, it leads to artifacts. To address this, we propose EffectsNet, a deep neural network that predicts view-dependent effects. Based on these estimations, we are able to convert observed images to diffuse images. These diffuse images can be projected into other views. In the target view, our pipeline reinserts the new view-dependent effects. To composite multiple reprojected images into a final output, we learn a composition network that outputs photo-realistic results. Using this image-guided approach, the network does not have to allocate capacity to "remembering" object appearance; instead, it learns how to combine the appearance of captured images. We demonstrate the effectiveness of our approach both qualitatively and quantitatively on synthetic as well as on real data. |
Tasks | Image Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyg9anEFPS |
https://openreview.net/pdf?id=Hyg9anEFPS | |
PWC | https://paperswithcode.com/paper/image-guided-neural-object-rendering |
Repo | |
Framework | |
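The abstract above outlines a pipeline: predict view-dependent effects, subtract them to obtain diffuse images, reproject those into the target view, composite, and reinsert the target view's effects. A rough sketch of that data flow follows; `effects_net`, `warp_to_target`, and `composition_net` are hypothetical placeholders for the learned networks and geometry utilities, and the real system operates on images plus a reconstructed 3D proxy.

```python
def render_target_view(source_images, source_views, target_view,
                       effects_net, warp_to_target, composition_net):
    """Sketch of the image-guided rendering flow described in the abstract.
    All callables are placeholders; this is not the authors' implementation."""
    reprojected = []
    for img, view in zip(source_images, source_views):
        diffuse = img - effects_net(img, view)              # remove view-dependent effects
        reprojected.append(warp_to_target(diffuse, view, target_view))
    composite = composition_net(reprojected)                # fuse the reprojected diffuse images
    return composite + effects_net(composite, target_view)  # reinsert target-view effects
```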
SGD Learns One-Layer Networks in WGANs
Title | SGD Learns One-Layer Networks in WGANs |
Authors | Anonymous |
Abstract | Generative adversarial networks (GANs) are a widely used framework for learning generative models. Wasserstein GANs (WGANs), one of the most successful variants of GANs, require solving a minmax problem to global optimality, but in practice, are successfully trained with stochastic gradient descent-ascent. In this paper, we show that, when the generator is a one-layer network, stochastic gradient descent-ascent converges to a global solution in polynomial time and sample complexity. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJePwgSYwB |
https://openreview.net/pdf?id=rJePwgSYwB | |
PWC | https://paperswithcode.com/paper/sgd-learns-one-layer-networks-in-wgans |
Repo | |
Framework | |
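The result concerns stochastic gradient descent-ascent (SGDA), the alternating update scheme used to train WGANs in practice. As a refresher on the update rule itself (not the paper's one-layer WGAN setting), here is a minimal sketch of SGDA on a simple strongly-convex–strongly-concave saddle problem, with additive noise standing in for minibatch gradient estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def grads(x, y, noise=0.1):
    """Stochastic gradients of f(x, y) = x.y + 0.5*||x||^2 - 0.5*||y||^2."""
    gx = y + x + noise * rng.normal(size=x.shape)
    gy = x - y + noise * rng.normal(size=y.shape)
    return gx, gy

x, y, lr = rng.normal(size=5), rng.normal(size=5), 0.05
for _ in range(3000):
    gx, gy = grads(x, y)
    x, y = x - lr * gx, y + lr * gy   # descent for the min player, ascent for the max player

print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward the saddle point at 0, up to noise
```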
Geom-GCN: Geometric Graph Convolutional Networks
Title | Geom-GCN: Geometric Graph Convolutional Networks |
Authors | Anonymous |
Abstract | Message-passing neural networks (MPNNs) have been successfully applied in a wide variety of real-world applications. However, two fundamental weaknesses of MPNNs’ aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative graphs. Few studies have noticed these weaknesses from different perspectives. From observations on classical neural networks and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. The basic idea behind it is that aggregation on a graph can benefit from a continuous space underlying the graph. The proposed aggregation scheme is permutation-invariant and consists of three modules: node embedding, structural neighborhood, and bi-level aggregation. We also present an implementation of the scheme in graph convolutional networks, termed Geom-GCN, to perform transductive learning on graphs. Experimental results show that the proposed Geom-GCN achieves state-of-the-art performance on a wide range of open graph datasets. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1e2agrFvS |
https://openreview.net/pdf?id=S1e2agrFvS | |
PWC | https://paperswithcode.com/paper/geom-gcn-geometric-graph-convolutional |
Repo | |
Framework | |
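The abstract above names three modules: node embedding into a continuous space, a structural neighborhood that combines graph neighbors with latent-space neighbors, and bi-level aggregation. A very rough sketch under simplifying assumptions (mean aggregation, a single distance threshold, no partitioning of neighbors by geometric relation); the function and argument names are illustrative, not Geom-GCN's actual interface.

```python
import numpy as np

def bilevel_aggregate(features, graph_neighbors, latent_positions, radius=1.0):
    """Sketch: each node aggregates (i) over its graph neighborhood and (ii) over
    nodes close to it in a latent continuous space, then combines the two results.
    Geom-GCN's relation-wise partitioning is omitted here."""
    n, d = features.shape
    out = np.zeros_like(features)
    for v in range(n):
        nbrs = list(graph_neighbors[v])
        graph_part = features[nbrs].mean(axis=0) if nbrs else np.zeros(d)
        dists = np.linalg.norm(latent_positions - latent_positions[v], axis=1)
        latent_nbrs = [u for u in range(n) if u != v and dists[u] < radius]
        latent_part = features[latent_nbrs].mean(axis=0) if latent_nbrs else np.zeros(d)
        out[v] = 0.5 * (graph_part + latent_part)  # combine the two neighborhood views
    return out
```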
AutoLR: A Method for Automatic Tuning of Learning Rate
Title | AutoLR: A Method for Automatic Tuning of Learning Rate |
Authors | Anonymous |
Abstract | One very important hyperparameter for training deep neural networks is the learning rate of the optimizer. The choice of learning rate schedule determines the computational cost of getting close to a minimum, how close you actually get to it, and, most importantly, the kind of local minimum (wide/narrow) attained. The kind of minimum attained has a significant impact on the generalization accuracy of the network. Current systems employ hand-tuned learning rate schedules, which are painstakingly tuned for each network and dataset. Given that the state space of schedules is huge, finding a satisfactory learning rate schedule can be very time consuming. In this paper, we present AutoLR, a method for auto-tuning the learning rate as training proceeds. Our method works with any optimizer, and we demonstrate results with the SGD, Momentum, and Adam optimizers. We extensively evaluate AutoLR on multiple datasets and models and across multiple optimizers. We compare favorably against state-of-the-art learning rate schedules for the given dataset and models, including ImageNet on ResNet-50, CIFAR-10 on ResNet-18, and SQuAD fine-tuning on BERT. For example, AutoLR achieves an EM score of 81.2 on SQuAD v1.1 with BERT_BASE, compared to the 80.8 reported by Devlin et al. (2018), by just auto-tuning the learning rate schedule. To the best of our knowledge, this is the first automatic learning rate tuning scheme to achieve state-of-the-art generalization accuracy on these datasets with the given models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgtbaVYvH |
https://openreview.net/pdf?id=SkgtbaVYvH | |
PWC | https://paperswithcode.com/paper/autolr-a-method-for-automatic-tuning-of |
Repo | |
Framework | |
Deep Imitative Models for Flexible Inference, Planning, and Control
Title | Deep Imitative Models for Flexible Inference, Planning, and Control |
Authors | Anonymous |
Abstract | Imitation Learning (IL) is an appealing approach to learn desirable autonomous behavior. However, directing IL to achieve arbitrary goals is difficult. In contrast, planning-based algorithms use dynamics models and reward functions to achieve goals. Yet, reward functions that evoke desirable behavior are often difficult to specify. In this paper, we propose “Imitative Models” to combine the benefits of IL and goal-directed planning. Imitative Models are probabilistic predictive models of desirable behavior able to plan interpretable expert-like trajectories to achieve specified goals. We derive families of flexible goal objectives, including constrained goal regions, unconstrained goal sets, and energy-based goals. We show that our method can use these objectives to successfully direct behavior. Our method substantially outperforms six IL approaches and a planning-based approach in a dynamic simulated autonomous driving task, and is efficiently learned from expert demonstrations without online data collection. We also show our approach is robust to poorly-specified goals, such as goals on the wrong side of the road. |
Tasks | Autonomous Driving, Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skl4mRNYDr |
https://openreview.net/pdf?id=Skl4mRNYDr | |
PWC | https://paperswithcode.com/paper/deep-imitative-models-for-flexible-inference-2 |
Repo | |
Framework | |
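The abstract above combines an imitation prior (a probabilistic model of expert-like trajectories) with a goal objective. A minimal candidate-scoring sketch of that idea follows; the two scoring functions are placeholders for the learned trajectory density and the chosen goal objective, and the paper optimizes trajectories directly rather than ranking a fixed candidate set.

```python
import numpy as np

def plan(candidate_trajectories, imitation_log_prob, goal_log_prob):
    """Sketch of goal-directed planning with an imitative prior: among candidate
    trajectories, pick the one maximizing (expert-likeness) + (goal likelihood).
    Both scoring callables are hypothetical placeholders."""
    scores = [imitation_log_prob(traj) + goal_log_prob(traj)
              for traj in candidate_trajectories]
    return candidate_trajectories[int(np.argmax(scores))]
```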
Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs
Title | Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs |
Authors | Anonymous |
Abstract | In this paper, we investigate the problem of training neural machine translation (NMT) systems with a dataset of more than 40 billion bilingual sentence pairs, which is larger than the largest dataset to date by orders of magnitude. Unprecedented challenges emerge in this situation compared to previous NMT work, including severe noise in the data and prohibitively long training time. We propose practical solutions to handle these issues and demonstrate that large-scale pretraining significantly improves NMT performance. We are able to push the BLEU score on the WMT17 Chinese-English dataset to 32.3, a significant performance boost of +3.2 over existing state-of-the-art results. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkl8YR4YDB |
https://openreview.net/pdf?id=Bkl8YR4YDB | |
PWC | https://paperswithcode.com/paper/large-scale-pretraining-for-neural-machine-1 |
Repo | |
Framework | |
Regularizing Deep Multi-Task Networks using Orthogonal Gradients
Title | Regularizing Deep Multi-Task Networks using Orthogonal Gradients |
Authors | Anonymous |
Abstract | Deep neural networks are a promising approach towards multi-task learning because of their capability to leverage knowledge across domains and learn general-purpose representations. Nevertheless, they can fail to live up to these promises as tasks often compete for a model’s limited resources, potentially leading to lower overall performance. In this work we tackle the issue of interfering tasks through a comprehensive analysis of their training, derived from looking at the interaction between gradients within their shared parameters. Our empirical results show that well-performing models have low variance in the angles between task gradients and that popular regularization methods implicitly reduce this measure. Based on this observation, we propose a novel gradient regularization term that minimizes task interference by enforcing near-orthogonal gradients. Updating the shared parameters using this property encourages task-specific decoders to optimize different parts of the feature extractor, thus reducing competition. We evaluate our method with classification and regression tasks on the multiDigitMNIST and NYUv2 datasets, where we obtain competitive results. This work is a first step towards non-interfering multi-task optimization. |
Tasks | Multi-Task Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJeXJANFPr |
https://openreview.net/pdf?id=SJeXJANFPr | |
PWC | https://paperswithcode.com/paper/regularizing-deep-multi-task-networks-using |
Repo | |
Framework | |
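The abstract above proposes a regularizer that pushes task gradients on the shared parameters towards orthogonality. A minimal sketch of one natural instantiation, penalizing the squared cosine similarity between pairs of task gradients; the exact form of the paper's term may differ, and `shared_params` is whatever parameter list the tasks share.

```python
import torch
import torch.nn.functional as F

def orthogonality_penalty(task_losses, shared_params):
    """Sketch: penalize squared cosine similarity between task gradients on the
    shared parameters, encouraging near-orthogonal gradients. Illustrative only."""
    grads = []
    for loss in task_losses:
        g = torch.autograd.grad(loss, shared_params, retain_graph=True, create_graph=True)
        grads.append(torch.cat([p.reshape(-1) for p in g]))
    penalty = torch.zeros((), device=grads[0].device)
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            cos = F.cosine_similarity(grads[i], grads[j], dim=0)
            penalty = penalty + cos ** 2
    return penalty

# hypothetical usage:
# total_loss = sum(task_losses) + lam * orthogonality_penalty(task_losses, list(shared.parameters()))
```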
Adversarial Imitation Attack
Title | Adversarial Imitation Attack |
Authors | Anonymous |
Abstract | Deep learning models are known to be vulnerable to adversarial examples. A practical adversarial attack should require as little knowledge as possible of the attacked model T. Current substitute attacks need pre-trained models to generate adversarial examples, and their attack success rates heavily rely on the transferability of adversarial examples. Current score-based and decision-based attacks require many queries to T. In this study, we propose a novel adversarial imitation attack. First, it produces a replica of T via a two-player game similar to generative adversarial networks (GANs). The objective of the generative model G is to generate examples that lead D to return outputs different from those of T. The objective of the discriminative model D is to output the same labels as T for the same inputs. Then, the adversarial examples generated against D are used to fool T. Compared with current substitute attacks, the imitation attack can use less training data to produce a replica of T and improve the transferability of adversarial examples. Experiments demonstrate that our imitation attack requires less training data than black-box substitute attacks, yet achieves an attack success rate close to that of a white-box attack on unseen data with no queries. |
Tasks | Adversarial Attack |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlVVAEKwS |
https://openreview.net/pdf?id=SJlVVAEKwS | |
PWC | https://paperswithcode.com/paper/adversarial-imitation-attack |
Repo | |
Framework | |
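The abstract above describes a GAN-like game: D is trained to agree with the black-box target T on examples produced by G, while G is trained to produce examples on which D and T disagree. A hedged sketch of one such training step follows; `G`, `D`, `T`, and the optimizers are placeholders for real models, and the details of the authors' objectives may differ.

```python
import torch
import torch.nn.functional as F

def imitation_step(G, D, T, z_batch, opt_G, opt_D):
    """One sketchy step of the two-player imitation game: D imitates T's labels,
    G searches for inputs where D still disagrees with T. Illustrative only."""
    x = G(z_batch)
    with torch.no_grad():
        t_labels = T(x).argmax(dim=1)            # query the black-box target for labels

    # D tries to agree with T on G's examples
    opt_D.zero_grad()
    F.cross_entropy(D(x.detach()), t_labels).backward()
    opt_D.step()

    # G tries to produce examples on which D disagrees with T
    opt_G.zero_grad()
    (-F.cross_entropy(D(x), t_labels)).backward()
    opt_G.step()
```

Once D approximates T, adversarial examples can be crafted against D with any white-box attack and transferred to T, as the abstract describes.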
On the expected running time of nonconvex optimization with early stopping
Title | On the expected running time of nonconvex optimization with early stopping |
Authors | Anonymous |
Abstract | This work examines the convergence of stochastic gradient algorithms that use early stopping based on a validation function, wherein optimization ends when the magnitude of a validation function gradient drops below a threshold. We derive conditions that guarantee this stopping rule is well-defined and analyze the expected number of iterations and gradient evaluations needed to meet this criterion. The guarantee accounts for the distance between the training and validation sets, measured with the Wasserstein distance. We develop the approach for stochastic gradient descent (SGD), allowing for biased update directions subject to a Lyapunov condition. We apply the approach to obtain new bounds on the expected running time of several algorithms, including Decentralized SGD (DSGD), a variant of decentralized SGD known as Stacked SGD, and the stochastic variance reduced gradient (SVRG) algorithm. Finally, we consider the generalization properties of the iterate returned by early stopping. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygkSkSFDB |
https://openreview.net/pdf?id=SygkSkSFDB | |
PWC | https://paperswithcode.com/paper/on-the-expected-running-time-of-nonconvex |
Repo | |
Framework | |
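The stopping rule analyzed above is simple to state: halt when the validation-gradient norm falls below a threshold. A minimal sketch of SGD with that rule; the gradient oracles are placeholders for stochastic or full gradients of the training and validation objectives.

```python
import numpy as np

def sgd_with_validation_stop(w, train_grad, val_grad, lr=0.01, tol=1e-3, max_iter=100_000):
    """Sketch: run (S)GD on the training objective and stop once the norm of the
    validation gradient drops below `tol`. Oracles are illustrative placeholders."""
    for t in range(max_iter):
        if np.linalg.norm(val_grad(w)) < tol:   # early-stopping criterion
            return w, t
        w = w - lr * train_grad(w)
    return w, max_iter
```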
On Identifiability in Transformers
Title | On Identifiability in Transformers |
Authors | Anonymous |
Abstract | In this work we contribute towards a deeper understanding of the Transformer architecture by investigating two of its core components: self-attention and contextual embeddings. In particular, we study the identifiability of attention weights and token embeddings, and the aggregation of context into hidden tokens. We show that attention weights are not unique and propose effective attention as an alternative for better interpretability. Furthermore, we show that input tokens retain their identity in the first hidden layers and then progressively become less identifiable. We also provide evidence for the role of non-linear activations in preserving token identity. Finally, we demonstrate strong mixing of input information in the generation of contextual embeddings by means of a novel quantification method based on gradient attribution. Overall, we show that self-attention distributions are not directly interpretable and present tools to further investigate Transformer models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJg1f6EFDB |
https://openreview.net/pdf?id=BJg1f6EFDB | |
PWC | https://paperswithcode.com/paper/on-identifiability-in-transformers |
Repo | |
Framework | |
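The abstract above mentions quantifying how strongly input tokens are mixed into contextual embeddings via gradient attribution. A generic sketch of one such measure, the norm of the gradient of each hidden token with respect to each input embedding, row-normalized for comparison with attention; `model` is a placeholder mapping input embeddings to hidden states at some layer, and this is not necessarily the paper's exact quantification.

```python
import torch

def attribution_matrix(model, input_embeddings):
    """Sketch: entry (j, i) is the gradient norm of (the summed components of)
    hidden token j with respect to input embedding i. Illustrative only."""
    x = input_embeddings.clone().requires_grad_(True)
    hidden = model(x)                           # [T, d] hidden states at a chosen layer
    n_tokens = hidden.shape[0]
    A = torch.zeros(n_tokens, n_tokens)
    for j in range(n_tokens):
        grad = torch.autograd.grad(hidden[j].sum(), x, retain_graph=True)[0]  # [T, d]
        A[j] = grad.norm(dim=1)
    return A / A.sum(dim=1, keepdim=True)       # row-normalize, like an attention map
```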
Improved Detection of Adversarial Attacks via Penetration Distortion Maximization
Title | Improved Detection of Adversarial Attacks via Penetration Distortion Maximization |
Authors | Anonymous |
Abstract | This paper is concerned with the defense of deep models against adversarial attacks. We develop an adversarial detection method, which is inspired by the certificate defense approach, and captures the idea of separating class clusters in the embedding space so as to increase the margin. The resulting defense is intuitive, effective, scalable and can be integrated into any given neural classification model. Our method demonstrates state-of-the-art detection performance under all threat models. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJguRyBYvr |
https://openreview.net/pdf?id=rJguRyBYvr | |
PWC | https://paperswithcode.com/paper/improved-detection-of-adversarial-attacks-via |
Repo | |
Framework | |
Robust Reinforcement Learning for Continuous Control with Model Misspecification
Title | Robust Reinforcement Learning for Continuous Control with Model Misspecification |
Authors | Anonymous |
Abstract | We provide a framework for incorporating robustness – to perturbations in the transition dynamics, which we refer to as model misspecification – into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst-case, entropy-regularized, expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. In addition, we show improved robust performance on a challenging, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework, including an adaptation to another continuous control RL algorithm. Performance videos can be found online at https://sites.google.com/view/robust-rl. |
Tasks | Continuous Control |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgC60EtwB |
https://openreview.net/pdf?id=HJgC60EtwB | |
PWC | https://paperswithcode.com/paper/robust-reinforcement-learning-for-continuous-1 |
Repo | |
Framework | |
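The abstract above optimizes a worst-case return over an uncertainty set of transition models via a robust Bellman operator. A minimal tabular sketch of a worst-case backup over a finite set of transition models; the entropy regularization and the soft-robust variant from the paper are omitted, and the array layout is an assumption.

```python
import numpy as np

def robust_backup(V, rewards, transition_models, gamma=0.99):
    """Sketch of a robust Bellman backup: take the worst case over a finite
    uncertainty set of transition models, then be greedy over actions.
    V: [S] values; rewards: [S, A]; each P in transition_models: [S, A, S]."""
    worst_q = np.min([rewards + gamma * P @ V for P in transition_models], axis=0)  # [S, A]
    return worst_q.max(axis=1)  # greedy over actions on the worst-case Q-values
```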
Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction
Title | Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction |
Authors | Anonymous |
Abstract | With the recent success and popularity of pre-trained language models (LMs) in natural language processing, there has been a rise in efforts to understand their inner workings. In line with such interest, we propose a novel method that assists us in investigating the extent to which pre-trained LMs capture the syntactic notion of constituency. Our method provides an effective way of extracting constituency trees from the pre-trained LMs without training. In addition, we report intriguing findings in the induced trees, including the fact that pre-trained LMs outperform other approaches in correctly demarcating adverb phrases in sentences. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1xPR3NtPB |
https://openreview.net/pdf?id=H1xPR3NtPB | |
PWC | https://paperswithcode.com/paper/are-pre-trained-language-models-aware-of |
Repo | |
Framework | |
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints
Title | Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints |
Authors | Anonymous |
Abstract | In most practical settings and theoretical analyses, one assumes that a model can be trained until convergence. However, the growing complexity of machine learning datasets and models may violate such assumptions. Indeed, current approaches for hyper-parameter tuning and neural architecture search tend to be limited by practical resource constraints. Therefore, we introduce a formal setting for studying training under the non-asymptotic, resource-constrained regime, i.e., budgeted training. We analyze the following problem: “given a dataset, algorithm, and fixed resource budget, what is the best achievable performance?” We focus on the number of optimization iterations as the representative resource. Under such a setting, we show that it is critical to adjust the learning rate schedule according to the given budget. Among budget-aware learning schedules, we find simple linear decay to be both robust and high-performing. We support our claim through extensive experiments with state-of-the-art models on ImageNet (image classification), Kinetics (video classification), MS COCO (object detection and instance segmentation), and Cityscapes (semantic segmentation). We also analyze our results and find that the key to a good schedule is budgeted convergence, a phenomenon whereby the gradient vanishes at the end of each allowed budget. We also revisit existing approaches for fast convergence and show that budget-aware learning schedules readily outperform such approaches under (the practical but under-explored) budgeted training setting. |
Tasks | Image Classification, Instance Segmentation, Neural Architecture Search, Object Detection, Semantic Segmentation, Video Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxLRTVKPH |
https://openreview.net/pdf?id=HyxLRTVKPH | |
PWC | https://paperswithcode.com/paper/budgeted-training-rethinking-deep-neural-1 |
Repo | |
Framework | |
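Among budget-aware schedules, the abstract above advocates simple linear decay of the learning rate over the fixed iteration budget. A minimal sketch of such a schedule; the function and parameter names are illustrative, not the authors' code.

```python
def linear_decay_lr(step, total_budget, base_lr=0.1, end_lr=0.0):
    """Budget-aware linear decay: anneal the learning rate from base_lr to end_lr
    over the fixed iteration budget. Illustrative sketch."""
    progress = min(step, total_budget) / total_budget
    return base_lr + (end_lr - base_lr) * progress

# e.g., with a budget of 10,000 iterations:
# lrs = [linear_decay_lr(t, 10_000) for t in range(10_000)]
```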