April 1, 2020


Paper Group NANR 12

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation. Measuring Calibration in Deep Learning. Partial Simulation for Imitation Learning. Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations. iWGAN: an Autoencoder WGAN for Inference. A Syntax-Aware Approach for Unsupervised Text Style Tra …

Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation

Title Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Authors Anonymous
Abstract Natural question generation (QG) aims to generate questions from a passage and an answer. Previous works on QG either (i) ignore the rich structure information hidden in text, (ii) solely rely on cross-entropy loss, which leads to issues like exposure bias and inconsistency between train/test measurement, or (iii) fail to fully exploit the answer information. To address these limitations, in this paper, we propose a reinforcement learning (RL) based graph-to-sequence (Graph2Seq) model for QG. Our model consists of a Graph2Seq generator with a novel Bidirectional Gated Graph Neural Network based encoder to embed the passage, and a hybrid evaluator with a mixed objective function that combines both the cross-entropy and RL losses to ensure the generation of syntactically and semantically valid text. We also introduce an effective Deep Alignment Network for incorporating the answer information into the passage at both the word and contextual levels. Our model is end-to-end trainable and achieves new state-of-the-art scores, outperforming existing methods by a significant margin on the standard SQuAD benchmark for QG. (A hedged sketch of the mixed cross-entropy/RL objective follows this entry.)
Tasks Graph-to-Sequence, Question Generation
Published 2020-01-01
URL https://openreview.net/forum?id=HygnDhEtvr
PDF https://openreview.net/pdf?id=HygnDhEtvr
PWC https://paperswithcode.com/paper/reinforcement-learning-based-graph-to-1
Repo
Framework
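
The mixed objective the abstract describes can be pictured as a weighted sum of a teacher-forced cross-entropy term and a REINFORCE-style sequence-reward term with a self-critical baseline. The sketch below is an illustrative PyTorch rendering under assumed names (`gamma`, a BLEU-like reward, single-sequence inputs); it is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, sampled_logprobs,
               reward_sampled, reward_greedy, gamma=0.75):
    """logits: (T, V) decoder scores; targets: (T,) gold token ids;
    sampled_logprobs: (T,) log-probs of a sampled question;
    reward_*: scalar sequence rewards (e.g. BLEU) for the sampled and
    greedy decodes (a self-critical baseline). All names are assumptions."""
    ce = F.cross_entropy(logits, targets)        # teacher-forced CE term
    advantage = reward_sampled - reward_greedy   # self-critical advantage
    rl = -(advantage * sampled_logprobs.sum())   # REINFORCE-style RL term
    return gamma * ce + (1.0 - gamma) * rl       # mixed objective
```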

Measuring Calibration in Deep Learning

Title Measuring Calibration in Deep Learning
Authors Anonymous
Abstract Overconfidence and underconfidence in machine learning classifiers are measured by calibration: the degree to which the probabilities predicted for each class match the accuracy of the classifier on that prediction. We propose two new measures of calibration, the Static Calibration Error (SCE) and the Adaptive Calibration Error (ACE). These measures take into account every prediction made by a model, in contrast to the popular Expected Calibration Error. (A hedged sketch of SCE follows this entry.)
Tasks Calibration
Published 2020-01-01
URL https://openreview.net/forum?id=r1la7krKPS
PDF https://openreview.net/pdf?id=r1la7krKPS
PWC https://paperswithcode.com/paper/measuring-calibration-in-deep-learning-1
Repo
Framework
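
For concreteness, here is a small NumPy sketch of a Static Calibration Error under assumed binning details (equal-width bins, one pass per class). The key difference from ECE is that every class probability, not just the top prediction, is binned and scored.

```python
import numpy as np

def static_calibration_error(probs, labels, n_bins=15):
    """Sketch of SCE: a per-class expected calibration error averaged over
    classes. probs: (n, k) predicted probabilities; labels: (n,) true class
    ids. The equal-width binning scheme is an assumption."""
    n, k = probs.shape
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    sce = 0.0
    for c in range(k):
        conf = probs[:, c]
        correct = (labels == c).astype(float)
        for b in range(n_bins):
            in_bin = (conf > bins[b]) & (conf <= bins[b + 1])
            if in_bin.sum() == 0:
                continue
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            sce += (in_bin.sum() / n) * gap   # bin weight * |acc - conf|
    return sce / k
```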

Partial Simulation for Imitation Learning

Title Partial Simulation for Imitation Learning
Authors Anonymous
Abstract Model-based imitation learning methods require full knowledge of the transition kernel for policy evaluation. In this work, we introduce the Expert Induced Markov Decision Process (eMDP) model as a formulation for solving imitation problems with Reinforcement Learning (RL) when only partial knowledge about the transition kernel is available. The idea of the eMDP is to replace the unknown transition kernel with a synthetic kernel that (a) simulates the transition of the state components for which the transition kernel is known (s_r), and (b) extracts from demonstrations the state components for which the kernel is unknown (s_u). The next state is then stitched from the two components: s={s_r,s_u}. We describe in detail the recipe for building an eMDP and analyze the errors caused by its synthetic kernel. Our experiments include imitation tasks in multiplayer games, where the agent has to imitate one expert in the presence of other experts for whom we cannot provide a transition model. We show that combining a policy gradient algorithm with our model achieves superior performance compared to the simulation-free alternative. (A minimal sketch of the stitched transition follows this entry.)
Tasks Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SJe_D1SYvr
PDF https://openreview.net/pdf?id=SJe_D1SYvr
PWC https://paperswithcode.com/paper/partial-simulation-for-imitation-learning
Repo
Framework
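
The "stitched" next state is easy to express in code. A minimal sketch, assuming a simulator for the known component s_r and a recorded demonstration supplying the unknown component s_u (function and field names are illustrative):

```python
def emdp_step(state, action, simulator, demo, t):
    """One eMDP transition: simulate what we can, replay what we cannot."""
    s_r_next = simulator(state["s_r"], action)  # known kernel: simulate
    s_u_next = demo[t + 1]["s_u"]               # unknown kernel: replay demo
    return {"s_r": s_r_next, "s_u": s_u_next}   # stitched state s = {s_r, s_u}
```

In such a rollout the agent's policy only influences s_r, while s_u advances exactly as in the demonstration; this is what lets RL run without the full transition kernel.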

Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations

Title Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations
Authors Anonymous
Abstract Recovering 3D geometric shape, albedo, and lighting from a single image has wide applications in many areas, yet it is a typical ill-posed problem. To eliminate the ambiguity, face prior knowledge such as linear 3D morphable models (3DMM), learned from limited scan data, is often incorporated into the reconstruction process. However, methods based on linear parametric models cannot generalize well to facial images in the wild with varied ages, ethnicities, expressions, poses, and lighting. Recent methods aim to learn a nonlinear parametric model using convolutional neural networks (CNN) to regress the face shape and texture directly. However, these models were only trained on a dataset generated from a linear 3DMM. Moreover, the identity and expression representations are entangled in these models, which hinders many facial editing applications. In this paper, we train our model with an adversarial loss in a semi-supervised manner on hybrid batches of unlabeled and labeled face images to exploit the value of large amounts of unlabeled face images from unconstrained photo collections. A novel center loss is introduced to ensure that different facial images from the same person have the same identity shape and albedo. Besides, our proposed model disentangles identity, expression, pose, and lighting representations, which improves overall reconstruction performance and facilitates facial editing applications, e.g., expression transfer. Comprehensive experiments demonstrate that our model produces high-quality reconstructions compared to state-of-the-art methods and is robust to various expression, pose, and lighting conditions. (A hedged sketch of the center loss follows this entry.)
Tasks 3D Face Reconstruction, Face Reconstruction
Published 2020-01-01
URL https://openreview.net/forum?id=H1lK5kBKvr
PDF https://openreview.net/pdf?id=H1lK5kBKvr
PWC https://paperswithcode.com/paper/semi-supervised-3d-face-reconstruction-with
Repo
Framework
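
The center loss the abstract mentions plausibly resembles the classic center loss, applied here to identity shape/albedo codes. A hedged sketch, where `centers` is assumed to be a per-person buffer maintained during training:

```python
import torch

def identity_center_loss(codes, person_ids, centers):
    """Pull identity codes of images of the same person toward a shared
    per-person center, so one person maps to one identity shape/albedo.
    codes: (batch, d); person_ids: (batch,) long; centers: (num_people, d).
    The pairing with identity codes is an assumption based on the abstract."""
    return ((codes - centers[person_ids]) ** 2).sum(dim=1).mean()
```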

iWGAN: an Autoencoder WGAN for Inference

Title iWGAN: an Autoencoder WGAN for Inference
Authors Anonymous
Abstract Generative Adversarial Networks (GANs) have been impactful on many problems and applications but suffer from unstable training. The Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats in the minimax two-player training of GANs but has other defects, such as mode collapse and the lack of a metric to detect convergence. We introduce a novel inference WGAN (iWGAN) model, a principled framework to fuse autoencoders and WGANs. The iWGAN jointly learns an encoder network and a generative network using an iterative primal-dual optimization process. We establish the generalization error bound of iWGANs. We further provide a rigorous probabilistic interpretation of our model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criterion, has many advantages over other autoencoder GANs. Empirical experiments show that our model greatly mitigates the symptom of mode collapse, speeds up convergence, and is able to provide a quality measure for each individual sample. We illustrate the ability of iWGANs by obtaining competitive and stable performance relative to the state of the art on benchmark datasets. (A hedged loss sketch follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJg6VREFDH
PDF https://openreview.net/pdf?id=HJg6VREFDH
PWC https://paperswithcode.com/paper/iwgan-an-autoencoder-wgan-for-inference
Repo
Framework
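
A generic autoencoder/WGAN fusion consistent with the abstract can be written as two coupled losses: the Wasserstein dual for the critic, and an adversarial-plus-reconstruction term for the generator and encoder. The sketch below is an assumption-labeled rendering, not the paper's exact primal-dual iteration (the weight `lam`, and the omitted gradient penalty and stopping criterion, are all simplifications):

```python
import torch

def iwgan_losses(x, z, G, Q, D, lam=10.0):
    """G: decoder/generator, Q: encoder, D: critic. Returns the critic loss
    (Wasserstein dual, to minimize) and the joint generator/encoder loss
    (adversarial term plus reconstruction). Illustrative only."""
    x_fake = G(z)
    critic_loss = D(x_fake).mean() - D(x).mean()    # minimize = maximize E[D(x)] - E[D(G(z))]
    recon = (x - G(Q(x))).pow(2).mean()             # autoencoder term
    gen_enc_loss = -D(x_fake).mean() + lam * recon  # generator + encoder
    return critic_loss, gen_enc_loss
```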

A Syntax-Aware Approach for Unsupervised Text Style Transfer

Title A Syntax-Aware Approach for Unsupervised Text Style Transfer
Authors Anonymous
Abstract Unsupervised text style transfer aims to rewrite text of a source style into a target style while preserving the style-independent content, without a parallel training corpus. Most existing methods address the problem by leveraging only the surface forms of words. In this paper, we incorporate syntactic knowledge and propose a multi-task learning based Syntax-Aware Style Transfer (SAST) model. SAST jointly learns to generate a transferred output with aligned words and syntactic labels, where the alignment between the words and syntactic labels is enforced with a consistency constraint. The auxiliary syntactic label generation task regularizes the model to form more generalized representations, which is a desirable property especially in unsupervised tasks. Experimental results on two benchmark datasets for text style transfer demonstrate the effectiveness of the proposed method in terms of transfer accuracy, content preservation, and fluency. (A loss-shape sketch follows this entry.)
Tasks Multi-Task Learning, Style Transfer, Text Style Transfer
Published 2020-01-01
URL https://openreview.net/forum?id=Bkll_kHFPB
PDF https://openreview.net/pdf?id=Bkll_kHFPB
PWC https://paperswithcode.com/paper/a-syntax-aware-approach-for-unsupervised-text
Repo
Framework
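
The multi-task objective can be summarized as three weighted terms: word generation, syntactic-label generation, and a consistency penalty. This is a loss-shape sketch only; the paper's actual consistency constraint is not specified here, so `consistency` (and the weights) are placeholders:

```python
import torch.nn.functional as F

def sast_loss(word_logits, word_targets, syn_logits, syn_targets,
              consistency, lam=1.0, mu=1.0):
    """Hedged sketch of a SAST-style objective: cross-entropy over generated
    words, cross-entropy over the auxiliary syntactic-label stream, and a
    scalar consistency term aligning the two output streams."""
    l_word = F.cross_entropy(word_logits, word_targets)
    l_syn = F.cross_entropy(syn_logits, syn_targets)
    return l_word + lam * l_syn + mu * consistency
```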

Superbloom: Bloom filter meets Transformer

Title Superbloom: Bloom filter meets Transformer
Authors Anonymous
Abstract We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similarly to a Bloom filter. We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, to a large degree, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer for Bloom filter digests to remove ambiguity in the hashed input. We believe this provides an alternative method for solving problems with a large vocabulary size. (A sketch of the id hashing follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJxy5A4twS
PDF https://openreview.net/pdf?id=SJxy5A4twS
PWC https://paperswithcode.com/paper/superbloom-bloom-filter-meets-transformer
Repo
Framework
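
The Bloom-filter-style mapping is straightforward to sketch: each opaque id hashes to several tokens in a small shared vocabulary, and the downstream Transformer is what disambiguates collisions. Hash choice and sizes below are illustrative:

```python
import hashlib

def bloom_tokens(item_id: str, num_hashes: int = 2, vocab_size: int = 50_000):
    """Map an opaque id to `num_hashes` tokens in a vocabulary much smaller
    than the id space, as in a Bloom filter digest. SHA-256 with a salt per
    hash slot is an illustrative choice, not the paper's."""
    tokens = []
    for i in range(num_hashes):
        h = hashlib.sha256(f"{i}:{item_id}".encode()).digest()
        tokens.append(int.from_bytes(h[:8], "big") % vocab_size)
    return tokens

# e.g. bloom_tokens("user_123456789") -> two token ids in [0, 50000)
```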

The Effect of Neural Net Architecture on Gradient Confusion & Training Performance

Title The Effect of Neural Net Architecture on Gradient Confusion & Training Performance
Authors Anonymous
Abstract The goal of this paper is to study why typical neural networks train so fast, and how neural network architecture affects the speed of training. We introduce a simple concept called gradient confusion to help formally analyze this. When gradient confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, data samples interact harmoniously, and training proceeds quickly. Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training. We show that increasing the width of neural networks leads to lower gradient confusion, and thus easier model training. On the other hand, increasing the depth of neural networks has the opposite effect. Finally, we observe empirically that techniques like batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of very deep networks. (An illustrative estimator of gradient confusion follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1xNJ0NYDH
PDF https://openreview.net/pdf?id=r1xNJ0NYDH
PWC https://paperswithcode.com/paper/the-effect-of-neural-net-architecture-on
Repo
Framework
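
Gradient confusion can be probed empirically by checking how negatively per-sample (or per-minibatch) gradients correlate. The estimator below is an illustrative assumption, not the paper's formal definition (roughly, a bound eta >= 0 such that <g_i, g_j> >= -eta for all pairs):

```python
import torch

def gradient_confusion(model, loss_fn, batches):
    """Estimate gradient confusion as the most negative pairwise inner
    product between gradients of different batches (clipped at zero, so the
    result is 0 when all gradients agree). `batches` must have >= 2 items."""
    grads = []
    for x, y in batches:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g.clone())
    worst = min(torch.dot(gi, gj).item()
                for i, gi in enumerate(grads)
                for gj in grads[i + 1:])
    return -min(worst, 0.0)  # estimated confusion eta >= 0
```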

Efficient Saliency Maps for Explainable AI

Title Efficient Saliency Maps for Explainable AI
Authors Anonymous
Abstract We describe an explainable AI saliency map method for use with deep convolutional neural networks (CNN) that is much more efficient than popular gradient methods, while being quantitatively similar or better in accuracy. Our technique works by measuring information at the end of each network scale, which is then combined into a single saliency map. We describe how saliency measures can be made more efficient by exploiting Saliency Map Order Equivalence. Finally, we visualize individual scale/layer contributions by using a Layer Ordered Visualization of Information. This provides a comparison of scale information contributions within the network not provided by other saliency map methods. Our method is generally straightforward and should be applicable to the most commonly used CNNs. (Full source code is available at http://www.anonymous.submission.com.) (A hedged sketch of the per-scale combination follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=ryxf9CEKDr
PDF https://openreview.net/pdf?id=ryxf9CEKDr
PWC https://paperswithcode.com/paper/efficient-saliency-maps-for-explainable-ai
Repo
Framework
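
Without the paper's exact information measure, one plausible reading of the recipe is: compute a cheap per-location statistic at the end of each scale, upsample each map to a common size, and combine. Both the statistic (activation log-variance) and the combination (mean) below are assumptions:

```python
import numpy as np

def combined_saliency(scale_activations, out_hw):
    """Hedged sketch: per-scale activation maps -> per-location statistic ->
    upsample -> average -> normalize to [0, 1]. scale_activations is a list
    of (C, H, W) numpy arrays; out_hw is the (H, W) of the output map."""
    from PIL import Image  # used only for simple bilinear resizing
    maps = []
    for act in scale_activations:
        stat = np.log(act.var(axis=0) + 1e-6)         # (H, W) statistic
        img = Image.fromarray(stat.astype(np.float32))
        maps.append(np.asarray(img.resize(out_hw[::-1], Image.BILINEAR)))
    sal = np.mean(maps, axis=0)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-9)
```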

Common sense and Semantic-Guided Navigation via Language in Embodied Environments

Title Common sense and Semantic-Guided Navigation via Language in Embodied Environments
Authors Anonymous
Abstract One key element which differentiates humans from artificial agents in performing various tasks is that humans have access to common sense and semantic understanding, learnt from past experiences. In this work, we evaluate whether common sense and semantic understanding benefit an artificial agent when completing a room navigation task, wherein we ask the agent to navigate to a target room (e.g. "go to the kitchen") in a realistic 3D environment. We leverage semantic information and patterns observed during training to build the common sense which guides the agent to reach the target. We encourage semantic understanding within the agent by introducing grounding as an auxiliary task. We train and evaluate the agent in three settings: (i) imitation learning using expert trajectories, (ii) reinforcement learning using Proximal Policy Optimization, and (iii) self-supervised imitation learning for fine-tuning the agent on unseen environments using auxiliary tasks. From our experiments, we observe that common sense helps the agent in long-term planning, while semantic understanding helps in short-term and local planning (such as guiding the agent when to stop). When combined, the agent generalizes better. Further, incorporating common sense and semantic understanding leads to a 40% improvement in task success and a 112% improvement in success per length (SPL) over the baseline during imitation learning. Moreover, initial evidence suggests that the cross-modal embeddings learnt during training capture structural and positional patterns of the environment, implying that the agent inherently learns a map of the environment. It also suggests that navigation in multi-modal tasks leads to better semantic understanding. (A sketch of the SPL metric follows this entry.)
Tasks Common Sense Reasoning, Imitation Learning
Published 2020-01-01
URL https://openreview.net/forum?id=Bkx5ceHFwH
PDF https://openreview.net/pdf?id=Bkx5ceHFwH
PWC https://paperswithcode.com/paper/common-sense-and-semantic-guided-navigation
Repo
Framework
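
The SPL numbers quoted above use Success weighted by Path Length (Anderson et al., 2018), which is standard and easy to compute:

```python
import numpy as np

def spl(successes, shortest, taken):
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i), where S_i indicates
    episode success, l_i is the shortest-path length to the goal, and p_i is
    the path length the agent actually took."""
    successes, shortest, taken = map(np.asarray, (successes, shortest, taken))
    return float(np.mean(successes * shortest / np.maximum(taken, shortest)))

# spl([1, 0, 1], shortest=[5.0, 4.0, 6.0], taken=[7.0, 9.0, 6.0]) ~= 0.571
```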

Compositional Embeddings: Joint Perception and Comparison of Class Label Sets

Title Compositional Embeddings: Joint Perception and Comparison of Class Label Sets
Authors Anonymous
Abstract We explore the idea of compositional set embeddings that can be used to infer not just a single class, but the set of classes associated with the input data (e.g., image, video, audio signal). This can be useful, for example, in multi-object detection in images, or multi-speaker diarization (one-shot learning) in audio. In particular, we devise and implement two novel models consisting of (1) an embedding function f trained jointly with a “composite” function g that computes set union operations between the classes encoded in two embedding vectors; and (2) an embedding function f trained jointly with a “query” function h that computes whether the classes encoded in one embedding subsume the classes encoded in another embedding. In contrast to prior work, these models must both perceive the classes associated with the input examples, and also encode the relationships between different class label sets. In experiments conducted on simulated data, OmniGlot, and COCO datasets, the proposed composite embedding models outperform baselines based on traditional embedding approaches. (A hedged sketch of the f/g/h interface follows this entry.)
Tasks Object Detection, Omniglot, One-Shot Learning, Speaker Diarization
Published 2020-01-01
URL https://openreview.net/forum?id=BJx-ZeSKDB
PDF https://openreview.net/pdf?id=BJx-ZeSKDB
PWC https://paperswithcode.com/paper/compositional-embeddings-joint-perception-and
Repo
Framework
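
The f/g/h decomposition can be captured as a small interface. The architectures below are illustrative MLPs, not the paper's networks:

```python
import torch
import torch.nn as nn

class CompositionNets(nn.Module):
    """Sketch of the interface the abstract describes: an embedding f over
    inputs, a composite function g approximating set union in embedding
    space, and a query h deciding whether one embedding's label set
    subsumes another's."""
    def __init__(self, in_dim, emb_dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                               nn.Linear(emb_dim, emb_dim))
        self.g = nn.Sequential(nn.Linear(2 * emb_dim, emb_dim), nn.ReLU(),
                               nn.Linear(emb_dim, emb_dim))  # set union
        self.h = nn.Linear(2 * emb_dim, 1)                   # subsumption

    def union(self, e1, e2):
        return self.g(torch.cat([e1, e2], dim=-1))

    def subsumes(self, e1, e2):
        return torch.sigmoid(self.h(torch.cat([e1, e2], dim=-1)))
```

Training would push union(f(x1), f(x2)) toward the embedding of an example carrying both label sets, and supervise subsumes() with binary labels.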

Few-shot Learning by Focusing on Differences

Title Few-shot Learning by Focusing on Differences
Authors Anonymous
Abstract Few-shot classification may involve differentiating data that belong to different levels of label granularity. Because labeled examples in the novel classification set are scarce, relying solely on the loss function to implicitly guide the classifier to separate data based on its label might not be enough; a few-shot classifier needs a strong inductive bias to perform well. In this paper, we propose a model that incorporates a simple prior: focusing on differences by building a dissimilar set of class representations. The model treats a class representation as a vector and removes the component that is shared among closely related class representatives. It does so through a combination of learned attention and vector orthogonalization. Our model works well on our newly introduced dataset, Hierarchical-CIFAR, which contains different levels of label granularity. It also substantially improves performance on the fine-grained classification dataset CUB, while staying competitive on standard benchmarks such as mini-Imagenet, Omniglot, and a few-shot dataset derived from CIFAR. (A sketch of the orthogonalization step follows this entry.)
Tasks Few-Shot Learning, Omniglot
Published 2020-01-01
URL https://openreview.net/forum?id=B1xwv1StvS
PDF https://openreview.net/pdf?id=B1xwv1StvS
PWC https://paperswithcode.com/paper/few-shot-learning-by-focusing-on-differences
Repo
Framework
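
The "attention plus vector orthogonalization" step can be sketched as: form an attention-weighted mixture of the other class prototypes, then subtract each prototype's projection onto that mixture. The attention weights and this exact projection are assumptions, not the paper's construction:

```python
import torch

def remove_shared_component(prototypes, attn):
    """prototypes: (k, d) class representations; attn: (k, k-1) learned
    weights over the other classes for each class. Returns prototypes with
    the attended shared direction projected out."""
    out = []
    k = len(prototypes)
    for i, p in enumerate(prototypes):
        others = torch.stack([prototypes[j] for j in range(k) if j != i])
        shared = (attn[i].unsqueeze(1) * others).sum(0)   # weighted mixture
        shared = shared / (shared.norm() + 1e-8)
        out.append(p - (p @ shared) * shared)             # orthogonalize
    return torch.stack(out)
```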

Role of two learning rates in convergence of model-agnostic meta-learning

Title Role of two learning rates in convergence of model-agnostic meta-learning
Authors Anonymous
Abstract Model-agnostic meta-learning (MAML) is known as a powerful meta-learning method. However, MAML is notorious for being hard to train because of the existence of two learning rates. Therefore, in this paper, we derive, under some simplifying assumptions, the conditions that the inner learning rate $\alpha$ and meta-learning rate $\beta$ must satisfy for MAML to converge to minima. We find that the upper bound of $\beta$ depends on $\alpha$, in contrast to the case of using the normal gradient descent method. Moreover, we show that the threshold of $\beta$ increases as $\alpha$ approaches its own upper bound. This result is verified by experiments on various few-shot tasks and architectures; specifically, we perform sinusoid regression and classification on the Omniglot and MiniImagenet datasets with a multilayer perceptron and a convolutional neural network. Based on this outcome, we present a guideline for determining the learning rates: first, search for the largest possible $\alpha$; next, tune $\beta$ based on the chosen value of $\alpha$. (A minimal sketch of the two-rate update follows this entry.)
Tasks Meta-Learning, Omniglot
Published 2020-01-01
URL https://openreview.net/forum?id=r1e8qpVKPS
PDF https://openreview.net/pdf?id=r1e8qpVKPS
PWC https://paperswithcode.com/paper/role-of-two-learning-rates-in-convergence-of
Repo
Framework
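
The interplay of the two learning rates is visible in a minimal MAML step: $\alpha$ scales the inner adaptation on the support set, while $\beta$ scales the meta-update of the initialization through that inner step. The single-inner-step simplification and the names below are illustrative:

```python
import torch

def maml_step(params, x_s, y_s, x_q, y_q, model_fn, loss_fn, alpha, beta):
    """One meta-update. params: list of leaf tensors with requires_grad=True;
    model_fn(params, x) applies the model with the given parameters."""
    inner_loss = loss_fn(model_fn(params, x_s), y_s)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    adapted = [p - alpha * g for p, g in zip(params, grads)]   # inner step
    outer_loss = loss_fn(model_fn(adapted, x_q), y_q)
    meta_grads = torch.autograd.grad(outer_loss, params)       # through inner step
    return [(p - beta * g).detach().requires_grad_()           # meta step
            for p, g in zip(params, meta_grads)]
```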

Unsupervised Few Shot Learning via Self-supervised Training

Title Unsupervised Few Shot Learning via Self-supervised Training
Authors Anonymous
Abstract Learning from limited exemplars (few-shot learning) is a fundamental, unsolved problem that has been laboriously explored in the machine learning community. However, current few-shot learners are mostly supervised and rely heavily on a large amount of labeled examples. Unsupervised learning is a more natural procedure for cognitive mammals and has produced promising results in many machine learning tasks. In the current study, we develop a method to learn an unsupervised few-shot learner via self-supervised training (UFLST), which can effectively generalize to novel but related classes. The proposed model consists of two alternating processes: progressive clustering and episodic training. The former generates pseudo-labeled training examples for constructing episodic tasks, and the latter trains the few-shot learner using the generated episodic tasks, which further optimizes the feature representations of the data. The two processes facilitate each other and eventually produce a high-quality few-shot learner. Using the benchmark dataset Omniglot, we show that our model outperforms other unsupervised few-shot learning methods by a large margin and approaches the performance of supervised methods. Using the benchmark dataset Market1501, we further demonstrate the feasibility of our model in a real-world application, person re-identification. (A skeleton of one clustering/training round follows this entry.)
Tasks Few-Shot Learning, Omniglot, Person Re-Identification
Published 2020-01-01
URL https://openreview.net/forum?id=Ske-ih4FPS
PDF https://openreview.net/pdf?id=Ske-ih4FPS
PWC https://paperswithcode.com/paper/unsupervised-few-shot-learning-via-self
Repo
Framework
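
One round of the alternation the abstract describes, with placeholders for the paper's networks (`embed`, `train_episodes`) and k-means standing in for the progressive clustering:

```python
from sklearn.cluster import KMeans

def uflst_round(embed, train_episodes, data, n_clusters):
    """(1) cluster current features to get pseudo-labels; (2) run episodic
    training on tasks built from those pseudo-labels, which improves the
    features used in the next round. All callables are assumed interfaces:
    embed(data) -> (n, d) array; train_episodes(embed, data, labels) -> embed."""
    feats = embed(data)                                   # current features
    pseudo = KMeans(n_clusters=n_clusters).fit_predict(feats)
    embed = train_episodes(embed, data, pseudo)           # episodic training
    return embed, pseudo
```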

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

Title Gradientless Descent: High-Dimensional Zeroth-Order Optimization
Authors Anonymous
Abstract Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithms from a novel geometric perspective and present an analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations for any monotone transform of a smooth and strongly convex objective with latent dimension $k \le n$, where the input dimension is $n$, $R$ is the diameter of the input space, and $Q$ is the condition number. Our rates are the first of their kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and the ability to exploit low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on synthetic and MuJoCo benchmarks. (A minimal sketch of the GLD loop follows this entry.)
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Skep6TVYDB
PDF https://openreview.net/pdf?id=Skep6TVYDB
PWC https://paperswithcode.com/paper/gradientless-descent-high-dimensional-zeroth
Repo
Framework
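
Because GLD only compares function values, any monotone transform of $f$ leaves its trajectory unchanged, which the sketch below makes plain. It samples one candidate per geometrically spaced radius and keeps the best improvement (points on spheres here for simplicity; the paper's ball sampling and radius schedule differ in detail):

```python
import numpy as np

def gradientless_descent(f, x0, R=1.0, r_min=1e-4, iters=1000, rng=None):
    """Minimal GLD sketch: no gradient estimate is ever formed; only
    comparisons f(y) < f(x) drive the search, hence monotone invariance."""
    rng = rng or np.random.default_rng(0)
    x, n = np.asarray(x0, float), len(x0)
    radii = R * 0.5 ** np.arange(int(np.log2(R / r_min)) + 1)
    for _ in range(iters):
        best_x, best_v = x, f(x)
        for r in radii:
            u = rng.normal(size=n)
            y = x + r * u / np.linalg.norm(u)   # candidate at distance r
            if f(y) < best_v:
                best_x, best_v = y, f(y)
        x = best_x
    return x

# e.g. gradientless_descent(lambda v: np.sum(v**2), x0=np.ones(10))
```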