Paper Group NANR 12
Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation. Measuring Calibration in Deep Learning. Partial Simulation for Imitation Learning. Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations. iWGAN: an Autoencoder WGAN for Inference. A Syntax-Aware Approach for Unsupervised Text Style Transfer …
Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation
Title | Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation |
Authors | Anonymous |
Abstract | Natural question generation (QG) aims to generate questions from a passage and an answer. Previous works on QG either (i) ignore the rich structure information hidden in text, (ii) rely solely on a cross-entropy loss, which leads to issues such as exposure bias and inconsistency between training and test measurements, or (iii) fail to fully exploit the answer information. To address these limitations, in this paper we propose a reinforcement learning (RL) based graph-to-sequence (Graph2Seq) model for QG. Our model consists of a Graph2Seq generator with a novel Bidirectional Gated Graph Neural Network-based encoder to embed the passage, and a hybrid evaluator with a mixed objective function that combines both the cross-entropy and RL losses to ensure the generation of syntactically and semantically valid text. We also introduce an effective Deep Alignment Network for incorporating the answer information into the passage at both the word and contextual levels. Our model is end-to-end trainable and achieves new state-of-the-art scores, outperforming existing methods by a significant margin on the standard SQuAD benchmark for QG. (A code sketch of the mixed objective follows this entry.) |
Tasks | Graph-to-Sequence, Question Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygnDhEtvr |
PDF | https://openreview.net/pdf?id=HygnDhEtvr |
PWC | https://paperswithcode.com/paper/reinforcement-learning-based-graph-to-1 |
Repo | |
Framework | |
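A minimal sketch of the mixed objective described above, assuming PyTorch and a sentence-level reward such as BLEU; the interpolation weight `gamma` and the self-critical baseline are illustrative choices, not details confirmed by the abstract:

```python
import torch
import torch.nn.functional as F

def mixed_objective(logits, targets, sampled_logprobs,
                    reward_sampled, reward_greedy, gamma=0.98):
    """Hypothetical mixed loss: gamma * CE + (1 - gamma) * self-critical RL.

    logits:           (batch, seq, vocab) decoder outputs (teacher forcing)
    targets:          (batch, seq) gold token ids
    sampled_logprobs: (batch,) sum of log-probs of a sampled sequence
    reward_*:         (batch,) sentence-level rewards (e.g. BLEU) of the
                      sampled and greedy decodes
    """
    ce = F.cross_entropy(logits.transpose(1, 2), targets)
    # Self-critical baseline: advantage = r(sampled) - r(greedy)
    advantage = (reward_sampled - reward_greedy).detach()
    rl = -(advantage * sampled_logprobs).mean()
    return gamma * ce + (1.0 - gamma) * rl
```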
Measuring Calibration in Deep Learning
Title | Measuring Calibration in Deep Learning |
Authors | Anonymous |
Abstract | Overconfidence and underconfidence in machine learning classifiers are measured by calibration: the degree to which the probabilities predicted for each class match the accuracy of the classifier on those predictions. We propose two new measures of calibration, the Static Calibration Error (SCE) and the Adaptive Calibration Error (ACE). These measures take into account every prediction made by a model, in contrast to the popular Expected Calibration Error. (A code sketch of SCE follows this entry.) |
Tasks | Calibration |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1la7krKPS |
PDF | https://openreview.net/pdf?id=r1la7krKPS |
PWC | https://paperswithcode.com/paper/measuring-calibration-in-deep-learning-1 |
Repo | |
Framework | |
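A minimal sketch of the Static Calibration Error described above, in NumPy; the abstract does not fix the binning scheme, so equal-width bins are assumed here:

```python
import numpy as np

def static_calibration_error(probs, labels, n_bins=15):
    """Bin every class probability (not just the max, as ECE does), measure
    the gap between predicted probability and empirical frequency in each
    class-bin, weight by bin mass, and average over classes.

    probs:  (n_samples, n_classes) predicted probabilities
    labels: (n_samples,) true class indices
    """
    n, k = probs.shape
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    per_class_err = []
    for c in range(k):
        p = probs[:, c]
        hit = (labels == c).astype(float)   # empirical frequency of class c
        err = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
            if mask.any():
                gap = abs(p[mask].mean() - hit[mask].mean())
                err += mask.mean() * gap    # weight by fraction of samples
        per_class_err.append(err)
    return float(np.mean(per_class_err))
```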
Partial Simulation for Imitation Learning
Title | Partial Simulation for Imitation Learning |
Authors | Anonymous |
Abstract | Model-based imitation learning methods require full knowledge of the transition kernel for policy evaluation. In this work, we introduce the Expert Induced Markov Decision Process (eMDP) model as a formulation for solving imitation problems using Reinforcement Learning (RL) when only partial knowledge about the transition kernel is available. The idea of the eMDP is to replace the unknown transition kernel with a synthetic kernel that (a) simulates the transition of the state components for which the transition kernel is known (s_r), and (b) extracts from demonstrations the state components for which the kernel is unknown (s_u). The next state is then stitched from the two components: s = {s_r, s_u}. (A code sketch follows this entry.) We describe in detail the recipe for building an eMDP and analyze the errors caused by its synthetic kernel. Our experiments include imitation tasks in multiplayer games, where the agent has to imitate one expert in the presence of other experts for whom we cannot provide a transition model. We show that combining a policy gradient algorithm with our model achieves superior performance compared to the simulation-free alternative. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJe_D1SYvr |
PDF | https://openreview.net/pdf?id=SJe_D1SYvr |
PWC | https://paperswithcode.com/paper/partial-simulation-for-imitation-learning |
Repo | |
Framework | |
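A hedged sketch of the eMDP synthetic kernel; `simulator` and `demo_trace` are hypothetical stand-ins for the known dynamics and the recorded expert demonstrations:

```python
class EMDPKernel:
    """Synthetic kernel as described in the abstract: simulate the known
    state components (s_r), replay the unknown components (s_u) from expert
    demonstrations, and stitch s = {s_r, s_u}."""

    def __init__(self, simulator, demo_trace):
        self.simulator = simulator    # known dynamics: (s_r, action) -> s_r'
        self.demo_trace = demo_trace  # recorded s_u components, indexed by t
        self.t = 0

    def step(self, s_r, action):
        s_r_next = self.simulator(s_r, action)   # known part: simulate
        s_u_next = self.demo_trace[self.t + 1]   # unknown part: replay
        self.t += 1
        return {"r": s_r_next, "u": s_u_next}    # stitched next state
```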
Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations
Title | Semi-supervised 3D Face Reconstruction with Nonlinear Disentangled Representations |
Authors | Anonymous |
Abstract | Recovering 3D geometric shape, albedo, and lighting from a single image has wide applications in many areas, yet it is a typically ill-posed problem. To eliminate the ambiguity, face prior knowledge such as linear 3D morphable models (3DMM) learned from limited scan data is often incorporated into the reconstruction process. However, methods based on linear parametric models cannot generalize well to facial images in the wild with varied ages, ethnicities, expressions, poses, and lighting. Recent methods aim to learn a nonlinear parametric model using convolutional neural networks (CNN) to regress the face shape and texture directly. However, these models were trained only on data generated from a linear 3DMM. Moreover, the identity and expression representations are entangled in these models, which hinders many facial editing applications. In this paper, we train our model with an adversarial loss in a semi-supervised manner on hybrid batches of unlabeled and labeled face images to exploit the value of large amounts of unlabeled face images from unconstrained photo collections. A novel center loss is introduced to ensure that different facial images from the same person have the same identity shape and albedo (a sketch follows this entry). Besides, our proposed model disentangles identity, expression, pose, and lighting representations, which improves the overall reconstruction performance and facilitates facial editing applications, e.g., expression transfer. Comprehensive experiments demonstrate that our model produces high-quality reconstructions compared to state-of-the-art methods and is robust to various expression, pose, and lighting conditions. |
Tasks | 3D Face Reconstruction, Face Reconstruction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1lK5kBKvr |
PDF | https://openreview.net/pdf?id=H1lK5kBKvr |
PWC | https://paperswithcode.com/paper/semi-supervised-3d-face-reconstruction-with |
Repo | |
Framework | |
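The abstract does not spell out the center loss; below is a sketch in the spirit of the classic center loss (Wen et al., 2016), pulling identity codes of the same person toward a shared center. All names are illustrative:

```python
import torch

def identity_center_loss(identity_codes, person_ids, centers):
    """Pull the identity (shape/albedo) codes of images of the same person
    toward that person's center, so different photos of one person share
    identity shape and albedo. `centers` is an (n_people, dim) parameter
    table; the paper's exact formulation may differ."""
    target = centers[person_ids]            # (batch, dim) per-person centers
    return ((identity_codes - target) ** 2).sum(dim=1).mean()
```

In practice the `centers` table would be updated jointly with the network, by gradient or by a running mean, as in the classic formulation.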
iWGAN: an Autoencoder WGAN for Inference
Title | iWGAN: an Autoencoder WGAN for Inference |
Authors | Anonymous |
Abstract | Generative Adversarial Networks (GANs) have been impactful on many problems and applications but suffer from unstable training. Wasserstein GAN (WGAN) leverages the Wasserstein distance to avoid the caveats of the min-max two-player training of GANs, but has other defects, such as mode collapse and the lack of a metric to detect convergence. We introduce a novel inference WGAN (iWGAN) model, a principled framework to fuse auto-encoders and WGANs. The iWGAN jointly learns an encoder network and a generative network using an iterative primal-dual optimization process. We establish the generalization error bound of iWGANs. We further provide a rigorous probabilistic interpretation of our model under the framework of maximum likelihood estimation. The iWGAN, with a clear stopping criterion, has many advantages over other autoencoder GANs. Empirical experiments show that our model greatly mitigates the symptoms of mode collapse, speeds up convergence, and provides a quality-check measurement for each individual sample. We illustrate the ability of iWGANs by obtaining competitive and stable performance relative to the state of the art on benchmark datasets. (A hedged code sketch follows this entry.) |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJg6VREFDH |
PDF | https://openreview.net/pdf?id=HJg6VREFDH |
PWC | https://paperswithcode.com/paper/iwgan-an-autoencoder-wgan-for-inference |
Repo | |
Framework | |
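The abstract gives the idea (fuse an autoencoder with a WGAN via primal-dual optimization) but not the exact updates; the sketch below is a generic autoencoder-WGAN objective in that spirit, not the paper's algorithm:

```python
def autoencoder_wgan_losses(x, G, Q, D, lam=1.0):
    """Generic autoencoder-WGAN losses: the critic D estimates the dual
    (Kantorovich) form of the Wasserstein distance between data and
    reconstructions, while encoder Q and generator G also minimize a
    reconstruction term. A hedged sketch under stated assumptions."""
    z = Q(x)                                    # inferred latent code
    x_rec = G(z)                                # reconstruction
    recon = (x - x_rec).abs().mean()            # primal transport-cost proxy
    critic_gap = D(x).mean() - D(x_rec).mean()  # dual (critic) term
    loss_D = -critic_gap                        # critic maximizes the gap
    loss_GQ = recon + lam * critic_gap          # G, Q minimize cost + gap
    return loss_D, loss_GQ
```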
A Syntax-Aware Approach for Unsupervised Text Style Transfer
Title | A Syntax-Aware Approach for Unsupervised Text Style Transfer |
Authors | Anonymous |
Abstract | Unsupervised text style transfer aims to rewrite text of a source style into a target style while preserving the style-independent content, without a parallel training corpus. Most existing methods address the problem by leveraging only the surface forms of words. In this paper, we incorporate syntactic knowledge and propose a multi-task learning based Syntax-Aware Style Transfer (SAST) model. Our SAST jointly learns to generate a transferred output with aligned words and syntactic labels, where the alignment between the words and the syntactic labels is enforced with a consistency constraint (a sketch follows this entry). The auxiliary syntactic label generation task regularizes the model to form more generalized representations, which is a desirable property, especially in unsupervised tasks. Experimental results on two benchmark datasets for text style transfer demonstrate the effectiveness of the proposed method in terms of transfer accuracy, content preservation, and fluency. |
Tasks | Multi-Task Learning, Style Transfer, Text Style Transfer |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkll_kHFPB |
PDF | https://openreview.net/pdf?id=Bkll_kHFPB |
PWC | https://paperswithcode.com/paper/a-syntax-aware-approach-for-unsupervised-text |
Repo | |
Framework | |
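A hedged sketch of a multi-task objective of the kind described: joint word and syntactic-label generation plus a consistency term. The paper's actual constraint is not specified by the abstract, so `consistency` is a placeholder scalar (e.g. a divergence between word-side and tag-side alignments):

```python
import torch.nn.functional as F

def sast_multitask_loss(word_logits, word_ids, tag_logits, tag_ids,
                        consistency, beta=1.0, gamma=0.1):
    """word_logits: (batch, seq, vocab), tag_logits: (batch, seq, n_tags),
    word_ids/tag_ids: (batch, seq) targets. Weights are illustrative."""
    word_loss = F.cross_entropy(word_logits.transpose(1, 2), word_ids)
    tag_loss = F.cross_entropy(tag_logits.transpose(1, 2), tag_ids)
    return word_loss + beta * tag_loss + gamma * consistency
```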
Superbloom: Bloom filter meets Transformer
Title | Superbloom: Bloom filter meets Transformer |
Authors | Anonymous |
Abstract | We extend the idea of word pieces in natural language models to machine learning tasks on opaque ids. This is achieved by applying hash functions to map each id to multiple hash tokens in a much smaller space, similarly to a Bloom filter (a sketch follows this entry). We show that by applying a multi-layer Transformer to these Bloom filter digests, we are able to obtain models with high accuracy. They outperform models of a similar size without hashing and, to a large degree, models of a much larger size trained using sampled softmax with the same computational budget. Our key observation is that it is important to use a multi-layer Transformer for Bloom filter digests to remove ambiguity in the hashed input. We believe this provides an alternative method for solving problems with large vocabulary sizes. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxy5A4twS |
PDF | https://openreview.net/pdf?id=SJxy5A4twS |
PWC | https://paperswithcode.com/paper/superbloom-bloom-filter-meets-transformer |
Repo | |
Framework | |
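A minimal sketch of the Bloom-filter-style hashing described above; `num_hashes` and `vocab_size` are illustrative values, not the paper's:

```python
import hashlib

def bloom_digest(item_id: str, num_hashes: int = 2, vocab_size: int = 50000):
    """Map an opaque id to `num_hashes` tokens in a small vocabulary using
    salted hashes, as in a Bloom filter; the Transformer then attends over
    these hash tokens instead of a huge id vocabulary."""
    tokens = []
    for salt in range(num_hashes):
        h = hashlib.sha256(f"{salt}:{item_id}".encode()).digest()
        tokens.append(int.from_bytes(h[:8], "big") % vocab_size)
    return tokens

# e.g. an id from a 10M-item catalogue becomes 2 tokens in a 50k vocabulary
print(bloom_digest("user_92834701"))
```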
The Effect of Neural Net Architecture on Gradient Confusion & Training Performance
Title | The Effect of Neural Net Architecture on Gradient Confusion & Training Performance |
Authors | Anonymous |
Abstract | The goal of this paper is to study why typical neural networks train so fast, and how neural network architecture affects the speed of training. We introduce a simple concept called gradient confusion to help formally analyze this. When confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence; when gradient confusion is low, data samples interact harmoniously and training proceeds quickly. Through novel theoretical and experimental results, we show how the neural net architecture affects gradient confusion, and thus the efficiency of training (an empirical probe is sketched after this entry). We show that increasing the width of neural networks leads to lower gradient confusion, and thus easier model training. On the other hand, increasing the depth of neural networks has the opposite effect. Finally, we observe empirically that techniques like batch normalization and skip connections reduce gradient confusion, which helps reduce the training burden of very deep networks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1xNJ0NYDH |
PDF | https://openreview.net/pdf?id=r1xNJ0NYDH |
PWC | https://paperswithcode.com/paper/the-effect-of-neural-net-architecture-on |
Repo | |
Framework | |
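An empirical probe of gradient confusion, assuming PyTorch: compute normalized per-sample gradients and report the most negative pairwise cosine similarity (large negative values mean samples pull the weights in conflicting directions). This is a diagnostic sketch, not the paper's formal bound:

```python
import torch

def min_gradient_cosine(model, loss_fn, samples):
    """samples: iterable of (x, y) single-example batches."""
    grads = []
    for x, y in samples:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        g = torch.cat([p.grad.flatten() for p in model.parameters()
                       if p.grad is not None])
        grads.append(g / (g.norm() + 1e-12))   # unit per-sample gradient
    sims = [torch.dot(grads[i], grads[j]).item()
            for i in range(len(grads)) for j in range(i + 1, len(grads))]
    return min(sims)   # most negative pairwise correlation observed
```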
Efficient Saliency Maps for Explainable AI
Title | Efficient Saliency Maps for Explainable AI |
Authors | Anonymous |
Abstract | We describe an explainable AI saliency map method for use with deep convolutional neural networks (CNN) that is much more efficient than popular gradient methods. It is also quantitatively similar to or better than them in accuracy. Our technique works by measuring information at the end of each network scale, which is then combined into a single saliency map (a generic sketch follows this entry). We describe how saliency measures can be made more efficient by exploiting Saliency Map Order Equivalence. Finally, we visualize individual scale/layer contributions by using a Layer Ordered Visualization of Information. This provides an interesting comparison of scale information contributions within the network not provided by other saliency map methods. Our method is generally straightforward and should be applicable to the most commonly used CNNs. (Full source code is available at http://www.anonymous.submission.com). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxf9CEKDr |
PDF | https://openreview.net/pdf?id=ryxf9CEKDr |
PWC | https://paperswithcode.com/paper/efficient-saliency-maps-for-explainable-ai |
Repo | |
Framework | |
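The paper's specific statistic and its Saliency Map Order Equivalence trick are not recoverable from the abstract; the sketch below only illustrates the general recipe (a cheap per-scale information statistic, upsampled and combined), with PyTorch assumed:

```python
import torch
import torch.nn.functional as F

def combined_scale_saliency(activations, out_size):
    """activations: list of (1, C, H, W) feature maps, one per network
    scale. Compute a per-location statistic at each scale, normalize,
    upsample to the input resolution, and average into one saliency map."""
    maps = []
    for a in activations:
        stat = a.abs().mean(dim=1, keepdim=True)          # (1, 1, H, W)
        stat = (stat - stat.min()) / (stat.max() - stat.min() + 1e-12)
        maps.append(F.interpolate(stat, size=out_size, mode="bilinear",
                                  align_corners=False))
    return torch.stack(maps).mean(dim=0)
```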
Common sense and Semantic-Guided Navigation via Language in Embodied Environments
Title | Common sense and Semantic-Guided Navigation via Language in Embodied Environments |
Authors | Anonymous |
Abstract | One key element which differentiates humans from artificial agents in performing various tasks is that humans have access to common sense and semantic understanding, learnt from past experiences. In this work, we evaluate whether common sense and semantic understanding benefit an artificial agent when completing a room navigation task, wherein we ask the agent to navigate to a target room (e.g. "go to the kitchen") in a realistic 3D environment. We leverage semantic information and patterns observed during training to build the common sense which guides the agent to reach the target. We encourage semantic understanding within the agent by introducing grounding as an auxiliary task. We train and evaluate the agent in three settings: (i) imitation learning using expert trajectories, (ii) reinforcement learning using Proximal Policy Optimization, and (iii) self-supervised imitation learning for fine-tuning the agent on unseen environments using auxiliary tasks. From our experiments, we observed that common sense helps the agent in long-term planning, while semantic understanding helps in short-term and local planning (such as guiding the agent when to stop). When combined, the agent generalizes better. Further, incorporating common sense and semantic understanding leads to a 40% improvement in task success and a 112% improvement in success per length (SPL) over the baseline during imitation learning. Moreover, initial evidence suggests that the cross-modal embeddings learnt during training capture structural and positional patterns of the environment, implying that the agent inherently learns a map of the environment. It also suggests that navigation in multi-modal tasks leads to better semantic understanding. |
Tasks | Common Sense Reasoning, Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkx5ceHFwH |
PDF | https://openreview.net/pdf?id=Bkx5ceHFwH |
PWC | https://paperswithcode.com/paper/common-sense-and-semantic-guided-navigation |
Repo | |
Framework | |
Compositional Embeddings: Joint Perception and Comparison of Class Label Sets
Title | Compositional Embeddings: Joint Perception and Comparison of Class Label Sets |
Authors | Anonymous |
Abstract | We explore the idea of compositional set embeddings that can be used to infer not just a single class, but the set of classes associated with the input data (e.g., image, video, audio signal). This can be useful, for example, in multi-object detection in images, or multi-speaker diarization (one-shot learning) in audio. In particular, we devise and implement two novel models consisting of (1) an embedding function f trained jointly with a “composite” function g that computes set union operations between the classes encoded in two embedding vectors; and (2) the embedding f trained jointly with a “query” function h that computes whether the classes encoded in one embedding subsume the classes encoded in another embedding (a sketch follows this entry). In contrast to prior work, these models must both perceive the classes associated with the input examples and encode the relationships between different class label sets. In experiments conducted on simulated data, Omniglot, and COCO datasets, the proposed composite embedding models outperform baselines based on traditional embedding approaches. |
Tasks | Object Detection, Omniglot, One-Shot Learning, Speaker Diarization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx-ZeSKDB |
PDF | https://openreview.net/pdf?id=BJx-ZeSKDB |
PWC | https://paperswithcode.com/paper/compositional-embeddings-joint-perception-and |
Repo | |
Framework | |
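A hedged sketch of the three components named above (f, g, h); only their roles come from the abstract, the architectures are illustrative guesses:

```python
import torch
import torch.nn as nn

class CompositionalEmbeddings(nn.Module):
    """f embeds inputs; g approximates set union in embedding space;
    h scores whether one embedding's label set subsumes another's."""
    def __init__(self, in_dim=128, emb_dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU(),
                               nn.Linear(emb_dim, emb_dim))
        self.g = nn.Sequential(nn.Linear(2 * emb_dim, emb_dim), nn.ReLU(),
                               nn.Linear(emb_dim, emb_dim))    # set union
        self.h = nn.Linear(2 * emb_dim, 1)                     # subsumes?

    def union(self, e1, e2):
        return self.g(torch.cat([e1, e2], dim=-1))

    def subsumes(self, e1, e2):
        return torch.sigmoid(self.h(torch.cat([e1, e2], dim=-1)))

# usage: m = CompositionalEmbeddings()
#        e = m.union(m.f(x1), m.f(x2))  # embedding of the combined label set
```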
Few-shot Learning by Focusing on Differences
Title | Few-shot Learning by Focusing on Differences |
Authors | Anonymous |
Abstract | Few-shot classification may involve differentiating data that belong to different levels of label granularity. Compounded by the fact that the number of available labeled examples in the novel classification set is scarce, relying solely on the loss function to implicitly guide the classifier to separate data based on its label might not be enough; a few-shot classifier needs a strong bias to perform well. In this paper, we propose a model that incorporates a simple prior: focusing on differences by building a dissimilar set of class representations. The model treats a class representation as a vector and removes the component that is shared among closely related class representatives. It does so through a combination of learned attention and vector orthogonalization (a sketch follows this entry). Our model works well on our newly introduced dataset, Hierarchical-CIFAR, which contains different levels of label granularity. It also substantially improves performance on the fine-grained classification dataset CUB, while staying competitive on standard benchmarks such as mini-Imagenet, Omniglot, and a few-shot dataset derived from CIFAR. |
Tasks | Few-Shot Learning, Omniglot |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xwv1StvS |
PDF | https://openreview.net/pdf?id=B1xwv1StvS |
PWC | https://paperswithcode.com/paper/few-shot-learning-by-focusing-on-differences |
Repo | |
Framework | |
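A minimal sketch of the orthogonalization idea: remove from each class prototype the component it shares with the others. The paper combines this with learned attention; here the shared direction is approximated by the prototype mean, a simplifying assumption:

```python
import torch

def orthogonalized_prototypes(prototypes):
    """prototypes: (n_classes, dim). Project out the shared direction so
    that what remains emphasizes the differences between classes."""
    shared = prototypes.mean(dim=0, keepdim=True)       # (1, dim)
    shared = shared / (shared.norm() + 1e-12)
    proj = (prototypes @ shared.t()) * shared           # component on shared
    return prototypes - proj                            # keep the differences
```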
Role of two learning rates in convergence of model-agnostic meta-learning
Title | Role of two learning rates in convergence of model-agnostic meta-learning |
Authors | Anonymous |
Abstract | Model-agnostic meta-learning (MAML) is known as a powerful meta-learning method. However, MAML is notorious for being hard to train because of the existence of two learning rates. Therefore, in this paper we derive the conditions that the inner learning rate $\alpha$ and the meta-learning rate $\beta$ must satisfy for MAML to converge to minima, under some simplifications. We find that the upper bound of $\beta$ depends on $\alpha$, in contrast to the case of using the normal gradient descent method. Moreover, we show that the threshold of $\beta$ increases as $\alpha$ approaches its own upper bound. This result is verified by experiments on various few-shot tasks and architectures; specifically, we perform sinusoid regression and classification on the Omniglot and MiniImagenet datasets with a multilayer perceptron and a convolutional neural network. Based on this outcome, we present a guideline for determining the learning rates: first, search for the largest possible $\alpha$; next, tune $\beta$ based on the chosen value of $\alpha$. (A code sketch follows this entry.) |
Tasks | Meta-Learning, Omniglot |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1e8qpVKPS |
PDF | https://openreview.net/pdf?id=r1e8qpVKPS |
PWC | https://paperswithcode.com/paper/role-of-two-learning-rates-in-convergence-of |
Repo | |
Framework | |
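A minimal MAML step on a tiny functional model, showing where the two learning rates enter: $\alpha$ scales the inner, per-task adaptation and $\beta$ scales the meta-update. The linear model and task format are illustrative:

```python
import torch

def forward(w, b, x):                 # tiny functional model: y = x @ w + b
    return x @ w + b

def maml_step(w, b, tasks, alpha=0.4, beta=0.001):
    """tasks: list of (x_support, y_support, x_query, y_query) tensors;
    w, b: parameters with requires_grad=True."""
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        inner = ((forward(w, b, x_s) - y_s) ** 2).mean()
        gw, gb = torch.autograd.grad(inner, (w, b), create_graph=True)
        w_fast, b_fast = w - alpha * gw, b - alpha * gb     # inner step (alpha)
        meta_loss = meta_loss + ((forward(w_fast, b_fast, x_q) - y_q) ** 2).mean()
    gw, gb = torch.autograd.grad(meta_loss, (w, b))
    w_new = (w - beta * gw).detach().requires_grad_()       # meta step (beta)
    b_new = (b - beta * gb).detach().requires_grad_()
    return w_new, b_new
```

Per the paper's guideline, one would first search for the largest stable `alpha`, then tune `beta` given that choice.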
Unsupervised Few Shot Learning via Self-supervised Training
Title | Unsupervised Few Shot Learning via Self-supervised Training |
Authors | Anonymous |
Abstract | Learning from limited exemplars (few-shot learning) is a fundamental, unsolved problem that has been laboriously explored in the machine learning community. However, current few-shot learners are mostly supervised and rely heavily on large amounts of labeled examples. Unsupervised learning is a more natural procedure for cognitive mammals and has produced promising results in many machine learning tasks. In the current study, we develop a method to learn an unsupervised few-shot learner via self-supervised training (UFLST), which can effectively generalize to novel but related classes. The proposed model consists of two alternating processes, progressive clustering and episodic training. The former generates pseudo-labeled training examples for constructing episodic tasks, and the latter trains the few-shot learner on the generated episodic tasks, which further optimizes the feature representations of the data. The two processes facilitate each other and eventually produce a high-quality few-shot learner (a sketch of one round follows this entry). Using the benchmark dataset Omniglot, we show that our model outperforms other unsupervised few-shot learning methods by a large margin and approaches the performance of supervised methods. Using the benchmark dataset Market1501, we further demonstrate the feasibility of our model in a real-world application, person re-identification. |
Tasks | Few-Shot Learning, Omniglot, Person Re-Identification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Ske-ih4FPS |
PDF | https://openreview.net/pdf?id=Ske-ih4FPS |
PWC | https://paperswithcode.com/paper/unsupervised-few-shot-learning-via-self |
Repo | |
Framework | |
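A hedged sketch of one round of the alternation described above; `encoder_train_fn` is a hypothetical stand-in for the episodic training step, and k-means stands in for the paper's progressive clustering:

```python
import numpy as np
from sklearn.cluster import KMeans

def uflst_round(features, encoder_train_fn, n_clusters):
    """Cluster current features into pseudo-labels, build episodic few-shot
    tasks from the clusters, then train the learner on them (which in turn
    refreshes the features for the next round)."""
    pseudo = KMeans(n_clusters=n_clusters).fit_predict(features)
    episodes = []
    for c in np.unique(pseudo):
        idx = np.where(pseudo == c)[0]
        if len(idx) >= 2:                      # need support + query examples
            episodes.append((idx[: len(idx) // 2], idx[len(idx) // 2:]))
    return encoder_train_fn(episodes)          # returns refreshed features
```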
Gradientless Descent: High-Dimensional Zeroth-Order Optimization
Title | Gradientless Descent: High-Dimensional Zeroth-Order Optimization |
Authors | Anonymous |
Abstract | Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithms from a novel geometric perspective: for any monotone transform of a smooth and strongly convex objective with latent dimension $k \le n$, we show convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, where $n$ is the input dimension, $R$ is the diameter of the input space, and $Q$ is the condition number. Our rates are the first of their kind to be both (1) poly-logarithmically dependent on dimensionality and (2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and the ability to exploit a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on synthetic and MuJoCo benchmarks. (A code sketch follows this entry.) |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skep6TVYDB |
PDF | https://openreview.net/pdf?id=Skep6TVYDB |
PWC | https://paperswithcode.com/paper/gradientless-descent-high-dimensional-zeroth |
Repo | |
Framework | |
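A minimal sketch of a GLD-style loop under the stated assumptions: only function comparisons are used (hence monotone invariance), with candidate step sizes on a geometric grid between a minimum radius `r` and a maximum radius `R`; the sampling details are illustrative:

```python
import numpy as np

def gradientless_descent(f, x0, R=1.0, r=1e-3, iters=100, seed=0):
    """Comparison-based descent: at each step, try one random perturbation
    per radius on a geometric grid (a binary search over step sizes) and
    keep the best improvement, if any."""
    rng = np.random.default_rng(seed)
    x, fx, n = x0.copy(), f(x0), len(x0)
    radii = R * 0.5 ** np.arange(int(np.log2(R / r)) + 1)
    for _ in range(iters):
        best_x, best_f = x, fx
        for rad in radii:
            v = rng.standard_normal(n)
            cand = x + rad * v / np.linalg.norm(v)   # point at distance rad
            fc = f(cand)
            if fc < best_f:                          # comparison only
                best_x, best_f = cand, fc
        x, fx = best_x, best_f
    return x

# e.g. minimize a monotone transform of a quadratic
print(gradientless_descent(lambda z: np.exp(np.sum(z ** 2)), np.ones(10)))
```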