Paper Group NANR 77
Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration. Teaching GAN to generate per-pixel annotation. RNA Secondary Structure Prediction By Learning Unrolled Algorithms. Meta Label Correction for Learning with Weak Supervision. Asynchronous Stochastic Subgradient Methods for General Nonsmooth Nonconvex Optimization. …
Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration
Title | Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration |
Authors | Anonymous |
Abstract | Exploration in sparse reward reinforcement learning remains an open challenge. Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration. Commonly, these signals are added as bonus rewards, which results in a mixture policy that pursues neither exploration nor task fulfillment resolutely. In this paper, we instead learn separate intrinsic and extrinsic task policies and schedule between these different drives to accelerate exploration and stabilize learning. Moreover, we introduce a new type of intrinsic reward denoted as successor feature control (SFC), which is general and not task-specific. It takes into account statistics over complete trajectories and thus differs from previous methods that only use local information to evaluate intrinsic motivation. We evaluate our proposed scheduled intrinsic drive (SID) agent using three different environments with pure visual inputs: VizDoom, DeepMind Lab and DeepMind Control Suite. The results show a substantially improved exploration efficiency with SFC and the hierarchical usage of the intrinsic drives. A video of our experimental results can be found at https://gofile.io/?c=HpEwTd. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklSQgHFDS |
PDF | https://openreview.net/pdf?id=SklSQgHFDS |
PWC | https://paperswithcode.com/paper/scheduled-intrinsic-drive-a-hierarchical-take-1 |
Repo | |
Framework | |
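The abstract specifies scheduling between the two drives but not the scheduler itself; below is a minimal sketch of the high-level loop, assuming a fixed macro-step length and a random choice between the two policies. The environment API (gym-style) and the policy interface are placeholders, not the paper's implementation.

```python
# A minimal sketch of hierarchical drive scheduling; scheduler, macro-step
# length, and the environment/policy interfaces are all assumptions.
import random

def sid_rollout(env, policies, macro_len=50, total_steps=1000):
    """Alternate between the extrinsic (task) and intrinsic (exploration)
    policies at a coarse timescale, as described in the abstract."""
    obs = env.reset()
    trajectory = []
    for step in range(total_steps):
        if step % macro_len == 0:
            # Schedule one drive for the next macro step.
            active = random.choice(["extrinsic", "intrinsic"])
        action = policies[active](obs)
        obs, reward, done, info = env.step(action)
        trajectory.append((obs, action, reward, active))
        if done:
            obs = env.reset()
    return trajectory
```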
Teaching GAN to generate per-pixel annotation
Title | Teaching GAN to generate per-pixel annotation |
Authors | Anonymous |
Abstract | We propose a method for joint image and per-pixel annotation synthesis with GAN. We demonstrate that GAN has a good high-level representation of target data that can be easily projected to semantic segmentation masks. This method can be used to create a training dataset for training a separate semantic segmentation network. Our experiments show that such a segmentation network successfully generalizes to real data. Additionally, the method outperforms supervised training when the number of training samples is small, and works on a variety of different scenes and classes. The source code of the proposed method will be publicly available. |
Tasks | Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJllFpVYwS |
PDF | https://openreview.net/pdf?id=SJllFpVYwS |
PWC | https://paperswithcode.com/paper/teaching-gan-to-generate-per-pixel-annotation |
Repo | |
Framework | |
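As a rough illustration of the idea, a small decoder can project intermediate generator features to segmentation logits, which are then used to label the synthesized image. The feature shapes and layer sizes below are assumptions, not the paper's architecture.

```python
# Hedged sketch: a light decoder head projecting intermediate GAN generator
# features to per-pixel class logits; channel counts are illustrative.
import torch
import torch.nn as nn

class AnnotationDecoder(nn.Module):
    def __init__(self, feat_channels=512, num_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(feat_channels, 128, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, num_classes, 1),
        )

    def forward(self, gan_features, out_size):
        logits = self.head(gan_features)
        # Upsample logits to the resolution of the synthesized image.
        return nn.functional.interpolate(
            logits, size=out_size, mode="bilinear", align_corners=False)
```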
RNA Secondary Structure Prediction By Learning Unrolled Algorithms
Title | RNA Secondary Structure Prediction By Learning Unrolled Algorithms |
Authors | Anonymous |
Abstract | In this paper, we propose an end-to-end deep learning model, called E2Efold, for RNA secondary structure prediction that effectively takes into account the inherent constraints of the problem. The key idea of E2Efold is to directly predict the RNA base-pairing matrix, and use an unrolled constrained programming algorithm as a building block in the architecture to enforce constraints. With comprehensive experiments on benchmark datasets, we demonstrate the superior performance of E2Efold: it predicts significantly better structures compared to previous SOTA (a 29.7% improvement in F1 score in some cases, and an even larger improvement for pseudoknotted structures) and runs as efficiently as the fastest algorithms in terms of inference time. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1eALyrYDH |
PDF | https://openreview.net/pdf?id=S1eALyrYDH |
PWC | https://paperswithcode.com/paper/rna-secondary-structure-prediction-by |
Repo | |
Framework | |
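A hedged sketch of the unrolling idea: starting from unconstrained pairing scores, repeat a few differentiable steps that push the matrix toward the structural constraints (pairing is mutual, at most one partner per base). The update rule below is a simplification, not E2Efold's exact constrained programming algorithm.

```python
# Sketch of an unrolled constraint-enforcement loop on a predicted
# base-pairing score matrix; the specific updates are a simplification.
import torch

def unrolled_pairing(scores, n_steps=5):
    """scores: (L, L) unconstrained scores from a neural network.
    Returns an approximately symmetric pairing matrix whose rows sum to <= 1."""
    a = torch.sigmoid(scores)
    for _ in range(n_steps):
        a = 0.5 * (a + a.T)                                 # pairing is mutual
        a = a / a.sum(dim=1, keepdim=True).clamp(min=1.0)   # <= 1 pair per base
    return a
```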
Meta Label Correction for Learning with Weak Supervision
Title | Meta Label Correction for Learning with Weak Supervision |
Authors | Anonymous |
Abstract | Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. The growing need for large-scale datasets to train deep learning models has increased its importance. Weak or noisy supervision could originate from multiple sources including non-expert annotators or automatic labeling based on heuristics or user interaction signals. Previous work on modeling and correcting weak labels has focused on various aspects, including loss correction, training instance re-weighting, etc. In this paper, we approach this problem from a novel perspective based on meta-learning. We view the label correction procedure as a meta-process and propose a new meta-learning based framework termed MLC for learning with weak supervision. Experiments with different label noise levels on multiple datasets show that MLC can achieve large improvements over previous methods that incorporate weak labels for learning. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgQpgBKDH |
PDF | https://openreview.net/pdf?id=HJgQpgBKDH |
PWC | https://paperswithcode.com/paper/meta-label-correction-for-learning-with-weak |
Repo | |
Framework | |
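The abstract frames label correction as a meta-process; the sketch below shows one common way to realize such a bi-level update on a linear classifier, differentiating through a single inner gradient step. The one-step approximation, the corrector's interface, and all shapes are assumptions, not the paper's exact recipe.

```python
# Simplified bi-level sketch of meta label correction on a linear model.
import torch
import torch.nn.functional as F

def mlc_step(w, corrector, opt_meta, weak_batch, clean_batch, lr_inner=0.1):
    """w: (d, c) weight of a linear classifier with requires_grad=True.
    corrector: module mapping (inputs, weak labels) -> soft corrected labels."""
    x_w, y_weak = weak_batch          # weakly labeled data
    x_c, y_clean = clean_batch        # small clean (meta) set
    y_corr = corrector(x_w, y_weak)   # soft corrected labels, (B, c)
    # Inner step: train the classifier on corrected labels, keeping the graph.
    loss_in = -(y_corr * F.log_softmax(x_w @ w, dim=1)).sum(dim=1).mean()
    (g,) = torch.autograd.grad(loss_in, w, create_graph=True)
    w_new = w - lr_inner * g
    # Outer step: the clean loss backpropagates through w_new to the corrector.
    loss_out = F.cross_entropy(x_c @ w_new, y_clean)
    opt_meta.zero_grad()
    loss_out.backward()
    opt_meta.step()
    return w_new.detach().requires_grad_(True)
```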
Asynchronous Stochastic Subgradient Methods for General Nonsmooth Nonconvex Optimization
Title | Asynchronous Stochastic Subgradient Methods for General Nonsmooth Nonconvex Optimization |
Authors | Anonymous |
Abstract | Asynchronous distributed methods are a popular way to reduce the communication and synchronization costs of large-scale optimization. Yet, for all their success, little is known about their convergence guarantees in the challenging case of general non-smooth, non-convex objectives, beyond cases where closed-form proximal operator solutions are available. This is all the more surprising since these objectives are the ones appearing in the training of deep neural networks. In this paper, we introduce the first convergence analysis covering asynchronous methods in the case of general non-smooth, non-convex objectives. Our analysis applies to stochastic sub-gradient descent methods both with and without block variable partitioning, and both with and without momentum. It is phrased in the context of a general probabilistic model of asynchronous scheduling accurately adapted to modern hardware properties. We validate our analysis experimentally in the context of training deep neural network architectures. We show overall successful asymptotic convergence and explore how momentum, synchronization, and partitioning all affect performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlSPRVFwS |
PDF | https://openreview.net/pdf?id=BJlSPRVFwS |
PWC | https://paperswithcode.com/paper/asynchronous-stochastic-subgradient-methods |
Repo | |
Framework | |
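To make the asynchronous setting concrete, here is an illustrative shared-memory loop in which workers read possibly stale parameters and apply stochastic subgradient updates. The delay model here is far cruder than the paper's probabilistic scheduling model; treat it as a sketch of the setup, not the analyzed algorithm.

```python
# Illustrative asynchronous stochastic subgradient descent with stale reads.
import threading
import numpy as np

def async_sgd(grad_fn, x0, n_workers=4, steps=1000, lr=0.01):
    """grad_fn(x) returns a stochastic subgradient at x (a numpy array)."""
    x = x0.copy()
    lock = threading.Lock()

    def worker():
        nonlocal x
        for _ in range(steps // n_workers):
            snapshot = x.copy()      # possibly stale read (no lock taken)
            g = grad_fn(snapshot)    # subgradient computed at the stale point
            with lock:               # atomic in-place write of the update
                x -= lr * g

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```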
Chart Auto-Encoders for Manifold Structured Data
Title | Chart Auto-Encoders for Manifold Structured Data |
Authors | Anonymous |
Abstract | Auto-encoding and generative models have achieved tremendous success in image and signal representation learning and generation. These models, however, generally employ the full Euclidean space or a bounded subset (such as $[0,1]^l$) as the latent space, whose trivial geometry is often too simplistic to meaningfully reflect the structure of the data. This paper aims at exploring a nontrivial geometric structure of the latent space for better data representation. Inspired by differential geometry, we propose the Chart Auto-Encoder (CAE), which captures the manifold structure of the data with multiple charts and transition functions among them. CAE translates the mathematical definition of a manifold by parameterizing the entire data set as a collection of overlapping charts, creating local latent representations. These representations are an enhancement of the single-charted latent space commonly employed in auto-encoding models, as they reflect the intrinsic structure of the manifold. Therefore, CAE achieves a more accurate approximation of the data and generates realistic new samples. We conduct experiments with synthetic and real-life data to demonstrate the effectiveness of the proposed CAE. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeBJJBYDB |
PDF | https://openreview.net/pdf?id=rJeBJJBYDB |
PWC | https://paperswithcode.com/paper/chart-auto-encoders-for-manifold-structured |
Repo | |
Framework | |
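A minimal sketch of the multi-chart idea: one encoder, several small decoders (charts), and a soft chart-membership predictor that blends their outputs. The sizes and the soft blending rule are illustrative assumptions; the paper's transition functions between charts are not shown.

```python
# Hedged sketch of a chart auto-encoder with soft chart selection.
import torch
import torch.nn as nn

class ChartAutoEncoder(nn.Module):
    def __init__(self, dim_in=784, dim_latent=2, n_charts=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(),
                                     nn.Linear(64, dim_latent))
        # One small decoder per chart of the latent manifold.
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.Linear(dim_latent, 64), nn.ReLU(),
                          nn.Linear(64, dim_in)) for _ in range(n_charts))
        self.chart_logits = nn.Linear(dim_in, n_charts)

    def forward(self, x):
        z = self.encoder(x)
        recons = torch.stack([dec(z) for dec in self.decoders], dim=1)
        w = torch.softmax(self.chart_logits(x), dim=1)   # soft chart choice
        return (w.unsqueeze(-1) * recons).sum(dim=1)     # blended output
```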
Minimizing Change in Classifier Likelihood to Mitigate Catastrophic Forgetting
Title | Minimizing Change in Classifier Likelihood to Mitigate Catastrophic Forgetting |
Authors | Anonymous |
Abstract | Continual learning is a longstanding goal of artificial intelligence, but is often confounded by catastrophic forgetting, which prevents neural networks from learning tasks sequentially. Previous methods in continual learning have demonstrated how to mitigate catastrophic forgetting and learn new tasks while retaining performance on previous tasks. We analyze catastrophic forgetting from the perspective of change in classifier likelihood and propose a simple L1 minimization criterion which can be adapted to different use cases. We further investigate two ways to minimize forgetting as quantified by this criterion and propose strategies to achieve finer control over forgetting. Finally, we evaluate our strategies on 3 datasets of varying difficulty and demonstrate improvements over previously known L2 strategies for mitigating catastrophic forgetting. |
Tasks | Continual Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlLQlrFwS |
PDF | https://openreview.net/pdf?id=BJlLQlrFwS |
PWC | https://paperswithcode.com/paper/minimizing-change-in-classifier-likelihood-to |
Repo | |
Framework | |
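Read literally, the criterion penalizes the L1 change in classifier likelihood; a minimal interpretation is an L1 penalty between the old and new models' class probabilities on some anchor inputs, as sketched below. The choice of anchor data and the exact form of the penalty are assumptions.

```python
# Minimal interpretation of the L1 criterion on change in classifier likelihood.
import torch
import torch.nn.functional as F

def l1_forgetting_loss(model, old_model, x_anchor):
    """Penalize the L1 change in class probabilities on anchor inputs."""
    with torch.no_grad():
        p_old = F.softmax(old_model(x_anchor), dim=1)   # frozen reference
    p_new = F.softmax(model(x_anchor), dim=1)
    return (p_new - p_old).abs().sum(dim=1).mean()

# Training on a new task would then minimize:
#   task_loss + lambda * l1_forgetting_loss(model, old_model, x_anchor)
```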
Consistency Regularization for Generative Adversarial Networks
Title | Consistency Regularization for Generative Adversarial Networks |
Authors | Anonymous |
Abstract | Generative Adversarial Networks are plagued by training instability, despite considerable research effort. Progress has been made on this topic, but many of the proposed interventions are complicated, computationally expensive, or both. In this work, we propose a simple and effective training stabilizer based on the notion of Consistency Regularization - a popular technique in the Semi-Supervised Learning literature. In particular, we augment data passing into the GAN discriminator and penalize the sensitivity of the ultimate layer of the discriminator to these augmentations. This regularization reduces memorization of the training data and demonstrably increases the robustness of the discriminator to input perturbations. We conduct a series of ablation studies to demonstrate that the consistency regularization is compatible with various GAN architectures and loss functions. Moreover, this simple regularization consistently and significantly improves these different GAN variants. Finally, we show that applying consistency regularization to GANs improves the state-of-the-art FID score from 14.73 to 11.67 on the CIFAR-10 dataset. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1lxKlSKPH |
PDF | https://openreview.net/pdf?id=S1lxKlSKPH |
PWC | https://paperswithcode.com/paper/consistency-regularization-for-generative |
Repo | |
Framework | |
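The regularizer described here is straightforward to write down: augment real inputs and penalize the squared difference of the discriminator's outputs on the two views. In the sketch below, augment() is a placeholder for the chosen augmentation, and the squared penalty and weight are standard choices rather than quotes from the paper.

```python
# Minimal sketch of consistency regularization for the GAN discriminator.
import torch

def consistency_loss(discriminator, x_real, augment, weight=10.0):
    """Penalize the discriminator's sensitivity to augmentations of x_real."""
    d_clean = discriminator(x_real)
    d_aug = discriminator(augment(x_real))
    # Squared difference between outputs on the clean and augmented views.
    return weight * ((d_clean - d_aug) ** 2).mean()

# This term is added to the usual discriminator loss during training.
```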
Global-Local Network for Learning Depth with Very Sparse Supervision
Title | Global-Local Network for Learning Depth with Very Sparse Supervision |
Authors | Anonymous |
Abstract | Natural intelligent agents learn to perceive the three dimensional structure of the world without training on large datasets and are unlikely to have the precise equations of projective geometry hard-wired in the brain. Such a skill would also be valuable to artificial systems in order to avoid the expensive collection of labeled datasets, as well as the tedious tuning required by methods based on multi-view geometry. Inspired by natural agents, who interact with the environment via visual and haptic feedback, this paper explores a new approach to learning depth from images and very sparse depth measurements, just a few pixels per image. To learn from such extremely sparse supervision, we introduce an appropriate inductive bias by designing a specialized global-local network architecture. Experiments on several datasets show that the proposed model can learn monocular dense depth estimation when trained with very sparse ground truth, even a single pixel per image. Moreover, we find that the global parameters extracted by the network are predictive of the metric agent motion. |
Tasks | Depth Estimation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hke4_JrYDr |
PDF | https://openreview.net/pdf?id=Hke4_JrYDr |
PWC | https://paperswithcode.com/paper/global-local-network-for-learning-depth-with |
Repo | |
Framework | |
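The abstract does not give the architecture in detail; the sketch below illustrates one plausible global-local split, where a convolutional branch predicts relative depth and a small global branch predicts per-image scale and shift. During training, the loss would be evaluated only at the few supervised pixels. All layer sizes are assumptions.

```python
# Hedged sketch of a global-local depth network with sparse supervision.
import torch
import torch.nn as nn

class GlobalLocalDepth(nn.Module):
    def __init__(self):
        super().__init__()
        # Local branch: dense relative-depth prediction.
        self.local = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, 1, 3, padding=1))
        # Global branch: a few per-image parameters (here scale and shift).
        self.global_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                         nn.Linear(3, 2))

    def forward(self, img):
        rel = self.local(img)                           # (B, 1, H, W)
        scale, shift = self.global_head(img).chunk(2, dim=1)
        return rel * scale.view(-1, 1, 1, 1) + shift.view(-1, 1, 1, 1)

def sparse_depth_loss(pred, gt, mask):
    """Supervise only the annotated pixels; mask is True where gt is valid."""
    return (pred - gt).abs()[mask].mean()
```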
wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval
Title | wMAN: Weakly-Supervised Moment Alignment Network for Text-Based Video Segment Retrieval |
Authors | Anonymous |
Abstract | Given a video and a sentence, the goal of weakly-supervised video moment retrieval is to locate the video segment which is described by the sentence without having access to temporal annotations during training. Instead, a model must learn how to identify the correct segment (i.e. moment) when only being provided with video-sentence pairs. Thus, an inherent challenge is automatically inferring the latent correspondence between visual and language representations. To facilitate this alignment, we propose our Weakly-supervised Moment Alignment Network (wMAN) which exploits a multi-level co-attention mechanism to learn richer multimodal representations. This mechanism is composed of a Frame-By-Word interaction module as well as a novel Word-Conditioned Visual Graph (WCVG). Our approach also incorporates a novel application of positional encodings, commonly used in Transformers, to learn visual-semantic representations that contain contextual information of their relative positions in the temporal sequence through iterative message-passing. Comprehensive experiments on the DiDeMo and Charades-STA datasets demonstrate the effectiveness of our learned representations: our combined wMAN model not only outperforms the state-of-the-art weakly-supervised method by a significant margin but also does better than strongly-supervised state-of-the-art methods on some metrics. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJx4rerFwB |
PDF | https://openreview.net/pdf?id=BJx4rerFwB |
PWC | https://paperswithcode.com/paper/wman-weakly-supervised-moment-alignment |
Repo | |
Framework | |
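A Frame-By-Word interaction can be sketched as scaled dot-product attention from frame features to word features; the dimensions below are illustrative, and the full model's WCVG and positional encodings are not shown.

```python
# Sketch of a Frame-By-Word interaction as cross-modal attention.
import torch

def frame_by_word(frames, words):
    """frames: (T, d) video frame features; words: (N, d) word features.
    Returns word-attended frame representations of shape (T, d)."""
    scores = frames @ words.T / frames.shape[-1] ** 0.5   # (T, N)
    attn = torch.softmax(scores, dim=1)                   # attend over words
    return attn @ words
```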
From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech
Title | From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech |
Authors | Anonymous |
Abstract | This work explores the possibility of generating a human face from voice alone, based solely on audio-visual data without any human-labeled annotations. To this end, we propose a multi-modal learning framework that links the inference stage and generation stage. First, the inference networks are trained to match the speaker identity between the two different modalities. Then the pre-trained inference networks cooperate with the generation network by providing conditional information about the voice. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1guaREYPr |
PDF | https://openreview.net/pdf?id=H1guaREYPr |
PWC | https://paperswithcode.com/paper/from-inference-to-generation-end-to-end-fully |
Repo | |
Framework | |
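At a high level, the two stages compose as follows: a pretrained voice encoder produces a speaker-identity embedding that conditions the face generator. The module names and latent size below are placeholders for illustration only.

```python
# High-level sketch of the two-stage voice-to-face pipeline.
import torch

def generate_face(voice_encoder, generator, waveform, z_dim=128):
    """voice_encoder: pretrained inference network; generator: conditional GAN."""
    with torch.no_grad():
        cond = voice_encoder(waveform)        # speaker identity embedding
    z = torch.randn(cond.shape[0], z_dim)     # random appearance latent
    return generator(z, cond)                 # face conditioned on the voice
```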
Probing Emergent Semantics in Predictive Agents via Question Answering
Title | Probing Emergent Semantics in Predictive Agents via Question Answering |
Authors | Anonymous |
Abstract | Recent work has demonstrated how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand the representations that such agents develop, applying our method to two recent approaches to predictive modeling – action-conditional CPC (Guo et al., 2018) and SimCore (Gregor et al., 2019). After training agents with these predictive objectives in a visually-rich, 3D environment with an assortment of objects, colors, shapes, and spatial configurations, we probe their internal state representations with a host of synthetic (English) questions, without backpropagating gradients from the question-answering decoder into the agent. The performance of different agents when probed in this way reveals that they learn to encode detailed, and seemingly compositional, information about objects, properties and spatial relations from their physical environment. Our approach is intuitive, i.e. humans can easily interpret the responses of the model as opposed to inspecting continuous vectors, and model-agnostic, i.e. applicable to any modeling approach. By revealing the implicit knowledge of objects, quantities, properties and relations acquired by agents as they learn, question-conditional agent probing can stimulate the design and development of stronger predictive learning objectives. |
Tasks | Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bylh2krYPr |
PDF | https://openreview.net/pdf?id=Bylh2krYPr |
PWC | https://paperswithcode.com/paper/probing-emergent-semantics-in-predictive |
Repo | |
Framework | |
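The key implementation detail stated in the abstract, not backpropagating from the question-answering decoder into the agent, amounts to a stop-gradient on the agent's internal state. A minimal sketch, with illustrative module names:

```python
# Sketch of QA probing without affecting the probed agent.
import torch
import torch.nn.functional as F

def qa_probe_loss(agent_state, question_emb, answer_ids, decoder):
    """Train only the decoder; detach() blocks gradients into the agent."""
    logits = decoder(agent_state.detach(), question_emb)
    return F.cross_entropy(logits, answer_ids)
```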
Optimizing Data Usage via Differentiable Rewards
Title | Optimizing Data Usage via Differentiable Rewards |
Authors | Anonymous |
Abstract | To acquire a new skill, humans learn better and faster if a tutor, based on their current knowledge level, informs them of how much attention they should pay to particular content or practice problems. Similarly, a machine learning model could potentially be trained better with a scorer that “adapts” to its current learning state and estimates the importance of each training data instance. Training such an adaptive scorer efficiently is a challenging problem; in order to precisely quantify the effect of a data instance at a given time during the training, it is typically necessary to first complete the entire training process. To efficiently optimize data usage, we propose a reinforcement learning approach called Differentiable Data Selection (DDS). In DDS, we formulate a scorer network as a learnable function of the training data, which can be efficiently updated along with the main model being trained. Specifically, DDS updates the scorer with an intuitive reward signal: it should up-weight data whose gradient is similar to that of a dev set on which we would ultimately like to perform well. Without significant computing overhead, DDS delivers strong and consistent improvements over several strong baselines on two very different tasks of machine translation and image classification. |
Tasks | Image Classification, Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxt2aVFPr |
PDF | https://openreview.net/pdf?id=BJxt2aVFPr |
PWC | https://paperswithcode.com/paper/optimizing-data-usage-via-differentiable |
Repo | |
Framework | |
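The reward signal is concrete enough to sketch: score each training example by the similarity between its gradient and the dev-set gradient. The sketch below assumes flattened per-example gradients and uses cosine similarity as the alignment measure; this is a simplified reading of the abstract, not the full DDS update.

```python
# Sketch of the DDS reward: gradient alignment with a dev set.
import torch

def dds_reward(train_grads, dev_grad):
    """train_grads: (B, P) flattened per-example gradients; dev_grad: (P,).
    Returns one reward per example; higher means better dev-set alignment."""
    return torch.nn.functional.cosine_similarity(
        train_grads, dev_grad.unsqueeze(0).expand_as(train_grads), dim=1)
```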
A Baseline for Few-Shot Image Classification
Title | A Baseline for Few-Shot Image Classification |
Authors | Anonymous |
Abstract | Fine-tuning a deep network trained with the standard cross-entropy loss is a strong baseline for few-shot learning. When fine-tuned transductively, this outperforms the current state-of-the-art on standard datasets such as Mini-Imagenet, Tiered-Imagenet, CIFAR-FS and FC-100 with the same hyper-parameters. The simplicity of this approach enables us to demonstrate the first few-shot learning results on the Imagenet-21k dataset. We find that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes. We do not advocate our approach as the solution for few-shot learning, but simply use the results to highlight limitations of current benchmarks and few-shot protocols. We perform extensive studies on benchmark datasets to propose a metric that quantifies the “hardness” of a few-shot episode. This metric can be used to report the performance of few-shot algorithms in a more systematic way. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylXBkrYDS |
PDF | https://openreview.net/pdf?id=rylXBkrYDS |
PWC | https://paperswithcode.com/paper/a-baseline-for-few-shot-image-classification-1 |
Repo | |
Framework | |
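A common reading of transductive fine-tuning, sketched below under the assumption that the objective combines support-set cross-entropy with an entropy penalty that encourages confident predictions on the unlabeled query set; the weighting is illustrative.

```python
# Sketch of a transductive fine-tuning objective for a few-shot episode.
import torch
import torch.nn.functional as F

def transductive_loss(model, x_support, y_support, x_query, ent_weight=0.1):
    # Standard cross-entropy on the labeled support set.
    ce = F.cross_entropy(model(x_support), y_support)
    # Shannon entropy on unlabeled queries: low entropy = confident predictions.
    p_query = F.softmax(model(x_query), dim=1)
    entropy = -(p_query * p_query.clamp_min(1e-8).log()).sum(dim=1).mean()
    return ce + ent_weight * entropy
```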
Improving Batch Normalization with Skewness Reduction for Deep Neural Networks
Title | Improving Batch Normalization with Skewness Reduction for Deep Neural Networks |
Authors | Pak Lun Kevin Ding, Sarah Martin, Baoxin Li |
Abstract | Batch Normalization (BN) is a well-known technique used in training deep neural networks. The main idea behind batch normalization is to normalize the features of the layers (i.e., transform them to have zero mean and unit variance). Such a procedure makes the optimization landscape of the loss function smoother and improves both the speed and performance of learning. In this paper, we demonstrate that the performance of the network can be improved if the output feature distributions within the same layer are similar. As normalizing based on mean and variance does not necessarily make the features have the same distribution, we propose a new normalization scheme: Batch Normalization with Skewness Reduction (BNSR). Compared with other normalization approaches, BNSR adjusts not only the mean and variance but also the skewness of the data. By tackling this property of a distribution, we are able to make the output distributions of the layers more similar. The nonlinearity of BNSR may further improve the expressiveness of the underlying network. Comparisons with other normalization schemes are tested on the CIFAR-100 and ImageNet datasets. Experimental results show that the proposed approach can outperform state-of-the-art schemes that are not equipped with skewness reduction. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryx0nnEKwH |
PDF | https://openreview.net/pdf?id=ryx0nnEKwH |
PWC | https://paperswithcode.com/paper/improving-batch-normalization-with-skewness |
Repo | |
Framework | |
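The abstract does not give BNSR's exact transform, so the sketch below stands in a signed power function as an assumed skewness-reducing step after standardization; treat it as an illustration of the interface, not the paper's method.

```python
# Hedged sketch of batch normalization followed by a skewness-reducing step.
import torch

def bnsr(x, alpha=0.8, eps=1e-5):
    """x: (N, C) features. Standardize, then reduce skew per channel.
    The signed power transform here is an assumption, not BNSR's formula."""
    x = (x - x.mean(dim=0)) / (x.var(dim=0, unbiased=False) + eps).sqrt()
    x = x.sign() * x.abs().pow(alpha)   # shrink heavy tails toward symmetry
    # Re-standardize so downstream layers see zero mean, unit variance.
    return (x - x.mean(dim=0)) / (x.var(dim=0, unbiased=False) + eps).sqrt()
```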