April 1, 2020

Paper Group NANR 50

A shallow feature extraction network with a large receptive field for stereo matching tasks. DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine. Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping. Limitations for Learning from Point Clouds. Adaptive Structural Fingerprints for Graph Attention …

A shallow feature extraction network with a large receptive field for stereo matching tasks

Title A shallow feature extraction network with a large receptive field for stereo matching tasks
Authors Jianguo Liu, Yunjian Feng, Guo Ji, Fuwu Yan
Abstract Stereo matching is one of the important basic tasks in the computer vision field. In recent years, stereo matching algorithms based on deep learning have achieved excellent performance and become the mainstream research direction. Existing algorithms generally use deep convolutional neural networks (DCNNs) to extract more abstract semantic information, but we believe that the detailed information of the spatial structure is more important for stereo matching tasks. Based on this point of view, this paper proposes a shallow feature extraction network with a large receptive field. The network consists of three parts: a primary feature extraction module, an atrous spatial pyramid pooling (ASPP) module and a feature fusion module. The primary feature extraction network contains only three convolution layers. This network utilizes the basic feature extraction ability of the shallow network to extract and retain the detailed information of the spatial structure. In this paper, dilated convolution and an atrous spatial pyramid pooling (ASPP) module are introduced to increase the size of the receptive field. In addition, a feature fusion module is designed, which integrates the feature maps with multiscale receptive fields and mutually complements the feature information of different scales. We replaced the feature extraction part of the existing stereo matching algorithms with our shallow feature extraction network, and achieved state-of-the-art performance on the KITTI 2015 dataset. Compared with the reference network, the number of parameters is reduced by 42%, and the matching accuracy is improved by 1.9%.
Tasks Stereo Matching
Published 2020-01-01
URL https://openreview.net/forum?id=H1lKNp4Fvr
PDF https://openreview.net/pdf?id=H1lKNp4Fvr
PWC https://paperswithcode.com/paper/a-shallow-feature-extraction-network-with-a
Repo
Framework
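
The abstract above relies on dilated convolution to enlarge the receptive field of a three-layer network. As a rough illustration of that arithmetic only, the sketch below computes the receptive field of a stack of convolutions. The layer configurations (3x3 kernels, dilation rates 1, 2 and 4) and the helper name `receptive_field` are assumptions for the example, not values taken from the paper.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) of a stack of conv layers.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf = 1    # a single unit initially covers one input pixel
    jump = 1  # spacing, in input pixels, between adjacent units of the current layer
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions (hypothetical configuration).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
# The same depth with illustrative dilation rates 1, 2 and 4.
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same number of layers and parameters, the dilated stack in this toy setting covers roughly twice the input extent, which is the kind of effect the abstract attributes to dilation and ASPP.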

DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine

Title DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine
Authors Anonymous
Abstract Click Through Rate (CTR) prediction is a critical task in industrial applications, especially for online social and commerce applications. It is challenging to find a proper way to automatically discover the effective cross features in CTR tasks. We propose a novel model for CTR tasks, called Deep neural networks with Encoder enhanced Factorization Machine (DeepEnFM). Instead of learning the cross features directly, DeepEnFM adopts the Transformer encoder as a backbone to align the feature embeddings with the clues of other fields. The embeddings generated from the encoder are beneficial for the further feature interactions. Particularly, DeepEnFM utilizes a bilinear approach to generate different similarity functions with respect to different field pairs. Furthermore, the max-pooling method enables DeepEnFM to capture both the supplementary and suppressing information among different attention heads. Our model is validated on the Criteo and Avazu datasets, and achieves state-of-the-art performance.
Tasks Click-Through Rate Prediction
Published 2020-01-01
URL https://openreview.net/forum?id=SJlyta4YPS
PDF https://openreview.net/pdf?id=SJlyta4YPS
PWC https://paperswithcode.com/paper/deepenfm-deep-neural-networks-with-encoder
Repo
Framework

Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping

Title Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping
Authors Anonymous
Abstract Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually involve notable noise, especially during the periods of silence and pauses, which may lead to the detection of such attacks. This paper proposes a new approach to generate adversarial audios using Iterative Proportional Clipping (IPC), which exploits temporal dependency in original audios to significantly limit human-perceptible noise. Specifically, in every iteration of optimization, we use a backpropagation model to learn the raw perturbation on the original audio to construct our clipping. We then impose a constraint on the perturbation at the positions with lower sound intensity across the time domain to eliminate the perceptible noise during the silent periods or pauses. IPC preserves the linear proportionality between the original audio and the perturbed one to maintain the temporal dependency. We show that the proposed approach can successfully attack the latest state-of-the-art ASR model Wav2letter+, and only requires a few minutes to generate an audio adversarial example. Experimental results also demonstrate that our approach succeeds in preserving temporal dependency and can bypass temporal dependency based defense mechanisms.
Tasks Speech Recognition
Published 2020-01-01
URL https://openreview.net/forum?id=HJgFW6EKvH
PDF https://openreview.net/pdf?id=HJgFW6EKvH
PWC https://paperswithcode.com/paper/generating-robust-audio-adversarial-examples
Repo
Framework
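
The abstract above describes constraining the perturbation where the original audio is quiet and keeping it proportional to the original signal. A minimal sketch of one way to do that follows. The exact clipping rule is not given in the abstract, so the bound `ratio * |x_t|`, the function name `proportional_clip`, and the toy values are all assumptions. The gradient step that would produce the raw perturbation in each iteration is omitted.

```python
def proportional_clip(audio, perturbation, ratio):
    """Clip each perturbation sample to +/- ratio * |audio sample| (assumed rule)."""
    clipped = []
    for x, delta in zip(audio, perturbation):
        bound = ratio * abs(x)  # silent samples get a bound of zero
        clipped.append(max(-bound, min(bound, delta)))
    return clipped


audio = [0.0, 0.5, -0.8, 0.01]        # toy waveform with a silent first sample
raw_delta = [0.3, 0.02, -0.2, -0.05]  # hypothetical unconstrained perturbation
delta = proportional_clip(audio, raw_delta, ratio=0.1)
adversarial = [x + d for x, d in zip(audio, delta)]
print(delta)
```

In an iterative attack, a clip of this kind would be applied after every update, so that silent or near-silent positions never accumulate audible noise.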

Limitations for Learning from Point Clouds

Title Limitations for Learning from Point Clouds
Authors Anonymous
Abstract In this paper we prove new universal approximation theorems for deep learning on point clouds that do not assume fixed cardinality. We do this by first generalizing the classical universal approximation theorem to general compact Hausdorff spaces and then applying this to the permutation-invariant architectures presented in ‘PointNet’ (Qi et al.) and ‘Deep Sets’ (Zaheer et al.). Moreover, though both architectures operate on the same domain, we show that the constant functions are the only functions they can mutually uniformly approximate. In particular, DeepSets architectures can uniformly approximate the center-of-mass function but not the diameter function, while the reverse holds for PointNet.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=r1x63grFvH
PDF https://openreview.net/pdf?id=r1x63grFvH
PWC https://paperswithcode.com/paper/limitations-for-learning-from-point-clouds
Repo
Framework
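
To make the center-of-mass and diameter contrast in the abstract above concrete, here is a toy sketch for one-dimensional point clouds. It assumes the usual forms of the two architectures (sum pooling for DeepSets, max pooling for PointNet) and uses hand-written feature maps in place of learned networks. It only shows that each pooling operator fits one of the two functions exactly; the impossibility results are the paper's theorems and are not demonstrated here.

```python
def deepsets_center_of_mass(points):
    """Sum-pool phi(x) = (x, 1), then apply rho(s, n) = s / n."""
    total, count = 0.0, 0
    for x in points:
        total += x
        count += 1
    return total / count


def pointnet_diameter(points):
    """Max-pool phi(x) = (x, -x), then apply rho(a, b) = a + b."""
    max_pos = max(x for x in points)
    max_neg = max(-x for x in points)
    return max_pos + max_neg


small = [1.0, 4.0, 2.0]
large = [0.0, 10.0, 5.0, 5.0]  # a different cardinality works unchanged
print(deepsets_center_of_mass(large), pointnet_diameter(large))
```

Both functions accept clouds of any size, which mirrors the abstract's setting of non-fixed cardinality.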

Adaptive Structural Fingerprints for Graph Attention Networks

Title Adaptive Structural Fingerprints for Graph Attention Networks
Authors Anonymous
Abstract Many real-world data sets are represented as graphs, such as citation links, social media, and biological interactions. The volatile graph structure makes it non-trivial to employ convolutional neural networks (CNNs) for graph data processing. Recently, the graph attention network (GAT) has proven a promising attempt by combining graph neural networks with an attention mechanism, so as to achieve message passing in graphs with arbitrary structures. However, the attention in GAT is computed mainly based on the similarity between the node content, while the structures of the graph remain largely unemployed (except in masking the attention out of one-hop neighbors). In this paper, we propose an “ADaptive Structural Fingerprint” (ADSF) model to fully exploit both topological details of the graph and content features of the nodes. The key idea is to contextualize each node with a weighted, learnable receptive field encoding rich and diverse local graph structures. By doing this, structural interactions between the nodes can be inferred accurately, thus improving the subsequent attention layer as well as the convergence of learning. Furthermore, our model provides a useful platform for different subspaces of node features and various scales of graph structures to “cross-talk” with each other through the learning of multi-head attention, being particularly useful in handling complex real-world data. Encouraging performance is observed on a number of benchmark data sets in node classification.
Tasks Node Classification
Published 2020-01-01
URL https://openreview.net/forum?id=BJxWx0NYPr
PDF https://openreview.net/pdf?id=BJxWx0NYPr
PWC https://paperswithcode.com/paper/adaptive-structural-fingerprints-for-graph
Repo
Framework

What Can Learned Intrinsic Rewards Capture?

Title What Can Learned Intrinsic Rewards Capture?
Authors Anonymous
Abstract Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be the loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. Regardless of its composition, the objective of an agent is to behave so as to maximise the sum of suitable scalar functions of state: the rewards. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with a common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt a multi-lifetime setting of the Optimal Rewards Framework, and investigate how meta-learning can be used to find good reward functions in a data-driven way. To this end, we propose to meta-learn an intrinsic reward function that allows agents to maximise their extrinsic rewards accumulated until the end of their lifetimes. This long-term lifetime objective allows our learned intrinsic reward to generate systematic multi-episode exploratory behaviour. Through proof-of-concept experiments, we elucidate interesting forms of knowledge that may be captured by a suitably trained intrinsic reward such as the usefulness of exploring uncertain states and rewards.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=SkgbmyHFDS
PDF https://openreview.net/pdf?id=SkgbmyHFDS
PWC https://paperswithcode.com/paper/what-can-learned-intrinsic-rewards-capture
Repo
Framework

Convolutional Conditional Neural Processes

Title Convolutional Conditional Neural Processes
Authors Anonymous
Abstract We introduce the Convolutional Conditional Neural Process (ConvCNP), a new member of the Neural Process family that models translation equivariance in the data. Translation equivariance is an important inductive bias for many learning problems including time series modelling, spatial data, and images. The model embeds data sets into an infinite-dimensional function space, as opposed to finite-dimensional vector spaces. To formalize this notion, we extend the theory of neural representations of sets to include functional representations, and demonstrate that any translation-equivariant embedding can be represented using a convolutional deep-set. We evaluate ConvCNPs in several settings, demonstrating that they achieve state-of-the-art performance compared to existing NPs. We demonstrate that building in translation equivariance enables zero-shot generalization to challenging, out-of-domain tasks.
Tasks Time Series
Published 2020-01-01
URL https://openreview.net/forum?id=Skey4eBYPS
PDF https://openreview.net/pdf?id=Skey4eBYPS
PWC https://paperswithcode.com/paper/convolutional-conditional-neural-processes
Repo
Framework

Understanding Knowledge Distillation in Non-autoregressive Machine Translation

Title Understanding Knowledge Distillation in Non-autoregressive Machine Translation
Authors Anonymous
Abstract Non-autoregressive machine translation (NAT) systems predict a sequence of output tokens in parallel, achieving substantial improvements in generation speed compared to autoregressive models. Existing NAT models usually rely on the technique of knowledge distillation, which creates the training data from a pretrained autoregressive model for better performance. Knowledge distillation is empirically useful, leading to large gains in accuracy for NAT models, but the reason for this success has, as of yet, been unclear. In this paper, we first design systematic experiments to investigate why knowledge distillation is crucial to NAT training. We find that knowledge distillation can reduce the complexity of data sets and help NAT to model the variations in the output data. Furthermore, a strong correlation is observed between the capacity of an NAT model and the optimal complexity of the distilled data for the best translation quality. Based on these findings, we further propose several approaches that can alter the complexity of data sets to improve the performance of NAT models. We achieve the state-of-the-art performance for the NAT-based models, and close the gap with the autoregressive baseline on WMT14 En-De benchmark.
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=BygFVAEKDH
PDF https://openreview.net/pdf?id=BygFVAEKDH
PWC https://paperswithcode.com/paper/understanding-knowledge-distillation-in-non-1
Repo
Framework

Accelerating Reinforcement Learning Through GPU Atari Emulation

Title Accelerating Reinforcement Learning Through GPU Atari Emulation
Authors Anonymous
Abstract We introduce CuLE (CUDA Learning Environment), a CUDA port of the Atari Learning Environment (ALE) which is used for the development of deep reinforcement learning algorithms. CuLE overcomes many limitations of existing CPU-based emulators and scales naturally to multiple GPUs. It leverages GPU parallelization to run thousands of games simultaneously and it renders frames directly on the GPU, to avoid the bottleneck arising from the limited CPU-GPU communication bandwidth. CuLE generates up to 155M frames per hour on a single GPU, a finding previously achieved only through a cluster of CPUs. Beyond highlighting the differences between CPU and GPU emulators in the context of reinforcement learning, we show how to leverage the high throughput of CuLE by effective batching of the training data, and show accelerated convergence for A2C+V-trace. CuLE is available at [hidden URL].
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HJgS7p4FPH
PDF https://openreview.net/pdf?id=HJgS7p4FPH
PWC https://paperswithcode.com/paper/accelerating-reinforcement-learning-through
Repo
Framework

Towards Finding Longer Proofs

Title Towards Finding Longer Proofs
Authors Anonymous
Abstract We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems.
Tasks Automated Theorem Proving
Published 2020-01-01
URL https://openreview.net/forum?id=Hkeh21BKPH
PDF https://openreview.net/pdf?id=Hkeh21BKPH
PWC https://paperswithcode.com/paper/towards-finding-longer-proofs-1
Repo
Framework

UNITER: Learning UNiversal Image-TExt Representations

Title UNITER: Learning UNiversal Image-TExt Representations
Authors Anonymous
Abstract Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding. In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. We design three pre-training tasks: Masked Language Modeling (MLM), Image-Text Matching (ITM), and Masked Region Modeling (MRM, with three variants). Different from concurrent work on multimodal pre-training that applies joint random masking to both modalities, we use Conditioned Masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text). Comprehensive analysis shows that conditioned masking yields better performance than unconditioned masking. We also conduct a thorough ablation study to find an optimal combination of pre-training tasks for UNITER. Extensive experiments show that UNITER achieves new state of the art across six V+L tasks over nine datasets, including Visual Question Answering, Image-Text Retrieval, Referring Expression Comprehension, Visual Commonsense Reasoning, Visual Entailment, and NLVR2.
Tasks Language Modelling, Question Answering, Text Matching, Visual Commonsense Reasoning, Visual Question Answering
Published 2020-01-01
URL https://openreview.net/forum?id=S1eL4kBYwr
PDF https://openreview.net/pdf?id=S1eL4kBYwr
PWC https://paperswithcode.com/paper/uniter-learning-universal-image-text
Repo
Framework

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

Title Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
Authors Anonymous
Abstract In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains. However, they lack the theoretical guarantees which are present in the tabular setting and suffer from many stability and reproducibility problems (Henderson et al., 2018). In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in off-policy actor-critic deep reinforcement learning regimes. Experiments on continuous action spaces, in the MuJoCo control suite, show that our proposed method reduces the variance of the process and improves the overall performance.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SJgn464tPB
PDF https://openreview.net/pdf?id=SJgn464tPB
PWC https://paperswithcode.com/paper/stabilizing-off-policy-reinforcement-learning-1
Repo
Framework

Deep Audio Priors Emerge From Harmonic Convolutional Networks

Title Deep Audio Priors Emerge From Harmonic Convolutional Networks
Authors Anonymous
Abstract Convolutional neural networks (CNNs) excel in image recognition and generation. Among many efforts to explain their effectiveness, experiments show that CNNs carry strong inductive biases that capture natural image priors. Do deep networks also have inductive biases for audio signals? In this paper, we empirically show that current network architectures for audio processing do not show strong evidence of capturing such priors. We propose Harmonic Convolution, an operation that helps deep networks distill priors in audio signals by explicitly utilizing the harmonic structure within. This is done by engineering the kernel to be supported by sets of harmonic series, instead of local neighborhoods for convolutional kernels. We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks. With Harmonic Convolution, they also achieve better generalization performance for sound source separation.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rygjHxrYDB
PDF https://openreview.net/pdf?id=rygjHxrYDB
PWC https://paperswithcode.com/paper/deep-audio-priors-emerge-from-harmonic
Repo
Framework
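
The abstract above describes a kernel supported on a harmonic series rather than on a local neighborhood. The sketch below is a simplified, one-dimensional reading of that idea along the frequency axis only: each output bin `f` combines the input at bins `f, 2f, 3f, ...`. The function name `harmonic_conv_1d`, the fixed weights, and the toy spectrum are assumptions for illustration; the paper's operator acts on time-frequency representations and may differ in detail.

```python
def harmonic_conv_1d(spectrum, weights):
    """y[f] = sum_k weights[k-1] * spectrum[k*f], skipping out-of-range bins."""
    n = len(spectrum)
    out = []
    for f in range(n):
        acc = 0.0
        for k, w in enumerate(weights, start=1):
            idx = k * f  # k-th harmonic of bin f
            if idx < n:
                acc += w * spectrum[idx]
        out.append(acc)
    return out


# Toy magnitude spectrum: a fundamental at bin 2 with harmonics at bins 4 and 6.
spectrum = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
response = harmonic_conv_1d(spectrum, weights=[1.0, 1.0, 1.0])
print(response)
```

In this toy case the response is largest at bin 2, the fundamental, because only there do all three kernel taps land on energy. A local 3-tap kernel would not see the harmonics at all.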

On Concept-Based Explanations in Deep Neural Networks

Title On Concept-Based Explanations in Deep Neural Networks
Authors Anonymous
Abstract Deep neural networks (DNNs) build high-level intelligence on low-level raw features. Understanding of this high-level intelligence can be enabled by deciphering the concepts they base their decisions on, as human-level thinking. In this paper, we study concept-based explainability for DNNs in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model’s prediction behavior. Based on performance and variability motivations, we propose two definitions to quantify completeness. We show that under degenerate conditions, our method is equivalent to Principal Component Analysis. Next, we propose a concept discovery method that considers two additional constraints to encourage the interpretability of the discovered concepts. We use game-theoretic notions to aggregate over sets to define an importance score for each discovered concept, which we call ConceptSHAP. On specifically-designed synthetic datasets and real-world text and image datasets, we validate the effectiveness of our framework in finding concepts that are complete in explaining the decision, and interpretable.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=BylWYC4KwH
PDF https://openreview.net/pdf?id=BylWYC4KwH
PWC https://paperswithcode.com/paper/on-concept-based-explanations-in-deep-neural-1
Repo
Framework

Sentence embedding with contrastive multi-views learning

Title Sentence embedding with contrastive multi-views learning
Authors Anonymous
Abstract In this work, we propose a self-supervised method to learn sentence representations with an injection of linguistic knowledge. Multiple linguistic frameworks propose diverse sentence structures from which semantic meaning might be expressed through compositional word operations. We aim to take advantage of this linguistic diversity and learn to represent sentences by contrasting these diverse views. Formally, multiple views of the same sentence are mapped to close representations. On the contrary, views from other sentences are mapped further apart. By contrasting different linguistic views, we aim at building embeddings which better capture semantics and which are less sensitive to the sentence’s outward form.
Tasks Sentence Embedding
Published 2020-01-01
URL https://openreview.net/forum?id=rJxGGlSKwH
PDF https://openreview.net/pdf?id=rJxGGlSKwH
PWC https://paperswithcode.com/paper/sentence-embedding-with-contrastive-multi
Repo
Framework
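
The abstract above states that views of the same sentence are mapped close together and views of other sentences further apart, without naming a loss. As one possible reading, the sketch below uses an InfoNCE-style contrastive objective, a common choice that is swapped in here as an assumption. The function names, the temperature value, and the two-dimensional toy embeddings are hypothetical.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: negative log-softmax score of the positive view."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


anchor = [1.0, 0.0]            # embedding of one view of a sentence
same_sentence = [0.9, 0.1]     # embedding of another view of that sentence
other_sentences = [[0.0, 1.0], [-1.0, 0.0]]

aligned = contrastive_loss(anchor, same_sentence, other_sentences)
misaligned = contrastive_loss(anchor, other_sentences[1], [same_sentence, other_sentences[0]])
print(aligned < misaligned)  # True
```

The loss is low when the anchor scores its own sentence's other view above the views of other sentences, and high when the pairing is wrong, which matches the behaviour the abstract asks for.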