Paper Group NANR 50
A shallow feature extraction network with a large receptive field for stereo matching tasks. DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine. Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping. Limitations for Learning from Point Clouds. Adaptive Structural Fingerprints for Graph Attention …
A shallow feature extraction network with a large receptive field for stereo matching tasks
Title | A shallow feature extraction network with a large receptive field for stereo matching tasks |
Authors | Jianguo Liu, Yunjian Feng, Guo Ji, Fuwu Yan |
Abstract | Stereo matching is one of the important basic tasks in the computer vision field. In recent years, stereo matching algorithms based on deep learning have achieved excellent performance and become the mainstream research direction. Existing algorithms generally use deep convolutional neural networks (DCNNs) to extract more abstract semantic information, but we believe that the detailed information of the spatial structure is more important for stereo matching tasks. Based on this point of view, this paper proposes a shallow feature extraction network with a large receptive field. The network consists of three parts: a primary feature extraction module, an atrous spatial pyramid pooling (ASPP) module and a feature fusion module. The primary feature extraction network contains only three convolution layers and exploits the basic feature extraction ability of a shallow network to extract and retain the detailed information of the spatial structure. Dilated convolutions and the ASPP module are introduced to increase the size of the receptive field. In addition, a feature fusion module is designed, which integrates feature maps with multi-scale receptive fields so that feature information at different scales complements one another. We replaced the feature extraction part of existing stereo matching algorithms with our shallow feature extraction network and achieved state-of-the-art performance on the KITTI 2015 dataset. Compared with the reference network, the number of parameters is reduced by 42% and the matching accuracy is improved by 1.9%. |
Tasks | Stereo Matching |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1lKNp4Fvr |
PDF | https://openreview.net/pdf?id=H1lKNp4Fvr |
PWC | https://paperswithcode.com/paper/a-shallow-feature-extraction-network-with-a |
Repo | |
Framework | |
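As a concrete illustration, here is a minimal PyTorch sketch of the three-part design the abstract describes: a three-layer primary extractor, an ASPP block of parallel dilated convolutions, and a fusion module. Channel counts, dilation rates, and activation choices are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ShallowASPPFeatures(nn.Module):
    """Shallow extractor + ASPP + fusion (hedged sketch, not the paper's code)."""
    def __init__(self, in_ch=3, feat_ch=32, dilations=(1, 6, 12, 18)):
        super().__init__()
        # Primary module: only three conv layers, preserving spatial detail.
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # ASPP: parallel dilated convs enlarge the receptive field without
        # reducing spatial resolution.
        self.aspp = nn.ModuleList([
            nn.Conv2d(feat_ch, feat_ch, 3, padding=d, dilation=d)
            for d in dilations
        ])
        # Fusion: merge the multi-scale branches back into one feature map.
        self.fuse = nn.Conv2d(feat_ch * len(dilations), feat_ch, 1)

    def forward(self, x):
        f = self.primary(x)
        branches = [torch.relu(branch(f)) for branch in self.aspp]
        return self.fuse(torch.cat(branches, dim=1))

feats = ShallowASPPFeatures()(torch.randn(1, 3, 64, 128))  # one stereo view
```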
DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine
Title | DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine |
Authors | Anonymous |
Abstract | Click-Through Rate (CTR) prediction is a critical task in industrial applications, especially for online social and commerce applications. It is challenging to find a proper way to automatically discover effective cross features in CTR tasks. We propose a novel model for CTR tasks, called Deep neural networks with Encoder enhanced Factorization Machine (DeepEnFM). Instead of learning the cross features directly, DeepEnFM adopts the Transformer encoder as a backbone to align the feature embeddings with the clues of other fields. The embeddings generated by the encoder are beneficial for further feature interactions. In particular, DeepEnFM utilizes a bilinear approach to generate different similarity functions with respect to different field pairs. Furthermore, max-pooling enables DeepEnFM to capture both the supplementary and suppressing information among different attention heads. Our model is validated on the Criteo and Avazu datasets and achieves state-of-the-art performance. |
Tasks | Click-Through Rate Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlyta4YPS |
PDF | https://openreview.net/pdf?id=SJlyta4YPS |
PWC | https://paperswithcode.com/paper/deepenfm-deep-neural-networks-with-encoder |
Repo | |
Framework | |
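Below is a hedged PyTorch sketch of the two ingredients the abstract highlights: bilinear attention scores (field pairs interact through a learned matrix rather than a plain dot product) and max-pooling across attention heads. The dimensions, the single bilinear form per head, and the output projection are my simplifications, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BilinearHeadMaxAttention(nn.Module):
    def __init__(self, dim=16, heads=4):
        super().__init__()
        # One bilinear form per head: score_h(i, j) = e_i^T W_h e_j.
        self.W = nn.Parameter(torch.randn(heads, dim, dim) * dim ** -0.5)
        self.proj = nn.Linear(dim, dim)

    def forward(self, e):                       # e: (batch, fields, dim)
        scores = torch.einsum('bfd,hdk,bgk->bhfg', e, self.W, e)
        attn = scores.softmax(dim=-1)           # (batch, heads, fields, fields)
        mixed = torch.einsum('bhfg,bgd->bhfd', attn, e)
        # Max-pooling over heads keeps both supplementary and suppressing
        # signals instead of averaging them away.
        pooled, _ = mixed.max(dim=1)            # (batch, fields, dim)
        return self.proj(pooled)

out = BilinearHeadMaxAttention()(torch.randn(8, 10, 16))  # 10 feature fields
```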
Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping
Title | Generating Robust Audio Adversarial Examples using Iterative Proportional Clipping |
Authors | Anonymous |
Abstract | Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually involve notable noise, especially during periods of silence and pauses, which may lead to the detection of such attacks. This paper proposes a new approach to generate adversarial audio using Iterative Proportional Clipping (IPC), which exploits the temporal dependency in the original audio to significantly limit human-perceptible noise. Specifically, in every iteration of optimization, we use a backpropagation model to learn the raw perturbation on the original audio and then construct our clipping. We then impose a constraint on the perturbation at the positions with lower sound intensity across the time domain to eliminate perceptible noise during silent periods or pauses. IPC preserves the linear proportionality between the original audio and the perturbed one to maintain the temporal dependency. We show that the proposed approach can successfully attack the latest state-of-the-art ASR model, Wav2letter+, and requires only a few minutes to generate an audio adversarial example. Experimental results also demonstrate that our approach succeeds in preserving temporal dependency and can bypass temporal-dependency-based defense mechanisms. |
Tasks | Speech Recognition |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgFW6EKvH |
PDF | https://openreview.net/pdf?id=HJgFW6EKvH |
PWC | https://paperswithcode.com/paper/generating-robust-audio-adversarial-examples |
Repo | |
Framework | |
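The core clipping step lends itself to a short sketch. Below, the allowed perturbation at each sample is bounded in proportion to the original waveform's local intensity, so silent stretches receive almost no noise; the window length and proportionality constant are illustrative assumptions, and the gradient step that produces `perturbed` is omitted.

```python
import numpy as np

def proportional_clip(original, perturbed, epsilon=0.05, window=400):
    """Clip the perturbation proportionally to the original's local intensity."""
    delta = perturbed - original
    # Local intensity envelope: moving average of |x| over `window` samples.
    envelope = np.convolve(np.abs(original), np.ones(window) / window, mode='same')
    bound = epsilon * envelope            # quiet regions -> tiny allowed noise
    return original + np.clip(delta, -bound, bound)

# Each optimization iteration would update the perturbation by gradient
# descent on the ASR loss and then pass the result through proportional_clip.
audio = 0.1 * np.random.randn(16000)      # stand-in for a 1 s, 16 kHz clip
adv = proportional_clip(audio, audio + 0.01 * np.random.randn(16000))
```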
Limitations for Learning from Point Clouds
Title | Limitations for Learning from Point Clouds |
Authors | Anonymous |
Abstract | In this paper we prove new universal approximation theorems for deep learning on point clouds that do not assume fixed cardinality. We do this by first generalizing the classical universal approximation theorem to general compact Hausdorff spaces and then applying this to the permutation-invariant architectures presented in ‘PointNet’ (Qi et al.) and ‘Deep Sets’ (Zaheer et al.). Moreover, though both architectures operate on the same domain, we show that the constant functions are the only functions they can mutually uniformly approximate. In particular, DeepSets architectures cannot uniformly approximate the diameter function but can uniformly approximate the center-of-mass function, while the opposite holds for PointNet. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1x63grFvH |
PDF | https://openreview.net/pdf?id=r1x63grFvH |
PWC | https://paperswithcode.com/paper/limitations-for-learning-from-point-clouds |
Repo | |
Framework | |
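For reference, the two pooling schemes compared by the theorem can be written as follows (notation mine). The abstract's dichotomy then says the sum-pooled family uniformly approximates the center of mass but not the diameter, and vice versa for the max-pooled family.

```latex
% DeepSets pools per-point features by summation; PointNet pools by max.
\[
  f_{\mathrm{DeepSets}}(X) = \rho\Big(\sum_{x \in X} \phi(x)\Big),
  \qquad
  f_{\mathrm{PointNet}}(X) = \rho\Big(\max_{x \in X} \phi(x)\Big)
\]
% The two target functions of the dichotomy:
\[
  \operatorname{diam}(X) = \max_{x,\,y \in X} \lVert x - y \rVert,
  \qquad
  \operatorname{com}(X) = \frac{1}{\lvert X \rvert} \sum_{x \in X} x
\]
```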
Adaptive Structural Fingerprints for Graph Attention Networks
Title | Adaptive Structural Fingerprints for Graph Attention Networks |
Authors | Anonymous |
Abstract | Many real-world data sets are represented as graphs, such as citation links, social media, and biological interactions. The volatile graph structure makes it non-trivial to employ convolutional neural networks (CNNs) for graph data processing. Recently, the graph attention network (GAT) has proven a promising attempt by combining graph neural networks with an attention mechanism, so as to achieve message passing in graphs with arbitrary structures. However, the attention in GAT is computed mainly based on the similarity between node content, while the structure of the graph remains largely unexploited (except in masking out attention beyond one-hop neighbors). In this paper, we propose an "ADaptive Structural Fingerprint" (ADSF) model to fully exploit both the topological details of the graph and the content features of the nodes. The key idea is to contextualize each node with a weighted, learnable receptive field encoding rich and diverse local graph structures. By doing this, structural interactions between the nodes can be inferred accurately, thus improving subsequent attention layers as well as the convergence of learning. Furthermore, our model provides a useful platform for different subspaces of node features and various scales of graph structures to "cross-talk" with each other through the learning of multi-head attention, being particularly useful in handling complex real-world data. Encouraging performance is observed on a number of benchmark data sets in node classification. |
Tasks | Node Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxWx0NYPr |
PDF | https://openreview.net/pdf?id=BJxWx0NYPr |
PWC | https://paperswithcode.com/paper/adaptive-structural-fingerprints-for-graph |
Repo | |
Framework | |
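A hedged NumPy sketch of the fingerprint idea: each node gets a soft, weighted local receptive field, and the overlap between two nodes' fingerprints supplies a structural score that can be mixed into GAT's content-based attention logits. Random-walk-with-restart fingerprints and a weighted Jaccard overlap are my illustrative choices, not necessarily the paper's.

```python
import numpy as np

def fingerprints(adj, restart=0.5, steps=10):
    """Soft local receptive field per node via random walk with restart."""
    n = len(adj)
    P = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1)   # row-stochastic
    F = np.eye(n)
    for _ in range(steps):
        F = restart * np.eye(n) + (1 - restart) * F @ P
    return F                                   # F[i] = fingerprint of node i

def structural_scores(F):
    """Weighted Jaccard overlap between every pair of fingerprints."""
    mins = np.minimum(F[:, None, :], F[None, :, :]).sum(-1)
    maxs = np.maximum(F[:, None, :], F[None, :, :]).sum(-1)
    return mins / np.maximum(maxs, 1e-9)

adj = (np.random.rand(6, 6) > 0.5).astype(float)
np.fill_diagonal(adj, 0)
S = structural_scores(fingerprints(adj))   # add to content attention logits
```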
What Can Learned Intrinsic Rewards Capture?
Title | What Can Learned Intrinsic Rewards Capture? |
Authors | Anonymous |
Abstract | Reinforcement learning agents can include different components, such as policies, value functions, state representations, and environment models. Any or all of these can be loci of knowledge, i.e., structures where knowledge, whether given or learned, can be deposited and reused. Regardless of its composition, the objective of an agent is to behave so as to maximise the sum of suitable scalar functions of state: the rewards. As far as the learning algorithm is concerned, these rewards are typically given and immutable. In this paper we instead consider the proposition that the reward function itself may be a good locus of knowledge. This is consistent with a common use, in the literature, of hand-designed intrinsic rewards to improve the learning dynamics of an agent. We adopt a multi-lifetime setting of the Optimal Rewards Framework and investigate how meta-learning can be used to find good reward functions in a data-driven way. To this end, we propose to meta-learn an intrinsic reward function that allows agents to maximise their extrinsic rewards accumulated until the end of their lifetimes. This long-term lifetime objective allows our learned intrinsic reward to generate systematic multi-episode exploratory behaviour. Through proof-of-concept experiments, we elucidate interesting forms of knowledge that may be captured by a suitably trained intrinsic reward, such as the usefulness of exploring uncertain states and rewards. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkgbmyHFDS |
PDF | https://openreview.net/pdf?id=SkgbmyHFDS |
PWC | https://paperswithcode.com/paper/what-can-learned-intrinsic-rewards-capture |
Repo | |
Framework | |
Convolutional Conditional Neural Processes
Title | Convolutional Conditional Neural Processes |
Authors | Anonymous |
Abstract | We introduce the Convolutional Conditional Neural Process (ConvCNP), a new member of the Neural Process family that models translation equivariance in the data. Translation equivariance is an important inductive bias for many learning problems including time series modelling, spatial data, and images. The model embeds data sets into an infinite-dimensional function space, as opposed to finite-dimensional vector spaces. To formalize this notion, we extend the theory of neural representations of sets to include functional representations, and demonstrate that any translation-equivariant embedding can be represented using a convolutional deep-set. We evaluate ConvCNPs in several settings, demonstrating that they achieve state-of-the-art performance compared to existing NPs. We demonstrate that building in translation equivariance enables zero-shot generalization to challenging, out-of-domain tasks. |
Tasks | Time Series |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skey4eBYPS |
PDF | https://openreview.net/pdf?id=Skey4eBYPS |
PWC | https://paperswithcode.com/paper/convolutional-conditional-neural-processes |
Repo | |
Framework | |
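The functional embedding at the heart of the model admits a compact sketch: context points are smoothed onto a uniform grid with an RBF kernel, together with a density channel recording how much context lies near each grid point; a standard CNN can then operate on the result. The kernel width, grid, and normalization below are illustrative assumptions.

```python
import numpy as np

def set_conv(xc, yc, grid, lengthscale=0.1):
    """Embed a context set {(xc, yc)} as a (density, value) function on a grid."""
    # RBF weights between each grid location and each context input.
    w = np.exp(-0.5 * ((grid[:, None] - xc[None, :]) / lengthscale) ** 2)
    density = w.sum(axis=1)                       # "how much context is here"
    value = (w @ yc) / np.maximum(density, 1e-9)  # kernel-smoothed values
    return np.stack([density, value], axis=-1)    # (grid_size, 2)

grid = np.linspace(-2, 2, 128)
emb = set_conv(np.array([-1.0, 0.3, 0.9]), np.array([0.2, -0.5, 1.0]), grid)
```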
Understanding Knowledge Distillation in Non-autoregressive Machine Translation
Title | Understanding Knowledge Distillation in Non-autoregressive Machine Translation |
Authors | Anonymous |
Abstract | Non-autoregressive machine translation (NAT) systems predict a sequence of output tokens in parallel, achieving substantial improvements in generation speed compared to autoregressive models. Existing NAT models usually rely on the technique of knowledge distillation, which creates the training data from a pretrained autoregressive model for better performance. Knowledge distillation is empirically useful, leading to large gains in accuracy for NAT models, but the reason for this success has, as of yet, been unclear. In this paper, we first design systematic experiments to investigate why knowledge distillation is crucial to NAT training. We find that knowledge distillation can reduce the complexity of data sets and help NAT to model the variations in the output data. Furthermore, a strong correlation is observed between the capacity of an NAT model and the optimal complexity of the distilled data for the best translation quality. Based on these findings, we further propose several approaches that can alter the complexity of data sets to improve the performance of NAT models. We achieve state-of-the-art performance for NAT-based models and close the gap with the autoregressive baseline on the WMT14 En-De benchmark. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygFVAEKDH |
PDF | https://openreview.net/pdf?id=BygFVAEKDH |
PWC | https://paperswithcode.com/paper/understanding-knowledge-distillation-in-non-1 |
Repo | |
Framework | |
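The data-creation step the abstract refers to is simple enough to sketch: the NAT model is trained not on the original references but on the decoded outputs of a pretrained autoregressive teacher, which reduces the multimodality of the targets. `teacher.translate` is a hypothetical stand-in for any autoregressive model's beam-search decode, not a real API.

```python
def build_distilled_corpus(teacher, parallel_corpus, beam=5):
    """Sequence-level knowledge distillation for NAT training (sketch)."""
    distilled = []
    for src, _ref in parallel_corpus:      # the original reference is discarded
        hyp = teacher.translate(src, beam=beam)   # hypothetical decode call
        distilled.append((src, hyp))       # the NAT model trains on (src, hyp)
    return distilled
```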
Accelerating Reinforcement Learning Through GPU Atari Emulation
Title | Accelerating Reinforcement Learning Through GPU Atari Emulation |
Authors | Anonymous |
Abstract | We introduce CuLE (CUDA Learning Environment), a CUDA port of the Atari Learning Environment (ALE) that is used for the development of deep reinforcement learning algorithms. CuLE overcomes many limitations of existing CPU-based emulators and scales naturally to multiple GPUs. It leverages GPU parallelization to run thousands of games simultaneously, and it renders frames directly on the GPU to avoid the bottleneck arising from limited CPU-GPU communication bandwidth. CuLE generates up to 155M frames per hour on a single GPU, a throughput previously achieved only with a cluster of CPUs. Beyond highlighting the differences between CPU and GPU emulators in the context of reinforcement learning, we show how to leverage the high throughput of CuLE by effective batching of the training data, and show accelerated convergence for A2C+V-trace. CuLE is available at [hidden URL]. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgS7p4FPH |
PDF | https://openreview.net/pdf?id=HJgS7p4FPH |
PWC | https://paperswithcode.com/paper/accelerating-reinforcement-learning-through |
Repo | |
Framework | |
Towards Finding Longer Proofs
Title | Towards Finding Longer Proofs |
Authors | Anonymous |
Abstract | We present a reinforcement learning (RL) based guidance system for automated theorem proving geared towards Finding Longer Proofs (FLoP). FLoP focuses on generalizing from short proofs to longer ones of similar structure. To achieve that, FLoP uses state-of-the-art RL approaches that were previously not applied in theorem proving. In particular, we show that curriculum learning significantly outperforms previous learning-based proof guidance on a synthetic dataset of increasingly difficult arithmetic problems. |
Tasks | Automated Theorem Proving |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkeh21BKPH |
PDF | https://openreview.net/pdf?id=Hkeh21BKPH |
PWC | https://paperswithcode.com/paper/towards-finding-longer-proofs-1 |
Repo | |
Framework | |
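As a rough illustration of the curriculum component, the skeleton below trains on problems ordered by difficulty and advances only once the prover clears the current level reliably. `agent.attempt` (which runs and learns from one proof attempt, returning success or failure) is a hypothetical stand-in, not FLoP's actual interface.

```python
def curriculum_train(agent, levels, threshold=0.9):
    """Generic curriculum loop: advance when the current level is mastered."""
    for problems in levels:                # levels ordered easy -> hard
        success_rate = 0.0
        while success_rate < threshold:
            results = [agent.attempt(p) for p in problems]
            success_rate = sum(results) / len(results)
```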
UNITER: Learning UNiversal Image-TExt Representations
Title | UNITER: Learning UNiversal Image-TExt Representations |
Authors | Anonymous |
Abstract | Joint image-text embedding is the bedrock for most Vision-and-Language (V+L) tasks, where multimodality inputs are jointly processed for visual and textual understanding. In this paper, we introduce UNITER, a UNiversal Image-TExt Representation, learned through large-scale pre-training over four image-text datasets (COCO, Visual Genome, Conceptual Captions, and SBU Captions), which can power heterogeneous downstream V+L tasks with joint multimodal embeddings. We design three pre-training tasks: Masked Language Modeling (MLM), Image-Text Matching (ITM), and Masked Region Modeling (MRM, with three variants). Different from concurrent work on multimodal pre-training that applies joint random masking to both modalities, we use Conditioned Masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text). Comprehensive analysis shows that conditioned masking yields better performance than unconditioned masking. We also conduct a thorough ablation study to find an optimal combination of pre-training tasks for UNITER. Extensive experiments show that UNITER achieves new state of the art across six V+L tasks over nine datasets, including Visual Question Answering, Image-Text Retrieval, Referring Expression Comprehension, Visual Commonsense Reasoning, Visual Entailment, and NLVR2. |
Tasks | Language Modelling, Question Answering, Text Matching, Visual Commonsense Reasoning, Visual Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1eL4kBYwr |
PDF | https://openreview.net/pdf?id=S1eL4kBYwr |
PWC | https://paperswithcode.com/paper/uniter-learning-universal-image-text |
Repo | |
Framework | |
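The conditioned-masking contrast drawn in the abstract reduces to masking one modality per example while keeping the other fully observed. The sketch below uses a 15% mask rate and token-level placeholders as assumptions; the real model masks region features and applies the MLM/ITM/MRM losses on top.

```python
import random

def conditioned_masking(text_tokens, region_feats, mask_rate=0.15):
    """Mask only one modality at a time; the other stays fully observed."""
    if random.random() < 0.5:   # MLM step: text masked, image fully observed
        masked = [t if random.random() > mask_rate else '[MASK]'
                  for t in text_tokens]
        return masked, region_feats
    else:                       # MRM step: regions masked, text fully observed
        masked = [r if random.random() > mask_rate else None  # None = zeroed out
                  for r in region_feats]
        return text_tokens, masked
```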
Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients
Title | Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients |
Authors | Anonymous |
Abstract | In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains. However, they lack the theoretical guarantees which are present in the tabular setting and suffer from many stability and reproducibility problems (Henderson et al., 2018). In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in off-policy actor-critic deep reinforcement learning regimes. Experiments on continuous action spaces, in the MuJoCo control suite, show that our proposed method reduces the variance of the process and improves the overall performance. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgn464tPB |
PDF | https://openreview.net/pdf?id=SJgn464tPB |
PWC | https://paperswithcode.com/paper/stabilizing-off-policy-reinforcement-learning-1 |
Repo | |
Framework | |
Deep Audio Priors Emerge From Harmonic Convolutional Networks
Title | Deep Audio Priors Emerge From Harmonic Convolutional Networks |
Authors | Anonymous |
Abstract | Convolutional neural networks (CNNs) excel in image recognition and generation. Among many efforts to explain their effectiveness, experiments show that CNNs carry strong inductive biases that capture natural image priors. Do deep networks also have inductive biases for audio signals? In this paper, we empirically show that current network architectures for audio processing do not show strong evidence in capturing such priors. We propose Harmonic Convolution, an operation that helps deep networks distill priors in audio signals by explicitly utilizing the harmonic structure within. This is done by anchoring the kernel's support on sets of harmonic series instead of the local neighborhoods used by conventional convolutional kernels. We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks. With Harmonic Convolution, they also achieve better generalization performance for sound source separation. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rygjHxrYDB |
PDF | https://openreview.net/pdf?id=rygjHxrYDB |
PWC | https://paperswithcode.com/paper/deep-audio-priors-emerge-from-harmonic |
Repo | |
Framework | |
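The kernel-support idea admits a short sketch: at frequency bin f, gather the spectrogram values at the (approximate) harmonic bins k·f and stack them as channels for a learned convolution to mix, instead of convolving over a local frequency neighborhood. The number of harmonics and the clamping of out-of-range bins are my assumptions.

```python
import torch

def harmonic_gather(spec, n_harmonics=4):
    """spec: (batch, channels, freq, time) magnitude spectrogram."""
    _, _, n_freq, _ = spec.shape
    taps = []
    for k in range(1, n_harmonics + 1):
        idx = (torch.arange(n_freq) * k).clamp(max=n_freq - 1)  # k-th harmonic bin
        taps.append(spec[:, :, idx, :])        # sample along the harmonic series
    # Harmonics become extra channels; a 1x1 conv can then mix them.
    return torch.cat(taps, dim=1)

out = harmonic_gather(torch.rand(2, 1, 256, 100))   # -> (2, 4, 256, 100)
```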
On Concept-Based Explanations in Deep Neural Networks
Title | On Concept-Based Explanations in Deep Neural Networks |
Authors | Anonymous |
Abstract | Deep neural networks (DNNs) build high-level intelligence on low-level raw features. Understanding this high-level intelligence can be enabled by deciphering the concepts on which they base their decisions, akin to human-level thinking. In this paper, we study concept-based explainability for DNNs in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining a model’s prediction behavior. Based on performance and variability motivations, we propose two definitions to quantify completeness. We show that under degenerate conditions, our method is equivalent to Principal Component Analysis. Next, we propose a concept discovery method that considers two additional constraints to encourage the interpretability of the discovered concepts. We use game-theoretic notions to aggregate over sets and define an importance score for each discovered concept, which we call ConceptSHAP. On specifically-designed synthetic datasets and real-world text and image datasets, we validate the effectiveness of our framework in finding concepts that are both complete in explaining the decision and interpretable. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BylWYC4KwH |
PDF | https://openreview.net/pdf?id=BylWYC4KwH |
PWC | https://paperswithcode.com/paper/on-concept-based-explanations-in-deep-neural-1 |
Repo | |
Framework | |
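The ConceptSHAP score itself is the classical Shapley value applied to a set function. A brute-force sketch follows, where `eta` is any callable mapping a subset of concept indices to a completeness score; it is exponential in the number of concepts and meant only to pin down the definition.

```python
from itertools import combinations
from math import factorial

def concept_shap(n_concepts, eta):
    """Shapley value of each concept w.r.t. a completeness function eta."""
    n = n_concepts
    shap = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                shap[i] += weight * (eta(set(S) | {i}) - eta(set(S)))
    return shap

# Toy completeness that grows with the number of concepts present:
print(concept_shap(3, lambda S: len(S) / 3))   # ≈ [1/3, 1/3, 1/3]
```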
Sentence embedding with contrastive multi-views learning
Title | Sentence embedding with contrastive multi-views learning |
Authors | Anonymous |
Abstract | In this work, we propose a self-supervised method to learn sentence representations with an injection of linguistic knowledge. Multiple linguistic frameworks propose diverse sentence structures from which semantic meaning might be expressed through compositional word operations. We aim to take advantage of this linguistic diversity and learn to represent sentences by contrasting these diverse views. Formally, multiple views of the same sentence are mapped to close representations. On the contrary, views from other sentences are mapped further apart. By contrasting different linguistic views, we aim at building embeddings which better capture semantics and which are less sensitive to a sentence's surface form. |
Tasks | Sentence Embedding |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJxGGlSKwH |
PDF | https://openreview.net/pdf?id=rJxGGlSKwH |
PWC | https://paperswithcode.com/paper/sentence-embedding-with-contrastive-multi |
Repo | |
Framework | |
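A minimal PyTorch sketch of the contrastive objective described: embeddings of two linguistic views of the same sentence are pulled together, while views of different sentences in the batch are pushed apart via an InfoNCE-style loss. The encoders producing the two views are left abstract, and the temperature is an assumption.

```python
import torch
import torch.nn.functional as F

def multiview_contrastive_loss(z_a, z_b, temperature=0.1):
    """z_a, z_b: (batch, dim) embeddings of two views of the same sentences."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature    # (batch, batch) cosine similarities
    targets = torch.arange(len(z_a))        # diagonal pairs are the positives
    return F.cross_entropy(logits, targets)

loss = multiview_contrastive_loss(torch.randn(16, 128), torch.randn(16, 128))
```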