Paper Group NAWR 2
Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations
Title | Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations |
Authors | Anonymous |
Abstract | Detection of photo manipulation relies on subtle statistical traces, notoriously removed by the aggressive lossy compression employed online. We demonstrate that end-to-end modeling of complex photo dissemination channels allows for codec optimization with explicit provenance objectives. We design a lightweight trainable lossy image codec that delivers competitive rate-distortion performance, on par with the best hand-engineered alternatives, but has a lower computational footprint on modern GPU-enabled platforms. Our results show that significant improvements in manipulation detection accuracy are possible at fractional costs in bandwidth/storage. Our codec improved the accuracy from 37% to 86% even at very low bit-rates, well below the practical range of JPEG (QF 20). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyxG3p4twS |
PDF | https://openreview.net/pdf?id=HyxG3p4twS |
PWC | https://paperswithcode.com/paper/quantifying-the-cost-of-reliable-photo |
Repo | https://github.com/pkorus/neural-imaging |
Framework | tf |
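The abstract above describes optimizing a learned codec jointly for rate-distortion and a provenance (manipulation-detection) objective. Below is a minimal sketch of what such a combined training loss could look like; the weighting terms and the cross-entropy provenance term are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def codec_loss(original, reconstructed, bits_per_pixel,
               detector_logit, manipulated,
               w_rate=0.05, w_provenance=1.0):
    """Toy joint objective: distortion + rate + provenance terms.
    Weights and the provenance term are illustrative, not the paper's."""
    distortion = np.mean((original - reconstructed) ** 2)   # MSE distortion
    rate = bits_per_pixel                                    # rate estimate
    # Binary cross-entropy of a manipulation detector run on the decoded image.
    p = 1.0 / (1.0 + np.exp(-detector_logit))
    provenance = -(manipulated * np.log(p + 1e-9)
                   + (1 - manipulated) * np.log(1 - p + 1e-9))
    return distortion + w_rate * rate + w_provenance * provenance

x = np.random.rand(64, 64)
x_hat = x + 0.01 * np.random.randn(64, 64)
print(codec_loss(x, x_hat, bits_per_pixel=0.4, detector_logit=2.0, manipulated=1))
```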
NAS evaluation is frustratingly hard
Title | NAS evaluation is frustratingly hard |
Authors | Anonymous |
Abstract | Neural Architecture Search (NAS) is an exciting new field which promises to be as much of a game-changer as Convolutional Neural Networks were in 2012. Despite many great works leading to substantial improvements on a variety of tasks, comparison between different methods is still very much an open issue. While most algorithms are tested on the same datasets, there is no shared experimental protocol followed by all. As such, and due to the under-use of ablation studies, there is a lack of clarity regarding why certain methods are more effective than others. Our first contribution is a benchmark of 8 NAS methods on 5 datasets. To overcome the hurdle of comparing methods with different search spaces, we propose using a method’s relative improvement over the randomly sampled average architecture, which effectively removes advantages arising from expertly engineered search spaces or training protocols. Surprisingly, we find that many NAS techniques struggle to significantly beat the average architecture baseline. We perform further experiments with the commonly used DARTS search space in order to understand the contribution of each component in the NAS pipeline. These experiments highlight that: (i) the use of tricks in the evaluation protocol has a predominant impact on the reported performance of architectures; (ii) the cell-based search space has a very narrow accuracy range, such that the seed has a considerable impact on architecture rankings; (iii) the hand-designed macro-structure (cells) is more important than the searched micro-structure (operations); and (iv) the depth-gap is a real phenomenon, evidenced by the change in rankings between 8 and 20 cell architectures. To conclude, we suggest best practices that we hope will prove useful for the community and help mitigate current NAS pitfalls, e.g. difficulties in reproducibility and comparison of search methods. We provide the code used for our experiments at link-to-come. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygrdpVKvr |
PDF | https://openreview.net/pdf?id=HygrdpVKvr |
PWC | https://paperswithcode.com/paper/nas-evaluation-is-frustratingly-hard |
Repo | https://github.com/antoyang/NAS-Benchmark |
Framework | pytorch |
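The key metric proposed in the abstract above is a method's relative improvement over the average randomly sampled architecture from the same search space. A minimal sketch of that computation, with hypothetical accuracy values:

```python
import numpy as np

def relative_improvement(method_acc, random_arch_accs):
    """Relative improvement of a searched architecture over the mean
    accuracy of randomly sampled architectures from the same space."""
    baseline = np.mean(random_arch_accs)
    return (method_acc - baseline) / baseline

random_samples = [92.1, 91.7, 92.4, 91.9, 92.0]   # hypothetical accuracies (%)
print(f"RI = {relative_improvement(93.0, random_samples):.4f}")
```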
Discourse-Based Evaluation of Language Understanding
Title | Discourse-Based Evaluation of Language Understanding |
Authors | Anonymous |
Abstract | New models for natural language understanding have made remarkable progress recently, leading to claims of universal text representations. However, current benchmarks predominantly target semantic phenomena; we make the case that discourse and pragmatics need to take center stage in the evaluation of natural language understanding. We introduce DiscEval, a new benchmark for the evaluation of natural language understanding, that unites 11 discourse-focused evaluation datasets. DiscEval can be used as supplementary training data in a multi-task learning setup, and is publicly available, alongside the code for gathering and preprocessing the datasets. Using our evaluation suite, we show that natural language inference, a widely used pretraining task, does not result in genuinely universal representations, which opens a new challenge for multi-task learning. |
Tasks | Multi-Task Learning, Natural Language Inference |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1em8TVtPr |
PDF | https://openreview.net/pdf?id=B1em8TVtPr |
PWC | https://paperswithcode.com/paper/discourse-based-evaluation-of-language-1 |
Repo | https://github.com/disceval/DiscEval |
Framework | none |
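Since DiscEval aggregates 11 discourse-focused datasets, a natural way to report a single number is a macro-average over per-dataset scores. A minimal sketch of that aggregation; the dataset names and scores below are placeholders, not DiscEval's actual tasks or results.

```python
def disceval_macro_score(per_dataset_accuracy):
    """Macro-average accuracy over a suite of discourse-focused datasets."""
    return sum(per_dataset_accuracy.values()) / len(per_dataset_accuracy)

# Placeholder dataset names and accuracies for illustration only.
scores = {"pdtb_implicit": 0.48, "stac": 0.71, "persuasion": 0.63}
print(f"macro score: {disceval_macro_score(scores):.3f}")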
Meta-learning curiosity algorithms
Title | Meta-learning curiosity algorithms |
Authors | Anonymous |
Abstract | Exploration is a key component of successful reinforcement learning, but optimal approaches are computationally intractable, so researchers have focused on hand-designing mechanisms based on exploration bonuses and intrinsic reward, some inspired by curious behavior in natural systems. In this work, we propose a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains. Our rich language of programs, which can combine neural networks with other building blocks including nearest-neighbor modules and can choose its own loss functions, enables the expression of highly generalizable programs that perform well in domains as disparate as grid navigation with image input, acrobot, lunar lander, ant and hopper. To make this approach feasible, we develop several pruning techniques, including learning to predict a program’s success based on its syntactic properties. We demonstrate the effectiveness of the approach empirically, finding curiosity strategies that are similar to those in published literature, as well as novel strategies that are competitive with them and generalize well. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygdyxHFDS |
PDF | https://openreview.net/pdf?id=BygdyxHFDS |
PWC | https://paperswithcode.com/paper/meta-learning-curiosity-algorithms |
Repo | https://github.com/mfranzs/meta-learning-curiosity-algorithms |
Framework | tf |
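One of the building blocks the abstract mentions for its program DSL is a nearest-neighbor module. A minimal sketch of an episodic nearest-neighbor curiosity bonus, as one component such a searched program could compose; this is not the paper's searched algorithm itself.

```python
import numpy as np

class NearestNeighborCuriosity:
    """Toy intrinsic reward: distance of the current state to its nearest
    neighbor in an episodic memory (an illustrative building block, not
    the meta-learned curiosity program from the paper)."""

    def __init__(self):
        self.memory = []

    def reward(self, state):
        state = np.asarray(state, dtype=float)
        if not self.memory:
            bonus = 1.0                      # first state is maximally novel
        else:
            bonus = float(min(np.linalg.norm(state - m) for m in self.memory))
        self.memory.append(state)
        return bonus

curiosity = NearestNeighborCuriosity()
for s in [[0, 0], [0.1, 0], [3, 4]]:
    print(curiosity.reward(s))
```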
Black-box Adversarial Attacks with Bayesian Optimization
Title | Black-box Adversarial Attacks with Bayesian Optimization |
Authors | Anonymous |
Abstract | We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss-function evaluations of input-output pairs. We use Bayesian optimization (BO), which is well suited to low query budgets, to develop query-efficient adversarial attacks. We alleviate the difficulty of applying BO to high-dimensional deep learning models through effective dimension upsampling techniques. Our proposed approach achieves performance comparable to state-of-the-art black-box adversarial attacks, albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to 80% relative to state-of-the-art methods. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1xKBCEYDr |
PDF | https://openreview.net/pdf?id=H1xKBCEYDr |
PWC | https://paperswithcode.com/paper/black-box-adversarial-attacks-with-bayesian-1 |
Repo | https://github.com/snu-mllab/parsimonious-blackbox-attack |
Framework | tf |
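The two ideas in the abstract above are (a) running BO over a low-dimensional perturbation and (b) upsampling it to image resolution before querying the victim model. A minimal sketch of that loop, assuming a generic GP surrogate with a UCB acquisition and a stand-in black-box loss; the specific acquisition, kernel, and upsampling factor here are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def upsample(z, factor):
    """Tile a low-dimensional perturbation up to image resolution."""
    side = int(np.sqrt(z.size))
    return np.kron(z.reshape(side, side), np.ones((factor, factor)))

def black_box_loss(image, perturbation):
    """Stand-in for the victim model's loss on (image + perturbation);
    in a real attack this would be one query to the target classifier."""
    return -np.abs(image + perturbation).mean()

rng = np.random.default_rng(0)
image = rng.random((16, 16))
dim, eps, queries = 16, 0.1, 20            # BO over a 4x4 latent perturbation
Z = rng.uniform(-1, 1, size=(5, dim))      # initial design points
y = [black_box_loss(image, eps * upsample(z, 4)) for z in Z]

gp = GaussianProcessRegressor()
for _ in range(queries):
    gp.fit(Z, y)
    candidates = rng.uniform(-1, 1, size=(256, dim))
    mean, std = gp.predict(candidates, return_std=True)
    z_next = candidates[np.argmax(mean + 1.96 * std)]   # UCB acquisition
    Z = np.vstack([Z, z_next])
    y.append(black_box_loss(image, eps * upsample(z_next, 4)))

print("best objective found:", max(y))
```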
Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Title | Neural Tangents: Fast and Easy Infinite Neural Networks in Python |
Authors | Anonymous |
Abstract | Neural Tangents is a library for working with infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite-width as usual, or in their infinite-width limit. For the infinite-width networks, Neural Tangents performs exact inference either via Bayes’ rule or gradient descent, and generates the corresponding Neural Network Gaussian Process and Neural Tangent kernels. Additionally, Neural Tangents provides tools to study gradient descent training dynamics of wide but finite networks. The entire library runs out-of-the-box on CPU, GPU, or TPU. All computations can be automatically distributed over multiple accelerators with near-linear scaling in the number of devices. In addition to the repository below, we provide an accompanying interactive Colab notebook at https://colab.sandbox.google.com/github/neural-tangents/neural-tangents/blob/master/notebooks/neural_tangents_cookbook.ipynb |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklD9yrFPS |
PDF | https://openreview.net/pdf?id=SklD9yrFPS |
PWC | https://paperswithcode.com/paper/neural-tangents-fast-and-easy-infinite-neural |
Repo | https://github.com/neural-tangents/neural-tangents |
Framework | jax |
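A minimal usage sketch based on the library's documented stax API: build an architecture once, obtain its infinite-width NNGP/NTK kernels, and make exact infinite-width predictions. Function names and signatures follow the public documentation but may differ across neural-tangents versions, so treat this as an approximate example rather than the canonical one.

```python
from jax import random
import neural_tangents as nt
from neural_tangents import stax

# Architecture definition; kernel_fn computes the corresponding NNGP/NTK kernels.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1)
)

key = random.PRNGKey(0)
x_train = random.normal(key, (20, 32))
y_train = random.normal(key, (20, 1))
x_test = random.normal(key, (5, 32))

# Closed-form kernels in the infinite-width limit.
kernels = kernel_fn(x_train, x_test, ('nngp', 'ntk'))

# Exact infinite-width predictions under gradient descent on MSE
# (API as documented; version-dependent).
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
mean, cov = predict_fn(x_test=x_test, get='ntk', compute_cov=True)
print(mean.shape)
```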
HiLLoC: lossless image compression with hierarchical latent variable models
Title | HiLLoC: lossless image compression with hierarchical latent variable models |
Authors | Anonymous |
Abstract | We make the following striking observation: fully convolutional VAE models trained on 32x32 ImageNet can generalize well, not just to 64x64 but also to far larger photographs, with no changes to the model. We use this property, applying fully convolutional models to lossless compression, demonstrating a method to scale the VAE-based ‘Bits-Back with ANS’ algorithm for lossless compression to large color photographs, and achieving state of the art for compression of full size ImageNet images. We release Craystack, an open source library for convenient prototyping of lossless compression using probabilistic models, along with full implementations of all of our compression results. |
Tasks | Image Compression, Latent Variable Models |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lZgyBYwS |
PDF | https://openreview.net/pdf?id=r1lZgyBYwS |
PWC | https://paperswithcode.com/paper/hilloc-lossless-image-compression-with |
Repo | https://github.com/hilloc-submission/hilloc |
Framework | tf |
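The "Bits-Back with ANS" scheme mentioned above compresses an image with a latent-variable model by paying for p(x|z) and p(z) and recovering log q(z|x) bits from the latent sample, so the net cost approaches the negative ELBO. A minimal sketch of that code-length accounting; the log-probabilities below are made-up numbers, and the actual ANS coder is omitted.

```python
import numpy as np

def bits_back_code_length(log_p_x_given_z, log_p_z, log_q_z_given_x):
    """Ideal net code length (in bits) for one image under bits-back coding:
    cost = -log p(x|z) - log p(z) + log q(z|x), i.e. roughly the negative ELBO."""
    nats = -log_p_x_given_z - log_p_z + log_q_z_given_x
    return nats / np.log(2)

# Illustrative (made-up) log-probabilities for one image, in nats.
print(f"{bits_back_code_length(-3200.0, -150.0, -140.0):.1f} bits")
```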
Self-Supervised State-Control through Intrinsic Mutual Information Rewards
Title | Self-Supervised State-Control through Intrinsic Mutual Information Rewards |
Authors | Anonymous |
Abstract | Learning to discover useful skills without a manually-designed reward function would have many applications, yet is still a challenge for reinforcement learning. In this paper, we propose Mutual Information-based State-Control (MISC), a new self-supervised Reinforcement Learning approach for learning to control states of interest without any external reward function. We formulate the intrinsic objective as rewarding the skills that maximize the mutual information between the context states and the states of interest. For example, in robotic manipulation tasks, the context states are the robot states and the states of interest are the states of an object. We evaluate our approach for different simulated robotic manipulation tasks from OpenAI Gym. We show that our method is able to learn to manipulate the object, such as pushing and picking up, purely based on the intrinsic mutual information rewards. Furthermore, the pre-trained policy and mutual information discriminator can be used to accelerate learning to achieve high task rewards. Our results show that the mutual information between the context states and the states of interest can be an effective ingredient for overcoming challenges in robotic manipulation tasks with sparse rewards. A video showing experimental results is available at https://youtu.be/cLRrkd3Y7vU |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygSq3VFvH |
PDF | https://openreview.net/pdf?id=HygSq3VFvH |
PWC | https://paperswithcode.com/paper/self-supervised-state-control-through |
Repo | https://github.com/misc-project/misc |
Framework | none |
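The intrinsic objective above rewards skills that maximize mutual information between context states and states of interest, which in practice is estimated with a learned discriminator. A minimal sketch of a variational MI lower-bound reward of the form r = log q(s_interest | s_context) - log p(s_interest); the Gaussian predictor and prior here are illustrative stand-ins, not MISC's actual discriminator.

```python
import numpy as np

def mi_reward(s_interest, s_context, predictor, prior_logpdf):
    """Toy intrinsic reward from a variational mutual-information lower bound."""
    mean = predictor(s_context)
    d = len(s_interest)
    # Unit-variance Gaussian stand-in for the learned discriminator q(.|context).
    log_q = -0.5 * np.sum((s_interest - mean) ** 2) - 0.5 * d * np.log(2 * np.pi)
    return log_q - prior_logpdf(s_interest)

predictor = lambda c: 0.5 * np.asarray(c)      # placeholder learned model
prior = lambda x: -0.5 * np.sum(np.asarray(x) ** 2) - 0.5 * len(x) * np.log(2 * np.pi)
print(mi_reward(np.array([0.2, 0.1]), np.array([0.4, 0.2]), predictor, prior))
```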
LambdaNet: Probabilistic Type Inference using Graph Neural Networks
Title | LambdaNet: Probabilistic Type Inference using Graph Neural Networks |
Authors | Jiayi Wei, Maruth Goyal, Greg Durrett, Isil Dillig |
Abstract | As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching, these annotations cannot be fully inferred by compilers and are tedious to annotate by hand. This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. Our approach first uses lightweight source code analysis to generate a program abstraction called a type dependency graph, which links type variables with logical constraints as well as name and usage information. Given this program abstraction, we then use a graph neural network to propagate information between related type variables and eventually make type predictions. Our neural architecture can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training. Our experimental results show that our approach outperforms prior work in this space by 14% (absolute) on library types, while having the ability to make type predictions that are out of scope for existing techniques. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkx6hANtwH |
PDF | https://openreview.net/pdf?id=Hkx6hANtwH |
PWC | https://paperswithcode.com/paper/lambdanet-probabilistic-type-inference-using |
Repo | https://github.com/MrVPlusOne/LambdaNet |
Framework | none |
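The abstract describes a type dependency graph over type variables plus GNN message passing followed by a similarity-based type prediction. A minimal sketch of one propagation round and a nearest-candidate readout; the graph, embeddings, and readout below are hypothetical placeholders, not LambdaNet's actual representations.

```python
import numpy as np

# Hypothetical type dependency graph: nodes are type variables, edges encode
# constraints (e.g. "assigned from", "argument of").
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                 # placeholder node embeddings

def message_passing_step(h, edges):
    """One round of mean neighbor aggregation plus a residual update --
    a minimal stand-in for the paper's GNN propagation."""
    agg = np.zeros_like(h)
    count = np.zeros(len(h))
    for src, dst in edges:
        agg[dst] += h[src]; count[dst] += 1
        agg[src] += h[dst]; count[src] += 1   # treat constraints as undirected here
    return h + agg / np.maximum(count, 1)[:, None]

for _ in range(3):
    h = message_passing_step(h, edges)

# Toy readout: score candidate types by dot product with the node embedding.
type_embeddings = {"number": np.ones(8), "string": -np.ones(8)}
scores = {t: float(h[3] @ e) for t, e in type_embeddings.items()}
print(max(scores, key=scores.get))
```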
MUSE: Multi-Scale Attention Model for Sequence to Sequence Learning
Title | MUSE: Multi-Scale Attention Model for Sequence to Sequence Learning |
Authors | Anonymous |
Abstract | Transformers have achieved state-of-the-art results on a variety of natural language processing tasks. Despite their good performance, Transformers are still weak at modeling long sentences, where the global attention map is too dispersed to capture valuable information. In such cases, the local/token features that are also significant for sequence modeling are omitted to some extent. To address this problem, we propose a Multi-scale attention model (MUSE) that concatenates attention networks with convolutional networks and position-wise feed-forward networks to explicitly capture local and token features. Considering parameter size and computational efficiency, we re-use the feed-forward layer of the original Transformer and adopt a lightweight dynamic convolution as the implementation. Experimental results show that the proposed model achieves substantial performance improvements over the Transformer, especially on long sentences, and pushes the state of the art from 35.6 to 36.2 on the IWSLT 2014 German-to-English translation task and from 30.6 to 31.3 on the IWSLT 2015 English-to-Vietnamese translation task. We also reach state-of-the-art performance on the WMT 2014 English-to-French translation dataset, with a BLEU score of 43.2. |
Tasks | Machine Translation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJe-3REFwr |
PDF | https://openreview.net/pdf?id=SJe-3REFwr |
PWC | https://paperswithcode.com/paper/muse-multi-scale-attention-model-for-sequence |
Repo | https://github.com/lancopku/MUSE |
Framework | pytorch |
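The core idea above is fusing three scales per layer: global self-attention, a local convolution, and a position-wise feed-forward path. A minimal sketch of such a multi-scale block; the identity projections, uniform convolution kernel, and additive fusion are simplifications for illustration, not MUSE's actual parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    scores = x @ x.T / np.sqrt(x.shape[-1])   # identity Q/K/V projections for brevity
    return softmax(scores) @ x

def local_conv(x, width=3):
    """Uniform-kernel 1D convolution (placeholder for the learned dynamic conv)."""
    pad = width // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([xp[i:i + width].mean(axis=0) for i in range(len(x))])

def feed_forward(x):
    return np.maximum(x, 0)                   # stand-in position-wise FFN

def multi_scale_block(x):
    """Fuse global attention, local convolution, and token-level FFN additively."""
    return x + self_attention(x) + local_conv(x) + feed_forward(x)

tokens = np.random.default_rng(0).normal(size=(6, 16))   # (seq_len, d_model)
print(multi_scale_block(tokens).shape)
```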
Recurrent Independent Mechanisms
Title | Recurrent Independent Mechanisms |
Authors | Anonymous |
Abstract | Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes. We propose Recurrent Independent Mechanisms (RIMs), a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant. We show that this leads to specialization amongst the RIMs, which in turn allows for dramatically improved generalization on tasks where some factors of variation differ systematically between training and evaluation. |
Tasks | Atari Games |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BylaUTNtPS |
PDF | https://openreview.net/pdf?id=BylaUTNtPS |
PWC | https://paperswithcode.com/paper/recurrent-independent-mechanisms-1 |
Repo | https://github.com/maximecb/gym-minigrid |
Framework | pytorch |
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
Title | Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification |
Authors | Anonymous |
Abstract | Person re-identification (re-ID) aims at identifying the same persons’ images across different cameras. However, domain diversity between different datasets poses an evident challenge for adapting a re-ID model trained on one dataset to another. State-of-the-art unsupervised domain adaptation methods for person re-ID transfer the learned knowledge from the source domain by optimizing with pseudo labels created by clustering algorithms on the target domain. Although they achieve state-of-the-art performance, the inevitable label noise caused by the clustering procedure is ignored. Such noisy pseudo labels substantially hinder the model’s capability to further improve feature representations on the target domain. To mitigate the effects of noisy pseudo labels, we propose an unsupervised framework, Mutual Mean-Teaching (MMT), which softly refines the pseudo labels in the target domain and learns better target-domain features via off-line refined hard pseudo labels and on-line refined soft pseudo labels in an alternating training manner. In addition, the common practice is to adopt both the classification loss and the triplet loss jointly for optimal performance in person re-ID models. However, the conventional triplet loss cannot work with softly refined labels. To solve this problem, a novel soft softmax-triplet loss is proposed to support learning with soft pseudo triplet labels and achieve optimal domain adaptation performance. The proposed MMT framework achieves considerable improvements of 14.4%, 18.2%, 13.1% and 16.4% mAP on the Market-to-Duke, Duke-to-Market, Market-to-MSMT and Duke-to-MSMT unsupervised domain adaptation tasks. |
Tasks | Domain Adaptation, Person Re-Identification, Unsupervised Domain Adaptation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJlnOhVYPS |
PDF | https://openreview.net/pdf?id=rJlnOhVYPS |
PWC | https://paperswithcode.com/paper/mutual-mean-teaching-pseudo-label-refinery |
Repo | https://github.com/yxgeee/MMT |
Framework | pytorch |
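Two ingredients of the framework above are easy to illustrate: the temporally averaged ("mean teacher") parameters that supply on-line refined soft pseudo labels, and a soft classification loss against those labels. A minimal sketch of both; the soft softmax-triplet loss and the two-network mutual setup are omitted, so this is a simplification, not MMT's full objective.

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.999):
    """Mean-teacher update: exponential moving average of the student's weights."""
    return {k: alpha * teacher_params[k] + (1 - alpha) * student_params[k]
            for k in teacher_params}

def soft_classification_loss(student_logits, teacher_probs):
    """Cross-entropy against the peer mean-teacher's soft pseudo labels."""
    log_p = student_logits - np.log(np.exp(student_logits).sum())   # log-softmax
    return -(teacher_probs * log_p).sum()

teacher = {"w": np.ones(3)}
student = {"w": np.zeros(3)}
print(ema_update(teacher, student)["w"])
print(soft_classification_loss(np.array([2.0, 0.5, 0.1]),
                               np.array([0.7, 0.2, 0.1])))
```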
AHash: A Load-Balanced One Permutation Hash
Title | AHash: A Load-Balanced One Permutation Hash |
Authors | Anonymous |
Abstract | Minwise Hashing (MinHash) is a fundamental method to compute set similarities and compact high-dimensional data for efficient learning and searching. The bottleneck of MinHash is computing k (usually hundreds) MinHash values. One Permutation Hashing (OPH) only requires one permutation (hash function) to get k MinHash values by dividing elements into k bins. One drawback of OPH is that the load of the bins (the number of elements in a bin) could be unbalanced, which leads to the existence of empty bins and false similarity computation. Several strategies for densification, that is, filling empty bins, have been proposed. However, the densification is just a remedial strategy and cannot eliminate the error incurred by the unbalanced load. Unlike the densification to fill the empty bins after they undesirably occur, our design goal is to balance the load so as to reduce the empty bins in advance. In this paper, we propose a load-balanced hashing, Amortization Hashing (AHash), which can generate as few empty bins as possible. Therefore, AHash is more load-balanced and accurate without hurting runtime efficiency compared with OPH and densification strategies. Our experiments on real datasets validate the claim. All source codes and datasets have been provided as Supplementary Materials and released on GitHub anonymously. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJe9fTNtPS |
PDF | https://openreview.net/pdf?id=rJe9fTNtPS |
PWC | https://paperswithcode.com/paper/ahash-a-load-balanced-one-permutation-hash |
Repo | https://github.com/AHashCodes/AHash |
Framework | none |
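The empty-bin problem that motivates AHash comes directly from One Permutation Hashing: a single permutation is split into k bins and the minimum permuted value is kept per bin, so sparse sets leave some bins empty. A minimal sketch of OPH that also counts the empty bins; parameters are illustrative, and AHash's load-balancing itself is not implemented here.

```python
import numpy as np

def one_permutation_hash(element_ids, k_bins, universe_size, seed=0):
    """One Permutation Hashing: permute the universe once, split it into k bins,
    keep the minimum permuted value per bin. Returns the sketch and the number
    of empty bins (the source of error that densification/AHash address)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(universe_size)
    bin_size = universe_size // k_bins
    sketch = np.full(k_bins, np.inf)
    for e in element_ids:
        v = perm[e]
        b = min(v // bin_size, k_bins - 1)
        sketch[b] = min(sketch[b], v % bin_size)
    empty = int(np.sum(np.isinf(sketch)))
    return sketch, empty

sketch, empty_bins = one_permutation_hash([3, 17, 42, 99, 512],
                                          k_bins=8, universe_size=1024)
print(sketch, "empty bins:", empty_bins)
```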
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Title | WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia |
Authors | Anonymous |
Abstract | We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects and low-resource languages. We do not limit the extraction process to alignments with English, but systematically consider all possible language pairs. In total, we are able to extract 135M parallel sentences for 1620 different language pairs, of which only 34M are aligned with English. This corpus of parallel sentences is freely available (URL anonymized). To get an indication of the quality of the extracted bitexts, we train neural MT baseline systems on the mined data only, for 1886 language pairs, and evaluate them on the TED corpus, achieving strong BLEU scores for many language pairs. The WikiMatrix bitexts seem to be particularly interesting for training MT systems between distant languages without the need to pivot through English. |
Tasks | Sentence Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkeYL1SFvH |
PDF | https://openreview.net/pdf?id=rkeYL1SFvH |
PWC | https://paperswithcode.com/paper/wikimatrix-mining-135m-parallel-sentences-in-1 |
Repo | https://github.com/facebookresearch/LASER |
Framework | pytorch |
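Mining with multilingual sentence embeddings of this kind typically scores candidate pairs with a ratio-margin criterion: the cosine similarity of the pair divided by the average similarity to each side's k nearest neighbors. A minimal sketch of that score on random vectors; the embeddings and neighbor sets are placeholders, and the paper's exact thresholds are not reproduced.

```python
import numpy as np

def margin_score(x, y, x_neighbors, y_neighbors):
    """Ratio-margin score for a candidate bitext pair under sentence embeddings."""
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    denom = (np.mean([cos(x, n) for n in x_neighbors])
             + np.mean([cos(y, n) for n in y_neighbors])) / 2
    return cos(x, y) / denom

rng = np.random.default_rng(0)
src, tgt = rng.normal(size=16), rng.normal(size=16)   # placeholder embeddings
src_nn = rng.normal(size=(4, 16))                     # k nearest neighbors of src
tgt_nn = rng.normal(size=(4, 16))                     # k nearest neighbors of tgt
print(margin_score(src, tgt, src_nn, tgt_nn))
```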
Network Embedding with Completely-imbalanced Labels
Title | Network Embedding with Completely-imbalanced Labels |
Authors | Zheng Wang, Xiaojun Ye, Chaokun Wang, Jian Cui, Philip S. Yu |
Abstract | Network embedding, aiming to project a network into a low-dimensional space, is increasingly becoming a focus of network research. Semi-supervised network embedding takes advantage of labeled data, and has shown promising performance. However, existing semi-supervised methods would get unappealing results in the completely-imbalanced label setting where some classes have no labeled nodes at all. To alleviate this, we propose two novel semi-supervised network embedding methods. The first one is a shallow method named RSDNE. Specifically, to benefit from the completely-imbalanced labels, RSDNE guarantees both intra-class similarity and inter-class dissimilarity in an approximate way. The other method is RECT which is a new class of graph neural networks. Different from RSDNE, to benefit from the completely-imbalanced labels, RECT explores the knowledge of class-semantic descriptions. This enables RECT to handle networks with node features and multi-label setting. Experimental results on several real-world datasets demonstrate the superiority of the proposed methods. |
Tasks | Network Embedding |
Published | 2020-02-03 |
URL | https://ieeexplore.ieee.xilesou.top/abstract/document/8979355 |
PWC | https://paperswithcode.com/paper/network-embedding-with-completely-imbalanced |
Repo | https://github.com/zhengwang100/RECT |
Framework | pytorch |
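The RSDNE part of the abstract rests on two label-driven terms: pull labeled nodes of the same (seen) class together and push different seen classes apart, while unlabeled classes contribute nothing. A minimal sketch of such a penalty on toy embeddings; this is an illustrative objective under those assumptions, not RSDNE's exact formulation.

```python
import numpy as np

def label_driven_penalty(embeddings, labels):
    """Toy completely-imbalanced objective: intra-class attraction plus
    inter-class repulsion over labeled nodes only; unlabeled nodes (None)
    are ignored, mirroring classes with no labeled examples."""
    total = 0.0
    labeled = [i for i, y in enumerate(labels) if y is not None]
    for a in labeled:
        for b in labeled:
            if a >= b:
                continue
            d = np.sum((embeddings[a] - embeddings[b]) ** 2)
            total += d if labels[a] == labels[b] else -d
    return total

emb = np.random.default_rng(0).normal(size=(5, 4))
labels = [0, 0, 1, None, None]        # two seen classes, two unlabeled nodes
print(label_driven_penalty(emb, labels))
```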