April 1, 2020

2994 words 15 mins read

Paper Group NAWR 2

Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations. NAS evaluation is frustratingly hard. Discourse-Based Evaluation of Language Understanding. Meta-learning curiosity algorithms. Black-box Adversarial Attacks with Bayesian Optimization. Neural Tangents: Fast and Easy Infinite Neural Networks in …

Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations

Title Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations
Authors Anonymous
Abstract Detection of photo manipulation relies on subtle statistical traces, notoriously removed by the aggressive lossy compression employed online. We demonstrate that end-to-end modeling of complex photo dissemination channels allows for codec optimization with explicit provenance objectives. We design a lightweight trainable lossy image codec that delivers competitive rate-distortion performance, on par with the best hand-engineered alternatives, but has a lower computational footprint on modern GPU-enabled platforms. Our results show that significant improvements in manipulation detection accuracy are possible at fractional costs in bandwidth/storage. Our codec improved the accuracy from 37% to 86% even at very low bit-rates, well below the practical range of JPEG (QF 20).
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HyxG3p4twS
PDF https://openreview.net/pdf?id=HyxG3p4twS
PWC https://paperswithcode.com/paper/quantifying-the-cost-of-reliable-photo
Repo https://github.com/pkorus/neural-imaging
Framework tf
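
The provenance-aware optimization described above boils down to adding a forensic term to the usual rate-distortion objective. Below is a minimal PyTorch sketch of such a combined loss; the `codec` and `forensic_classifier` modules and the loss weights are hypothetical stand-ins, not the paper's actual architecture or settings.

```python
# Minimal sketch of a rate-distortion objective with a provenance/forensics term.
import torch
import torch.nn.functional as F

def provenance_aware_codec_loss(codec, forensic_classifier, images, tamper_labels,
                                lambda_rate=0.01, gamma_forensics=1.0):
    # codec(images) is assumed to return the reconstruction and an estimate of the
    # bit-rate (e.g. from an entropy model over the quantized latents).
    reconstruction, est_bits_per_pixel = codec(images)

    distortion = F.mse_loss(reconstruction, images)   # rate-distortion: distortion term
    rate = est_bits_per_pixel.mean()                  # rate-distortion: rate term

    # Provenance objective: a forensic classifier must still detect manipulation
    # after the image has passed through the (differentiable) codec.
    logits = forensic_classifier(reconstruction)
    forensics = F.cross_entropy(logits, tamper_labels)

    return distortion + lambda_rate * rate + gamma_forensics * forensics
```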

NAS evaluation is frustratingly hard

Title NAS evaluation is frustratingly hard
Authors Anonymous
Abstract Neural Architecture Search (NAS) is an exciting new field which promises to be as much of a game-changer as Convolutional Neural Networks were in 2012. Despite many great works leading to substantial improvements on a variety of tasks, comparison between different methods is still very much an open issue. While most algorithms are tested on the same datasets, there is no shared experimental protocol followed by all. As such, and due to the under-use of ablation studies, there is a lack of clarity regarding why certain methods are more effective than others. Our first contribution is a benchmark of 8 NAS methods on 5 datasets. To overcome the hurdle of comparing methods with different search spaces, we propose using a method’s relative improvement over the randomly sampled average architecture, which effectively removes advantages arising from expertly engineered search spaces or training protocols. Surprisingly, we find that many NAS techniques struggle to significantly beat the average architecture baseline. We perform further experiments with the commonly used DARTS search space in order to understand the contribution of each component in the NAS pipeline. These experiments highlight that: (i) the use of tricks in the evaluation protocol has a predominant impact on the reported performance of architectures; (ii) the cell-based search space has a very narrow accuracy range, such that the seed has a considerable impact on architecture rankings; (iii) the hand-designed macro-structure (cells) is more important than the searched micro-structure (operations); and (iv) the depth-gap is a real phenomenon, evidenced by the change in rankings between 8- and 20-cell architectures. To conclude, we suggest best practices that we hope will prove useful for the community and help mitigate current NAS pitfalls, e.g. difficulties in reproducibility and comparison of search methods. We provide the code used for our experiments at link-to-come.
Tasks Neural Architecture Search
Published 2020-01-01
URL https://openreview.net/forum?id=HygrdpVKvr
PDF https://openreview.net/pdf?id=HygrdpVKvr
PWC https://paperswithcode.com/paper/nas-evaluation-is-frustratingly-hard
Repo https://github.com/antoyang/NAS-Benchmark
Framework pytorch
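
The paper's search-space-agnostic comparison metric, relative improvement over the average randomly sampled architecture, is simple to compute. A minimal sketch with made-up accuracy numbers:

```python
# Tiny sketch of the "relative improvement over the randomly sampled average
# architecture" metric. All accuracy values below are made up for illustration.
def relative_improvement(method_accuracy, random_sample_accuracies):
    """Improvement of a NAS method over the mean accuracy of randomly sampled
    architectures from the same search space, in percent."""
    baseline = sum(random_sample_accuracies) / len(random_sample_accuracies)
    return 100.0 * (method_accuracy - baseline) / baseline

# Hypothetical example: a searched cell at 97.1% vs. random cells averaging
# 96.8% yields a relative improvement of roughly 0.31%.
print(relative_improvement(97.1, [96.5, 96.9, 97.0, 96.8]))
```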

Discourse-Based Evaluation of Language Understanding

Title Discourse-Based Evaluation of Language Understanding
Authors Anonymous
Abstract New models for natural language understanding have made remarkable progress recently, leading to claims of universal text representations. However, current benchmarks predominantly target semantic phenomena; we make the case that discourse and pragmatics need to take center stage in the evaluation of natural language understanding. We introduce DiscEval, a new benchmark for the evaluation of natural language understanding that unites 11 discourse-focused evaluation datasets. DiscEval can be used as supplementary training data in a multi-task learning setup, and is publicly available, alongside the code for gathering and preprocessing the datasets. Using our evaluation suite, we show that natural language inference, a widely used pretraining task, does not result in genuinely universal representations, which opens a new challenge for multi-task learning.
Tasks Multi-Task Learning, Natural Language Inference
Published 2020-01-01
URL https://openreview.net/forum?id=B1em8TVtPr
PDF https://openreview.net/pdf?id=B1em8TVtPr
PWC https://paperswithcode.com/paper/discourse-based-evaluation-of-language-1
Repo https://github.com/disceval/DiscEval
Framework none

Meta-learning curiosity algorithms

Title Meta-learning curiosity algorithms
Authors Anonymous
Abstract Exploration is a key component of successful reinforcement learning, but optimal approaches are computationally intractable, so researchers have focused on hand-designing mechanisms based on exploration bonuses and intrinsic reward, some inspired by curious behavior in natural systems. In this work, we propose a strategy for encoding curiosity algorithms as programs in a domain-specific language and searching, during a meta-learning phase, for algorithms that enable RL agents to perform well in new domains. Our rich language of programs, which can combine neural networks with other building blocks including nearest-neighbor modules and can choose its own loss functions, enables the expression of highly generalizable programs that perform well in domains as disparate as grid navigation with image input, acrobot, lunar lander, ant and hopper. To make this approach feasible, we develop several pruning techniques, including learning to predict a program’s success based on its syntactic properties. We demonstrate the effectiveness of the approach empirically, finding curiosity strategies that are similar to those in published literature, as well as novel strategies that are competitive with them and generalize well.
Tasks Meta-Learning
Published 2020-01-01
URL https://openreview.net/forum?id=BygdyxHFDS
PDF https://openreview.net/pdf?id=BygdyxHFDS
PWC https://paperswithcode.com/paper/meta-learning-curiosity-algorithms
Repo https://github.com/mfranzs/meta-learning-curiosity-algorithms
Framework tf
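
For intuition about the kind of program the meta-search operates over, here is a hand-written point in a similar space: a classic forward-model prediction-error curiosity bonus, sketched in numpy. This is not the paper's meta-learned algorithm; the model class, dimensions and learning rate are illustrative.

```python
# Illustrative curiosity program: intrinsic reward = forward-model prediction error.
import numpy as np

class ForwardModelCuriosity:
    def __init__(self, state_dim, action_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Linear forward model: predicts the next state from [state, action].
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.lr = lr

    def intrinsic_reward(self, state, action, next_state):
        x = np.concatenate([state, action])
        prediction = self.W @ x
        error = next_state - prediction
        # Online SGD step on the squared prediction error.
        self.W += self.lr * np.outer(error, x)
        # Reward the agent for transitions the model predicts poorly.
        return float(np.sum(error ** 2))

# Usage with hypothetical dimensions:
curiosity = ForwardModelCuriosity(state_dim=4, action_dim=2)
r_int = curiosity.intrinsic_reward(np.zeros(4), np.ones(2), np.full(4, 0.5))
```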

Black-box Adversarial Attacks with Bayesian Optimization

Title Black-box Adversarial Attacks with Bayesian Optimization
Authors Anonymous
Abstract We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples using information limited to loss function evaluations of input-output pairs. We use Bayesian optimization (BO), which is well suited to low query budgets, to develop query-efficient adversarial attacks. We alleviate the difficulty of applying BO to high-dimensional deep learning models through effective dimension upsampling techniques. Our proposed approach achieves performance comparable to state-of-the-art black-box adversarial attacks, albeit with a much lower average query count. In particular, in low query budget regimes, our proposed method reduces the query count by up to 80% with respect to state-of-the-art methods.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=H1xKBCEYDr
PDF https://openreview.net/pdf?id=H1xKBCEYDr
PWC https://paperswithcode.com/paper/black-box-adversarial-attacks-with-bayesian-1
Repo https://github.com/snu-mllab/parsimonious-blackbox-attack
Framework tf
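
The core trick is to run BO over a low-dimensional perturbation grid and upsample it to image resolution before each query. A hedged sketch using scikit-optimize's gp_minimize is below; the black-box oracle `query_model_loss`, the grid size and the query budget are stand-ins for illustration, not the paper's setup.

```python
# Sketch: Bayesian optimization over a coarse perturbation, upsampled per query.
import numpy as np
from skopt import gp_minimize

IMG = 32          # target image side length
LOW = 4           # side length of the low-dimensional perturbation grid
EPS = 8.0 / 255   # L-infinity budget

def upsample(flat_delta):
    low = np.array(flat_delta).reshape(LOW, LOW)
    # Nearest-neighbour upsampling of the coarse perturbation to full resolution.
    return np.kron(low, np.ones((IMG // LOW, IMG // LOW)))

def query_model_loss(image):            # hypothetical black-box oracle
    return float(np.sin(image.sum()))   # stand-in for the victim model's loss

def objective(flat_delta, clean_image=np.zeros((IMG, IMG))):
    adv = np.clip(clean_image + upsample(flat_delta), 0.0, 1.0)
    # gp_minimize minimizes, so return the negative loss to *maximize* model loss.
    return -query_model_loss(adv)

result = gp_minimize(objective,
                     dimensions=[(-EPS, EPS)] * (LOW * LOW),
                     n_calls=30, random_state=0)
```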

Neural Tangents: Fast and Easy Infinite Neural Networks in Python

Title Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Authors Anonymous
Abstract Neural Tangents is a library for working with infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite-width as usual, or in their infinite-width limit. For the infinite-width networks, Neural Tangents performs exact inference either via Bayes’ rule or gradient descent, and generates the corresponding Neural Network Gaussian Process and Neural Tangent kernels. Additionally, Neural Tangents provides tools to study gradient descent training dynamics of wide but finite networks. The entire library runs out-of-the-box on CPU, GPU, or TPU. All computations can be automatically distributed over multiple accelerators with near-linear scaling in the number of devices. In addition to the repository below, we provide an accompanying interactive Colab notebook at https://colab.sandbox.google.com/github/neural-tangents/neural-tangents/blob/master/notebooks/neural_tangents_cookbook.ipynb
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=SklD9yrFPS
PDF https://openreview.net/pdf?id=SklD9yrFPS
PWC https://paperswithcode.com/paper/neural-tangents-fast-and-easy-infinite-neural
Repo https://github.com/neural-tangents/neural-tangents
Framework jax
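
A short usage sketch of the stax API the abstract describes: build a network, get its infinite-width kernels, and make NNGP/NTK predictions. Function names follow the public repository, but may shift between library versions.

```python
# Sketch of the Neural Tangents high-level API (check the repo docs for your version).
import neural_tangents as nt
from neural_tangents import stax
import numpy as np

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x_train = np.random.randn(20, 10)
y_train = np.random.randn(20, 1)
x_test = np.random.randn(5, 10)

# Exact inference with the infinite-width kernels (Bayes' rule / gradient descent).
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_nngp = predict_fn(x_test=x_test, get='nngp')   # NNGP posterior mean
y_ntk = predict_fn(x_test=x_test, get='ntk')     # infinite-time NTK prediction
```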

HiLLoC: lossless image compression with hierarchical latent variable models

Title HiLLoC: lossless image compression with hierarchical latent variable models
Authors Anonymous
Abstract We make the following striking observation: fully convolutional VAE models trained on 32x32 ImageNet can generalize well, not just to 64x64 but also to far larger photographs, with no changes to the model. We use this property, applying fully convolutional models to lossless compression, demonstrating a method to scale the VAE-based ‘Bits-Back with ANS’ algorithm for lossless compression to large color photographs, and achieving state of the art for compression of full size ImageNet images. We release Craystack, an open source library for convenient prototyping of lossless compression using probabilistic models, along with full implementations of all of our compression results.
Tasks Image Compression, Latent Variable Models
Published 2020-01-01
URL https://openreview.net/forum?id=r1lZgyBYwS
PDF https://openreview.net/pdf?id=r1lZgyBYwS
PWC https://paperswithcode.com/paper/hilloc-lossless-image-compression-with
Repo https://github.com/hilloc-submission/hilloc
Framework tf
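
The 'Bits-Back with ANS' accounting that the method scales up can be illustrated with a toy numeric example: the expected net bits added per symbol equal the negative ELBO in bits, which lower-bounds the ideal code length -log2 p(x). The distributions below are made up purely for the arithmetic.

```python
# Toy bits-back accounting for a two-state latent variable model.
import numpy as np

p_z = np.array([0.5, 0.5])                       # prior over latent z
p_x_given_z = np.array([[0.9, 0.1],              # p(x | z), rows indexed by z
                        [0.2, 0.8]])
x = 0                                            # observed symbol
q_z_given_x = np.array([0.8, 0.2])               # inference distribution q(z | x)

# Bits-back: decode z from the stream using q (getting log2 1/q bits back),
# then encode z with the prior and x with the likelihood.
net_bits = np.sum(q_z_given_x * (np.log2(1 / p_z)
                                 + np.log2(1 / p_x_given_z[:, x])
                                 - np.log2(1 / q_z_given_x)))

ideal_bits = -np.log2(np.sum(p_z * p_x_given_z[:, x]))   # -log2 p(x)
print(net_bits, ideal_bits)   # net_bits >= ideal_bits, equal iff q = p(z|x)
```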

Self-Supervised State-Control through Intrinsic Mutual Information Rewards

Title Self-Supervised State-Control through Intrinsic Mutual Information Rewards
Authors Anonymous
Abstract Learning to discover useful skills without a manually-designed reward function would have many applications, yet is still a challenge for reinforcement learning. In this paper, we propose Mutual Information-based State-Control (MISC), a new self-supervised Reinforcement Learning approach for learning to control states of interest without any external reward function. We formulate the intrinsic objective as rewarding the skills that maximize the mutual information between the context states and the states of interest. For example, in robotic manipulation tasks, the context states are the robot states and the states of interest are the states of an object. We evaluate our approach for different simulated robotic manipulation tasks from OpenAI Gym. We show that our method is able to learn to manipulate the object, such as pushing and picking up, purely based on the intrinsic mutual information rewards. Furthermore, the pre-trained policy and mutual information discriminator can be used to accelerate learning to achieve high task rewards. Our results show that the mutual information between the context states and the states of interest can be an effective ingredient for overcoming challenges in robotic manipulation tasks with sparse rewards. A video showing experimental results is available at https://youtu.be/cLRrkd3Y7vU
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=HygSq3VFvH
PDF https://openreview.net/pdf?id=HygSq3VFvH
PWC https://paperswithcode.com/paper/self-supervised-state-control-through
Repo https://github.com/misc-project/misc
Framework none
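
Below is a hedged sketch of a discriminator-based mutual-information reward of the kind the abstract describes: a critic scores (context state, state of interest) pairs, and a per-step intrinsic reward is derived from a Jensen-Shannon-style lower bound. The estimator, network sizes and state dimensions are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: per-step intrinsic reward from a discriminator-based MI lower bound.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MIDiscriminator(nn.Module):
    def __init__(self, context_dim, interest_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim + interest_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, context, interest):
        return self.net(torch.cat([context, interest], dim=-1)).squeeze(-1)

def intrinsic_rewards(disc, context, interest):
    # Positive (joint) pairs come from the same time step; negatives are made
    # by shuffling the states of interest within the batch.
    shuffled = interest[torch.randperm(interest.size(0))]
    joint = disc(context, interest)
    marginal = disc(context, shuffled)
    # JSD-style lower bound; its per-sample positive term serves as the reward.
    mi_estimate = (-F.softplus(-joint)).mean() - F.softplus(marginal).mean()
    rewards = -F.softplus(-joint).detach()
    return rewards, mi_estimate   # maximize mi_estimate to train the critic

# Usage with hypothetical dimensions (robot state 10-d, object state 3-d):
disc = MIDiscriminator(context_dim=10, interest_dim=3)
r, mi = intrinsic_rewards(disc, torch.randn(32, 10), torch.randn(32, 3))
```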

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

Title LambdaNet: Probabilistic Type Inference using Graph Neural Networks
Authors Jiayi Wei, Maruth Goyal, Greg Durrett, Isil Dillig
Abstract As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching, these annotations cannot be fully inferred by compilers and are tedious to write by hand. This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. Our approach first uses lightweight source code analysis to generate a program abstraction called a type dependency graph, which links type variables with logical constraints as well as name and usage information. Given this program abstraction, we then use a graph neural network to propagate information between related type variables and eventually make type predictions. Our neural architecture can predict both standard types, like number or string, and user-defined types that have not been encountered during training. Our experimental results show that our approach outperforms prior work in this space by 14% (absolute) on library types, while having the ability to make type predictions that are out of scope for existing techniques.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=Hkx6hANtwH
PDF https://openreview.net/pdf?id=Hkx6hANtwH
PWC https://paperswithcode.com/paper/lambdanet-probabilistic-type-inference-using
Repo https://github.com/MrVPlusOne/LambdaNet
Framework none
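
A hedged PyTorch sketch of one message-passing round over a type dependency graph: type-variable nodes exchange information along constraint edges and a linear head scores a small fixed type vocabulary. The graph, dimensions and vocabulary are illustrative; the actual model also incorporates names, usages and open-vocabulary user-defined types.

```python
# Sketch: one GNN message-passing step over a type dependency graph.
import torch
import torch.nn as nn

class TypeGNNStep(nn.Module):
    def __init__(self, dim=64, num_types=8):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)
        self.classify = nn.Linear(dim, num_types)

    def forward(self, node_states, edges):
        # edges: list of (src, dst) constraint pairs between type variables.
        agg = torch.zeros_like(node_states)
        count = torch.zeros(node_states.size(0), 1)
        for src, dst in edges:
            agg[dst] += self.message(node_states[src])
            count[dst] += 1
        agg = agg / count.clamp(min=1)                 # mean-aggregate incoming messages
        new_states = self.update(agg, node_states)     # GRU-style node update
        return new_states, self.classify(new_states)   # logits over the type vocabulary

# Toy usage: 4 type variables, constraints forming a small cycle.
gnn = TypeGNNStep()
states = torch.randn(4, 64)
states, type_logits = gnn(states, edges=[(0, 1), (1, 2), (2, 3), (3, 0)])
```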

MUSE: Multi-Scale Attention Model for Sequence to Sequence Learning

Title MUSE: Multi-Scale Attention Model for Sequence to Sequence Learning
Authors Anonymous
Abstract Transformers have achieved state-of-the-art results on a variety of natural language processing tasks. Despite good performance, Transformers are still weak at long sentence modeling, where the global attention map is too dispersed to capture valuable information. In such cases, the local/token features that are also significant to sequence modeling are omitted to some extent. To address this problem, we propose a Multi-scale attention model (MUSE) by concatenating attention networks with convolutional networks and position-wise feed-forward networks to explicitly capture local and token features. Considering parameter size and computational efficiency, we re-use the feed-forward layer of the original Transformer and adopt a lightweight dynamic convolution as the implementation. Experimental results show that the proposed model achieves substantial performance improvements over the Transformer, especially on long sentences, and pushes the state of the art from 35.6 to 36.2 BLEU on the IWSLT 2014 German-to-English translation task and from 30.6 to 31.3 on the IWSLT 2015 English-to-Vietnamese translation task. We also reach state-of-the-art performance on the WMT 2014 English-to-French translation dataset, with a BLEU score of 43.2.
Tasks Machine Translation
Published 2020-01-01
URL https://openreview.net/forum?id=SJe-3REFwr
PDF https://openreview.net/pdf?id=SJe-3REFwr
PWC https://paperswithcode.com/paper/muse-multi-scale-attention-model-for-sequence
Repo https://github.com/lancopku/MUSE
Framework pytorch
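
A hedged PyTorch sketch of a multi-scale block in the spirit of the abstract: self-attention, a lightweight depthwise convolution and a position-wise feed-forward branch computed in parallel and merged. Branch details, the kernel size and the way branches are combined are illustrative assumptions, not the paper's exact design.

```python
# Sketch: parallel global (attention), local (conv) and token (FFN) branches.
import torch
import torch.nn as nn

class MultiScaleBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)  # depthwise
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        global_branch, _ = self.attn(x, x, x)   # global dependencies
        local_branch = self.conv(x.transpose(1, 2)).transpose(1, 2)  # local features
        token_branch = self.ffn(x)              # token-level features
        return self.norm(x + global_branch + local_branch + token_branch)

block = MultiScaleBlock()
out = block(torch.randn(2, 16, 256))            # -> (2, 16, 256)
```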

Recurrent Independent Mechanisms

Title Recurrent Independent Mechanisms
Authors Anonymous
Abstract Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes. We propose Recurrent Independent Mechanisms (RIMs), a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant. We show that this leads to specialization amongst the RIMs, which in turn allows for dramatically improved generalization on tasks where some factors of variation differ systematically between training and evaluation.
Tasks Atari Games
Published 2020-01-01
URL https://openreview.net/forum?id=BylaUTNtPS
PDF https://openreview.net/pdf?id=BylaUTNtPS
PWC https://paperswithcode.com/paper/recurrent-independent-mechanisms-1
Repo https://github.com/maximecb/gym-minigrid
Framework pytorch
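
A simplified PyTorch sketch of the mechanism: several small recurrent modules score their relevance to the current input, only the top-k are updated at each step, and the rest carry their state forward unchanged. The relevance score is reduced to a dot product here, and the communication attention between active modules is omitted; sizes are illustrative.

```python
# Sketch: sparsely activated recurrent modules with top-k input attention.
import torch
import torch.nn as nn

class TinyRIMs(nn.Module):
    def __init__(self, n_rims=6, k_active=3, input_dim=16, hidden_dim=32):
        super().__init__()
        self.k = k_active
        self.query = nn.Linear(hidden_dim, input_dim)
        self.cells = nn.ModuleList(nn.GRUCell(input_dim, hidden_dim)
                                   for _ in range(n_rims))

    def forward(self, x, hidden):
        # x: (input_dim,)  hidden: (n_rims, hidden_dim)
        scores = self.query(hidden) @ x                 # relevance of x to each RIM
        active = torch.topk(scores, self.k).indices     # sparse activation
        new_hidden = hidden.clone()                     # inactive RIMs keep their state
        for i in active.tolist():
            new_hidden[i] = self.cells[i](x.unsqueeze(0), hidden[i].unsqueeze(0))[0]
        return new_hidden

rims = TinyRIMs()
h = torch.zeros(6, 32)
for t in range(5):                                      # unroll over a toy sequence
    h = rims(torch.randn(16), h)
```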

Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification

Title Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification
Authors Anonymous
Abstract Person re-identification (re-ID) aims at identifying the same persons’ images across different cameras. However, domain diversity between different datasets poses an evident challenge for adapting a re-ID model trained on one dataset to another. State-of-the-art unsupervised domain adaptation methods for person re-ID transfer the learned knowledge from the source domain by optimizing with pseudo labels created by clustering algorithms on the target domain. Although they achieve state-of-the-art performance, they ignore the inevitable label noise caused by the clustering procedure. Such noisy pseudo labels substantially hinder the model’s capability to further improve feature representations on the target domain. To mitigate the effects of noisy pseudo labels, we propose an unsupervised framework, Mutual Mean-Teaching (MMT), which softly refines the pseudo labels in the target domain and learns better features via off-line refined hard pseudo labels and on-line refined soft pseudo labels in an alternating training manner. In addition, the common practice is to adopt both the classification loss and the triplet loss jointly for achieving optimal performance in person re-ID models. However, the conventional triplet loss cannot work with softly refined labels. To solve this problem, a novel soft softmax-triplet loss is proposed to support learning with soft pseudo triplet labels for achieving optimal domain adaptation performance. The proposed MMT framework achieves considerable improvements of 14.4%, 18.2%, 13.1% and 16.4% mAP on the Market-to-Duke, Duke-to-Market, Market-to-MSMT and Duke-to-MSMT unsupervised domain adaptation tasks.
Tasks Domain Adaptation, Person Re-Identification, Unsupervised Domain Adaptation
Published 2020-01-01
URL https://openreview.net/forum?id=rJlnOhVYPS
PDF https://openreview.net/pdf?id=rJlnOhVYPS
PWC https://paperswithcode.com/paper/mutual-mean-teaching-pseudo-label-refinery
Repo https://github.com/yxgeee/MMT
Framework pytorch
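
Two ingredients of the framework are easy to sketch: an exponential-moving-average "mean teacher" and a soft pseudo-label classification loss in which one network is supervised by its peer's temporally averaged predictions. The PyTorch sketch below keeps a single student/teacher pair and omits the soft softmax-triplet loss; the momentum value is illustrative.

```python
# Sketch: EMA mean teacher and soft pseudo-label cross-entropy.
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_mean_teacher(student, teacher, momentum=0.999):
    # Teacher parameters are an exponential moving average of the student's.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

def soft_pseudo_label_loss(student_logits, teacher_logits):
    # Cross-entropy of the student against the teacher's soft (on-line refined)
    # pseudo labels, instead of relying on noisy hard cluster labels alone.
    soft_targets = F.softmax(teacher_logits.detach(), dim=1)
    return -(soft_targets * F.log_softmax(student_logits, dim=1)).sum(1).mean()
```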

AHash: A Load-Balanced One Permutation Hash

Title AHash: A Load-Balanced One Permutation Hash
Authors Anonymous
Abstract Minwise Hashing (MinHash) is a fundamental method to compute set similarities and compact high-dimensional data for efficient learning and searching. The bottleneck of MinHash is computing k (usually hundreds of) MinHash values. One Permutation Hashing (OPH) requires only one permutation (hash function) to get k MinHash values by dividing elements into k bins. One drawback of OPH is that the load of the bins (the number of elements in a bin) can be unbalanced, which leads to empty bins and false similarity computation. Several strategies for densification, that is, filling empty bins, have been proposed. However, densification is only a remedial strategy and cannot eliminate the error incurred by the unbalanced load. Unlike densification, which fills empty bins after they undesirably occur, our design goal is to balance the load so as to reduce empty bins in advance. In this paper, we propose a load-balanced hashing, Amortization Hashing (AHash), which generates as few empty bins as possible. As a result, AHash is more load-balanced and accurate than OPH and densification strategies, without hurting runtime efficiency. Our experiments on real datasets validate this claim. All source code and datasets have been provided as Supplementary Materials and released on GitHub anonymously.
Tasks
Published 2020-01-01
URL https://openreview.net/forum?id=rJe9fTNtPS
PDF https://openreview.net/pdf?id=rJe9fTNtPS
PWC https://paperswithcode.com/paper/ahash-a-load-balanced-one-permutation-hash
Repo https://github.com/AHashCodes/AHash
Framework none
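
For context, here is a plain-Python sketch of One Permutation Hashing, the scheme AHash improves on: a single hash function scatters elements into k bins and the per-bin minima form the signature. The empty bins (None entries) are exactly what densification, and AHash's load balancing, try to avoid. The bin count and toy sets are illustrative.

```python
# Sketch: One Permutation Hashing (OPH) signatures with visible empty bins.
import hashlib

def oph_signature(elements, k=8, hash_range=2**32):
    def h(x):
        return int(hashlib.md5(str(x).encode()).hexdigest(), 16) % hash_range

    bin_width = hash_range // k
    signature = [None] * k
    for e in elements:
        value = h(e)
        b = min(value // bin_width, k - 1)        # which bin this element falls into
        if signature[b] is None or value < signature[b]:
            signature[b] = value                  # keep the minimum hash per bin
    return signature                              # None entries are empty bins

# Similarity is estimated from the fraction of matching, non-empty bins.
sig_a = oph_signature({"apple", "pear", "plum", "kiwi"})
sig_b = oph_signature({"apple", "pear", "plum", "fig"})
matches = sum(1 for a, b in zip(sig_a, sig_b) if a is not None and a == b)
```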

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

Title WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
Authors Anonymous
Abstract We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects and low-resource languages. We do not limit the extraction process to alignments with English, but systematically consider all possible language pairs. In total, we are able to extract 135M parallel sentences for 1620 different language pairs, out of which only 34M are aligned with English. This corpus of parallel sentences is freely available (URL anonymized). To get an indication of the quality of the extracted bitexts, we train neural MT baseline systems on the mined data only for 1886 language pairs, and evaluate them on the TED corpus, achieving strong BLEU scores for many language pairs. The WikiMatrix bitexts seem to be particularly interesting for training MT systems between distant languages without the need to pivot through English.
Tasks Sentence Embeddings
Published 2020-01-01
URL https://openreview.net/forum?id=rkeYL1SFvH
PDF https://openreview.net/pdf?id=rkeYL1SFvH
PWC https://paperswithcode.com/paper/wikimatrix-mining-135m-parallel-sentences-in-1
Repo https://github.com/facebookresearch/LASER
Framework pytorch
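
A hedged numpy sketch of margin-based scoring over multilingual sentence embeddings, the standard mining criterion used with LASER-style encoders: each candidate pair's cosine similarity is normalized by the average similarity to the two sentences' nearest neighbours. The embeddings, neighbourhood size and matching step below are illustrative.

```python
# Sketch: margin-based candidate scoring for parallel sentence mining.
import numpy as np

def normalize(m):
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def margin_scores(src_emb, tgt_emb, k=4):
    src, tgt = normalize(src_emb), normalize(tgt_emb)
    cos = src @ tgt.T                                    # all pairwise cosines
    # Average similarity of each sentence to its k nearest neighbours on the other side.
    knn_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)  # per source sentence
    knn_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)  # per target sentence
    return cos / (0.5 * (knn_src[:, None] + knn_tgt[None, :]))

# Toy usage: 100 "English" and 120 "German" sentence embeddings of dimension 64.
scores = margin_scores(np.random.randn(100, 64), np.random.randn(120, 64))
best_match = scores.argmax(axis=1)   # candidate target sentence for each source
```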

Network Embedding with Completely-imbalanced Labels

Title Network Embedding with Completely-imbalanced Labels
Authors Zheng Wang, Xiaojun Ye, Chaokun Wang, Jian Cui, Philip S. Yu
Abstract Network embedding, aiming to project a network into a low-dimensional space, is increasingly becoming a focus of network research. Semi-supervised network embedding takes advantage of labeled data, and has shown promising performance. However, existing semi-supervised methods would get unappealing results in the completely-imbalanced label setting where some classes have no labeled nodes at all. To alleviate this, we propose two novel semi-supervised network embedding methods. The first one is a shallow method named RSDNE. Specifically, to benefit from the completely-imbalanced labels, RSDNE guarantees both intra-class similarity and inter-class dissimilarity in an approximate way. The other method is RECT which is a new class of graph neural networks. Different from RSDNE, to benefit from the completely-imbalanced labels, RECT explores the knowledge of class-semantic descriptions. This enables RECT to handle networks with node features and multi-label setting. Experimental results on several real-world datasets demonstrate the superiority of the proposed methods.
Tasks Network Embedding
Published 2020-02-03
URL https://ieeexplore.ieee.xilesou.top/abstract/document/8979355
PDF https://ieeexplore.ieee.xilesou.top/abstract/document/8979355
PWC https://paperswithcode.com/paper/network-embedding-with-completely-imbalanced
Repo https://github.com/zhengwang100/RECT
Framework pytorch