Paper Group AWR 229
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning. Unsupervised Learning of Landmarks by Descriptor Vector Exchange. Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian. Katecheo: A Portable and Modular System for Multi-Topic Question Answering. Addressing Sample Complexity in …
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Title | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning |
Authors | Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo |
Abstract | In sequence to sequence learning, the self-attention mechanism proves to be highly effective, and achieves significant improvements in many tasks. However, the self-attention mechanism is not without its own flaws. Although self-attention can model extremely long dependencies, the attention in deep layers tends to overconcentrate on a single token, leading to insufficient use of local information and difficulty in representing long sequences. In this work, we explore parallel multi-scale representation learning on sequence data, striving to capture both long-range and short-range language structures. To this end, we propose the Parallel MUlti-Scale attEntion (MUSE) and MUSE-simple. MUSE-simple contains the basic idea of parallel multi-scale sequence representation learning: it encodes the sequence in parallel at different scales with the help of self-attention and pointwise transformation. MUSE builds on MUSE-simple and explores combining convolution and self-attention to learn sequence representations from a broader range of scales. We focus on machine translation, where the proposed approach achieves substantial performance improvements over Transformer, especially on long sequences. More importantly, we find that although conceptually simple, its success in practice requires intricate considerations, and the multi-scale attention must build on a unified semantic space. Under common settings, the proposed model achieves substantial performance gains and outperforms all previous models on three main machine translation tasks. In addition, MUSE has potential for accelerating inference due to its parallelism. Code will be available at https://github.com/lancopku/MUSE |
Tasks | Machine Translation, Representation Learning |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.09483v1 |
PDF | https://arxiv.org/pdf/1911.09483v1.pdf |
PWC | https://paperswithcode.com/paper/muse-parallel-multi-scale-attention-for |
Repo | https://github.com/lancopku/MUSE |
Framework | pytorch |
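A minimal PyTorch sketch of the parallel multi-scale idea the MUSE abstract describes, written from the abstract alone (module names and sizes are illustrative, not the authors' implementation): a self-attention branch, a depthwise convolution branch, and a pointwise feed-forward branch run in parallel on a shared projection, so all scales operate in one semantic space.

```python
import torch
import torch.nn as nn

class ParallelMultiScaleBlock(nn.Module):
    """Hypothetical MUSE-style block: three scales computed in parallel."""
    def __init__(self, d_model=512, n_heads=8, kernel_size=3):
        super().__init__()
        # Shared projection keeps all branches in one semantic space,
        # the property the abstract reports as critical.
        self.shared = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        h = self.shared(x)
        global_out, _ = self.attn(h, h, h)   # long-range scale
        local_out = self.conv(h.transpose(1, 2)).transpose(1, 2)  # short-range
        point_out = self.ffn(h)              # token-wise scale
        return self.norm(x + global_out + local_out + point_out)
```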
Unsupervised Learning of Landmarks by Descriptor Vector Exchange
Title | Unsupervised Learning of Landmarks by Descriptor Vector Exchange |
Authors | James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi |
Abstract | Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision. However, this method does not explicitly guarantee that the learned landmarks are consistent with changes between different instances of the same object, such as different facial identities. In this paper, we develop a new perspective on the equivariance approach by noting that dense landmark detectors can be interpreted as local image descriptors equipped with invariance to intra-category variations. We then propose a direct method to enforce such an invariance in the standard equivariant loss. We do so by exchanging descriptor vectors between images of different object instances prior to matching them geometrically. In this manner, the same vectors must work regardless of the specific object identity considered. We use this approach to learn vectors that can simultaneously be interpreted as local descriptors and dense landmarks, combining the advantages of both. Experiments on standard benchmarks show that this approach can match, and in some cases surpass, state-of-the-art performance amongst existing methods that learn landmarks without supervision. Code is available at www.robots.ox.ac.uk/~vgg/research/DVE/. |
Tasks | Facial Landmark Detection, Unsupervised Facial Landmark Detection |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06427v1 |
PDF | https://arxiv.org/pdf/1908.06427v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-landmarks-by |
Repo | https://github.com/jamt9000/DVE |
Framework | pytorch |
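One way to read the descriptor-exchange step, sketched from the abstract rather than the released DVE code: descriptors for one image are rebuilt as convex combinations of another instance's descriptors before the equivariance matching, so identity-specific information cannot survive the exchange.

```python
import torch
import torch.nn.functional as F

def exchange_descriptors(feat_a, feat_b, temperature=0.1):
    """feat_a, feat_b: (num_pixels, dim) L2-normalized dense descriptors
    from two different instances of the same object category."""
    sim = feat_a @ feat_b.t() / temperature   # cross-instance similarity
    weights = F.softmax(sim, dim=1)           # soft correspondence a -> b
    return weights @ feat_b                   # a's descriptors, rewritten
                                              # using instance b's vectors

# The exchanged descriptors then stand in for feat_a in the usual
# equivariance loss against a randomly warped copy of image a.
```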
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian
Title | Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian |
Authors | Momchil Hardalov, Ivan Koychev, Preslav Nakov |
Abstract | Recently, reading comprehension models achieved near-human performance on large-scale datasets such as SQuAD, CoQA, MS MARCO, RACE, etc. This is largely due to the release of pre-trained contextualized representations such as BERT and ELMo, which can be fine-tuned for the target task. Despite those advances and the creation of more challenging datasets, most of the work is still done for English. Here, we study the effectiveness of multilingual BERT fine-tuned on large-scale English datasets for reading comprehension (e.g., for RACE), and we apply it to Bulgarian multiple-choice reading comprehension. We propose a new dataset containing 2,221 questions from matriculation exams for twelfth grade in various subjects (history, biology, geography, and philosophy), and 412 additional questions from online quizzes in history. While the quiz authors gave no relevant context, we incorporate knowledge from Wikipedia, retrieving documents matching the combination of question + each answer option. Moreover, we experiment with different indexing and pre-training strategies. The evaluation results show an accuracy of 42.23%, which is well above the baseline of 24.89%. |
Tasks | Reading Comprehension |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01519v2 |
PDF | https://arxiv.org/pdf/1908.01519v2.pdf |
PWC | https://paperswithcode.com/paper/beyond-english-only-reading-comprehension |
Repo | https://github.com/mhardalov/bg-reason-BERT |
Framework | none |
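A hedged sketch of the retrieval step the abstract describes: each answer option is scored using documents retrieved for the combination "question + option". Here `search_wikipedia` and `score_with_mbert` are hypothetical stand-ins for the authors' index and fine-tuned multilingual BERT reader.

```python
def answer_question(question, options, search_wikipedia, score_with_mbert):
    """Pick the option whose retrieved context the reader scores highest."""
    scores = []
    for option in options:
        # Retrieve context by matching question + answer option, as the
        # quiz questions themselves come with no relevant passage.
        docs = search_wikipedia(f"{question} {option}", top_k=3)
        context = " ".join(docs)
        scores.append(score_with_mbert(context, question, option))
    best = max(range(len(options)), key=lambda i: scores[i])
    return options[best]
```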
Katecheo: A Portable and Modular System for Multi-Topic Question Answering
Title | Katecheo: A Portable and Modular System for Multi-Topic Question Answering |
Authors | Shirish Hirekodi, Seban Sunny, Leonard Topno, Alwin Daniel, Daniel Whitenack, Reuben Skewes, Stuart Cranney |
Abstract | We introduce a modular system that can be deployed on any Kubernetes cluster for question answering via REST API. This system, called Katecheo, includes three configurable modules that collectively enable identification of questions, classification of those questions into topics, document search, and reading comprehension. We demonstrate the system using publicly available knowledge base articles extracted from Stack Exchange sites. However, users can extend the system to any number of topics, or domains, without the need to modify any of the model serving code or train their own models. All components of the system are open source and available under a permissive Apache 2 License. |
Tasks | Question Answering, Reading Comprehension |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00854v2 |
PDF | https://arxiv.org/pdf/1907.00854v2.pdf |
PWC | https://paperswithcode.com/paper/katecheo-a-portable-and-modular-system-for |
Repo | https://github.com/cvdigitalai/katecheo |
Framework | none |
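A rough sketch of the staged flow the Katecheo abstract describes (question identification, topic classification, then search plus reading comprehension); the stage functions are hypothetical placeholders, not Katecheo's actual REST API.

```python
def katecheo_pipeline(text, is_question, classify_topic, knowledge_bases,
                      search, read_comprehension):
    """Route a user query through the modular QA stages; returns an answer
    string, or None when the input is out of scope."""
    if not is_question(text):                # stage 1: question identification
        return None
    topic = classify_topic(text)             # stage 2: topic classification
    if topic not in knowledge_bases:
        return None                          # no knowledge base for this topic
    article = search(knowledge_bases[topic], text)    # stage 3a: doc search
    return read_comprehension(article, text)          # stage 3b: reading comp.
```

Because each stage is injected as a plain callable, extending the system to a new topic only means registering another knowledge base, which mirrors the abstract's claim that no model-serving code needs to change.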
Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs
Title | Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs |
Authors | Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin |
Abstract | Reinforcement Learning (RL) algorithms typically require millions of environment interactions to learn successful policies in sparse reward settings. Hindsight Experience Replay (HER) was introduced as a technique to increase sample efficiency by reimagining unsuccessful trajectories as successful ones by altering the originally intended goals. However, it cannot be directly applied to visual environments where goal states are often characterized by the presence of distinct visual features. In this work, we show how visual trajectories can be hallucinated to appear successful by altering agent observations using a generative model trained on relatively few snapshots of the goal. We then use this model in combination with HER to train RL agents in visual settings. We validate our approach on 3D navigation tasks and a simulated robotics application and show marked improvement over baselines derived from previous work. |
Tasks | |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11529v2 |
PDF | https://arxiv.org/pdf/1901.11529v2.pdf |
PWC | https://paperswithcode.com/paper/visual-hindsight-experience-replay |
Repo | https://github.com/maximecb/gym-miniworld |
Framework | pytorch |
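One possible reading of the hallucinated-HER idea, sketched from the abstract with hypothetical `hallucinate` and `replay_buffer` interfaces: a generative model trained on a few goal snapshots repaints the observations of a failed trajectory so the goal appears achieved, and the relabeled transitions are stored as a success.

```python
def hallucinated_her(trajectory, replay_buffer, hallucinate):
    """trajectory: list of (obs, action, next_obs) from a failed episode."""
    for t, (obs, action, next_obs) in enumerate(trajectory):
        # The generative model alters the raw pixels so the goal's visual
        # features appear present, as if the episode had succeeded.
        fake_obs, fake_next = hallucinate(obs), hallucinate(next_obs)
        reward = 1.0 if t == len(trajectory) - 1 else 0.0  # success at the end
        replay_buffer.add(fake_obs, action, reward, fake_next)
```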
ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language
Title | ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language |
Authors | Willie Neiswanger, Kirthevasan Kandasamy, Barnabas Poczos, Jeff Schneider, Eric Xing |
Abstract | Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs), or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that could be used to capture complex systems and reduce the number of queries in BO. Probabilistic programming languages (PPLs) are modern tools that allow for flexible model definition, prior specification, model composition, and automatic inference. In this paper, we develop ProBO, a BO procedure that uses only standard operations common to most PPLs. This allows a user to drop in a model built with an arbitrary PPL and use it directly in BO. We describe acquisition functions for ProBO, and strategies for efficiently optimizing these functions given complex models or costly inference procedures. Using existing PPLs, we implement new models to aid in a few challenging optimization settings, and demonstrate these on model hyperparameter and architecture search tasks. |
Tasks | Gaussian Processes, Probabilistic Programming |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11515v2 |
PDF | https://arxiv.org/pdf/1901.11515v2.pdf |
PWC | https://paperswithcode.com/paper/probo-a-framework-for-using-probabilistic |
Repo | https://github.com/willieneis/ProBO |
Framework | none |
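A minimal sketch of the PPL-agnostic idea, illustrated with Thompson sampling: the optimization loop touches the surrogate only through generic inference and posterior-sampling operations, so any PPL model exposing that interface could be dropped in. The `model` interface below is an assumption for illustration, not ProBO's actual API.

```python
import numpy as np

def probo_step(model, X, y, candidate_xs):
    """One Bayesian-optimization step using only generic PPL operations."""
    model.infer(X, y)                      # any PPL's inference routine
    f = model.sample_function()            # one posterior draw of f
    scores = [f(x) for x in candidate_xs]  # Thompson-sampling acquisition
    return candidate_xs[int(np.argmin(scores))]   # next point to query
```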
Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
Title | Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm |
Authors | Giulia Luise, Saverio Salzo, Massimiliano Pontil, Carlo Ciliberto |
Abstract | We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has Lipschitz continuous gradient with respect to the Total Variation and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13194v1 |
PDF | https://arxiv.org/pdf/1905.13194v1.pdf |
PWC | https://paperswithcode.com/paper/sinkhorn-barycenters-with-free-support-via |
Repo | https://github.com/GiulsLu/Sinkhorn-Barycenters |
Framework | pytorch |
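A schematic Frank-Wolfe loop matching the abstract's description, where the support is populated one atom at a time; `grad_sinkhorn` is a hypothetical oracle returning the gradient of the Sinkhorn-divergence objective as a function over candidate points, which in practice requires running Sinkhorn iterations.

```python
import numpy as np

def fw_sinkhorn_barycenter(grad_sinkhorn, domain_points, n_iters=100):
    """Incrementally build the barycenter's support, no pre-allocation."""
    support, weights = [domain_points[0]], np.array([1.0])
    for k in range(1, n_iters):
        g = grad_sinkhorn(support, weights)
        # Linear minimization oracle: the new atom minimizes the gradient.
        new_atom = min(domain_points, key=g)
        gamma = 2.0 / (k + 2.0)            # standard Frank-Wolfe step size
        support.append(new_atom)
        weights = np.append((1.0 - gamma) * weights, gamma)
    return support, weights
```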
ParticleNet: Jet Tagging via Particle Clouds
Title | ParticleNet: Jet Tagging via Particle Clouds |
Authors | Huilin Qu, Loukas Gouskos |
Abstract | How to represent a jet is at the core of machine learning on jet physics. Inspired by the notion of point clouds, we propose a new approach that considers a jet as an unordered set of its constituent particles, effectively a “particle cloud”. Such a particle cloud representation of jets is efficient in incorporating raw information of jets and also explicitly respects the permutation symmetry. Based on the particle cloud representation, we propose ParticleNet, a customized neural network architecture using Dynamic Graph Convolutional Neural Network for jet tagging problems. The ParticleNet architecture achieves state-of-the-art performance on two representative jet tagging benchmarks and improves significantly over existing methods. |
Tasks | |
Published | 2019-02-22 |
URL | https://arxiv.org/abs/1902.08570v3 |
PDF | https://arxiv.org/pdf/1902.08570v3.pdf |
PWC | https://paperswithcode.com/paper/particlenet-jet-tagging-via-particle-clouds |
Repo | https://github.com/hqucms/ParticleNet |
Framework | tf |
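ParticleNet builds on the EdgeConv operation from Dynamic Graph CNN; the following is a bare-bones PyTorch sketch of that building block written from the abstract, not the released TensorFlow implementation.

```python
import torch

def edge_conv(points, features, mlp, k=16):
    """points: (N, d) particle coordinates; features: (N, c) per-particle
    features; mlp: callable mapping (N, k, 2c) -> (N, k, c')."""
    dist = torch.cdist(points, points)                # pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # kNN, skip self
    neighbors = features[idx]                         # (N, k, c)
    center = features.unsqueeze(1).expand_as(neighbors)
    # Edge features: the center plus its offset to each neighbor, so the
    # operation is symmetric under permutations of the particle set.
    edge_feat = torch.cat([center, neighbors - center], dim=-1)
    return mlp(edge_feat).mean(dim=1)                 # aggregate neighbors
```

Stacking several such layers, with the graph recomputed in feature space each time, is what makes the graph "dynamic".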
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation
Title | RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation |
Authors | Bastian Wandt, Bodo Rosenhahn |
Abstract | This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by minimizing a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied, and they are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids simple memorization of the training data and allows for weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D, which results in a reprojection loss function. Our experiments show that RepNet generalizes well and outperforms state-of-the-art methods on unseen data. Moreover, our implementation runs in real-time on a standard desktop PC. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09868v2 |
PDF | http://arxiv.org/pdf/1902.09868v2.pdf |
PWC | https://paperswithcode.com/paper/repnet-weakly-supervised-training-of-an |
Repo | https://github.com/bastianwandt/RepNet |
Framework | tf |
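A sketch of the reprojection layer the abstract describes, with illustrative tensor shapes (a weak-perspective 2x3 camera per sample is an assumption on my part): the estimated camera maps the predicted 3D pose back to 2D, and the loss compares against the observed 2D keypoints.

```python
import torch

def reprojection_loss(pose_3d, camera, pose_2d):
    """pose_3d: (B, 3, J) predicted joints; camera: (B, 2, 3) estimated
    projection; pose_2d: (B, 2, J) observed 2D keypoints."""
    reprojected = torch.bmm(camera, pose_3d)          # project back to 2D
    return torch.norm(reprojected - pose_2d, dim=(1, 2)).mean()
```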
Fast and Robust Spectrally Sparse Signal Recovery: A Provable Non-Convex Approach via Robust Low-Rank Hankel Matrix Reconstruction
Title | Fast and Robust Spectrally Sparse Signal Recovery: A Provable Non-Convex Approach via Robust Low-Rank Hankel Matrix Reconstruction |
Authors | HanQin Cai, Jian-Feng Cai, Tianming Wang, Guojian Yin |
Abstract | Consider a spectrally sparse signal $\boldsymbol{x}$ that consists of $r$ complex sinusoids with or without damping. We study the robust recovery problem for the spectrally sparse signal under the fully observed setting, which is about recovering $\boldsymbol{x}$ and a sparse corruption vector $\boldsymbol{s}$ from their sum $\boldsymbol{z}=\boldsymbol{x}+\boldsymbol{s}$. In this paper, we exploit the low-rank property of the Hankel matrix constructed from $\boldsymbol{x}$, and develop an efficient non-convex algorithm, coined Accelerated Alternating Projections for Robust Low-Rank Hankel Matrix Reconstruction (AAP-Hankel). The high computational efficiency and low space complexity of AAP-Hankel are achieved by fast computations involving structured matrices, and a subspace projection method for accelerated low-rank approximation. Theoretical recovery guarantee with a linear convergence rate has been established for AAP-Hankel. Empirical performance comparisons on synthetic and real-world datasets demonstrate the computational advantages of AAP-Hankel, in both efficiency and robustness aspects. |
Tasks | |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05859v1 |
PDF | https://arxiv.org/pdf/1910.05859v1.pdf |
PWC | https://paperswithcode.com/paper/fast-and-robust-spectrally-sparse-signal |
Repo | https://github.com/caesarcai/AAP-Hankel |
Framework | none |
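A highly simplified alternating-projections sketch of the AAP-Hankel idea, without the paper's fast structured-matrix computations, subspace acceleration, or decaying thresholds (the fixed threshold here is an illustrative assumption): alternate a rank-r projection of the Hankel lift of the current signal estimate with hard-thresholding of the corruption estimate.

```python
import numpy as np
from scipy.linalg import hankel, svd

def aap_hankel_sketch(z, rank, n_iters=50, thresh=0.1):
    """z: observed signal (spectrally sparse part plus sparse corruption)."""
    n = len(z)
    x = np.zeros(n, dtype=complex)
    for _ in range(n_iters):
        s = np.where(np.abs(z - x) > thresh, z - x, 0)     # sparse part
        v = z - s
        H = hankel(v[: n // 2 + 1], v[n // 2:])            # Hankel lift
        U, sig, Vh = svd(H, full_matrices=False)
        Hr = (U[:, :rank] * sig[:rank]) @ Vh[:rank]        # rank-r projection
        # De-Hankelize: average each anti-diagonal back into a signal entry.
        x = np.array([np.diag(Hr[:, ::-1], k).mean()
                      for k in range(Hr.shape[1] - 1, -Hr.shape[0], -1)])
    return x, s
```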
Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data
Title | Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data |
Authors | Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall |
Abstract | Action recognition has so far mainly focused on classifying hand-selected, pre-clipped actions, reaching impressive results in this field. But with performance plateauing on current datasets, it also appears that the next steps in the field will have to go beyond this fully supervised classification. One way to overcome these problems is to move towards less restricted scenarios. In this context we present a large-scale real-world dataset designed to evaluate learning techniques for human action recognition beyond hand-crafted datasets. To this end we rebuild the data collection process from the ground up, starting with the annotation of a test set of 250 cooking videos. The training data is then gathered by searching for the respective annotated classes within the subtitles of freely available videos. The uniqueness of the dataset lies in the fact that the whole process of collecting the data and training does not involve any human intervention. To address the problem of semantic inconsistencies that arise with this kind of training data, we further propose a semantic hierarchy over the mined classes. |
Tasks | Temporal Action Localization |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01012v1 |
PDF | https://arxiv.org/pdf/1906.01012v1.pdf |
PWC | https://paperswithcode.com/paper/mining-youtube-a-dataset-for-learning-fine |
Repo | https://github.com/hildekuehne/Weak_YouTube_dataset |
Framework | none |
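A toy sketch of the mining step from the abstract: search the subtitles of freely available videos for the annotated class names to harvest weakly labelled training clips. The subtitle data layout and field names are illustrative only.

```python
def mine_training_clips(videos, class_names):
    """videos: iterable of dicts with 'id' and timed 'subtitles' entries."""
    clips = []
    for video in videos:
        for (start, end, text) in video["subtitles"]:
            for cls in class_names:
                if cls.lower() in text.lower():   # class mentioned in speech
                    clips.append({"video": video["id"], "start": start,
                                  "end": end, "label": cls})
    return clips
```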
Just Jump: Dynamic Neighborhood Aggregation in Graph Neural Networks
Title | Just Jump: Dynamic Neighborhood Aggregation in Graph Neural Networks |
Authors | Matthias Fey |
Abstract | We propose a dynamic neighborhood aggregation (DNA) procedure guided by (multi-head) attention for representation learning on graphs. In contrast to current graph neural networks which follow a simple neighborhood aggregation scheme, our DNA procedure allows for a selective and node-adaptive aggregation of neighboring embeddings of potentially differing locality. In order to avoid overfitting, we propose to control the channel-wise connections between input and output by making use of grouped linear projections. In a number of transductive node-classification experiments, we demonstrate the effectiveness of our approach. |
Tasks | Node Classification, Representation Learning |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04849v2 |
PDF | http://arxiv.org/pdf/1904.04849v2.pdf |
PWC | https://paperswithcode.com/paper/just-jump-dynamic-neighborhood-aggregation-in |
Repo | https://github.com/rusty1s/pytorch_geometric |
Framework | pytorch |
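The DNA operator ships with the linked PyTorch Geometric repository as `DNAConv`; a hedged usage sketch follows (check the current docs, as the exact signature may differ across versions).

```python
import torch
from torch_geometric.nn import DNAConv

conv = DNAConv(channels=64, heads=4, groups=8, dropout=0.1)
# x_all stacks every previous layer's node embeddings, so each node can
# attend over its own history of representations of differing locality:
# shape (num_nodes, num_layers, channels).
x_all = torch.randn(100, 3, 64)
edge_index = torch.randint(0, 100, (2, 400))   # random graph for illustration
out = conv(x_all, edge_index)                  # (num_nodes, channels)
```

The `groups` argument implements the grouped linear projections the abstract proposes for controlling channel-wise connections and curbing overfitting.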
JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds
Title | JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds |
Authors | Lin Zhao, Wenbing Tao |
Abstract | In this paper, we propose a novel joint instance and semantic segmentation approach, called JSNet, to address the instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the different layer features of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into instance embedding space, and the transformed features are then further fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying simple mean-shift clustering on the instance embeddings. We evaluate the proposed JSNet on a large-scale 3D indoor point cloud dataset, S3DIS, and a part dataset, ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet. |
Tasks | 3D Instance Segmentation, Instance Segmentation, Semantic Segmentation |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09654v1 |
PDF | https://arxiv.org/pdf/1912.09654v1.pdf |
PWC | https://paperswithcode.com/paper/191209654 |
Repo | https://github.com/dlinzhao/JSNet |
Framework | tf |
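The final step of the abstract, sketched with scikit-learn rather than the authors' code: instance masks come from mean-shift clustering of the per-point instance embeddings (the embedding dimension and bandwidth below are illustrative).

```python
import numpy as np
from sklearn.cluster import MeanShift

embeddings = np.random.randn(2048, 5)        # (num_points, embedding_dim)
instance_ids = MeanShift(bandwidth=0.6).fit_predict(embeddings)
# Each cluster of points is one predicted object instance; its semantic
# label can be taken as the majority semantic prediction over its points.
```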
Training-Free Artificial Neural Networks
Title | Training-Free Artificial Neural Networks |
Authors | Nikolaos P. Bakas, Savvas Chatzichristofis |
Abstract | This paper presents a numerical scheme for the computation of Artificial Neural Networks’ weights, without a laborious iterative procedure. The proposed algorithm adheres to the underlying theory, is highly fast, and results in remarkably low errors when applied to regression and classification of complex datasets, such as the Griewank function of multiple variables $\mathbf{x} \in \mathbb{R}^{100}$ with random noise addition, and the MNIST database for handwritten digit recognition, with $7\times10^4$ images. Interestingly, the same mathematical formulation proves capable of approximating highly nonlinear functions in multiple dimensions, with low errors (e.g. $10^{-10}$) on the test set of the unknown functions and their higher-order partial derivatives, as well as of numerically solving Partial Differential Equations. The method is based on the calculation of the weights of each neuron in small neighborhoods of data, such that the corresponding local approximation matrix is invertible. Accordingly, hyperparameter optimization is not necessary, as the number of neurons stems directly from the dimensions of the data, further improving the algorithmic speed. Overfitting is inherently eliminated, and the results are interpretable and reproducible. The complexity of the proposed algorithm is of class P, with $\mathcal{O}(mn^3)$ computing time, which is linear in the number of observations and cubic in the number of features, in contrast with the NP-complete class of standard algorithms for training ANNs. The performance of the method is high for small as well as big datasets, and the test-set errors are similar to or smaller than the training errors, indicating generalization efficiency. The supplementary computer code, in the Julia language, reproduces the validation examples and can run on other datasets. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13563v2 |
PDF | https://arxiv.org/pdf/1909.13563v2.pdf |
PWC | https://paperswithcode.com/paper/training-free-artificial-neural-networks |
Repo | https://github.com/nbakas/ANNBN.jl |
Framework | none |
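A sketch of the general flavour, based on my reading of the abstract rather than the ANNBN.jl code: each neuron's weights come from one exact linear solve over a small, invertible local system, so no iterative training is involved.

```python
import numpy as np

def fit_local_neuron(X_local, y_local):
    """Solve exactly for one affine neuron on an invertible local system."""
    A = np.hstack([X_local, np.ones((len(X_local), 1))])  # affine features
    # With d + 1 points in d dimensions, A is square and (generically)
    # invertible, so the weights come from one linear solve, no training.
    return np.linalg.solve(A, y_local)

d = 3
X_local = np.random.randn(d + 1, d)       # one small neighborhood of data
y_local = np.random.randn(d + 1)
w = fit_local_neuron(X_local, y_local)    # weights w[:-1], bias w[-1]
```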
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
Title | EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing |
Authors | Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung |
Abstract | We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans might perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences. |
Tasks | Machine Translation, Text Simplification |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08104v1 |
PDF | https://arxiv.org/pdf/1906.08104v1.pdf |
PWC | https://paperswithcode.com/paper/editnts-an-neural-programmer-interpreter |
Repo | https://github.com/yuedongP/EditNTS |
Framework | pytorch |
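A sketch of the interpreter half of the abstract's programmer-interpreter pairing: the model emits explicit ADD/DELETE/KEEP operations, and a simple interpreter replays them over the complex sentence to produce the simplified one. The tuple encoding of operations below is an illustrative choice, not EditNTS's actual data format.

```python
def apply_edits(tokens, edits):
    """edits: list of ('KEEP',), ('DELETE',), or ('ADD', word) operations."""
    out, i = [], 0
    for op in edits:
        if op[0] == 'ADD':
            out.append(op[1])            # insert a word, don't consume input
        elif op[0] == 'KEEP':
            out.append(tokens[i]); i += 1
        elif op[0] == 'DELETE':
            i += 1                       # drop the current input word
    return out

print(apply_edits(['the', 'aforementioned', 'cat'],
                  [('KEEP',), ('DELETE',), ('ADD', 'big'), ('KEEP',)]))
# -> ['the', 'big', 'cat']
```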