Paper Group AWR 229
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning. Unsupervised Learning of Landmarks by Descriptor Vector Exchange. Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian. Katecheo: A Portable and Modular System for Multi-Topic Question Answering. Addressing Sample Complexity in …
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
Title | MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning |
Authors | Guangxiang Zhao, Xu Sun, Jingjing Xu, Zhiyuan Zhang, Liangchen Luo |
Abstract | In sequence to sequence learning, the self-attention mechanism proves to be highly effective, and achieves significant improvements in many tasks. However, the self-attention mechanism is not without its own flaws. Although self-attention can model extremely long dependencies, the attention in deep layers tends to overconcentrate on a single token, leading to insufficient use of local information and difficulty in representing long sequences. In this work, we explore parallel multi-scale representation learning on sequence data, striving to capture both long-range and short-range language structures. To this end, we propose the Parallel MUlti-Scale attEntion (MUSE) and MUSE-simple. MUSE-simple contains the basic idea of parallel multi-scale sequence representation learning: it encodes the sequence in parallel at different scales with the help of self-attention and pointwise transformation. MUSE builds on MUSE-simple and explores combining convolution and self-attention to learn sequence representations from a broader range of scales. We focus on machine translation, where the proposed approach achieves substantial performance improvements over Transformer, especially on long sequences. More importantly, we find that although conceptually simple, its success in practice requires intricate considerations, and the multi-scale attention must build on a unified semantic space. Under common settings, the proposed model achieves substantial performance gains and outperforms all previous models on three main machine translation tasks. In addition, MUSE has potential for accelerating inference due to its parallelism. Code will be available at https://github.com/lancopku/MUSE |
Tasks | Machine Translation, Representation Learning |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.09483v1 |
PDF | https://arxiv.org/pdf/1911.09483v1.pdf |
PWC | https://paperswithcode.com/paper/muse-parallel-multi-scale-attention-for |
Repo | https://github.com/lancopku/MUSE |
Framework | pytorch |
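A minimal PyTorch sketch of the parallel multi-scale idea the MUSE abstract describes, written from the abstract alone (module names and sizes are illustrative, not the authors' implementation): a self-attention branch, a depthwise convolution branch, and a pointwise feed-forward branch run in parallel on a shared projection, so all scales operate in one semantic space.

```python
import torch
import torch.nn as nn

class ParallelMultiScaleBlock(nn.Module):
    """Hypothetical MUSE-style block: three scales computed in parallel."""
    def __init__(self, d_model=512, n_heads=8, kernel_size=3):
        super().__init__()
        # Shared projection keeps all branches in one semantic space,
        # the property the abstract reports as critical.
        self.shared = nn.Linear(d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        h = self.shared(x)
        global_out, _ = self.attn(h, h, h)   # long-range scale
        local_out = self.conv(h.transpose(1, 2)).transpose(1, 2)  # short-range
        point_out = self.ffn(h)              # token-wise scale
        return self.norm(x + global_out + local_out + point_out)
```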
Unsupervised Learning of Landmarks by Descriptor Vector Exchange
Title | Unsupervised Learning of Landmarks by Descriptor Vector Exchange |
Authors | James Thewlis, Samuel Albanie, Hakan Bilen, Andrea Vedaldi |
Abstract | Equivariance to random image transformations is an effective method to learn landmarks of object categories, such as the eyes and the nose in faces, without manual supervision. However, this method does not explicitly guarantee that the learned landmarks are consistent with changes between different instances of the same object, such as different facial identities. In this paper, we develop a new perspective on the equivariance approach by noting that dense landmark detectors can be interpreted as local image descriptors equipped with invariance to intra-category variations. We then propose a direct method to enforce such an invariance in the standard equivariant loss. We do so by exchanging descriptor vectors between images of different object instances prior to matching them geometrically. In this manner, the same vectors must work regardless of the specific object identity considered. We use this approach to learn vectors that can simultaneously be interpreted as local descriptors and dense landmarks, combining the advantages of both. Experiments on standard benchmarks show that this approach can match, and in some cases surpass, state-of-the-art performance amongst existing methods that learn landmarks without supervision. Code is available at www.robots.ox.ac.uk/~vgg/research/DVE/. |
Tasks | Facial Landmark Detection, Unsupervised Facial Landmark Detection |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06427v1 |
PDF | https://arxiv.org/pdf/1908.06427v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-landmarks-by |
Repo | https://github.com/jamt9000/DVE |
Framework | pytorch |
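One way to read the descriptor-exchange step, sketched from the abstract rather than the released DVE code: descriptors for one image are rebuilt as convex combinations of another instance's descriptors before the equivariance matching, so identity-specific information cannot survive the exchange.

```python
import torch
import torch.nn.functional as F

def exchange_descriptors(feat_a, feat_b, temperature=0.1):
    """feat_a, feat_b: (num_pixels, dim) L2-normalized dense descriptors
    from two different instances of the same object category."""
    sim = feat_a @ feat_b.t() / temperature   # cross-instance similarity
    weights = F.softmax(sim, dim=1)           # soft correspondence a -> b
    return weights @ feat_b                   # a's descriptors, rewritten
                                              # using instance b's vectors

# The exchanged descriptors then stand in for feat_a in the usual
# equivariance loss against a randomly warped copy of image a.
```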
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian
Title | Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian |
Authors | Momchil Hardalov, Ivan Koychev, Preslav Nakov |
Abstract | Recently, reading comprehension models achieved near-human performance on large-scale datasets such as SQuAD, CoQA, MS MARCO, RACE, etc. This is largely due to the release of pre-trained contextualized representations such as BERT and ELMo, which can be fine-tuned for the target task. Despite those advances and the creation of more challenging datasets, most of the work is still done for English. Here, we study the effectiveness of multilingual BERT fine-tuned on large-scale English datasets for reading comprehension (e.g., for RACE), and we apply it to Bulgarian multiple-choice reading comprehension. We propose a new dataset containing 2,221 questions from matriculation exams for twelfth grade in various subjects (history, biology, geography, and philosophy), and 412 additional questions from online quizzes in history. While the quiz authors gave no relevant context, we incorporate knowledge from Wikipedia, retrieving documents matching the combination of question + each answer option. Moreover, we experiment with different indexing and pre-training strategies. The evaluation results show an accuracy of 42.23%, which is well above the baseline of 24.89%. |
Tasks | Reading Comprehension |
Published | 2019-08-05 |
URL | https://arxiv.org/abs/1908.01519v2 |
PDF | https://arxiv.org/pdf/1908.01519v2.pdf |
PWC | https://paperswithcode.com/paper/beyond-english-only-reading-comprehension |
Repo | https://github.com/mhardalov/bg-reason-BERT |
Framework | none |
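A hedged sketch of the retrieval step the abstract describes: each answer option is scored using documents retrieved for the combination "question + option". Here `search_wikipedia` and `score_with_mbert` are hypothetical stand-ins for the authors' index and fine-tuned multilingual BERT reader.

```python
def answer_question(question, options, search_wikipedia, score_with_mbert):
    """Pick the option whose retrieved context the reader scores highest."""
    scores = []
    for option in options:
        # Retrieve context by matching question + answer option, as the
        # quiz questions themselves come with no relevant passage.
        docs = search_wikipedia(f"{question} {option}", top_k=3)
        context = " ".join(docs)
        scores.append(score_with_mbert(context, question, option))
    best = max(range(len(options)), key=lambda i: scores[i])
    return options[best]
```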
Katecheo: A Portable and Modular System for Multi-Topic Question Answering
Title | Katecheo: A Portable and Modular System for Multi-Topic Question Answering |
Authors | Shirish Hirekodi, Seban Sunny, Leonard Topno, Alwin Daniel, Daniel Whitenack, Reuben Skewes, Stuart Cranney |
Abstract | We introduce a modular system that can be deployed on any Kubernetes cluster for question answering via REST API. This system, called Katecheo, includes three configurable modules that collectively enable identification of questions, classification of those questions into topics, document search, and reading comprehension. We demonstrate the system using publicly available knowledge base articles extracted from Stack Exchange sites. However, users can extend the system to any number of topics, or domains, without the need to modify any of the model serving code or train their own models. All components of the system are open source and available under a permissive Apache 2 License. |
Tasks | Question Answering, Reading Comprehension |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00854v2 |
PDF | https://arxiv.org/pdf/1907.00854v2.pdf |
PWC | https://paperswithcode.com/paper/katecheo-a-portable-and-modular-system-for |
Repo | https://github.com/cvdigitalai/katecheo |
Framework | none |
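A rough sketch of the staged flow the Katecheo abstract describes (question identification, topic classification, then search plus reading comprehension); the stage functions are hypothetical placeholders, not Katecheo's actual REST API.

```python
def katecheo_pipeline(text, is_question, classify_topic, knowledge_bases,
                      search, read_comprehension):
    """Route a user query through the modular QA stages; returns an answer
    string, or None when the input is out of scope."""
    if not is_question(text):                # stage 1: question identification
        return None
    topic = classify_topic(text)             # stage 2: topic classification
    if topic not in knowledge_bases:
        return None                          # no knowledge base for this topic
    article = search(knowledge_bases[topic], text)    # stage 3a: doc search
    return read_comprehension(article, text)          # stage 3b: reading comp.
```

Because each stage is injected as a plain callable, extending the system to a new topic only means registering another knowledge base, which mirrors the abstract's claim that no model-serving code needs to change.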
Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs
Title | Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs |
Authors | Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin |
Abstract | Reinforcement Learning (RL) algorithms typically require millions of environment interactions to learn successful policies in sparse reward settings. Hindsight Experience Replay (HER) was introduced as a technique to increase sample efficiency by reimagining unsuccessful trajectories as successful ones by altering the originally intended goals. However, it cannot be directly applied to visual environments where goal states are often characterized by the presence of distinct visual features. In this work, we show how visual trajectories can be hallucinated to appear successful by altering agent observations using a generative model trained on relatively few snapshots of the goal. We then use this model in combination with HER to train RL agents in visual settings. We validate our approach on 3D navigation tasks and a simulated robotics application and show marked improvement over baselines derived from previous work. |
Tasks | |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11529v2 |
PDF | https://arxiv.org/pdf/1901.11529v2.pdf |
PWC | https://paperswithcode.com/paper/visual-hindsight-experience-replay |
Repo | https://github.com/maximecb/gym-miniworld |
Framework | pytorch |
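One possible reading of the hallucinated-HER idea, sketched from the abstract with hypothetical `hallucinate` and `replay_buffer` interfaces: a generative model trained on a few goal snapshots repaints the observations of a failed trajectory so the goal appears achieved, and the relabeled transitions are stored as a success.

```python
def hallucinated_her(trajectory, replay_buffer, hallucinate):
    """trajectory: list of (obs, action, next_obs) from a failed episode."""
    for t, (obs, action, next_obs) in enumerate(trajectory):
        # The generative model alters the raw pixels so the goal's visual
        # features appear present, as if the episode had succeeded.
        fake_obs, fake_next = hallucinate(obs), hallucinate(next_obs)
        reward = 1.0 if t == len(trajectory) - 1 else 0.0  # success at the end
        replay_buffer.add(fake_obs, action, reward, fake_next)
```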
ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language
Title | ProBO: Versatile Bayesian Optimization Using Any Probabilistic Programming Language |
Authors | Willie Neiswanger, Kirthevasan Kandasamy, Barnabas Poczos, Jeff Schneider, Eric Xing |
Abstract | Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs), or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that could be used to capture complex systems and reduce the number of queries in BO. Probabilistic programming languages (PPLs) are modern tools that allow for flexible model definition, prior specification, model composition, and automatic inference. In this paper, we develop ProBO, a BO procedure that uses only standard operations common to most PPLs. This allows a user to drop in a model built with an arbitrary PPL and use it directly in BO. We describe acquisition functions for ProBO, and strategies for efficiently optimizing these functions given complex models or costly inference procedures. Using existing PPLs, we implement new models to aid in a few challenging optimization settings, and demonstrate these on model hyperparameter and architecture search tasks. |
Tasks | Gaussian Processes, Probabilistic Programming |
Published | 2019-01-31 |
URL | https://arxiv.org/abs/1901.11515v2 |
PDF | https://arxiv.org/pdf/1901.11515v2.pdf |
PWC | https://paperswithcode.com/paper/probo-a-framework-for-using-probabilistic |
Repo | https://github.com/willieneis/ProBO |
Framework | none |
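A minimal sketch of the PPL-agnostic idea, illustrated with Thompson sampling: the optimization loop touches the surrogate only through generic inference and posterior-sampling operations, so any PPL model exposing that interface could be dropped in. The `model` interface below is an assumption for illustration, not ProBO's actual API.

```python
import numpy as np

def probo_step(model, X, y, candidate_xs):
    """One Bayesian-optimization step using only generic PPL operations."""
    model.infer(X, y)                      # any PPL's inference routine
    f = model.sample_function()            # one posterior draw of f
    scores = [f(x) for x in candidate_xs]  # Thompson-sampling acquisition
    return candidate_xs[int(np.argmin(scores))]   # next point to query
```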
Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
Title | Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm |
Authors | Giulia Luise, Saverio Salzo, Massimiliano Pontil, Carlo Ciliberto |
Abstract | We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has Lipschitz continuous gradient with respect to the Total Variation and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice. |
Tasks | |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13194v1 |
PDF | https://arxiv.org/pdf/1905.13194v1.pdf |
PWC | https://paperswithcode.com/paper/sinkhorn-barycenters-with-free-support-via |
Repo | https://github.com/GiulsLu/Sinkhorn-Barycenters |
Framework | pytorch |
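A schematic Frank-Wolfe loop matching the abstract's description, where the support is populated one atom at a time; `grad_sinkhorn` is a hypothetical oracle returning the gradient of the Sinkhorn-divergence objective as a function over candidate points, which in practice requires running Sinkhorn iterations.

```python
import numpy as np

def fw_sinkhorn_barycenter(grad_sinkhorn, domain_points, n_iters=100):
    """Incrementally build the barycenter's support, no pre-allocation."""
    support, weights = [domain_points[0]], np.array([1.0])
    for k in range(1, n_iters):
        g = grad_sinkhorn(support, weights)
        # Linear minimization oracle: the new atom minimizes the gradient.
        new_atom = min(domain_points, key=g)
        gamma = 2.0 / (k + 2.0)            # standard Frank-Wolfe step size
        support.append(new_atom)
        weights = np.append((1.0 - gamma) * weights, gamma)
    return support, weights
```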
ParticleNet: Jet Tagging via Particle Clouds
Title | ParticleNet: Jet Tagging via Particle Clouds |
Authors | Huilin Qu, Loukas Gouskos |
Abstract | How to represent a jet is at the core of machine learning on jet physics. Inspired by the notion of point clouds, we propose a new approach that considers a jet as an unordered set of its constituent particles, effectively a “particle cloud”. Such a particle cloud representation of jets is efficient in incorporating raw information of jets and also explicitly respects the permutation symmetry. Based on the particle cloud representation, we propose ParticleNet, a customized neural network architecture using Dynamic Graph Convolutional Neural Network for jet tagging problems. The ParticleNet architecture achieves state-of-the-art performance on two representative jet tagging benchmarks and improves significantly over existing methods. |
Tasks | |
Published | 2019-02-22 |
URL | https://arxiv.org/abs/1902.08570v3 |
PDF | https://arxiv.org/pdf/1902.08570v3.pdf |
PWC | https://paperswithcode.com/paper/particlenet-jet-tagging-via-particle-clouds |
Repo | https://github.com/hqucms/ParticleNet |
Framework | tf |
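ParticleNet builds on the EdgeConv operation from Dynamic Graph CNN; the following is a bare-bones PyTorch sketch of that building block written from the abstract, not the released TensorFlow implementation.

```python
import torch

def edge_conv(points, features, mlp, k=16):
    """points: (N, d) particle coordinates; features: (N, c) per-particle
    features; mlp: callable mapping (N, k, 2c) -> (N, k, c')."""
    dist = torch.cdist(points, points)                # pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[:, 1:]  # kNN, skip self
    neighbors = features[idx]                         # (N, k, c)
    center = features.unsqueeze(1).expand_as(neighbors)
    # Edge features: the center plus its offset to each neighbor, so the
    # operation is symmetric under permutations of the particle set.
    edge_feat = torch.cat([center, neighbors - center], dim=-1)
    return mlp(edge_feat).mean(dim=1)                 # aggregate neighbors
```

Stacking several such layers, with the graph recomputed in feature space each time, is what makes the graph "dynamic".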
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation
Title | RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation |
Authors | Bastian Wandt, Bodo Rosenhahn |
Abstract | This paper addresses the problem of 3D human pose estimation from single images. While for a long time human skeletons were parameterized and fitted to the observation by minimizing a reprojection error, nowadays researchers directly use neural networks to infer the 3D pose from the observations. However, most of these approaches ignore the fact that a reprojection constraint has to be satisfied, and they are sensitive to overfitting. We tackle the overfitting problem by ignoring 2D to 3D correspondences. This efficiently avoids simple memorization of the training data and allows for weakly supervised training. One part of the proposed reprojection network (RepNet) learns a mapping from a distribution of 2D poses to a distribution of 3D poses using an adversarial training approach. Another part of the network estimates the camera. This allows for the definition of a network layer that performs the reprojection of the estimated 3D pose back to 2D, which results in a reprojection loss function. Our experiments show that RepNet generalizes well and outperforms state-of-the-art methods on unseen data. Moreover, our implementation runs in real-time on a standard desktop PC. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09868v2 |
PDF | http://arxiv.org/pdf/1902.09868v2.pdf |
PWC | https://paperswithcode.com/paper/repnet-weakly-supervised-training-of-an |
Repo | https://github.com/bastianwandt/RepNet |
Framework | tf |
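A sketch of the reprojection layer the abstract describes, with illustrative tensor shapes (a weak-perspective 2x3 camera per sample is an assumption on my part): the estimated camera maps the predicted 3D pose back to 2D, and the loss compares against the observed 2D keypoints.

```python
import torch

def reprojection_loss(pose_3d, camera, pose_2d):
    """pose_3d: (B, 3, J) predicted joints; camera: (B, 2, 3) estimated
    projection; pose_2d: (B, 2, J) observed 2D keypoints."""
    reprojected = torch.bmm(camera, pose_3d)          # project back to 2D
    return torch.norm(reprojected - pose_2d, dim=(1, 2)).mean()
```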
Fast and Robust Spectrally Sparse Signal Recovery: A Provable Non-Convex Approach via Robust Low-Rank Hankel Matrix Reconstruction
Title | Fast and Robust Spectrally Sparse Signal Recovery: A Provable Non-Convex Approach via Robust Low-Rank Hankel Matrix Reconstruction |
Authors | HanQin Cai, Jian-Feng Cai, Tianming Wang, Guojian Yin |
Abstract | Consider a spectrally sparse signal $\boldsymbol{x}$ that consists of $r$ complex sinusoids with or without damping. We study the robust recovery problem for the spectrally sparse signal under the fully observed setting, which is about recovering $\boldsymbol{x}$ and a sparse corruption vector $\boldsymbol{s}$ from their sum $\boldsymbol{z}=\boldsymbol{x}+\boldsymbol{s}$. In this paper, we exploit the low-rank property of the Hankel matrix constructed from $\boldsymbol{x}$, and develop an efficient non-convex algorithm, coined Accelerated Alternating Projections for Robust Low-Rank Hankel Matrix Reconstruction (AAP-Hankel). The high computational efficiency and low space complexity of AAP-Hankel are achieved by fast computations involving structured matrices, and a subspace projection method for accelerated low-rank approximation. Theoretical recovery guarantee with a linear convergence rate has been established for AAP-Hankel. Empirical performance comparisons on synthetic and real-world datasets demonstrate the computational advantages of AAP-Hankel, in both efficiency and robustness aspects. |
Tasks | |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05859v1 |
PDF | https://arxiv.org/pdf/1910.05859v1.pdf |
PWC | https://paperswithcode.com/paper/fast-and-robust-spectrally-sparse-signal |
Repo | https://github.com/caesarcai/AAP-Hankel |
Framework | none |
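A highly simplified alternating-projections sketch of the AAP-Hankel idea, without the paper's fast structured-matrix computations, subspace acceleration, or decaying thresholds (the fixed threshold here is an illustrative assumption): alternate a rank-r projection of the Hankel lift of the current signal estimate with hard-thresholding of the corruption estimate.

```python
import numpy as np
from scipy.linalg import hankel, svd

def aap_hankel_sketch(z, rank, n_iters=50, thresh=0.1):
    """z: observed signal (spectrally sparse part plus sparse corruption)."""
    n = len(z)
    x = np.zeros(n, dtype=complex)
    for _ in range(n_iters):
        s = np.where(np.abs(z - x) > thresh, z - x, 0)     # sparse part
        v = z - s
        H = hankel(v[: n // 2 + 1], v[n // 2:])            # Hankel lift
        U, sig, Vh = svd(H, full_matrices=False)
        Hr = (U[:, :rank] * sig[:rank]) @ Vh[:rank]        # rank-r projection
        # De-Hankelize: average each anti-diagonal back into a signal entry.
        x = np.array([np.diag(Hr[:, ::-1], k).mean()
                      for k in range(Hr.shape[1] - 1, -Hr.shape[0], -1)])
    return x, s
```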
Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data
Title | Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data |
Authors | Hilde Kuehne, Ahsan Iqbal, Alexander Richard, Juergen Gall |
Abstract | Action recognition has so far mainly focused on classifying hand-selected, pre-clipped actions, reaching impressive results in this field. But with performance plateauing on current datasets, it also appears that the next steps in the field will have to go beyond this fully supervised classification. One way to overcome these problems is to move towards less restricted scenarios. In this context we present a large-scale real-world dataset designed to evaluate learning techniques for human action recognition beyond hand-crafted datasets. To this end we rebuild the data collection process from the ground up, starting with the annotation of a test set of 250 cooking videos. The training data is then gathered by searching for the respective annotated classes within the subtitles of freely available videos. The uniqueness of the dataset lies in the fact that the whole process of collecting the data and training does not involve any human intervention. To address the problem of semantic inconsistencies that arise with this kind of training data, we further propose a semantic hierarchy over the mined classes. |
Tasks | Temporal Action Localization |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.01012v1 |
PDF | https://arxiv.org/pdf/1906.01012v1.pdf |
PWC | https://paperswithcode.com/paper/mining-youtube-a-dataset-for-learning-fine |
Repo | https://github.com/hildekuehne/Weak_YouTube_dataset |
Framework | none |
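A toy sketch of the mining step from the abstract: search the subtitles of freely available videos for the annotated class names to harvest weakly labelled training clips. The subtitle data layout and field names are illustrative only.

```python
def mine_training_clips(videos, class_names):
    """videos: iterable of dicts with 'id' and timed 'subtitles' entries."""
    clips = []
    for video in videos:
        for (start, end, text) in video["subtitles"]:
            for cls in class_names:
                if cls.lower() in text.lower():   # class mentioned in speech
                    clips.append({"video": video["id"], "start": start,
                                  "end": end, "label": cls})
    return clips
```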
Just Jump: Dynamic Neighborhood Aggregation in Graph Neural Networks
Title | Just Jump: Dynamic Neighborhood Aggregation in Graph Neural Networks |
Authors | Matthias Fey |
Abstract | We propose a dynamic neighborhood aggregation (DNA) procedure guided by (multi-head) attention for representation learning on graphs. In contrast to current graph neural networks which follow a simple neighborhood aggregation scheme, our DNA procedure allows for a selective and node-adaptive aggregation of neighboring embeddings of potentially differing locality. In order to avoid overfitting, we propose to control the channel-wise connections between input and output by making use of grouped linear projections. In a number of transductive node-classification experiments, we demonstrate the effectiveness of our approach. |
Tasks | Node Classification, Representation Learning |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04849v2 |
PDF | http://arxiv.org/pdf/1904.04849v2.pdf |
PWC | https://paperswithcode.com/paper/just-jump-dynamic-neighborhood-aggregation-in |
Repo | https://github.com/rusty1s/pytorch_geometric |
Framework | pytorch |
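The DNA operator ships with the linked PyTorch Geometric repository as `DNAConv`; a hedged usage sketch follows (check the current docs, as the exact signature may differ across versions).

```python
import torch
from torch_geometric.nn import DNAConv

conv = DNAConv(channels=64, heads=4, groups=8, dropout=0.1)
# x_all stacks every previous layer's node embeddings, so each node can
# attend over its own history of representations of differing locality:
# shape (num_nodes, num_layers, channels).
x_all = torch.randn(100, 3, 64)
edge_index = torch.randint(0, 100, (2, 400))   # random graph for illustration
out = conv(x_all, edge_index)                  # (num_nodes, channels)
```

The `groups` argument implements the grouped linear projections the abstract proposes for controlling channel-wise connections and curbing overfitting.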
JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds
Title | JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds |
Authors | Lin Zhao, Wenbing Tao |
Abstract | In this paper, we propose a novel joint instance and semantic segmentation approach, called JSNet, to address the instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the different layer features of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into instance embedding space, and the transformed features are then further fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying simple mean-shift clustering on the instance embeddings. We evaluate the proposed JSNet on a large-scale 3D indoor point cloud dataset, S3DIS, and a part dataset, ShapeNet, and compare it with existing approaches. Experimental results demonstrate that our approach outperforms the state-of-the-art method in 3D instance segmentation, with a significant improvement in 3D semantic prediction, and that our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet. |
Tasks | 3D Instance Segmentation, Instance Segmentation, Semantic Segmentation |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09654v1 |
PDF | https://arxiv.org/pdf/1912.09654v1.pdf |
PWC | https://paperswithcode.com/paper/191209654 |
Repo | https://github.com/dlinzhao/JSNet |
Framework | tf |
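The final step of the abstract, sketched with scikit-learn rather than the authors' code: instance masks come from mean-shift clustering of the per-point instance embeddings (the embedding dimension and bandwidth below are illustrative).

```python
import numpy as np
from sklearn.cluster import MeanShift

embeddings = np.random.randn(2048, 5)        # (num_points, embedding_dim)
instance_ids = MeanShift(bandwidth=0.6).fit_predict(embeddings)
# Each cluster of points is one predicted object instance; its semantic
# label can be taken as the majority semantic prediction over its points.
```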
Training-Free Artificial Neural Networks
Title | Training-Free Artificial Neural Networks |
Authors | Nikolaos P. Bakas, Savvas Chatzichristofis |
Abstract | This paper presents a numerical scheme for the computation of Artificial Neural Networks’ weights, without a laborious iterative procedure. The proposed algorithm adheres to the underlying theory, is highly fast, and results in remarkably low errors when applied to regression and classification of complex datasets, such as the Griewank function of multiple variables $\mathbf{x} \in \mathbb{R}^{100}$ with random noise addition, and the MNIST database for handwritten digit recognition, with $7\times10^4$ images. Interestingly, the same mathematical formulation proves capable of approximating highly nonlinear functions in multiple dimensions, with low errors (e.g. $10^{-10}$) on the test set of the unknown functions and their higher-order partial derivatives, as well as of numerically solving Partial Differential Equations. The method is based on the calculation of the weights of each neuron in small neighborhoods of data, such that the corresponding local approximation matrix is invertible. Accordingly, hyperparameter optimization is not necessary, as the number of neurons stems directly from the dimensions of the data, further improving the algorithmic speed. Overfitting is inherently eliminated, and the results are interpretable and reproducible. The complexity of the proposed algorithm is of class P, with $\mathcal{O}(mn^3)$ computing time, which is linear in the number of observations and cubic in the number of features, in contrast with the NP-complete class of standard algorithms for training ANNs. The performance of the method is high for small as well as big datasets, and the test-set errors are similar to or smaller than the training errors, indicating generalization efficiency. The supplementary computer code, in the Julia language, reproduces the validation examples and can run on other datasets. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13563v2 |
PDF | https://arxiv.org/pdf/1909.13563v2.pdf |
PWC | https://paperswithcode.com/paper/training-free-artificial-neural-networks |
Repo | https://github.com/nbakas/ANNBN.jl |
Framework | none |
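A sketch of the general flavour, based on my reading of the abstract rather than the ANNBN.jl code: each neuron's weights come from one exact linear solve over a small, invertible local system, so no iterative training is involved.

```python
import numpy as np

def fit_local_neuron(X_local, y_local):
    """Solve exactly for one affine neuron on an invertible local system."""
    A = np.hstack([X_local, np.ones((len(X_local), 1))])  # affine features
    # With d + 1 points in d dimensions, A is square and (generically)
    # invertible, so the weights come from one linear solve, no training.
    return np.linalg.solve(A, y_local)

d = 3
X_local = np.random.randn(d + 1, d)       # one small neighborhood of data
y_local = np.random.randn(d + 1)
w = fit_local_neuron(X_local, y_local)    # weights w[:-1], bias w[-1]
```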
EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing
Title | EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing |
Authors | Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung |
Abstract | We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans might perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences. |
Tasks | Machine Translation, Text Simplification |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08104v1 |
PDF | https://arxiv.org/pdf/1906.08104v1.pdf |
PWC | https://paperswithcode.com/paper/editnts-an-neural-programmer-interpreter |
Repo | https://github.com/yuedongP/EditNTS |
Framework | pytorch |
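A sketch of the interpreter half of the abstract's programmer-interpreter pairing: the model emits explicit ADD/DELETE/KEEP operations, and a simple interpreter replays them over the complex sentence to produce the simplified one. The tuple encoding of operations below is an illustrative choice, not EditNTS's actual data format.

```python
def apply_edits(tokens, edits):
    """edits: list of ('KEEP',), ('DELETE',), or ('ADD', word) operations."""
    out, i = [], 0
    for op in edits:
        if op[0] == 'ADD':
            out.append(op[1])            # insert a word, don't consume input
        elif op[0] == 'KEEP':
            out.append(tokens[i]); i += 1
        elif op[0] == 'DELETE':
            i += 1                       # drop the current input word
    return out

print(apply_edits(['the', 'aforementioned', 'cat'],
                  [('KEEP',), ('DELETE',), ('ADD', 'big'), ('KEEP',)]))
# -> ['the', 'big', 'cat']
```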