February 1, 2020

Paper Group AWR 253

Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images. ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking. Where are the Masks: Instance Segmentation with Image-level Supervision. Supervised Multimodal Bitransformers for Classifying Images and Text. Attentive Neural Processes. Exchangeable Gen …

Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images

Title Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images
Authors Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon
Abstract In daily life, graphic symbols, such as traffic signs and brand logos, are used ubiquitously around us because of their intuitive expression beyond language boundaries. We tackle an open-set graphic symbol recognition problem by one-shot classification with prototypical images as a single training example for each novel class. We aim to learn a generalizable embedding space for novel tasks. We propose a new approach called variational prototyping-encoder (VPE) that learns the image translation task from real-world input images to their corresponding prototypical images as a meta-task. As a result, VPE learns image similarity as well as prototypical concepts, which distinguishes it from widely used metric-learning-based approaches. Our experiments with diverse datasets demonstrate that the proposed VPE performs favorably against competing metric-learning-based one-shot methods. Also, our qualitative analyses show that our meta-task induces an effective embedding space suitable for unseen data representation.
Tasks Metric Learning, One-Shot Learning
Published 2019-04-17
URL http://arxiv.org/abs/1904.08482v1
PDF http://arxiv.org/pdf/1904.08482v1.pdf
PWC https://paperswithcode.com/paper/variational-prototyping-encoder-one-shot
Repo https://github.com/mibastro/VPE
Framework pytorch
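
At test time, one-shot classification with prototypes reduces to comparing a query embedding against one prototype embedding per class. The sketch below shows only that nearest-prototype decision rule; the `encode` argument is a hypothetical stand-in (an identity function in the example), not VPE's trained variational encoder, and the feature vectors and labels are made up.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_classify(
    query: Vector,
    prototypes: Dict[str, Vector],
    encode: Callable[[Vector], Vector],
) -> str:
    """Return the label whose single prototype embedding is closest to the query embedding."""
    q = encode(query)
    return min(prototypes, key=lambda label: euclidean(q, encode(prototypes[label])))


if __name__ == "__main__":
    identity = lambda v: v  # hypothetical stand-in for a learned encoder
    protos = {"stop_sign": [1.0, 0.0], "yield_sign": [0.0, 1.0]}
    print(one_shot_classify([0.9, 0.2], protos, identity))  # stop_sign
```

Adding a novel class only requires adding one prototype entry; nothing is retrained, which is the property one-shot methods rely on.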

‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking

Title ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking
Authors Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang
Abstract Compared with traditional short-term tracking, long-term tracking poses more challenges and is much closer to realistic applications. However, few works have addressed it, and their performance has been limited. In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules. The perusal module consists of an effective bounding box regressor to generate a series of candidate proposals and a robust target verifier to infer the optimal candidate with its confidence score. Based on this score, our tracker determines whether the tracked object is present or absent, and then chooses local search or global search, respectively, as the tracking strategy in the next frame. To speed up the image-wide global search, a novel skimming module is designed to efficiently choose the most likely regions from a large number of sliding windows. Extensive experimental results on the VOT-2018 long-term and OxUvA long-term benchmarks demonstrate that the proposed method achieves the best performance and runs in real time. The source code is available at https://github.com/iiau-tracker/SPLT.
Tasks
Published 2019-09-04
URL https://arxiv.org/abs/1909.01840v1
PDF https://arxiv.org/pdf/1909.01840v1.pdf
PWC https://paperswithcode.com/paper/skimming-perusal-tracking-a-framework-for
Repo https://github.com/iiau-tracker/SPLT
Framework tf
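
The control flow described in the abstract (verifier confidence decides present/absent, which selects local or global search, and skimming prunes sliding windows before the expensive check) can be sketched as two small functions. This is a minimal illustration under stated assumptions: the 0.5 threshold, the window names, and the window scores are hypothetical and are not taken from SPLT.

```python
from typing import Dict, List


def next_search_mode(confidence: float, threshold: float = 0.5) -> str:
    """Map the verifier's confidence for this frame to the next frame's strategy.

    The 0.5 threshold is a hypothetical value, not the one used by SPLT.
    """
    target_present = confidence >= threshold
    return "local" if target_present else "global"


def skim(window_scores: Dict[str, float], k: int) -> List[str]:
    """Keep only the k highest-scoring sliding windows for the expensive perusal step."""
    return sorted(window_scores, key=window_scores.get, reverse=True)[:k]


if __name__ == "__main__":
    scores = [0.9, 0.7, 0.2, 0.1, 0.8]
    print([next_search_mode(s) for s in scores])
    # ['local', 'local', 'global', 'global', 'local']
    print(skim({"w0": 0.1, "w1": 0.9, "w2": 0.5, "w3": 0.3}, k=2))
    # ['w1', 'w2']
```

The point of skimming is cost: only `k` windows reach the verifier during global search, instead of every sliding window in the frame.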

Where are the Masks: Instance Segmentation with Image-level Supervision

Title Where are the Masks: Instance Segmentation with Image-level Supervision
Authors Issam H. Laradji, David Vazquez, Mark Schmidt
Abstract A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective. These labels require large human effort and for certain applications, such labels are not readily available. To address this limitation, we propose a novel framework that can effectively train with image-level labels, which are significantly cheaper to acquire. For instance, one can do an internet search for the term “car” and obtain many images where a car is present with minimal effort. Our framework consists of two stages: (1) train a classifier to generate pseudo masks for the objects of interest; (2) train a fully supervised Mask R-CNN on these pseudo masks. Our two main contributions are a pipeline that is simple to implement and amenable to different segmentation methods, and new state-of-the-art results for this problem setup. Our results are based on evaluating our method on PASCAL VOC 2012, a standard dataset for weakly supervised methods, where we demonstrate major performance gains over existing methods in terms of mean average precision.
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-07-02
URL https://arxiv.org/abs/1907.01430v1
PDF https://arxiv.org/pdf/1907.01430v1.pdf
PWC https://paperswithcode.com/paper/where-are-the-masks-instance-segmentation
Repo https://github.com/ElementAI/wise_ils
Framework pytorch

Supervised Multimodal Bitransformers for Classifying Images and Text

Title Supervised Multimodal Bitransformers for Classifying Images and Text
Authors Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine
Abstract Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks. The modern digital world is increasingly multimodal, however, and textual information is often accompanied by other modalities such as images. We introduce a supervised multimodal bitransformer model that fuses information from text and image encoders, and obtain state-of-the-art performance on various multimodal classification benchmark tasks, outperforming strong baselines, including on hard test sets specifically designed to measure multimodal performance.
Tasks
Published 2019-09-06
URL https://arxiv.org/abs/1909.02950v1
PDF https://arxiv.org/pdf/1909.02950v1.pdf
PWC https://paperswithcode.com/paper/supervised-multimodal-bitransformers-for
Repo https://github.com/facebookresearch/mmbt
Framework pytorch

Attentive Neural Processes

Title Attentive Neural Processes
Authors Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh
Abstract Neural Processes (NPs) (Garnelo et al. 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer from a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
Tasks
Published 2019-01-17
URL https://arxiv.org/abs/1901.05761v2
PDF https://arxiv.org/pdf/1901.05761v2.pdf
PWC https://paperswithcode.com/paper/attentive-neural-processes
Repo https://github.com/3springs/np_vs_kriging
Framework none
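
The underfitting argument can be seen in a toy setting: if every target location receives the same mean-aggregated summary of the context, predictions cannot adapt to where the target sits, whereas attention lets a target near a context point weight that point heavily. The sketch below uses a simple distance-based (Laplace-kernel) attention chosen for illustration; it averages raw context outputs, whereas the actual ANP uses learned encoders and learned attention, none of which is modelled here.

```python
import math
from typing import Sequence, Tuple

Context = Sequence[Tuple[float, float]]  # (x, y) pairs


def attend(x_target: float, context: Context, scale: float = 1.0) -> float:
    """Weighted average of context outputs; weights decay with |x_target - x_i|."""
    logits = [-abs(x_target - x) / scale for x, _ in context]
    m = max(logits)  # subtract the max before exp for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, context)) / total


def mean_aggregate(context: Context) -> float:
    """Toy NP-style summary: one average shared by every target location."""
    return sum(y for _, y in context) / len(context)


if __name__ == "__main__":
    ctx = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
    print(attend(2.0, ctx, scale=0.1))  # close to the observed y = 4.0
    print(mean_aggregate(ctx))          # same value regardless of target
```

At the context input `x = 2.0`, the attention readout stays close to the observed output, while the mean summary returns the same number for every target, which mirrors the "inaccurate predictions at the inputs of the observed data" described above.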

Exchangeable Generative Models with Flow Scans

Title Exchangeable Generative Models with Flow Scans
Authors Christopher Bender, Kevin O’Connor, Yang Li, Juan Jose Garcia, Manzil Zaheer, Junier Oliva
Abstract In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan represents the first approach that is able to apply sequential methods to exchangeable density estimation without resorting to averaging over all possible permutations. We achieve new state-of-the-art performance on point cloud and image set modeling.
Tasks Density Estimation
Published 2019-02-05
URL https://arxiv.org/abs/1902.01967v3
PDF https://arxiv.org/pdf/1902.01967v3.pdf
PWC https://paperswithcode.com/paper/permutation-invariant-likelihoods-and
Repo https://github.com/lupalab/flowscan
Framework tf
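
The "sorted scan" idea rests on a simple observation: sorting a set gives it a canonical order, so any sequential model applied to the sorted sequence yields a score that no longer depends on how the set was listed. The sketch below shows only that invariance; the autoregressive Gaussian is a hypothetical stand-in for FlowScan's invertible flows and sequential model, and it does not reproduce the paper's density normalization.

```python
import math
from typing import Sequence


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def sequential_loglik(seq: Sequence[float], std: float = 1.0) -> float:
    """Toy autoregressive model: each element is Gaussian around the previous one."""
    total, prev = 0.0, 0.0
    for x in seq:
        total += gaussian_logpdf(x, prev, std)
        prev = x
    return total


def set_score(points: Sequence[float]) -> float:
    """Sort first, then scan: the result is identical for every ordering of the set."""
    return sequential_loglik(sorted(points))


if __name__ == "__main__":
    print(sequential_loglik([3.0, 1.0, 2.0]) == sequential_loglik([1.0, 2.0, 3.0]))  # False
    print(set_score([3.0, 1.0, 2.0]) == set_score([2.0, 3.0, 1.0]))  # True
```

This avoids averaging the sequential model over all n! permutations, which is the cost the abstract says FlowScan sidesteps.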

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout

Title Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Authors Hao Tan, Licheng Yu, Mohit Bansal
Abstract A grand goal in AI is to build a robot that can accurately navigate based on natural language instructions, which requires the agent to perceive the scene, understand and ground language, and act in the real-world environment. One key challenge here is to learn to navigate in new environments that are unseen during training. Most of the existing approaches perform dramatically worse in unseen environments as compared to seen ones. In this paper, we present a generalizable navigational agent. Our agent is trained in two stages. The first stage is training via mixed imitation and reinforcement learning, combining the benefits from both off-policy and on-policy optimization. The second stage is fine-tuning via newly-introduced ‘unseen’ triplets (environment, path, instruction). To generate these unseen triplets, we propose a simple but effective ‘environmental dropout’ method to mimic unseen environments, which overcomes the problem of limited seen environment variability. Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions. Empirically, we show that our agent generalizes substantially better when fine-tuned with these triplets, outperforming the state-of-the-art approaches by a large margin on the private unseen test set of the Room-to-Room task, and achieving the top rank on the leaderboard.
Tasks Vision-Language Navigation
Published 2019-04-08
URL http://arxiv.org/abs/1904.04195v1
PDF http://arxiv.org/pdf/1904.04195v1.pdf
PWC https://paperswithcode.com/paper/learning-to-navigate-unseen-environments-back
Repo https://github.com/airsplay/R2R-EnvDrop
Framework pytorch
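
As we read the method, what makes environmental dropout different from ordinary feature dropout is that one mask is sampled per environment and shared by every view in it, so the whole environment looks consistently altered rather than each image being perturbed independently. The sketch below illustrates that shared-mask idea on plain feature lists; the view names, feature values, and function name are hypothetical.

```python
import random
from typing import Dict, List


def environmental_dropout(
    env_features: Dict[str, List[float]], p: float, seed: int
) -> Dict[str, List[float]]:
    """Apply ONE dropout mask, shared by every view of the environment."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)
    dim = len(next(iter(env_features.values())))
    keep = [rng.random() >= p for _ in range(dim)]  # sampled once per environment
    scale = 1.0 / (1.0 - p)  # inverted-dropout rescaling of kept features
    return {
        view: [v * scale if k else 0.0 for v, k in zip(feats, keep)]
        for view, feats in env_features.items()
    }


if __name__ == "__main__":
    env = {"view_a": [1.0] * 8, "view_b": [2.0] * 8}
    out = environmental_dropout(env, p=0.5, seed=0)
    print(out["view_a"])
    print(out["view_b"])  # zeros appear at the same positions as in view_a
```

Sampling the mask per view instead would turn this back into standard dropout, which perturbs images independently and does not mimic a coherent new environment.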

Provably Powerful Graph Networks

Title Provably Powerful Graph Networks
Authors Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, Yaron Lipman
Abstract Recently, the Weisfeiler-Lehman (WL) graph isomorphism test was used to measure the expressive power of graph neural networks (GNN). It was shown that the popular message passing GNN cannot distinguish between graphs that are indistinguishable by the 1-WL test (Morris et al. 2018; Xu et al. 2019). Unfortunately, many simple instances of graphs are indistinguishable by the 1-WL test. In search of more expressive graph learning models we build upon the recent k-order invariant and equivariant graph neural networks (Maron et al. 2019a,b) and present two results: First, we show that such k-order networks can distinguish between non-isomorphic graphs as well as the k-WL tests, which are provably stronger than the 1-WL test for k>2. This makes these models strictly stronger than message passing models. Unfortunately, the higher expressiveness of these models comes with a computational cost of processing high order tensors. Second, setting our goal at building a provably stronger, simple and scalable model we show that a reduced 2-order network containing just a scaled identity operator, augmented with a single quadratic operation (matrix multiplication) has a provable 3-WL expressive power. Differently put, we suggest a simple model that interleaves applications of standard Multilayer-Perceptron (MLP) applied to the feature dimension and matrix multiplication. We validate this model by presenting state of the art results on popular graph classification and regression tasks. To the best of our knowledge, this is the first practical invariant/equivariant model with guaranteed 3-WL expressiveness, strictly stronger than message passing models.
Tasks Graph Classification
Published 2019-05-27
URL https://arxiv.org/abs/1905.11136v2
PDF https://arxiv.org/pdf/1905.11136v2.pdf
PWC https://paperswithcode.com/paper/provably-powerful-graph-networks
Repo https://github.com/hadarser/ProvablyPowerfulGraphNetworks_torch
Framework pytorch
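
The claim that "many simple instances of graphs are indistinguishable by the 1-WL test" has a classic example: a 6-cycle and two disjoint triangles are both 2-regular, so 1-WL color refinement assigns every node the same color in both. Matrix multiplication exposes the difference, since trace(A^3)/6 counts triangles. The sketch below illustrates both points; it is not the PPGN architecture, only the intuition for why adding matrix multiplication increases expressive power.

```python
from collections import Counter
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists, undirected


def wl_colors(graph: Graph, rounds: int = 3) -> Counter:
    """1-WL color refinement; colors are kept as nested tuples so graphs can be compared."""
    colors = {v: () for v in graph}
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in graph[v]))) for v in graph}
    return Counter(colors.values())


def triangles(graph: Graph) -> int:
    """Count triangles as trace(A^3) / 6 using plain matrix multiplication."""
    nodes = sorted(graph)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for v in graph:
        for u in graph[v]:
            A[idx[v]][idx[u]] = 1

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(n)) // 6


if __name__ == "__main__":
    c6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(wl_colors(c6) == wl_colors(two_triangles))   # True: 1-WL cannot tell them apart
    print(triangles(c6), triangles(two_triangles))     # 0 2
```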

Attacking Optical Flow

Title Attacking Optical Flow
Authors Anurag Ranjan, Joel Janai, Andreas Geiger, Michael J. Black
Abstract Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.
Tasks Optical Flow Estimation, Self-Driving Cars
Published 2019-10-22
URL https://arxiv.org/abs/1910.10053v1
PDF https://arxiv.org/pdf/1910.10053v1.pdf
PWC https://paperswithcode.com/paper/attacking-optical-flow
Repo https://github.com/anuragranj/flowattack
Framework pytorch
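
Mechanically, a patch attack overwrites a small rectangle of the input with adversarial pixel values. The sketch below shows only the pasting step and the area arithmetic behind "less than 1% of the image size" (a 9×9 patch on a 100×100 image covers 81/10000 = 0.81%). The image size, patch size, position, and pixel values are hypothetical, and optimizing the patch against a flow network is not shown.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def apply_patch(image: Image, patch: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def patch_fraction(image: Image, patch: Image) -> float:
    """Fraction of the image area covered by the patch."""
    return (len(patch) * len(patch[0])) / (len(image) * len(image[0]))


if __name__ == "__main__":
    frame = [[0.0] * 100 for _ in range(100)]
    patch = [[1.0] * 9 for _ in range(9)]  # hypothetical patch values
    attacked = apply_patch(frame, patch, top=5, left=5)
    print(patch_fraction(frame, patch))  # about 0.0081, i.e. 0.81% of the image
```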

Meta-Amortized Variational Inference and Learning

Title Meta-Amortized Variational Inference and Learning
Authors Mike Wu, Kristy Choi, Noah Goodman, Stefano Ermon
Abstract Despite the recent success in probabilistic modeling and their applications, generative models trained using traditional inference techniques struggle to adapt to new distributions, even when the target distribution may be closely related to the ones seen during training. In this work, we present a doubly-amortized variational inference procedure as a way to address this challenge. By sharing computation across not only a set of query inputs, but also a set of different, related probabilistic models, we learn transferable latent representations that generalize across several related distributions. In particular, given a set of distributions over images, we find the learned representations to transfer to different data transformations. We empirically demonstrate the effectiveness of our method by introducing the MetaVAE, and show that it significantly outperforms baselines on downstream image classification tasks on MNIST (10-50%) and NORB (10-35%).
Tasks Density Estimation, Image Classification
Published 2019-02-05
URL https://arxiv.org/abs/1902.01950v2
PDF https://arxiv.org/pdf/1902.01950v2.pdf
PWC https://paperswithcode.com/paper/meta-amortized-variational-inference-and
Repo https://github.com/mhw32/meta-inference-public
Framework pytorch

FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search

Title FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
Authors Xiangxiang Chu, Bo Zhang, Ruijun Xu, Jixiang Li
Abstract One of the most critical problems in two-stage weight-sharing neural architecture search is the evaluation of candidate models. A faithful ranking certainly leads to accurate searching results. However, current methods are prone to making misjudgments. In this paper, we prove that they inevitably give biased evaluations due to inherent unfairness in the supernet training. In view of this, we propose two levels of constraints: expectation fairness and strict fairness. In particular, strict fairness ensures equal optimization opportunities for all choice blocks throughout the training, which neither overestimates nor underestimates their capacity. We demonstrate this is crucial to improving confidence in models’ ranking. Combining our supernet trained under fairness constraints with a multi-objective evolutionary search algorithm, we obtain various state-of-the-art models on ImageNet. In particular, FairNAS-A attains 77.5% top-1 accuracy. The models and their evaluation code are publicly available at http://github.com/fairnas/FairNAS.
Tasks Neural Architecture Search
Published 2019-07-03
URL https://arxiv.org/abs/1907.01845v4
PDF https://arxiv.org/pdf/1907.01845v4.pdf
PWC https://paperswithcode.com/paper/fairnas-rethinking-evaluation-fairness-of
Repo https://github.com/xiaomi-automl/FairNAS
Framework pytorch
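
As we understand strict fairness, each supernet training step samples as many single-path models as there are choice blocks per layer, drawing each layer's blocks without replacement, so every block in every layer is activated exactly once per step; gradients from these paths are then accumulated before one parameter update. The sketch below covers only the path sampling; the layer and choice counts in the example are hypothetical, and the training loop is omitted.

```python
import random
from typing import List


def strict_fair_paths(num_layers: int, num_choices: int, rng: random.Random) -> List[List[int]]:
    """Sample num_choices single-path models so every choice block in every layer is used exactly once."""
    # One independent permutation of the choice indices per layer.
    perms = [rng.sample(range(num_choices), num_choices) for _ in range(num_layers)]
    # Model j takes the j-th entry of each layer's permutation.
    return [[perms[layer][j] for layer in range(num_layers)] for j in range(num_choices)]


if __name__ == "__main__":
    paths = strict_fair_paths(num_layers=4, num_choices=3, rng=random.Random(0))
    for path in paths:
        print(path)  # each column (layer) contains 0, 1, 2 exactly once
```

Uniform random sampling of single paths only equalizes block usage in expectation; the permutation construction enforces it at every step, which is the distinction between the two fairness levels named in the abstract.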

Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling

Title Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling
Authors Daniel Pfeifer, Jochen L. Leidner
Abstract We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent topics. It is governed by a simple generative model, where the likelihood to generate the training documents via topics is optimized. The algorithm starts with one-word topics and joins two topics at every step. It therefore generates a solution for every desired number of topics ranging between the size of the training vocabulary and one. The process represents an agglomerative clustering that corresponds to a binary tree of topics. A resulting tree may act as a containment hierarchy, typically with more general topics towards the root of the tree and more specific topics towards the leaves. Topic Grouper is not governed by a background distribution such as the Dirichlet and avoids hyperparameter optimization. We show that Topic Grouper has reasonable predictive power and also a reasonable theoretical and practical complexity. Topic Grouper can deal well with stop words and function words and tends to push them into their own topics. Also, it can handle topic distributions where some topics are more frequent than others. We present typical examples of computed topics from evaluation datasets, where topics appear conclusive and coherent. In this context, the fact that each word belongs to exactly one topic is not a major limitation; in some scenarios this can even be a genuine advantage, e.g., a related shopping basket analysis may aid in optimizing groupings of articles in sales catalogs.
Tasks
Published 2019-04-13
URL http://arxiv.org/abs/1904.06483v1
PDF http://arxiv.org/pdf/1904.06483v1.pdf
PWC https://paperswithcode.com/paper/topic-grouper-an-agglomerative-clustering
Repo https://github.com/pfeiferd/TopicGrouperJ
Framework none
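
The agglomerative procedure in the abstract (start from one-word topics, merge two topics per step, keep every intermediate partition) can be sketched as a generic loop. The merge criterion below is a hypothetical co-occurrence count over toy documents, NOT Topic Grouper's likelihood-based criterion; only the loop structure follows the abstract.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Topic = FrozenSet[str]


def agglomerate(vocab: List[str], score: Callable[[Topic, Topic], float]) -> List[List[Topic]]:
    """Start with one-word topics and merge the best-scoring pair until one topic remains.

    Returns the partition after every step, i.e. one solution per number of topics.
    """
    topics = [frozenset([w]) for w in vocab]
    history = [list(topics)]
    while len(topics) > 1:
        a, b = max(combinations(topics, 2), key=lambda pair: score(*pair))
        topics = [t for t in topics if t not in (a, b)] + [a | b]
        history.append(list(topics))
    return history


# Hypothetical toy corpus and merge score: documents containing words from both topics.
DOCS = [{"cat", "dog"}, {"cat", "dog"}, {"stock", "bond"}, {"stock", "bond"}]


def cooccurrence(a: Topic, b: Topic) -> int:
    return sum(1 for d in DOCS if d & a and d & b)


if __name__ == "__main__":
    for partition in agglomerate(["cat", "dog", "stock", "bond"], cooccurrence):
        print([sorted(t) for t in partition])
```

Because each step merges exactly two topics, a vocabulary of size V yields V partitions, from V singleton topics down to one, and the sequence of merges forms the binary topic tree mentioned above.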

sql4ml A declarative end-to-end workflow for machine learning

Title sql4ml A declarative end-to-end workflow for machine learning
Authors Nantia Makrynioti, Ruy Ley-Wild, Vasilis Vassalos
Abstract We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that in many data science tasks there is a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml the user can express both feature engineering and ML algorithms in SQL, while the system translates this code to an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms and discuss the usability benefits of concentrating the entire workflow on the database side.
Tasks Feature Engineering
Published 2019-07-29
URL https://arxiv.org/abs/1907.12415v2
PDF https://arxiv.org/pdf/1907.12415v2.pdf
PWC https://paperswithcode.com/paper/sql4ml-a-declarative-end-to-end-workflow-for
Repo https://github.com/nantiamak/sql4ml
Framework tf
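
To make "expressing ML models in SQL" concrete, the sketch below writes a linear model's predictions and its squared-error loss as plain SQL queries over feature, weight, and target tables, using Python's built-in `sqlite3`. This is not sql4ml's syntax or its translation to TensorFlow; the table names, column names, and data are hypothetical, and no training step is shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE features (row_id INTEGER, feature TEXT, value REAL);
CREATE TABLE weights  (feature TEXT, weight REAL);
CREATE TABLE targets  (row_id INTEGER, y REAL);
"""

# Linear model prediction: y_hat(row) = sum over features of value * weight.
PREDICT_SQL = """
SELECT f.row_id AS row_id, SUM(f.value * w.weight) AS y_hat
FROM features AS f JOIN weights AS w ON f.feature = w.feature
GROUP BY f.row_id
"""

# Squared-error loss over all rows, built on top of the prediction query.
LOSS_SQL = f"""
SELECT SUM((p.y_hat - t.y) * (p.y_hat - t.y))
FROM ({PREDICT_SQL}) AS p JOIN targets AS t ON p.row_id = t.row_id
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)",
                     [(1, "x1", 1.0), (1, "x2", 2.0), (2, "x1", 3.0), (2, "x2", 0.0)])
    conn.executemany("INSERT INTO weights VALUES (?, ?)", [("x1", 0.5), ("x2", 1.0)])
    conn.executemany("INSERT INTO targets VALUES (?, ?)", [(1, 3.0), (2, 1.5)])
    predictions = conn.execute(PREDICT_SQL + " ORDER BY row_id").fetchall()
    loss = conn.execute(LOSS_SQL).fetchone()[0]
    conn.close()
    return predictions, loss


if __name__ == "__main__":
    print(demo())  # ([(1, 2.5), (2, 1.5)], 0.25)
```

Storing features in long (row, feature, value) form keeps the prediction a single join plus aggregation, which is one reason a relational representation of a model can stay close to the data.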

Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors

Title Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors
Authors Huy Ngoc Pham, Trung Hoang Le
Abstract Machine learning and deep learning have gained popularity and achieved immense success in drug discovery in recent decades. Historically, machine learning and deep learning models were trained on either structural data or chemical properties, using separate models. In this study, we propose an architecture that trains on both types of data simultaneously in order to improve overall performance. Given the molecular structures in the form of SMILES notation and their labels, we generated a SMILES-based feature matrix and molecular descriptors. These data were used to train a deep learning model that also integrates an attention mechanism to facilitate training and interpretation. Experiments showed that our model improves prediction performance compared with the reference model. With a maximum MCC of 0.58 and an AUC of 90% in cross-validation on the EGFR inhibitors dataset, our architecture outperformed the reference model. We also successfully integrated the attention mechanism into our model, which helped to interpret the contribution of chemical structures to bioactivity.
Tasks Activity Prediction, Drug Discovery
Published 2019-06-12
URL https://arxiv.org/abs/1906.05168v3
PDF https://arxiv.org/pdf/1906.05168v3.pdf
PWC https://paperswithcode.com/paper/attention-based-multi-input-deep-learning
Repo https://github.com/lehgtrung/egfr-att
Framework pytorch
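
The abstract reports results as Matthews correlation coefficient (MCC), which ranges from -1 to 1 and stays informative on imbalanced classes. For reference, the standard formula from confusion-matrix counts is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below implements it; returning 0.0 when the denominator is zero is a common convention, assumed here rather than taken from the paper, and the counts in the example are hypothetical.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(mcc(5, 5, 0, 0))     # 1.0, perfect classifier
    print(mcc(0, 0, 5, 5))     # -1.0, every prediction wrong
    print(mcc(40, 45, 5, 10))  # hypothetical counts
```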

Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

Title Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Authors Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh
Abstract We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. This eliminates the large overhead incurred by prior RL approaches whose lack of generalizability necessitates re-training from scratch every time a new graph is to be placed.
Tasks
Published 2019-06-20
URL https://arxiv.org/abs/1906.08879v1
PDF https://arxiv.org/pdf/1906.08879v1.pdf
PWC https://paperswithcode.com/paper/placeto-learning-generalizable-device
Repo https://github.com/aravic/generalizable-device-placement
Framework tf
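
Placeto's first key idea is to treat placement as a sequence of local improvements, revisiting one node at a time, rather than emitting a full placement in one shot. The sketch below shows only that iterative structure. The per-node choice is made greedily with a hypothetical cost (maximum device load plus a unit cost per cross-device edge), standing in for Placeto's RL policy and graph embeddings, which are not modelled; node names, costs, and devices are made up.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, str]  # node -> device


def placement_cost(placement: Placement, node_cost: Dict[str, float],
                   edges: List[Tuple[str, str]], comm_cost: float = 1.0) -> float:
    """Hypothetical cost: busiest device's load plus a penalty per cross-device edge."""
    loads: Dict[str, float] = {}
    for node, dev in placement.items():
        loads[dev] = loads.get(dev, 0.0) + node_cost[node]
    cross = sum(comm_cost for u, v in edges if placement[u] != placement[v])
    return max(loads.values()) + cross


def iterative_placement(placement: Placement, devices: List[str], node_cost: Dict[str, float],
                        edges: List[Tuple[str, str]], sweeps: int = 2) -> Placement:
    """Visit nodes one at a time and move each to its best device under the current placement."""
    placement = dict(placement)
    for _ in range(sweeps):
        for node in sorted(placement):
            placement[node] = min(
                devices,
                key=lambda d: placement_cost({**placement, node: d}, node_cost, edges),
            )
    return placement


if __name__ == "__main__":
    costs = {"a": 4.0, "b": 4.0, "c": 1.0, "d": 1.0}
    deps = [("a", "c"), ("b", "d")]
    start = {n: "gpu0" for n in costs}
    result = iterative_placement(start, ["gpu0", "gpu1"], costs, deps)
    print(result, placement_cost(result, costs, deps))
```

Because the current device is always among the candidates, each step can only keep or lower this cost; an RL policy, as in Placeto, can instead learn which moves pay off over the whole sweep and reuse that knowledge on unseen graphs.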