Paper Group AWR 253
Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images
Title | Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images |
Authors | Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon |
Abstract | In daily life, graphic symbols such as traffic signs and brand logos are used ubiquitously around us because their intuitive expression crosses language boundaries. We tackle an open-set graphic symbol recognition problem via one-shot classification, with a prototypical image as the single training example for each novel class. Our approach learns a generalizable embedding space for novel tasks. We propose the variational prototyping-encoder (VPE), which learns the image translation task from real-world input images to their corresponding prototypical images as a meta-task. As a result, VPE learns image similarity as well as prototypical concepts, which differs from widely used metric-learning-based approaches. Our experiments on diverse datasets demonstrate that the proposed VPE performs favorably against competing metric-learning-based one-shot methods. Our qualitative analyses also show that our meta-task induces an effective embedding space suitable for representing unseen data. |
Tasks | Metric Learning, One-Shot Learning |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08482v1 |
PDF | http://arxiv.org/pdf/1904.08482v1.pdf |
PWC | https://paperswithcode.com/paper/variational-prototyping-encoder-one-shot |
Repo | https://github.com/mibastro/VPE |
Framework | pytorch |
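As a rough illustration of the idea, here is a minimal PyTorch sketch of a VPE-style model: a variational encoder-decoder whose reconstruction target is the class prototype rather than the input photo. The 64x64 input resolution, layer sizes, and loss weighting are assumptions, not the paper's exact architecture.

```python
# Minimal VPE-style sketch: encode a real-world photo, decode its prototype.
# Architecture details below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VPE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten())
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, real_image):
        h = self.enc(real_image)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vpe_loss(recon, prototype, mu, logvar):
    # Reconstruction target is the *prototype*, not the input photo.
    rec = F.binary_cross_entropy(recon, prototype)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```

At test time, one would embed the prototypes of novel classes with the encoder and classify a query image by nearest neighbor in latent space, realizing the abstract's one-shot setup.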
‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking
Title | ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking |
Authors | Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang |
Abstract | Compared with traditional short-term tracking, long-term tracking poses more challenges and is much closer to realistic applications. However, few works address it, and their performance has been limited. In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules. The perusal module consists of an effective bounding-box regressor that generates a series of candidate proposals and a robust target verifier that infers the optimal candidate with its confidence score. Based on this score, our tracker determines whether the tracked object is present or absent, and accordingly chooses a local or global search strategy in the next frame. To speed up the image-wide global search, a novel skimming module is designed to efficiently choose the most likely regions from a large number of sliding windows. Extensive experimental results on the VOT-2018 long-term and OxUvA long-term benchmarks demonstrate that the proposed method achieves the best performance and runs in real time. The source code is available at https://github.com/iiau-tracker/SPLT. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01840v1 |
PDF | https://arxiv.org/pdf/1909.01840v1.pdf |
PWC | https://paperswithcode.com/paper/skimming-perusal-tracking-a-framework-for |
Repo | https://github.com/iiau-tracker/SPLT |
Framework | tf |
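The abstract's present/absent switching logic lends itself to a short control-flow sketch. Below, `regress_candidates`, `verify`, and `skim_windows` are hypothetical stand-ins for the paper's bounding-box regressor, verifier, and skimming module, and the confidence threshold is an assumed hyperparameter.

```python
# Control-flow sketch of skimming-perusal tracking (components assumed).
PRESENT_THRESHOLD = 0.5  # assumed value

def local_region(box, scale=2.0):
    # enlarge the previous box for local search (placeholder geometry)
    x, y, w, h = box
    return (x - w * (scale - 1) / 2, y - h * (scale - 1) / 2, w * scale, h * scale)

def sliding_windows(frame, step=64, size=128):
    H, W = frame.shape[:2]
    return [(x, y, size, size) for y in range(0, H - size + 1, step)
                               for x in range(0, W - size + 1, step)]

def track_frame(frame, prev_box, target_present, tracker):
    if target_present:
        regions = [local_region(prev_box)]                 # local search
    else:
        windows = sliding_windows(frame)                   # image-wide search
        regions = tracker.skim_windows(frame, windows)     # keep most likely few
    best_box, best_score = None, float("-inf")
    for region in regions:
        for box in tracker.regress_candidates(frame, region):  # perusal: proposals
            score = tracker.verify(frame, box)                 # perusal: verification
            if score > best_score:
                best_box, best_score = box, score
    return best_box, best_score > PRESENT_THRESHOLD
```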
Where are the Masks: Instance Segmentation with Image-level Supervision
Title | Where are the Masks: Instance Segmentation with Image-level Supervision |
Authors | Issam H. Laradji, David Vazquez, Mark Schmidt |
Abstract | A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective. These labels require large human effort, and for certain applications such labels are not readily available. To address this limitation, we propose a novel framework that can train effectively with image-level labels, which are significantly cheaper to acquire. For instance, one can do an internet search for the term “car” and obtain many images where a car is present with minimal effort. Our framework consists of two stages: (1) train a classifier to generate pseudo masks for the objects of interest; (2) train a fully supervised Mask R-CNN on these pseudo masks. Our two main contributions are a pipeline that is simple to implement and amenable to different segmentation methods, and new state-of-the-art results for this problem setup. We evaluate our method on PASCAL VOC 2012, a standard dataset for weakly supervised methods, where we demonstrate major gains in mean average precision over existing methods. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01430v1 |
PDF | https://arxiv.org/pdf/1907.01430v1.pdf |
PWC | https://paperswithcode.com/paper/where-are-the-masks-instance-segmentation |
Repo | https://github.com/ElementAI/wise_ils |
Framework | pytorch |
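A condensed sketch of the two-stage pipeline, assuming a hypothetical `pseudo_masks_from` helper for stage 1; stage 2 is ordinary fully supervised Mask R-CNN training (here via torchvision) with pseudo masks in place of human annotations.

```python
# Two-stage weakly supervised pipeline sketch (stage-1 helper is assumed).
import torch
import torchvision

def train_weakly_supervised(images, image_level_labels, classifier):
    # Stage 1: derive pseudo masks from a classifier trained with only
    # image-level labels. `pseudo_masks_from` is a hypothetical helper that
    # returns a Mask R-CNN target dict: {"boxes", "labels", "masks"}.
    targets = [classifier.pseudo_masks_from(img, labels)
               for img, labels in zip(images, image_level_labels)]
    # Stage 2: plain fully supervised Mask R-CNN training, with pseudo
    # masks standing in for human annotations.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=21)
    model.train()                                  # 20 VOC classes + background
    optim = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    for img, tgt in zip(images, targets):
        loss_dict = model([img], [tgt])            # dict of loss terms
        loss = sum(loss_dict.values())
        optim.zero_grad(); loss.backward(); optim.step()
    return model
```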
Supervised Multimodal Bitransformers for Classifying Images and Text
Title | Supervised Multimodal Bitransformers for Classifying Images and Text |
Authors | Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine |
Abstract | Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks. The modern digital world is increasingly multimodal, however, and textual information is often accompanied by other modalities such as images. We introduce a supervised multimodal bitransformer model that fuses information from text and image encoders, and obtain state-of-the-art performance on various multimodal classification benchmark tasks, outperforming strong baselines, including on hard test sets specifically designed to measure multimodal performance. |
Tasks | |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02950v1 |
PDF | https://arxiv.org/pdf/1909.02950v1.pdf |
PWC | https://paperswithcode.com/paper/supervised-multimodal-bitransformers-for |
Repo | https://github.com/facebookresearch/mmbt |
Framework | pytorch |
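One way to realize the fusion the abstract describes is to project image features into the transformer's token-embedding space and feed them alongside the word tokens. The sketch below, using Hugging Face `transformers` and torchvision, simplifies the paper's pooling and classification details (e.g., first-position pooling stands in for the usual [CLS]-style treatment).

```python
# Multimodal bitransformer sketch: image features become extra "tokens".
import torch
import torch.nn as nn
import torchvision
from transformers import BertModel

class MultimodalBitransformer(nn.Module):
    def __init__(self, num_labels, num_image_tokens=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        resnet = torchvision.models.resnet152()
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # drop fc head
        hidden = self.bert.config.hidden_size
        self.num_image_tokens = num_image_tokens
        self.img_proj = nn.Linear(2048, hidden * num_image_tokens)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, image):
        tok = self.bert.embeddings.word_embeddings(input_ids)
        img = self.cnn(image).flatten(1)                       # (B, 2048)
        img = self.img_proj(img).view(-1, self.num_image_tokens,
                                      tok.size(-1))            # image "tokens"
        embeds = torch.cat([img, tok], dim=1)
        mask = torch.cat([torch.ones(img.shape[:2], device=tok.device),
                          attention_mask], dim=1)
        out = self.bert(inputs_embeds=embeds, attention_mask=mask)
        return self.classifier(out.last_hidden_state[:, 0])   # simplified pooling
```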
Attentive Neural Processes
Title | Attentive Neural Processes |
Authors | Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh |
Abstract | Neural Processes (NPs) (Garnelo et al., 2018a,b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently, with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer from a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled. |
Tasks | |
Published | 2019-01-17 |
URL | https://arxiv.org/abs/1901.05761v2 |
PDF | https://arxiv.org/pdf/1901.05761v2.pdf |
PWC | https://paperswithcode.com/paper/attentive-neural-processes |
Repo | https://github.com/3springs/np_vs_kriging |
Framework | none |
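The core modification is easy to isolate: replace the NP's mean-pooled context representation with cross-attention from each target input to the context points. A minimal PyTorch sketch, with dimensions chosen as assumptions:

```python
# Cross-attention aggregation, the key change ANPs make to plain NPs.
import torch
import torch.nn as nn

class AttentiveAggregator(nn.Module):
    def __init__(self, x_dim=1, r_dim=128, heads=4):
        super().__init__()
        self.query = nn.Linear(x_dim, r_dim)
        self.key = nn.Linear(x_dim, r_dim)
        self.attn = nn.MultiheadAttention(r_dim, heads, batch_first=True)

    def forward(self, x_context, r_context, x_target):
        # r_context: per-point context representations, shape (B, Nc, r_dim)
        q = self.query(x_target)             # (B, Nt, r_dim)
        k = self.key(x_context)              # (B, Nc, r_dim)
        out, _ = self.attn(q, k, r_context)  # target-specific representation
        return out                           # one r per target input

# A plain NP would instead use r_context.mean(dim=1) for every target,
# which is what the paper identifies as the source of underfitting.
```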
Exchangeable Generative Models with Flow Scans
Title | Exchangeable Generative Models with Flow Scans |
Authors | Christopher Bender, Kevin O’Connor, Yang Li, Juan Jose Garcia, Manzil Zaheer, Junier Oliva |
Abstract | In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan represents the first approach that is able to apply sequential methods to exchangeable density estimation without resorting to averaging over all possible permutations. We achieve new state-of-the-art performance on point cloud and image set modeling. |
Tasks | Density Estimation |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01967v3 |
PDF | https://arxiv.org/pdf/1902.01967v3.pdf |
PWC | https://paperswithcode.com/paper/permutation-invariant-likelihoods-and |
Repo | https://github.com/lupalab/flowscan |
Framework | tf |
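A sketch of the recipe the abstract outlines, under assumptions about the components: an invertible per-point flow, a sort along one coordinate (the "scan"), a sequential density model scoring the sorted sequence, and a log n! term for the set-to-sequence change of measure.

```python
# FlowScan-style exchangeable likelihood sketch (components assumed).
import math
import torch

def flowscan_log_likelihood(x, point_flow, sequence_model):
    # x: (B, N, D), a batch of exchangeable sets
    z, logdet = point_flow(x)                # invertible transform + log|det J|
    order = torch.argsort(z[..., 0], dim=1)  # scan: sort by first coordinate
    z_sorted = torch.gather(z, 1, order.unsqueeze(-1).expand_as(z))
    n = x.size(1)
    return (sequence_model.log_prob(z_sorted)  # autoregressive likelihood
            + logdet
            + math.lgamma(n + 1))              # + log n! for exchangeability
```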
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Title | Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout |
Authors | Hao Tan, Licheng Yu, Mohit Bansal |
Abstract | A grand goal in AI is to build a robot that can accurately navigate based on natural language instructions, which requires the agent to perceive the scene, understand and ground language, and act in the real-world environment. One key challenge here is to learn to navigate in new environments that are unseen during training. Most of the existing approaches perform dramatically worse in unseen environments as compared to seen ones. In this paper, we present a generalizable navigational agent. Our agent is trained in two stages. The first stage is training via mixed imitation and reinforcement learning, combining the benefits from both off-policy and on-policy optimization. The second stage is fine-tuning via newly-introduced ‘unseen’ triplets (environment, path, instruction). To generate these unseen triplets, we propose a simple but effective ‘environmental dropout’ method to mimic unseen environments, which overcomes the problem of limited seen environment variability. Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions. Empirically, we show that our agent is substantially better at generalizability when fine-tuned with these triplets, outperforming the state-of-the-art approaches by a large margin on the private unseen test set of the Room-to-Room task, and achieving the top rank on the leaderboard. |
Tasks | Vision-Language Navigation |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04195v1 |
PDF | http://arxiv.org/pdf/1904.04195v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-navigate-unseen-environments-back |
Repo | https://github.com/airsplay/R2R-EnvDrop |
Framework | pytorch |
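The key difference from standard dropout is that one channel mask is sampled per environment and reused across all of its viewpoints, so the perturbed environment stays internally consistent. A minimal sketch, with shapes assumed:

```python
# 'Environmental dropout' sketch: one mask per environment, shared by all views.
import torch

def environmental_dropout(env_features, p=0.4):
    # env_features: (num_views, feat_dim), all visual features of one environment
    keep = (torch.rand(env_features.size(-1)) > p).float()  # single mask per env
    return env_features * keep / (1.0 - p)                  # shared across views

# New (environment, path, instruction) triplets are then generated by running
# a speaker model on paths sampled in the dropped-out environment
# (back-translation), and the agent is fine-tuned on them.
```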
Provably Powerful Graph Networks
Title | Provably Powerful Graph Networks |
Authors | Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, Yaron Lipman |
Abstract | Recently, the Weisfeiler-Lehman (WL) graph isomorphism test was used to measure the expressive power of graph neural networks (GNN). It was shown that the popular message passing GNN cannot distinguish between graphs that are indistinguishable by the 1-WL test (Morris et al. 2018; Xu et al. 2019). Unfortunately, many simple instances of graphs are indistinguishable by the 1-WL test. In search of more expressive graph learning models we build upon the recent k-order invariant and equivariant graph neural networks (Maron et al. 2019a,b) and present two results: First, we show that such k-order networks can distinguish between non-isomorphic graphs as well as the k-WL tests, which are provably stronger than the 1-WL test for k>2. This makes these models strictly stronger than message passing models. Unfortunately, the higher expressiveness of these models comes with a computational cost of processing high order tensors. Second, setting our goal at building a provably stronger, simple and scalable model, we show that a reduced 2-order network containing just a scaled identity operator, augmented with a single quadratic operation (matrix multiplication), has provable 3-WL expressive power. Put differently, we suggest a simple model that interleaves applications of standard Multilayer-Perceptrons (MLPs) applied to the feature dimension and matrix multiplication. We validate this model by presenting state-of-the-art results on popular graph classification and regression tasks. To the best of our knowledge, this is the first practical invariant/equivariant model with guaranteed 3-WL expressiveness, strictly stronger than message passing models. |
Tasks | Graph Classification |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11136v2 |
PDF | https://arxiv.org/pdf/1905.11136v2.pdf |
PWC | https://paperswithcode.com/paper/provably-powerful-graph-networks |
Repo | https://github.com/hadarser/ProvablyPowerfulGraphNetworks_torch |
Framework | pytorch |
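The model the abstract describes interleaves channel-wise MLPs with one matrix multiplication per block. A PyTorch sketch of such a block, with channel widths as assumptions (1x1 convolutions implement an MLP applied independently at each matrix entry):

```python
# Sketch of a 3-WL-expressive block: channel-wise MLPs + one matmul.
import torch
import torch.nn as nn

class PPGNBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def mlp():  # 1x1 convs = MLP applied independently at each (i, j) entry
            return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(),
                                 nn.Conv2d(out_ch, out_ch, 1))
        self.m1, self.m2 = mlp(), mlp()
        self.skip = nn.Conv2d(in_ch + out_ch, out_ch, 1)

    def forward(self, x):                           # x: (B, C, n, n)
        y = torch.matmul(self.m1(x), self.m2(x))    # batched matmul per channel:
        return self.skip(torch.cat([x, y], dim=1))  # the single quadratic op
```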
Attacking Optical Flow
Title | Attacking Optical Flow |
Authors | Anurag Ranjan, Joel Janai, Andreas Geiger, Michael J. Black |
Abstract | Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes. |
Tasks | Optical Flow Estimation, Self-Driving Cars |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10053v1 |
PDF | https://arxiv.org/pdf/1910.10053v1.pdf |
PWC | https://paperswithcode.com/paper/attacking-optical-flow |
Repo | https://github.com/anuragranj/flowattack |
Framework | pytorch |
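A sketch of a patch attack in this spirit: a small learnable patch is pasted at the same location in both frames (so its true motion is zero) and optimized to push the predicted flow away from the clean prediction. `flow_net` is any differentiable flow model; the loss and the fixed central patch location are simplifying assumptions.

```python
# Adversarial patch attack on an optical flow network (simplified sketch).
import torch

def attack(flow_net, frame1, frame2, steps=100, patch_size=50, lr=1e-2):
    B, C, H, W = frame1.shape
    patch = torch.rand(1, C, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    y, x = H // 2, W // 2                          # fixed location, for simplicity
    clean = flow_net(frame1, frame2).detach()      # unattacked flow, fixed target
    for _ in range(steps):
        f1, f2 = frame1.clone(), frame2.clone()
        p = patch.clamp(0, 1)
        f1[:, :, y:y+patch_size, x:x+patch_size] = p
        f2[:, :, y:y+patch_size, x:x+patch_size] = p   # zero-motion patch
        flow = flow_net(f1, f2)
        loss = -(flow - clean).abs().mean()        # maximize flow corruption
        opt.zero_grad(); loss.backward(); opt.step()
    return patch.detach().clamp(0, 1)
```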
Meta-Amortized Variational Inference and Learning
Title | Meta-Amortized Variational Inference and Learning |
Authors | Mike Wu, Kristy Choi, Noah Goodman, Stefano Ermon |
Abstract | Despite recent successes in probabilistic modeling and its applications, generative models trained using traditional inference techniques struggle to adapt to new distributions, even when the target distribution may be closely related to the ones seen during training. In this work, we present a doubly-amortized variational inference procedure as a way to address this challenge. By sharing computation across not only a set of query inputs, but also a set of different, related probabilistic models, we learn transferable latent representations that generalize across several related distributions. In particular, given a set of distributions over images, we find the learned representations to transfer to different data transformations. We empirically demonstrate the effectiveness of our method by introducing the MetaVAE, and show that it significantly outperforms baselines on downstream image classification tasks on MNIST (10-50%) and NORB (10-35%). |
Tasks | Density Estimation, Image Classification |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01950v2 |
PDF | https://arxiv.org/pdf/1902.01950v2.pdf |
PWC | https://paperswithcode.com/paper/meta-amortized-variational-inference-and |
Repo | https://github.com/mhw32/meta-inference-public |
Framework | pytorch |
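Double amortization can be sketched as an inference network that conditions not only on the input but also on a permutation-invariant summary of a sample set from the current distribution. The architecture below is an illustrative assumption, not the paper's exact MetaVAE:

```python
# Doubly-amortized encoder sketch: condition on a summary of the distribution.
import torch
import torch.nn as nn

class MetaEncoder(nn.Module):
    def __init__(self, x_dim=784, c_dim=64, z_dim=32):
        super().__init__()
        self.summarize = nn.Sequential(nn.Linear(x_dim, c_dim), nn.ReLU())
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)

    def forward(self, x, dataset_sample):
        # Amortize over distributions: mean-pool a sample set from the current
        # distribution into a context vector shared by all queries.
        c = self.summarize(dataset_sample).mean(dim=0)          # (c_dim,)
        h = self.enc(torch.cat([x, c.expand(x.size(0), -1)], dim=-1))
        return self.mu(h), self.logvar(h)
```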
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
Title | FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search |
Authors | Xiangxiang Chu, Bo Zhang, Ruijun Xu, Jixiang Li |
Abstract | One of the most critical problems in two-stage weight-sharing neural architecture search is the evaluation of candidate models. A faithful ranking leads to accurate search results. However, current methods are prone to making misjudgments. In this paper, we prove that they inevitably give biased evaluations due to inherent unfairness in the supernet training. In view of this, we propose two levels of constraints: expectation fairness and strict fairness. Particularly, strict fairness ensures equal optimization opportunities for all choice blocks throughout the training, which neither overestimates nor underestimates their capacity. We demonstrate this is crucial to improving confidence in models’ ranking. Incorporating our supernet trained under fairness constraints with a multi-objective evolutionary search algorithm, we obtain various state-of-the-art models on ImageNet. In particular, FairNAS-A attains 77.5% top-1 accuracy. The models and their evaluation codes are made publicly available online at http://github.com/fairnas/FairNAS . |
Tasks | Neural Architecture Search |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01845v4 |
PDF | https://arxiv.org/pdf/1907.01845v4.pdf |
PWC | https://paperswithcode.com/paper/fairnas-rethinking-evaluation-fairness-of |
Repo | https://github.com/xiaomi-automl/FairNAS |
Framework | pytorch |
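Strict fairness can be sketched as follows: in each training step, every layer's choice blocks are visited in a uniformly random order, each block participates in exactly one single-path forward pass, and gradients are accumulated across paths before a single parameter update. `supernet.loss` below is an assumed API.

```python
# Strict-fairness supernet training step (sketch; supernet API assumed).
import random

def strict_fairness_step(supernet, batch, num_choices, optimizer):
    x, y = batch
    # one random permutation of choice indices per layer
    perms = [random.sample(range(num_choices), num_choices)
             for _ in range(supernet.num_layers)]
    optimizer.zero_grad()
    for k in range(num_choices):              # each choice block used exactly once
        arch = [perms[layer][k] for layer in range(supernet.num_layers)]
        loss = supernet.loss(x, y, arch)      # single-path forward (assumed API)
        loss.backward()                       # accumulate gradients over paths
    optimizer.step()                          # one update after all paths
```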
Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling
Title | Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling |
Authors | Daniel Pfeifer, Jochen L. Leidner |
Abstract | We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent topics. It is governed by a simple generative model, where the likelihood to generate the training documents via topics is optimized. The algorithm starts with one-word topics and joins two topics at every step. It therefore generates a solution for every desired number of topics, ranging between the size of the training vocabulary and one. The process represents an agglomerative clustering that corresponds to a binary tree of topics. A resulting tree may act as a containment hierarchy, typically with more general topics towards the root of the tree and more specific topics towards the leaves. Topic Grouper is not governed by a background distribution such as the Dirichlet and avoids hyperparameter optimization. We show that Topic Grouper has reasonable predictive power and also a reasonable theoretical and practical complexity. Topic Grouper can deal well with stop words and function words and tends to push them into their own topics. Also, it can handle topic distributions where some topics are more frequent than others. We present typical examples of computed topics from evaluation datasets, where topics appear conclusive and coherent. In this context, the fact that each word belongs to exactly one topic is not a major limitation; in some scenarios this can even be a genuine advantage, e.g., a related shopping basket analysis may aid in optimizing groupings of articles in sales catalogs. |
Tasks | |
Published | 2019-04-13 |
URL | http://arxiv.org/abs/1904.06483v1 |
PDF | http://arxiv.org/pdf/1904.06483v1.pdf |
PWC | https://paperswithcode.com/paper/topic-grouper-an-agglomerative-clustering |
Repo | https://github.com/pfeiferd/TopicGrouperJ |
Framework | none |
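A naive sketch of the agglomeration (the paper's actual algorithm is more efficient): start with one topic per word and greedily merge the pair whose join costs the least training likelihood, recording the binary merge tree. `likelihood_delta` stands in for the paper's generative score.

```python
# Naive O(V^3)-style sketch of the Topic Grouper agglomeration.
def topic_grouper(vocabulary, likelihood_delta):
    topics = [frozenset([w]) for w in vocabulary]
    tree = []                                   # records the merge hierarchy
    while len(topics) > 1:
        i, j = max(((a, b) for a in range(len(topics))
                           for b in range(a + 1, len(topics))),
                   key=lambda ab: likelihood_delta(topics[ab[0]], topics[ab[1]]))
        merged = topics[i] | topics[j]
        tree.append((topics[i], topics[j], merged))
        topics = [t for k, t in enumerate(topics) if k not in (i, j)] + [merged]
    return tree  # every intermediate topic list is a valid solution
```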
sql4ml: A declarative end-to-end workflow for machine learning
Title | sql4ml: A declarative end-to-end workflow for machine learning |
Authors | Nantia Makrynioti, Ruy Ley-Wild, Vasilis Vassalos |
Abstract | We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that in many data science tasks there is a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml the user can express both feature engineering and ML algorithms in SQL, while the system translates this code to an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms, and discuss the usability benefits of concentrating the entire workflow on the database side. |
Tasks | Feature Engineering |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12415v2 |
PDF | https://arxiv.org/pdf/1907.12415v2.pdf |
PWC | https://paperswithcode.com/paper/sql4ml-a-declarative-end-to-end-workflow-for |
Repo | https://github.com/nantiamak/sql4ml |
Framework | tf |
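To make the workflow concrete, here is an illustrative guess at both sides of the translation: an objective written as a SQL view, and the TensorFlow computation it might compile to. Both the SQL text and the translated function are assumptions, not sql4ml's actual syntax or API.

```python
# Hypothetical sql4ml-style example: SQL objective -> TensorFlow training.
import tensorflow as tf

# What a user-side objective might look like: squared error of a linear model.
OBJECTIVE_SQL = """
CREATE VIEW loss AS
SELECT SUM(POWER(f.y - w.weight * f.x, 2)) AS value
FROM features f, weights w;
"""

# The kind of computation graph such a view could translate to:
def translated_loss(x, y, weight):
    return tf.reduce_sum(tf.square(y - weight * x))

weight = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
x = tf.constant([1.0, 2.0, 3.0]); y = tf.constant([2.0, 4.0, 6.0])
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = translated_loss(x, y, weight)
    opt.apply_gradients([(tape.gradient(loss, weight), weight)])
```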
Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors
Title | Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors |
Authors | Huy Ngoc Pham, Trung Hoang Le |
Abstract | Machine learning and deep learning have gained popularity and achieved immense success in drug discovery in recent decades. Historically, machine learning and deep learning models were trained on either structural data or chemical properties by separate models. In this study, we propose an architecture that trains on both types of data simultaneously in order to improve overall performance. Given a molecular structure in the form of SMILES notation and its label, we generate a SMILES-based feature matrix and molecular descriptors. These data are fed to a deep learning model integrated with an attention mechanism to facilitate training and interpretation. Experiments show that our model improves prediction performance over the reference model. With a maximum MCC of 0.58 and an AUC of 90% under cross-validation on an EGFR inhibitors dataset, our architecture outperforms the reference model. We also successfully integrate the attention mechanism into our model, which helps to interpret the contribution of chemical substructures to bioactivity. |
Tasks | Activity Prediction, Drug Discovery |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05168v3 |
PDF | https://arxiv.org/pdf/1906.05168v3.pdf |
PWC | https://paperswithcode.com/paper/attention-based-multi-input-deep-learning |
Repo | https://github.com/lehgtrung/egfr-att |
Framework | pytorch |
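A sketch of a two-branch model in the spirit of the abstract: a convolutional branch over the SMILES feature matrix with per-position attention pooling, a dense branch over the molecular descriptors, and a merged classification head. All sizes are assumptions.

```python
# Multi-input attention model sketch for bioactivity prediction.
import torch
import torch.nn as nn

class MultiInputNet(nn.Module):
    def __init__(self, smiles_feats=40, desc_dim=200):
        super().__init__()
        self.smiles_conv = nn.Conv1d(smiles_feats, 64, kernel_size=5, padding=2)
        self.attn = nn.Linear(64, 1)                     # per-position score
        self.desc_mlp = nn.Sequential(nn.Linear(desc_dim, 64), nn.ReLU())
        self.head = nn.Linear(64 + 64, 1)

    def forward(self, smiles_matrix, descriptors):
        # smiles_matrix: (B, smiles_feats, L); descriptors: (B, desc_dim)
        h = torch.relu(self.smiles_conv(smiles_matrix))          # (B, 64, L)
        a = torch.softmax(self.attn(h.transpose(1, 2)), dim=1)   # (B, L, 1)
        s = (h.transpose(1, 2) * a).sum(dim=1)                   # attention pooling
        d = self.desc_mlp(descriptors)
        return torch.sigmoid(self.head(torch.cat([s, d], dim=-1)))  # activity prob
```

The attention weights `a` are what would let one inspect which SMILES positions drive a prediction, matching the abstract's interpretability claim.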
Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Title | Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning |
Authors | Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh |
Abstract | We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. This eliminates the large overhead incurred by prior RL approaches whose lack of generalizability necessitates re-training from scratch every time a new graph is to be placed. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08879v1 |
PDF | https://arxiv.org/pdf/1906.08879v1.pdf |
PWC | https://paperswithcode.com/paper/placeto-learning-generalizable-device |
Repo | https://github.com/aravic/generalizable-device-placement |
Framework | tf |
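The two ideas in the abstract can be summarized in a short episode sketch: the policy visits nodes one at a time, proposes a device for each based on a graph embedding (no node-label indexing), and is rewarded by the measured runtime reduction. Every name below is an assumed placeholder, not Placeto's actual API.

```python
# Placeto-style episode sketch: iterative placement improvement via RL.
def placeto_episode(graph, policy, measure_runtime, devices):
    placement = {node: devices[0] for node in graph.nodes}  # trivial start
    runtime = measure_runtime(graph, placement)
    for node in graph.nodes:              # one placement improvement per step
        state = policy.embed(graph, placement, node)  # graph embedding, no node IDs
        placement[node] = policy.choose_device(state, devices)
        new_runtime = measure_runtime(graph, placement)
        policy.record_reward(runtime - new_runtime)   # reward = runtime reduction
        runtime = new_runtime
    return placement
```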