Paper Group AWR 253
Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images
Title | Variational Prototyping-Encoder: One-Shot Learning with Prototypical Images |
Authors | Junsik Kim, Tae-Hyun Oh, Seokju Lee, Fei Pan, In So Kweon |
Abstract | In daily life, graphic symbols such as traffic signs and brand logos are used ubiquitously around us because their intuitive expression crosses language boundaries. We tackle an open-set graphic symbol recognition problem via one-shot classification, with a prototypical image as the single training example for each novel class. Our approach learns a generalizable embedding space for novel tasks. We propose the variational prototyping-encoder (VPE), which learns the image translation task from real-world input images to their corresponding prototypical images as a meta-task. As a result, VPE learns image similarity as well as prototypical concepts, which differs from widely used metric-learning-based approaches. Our experiments on diverse datasets demonstrate that the proposed VPE performs favorably against competing metric-learning-based one-shot methods. Our qualitative analyses also show that our meta-task induces an effective embedding space suitable for representing unseen data. |
Tasks | Metric Learning, One-Shot Learning |
Published | 2019-04-17 |
URL | http://arxiv.org/abs/1904.08482v1 |
PDF | http://arxiv.org/pdf/1904.08482v1.pdf |
PWC | https://paperswithcode.com/paper/variational-prototyping-encoder-one-shot |
Repo | https://github.com/mibastro/VPE |
Framework | pytorch |
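As a rough illustration of the idea, here is a minimal PyTorch sketch of a VPE-style model: a variational encoder-decoder whose reconstruction target is the class prototype rather than the input photo. The 64x64 input resolution, layer sizes, and loss weighting are assumptions, not the paper's exact architecture.

```python
# Minimal VPE-style sketch: encode a real-world photo, decode its prototype.
# Architecture details below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VPE(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten())
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, real_image):
        h = self.enc(real_image)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vpe_loss(recon, prototype, mu, logvar):
    # Reconstruction target is the *prototype*, not the input photo.
    rec = F.binary_cross_entropy(recon, prototype)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```

At test time, one would embed the prototypes of novel classes with the encoder and classify a query image by nearest neighbor in latent space, realizing the abstract's one-shot setup.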
‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking
Title | ‘Skimming-Perusal’ Tracking: A Framework for Real-Time and Robust Long-term Tracking |
Authors | Bin Yan, Haojie Zhao, Dong Wang, Huchuan Lu, Xiaoyun Yang |
Abstract | Compared with traditional short-term tracking, long-term tracking poses more challenges and is much closer to realistic applications. However, few works address it, and their performance has been limited. In this work, we present a novel robust and real-time long-term tracking framework based on the proposed skimming and perusal modules. The perusal module consists of an effective bounding-box regressor that generates a series of candidate proposals and a robust target verifier that infers the optimal candidate with its confidence score. Based on this score, our tracker determines whether the tracked object is present or absent, and accordingly chooses a local or global search strategy in the next frame. To speed up the image-wide global search, a novel skimming module is designed to efficiently choose the most likely regions from a large number of sliding windows. Extensive experimental results on the VOT-2018 long-term and OxUvA long-term benchmarks demonstrate that the proposed method achieves the best performance and runs in real time. The source code is available at https://github.com/iiau-tracker/SPLT. |
Tasks | |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01840v1 |
PDF | https://arxiv.org/pdf/1909.01840v1.pdf |
PWC | https://paperswithcode.com/paper/skimming-perusal-tracking-a-framework-for |
Repo | https://github.com/iiau-tracker/SPLT |
Framework | tf |
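The abstract's present/absent switching logic lends itself to a short control-flow sketch. Below, `regress_candidates`, `verify`, and `skim_windows` are hypothetical stand-ins for the paper's bounding-box regressor, verifier, and skimming module, and the confidence threshold is an assumed hyperparameter.

```python
# Control-flow sketch of skimming-perusal tracking (components assumed).
PRESENT_THRESHOLD = 0.5  # assumed value

def local_region(box, scale=2.0):
    # enlarge the previous box for local search (placeholder geometry)
    x, y, w, h = box
    return (x - w * (scale - 1) / 2, y - h * (scale - 1) / 2, w * scale, h * scale)

def sliding_windows(frame, step=64, size=128):
    H, W = frame.shape[:2]
    return [(x, y, size, size) for y in range(0, H - size + 1, step)
                               for x in range(0, W - size + 1, step)]

def track_frame(frame, prev_box, target_present, tracker):
    if target_present:
        regions = [local_region(prev_box)]                 # local search
    else:
        windows = sliding_windows(frame)                   # image-wide search
        regions = tracker.skim_windows(frame, windows)     # keep most likely few
    best_box, best_score = None, float("-inf")
    for region in regions:
        for box in tracker.regress_candidates(frame, region):  # perusal: proposals
            score = tracker.verify(frame, box)                 # perusal: verification
            if score > best_score:
                best_box, best_score = box, score
    return best_box, best_score > PRESENT_THRESHOLD
```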
Where are the Masks: Instance Segmentation with Image-level Supervision
Title | Where are the Masks: Instance Segmentation with Image-level Supervision |
Authors | Issam H. Laradji, David Vazquez, Mark Schmidt |
Abstract | A major obstacle in instance segmentation is that existing methods often need many per-pixel labels in order to be effective. These labels require large human effort, and for certain applications such labels are not readily available. To address this limitation, we propose a novel framework that can train effectively with image-level labels, which are significantly cheaper to acquire. For instance, one can do an internet search for the term “car” and obtain many images where a car is present with minimal effort. Our framework consists of two stages: (1) train a classifier to generate pseudo masks for the objects of interest; (2) train a fully supervised Mask R-CNN on these pseudo masks. Our two main contributions are a pipeline that is simple to implement and amenable to different segmentation methods, and new state-of-the-art results for this problem setup. We evaluate our method on PASCAL VOC 2012, a standard dataset for weakly supervised methods, where we demonstrate major gains in mean average precision over existing methods. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01430v1 |
PDF | https://arxiv.org/pdf/1907.01430v1.pdf |
PWC | https://paperswithcode.com/paper/where-are-the-masks-instance-segmentation |
Repo | https://github.com/ElementAI/wise_ils |
Framework | pytorch |
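A condensed sketch of the two-stage pipeline, assuming a hypothetical `pseudo_masks_from` helper for stage 1; stage 2 is ordinary fully supervised Mask R-CNN training (here via torchvision) with pseudo masks in place of human annotations.

```python
# Two-stage weakly supervised pipeline sketch (stage-1 helper is assumed).
import torch
import torchvision

def train_weakly_supervised(images, image_level_labels, classifier):
    # Stage 1: derive pseudo masks from a classifier trained with only
    # image-level labels. `pseudo_masks_from` is a hypothetical helper that
    # returns a Mask R-CNN target dict: {"boxes", "labels", "masks"}.
    targets = [classifier.pseudo_masks_from(img, labels)
               for img, labels in zip(images, image_level_labels)]
    # Stage 2: plain fully supervised Mask R-CNN training, with pseudo
    # masks standing in for human annotations.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=21)
    model.train()                                  # 20 VOC classes + background
    optim = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    for img, tgt in zip(images, targets):
        loss_dict = model([img], [tgt])            # dict of loss terms
        loss = sum(loss_dict.values())
        optim.zero_grad(); loss.backward(); optim.step()
    return model
```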
Supervised Multimodal Bitransformers for Classifying Images and Text
Title | Supervised Multimodal Bitransformers for Classifying Images and Text |
Authors | Douwe Kiela, Suvrat Bhooshan, Hamed Firooz, Davide Testuggine |
Abstract | Self-supervised bidirectional transformer models such as BERT have led to dramatic improvements in a wide variety of textual classification tasks. The modern digital world is increasingly multimodal, however, and textual information is often accompanied by other modalities such as images. We introduce a supervised multimodal bitransformer model that fuses information from text and image encoders, and obtain state-of-the-art performance on various multimodal classification benchmark tasks, outperforming strong baselines, including on hard test sets specifically designed to measure multimodal performance. |
Tasks | |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02950v1 |
PDF | https://arxiv.org/pdf/1909.02950v1.pdf |
PWC | https://paperswithcode.com/paper/supervised-multimodal-bitransformers-for |
Repo | https://github.com/facebookresearch/mmbt |
Framework | pytorch |
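One way to realize the fusion the abstract describes is to project image features into the transformer's token-embedding space and feed them alongside the word tokens. The sketch below, using Hugging Face `transformers` and torchvision, simplifies the paper's pooling and classification details (e.g., first-position pooling stands in for the usual [CLS]-style treatment).

```python
# Multimodal bitransformer sketch: image features become extra "tokens".
import torch
import torch.nn as nn
import torchvision
from transformers import BertModel

class MultimodalBitransformer(nn.Module):
    def __init__(self, num_labels, num_image_tokens=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        resnet = torchvision.models.resnet152()
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # drop fc head
        hidden = self.bert.config.hidden_size
        self.num_image_tokens = num_image_tokens
        self.img_proj = nn.Linear(2048, hidden * num_image_tokens)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, image):
        tok = self.bert.embeddings.word_embeddings(input_ids)
        img = self.cnn(image).flatten(1)                       # (B, 2048)
        img = self.img_proj(img).view(-1, self.num_image_tokens,
                                      tok.size(-1))            # image "tokens"
        embeds = torch.cat([img, tok], dim=1)
        mask = torch.cat([torch.ones(img.shape[:2], device=tok.device),
                          attention_mask], dim=1)
        out = self.bert(inputs_embeds=embeds, attention_mask=mask)
        return self.classifier(out.last_hidden_state[:, 0])   # simplified pooling
```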
Attentive Neural Processes
Title | Attentive Neural Processes |
Authors | Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh |
Abstract | Neural Processes (NPs) (Garnelo et al., 2018a,b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently, with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer from a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled. |
Tasks | |
Published | 2019-01-17 |
URL | https://arxiv.org/abs/1901.05761v2 |
PDF | https://arxiv.org/pdf/1901.05761v2.pdf |
PWC | https://paperswithcode.com/paper/attentive-neural-processes |
Repo | https://github.com/3springs/np_vs_kriging |
Framework | none |
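The core modification is easy to isolate: replace the NP's mean-pooled context representation with cross-attention from each target input to the context points. A minimal PyTorch sketch, with dimensions chosen as assumptions:

```python
# Cross-attention aggregation, the key change ANPs make to plain NPs.
import torch
import torch.nn as nn

class AttentiveAggregator(nn.Module):
    def __init__(self, x_dim=1, r_dim=128, heads=4):
        super().__init__()
        self.query = nn.Linear(x_dim, r_dim)
        self.key = nn.Linear(x_dim, r_dim)
        self.attn = nn.MultiheadAttention(r_dim, heads, batch_first=True)

    def forward(self, x_context, r_context, x_target):
        # r_context: per-point context representations, shape (B, Nc, r_dim)
        q = self.query(x_target)             # (B, Nt, r_dim)
        k = self.key(x_context)              # (B, Nc, r_dim)
        out, _ = self.attn(q, k, r_context)  # target-specific representation
        return out                           # one r per target input

# A plain NP would instead use r_context.mean(dim=1) for every target,
# which is what the paper identifies as the source of underfitting.
```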
Exchangeable Generative Models with Flow Scans
Title | Exchangeable Generative Models with Flow Scans |
Authors | Christopher Bender, Kevin O’Connor, Yang Li, Juan Jose Garcia, Manzil Zaheer, Junier Oliva |
Abstract | In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan represents the first approach that is able to apply sequential methods to exchangeable density estimation without resorting to averaging over all possible permutations. We achieve new state-of-the-art performance on point cloud and image set modeling. |
Tasks | Density Estimation |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01967v3 |
PDF | https://arxiv.org/pdf/1902.01967v3.pdf |
PWC | https://paperswithcode.com/paper/permutation-invariant-likelihoods-and |
Repo | https://github.com/lupalab/flowscan |
Framework | tf |
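A sketch of the recipe the abstract outlines, under assumptions about the components: an invertible per-point flow, a sort along one coordinate (the "scan"), a sequential density model scoring the sorted sequence, and a log n! term for the set-to-sequence change of measure.

```python
# FlowScan-style exchangeable likelihood sketch (components assumed).
import math
import torch

def flowscan_log_likelihood(x, point_flow, sequence_model):
    # x: (B, N, D), a batch of exchangeable sets
    z, logdet = point_flow(x)                # invertible transform + log|det J|
    order = torch.argsort(z[..., 0], dim=1)  # scan: sort by first coordinate
    z_sorted = torch.gather(z, 1, order.unsqueeze(-1).expand_as(z))
    n = x.size(1)
    return (sequence_model.log_prob(z_sorted)  # autoregressive likelihood
            + logdet
            + math.lgamma(n + 1))              # + log n! for exchangeability
```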
Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout
Title | Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout |
Authors | Hao Tan, Licheng Yu, Mohit Bansal |
Abstract | A grand goal in AI is to build a robot that can accurately navigate based on natural language instructions, which requires the agent to perceive the scene, understand and ground language, and act in the real-world environment. One key challenge here is to learn to navigate in new environments that are unseen during training. Most of the existing approaches perform dramatically worse in unseen environments as compared to seen ones. In this paper, we present a generalizable navigational agent. Our agent is trained in two stages. The first stage is training via mixed imitation and reinforcement learning, combining the benefits from both off-policy and on-policy optimization. The second stage is fine-tuning via newly-introduced ‘unseen’ triplets (environment, path, instruction). To generate these unseen triplets, we propose a simple but effective ‘environmental dropout’ method to mimic unseen environments, which overcomes the problem of limited seen environment variability. Next, we apply semi-supervised learning (via back-translation) on these dropped-out environments to generate new paths and instructions. Empirically, we show that our agent is substantially better at generalizability when fine-tuned with these triplets, outperforming the state-of-the-art approaches by a large margin on the private unseen test set of the Room-to-Room task, and achieving the top rank on the leaderboard. |
Tasks | Vision-Language Navigation |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.04195v1 |
PDF | http://arxiv.org/pdf/1904.04195v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-navigate-unseen-environments-back |
Repo | https://github.com/airsplay/R2R-EnvDrop |
Framework | pytorch |
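The key difference from standard dropout is that one channel mask is sampled per environment and reused across all of its viewpoints, so the perturbed environment stays internally consistent. A minimal sketch, with shapes assumed:

```python
# 'Environmental dropout' sketch: one mask per environment, shared by all views.
import torch

def environmental_dropout(env_features, p=0.4):
    # env_features: (num_views, feat_dim), all visual features of one environment
    keep = (torch.rand(env_features.size(-1)) > p).float()  # single mask per env
    return env_features * keep / (1.0 - p)                  # shared across views

# New (environment, path, instruction) triplets are then generated by running
# a speaker model on paths sampled in the dropped-out environment
# (back-translation), and the agent is fine-tuned on them.
```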
Provably Powerful Graph Networks
Title | Provably Powerful Graph Networks |
Authors | Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, Yaron Lipman |
Abstract | Recently, the Weisfeiler-Lehman (WL) graph isomorphism test was used to measure the expressive power of graph neural networks (GNN). It was shown that the popular message passing GNN cannot distinguish between graphs that are indistinguishable by the 1-WL test (Morris et al. 2018; Xu et al. 2019). Unfortunately, many simple instances of graphs are indistinguishable by the 1-WL test. In search of more expressive graph learning models we build upon the recent k-order invariant and equivariant graph neural networks (Maron et al. 2019a,b) and present two results: First, we show that such k-order networks can distinguish between non-isomorphic graphs as well as the k-WL tests, which are provably stronger than the 1-WL test for k>2. This makes these models strictly stronger than message passing models. Unfortunately, the higher expressiveness of these models comes with a computational cost of processing high order tensors. Second, setting our goal at building a provably stronger, simple and scalable model, we show that a reduced 2-order network containing just a scaled identity operator, augmented with a single quadratic operation (matrix multiplication), has provable 3-WL expressive power. Put differently, we suggest a simple model that interleaves applications of standard Multilayer-Perceptrons (MLPs) applied to the feature dimension and matrix multiplication. We validate this model by presenting state-of-the-art results on popular graph classification and regression tasks. To the best of our knowledge, this is the first practical invariant/equivariant model with guaranteed 3-WL expressiveness, strictly stronger than message passing models. |
Tasks | Graph Classification |
Published | 2019-05-27 |
URL | https://arxiv.org/abs/1905.11136v2 |
PDF | https://arxiv.org/pdf/1905.11136v2.pdf |
PWC | https://paperswithcode.com/paper/provably-powerful-graph-networks |
Repo | https://github.com/hadarser/ProvablyPowerfulGraphNetworks_torch |
Framework | pytorch |
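The model the abstract describes interleaves channel-wise MLPs with one matrix multiplication per block. A PyTorch sketch of such a block, with channel widths as assumptions (1x1 convolutions implement an MLP applied independently at each matrix entry):

```python
# Sketch of a 3-WL-expressive block: channel-wise MLPs + one matmul.
import torch
import torch.nn as nn

class PPGNBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        def mlp():  # 1x1 convs = MLP applied independently at each (i, j) entry
            return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1), nn.ReLU(),
                                 nn.Conv2d(out_ch, out_ch, 1))
        self.m1, self.m2 = mlp(), mlp()
        self.skip = nn.Conv2d(in_ch + out_ch, out_ch, 1)

    def forward(self, x):                           # x: (B, C, n, n)
        y = torch.matmul(self.m1(x), self.m2(x))    # batched matmul per channel:
        return self.skip(torch.cat([x, y], dim=1))  # the single quadratic op
```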
Attacking Optical Flow
Title | Attacking Optical Flow |
Authors | Anurag Ranjan, Joel Janai, Andreas Geiger, Michael J. Black |
Abstract | Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes. |
Tasks | Optical Flow Estimation, Self-Driving Cars |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10053v1 |
PDF | https://arxiv.org/pdf/1910.10053v1.pdf |
PWC | https://paperswithcode.com/paper/attacking-optical-flow |
Repo | https://github.com/anuragranj/flowattack |
Framework | pytorch |
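A sketch of a patch attack in this spirit: a small learnable patch is pasted at the same location in both frames (so its true motion is zero) and optimized to push the predicted flow away from the clean prediction. `flow_net` is any differentiable flow model; the loss and the fixed central patch location are simplifying assumptions.

```python
# Adversarial patch attack on an optical flow network (simplified sketch).
import torch

def attack(flow_net, frame1, frame2, steps=100, patch_size=50, lr=1e-2):
    B, C, H, W = frame1.shape
    patch = torch.rand(1, C, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    y, x = H // 2, W // 2                          # fixed location, for simplicity
    clean = flow_net(frame1, frame2).detach()      # unattacked flow, fixed target
    for _ in range(steps):
        f1, f2 = frame1.clone(), frame2.clone()
        p = patch.clamp(0, 1)
        f1[:, :, y:y+patch_size, x:x+patch_size] = p
        f2[:, :, y:y+patch_size, x:x+patch_size] = p   # zero-motion patch
        flow = flow_net(f1, f2)
        loss = -(flow - clean).abs().mean()        # maximize flow corruption
        opt.zero_grad(); loss.backward(); opt.step()
    return patch.detach().clamp(0, 1)
```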
Meta-Amortized Variational Inference and Learning
Title | Meta-Amortized Variational Inference and Learning |
Authors | Mike Wu, Kristy Choi, Noah Goodman, Stefano Ermon |
Abstract | Despite recent successes in probabilistic modeling and its applications, generative models trained using traditional inference techniques struggle to adapt to new distributions, even when the target distribution may be closely related to the ones seen during training. In this work, we present a doubly-amortized variational inference procedure as a way to address this challenge. By sharing computation across not only a set of query inputs, but also a set of different, related probabilistic models, we learn transferable latent representations that generalize across several related distributions. In particular, given a set of distributions over images, we find the learned representations to transfer to different data transformations. We empirically demonstrate the effectiveness of our method by introducing the MetaVAE, and show that it significantly outperforms baselines on downstream image classification tasks on MNIST (10-50%) and NORB (10-35%). |
Tasks | Density Estimation, Image Classification |
Published | 2019-02-05 |
URL | https://arxiv.org/abs/1902.01950v2 |
PDF | https://arxiv.org/pdf/1902.01950v2.pdf |
PWC | https://paperswithcode.com/paper/meta-amortized-variational-inference-and |
Repo | https://github.com/mhw32/meta-inference-public |
Framework | pytorch |
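Double amortization can be sketched as an inference network that conditions not only on the input but also on a permutation-invariant summary of a sample set from the current distribution. The architecture below is an illustrative assumption, not the paper's exact MetaVAE:

```python
# Doubly-amortized encoder sketch: condition on a summary of the distribution.
import torch
import torch.nn as nn

class MetaEncoder(nn.Module):
    def __init__(self, x_dim=784, c_dim=64, z_dim=32):
        super().__init__()
        self.summarize = nn.Sequential(nn.Linear(x_dim, c_dim), nn.ReLU())
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)

    def forward(self, x, dataset_sample):
        # Amortize over distributions: mean-pool a sample set from the current
        # distribution into a context vector shared by all queries.
        c = self.summarize(dataset_sample).mean(dim=0)          # (c_dim,)
        h = self.enc(torch.cat([x, c.expand(x.size(0), -1)], dim=-1))
        return self.mu(h), self.logvar(h)
```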
FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search
Title | FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search |
Authors | Xiangxiang Chu, Bo Zhang, Ruijun Xu, Jixiang Li |
Abstract | One of the most critical problems in two-stage weight-sharing neural architecture search is the evaluation of candidate models. A faithful ranking leads to accurate search results. However, current methods are prone to making misjudgments. In this paper, we prove that they inevitably give biased evaluations due to inherent unfairness in the supernet training. In view of this, we propose two levels of constraints: expectation fairness and strict fairness. Particularly, strict fairness ensures equal optimization opportunities for all choice blocks throughout the training, which neither overestimates nor underestimates their capacity. We demonstrate this is crucial to improving confidence in models’ ranking. Incorporating our supernet trained under fairness constraints with a multi-objective evolutionary search algorithm, we obtain various state-of-the-art models on ImageNet. In particular, FairNAS-A attains 77.5% top-1 accuracy. The models and their evaluation codes are made publicly available online at http://github.com/fairnas/FairNAS . |
Tasks | Neural Architecture Search |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01845v4 |
PDF | https://arxiv.org/pdf/1907.01845v4.pdf |
PWC | https://paperswithcode.com/paper/fairnas-rethinking-evaluation-fairness-of |
Repo | https://github.com/xiaomi-automl/FairNAS |
Framework | pytorch |
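Strict fairness can be sketched as follows: in each training step, every layer's choice blocks are visited in a uniformly random order, each block participates in exactly one single-path forward pass, and gradients are accumulated across paths before a single parameter update. `supernet.loss` below is an assumed API.

```python
# Strict-fairness supernet training step (sketch; supernet API assumed).
import random

def strict_fairness_step(supernet, batch, num_choices, optimizer):
    x, y = batch
    # one random permutation of choice indices per layer
    perms = [random.sample(range(num_choices), num_choices)
             for _ in range(supernet.num_layers)]
    optimizer.zero_grad()
    for k in range(num_choices):              # each choice block used exactly once
        arch = [perms[layer][k] for layer in range(supernet.num_layers)]
        loss = supernet.loss(x, y, arch)      # single-path forward (assumed API)
        loss.backward()                       # accumulate gradients over paths
    optimizer.step()                          # one update after all paths
```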
Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling
Title | Topic Grouper: An Agglomerative Clustering Approach to Topic Modeling |
Authors | Daniel Pfeifer, Jochen L. Leidner |
Abstract | We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjunctive partitioning of the training vocabulary in a stepwise manner such that resulting partitions represent topics. It is governed by a simple generative model, where the likelihood to generate the training documents via topics is optimized. The algorithm starts with one-word topics and joins two topics at every step. It therefore generates a solution for every desired number of topics, ranging between the size of the training vocabulary and one. The process represents an agglomerative clustering that corresponds to a binary tree of topics. A resulting tree may act as a containment hierarchy, typically with more general topics towards the root of the tree and more specific topics towards the leaves. Topic Grouper is not governed by a background distribution such as the Dirichlet and avoids hyperparameter optimization. We show that Topic Grouper has reasonable predictive power and also a reasonable theoretical and practical complexity. Topic Grouper can deal well with stop words and function words and tends to push them into their own topics. Also, it can handle topic distributions where some topics are more frequent than others. We present typical examples of computed topics from evaluation datasets, where topics appear conclusive and coherent. In this context, the fact that each word belongs to exactly one topic is not a major limitation; in some scenarios this can even be a genuine advantage, e.g., a related shopping basket analysis may aid in optimizing groupings of articles in sales catalogs. |
Tasks | |
Published | 2019-04-13 |
URL | http://arxiv.org/abs/1904.06483v1 |
PDF | http://arxiv.org/pdf/1904.06483v1.pdf |
PWC | https://paperswithcode.com/paper/topic-grouper-an-agglomerative-clustering |
Repo | https://github.com/pfeiferd/TopicGrouperJ |
Framework | none |
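A naive sketch of the agglomeration (the paper's actual algorithm is more efficient): start with one topic per word and greedily merge the pair whose join costs the least training likelihood, recording the binary merge tree. `likelihood_delta` stands in for the paper's generative score.

```python
# Naive O(V^3)-style sketch of the Topic Grouper agglomeration.
def topic_grouper(vocabulary, likelihood_delta):
    topics = [frozenset([w]) for w in vocabulary]
    tree = []                                   # records the merge hierarchy
    while len(topics) > 1:
        i, j = max(((a, b) for a in range(len(topics))
                           for b in range(a + 1, len(topics))),
                   key=lambda ab: likelihood_delta(topics[ab[0]], topics[ab[1]]))
        merged = topics[i] | topics[j]
        tree.append((topics[i], topics[j], merged))
        topics = [t for k, t in enumerate(topics) if k not in (i, j)] + [merged]
    return tree  # every intermediate topic list is a valid solution
```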
sql4ml: A declarative end-to-end workflow for machine learning
Title | sql4ml: A declarative end-to-end workflow for machine learning |
Authors | Nantia Makrynioti, Ruy Ley-Wild, Vasilis Vassalos |
Abstract | We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that in many data science tasks there is a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml the user can express both feature engineering and ML algorithms in SQL, while the system translates this code to an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms, and discuss the usability benefits of concentrating the entire workflow on the database side. |
Tasks | Feature Engineering |
Published | 2019-07-29 |
URL | https://arxiv.org/abs/1907.12415v2 |
PDF | https://arxiv.org/pdf/1907.12415v2.pdf |
PWC | https://paperswithcode.com/paper/sql4ml-a-declarative-end-to-end-workflow-for |
Repo | https://github.com/nantiamak/sql4ml |
Framework | tf |
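To make the workflow concrete, here is an illustrative guess at both sides of the translation: an objective written as a SQL view, and the TensorFlow computation it might compile to. Both the SQL text and the translated function are assumptions, not sql4ml's actual syntax or API.

```python
# Hypothetical sql4ml-style example: SQL objective -> TensorFlow training.
import tensorflow as tf

# What a user-side objective might look like: squared error of a linear model.
OBJECTIVE_SQL = """
CREATE VIEW loss AS
SELECT SUM(POWER(f.y - w.weight * f.x, 2)) AS value
FROM features f, weights w;
"""

# The kind of computation graph such a view could translate to:
def translated_loss(x, y, weight):
    return tf.reduce_sum(tf.square(y - weight * x))

weight = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.01)
x = tf.constant([1.0, 2.0, 3.0]); y = tf.constant([2.0, 4.0, 6.0])
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = translated_loss(x, y, weight)
    opt.apply_gradients([(tape.gradient(loss, weight), weight)])
```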
Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors
Title | Attention-based Multi-Input Deep Learning Architecture for Biological Activity Prediction: An Application in EGFR Inhibitors |
Authors | Huy Ngoc Pham, Trung Hoang Le |
Abstract | Machine learning and deep learning have gained popularity and achieved immense success in drug discovery in recent decades. Historically, machine learning and deep learning models were trained on either structural data or chemical properties by separate models. In this study, we propose an architecture that trains on both types of data simultaneously in order to improve overall performance. Given a molecular structure in the form of SMILES notation and its label, we generate a SMILES-based feature matrix and molecular descriptors. These data are fed to a deep learning model integrated with an attention mechanism to facilitate training and interpretation. Experiments show that our model improves prediction performance over the reference model. With a maximum MCC of 0.58 and an AUC of 90% under cross-validation on an EGFR inhibitors dataset, our architecture outperforms the reference model. We also successfully integrate the attention mechanism into our model, which helps to interpret the contribution of chemical substructures to bioactivity. |
Tasks | Activity Prediction, Drug Discovery |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05168v3 |
PDF | https://arxiv.org/pdf/1906.05168v3.pdf |
PWC | https://paperswithcode.com/paper/attention-based-multi-input-deep-learning |
Repo | https://github.com/lehgtrung/egfr-att |
Framework | pytorch |
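A sketch of a two-branch model in the spirit of the abstract: a convolutional branch over the SMILES feature matrix with per-position attention pooling, a dense branch over the molecular descriptors, and a merged classification head. All sizes are assumptions.

```python
# Multi-input attention model sketch for bioactivity prediction.
import torch
import torch.nn as nn

class MultiInputNet(nn.Module):
    def __init__(self, smiles_feats=40, desc_dim=200):
        super().__init__()
        self.smiles_conv = nn.Conv1d(smiles_feats, 64, kernel_size=5, padding=2)
        self.attn = nn.Linear(64, 1)                     # per-position score
        self.desc_mlp = nn.Sequential(nn.Linear(desc_dim, 64), nn.ReLU())
        self.head = nn.Linear(64 + 64, 1)

    def forward(self, smiles_matrix, descriptors):
        # smiles_matrix: (B, smiles_feats, L); descriptors: (B, desc_dim)
        h = torch.relu(self.smiles_conv(smiles_matrix))          # (B, 64, L)
        a = torch.softmax(self.attn(h.transpose(1, 2)), dim=1)   # (B, L, 1)
        s = (h.transpose(1, 2) * a).sum(dim=1)                   # attention pooling
        d = self.desc_mlp(descriptors)
        return torch.sigmoid(self.head(torch.cat([s, d], dim=-1)))  # activity prob
```

The attention weights `a` are what would let one inspect which SMILES positions drive a prediction, matching the abstract's interpretability claim.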
Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Title | Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning |
Authors | Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh |
Abstract | We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches. Moreover, Placeto is able to learn a generalizable placement policy for any given family of graphs, which can then be used without any retraining to predict optimized placements for unseen graphs from the same family. This eliminates the large overhead incurred by prior RL approaches whose lack of generalizability necessitates re-training from scratch every time a new graph is to be placed. |
Tasks | |
Published | 2019-06-20 |
URL | https://arxiv.org/abs/1906.08879v1 |
PDF | https://arxiv.org/pdf/1906.08879v1.pdf |
PWC | https://paperswithcode.com/paper/placeto-learning-generalizable-device |
Repo | https://github.com/aravic/generalizable-device-placement |
Framework | tf |
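The two ideas in the abstract can be summarized in a short episode sketch: the policy visits nodes one at a time, proposes a device for each based on a graph embedding (no node-label indexing), and is rewarded by the measured runtime reduction. Every name below is an assumed placeholder, not Placeto's actual API.

```python
# Placeto-style episode sketch: iterative placement improvement via RL.
def placeto_episode(graph, policy, measure_runtime, devices):
    placement = {node: devices[0] for node in graph.nodes}  # trivial start
    runtime = measure_runtime(graph, placement)
    for node in graph.nodes:              # one placement improvement per step
        state = policy.embed(graph, placement, node)  # graph embedding, no node IDs
        placement[node] = policy.choose_device(state, devices)
        new_runtime = measure_runtime(graph, placement)
        policy.record_reward(runtime - new_runtime)   # reward = runtime reduction
        runtime = new_runtime
    return placement
```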