October 21, 2019

Paper Group AWR 115


The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective

Title The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective
Authors Robert D. Hawkins, Hyowon Gweon, Noah D. Goodman
Abstract Recent debates over adults’ theory of mind use have been fueled by surprising failures of perspective-taking in communication, suggesting that perspective-taking can be relatively effortful. How, then, should speakers and listeners allocate their limited cognitive resources to successfully understand one another? We argue for a resource-rational account of how agents navigate this division of labor. Under this account, the cognitive effort an agent chooses to allocate toward perspective-taking should depend flexibly on expectations about their interlocutor’s behavior in context. In particular, we investigate the behavior of speakers in the influential director-matcher task and show that they may be expected to take on more of this effort than previously assumed. In Experiment 1, we explicitly manipulated the presence or absence of occlusions and found that speakers systematically produced longer, more specific referring expressions when it was clear that additional objects could be in their partner’s view but not their own. In Experiment 2, we compared the scripted utterances used by confederates in prior work with those produced by unscripted speakers in the same task. We found that confederate speakers are systematically less informative than listeners would initially expect from naive speakers in this context, but that listeners may use violations to adjust their expectations over time. These results suggest that it may be boundedly rational for listeners to reduce the effort put toward perspective-taking to a certain extent given contextually appropriate pragmatic expectations.
Tasks
Published 2018-07-24
URL https://arxiv.org/abs/1807.09000v3
PDF https://arxiv.org/pdf/1807.09000v3.pdf
PWC https://paperswithcode.com/paper/speakers-account-for-asymmetries-in-visual
Repo https://github.com/hawkrobe/pragmatics_of_perspective_taking
Framework none

Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods

Title Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
Authors Apratim Bhattacharyya, Mario Fritz, Bernt Schiele
Abstract For autonomous agents to successfully operate in the real world, the ability to anticipate future scene states is a key competence. In real-world scenarios, future states become increasingly uncertain and multi-modal, particularly on long time horizons. Dropout-based Bayesian inference provides a computationally tractable, theoretically well-grounded approach to learn likely hypotheses/models to deal with uncertain futures and make predictions that correspond well to observations, i.e., are well calibrated. However, it turns out that such approaches fall short of capturing complex real-world scenes, even falling behind plain deterministic approaches in accuracy. This is because the log-likelihood estimate used discourages diversity. In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. We show that our approach achieves accurate state-of-the-art predictions and calibrated probabilities through extensive experiments on scene anticipation with the Cityscapes dataset. Moreover, we show that our approach generalizes across diverse tasks such as digit generation and precipitation forecasting.
Tasks Bayesian Inference
Published 2018-10-01
URL http://arxiv.org/abs/1810.00746v3
PDF http://arxiv.org/pdf/1810.00746v3.pdf
PWC https://paperswithcode.com/paper/bayesian-prediction-of-future-street-scenes-1
Repo https://github.com/apratimbhattacharyya18/seg_pred
Framework tf
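
As background for the abstract: the dropout-based inference it starts from can be sketched as Monte Carlo dropout, where dropout stays active at test time so that each forward pass corresponds to a different weight hypothesis, yielding multiple candidate futures. The toy PyTorch code below illustrates only that sampling mechanic under assumed shapes and names; the paper's actual contribution, the synthetic-likelihood training objective, is not shown.

```python
import torch
import torch.nn as nn

class DropoutPredictor(nn.Module):
    """Toy future-state predictor with dropout layers (illustrative)."""
    def __init__(self, dim=128, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(), nn.Dropout(p),
            nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p),
            nn.Linear(256, dim),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def sample_futures(model, x, n_samples=8):
    """MC dropout: keep dropout active at test time, so each forward
    pass samples a different weight hypothesis (a different 'model')."""
    model.train()  # enables dropout; we are not updating weights here
    return torch.stack([model(x) for _ in range(n_samples)])

model = DropoutPredictor()
x = torch.randn(4, 128)             # batch of encoded current scenes (assumed)
futures = sample_futures(model, x)  # (8, 4, 128): 8 hypotheses per scene
```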

VoxCeleb2: Deep Speaker Recognition

Title VoxCeleb2: Deep Speaker Recognition
Authors Joon Son Chung, Arsha Nagrani, Andrew Zisserman
Abstract The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2 which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin.
Tasks Speaker Recognition
Published 2018-06-14
URL http://arxiv.org/abs/1806.05622v2
PDF http://arxiv.org/pdf/1806.05622v2.pdf
PWC https://paperswithcode.com/paper/voxceleb2-deep-speaker-recognition
Repo https://github.com/a-nagrani/VGGVox
Framework none
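
The linked repo provides the paper's VGGVox models; for orientation, a speaker verification pipeline of this kind reduces to mapping utterances to fixed-size embeddings and thresholding a cosine similarity between them. The toy encoder below is a stand-in under assumed input shapes, not the paper's ResNet-based architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEncoder(nn.Module):
    """Toy CNN mapping a spectrogram to a unit-norm speaker embedding
    (illustrative stand-in for the paper's trunk network)."""
    def __init__(self, emb_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over time/frequency
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, spec):                   # spec: (B, 1, mels, frames)
        h = self.conv(spec).flatten(1)
        return F.normalize(self.fc(h), dim=1)  # unit-norm embedding

def verify(enc, spec_a, spec_b, threshold=0.6):
    """Speaker verification: cosine similarity between two embeddings
    (threshold value here is arbitrary, for illustration only)."""
    score = (enc(spec_a) * enc(spec_b)).sum(dim=1)
    return score, score > threshold
```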

MINE: Mutual Information Neural Estimation

Title MINE: Mutual Information Neural Estimation
Authors Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm
Abstract We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.
Tasks
Published 2018-01-12
URL http://arxiv.org/abs/1801.04062v4
PDF http://arxiv.org/pdf/1801.04062v4.pdf
PWC https://paperswithcode.com/paper/mine-mutual-information-neural-estimation
Repo https://github.com/MasanoriYamada/Mine_pytorch
Framework pytorch
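
The estimator rests on the Donsker-Varadhan representation of KL divergence: I(X;Z) >= E_P[T(x,z)] - log E_{P_X x P_Z}[e^{T(x,z)}], where T is a neural network ("statistics network") and the product of marginals is approximated by shuffling z within the batch. A minimal PyTorch sketch of this bound follows; it omits the paper's bias-corrected gradient, which uses a moving average of the denominator.

```python
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """T(x, z): the critic whose outputs enter the Donsker-Varadhan bound."""
    def __init__(self, x_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_lower_bound(T, x, z):
    """I(X;Z) >= E_P[T(x,z)] - log E_{P_X x P_Z}[exp T(x,z')].
    Shuffling z across the batch approximates the product of marginals."""
    z_shuffled = z[torch.randperm(z.size(0))]
    joint = T(x, z).mean()
    # log-mean-exp over the shuffled (marginal) pairs
    marginal = torch.logsumexp(T(x, z_shuffled), dim=0) - math.log(x.size(0))
    return joint - marginal  # maximize this estimate by gradient ascent
```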

Clustering-Oriented Representation Learning with Attractive-Repulsive Loss

Title Clustering-Oriented Representation Learning with Attractive-Repulsive Loss
Authors Kian Kenyon-Dean, Andre Cianflone, Lucas Page-Caccia, Guillaume Rabusseau, Jackie Chi Kit Cheung, Doina Precup
Abstract The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions yields insights into the different use cases COREL variants can satisfy: the Cosine-COREL variant produces a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE.
Tasks Representation Learning
Published 2018-12-18
URL http://arxiv.org/abs/1812.07627v1
PDF http://arxiv.org/pdf/1812.07627v1.pdf
PWC https://paperswithcode.com/paper/clustering-oriented-representation-learning
Repo https://github.com/kiankd/corel2019
Framework pytorch
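
As one plausible reading of the attractive-repulsive framework, Cosine-COREL-style training pulls each representation toward its own class vector while pushing it away from the other class vectors. The sketch below illustrates that reading under assumed names and weighting; it is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cosine_corel_loss(features, class_vectors, labels, repulsive_weight=1.0):
    """Attractive-repulsive loss in the spirit of Cosine-COREL:
    attract each sample to its class vector, repel it from the others."""
    f = F.normalize(features, dim=1)       # (B, D) sample representations
    w = F.normalize(class_vectors, dim=1)  # (C, D) one vector per class
    sim = f @ w.t()                        # (B, C) cosine similarities
    one_hot = F.one_hot(labels, w.size(0)).float()
    attract = (one_hot * sim).sum(dim=1)                    # own class
    repulse = ((1 - one_hot) * sim).sum(dim=1) / (w.size(0) - 1)
    return (repulsive_weight * repulse - attract).mean()    # minimize
```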

Tree Species Identification from Bark Images Using Convolutional Neural Networks

Title Tree Species Identification from Bark Images Using Convolutional Neural Networks
Authors Mathieu Carpentier, Philippe Giguère, Jonathan Gaudreault
Abstract Tree species identification using bark images is a challenging problem that could prove useful for many forestry-related tasks. However, while recent progress in deep learning has shown impressive results on standard vision problems, a lack of datasets has prevented its use for tree bark species classification. In this work, we present, and make publicly available, a novel dataset called BarkNet 1.0 containing more than 23,000 high-resolution bark images from 23 different tree species over a wide range of tree diameters. With it, we demonstrate the feasibility of species recognition from bark images using deep learning. More specifically, we obtain an accuracy of 93.88% on single crops, and an accuracy of 97.81% using a majority-voting approach over all of the images of a tree. We also empirically demonstrate that, for a fixed number of images, it is better to maximize the number of tree individuals in the training database, thus directing future data collection efforts.
Tasks
Published 2018-03-02
URL http://arxiv.org/abs/1803.00949v2
PDF http://arxiv.org/pdf/1803.00949v2.pdf
PWC https://paperswithcode.com/paper/tree-species-identification-from-bark-images
Repo https://github.com/ulaval-damas/tree-bark-classification
Framework pytorch
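
The 97.81% figure comes from aggregating per-image predictions across all images of a tree. Majority voting is simple enough to sketch directly (species names below are illustrative):

```python
from collections import Counter

def majority_vote(crop_predictions):
    """Aggregate per-crop class predictions into one label per tree,
    mirroring the majority-voting evaluation described in the abstract."""
    return Counter(crop_predictions).most_common(1)[0][0]

# e.g. per-image CNN predictions for the same tree (hypothetical labels)
print(majority_vote(["white_birch", "white_birch", "red_maple"]))  # white_birch
```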

Graph-to-Sequence Learning using Gated Graph Neural Networks

Title Graph-to-Sequence Learning using Gated Graph Neural Networks
Authors Daniel Beck, Gholamreza Haffari, Trevor Cohn
Abstract Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures for this setting obtained promising results compared to grammar-based approaches but still relied on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work, we propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results show that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation.
Tasks Graph-to-Sequence, Machine Translation
Published 2018-06-26
URL http://arxiv.org/abs/1806.09835v1
PDF http://arxiv.org/pdf/1806.09835v1.pdf
PWC https://paperswithcode.com/paper/graph-to-sequence-learning-using-gated-graph
Repo https://github.com/beckdaniel/acl2018_graph2seq
Framework mxnet
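
The core propagation step of a Gated Graph Neural Network aggregates messages from neighboring nodes and updates each node's state with a GRU cell. A bare-bones sketch with a dense adjacency matrix follows; the paper's input transformation, which gives edges their own hidden representations, is not shown.

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """One GGNN propagation step (Li et al., 2016 style): sum incoming
    neighbor messages, then apply a gated (GRU) node update."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, h, adj):
        # h: (N, D) node states; adj: (N, N) adjacency matrix
        m = adj @ self.message(h)  # aggregate messages from neighbors
        return self.gru(m, h)      # gated update of each node state

layer = GGNNLayer(64)
h = torch.randn(10, 64)                      # 10 nodes (toy example)
adj = (torch.rand(10, 10) < 0.2).float()     # random sparse graph
for _ in range(4):                           # a few propagation steps
    h = layer(h, adj)
```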

Fashion-Gen: The Generative Fashion Dataset and Challenge

Title Fashion-Gen: The Generative Fashion Dataset and Challenge
Authors Negar Rostamzadeh, Seyedarian Hosseini, Thomas Boquet, Wojciech Stokowiec, Ying Zhang, Christian Jauvin, Chris Pal
Abstract We introduce a new dataset of 293,008 high-definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles. We provide baseline results on 1) high-resolution image generation, and 2) image generation conditioned on the given text descriptions. We invite the community to improve upon these baselines. In this paper, we also outline the details of a challenge that we are launching based upon this dataset.
Tasks Image Generation
Published 2018-06-21
URL http://arxiv.org/abs/1806.08317v2
PDF http://arxiv.org/pdf/1806.08317v2.pdf
PWC https://paperswithcode.com/paper/fashion-gen-the-generative-fashion-dataset
Repo https://github.com/iphysresearch/DataSciComp
Framework none

Scaling simulation-to-real transfer by learning composable robot skills

Title Scaling simulation-to-real transfer by learning composable robot skills
Authors Ryan Julian, Eric Heiden, Zhanpeng He, Hejia Zhang, Stefan Schaal, Joseph J. Lim, Gaurav Sukhatme, Karol Hausman
Abstract We present a novel solution to the problem of simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation-reality gap, we learn a set of diverse policies that are parameterized in a way that makes them easily reusable. This diversity and parameterization of low-level skills allows us to find a transferable policy that is able to use combinations and variations of different skills to solve more complex, high-level tasks. In particular, we first use simulation to jointly learn a policy for a set of low-level skills, and a “skill embedding” parameterization which can be used to compose them. Later, we learn high-level policies which actuate the low-level policies via this skill embedding parameterization. The high-level policies encode how and when to reuse the low-level skills together to achieve specific high-level tasks. Importantly, our method learns to control a real robot in joint-space to achieve these high-level tasks with little or no on-robot time, despite the fact that the low-level policies may not be perfectly transferable from simulation to real, and that the low-level skills were not trained on any examples of high-level tasks. We illustrate the principles of our method using informative simulation experiments. We then verify its usefulness for real robotics problems by learning, transferring, and composing free-space and contact motion skills on a Sawyer robot using only joint-space control. We experiment with several techniques for composing pre-learned skills, and find that our method allows us to use both learning-based approaches and efficient search-based planning to achieve high-level tasks using only pre-learned skills.
Tasks
Published 2018-09-26
URL http://arxiv.org/abs/1809.10253v3
PDF http://arxiv.org/pdf/1809.10253v3.pdf
PWC https://paperswithcode.com/paper/scaling-simulation-to-real-transfer-by
Repo https://github.com/ryanjulian/embed2learn
Framework tf
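
Structurally, the method separates a skill-conditioned low-level policy pi(a | s, z) from a high-level policy that outputs points z in the skill-embedding space rather than raw actions. The shape-level sketch below, under assumed dimensions, shows only that composition; the paper's training procedure (joint skill learning in simulation, then high-level learning or search) is not shown.

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """pi(a | s, z): a skill-conditioned controller; the latent z is the
    'skill embedding' that higher-level policies actuate."""
    def __init__(self, obs_dim, act_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

class HighLevelPolicy(nn.Module):
    """Outputs a point in skill-embedding space instead of raw actions."""
    def __init__(self, obs_dim, z_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, z_dim))

    def forward(self, obs):
        return self.net(obs)

obs_dim, act_dim, z_dim = 12, 7, 4  # assumed sizes, e.g. a 7-DoF arm
low = LowLevelPolicy(obs_dim, act_dim, z_dim)
high = HighLevelPolicy(obs_dim, z_dim)
obs = torch.randn(1, obs_dim)
action = low(obs, high(obs))        # compose: high level selects the skill
```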

Meta-Transfer Learning for Few-Shot Learning

Title Meta-Transfer Learning for Few-Shot Learning
Authors Qianru Sun, Yaoyao Liu, Tat-Seng Chua, Bernt Schiele
Abstract Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit using a few samples only, meta-learning typically uses shallow neural networks (SNNs), thus limiting its effectiveness. In this paper we propose a novel few-shot learning method called meta-transfer learning (MTL) which learns to adapt a deep NN for few-shot learning tasks. Specifically, “meta” refers to training multiple tasks, and “transfer” is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL. We conduct experiments using (5-class, 1-shot) and (5-class, 5-shot) recognition tasks on two challenging few-shot learning benchmarks: miniImageNet and Fewshot-CIFAR100. Extensive comparisons to related works validate that our meta-transfer learning approach trained with the proposed HT meta-batch scheme achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy.
Tasks Few-Shot Image Classification, Few-Shot Learning, Meta-Learning, Transfer Learning
Published 2018-12-06
URL http://arxiv.org/abs/1812.02391v3
PDF http://arxiv.org/pdf/1812.02391v3.pdf
PWC https://paperswithcode.com/paper/meta-transfer-learning-for-few-shot-learning
Repo https://github.com/yaoyao-liu/meta-transfer-learning
Framework pytorch
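
The "scaling and shifting" transfer can be pictured as freezing pretrained weights and learning only lightweight per-channel scale and shift parameters on top of them. A hedged sketch of such a layer follows; names and details are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleShiftConv2d(nn.Module):
    """Freeze a pretrained conv layer; learn only per-channel scaling of
    its weights and shifting of its bias, in the spirit of MTL's
    scaling-and-shifting transfer (illustrative sketch)."""
    def __init__(self, pretrained: nn.Conv2d):
        super().__init__()
        self.register_buffer("weight", pretrained.weight.detach().clone())
        self.register_buffer(
            "bias",
            None if pretrained.bias is None else pretrained.bias.detach().clone())
        self.stride, self.padding = pretrained.stride, pretrained.padding
        out_ch = self.weight.size(0)
        self.scale = nn.Parameter(torch.ones(out_ch, 1, 1, 1))  # learned
        self.shift = nn.Parameter(torch.zeros(out_ch))          # learned

    def forward(self, x):
        w = self.weight * self.scale
        b = self.shift if self.bias is None else self.bias + self.shift
        return F.conv2d(x, w, b, stride=self.stride, padding=self.padding)

conv = ScaleShiftConv2d(nn.Conv2d(3, 16, 3, padding=1))
y = conv(torch.randn(2, 3, 32, 32))  # only scale/shift receive gradients
```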

Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning

Title Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning
Authors Yuval Atzmon, Gal Chechik
Abstract In zero-shot learning (ZSL), a classifier is trained to recognize visual classes without any image samples. Instead, it is given semantic information about the class, like a textual description or a set of attributes. Learning from attributes could benefit from explicitly modeling the structure of the attribute space. Unfortunately, learning general structure from empirical samples is hard with typical dataset sizes. Here we describe LAGO, a probabilistic model designed to capture natural soft AND-OR relations across groups of attributes. We show how this model can be learned end-to-end with a deep attribute-detection model. The soft group structure can be learned from data jointly as part of the model, and can also readily incorporate prior knowledge about groups if available. The soft AND-OR structure succeeds in capturing meaningful and predictive structure, improving the accuracy of zero-shot learning on two of three benchmarks. Finally, LAGO reveals a unified formulation over two ZSL approaches: DAP (Lampert et al., 2009) and ESZSL (Romera-Paredes & Torr, 2015). Interestingly, taking only one singleton group for each attribute introduces a new soft relaxation of DAP that outperforms DAP by ~40.
Tasks Zero-Shot Learning
Published 2018-06-07
URL http://arxiv.org/abs/1806.02664v2
PDF http://arxiv.org/pdf/1806.02664v2.pdf
PWC https://paperswithcode.com/paper/probabilistic-and-or-attribute-grouping-for
Repo https://github.com/yuvalatzmon/LAGO
Framework tf
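
The soft AND-OR structure can be read as a noisy-OR over the attributes within each group, followed by a product (soft AND) across groups. The sketch below computes a class score under that reading; it is an assumption-laden stand-in, and the paper's actual parameterization of group membership and class-attribute matrices differs in detail.

```python
import torch

def lago_style_class_score(attr_probs, group_masks, eps=1e-6):
    """Soft AND-OR composition in the spirit of LAGO.
    attr_probs:  (B, A) predicted attribute probabilities for one class
    group_masks: (G, A) soft membership of each attribute in each group
    Returns:     (B,) class scores (illustrative, not the paper's model)."""
    # OR within each group: noisy-OR over its (soft) member attributes
    complement = 1.0 - group_masks.unsqueeze(0) * attr_probs.unsqueeze(1)  # (B, G, A)
    group_or = 1.0 - complement.clamp(min=eps).log().sum(dim=2).exp()      # (B, G)
    # AND across groups: product of group probabilities (in log space)
    return group_or.clamp(min=eps).log().sum(dim=1).exp()                  # (B,)

scores = lago_style_class_score(torch.rand(8, 20), torch.rand(5, 20))
```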

Self-Supervised GANs via Auxiliary Rotation Loss

Title Self-Supervised GANs via Auxiliary Rotation Loss
Authors Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby
Abstract Conditional GANs are at the forefront of natural image synthesis. The main drawback of such models is the necessity for labeled data. In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. In particular, we allow the networks to collaborate on the task of representation learning, while being adversarial with respect to the classic GAN game. The role of self-supervision is to encourage the discriminator to learn meaningful feature representations which are not forgotten during training. We test empirically both the quality of the learned image representations, and the quality of the synthesized images. Under the same conditions, the self-supervised GAN attains a similar performance to state-of-the-art conditional counterparts. Finally, we show that this approach to fully unsupervised learning can be scaled to attain an FID of 23.4 on unconditional ImageNet generation.
Tasks Image Generation, Representation Learning
Published 2018-11-27
URL http://arxiv.org/abs/1811.11212v2
PDF http://arxiv.org/pdf/1811.11212v2.pdf
PWC https://paperswithcode.com/paper/self-supervised-generative-adversarial
Repo https://github.com/vandit15/Self-Supervised-Gans-Pytorch
Framework pytorch
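
The auxiliary task here is rotation prediction: each image is rotated by 0, 90, 180, or 270 degrees, and a discriminator head must classify the rotation, which pushes the discriminator to retain meaningful features during adversarial training. A minimal sketch of the data-side mechanics (the GAN itself and the loss weighting are omitted):

```python
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Rotate each image by 0/90/180/270 degrees; return all rotated
    copies plus their rotation labels for the self-supervised task."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    images = torch.cat(rots, dim=0)                       # (4B, C, H, W)
    labels = torch.arange(4).repeat_interleave(x.size(0)) # matches cat order
    return images, labels

def rotation_loss(rotation_logits, labels):
    """Auxiliary rotation-prediction loss, added to the GAN objective
    (assuming a 4-way rotation head on the discriminator)."""
    return F.cross_entropy(rotation_logits, labels)

x = torch.randn(8, 3, 32, 32)
imgs, labels = rotate_batch(x)  # (32, 3, 32, 32), labels in {0, 1, 2, 3}
```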

Classification of Household Materials via Spectroscopy

Title Classification of Household Materials via Spectroscopy
Authors Zackory Erickson, Nathan Luskey, Sonia Chernova, Charles C. Kemp
Abstract Recognizing an object’s material can inform a robot on the object’s fragility or appropriate use. To estimate an object’s material during manipulation, many prior works have explored the use of haptic sensing. In this paper, we explore a technique for robots to estimate the materials of objects using spectroscopy. We demonstrate that spectrometers provide several benefits for material recognition, including fast response times and accurate measurements with low noise. Furthermore, spectrometers do not require direct contact with an object. To explore this, we collected a dataset of spectral measurements from two commercially available spectrometers as a robotic platform interacted with 50 flat material objects, and we show that a neural network model can accurately analyze these measurements. Due to the similarity between consecutive spectral measurements, our model achieved a material classification accuracy of 94.6% when given only one spectral sample per object. Similar to prior works with haptic sensors, we found that generalizing material recognition to new objects posed a greater challenge, for which we achieved an accuracy of 79.1% via leave-one-object-out cross-validation. Finally, we demonstrate how a PR2 robot can leverage spectrometers to estimate the materials of everyday objects found in the home. From this work, we find that spectroscopy is a promising approach for material classification during robotic manipulation.
Tasks Material Classification, Material Recognition
Published 2018-05-10
URL http://arxiv.org/abs/1805.04051v3
PDF http://arxiv.org/pdf/1805.04051v3.pdf
PWC https://paperswithcode.com/paper/classification-of-household-materials-via
Repo https://github.com/kebasaa/SCIO-read
Framework none

A more globally accurate dimensionality reduction method using triplets

Title A more globally accurate dimensionality reduction method using triplets
Authors Ehsan Amid, Manfred K. Warmuth
Abstract We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low-dimensional embedding. We show this via a number of tests for the DR methods that can be easily applied by any practitioner to the dataset at hand. Surprisingly enough, t-SNE performs the best w.r.t. the commonly used measures that reward local neighborhood accuracy, such as precision-recall, while having the worst performance in our tests for global structure. We then contrast the performance of these two DR methods against our new method, called TriMap. The main idea behind TriMap is to capture higher orders of structure with triplet information (instead of the pairwise information used by t-SNE and LargeVis), and to minimize a robust loss function for satisfying the chosen triplets. We provide compelling experimental evidence on large natural datasets for the clear advantage of the TriMap DR results. Like LargeVis, TriMap scales linearly with the number of data points.
Tasks Dimensionality Reduction
Published 2018-03-01
URL http://arxiv.org/abs/1803.00854v1
PDF http://arxiv.org/pdf/1803.00854v1.pdf
PWC https://paperswithcode.com/paper/a-more-globally-accurate-dimensionality
Repo https://github.com/eamid/trimap
Framework none
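
The triplet idea: for a triplet (i, j, k) where point i is more similar to j than to k in the input space, the embedding should preserve that ordering. The bare-bones gradient-descent sketch below uses a plain hinge loss as a stand-in for TriMap's robust loss and weighted triplet selection, so it illustrates the principle rather than the method itself.

```python
import torch

def triplet_dr(triplets, n_points, out_dim=2, steps=500, lr=0.1, margin=1.0):
    """Minimal triplet-based dimensionality reduction: for each triplet
    (i, j, k), place i closer to j than to k in the low-dim embedding.
    (Hinge loss here is a stand-in for TriMap's robust loss.)"""
    y = torch.randn(n_points, out_dim, requires_grad=True)
    opt = torch.optim.Adam([y], lr=lr)
    i, j, k = triplets.t()
    for _ in range(steps):
        opt.zero_grad()
        d_ij = (y[i] - y[j]).pow(2).sum(dim=1)  # squared distance i-j
        d_ik = (y[i] - y[k]).pow(2).sum(dim=1)  # squared distance i-k
        loss = torch.relu(d_ij - d_ik + margin).mean()
        loss.backward()
        opt.step()
    return y.detach()

# triplets[m] = (i, j, k): i is more similar to j than to k in input space
triplets = torch.tensor([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
embedding = triplet_dr(triplets, n_points=4)
```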

Message-passing neural networks for high-throughput polymer screening

Title Message-passing neural networks for high-throughput polymer screening
Authors Peter C. St. John, Caleb Phillips, Travis W. Kemper, A. Nolan Wilson, Michael F. Crowley, Mark R. Nimlos, Ross E. Larsen
Abstract Machine learning methods have shown promise in predicting molecular properties, and given sufficient training data machine learning approaches can enable rapid high-throughput virtual screening of large libraries of compounds. Graph-based neural network architectures have emerged in recent years as the most successful approach for predictions based on molecular structure, and have consistently achieved the best performance on benchmark quantum chemical datasets. However, these models have typically required optimized 3D structural information for the molecule to achieve the highest accuracy. These 3D geometries are costly to compute for high levels of theory, limiting the applicability and practicality of machine learning methods in high-throughput screening applications. In this study, we present a new database of candidate molecules for organic photovoltaic applications, comprising approximately 91,000 unique chemical structures. Compared to existing datasets, this dataset contains substantially larger molecules (up to 200 atoms) as well as extrapolated properties for long polymer chains. We show that message-passing neural networks trained with and without 3D structural information for these molecules achieve similar accuracy, comparable to state-of-the-art methods on existing benchmark datasets. These results therefore emphasize that for larger molecules with practical applications, near-optimal prediction results can be obtained without using optimized 3D geometry as an input. We further show that learned molecular representations can be leveraged to reduce the training data required to transfer predictions to a new DFT functional.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10363v2
PDF http://arxiv.org/pdf/1807.10363v2.pdf
PWC https://paperswithcode.com/paper/message-passing-neural-networks-for-high
Repo https://github.com/nrel/nfp
Framework tf
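
For readers new to message-passing networks, one generic propagation step of the kind such models stack looks like this: each atom gathers messages from its bonded neighbors (with edge features folded into the message function) and then updates its own state. The sketch below is illustrative only; the paper's exact architecture and readout differ.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One generic message-passing step over a molecular graph:
    compute per-edge messages, sum them per destination atom,
    then apply a gated node update (illustrative sketch)."""
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.msg = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, node_dim), nn.ReLU())
        self.update = nn.GRUCell(node_dim, node_dim)

    def forward(self, h, edge_index, edge_attr):
        # h: (N, D) atom states; edge_index: (2, E) src/dst; edge_attr: (E, F)
        src, dst = edge_index
        m = self.msg(torch.cat([h[src], h[dst], edge_attr], dim=1))  # (E, D)
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum per atom
        return self.update(agg, h)

layer = MessagePassingLayer(node_dim=32, edge_dim=8)
h = torch.randn(5, 32)                       # 5 atoms (toy molecule)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
h = layer(h, edge_index, torch.randn(4, 8))  # one propagation step
```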