Paper Group AWR 115
Papers in this group:

- The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective
- Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
- VoxCeleb2: Deep Speaker Recognition
- MINE: Mutual Information Neural Estimation
- Clustering-Oriented Representation Learning with Attractive-Repulsive Loss
- Tree Species Identification from Bark Images Using Convolutional Neural Networks
- Graph-to-Sequence Learning using Gated Graph Neural Networks
- Fashion-Gen: The Generative Fashion Dataset and Challenge
- Scaling simulation-to-real transfer by learning composable robot skills
- Meta-Transfer Learning for Few-Shot Learning
- Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning
- Self-Supervised GANs via Auxiliary Rotation Loss
- Classification of Household Materials via Spectroscopy
- A more globally accurate dimensionality reduction method using triplets
- Message-passing neural networks for high-throughput polymer screening
The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective
Title | The division of labor in communication: Speakers help listeners account for asymmetries in visual perspective |
Authors | Robert D. Hawkins, Hyowon Gweon, Noah D. Goodman |
Abstract | Recent debates over adults’ theory of mind use have been fueled by surprising failures of perspective-taking in communication, suggesting that perspective-taking can be relatively effortful. How, then, should speakers and listeners allocate their limited cognitive resources to successfully understand one another? We argue for a resource-rational account of how agents navigate this division of labor. Under this account, the cognitive effort an agent chooses to allocate toward perspective-taking should depend flexibly on expectations about their interlocutor’s behavior in context. In particular, we investigate the behavior of speakers in the influential director-matcher task and show that they may be expected to take on more of this effort than previously assumed. In Experiment 1, we explicitly manipulated the presence or absence of occlusions and found that speakers systematically produced longer, more specific referring expressions when it was clear that additional objects could be in their partner’s view but not their own. In Experiment 2, we compared the scripted utterances used by confederates in prior work with those produced by unscripted speakers in the same task. We found that confederate speakers are systematically less informative than listeners would initially expect from naive speakers in this context, but that listeners may use violations to adjust their expectations over time. These results suggest that it may be boundedly rational for listeners to reduce the effort put toward perspective-taking to a certain extent given contextually appropriate pragmatic expectations. |
Tasks | |
Published | 2018-07-24 |
URL | https://arxiv.org/abs/1807.09000v3 |
PDF | https://arxiv.org/pdf/1807.09000v3.pdf
PWC | https://paperswithcode.com/paper/speakers-account-for-asymmetries-in-visual |
Repo | https://github.com/hawkrobe/pragmatics_of_perspective_taking |
Framework | none |
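The resource-rational account builds on probabilistic models of pragmatics from this group's prior work. As a rough illustration of the kind of informativity computation involved, here is a minimal Rational Speech Acts-style speaker sketch; the toy context, utterance set, and rationality parameter `alpha` are hypothetical and not the authors' exact model:

```python
import numpy as np

# Hypothetical toy context: three objects, three utterances.
# meanings[u][o] = 1 if utterance u is literally true of object o.
objects = ["small candle", "large candle", "cassette"]
utterances = ["candle", "small candle", "cassette"]
meanings = np.array([
    [1, 1, 0],   # "candle" is true of both candles
    [1, 0, 0],   # "small candle" picks out one object
    [0, 0, 1],   # "cassette"
], dtype=float)

def literal_listener(meanings, prior=None):
    """L0(object | utterance): truth-conditional meaning times object prior."""
    prior = np.ones(meanings.shape[1]) if prior is None else prior
    scores = meanings * prior
    return scores / scores.sum(axis=1, keepdims=True)

def pragmatic_speaker(meanings, alpha=4.0, cost=None):
    """S1(utterance | object): soft-max of listener informativity minus cost."""
    cost = np.zeros(meanings.shape[0]) if cost is None else cost
    L0 = literal_listener(meanings)
    with np.errstate(divide="ignore"):
        utility = alpha * (np.log(L0) - cost[:, None])
    scores = np.exp(utility)
    return scores / scores.sum(axis=0, keepdims=True)

S1 = pragmatic_speaker(meanings)
print(S1[:, 0])  # P(utterance | target = "small candle")
```

A speaker of this kind prefers "small candle" over the ambiguous "candle" for the first object, which is the informativity pressure the experiments probe.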
Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods
Title | Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods |
Authors | Apratim Bhattacharyya, Mario Fritz, Bernt Schiele |
Abstract | For autonomous agents to successfully operate in the real world, the ability to anticipate future scene states is a key competence. In real-world scenarios, future states become increasingly uncertain and multi-modal, particularly on long time horizons. Dropout-based Bayesian inference provides a computationally tractable, theoretically well-grounded approach to learn likely hypotheses/models to deal with uncertain futures and make predictions that correspond well to observations, i.e., are well calibrated. However, it turns out that such approaches fall short of capturing complex real-world scenes, even falling behind plain deterministic approaches in accuracy. This is because the log-likelihood estimate they use discourages diversity. In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. We show that our approach achieves accurate state-of-the-art predictions and calibrated probabilities through extensive experiments for scene anticipation on the Cityscapes dataset. Moreover, we show that our approach generalizes across diverse tasks such as digit generation and precipitation forecasting. |
Tasks | Bayesian Inference |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00746v3 |
PDF | http://arxiv.org/pdf/1810.00746v3.pdf
PWC | https://paperswithcode.com/paper/bayesian-prediction-of-future-street-scenes-1 |
Repo | https://github.com/apratimbhattacharyya18/seg_pred |
Framework | tf |
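A minimal sketch of the dropout-based (Monte Carlo) Bayesian backbone the paper starts from, assuming a toy fully connected predictor; the synthetic-likelihood training signal itself, a discriminator replacing the per-pixel log-likelihood, is not reproduced here:

```python
import torch
import torch.nn as nn

class DropoutPredictor(nn.Module):
    def __init__(self, in_dim=64, out_dim=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Dropout(p),
            nn.Linear(128, out_dim),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def sample_futures(model, x, n_samples=8):
    """Draw diverse hypotheses by keeping dropout active at inference."""
    model.train()  # leaves dropout on; fine for a sketch without batch norm
    return torch.stack([model(x) for _ in range(n_samples)])

model = DropoutPredictor()
past = torch.randn(4, 64)              # toy encoding of past frames
futures = sample_futures(model, past)  # (n_samples, batch, out_dim)
print(futures.std(dim=0).mean())       # spread across sampled hypotheses
```

Each stochastic forward pass corresponds to one sampled model; the paper's contribution is a training objective that makes these samples genuinely diverse rather than collapsed onto a single mode.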
VoxCeleb2: Deep Speaker Recognition
Title | VoxCeleb2: Deep Speaker Recognition |
Authors | Joon Son Chung, Arsha Nagrani, Andrew Zisserman |
Abstract | The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2 which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin. |
Tasks | Speaker Recognition |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05622v2 |
PDF | http://arxiv.org/pdf/1806.05622v2.pdf
PWC | https://paperswithcode.com/paper/voxceleb2-deep-speaker-recognition |
Repo | https://github.com/a-nagrani/VGGVox |
Framework | none |
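A minimal sketch of how such a CNN is typically used for verification, assuming a toy architecture rather than the paper's VGG/ResNet-based models: embed two utterances' spectrograms and score them by cosine similarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerEncoder(nn.Module):
    def __init__(self, emb_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pool over frequency and time
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, spec):                   # spec: (B, 1, mel, time)
        h = self.conv(spec).flatten(1)
        return F.normalize(self.fc(h), dim=1)  # unit-length embeddings

enc = SpeakerEncoder()
a = enc(torch.randn(1, 1, 40, 300))  # toy spectrograms of two utterances
b = enc(torch.randn(1, 1, 40, 300))
score = (a * b).sum(dim=1)           # cosine similarity in [-1, 1]
print(score.item())
```

The pooling over time is what lets one trained encoder handle variable-length utterances at test time.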
MINE: Mutual Information Neural Estimation
Title | MINE: Mutual Information Neural Estimation |
Authors | Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm |
Abstract | We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings. |
Tasks | |
Published | 2018-01-12 |
URL | http://arxiv.org/abs/1801.04062v4 |
PDF | http://arxiv.org/pdf/1801.04062v4.pdf
PWC | https://paperswithcode.com/paper/mine-mutual-information-neural-estimation |
Repo | https://github.com/MasanoriYamada/Mine_pytorch |
Framework | pytorch |
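A minimal sketch of the Donsker-Varadhan lower bound that MINE maximizes, I(X;Z) >= E_joint[T(x,z)] - log E_marginal[exp(T(x,z))], with marginal samples obtained by shuffling z within the batch; the paper's bias-corrected gradient estimator (a moving average of the denominator) is omitted:

```python
import math
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    def __init__(self, dim_x, dim_z, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_lower_bound(T, x, z):
    joint = T(x, z).mean()                       # E over the joint
    z_shuffled = z[torch.randperm(z.size(0))]    # breaks the pairing -> marginals
    t_marg = T(x, z_shuffled).squeeze(1)
    marginal = torch.logsumexp(t_marg, dim=0) - math.log(t_marg.size(0))
    return joint - marginal                      # maximize to tighten the bound

x = torch.randn(512, 4)
z = x + 0.1 * torch.randn(512, 4)                # correlated toy pair
T = StatisticsNetwork(4, 4)
opt = torch.optim.Adam(T.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = -mine_lower_bound(T, x, z)
    loss.backward()
    opt.step()
print(mine_lower_bound(T, x, z).item())          # rough MI estimate in nats
```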
Clustering-Oriented Representation Learning with Attractive-Repulsive Loss
Title | Clustering-Oriented Representation Learning with Attractive-Repulsive Loss |
Authors | Kian Kenyon-Dean, Andre Cianflone, Lucas Page-Caccia, Guillaume Rabusseau, Jackie Chi Kit Cheung, Doina Precup |
Abstract | The standard loss function used to train neural network classifiers, categorical cross-entropy (CCE), seeks to maximize accuracy on the training data; building useful representations is not a necessary byproduct of this objective. In this work, we propose clustering-oriented representation learning (COREL) as an alternative to CCE in the context of a generalized attractive-repulsive loss framework. COREL has the consequence of building latent representations that collectively exhibit the quality of natural clustering within the latent space of the final hidden layer, according to a predefined similarity function. Despite being simple to implement, COREL variants outperform or perform equivalently to CCE in a variety of scenarios, including image and news article classification using both feed-forward and convolutional neural networks. Analysis of the latent spaces created with different similarity functions facilitates insights on the different use cases COREL variants can satisfy, where the Cosine-COREL variant makes a consistently clusterable latent space, while Gaussian-COREL consistently obtains better classification accuracy than CCE. |
Tasks | Representation Learning |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.07627v1 |
PDF | http://arxiv.org/pdf/1812.07627v1.pdf
PWC | https://paperswithcode.com/paper/clustering-oriented-representation-learning |
Repo | https://github.com/kiankd/corel2019 |
Framework | pytorch |
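A sketch of one plausible attractive-repulsive formulation in the Cosine-COREL spirit, assuming learned class prototypes and a simple uniform repulsion term; the paper's exact weighting and variants may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineARLoss(nn.Module):
    def __init__(self, n_classes, emb_dim, repel_weight=1.0):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, emb_dim))
        self.repel_weight = repel_weight

    def forward(self, emb, target):
        # cosine similarity of each embedding to every class prototype
        sims = F.normalize(emb, dim=1) @ F.normalize(self.prototypes, dim=1).T
        one_hot = F.one_hot(target, sims.size(1)).float()
        attract = (sims * one_hot).sum(dim=1)                 # pull to own class
        repel = (sims * (1 - one_hot)).sum(dim=1) / (sims.size(1) - 1)
        return (-attract + self.repel_weight * repel).mean()

emb = torch.randn(8, 32, requires_grad=True)
loss = CosineARLoss(n_classes=10, emb_dim=32)(emb, torch.randint(0, 10, (8,)))
loss.backward()
```

Training the final hidden layer against an objective like this, instead of CCE, is what encourages the naturally clustered latent space the abstract describes.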
Tree Species Identification from Bark Images Using Convolutional Neural Networks
Title | Tree Species Identification from Bark Images Using Convolutional Neural Networks |
Authors | Mathieu Carpentier, Philippe Giguère, Jonathan Gaudreault |
Abstract | Tree species identification using bark images is a challenging problem that could prove useful for many forestry related tasks. However, while the recent progress in deep learning showed impressive results on standard vision problems, a lack of datasets prevented its use on tree bark species classification. In this work, we present, and make publicly available, a novel dataset called BarkNet 1.0 containing more than 23,000 high-resolution bark images from 23 different tree species over a wide range of tree diameters. With it, we demonstrate the feasibility of species recognition through bark images, using deep learning. More specifically, we obtain an accuracy of 93.88% on single crop, and an accuracy of 97.81% using a majority voting approach on all of the images of a tree. We also empirically demonstrate that, for a fixed number of images, it is better to maximize the number of tree individuals in the training database, thus directing future data collection efforts. |
Tasks | |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00949v2 |
PDF | http://arxiv.org/pdf/1803.00949v2.pdf
PWC | https://paperswithcode.com/paper/tree-species-identification-from-bark-images |
Repo | https://github.com/ulaval-damas/tree-bark-classification |
Framework | pytorch |
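The majority-voting step that lifts single-crop accuracy to per-tree accuracy is straightforward; a minimal sketch, assuming any single-image classifier behind a hypothetical `model.predict`:

```python
import numpy as np

def predict_tree_species(model, crops, n_classes):
    """Aggregate single-crop predictions for one tree by majority vote."""
    votes = np.zeros(n_classes, dtype=int)
    for crop in crops:
        votes[model.predict(crop)] += 1
    return int(votes.argmax())

class DummyModel:
    def predict(self, crop):
        return hash(crop.tobytes()) % 23  # placeholder single-crop classifier

crops = [np.random.rand(224, 224, 3) for _ in range(5)]  # images of one tree
print(predict_tree_species(DummyModel(), crops, n_classes=23))
```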
Graph-to-Sequence Learning using Gated Graph Neural Networks
Title | Graph-to-Sequence Learning using Gated Graph Neural Networks |
Authors | Daniel Beck, Gholamreza Haffari, Trevor Cohn |
Abstract | Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures for this setting obtained promising results compared to grammar-based approaches but still relies on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work, we propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results show that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation. |
Tasks | Graph-to-Sequence, Machine Translation |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09835v1 |
PDF | http://arxiv.org/pdf/1806.09835v1.pdf
PWC | https://paperswithcode.com/paper/graph-to-sequence-learning-using-gated-graph |
Repo | https://github.com/beckdaniel/acl2018_graph2seq |
Framework | mxnet |
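A minimal sketch of one GGNN propagation layer of the kind this architecture couples with its input transformation; the dense adjacency matrix and sizes are toy choices:

```python
import torch
import torch.nn as nn

class GGNNLayer(nn.Module):
    """One Gated Graph NN step: aggregate neighbour messages, update via GRU."""
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, adj):            # h: (N, dim), adj: (N, N)
        m = adj @ self.message(h)         # sum of transformed neighbour states
        return self.gru(m, h)             # gated node-state update

h = torch.randn(5, 16)                    # 5 nodes (e.g., AMR concepts/edges)
adj = (torch.rand(5, 5) > 0.5).float()
layer = GGNNLayer(16)
for _ in range(4):                        # a few propagation steps
    h = layer(h, adj)
```

The paper's input transformation turns edges into nodes of the graph, which is how edge labels get their own hidden representations without a per-label weight matrix.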
Fashion-Gen: The Generative Fashion Dataset and Challenge
Title | Fashion-Gen: The Generative Fashion Dataset and Challenge |
Authors | Negar Rostamzadeh, Seyedarian Hosseini, Thomas Boquet, Wojciech Stokowiec, Ying Zhang, Christian Jauvin, Chris Pal |
Abstract | We introduce a new dataset of 293,008 high definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles. We provide baseline results on 1) high-resolution image generation, and 2) image generation conditioned on the given text descriptions. We invite the community to improve upon these baselines. In this paper, we also outline the details of a challenge that we are launching based upon this dataset. |
Tasks | Image Generation |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08317v2 |
PDF | http://arxiv.org/pdf/1806.08317v2.pdf
PWC | https://paperswithcode.com/paper/fashion-gen-the-generative-fashion-dataset |
Repo | https://github.com/iphysresearch/DataSciComp |
Framework | none |
Scaling simulation-to-real transfer by learning composable robot skills
Title | Scaling simulation-to-real transfer by learning composable robot skills |
Authors | Ryan Julian, Eric Heiden, Zhanpeng He, Hejia Zhang, Stefan Schaal, Joseph J. Lim, Gaurav Sukhatme, Karol Hausman |
Abstract | We present a novel solution to the problem of simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation-reality gap, we learn a set of diverse policies that are parameterized in a way that makes them easily reusable. This diversity and parameterization of low-level skills allows us to find a transferable policy that is able to use combinations and variations of different skills to solve more complex, high-level tasks. In particular, we first use simulation to jointly learn a policy for a set of low-level skills, and a “skill embedding” parameterization which can be used to compose them. Later, we learn high-level policies which actuate the low-level policies via this skill embedding parameterization. The high-level policies encode how and when to reuse the low-level skills together to achieve specific high-level tasks. Importantly, our method learns to control a real robot in joint-space to achieve these high-level tasks with little or no on-robot time, despite the fact that the low-level policies may not be perfectly transferable from simulation to real, and that the low-level skills were not trained on any examples of high-level tasks. We illustrate the principles of our method using informative simulation experiments. We then verify its usefulness for real robotics problems by learning, transferring, and composing free-space and contact motion skills on a Sawyer robot using only joint-space control. We experiment with several techniques for composing pre-learned skills, and find that our method allows us to use both learning-based approaches and efficient search-based planning to achieve high-level tasks using only pre-learned skills. |
Tasks | |
Published | 2018-09-26 |
URL | http://arxiv.org/abs/1809.10253v3 |
PDF | http://arxiv.org/pdf/1809.10253v3.pdf
PWC | https://paperswithcode.com/paper/scaling-simulation-to-real-transfer-by |
Repo | https://github.com/ryanjulian/embed2learn |
Framework | tf |
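A minimal sketch of the two-level structure, assuming toy observation/skill/action sizes: the high-level policy acts by emitting a point in the learned skill-embedding space, which conditions the low-level policy instead of issuing raw joint commands.

```python
import torch
import torch.nn as nn

class LowLevelPolicy(nn.Module):
    """Skill-conditioned controller, pre-trained in simulation."""
    def __init__(self, obs_dim, skill_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + skill_dim, 64), nn.Tanh(),
            nn.Linear(64, act_dim),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))

class HighLevelPolicy(nn.Module):
    """Learned on the real task; its 'actions' are skill-embedding vectors."""
    def __init__(self, obs_dim, skill_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, skill_dim))

    def forward(self, obs):
        return self.net(obs)

obs = torch.randn(1, 10)
high, low = HighLevelPolicy(10, 4), LowLevelPolicy(10, 4, 7)
action = low(obs, high(obs))   # e.g. a 7-DoF joint-space command
```

Because the high level only searches the low-dimensional skill space, both learning-based and search-based planners can compose the pre-learned skills cheaply.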
Meta-Transfer Learning for Few-Shot Learning
Title | Meta-Transfer Learning for Few-Shot Learning |
Authors | Qianru Sun, Yaoyao Liu, Tat-Seng Chua, Bernt Schiele |
Abstract | Meta-learning has been proposed as a framework to address the challenging few-shot learning setting. The key idea is to leverage a large number of similar few-shot tasks in order to learn how to adapt a base-learner to a new task for which only a few labeled samples are available. As deep neural networks (DNNs) tend to overfit using a few samples only, meta-learning typically uses shallow neural networks (SNNs), thus limiting its effectiveness. In this paper we propose a novel few-shot learning method called meta-transfer learning (MTL) which learns to adapt a deep NN for few-shot learning tasks. Specifically, “meta” refers to training on multiple tasks, and “transfer” is achieved by learning scaling and shifting functions of DNN weights for each task. In addition, we introduce the hard task (HT) meta-batch scheme as an effective learning curriculum for MTL. We conduct experiments using (5-class, 1-shot) and (5-class, 5-shot) recognition tasks on two challenging few-shot learning benchmarks: miniImageNet and Fewshot-CIFAR100. Extensive comparisons to related works validate that our meta-transfer learning approach trained with the proposed HT meta-batch scheme achieves top performance. An ablation study also shows that both components contribute to fast convergence and high accuracy. |
Tasks | Few-Shot Image Classification, Few-Shot Learning, Meta-Learning, Transfer Learning |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02391v3 |
PDF | http://arxiv.org/pdf/1812.02391v3.pdf
PWC | https://paperswithcode.com/paper/meta-transfer-learning-for-few-shot-learning |
Repo | https://github.com/yaoyao-liu/meta-transfer-learning |
Framework | pytorch |
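A minimal sketch of the scaling-and-shifting idea behind the "transfer" part: freeze a pretrained convolution and meta-train only lightweight per-channel scale and shift parameters (shapes and the toy layer are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleShiftConv(nn.Module):
    """Frozen pretrained conv; only per-channel scale and shift are trainable."""
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.weight = conv.weight.detach()        # frozen pretrained weights
        self.scale = nn.Parameter(torch.ones(conv.out_channels, 1, 1, 1))
        self.shift = nn.Parameter(torch.zeros(conv.out_channels))
        self.stride, self.padding = conv.stride, conv.padding

    def forward(self, x):
        return F.conv2d(x, self.weight * self.scale, self.shift,
                        stride=self.stride, padding=self.padding)

pretrained = nn.Conv2d(3, 16, 3, padding=1)       # stands in for a trained net
layer = ScaleShiftConv(pretrained)
out = layer(torch.randn(2, 3, 32, 32))            # only scale/shift get grads
```

Since the deep backbone stays frozen, far fewer parameters are adapted per task, which is what lets MTL use deep networks without the overfitting that plagues plain meta-learning on a few samples.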
Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning
Title | Probabilistic AND-OR Attribute Grouping for Zero-Shot Learning |
Authors | Yuval Atzmon, Gal Chechik |
Abstract | In zero-shot learning (ZSL), a classifier is trained to recognize visual classes without any image samples. Instead, it is given semantic information about the class, like a textual description or a set of attributes. Learning from attributes could benefit from explicitly modeling the structure of the attribute space. Unfortunately, learning of general structure from empirical samples is hard with typical dataset sizes. Here we describe LAGO, a probabilistic model designed to capture natural soft and-or relations across groups of attributes. We show how this model can be learned end-to-end with a deep attribute-detection model. The soft group structure can be learned from data jointly as part of the model, and can also readily incorporate prior knowledge about groups if available. The soft and-or structure succeeds in capturing meaningful and predictive structures, improving the accuracy of zero-shot learning on two of three benchmarks. Finally, LAGO reveals a unified formulation over two ZSL approaches: DAP (Lampert et al., 2009) and ESZSL (Romera-Paredes & Torr, 2015). Interestingly, taking only one singleton group for each attribute introduces a new soft relaxation of DAP that outperforms DAP by ~40%. |
Tasks | Zero-Shot Learning |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02664v2 |
PDF | http://arxiv.org/pdf/1806.02664v2.pdf
PWC | https://paperswithcode.com/paper/probabilistic-and-or-attribute-grouping-for |
Repo | https://github.com/yuvalatzmon/LAGO |
Framework | tf |
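A minimal sketch of a soft AND-OR read-out over attribute groups, with a toy group structure and a binary class-attribute matrix standing in for what LAGO learns end to end: within each group, relevant attribute detections combine by a soft OR; the class score is the soft AND (product) across groups.

```python
import torch

def soft_and_or(attr_probs, groups, class_attr):
    """attr_probs: (B, A) detections; groups: list of attribute-index lists;
    class_attr: (C, A) 0/1 matrix of which attributes describe each class."""
    scores = []
    for c in range(class_attr.size(0)):
        relevant = attr_probs * class_attr[c]       # zero out absent attributes
        terms = []
        for g in groups:
            if class_attr[c, g].sum() == 0:
                continue                            # group says nothing here
            terms.append(1 - torch.prod(1 - relevant[:, g], dim=1))  # soft OR
        scores.append(torch.stack(terms, dim=1).prod(dim=1))         # soft AND
    return torch.stack(scores, dim=1)               # (B, C)

attr_probs = torch.rand(4, 6)
groups = [[0, 1], [2, 3], [4, 5]]                   # e.g. colour/shape/part
class_attr = torch.tensor([[1, 0, 1, 0, 1, 0],
                           [0, 1, 0, 1, 0, 1]], dtype=torch.float)
print(soft_and_or(attr_probs, groups, class_attr))
```

Taking every group to be a singleton makes the OR trivial and leaves a pure product over attributes, which is how the DAP relaxation mentioned in the abstract falls out.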
Self-Supervised GANs via Auxiliary Rotation Loss
Title | Self-Supervised GANs via Auxiliary Rotation Loss |
Authors | Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby |
Abstract | Conditional GANs are at the forefront of natural image synthesis. The main drawback of such models is the necessity for labeled data. In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. In particular, we allow the networks to collaborate on the task of representation learning, while being adversarial with respect to the classic GAN game. The role of self-supervision is to encourage the discriminator to learn meaningful feature representations which are not forgotten during training. We test empirically both the quality of the learned image representations, and the quality of the synthesized images. Under the same conditions, the self-supervised GAN attains a similar performance to state-of-the-art conditional counterparts. Finally, we show that this approach to fully unsupervised learning can be scaled to attain an FID of 23.4 on unconditional ImageNet generation. |
Tasks | Image Generation, Representation Learning |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.11212v2 |
PDF | http://arxiv.org/pdf/1811.11212v2.pdf
PWC | https://paperswithcode.com/paper/self-supervised-generative-adversarial |
Repo | https://github.com/vandit15/Self-Supervised-Gans-Pytorch |
Framework | pytorch |
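A minimal sketch of the auxiliary rotation task itself: rotate each image by 0/90/180/270 degrees and train a 4-way head to recover the rotation. The GAN wiring is omitted, and `discriminator.rotation_head` in the comment is hypothetical:

```python
import torch
import torch.nn.functional as F

def rotation_batch(images):
    """Return the four rotations of a batch and their rotation labels."""
    rots = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rots, dim=0), labels

def rotation_loss(rot_logits, labels):
    return F.cross_entropy(rot_logits, labels)

imgs = torch.randn(8, 3, 32, 32)
rotated, labels = rotation_batch(imgs)        # (32, 3, 32, 32), (32,)
# rot_logits = discriminator.rotation_head(rotated)  # hypothetical 4-way head
# loss = rotation_loss(rot_logits, labels)           # added to the GAN losses
```

Predicting rotations forces the discriminator to keep semantically meaningful features, which is the forgetting-prevention role the abstract describes.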
Classification of Household Materials via Spectroscopy
Title | Classification of Household Materials via Spectroscopy |
Authors | Zackory Erickson, Nathan Luskey, Sonia Chernova, Charles C. Kemp |
Abstract | Recognizing an object’s material can inform a robot on the object’s fragility or appropriate use. To estimate an object’s material during manipulation, many prior works have explored the use of haptic sensing. In this paper, we explore a technique for robots to estimate the materials of objects using spectroscopy. We demonstrate that spectrometers provide several benefits for material recognition, including fast response times and accurate measurements with low noise. Furthermore, spectrometers do not require direct contact with an object. To explore this, we collected a dataset of spectral measurements from two commercially available spectrometers while a robotic platform interacted with 50 flat material objects, and we show that a neural network model can accurately analyze these measurements. Due to the similarity between consecutive spectral measurements, our model achieved a material classification accuracy of 94.6% when given only one spectral sample per object. Similar to prior works with haptic sensors, we found that generalizing material recognition to new objects posed a greater challenge, for which we achieved an accuracy of 79.1% via leave-one-object-out cross-validation. Finally, we demonstrate how a PR2 robot can leverage spectrometers to estimate the materials of everyday objects found in the home. From this work, we find that spectroscopy is a promising approach for material classification during robotic manipulation. |
Tasks | Material Classification, Material Recognition |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04051v3 |
PDF | http://arxiv.org/pdf/1805.04051v3.pdf
PWC | https://paperswithcode.com/paper/classification-of-household-materials-via |
Repo | https://github.com/kebasaa/SCIO-read |
Framework | none |
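On the modeling side, a minimal sketch of a spectral classifier: a small fully connected network mapping a 1-D spectral measurement to one of 50 object classes. The 331-channel input (the resolution of one common handheld spectrometer) and the architecture are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Small MLP over a single normalized spectral measurement.
model = nn.Sequential(
    nn.Linear(331, 128), nn.ReLU(), nn.Dropout(0.2),   # 331-point spectrum
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 50),                                  # 50 object classes
)
spectrum = torch.rand(16, 331)          # batch of toy spectral samples
logits = model(spectrum)
loss = F.cross_entropy(logits, torch.randint(0, 50, (16,)))
```

Evaluating with leave-one-object-out cross-validation, as the paper does, simply means holding out every measurement of one object per fold rather than random samples.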
A more globally accurate dimensionality reduction method using triplets
Title | A more globally accurate dimensionality reduction method using triplets |
Authors | Ehsan Amid, Manfred K. Warmuth |
Abstract | We first show that the commonly used dimensionality reduction (DR) methods such as t-SNE and LargeVis poorly capture the global structure of the data in the low dimensional embedding. We show this via a number of tests for the DR methods that can be easily applied by any practitioner to the dataset at hand. Surprisingly enough, t-SNE performs the best w.r.t. the commonly used measures that reward the local neighborhood accuracy such as precision-recall while having the worst performance in our tests for global structure. We then contrast the performance of these two DR methods against our new method called TriMap. The main idea behind TriMap is to capture higher orders of structure with triplet information (instead of pairwise information used by t-SNE and LargeVis), and to minimize a robust loss function for satisfying the chosen triplets. We provide compelling experimental evidence on large natural datasets for the clear advantage of the TriMap DR results. Like LargeVis, TriMap scales linearly with the number of data points. |
Tasks | Dimensionality Reduction |
Published | 2018-03-01 |
URL | http://arxiv.org/abs/1803.00854v1 |
PDF | http://arxiv.org/pdf/1803.00854v1.pdf
PWC | https://paperswithcode.com/paper/a-more-globally-accurate-dimensionality |
Repo | https://github.com/eamid/trimap |
Framework | none |
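A minimal sketch of the triplet principle, using a simple distance-ratio loss: for each triplet (i, j, k) where j should stay closer to i than k, penalize embeddings that violate the ordering. TriMap's actual per-triplet weights and robust loss transform are more involved than this.

```python
import torch

def triplet_loss(Y, triplets, eps=1e-8):
    """Y: (N, 2) embedding; triplets: (T, 3) long tensor of (i, j, k)."""
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    d_ij = ((Y[i] - Y[j]) ** 2).sum(dim=1)
    d_ik = ((Y[i] - Y[k]) ** 2).sum(dim=1)
    # ratio in (0, 1): near 0 when the triplet is well satisfied (d_ij << d_ik)
    return (d_ij / (d_ij + d_ik + eps)).mean()

Y = torch.randn(100, 2, requires_grad=True)      # 2-D embedding to optimize
triplets = torch.randint(0, 100, (500, 3))       # toy triplets; TriMap samples
opt = torch.optim.Adam([Y], lr=0.1)              # them from the input space
for _ in range(100):
    opt.zero_grad()
    loss = triplet_loss(Y, triplets)
    loss.backward()
    opt.step()
```

Because each triplet relates three points rather than two, satisfied triplets constrain relative distances across the embedding, which is the source of the claimed global-structure advantage.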
Message-passing neural networks for high-throughput polymer screening
Title | Message-passing neural networks for high-throughput polymer screening |
Authors | Peter C. St. John, Caleb Phillips, Travis W. Kemper, A. Nolan Wilson, Michael F. Crowley, Mark R. Nimlos, Ross E. Larsen |
Abstract | Machine learning methods have shown promise in predicting molecular properties, and given sufficient training data machine learning approaches can enable rapid high-throughput virtual screening of large libraries of compounds. Graph-based neural network architectures have emerged in recent years as the most successful approach for predictions based on molecular structure, and have consistently achieved the best performance on benchmark quantum chemical datasets. However, these models have typically required optimized 3D structural information for the molecule to achieve the highest accuracy. These 3D geometries are costly to compute for high levels of theory, limiting the applicability and practicality of machine learning methods in high-throughput screening applications. In this study, we present a new database of candidate molecules for organic photovoltaic applications, comprising approximately 91,000 unique chemical structures. Compared to existing datasets, this dataset contains substantially larger molecules (up to 200 atoms) as well as extrapolated properties for long polymer chains. We show that message-passing neural networks trained with and without 3D structural information for these molecules achieve similar accuracy, comparable to state-of-the-art methods on existing benchmark datasets. These results therefore emphasize that for larger molecules with practical applications, near-optimal prediction results can be obtained without using optimized 3D geometry as an input. We further show that learned molecular representations can be leveraged to reduce the training data required to transfer predictions to a new DFT functional. |
Tasks | |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10363v2 |
PDF | http://arxiv.org/pdf/1807.10363v2.pdf
PWC | https://paperswithcode.com/paper/message-passing-neural-networks-for-high |
Repo | https://github.com/nrel/nfp |
Framework | tf |
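A minimal sketch of one message-passing step using only 2-D connectivity and bond features, in the spirit of (but not identical to) the paper's architecture: messages depend on both the neighbour atom state and the edge feature, and nodes update with a GRU cell.

```python
import torch
import torch.nn as nn

class MPNNStep(nn.Module):
    def __init__(self, node_dim, edge_dim):
        super().__init__()
        self.msg = nn.Sequential(
            nn.Linear(node_dim + edge_dim, node_dim), nn.ReLU())
        self.gru = nn.GRUCell(node_dim, node_dim)

    def forward(self, h, edge_index, edge_attr):
        # edge_index: (2, E) source/target atoms; edge_attr: (E, edge_dim)
        src, dst = edge_index
        m = self.msg(torch.cat([h[src], edge_attr], dim=1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)   # sum incoming msgs
        return self.gru(agg, h)                           # per-atom update

h = torch.randn(6, 32)                        # 6 atoms, toy feature size
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])  # directed bonds
edge_attr = torch.randn(4, 8)                 # bond features (type, etc.)
h = MPNNStep(32, 8)(h, edge_index, edge_attr)
```

Because the messages use only graph connectivity and bond features, a network like this never needs the costly optimized 3D geometries, which is the paper's central practical point.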