February 1, 2020

3395 words 16 mins read

Paper Group AWR 215

Downhole Track Detection via Multiscale Conditional Generative Adversarial Nets

Title Downhole Track Detection via Multiscale Conditional Generative Adversarial Nets
Authors Jia Li, Xing Wei, Guoqiang Yang, Xiao Sun, Changliang Li
Abstract Frequent mine disasters cause a large number of casualties and property losses. Autonomous driving is a fundamental measure for solving this problem, and track detection is one of the key technologies for computer vision to achieve downhole automatic driving. Track detection results based on the traditional convolutional neural network (CNN) algorithm lack a detailed, distinctive description of the object and rely too heavily on visual post-processing. Therefore, this paper proposes a track detection algorithm based on a multiscale conditional generative adversarial network (CGAN). The generator is decomposed into global and local parts using a multigranularity structure. A multiscale shared convolution structure is adopted in the discriminator network to further supervise the training of the generator. Finally, a Monte Carlo search technique is introduced to search the intermediate states of the generator, and the results are sent to the discriminator for comparison. Compared with existing work, our model achieved 82.43% pixel accuracy and an average intersection-over-union (IoU) of 0.6218, and track detection reached 95.01% accuracy on the downhole roadway scene test set.
Tasks Autonomous Driving
Published 2019-04-17
URL http://arxiv.org/abs/1904.08177v1
PDF http://arxiv.org/pdf/1904.08177v1.pdf
PWC https://paperswithcode.com/paper/downhole-track-detection-via-multiscale
Repo https://github.com/LJ2lijia/Downhole-track-line-dataset
Framework none
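
A minimal PyTorch sketch of one way to realize the "multiscale shared convolution" supervision described in the abstract above: a single convolutional trunk scores the (image, predicted track map) pair at several resolutions in a conditional GAN discriminator. Layer sizes, channel counts and scales are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, in_ch=4, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.trunk = nn.Sequential(            # shared weights across all scales
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),   # patch-level real/fake scores
        )

    def forward(self, image, track_map):
        # Condition the discriminator on the input image (3 RGB + 1 track channel).
        x = torch.cat([image, track_map], dim=1)
        scores = []
        for s in self.scales:
            xi = x if s == 1.0 else F.interpolate(x, scale_factor=s, mode="bilinear")
            scores.append(self.trunk(xi))      # one score map per resolution
        return scores
```

Each returned score map supervises the generator at a different spatial scale; how the Monte Carlo search over intermediate generator states is combined with these scores is not sketched here.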

Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization

Title Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization
Authors Venkatesh Umaashankar, Girish Shanmugam S, Aditi Prakash
Abstract In e-commerce, it is a common practice to organize the product catalog using a product taxonomy. This enables the buyer to easily locate the item they are looking for and to explore the various items available under a category. A product taxonomy is a tree structure with 3 or more levels of depth and several leaf nodes. Product categorization is a large-scale classification task that assigns a category path to a particular product. Research in this area is restricted by the unavailability of good real-world datasets and by variations in taxonomy due to the absence of a standard across different e-commerce stores. In this paper, we introduce a high-quality product taxonomy dataset focusing on clothing products, which contains 186,150 images under the clothing category with 3 levels and 52 leaf nodes in the taxonomy. We explain the methodology used to collect and label this dataset. Further, we establish a benchmark by comparing image classification and attention-based sequence models for predicting the category path. Our benchmark model reaches a micro F-score of 0.92 on the test set. The dataset, code and pre-trained models are publicly available at https://github.com/vumaasha/atlas. We invite the community to improve upon these baselines.
Tasks Image Classification, Product Categorization
Published 2019-08-12
URL https://arxiv.org/abs/1908.08984v1
PDF https://arxiv.org/pdf/1908.08984v1.pdf
PWC https://paperswithcode.com/paper/atlas-a-dataset-and-benchmark-for-e-commerce
Repo https://github.com/vumaasha/atlas
Framework none
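
A hedged PyTorch sketch of the sequence-prediction view of product categorization used in the benchmark above: encode the product image, then decode the category path one taxonomy level at a time. The ResNet-18 backbone, GRU decoder, vocabulary and layer sizes are illustrative assumptions, not the published benchmark configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CategoryPathDecoder(nn.Module):
    def __init__(self, n_category_tokens, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, hidden)
        self.encoder = backbone                       # image -> feature vector
        self.embed = nn.Embedding(n_category_tokens, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_category_tokens)

    def forward(self, images, path_tokens):
        # images: (B, 3, H, W); path_tokens: (B, L) category ids along the path
        h0 = self.encoder(images).unsqueeze(0)        # image feature seeds the decoder
        emb = self.embed(path_tokens)
        out, _ = self.gru(emb, h0)
        return self.out(out)                          # logits for each taxonomy level
```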

Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft

Title Deep Recurrent Q-Learning vs Deep Q-Learning on a simple Partially Observable Markov Decision Process with Minecraft
Authors Clément Romac, Vincent Béraud
Abstract Deep Q-Learning has been successfully applied to a wide variety of tasks in the past several years. However, the architecture of the vanilla Deep Q-Network is not suited to partially observable environments such as 3D video games. To address this, recurrent layers have been added to the Deep Q-Network to allow it to handle past dependencies. Here we use Minecraft for its customization advantages and design two very simple missions that can be framed as Partially Observable Markov Decision Processes. We compare the Deep Q-Network and the Deep Recurrent Q-Network on these missions to see whether the latter, which is trickier and longer to train, is always the better architecture when the agent has to deal with partial observability.
Tasks Q-Learning
Published 2019-03-11
URL http://arxiv.org/abs/1903.04311v2
PDF http://arxiv.org/pdf/1903.04311v2.pdf
PWC https://paperswithcode.com/paper/deep-recurrent-q-learning-vs-deep-q-learning
Repo https://github.com/vincentberaud/Minecraft-Reinforcement-Learning
Framework tf
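
A minimal PyTorch sketch contrasting the two architectures compared above: a feed-forward DQN head versus a DRQN head that carries an LSTM state across the observation sequence. The observation encoding, layer sizes and LSTM placement are illustrative assumptions, not the authors' exact networks.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs):                  # obs: (batch, obs_dim)
        return self.net(obs)                 # Q-values: (batch, n_actions)

class DRQN(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.lstm = nn.LSTM(128, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):  # obs_seq: (batch, time, obs_dim)
        z = self.encoder(obs_seq)
        z, state = self.lstm(z, state)       # recurrence integrates past observations
        return self.head(z), state           # Q-values per timestep, updated hidden state
```

The recurrent variant is trained on sequences rather than single transitions, which is the source of the extra training cost the abstract mentions.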

Detecting Extrapolation with Local Ensembles

Title Detecting Extrapolation with Local Ensembles
Authors David Madras, James Atwood, Alex D’Amour
Abstract We present local ensembles, a method for detecting extrapolation at test time in a pre-trained model. We focus on underdetermination as a key component of extrapolation: we aim to detect when many possible predictions are consistent with the training data and model class. Our method uses local second-order information to approximate the variance of predictions across an ensemble of models from the same class. We compute this approximation by estimating the norm of the component of a test point’s gradient that aligns with the low-curvature directions of the Hessian, and provide a tractable method for estimating this quantity. Experimentally, we show that our method is capable of detecting when a pre-trained model is extrapolating on test data, with applications to out-of-distribution detection, detecting spurious correlates, and active learning.
Tasks Active Learning, Out-of-Distribution Detection
Published 2019-10-21
URL https://arxiv.org/abs/1910.09573v1
PDF https://arxiv.org/pdf/1910.09573v1.pdf
PWC https://paperswithcode.com/paper/detecting-extrapolation-with-local-ensembles
Repo https://github.com/dmadras/local-ensembles
Framework tf
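
A toy NumPy sketch of the local-ensembles extrapolation score described above: project the test prediction's parameter gradient onto the low-curvature eigendirections of the training-loss Hessian and take the norm. An exact eigendecomposition is used here for clarity; the paper relies on a tractable approximation for large models, and all shapes are illustrative.

```python
import numpy as np

def local_ensemble_score(hessian, test_grad, n_keep):
    """hessian: (p, p) training-loss Hessian at the trained parameters.
    test_grad: (p,) gradient of the test prediction w.r.t. parameters.
    n_keep: number of low-curvature directions to project onto."""
    eigvals, eigvecs = np.linalg.eigh(hessian)   # eigenvalues in ascending order
    low_curv = eigvecs[:, :n_keep]               # directions the training data underdetermines
    projection = low_curv.T @ test_grad
    return np.linalg.norm(projection)            # large norm -> prediction likely extrapolates
```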

Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs

Title Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs
Authors Jonas Kubilius, Martin Schrimpf, Kohitij Kar, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, Aran Nayebi, Daniel Bear, Daniel L. K. Yamins, James J. DiCarlo
Abstract Deep convolutional artificial neural networks (ANNs) are the leading class of candidate models of the mechanisms of visual processing in the primate ventral stream. While initially inspired by brain anatomy, over the past years, these ANNs have evolved from a simple eight-layer architecture in AlexNet to extremely deep and branching architectures, demonstrating increasingly better object categorization performance, yet bringing into question how brain-like they still are. In particular, typical deep models from the machine learning community are often hard to map onto the brain’s anatomy due to their vast number of layers and missing biologically-important connections, such as recurrence. Here we demonstrate that better anatomical alignment to the brain and high performance on machine learning as well as neuroscience measures do not have to be in contradiction. We developed CORnet-S, a shallow ANN with four anatomically mapped areas and recurrent connectivity, guided by Brain-Score, a new large-scale composite of neural and behavioral benchmarks for quantifying the functional fidelity of models of the primate ventral visual stream. Despite being significantly shallower than most models, CORnet-S is the top model on Brain-Score and outperforms similarly compact models on ImageNet. Moreover, our extensive analyses of CORnet-S circuitry variants reveal that recurrence is the main predictive factor of both Brain-Score and ImageNet top-1 performance. Finally, we report that the temporal evolution of the CORnet-S “IT” neural population resembles the actual monkey IT population dynamics. Taken together, these results establish CORnet-S, a compact, recurrent ANN, as the current best model of the primate ventral visual stream.
Tasks Object Recognition
Published 2019-09-13
URL https://arxiv.org/abs/1909.06161v2
PDF https://arxiv.org/pdf/1909.06161v2.pdf
PWC https://paperswithcode.com/paper/brain-like-object-recognition-with-high
Repo https://github.com/dicarlolab/cornet
Framework pytorch
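
A minimal PyTorch sketch of the idea behind a CORnet-style recurrent area: one convolutional block whose weights are reused for several time steps, so depth-in-time substitutes for stacking many distinct layers. Channel counts, normalization choices and the number of time steps are illustrative assumptions, not the published CORnet-S hyperparameters.

```python
import torch
import torch.nn as nn

class RecurrentArea(nn.Module):
    def __init__(self, in_ch, out_ch, times=2):
        super().__init__()
        self.input = nn.Conv2d(in_ch, out_ch, 1)
        self.conv = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.norm = nn.BatchNorm2d(out_ch)
        self.times = times

    def forward(self, x):
        x = self.input(x)
        for _ in range(self.times):          # shared weights unrolled over time
            x = torch.relu(self.norm(self.conv(x)))
        return x
```

Stacking a few such areas (roughly V1, V2, V4, IT in the paper's mapping) keeps the parameter count small while the unrolled computation remains deep.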

Open Set Domain Adaptation: Theoretical Bound and Algorithm

Title Open Set Domain Adaptation: Theoretical Bound and Algorithm
Authors Zhen Fang, Jie Lu, Feng Liu, Junyu Xuan, Guangquan Zhang
Abstract Unsupervised domain adaptation for classification tasks has achieved great progress in leveraging the knowledge in a labeled (source) domain to improve the task performance in an unlabeled (target) domain by mitigating the effect of distribution discrepancy. However, most existing methods can only handle unsupervised closed set domain adaptation (UCSDA), where the source and target domains share the same label set. In this paper, we target a more challenging but realistic setting: unsupervised open set domain adaptation (UOSDA), where the target domain has unknown classes that the source domain does not have. This study is the first to give the generalization bound of open set domain adaptation through theoretically investigating the risk of the target classifier on the unknown classes. The proposed generalization bound for open set domain adaptation has a special term, namely open set difference, which reflects the risk of the target classifier on unknown classes. According to this generalization bound, we propose a novel and theoretically guided unsupervised open set domain adaptation method: Distribution Alignment with Open Difference (DAOD), which is based on the structural risk minimization principle and open set difference regularization. The experiments on several benchmark datasets show the superior performance of the proposed UOSDA method compared with the state-of-the-art methods in the literature.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2019-07-19
URL https://arxiv.org/abs/1907.08375v1
PDF https://arxiv.org/pdf/1907.08375v1.pdf
PWC https://paperswithcode.com/paper/open-set-domain-adaptation-theoretical-bound
Repo https://github.com/fang-zhen/Open-set-domain-adaptation
Framework none

VALAN: Vision and Language Agent Navigation

Title VALAN: Vision and Language Agent Navigation
Authors Larry Lansing, Vihan Jain, Harsh Mehta, Haoshuo Huang, Eugene Ie
Abstract VALAN is a lightweight and scalable software framework for deep reinforcement learning based on the SEED RL architecture. The framework facilitates the development and evaluation of embodied agents for solving grounded language understanding tasks, such as Vision-and-Language Navigation and Vision-and-Dialog Navigation, in photo-realistic environments, such as Matterport3D and Google StreetView. We have added a minimal set of abstractions on top of SEED RL allowing us to generalize the architecture to solve a variety of other RL problems. In this article, we will describe VALAN’s software abstraction and architecture, and also present an example of using VALAN to design agents for instruction-conditioned indoor navigation.
Tasks
Published 2019-12-06
URL https://arxiv.org/abs/1912.03241v1
PDF https://arxiv.org/pdf/1912.03241v1.pdf
PWC https://paperswithcode.com/paper/valan-vision-and-language-agent-navigation
Repo https://github.com/google-research/valan
Framework tf

On Network Design Spaces for Visual Recognition

Title On Network Design Spaces for Visual Recognition
Authors Ilija Radosavovic, Justin Johnson, Saining Xie, Wan-Yen Lo, Piotr Dollár
Abstract Over the past several years progress in designing better neural network architectures for visual recognition has been substantial. To help sustain this rate of progress, in this work we propose to reexamine the methodology for comparing network architectures. In particular, we introduce a new comparison paradigm of distribution estimates, in which network design spaces are compared by applying statistical techniques to populations of sampled models, while controlling for confounding factors like network complexity. Compared to current methodologies of comparing point and curve estimates of model families, distribution estimates paint a more complete picture of the entire design landscape. As a case study, we examine design spaces used in neural architecture search (NAS). We find significant statistical differences between recent NAS design space variants that have been largely overlooked. Furthermore, our analysis reveals that the design spaces for standard model families like ResNeXt can be comparable to the more complex ones used in recent NAS work. We hope these insights into distribution analysis will enable more robust progress toward discovering better networks for visual recognition.
Tasks Neural Architecture Search
Published 2019-05-30
URL https://arxiv.org/abs/1905.13214v1
PDF https://arxiv.org/pdf/1905.13214v1.pdf
PWC https://paperswithcode.com/paper/on-network-design-spaces-for-visual
Repo https://github.com/facebookresearch/nds
Framework none
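
An illustrative NumPy sketch of the distribution-estimate comparison described above: sample many models from each design space, record their errors, and compare empirical distribution functions rather than a single best model. The error arrays below are synthetic placeholders standing in for the test errors of trained, sampled architectures.

```python
import numpy as np

def edf(errors, thresholds):
    """Empirical distribution function: fraction of sampled models with error below t."""
    errors = np.asarray(errors)
    return np.array([(errors < t).mean() for t in thresholds])

# Placeholder errors; in practice these come from training models sampled from each space.
rng = np.random.default_rng(0)
errors_a = rng.normal(26.0, 2.0, size=100)   # design space A
errors_b = rng.normal(27.5, 2.5, size=100)   # design space B
thresholds = np.linspace(20.0, 35.0, 50)
gap = edf(errors_a, thresholds) - edf(errors_b, thresholds)
print(gap)  # positive where space A yields a larger fraction of good models
```

A design space whose EDF dominates another produces more good models at any error threshold, which is the kind of population-level statement a best-model comparison cannot make.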

Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations

Title Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations
Authors Daniel S. Brown, Wonjoon Goo, Scott Niekum
Abstract The performance of imitation learning is typically upper-bounded by the performance of the demonstrator. While recent empirical results demonstrate that ranked demonstrations allow for better-than-demonstrator performance, preferences over demonstrations may be difficult to obtain, and little is known theoretically about when such methods can be expected to successfully extrapolate beyond the performance of the demonstrator. To address these issues, we first contribute a sufficient condition for better-than-demonstrator imitation learning and provide theoretical results showing why preferences over demonstrations can better reduce reward function ambiguity when performing inverse reinforcement learning. Building on this theory, we introduce Disturbance-based Reward Extrapolation (D-REX), a ranking-based imitation learning method that injects noise into a policy learned through behavioral cloning to automatically generate ranked demonstrations. These ranked demonstrations are used to efficiently learn a reward function that can then be optimized using reinforcement learning. We empirically validate our approach on simulated robot and Atari imitation learning benchmarks and show that D-REX outperforms standard imitation learning approaches and can significantly surpass the performance of the demonstrator. D-REX is the first imitation learning approach to achieve significant extrapolation beyond the demonstrator’s performance without additional side-information or supervision, such as rewards or human preferences. By generating rankings automatically, we show that preference-based inverse reinforcement learning can be applied in traditional imitation learning settings where only unlabeled demonstrations are available.
Tasks Imitation Learning
Published 2019-07-09
URL https://arxiv.org/abs/1907.03976v3
PDF https://arxiv.org/pdf/1907.03976v3.pdf
PWC https://paperswithcode.com/paper/ranking-based-reward-extrapolation-without
Repo https://github.com/dsbrown1331/CoRL2019-DREX
Framework tf
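
A hedged sketch of the D-REX-style automatic ranking step: roll out a behavioral-cloning policy under increasing amounts of action noise and treat lower-noise trajectories as better. `env` and `bc_policy` follow a standard Gym-style interface and are assumptions; the full method then trains a reward network on pairwise preferences derived from this ranking, which is not shown here.

```python
import numpy as np

def noisy_rollout(env, bc_policy, epsilon, horizon=500, rng=None):
    rng = rng or np.random.default_rng()
    obs, trajectory = env.reset(), []
    for _ in range(horizon):
        if rng.random() < epsilon:                  # inject noise: take a random action
            action = env.action_space.sample()
        else:
            action = bc_policy(obs)                 # action from the cloned policy
        next_obs, reward, done, _ = env.step(action)
        trajectory.append((obs, action))
        obs = next_obs
        if done:
            break
    return trajectory

def ranked_demonstrations(env, bc_policy, noise_levels=(1.0, 0.75, 0.5, 0.25, 0.0)):
    # Higher noise is assumed to produce worse behavior, so the noise level
    # itself supplies the ranking, with no human preference labels needed.
    return [noisy_rollout(env, bc_policy, eps) for eps in noise_levels]
```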

STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems

Title STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems
Authors Jiani Zhang, Xingjian Shi, Shenglin Zhao, Irwin King
Abstract We propose a new STAcked and Reconstructed Graph Convolutional Networks (STAR-GCN) architecture to learn node representations for boosting the performance of recommender systems, especially in the cold-start scenario. STAR-GCN employs a stack of GCN encoder-decoders combined with intermediate supervision to improve the final prediction performance. Unlike the graph convolutional matrix completion model with one-hot encoded node inputs, STAR-GCN learns low-dimensional user and item latent factors as the input to restrain the model space complexity. Moreover, STAR-GCN can produce node embeddings for new nodes by reconstructing masked input node embeddings, which essentially tackles the cold-start problem. Furthermore, we discover a label leakage issue when training GCN-based models for link prediction tasks and propose a training strategy to avoid it. Empirical results on multiple rating prediction benchmarks demonstrate that our model achieves state-of-the-art performance on four out of five real-world datasets and significant improvements in predicting ratings in the cold-start scenario. The code implementation is available at https://github.com/jennyzhang0215/STAR-GCN.
Tasks Link Prediction, Matrix Completion, Recommendation Systems
Published 2019-05-27
URL https://arxiv.org/abs/1905.13129v1
PDF https://arxiv.org/pdf/1905.13129v1.pdf
PWC https://paperswithcode.com/paper/star-gcn-stacked-and-reconstructed-graph
Repo https://github.com/jennyzhang0215/STAR-GCN
Framework mxnet
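
A minimal PyTorch sketch of the masked-embedding idea STAR-GCN uses for cold-start nodes: zero out a fraction of the input node embeddings and train the network to reconstruct them, so that genuinely unseen nodes can later be embedded the same way. The encoder and decoder mentioned in the comments stand in for the paper's stacked GCN blocks and are not implemented here.

```python
import torch

def mask_embeddings(node_emb, mask_rate=0.2):
    """node_emb: (num_nodes, dim). Returns masked embeddings and the boolean mask."""
    mask = torch.rand(node_emb.size(0)) < mask_rate
    masked = node_emb.clone()
    masked[mask] = 0.0                       # masked nodes mimic unseen (cold-start) nodes
    return masked, mask

# Assumed reconstruction objective on the masked nodes only (GCN encoder/decoder omitted):
#   recon = decoder(encoder(masked, graph))
#   loss = ((recon[mask] - node_emb[mask]) ** 2).mean()
```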

Adaptively Aligned Image Captioning via Adaptive Attention Time

Title Adaptively Aligned Image Captioning via Adaptive Attention Time
Authors Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen
Abstract Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming a one-to-one mapping between source image regions and target caption words, which is rarely the case. In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning. AAT allows the framework to learn how many attention steps to take before outputting a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words, while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and does not introduce any noise to the parameter gradients. We empirically show that AAT improves over state-of-the-art methods on the task of image captioning. Code is available at https://github.com/husthuaan/AAT.
Tasks Image Captioning
Published 2019-09-19
URL https://arxiv.org/abs/1909.09060v3
PDF https://arxiv.org/pdf/1909.09060v3.pdf
PWC https://paperswithcode.com/paper/adaptively-aligned-image-captioning-via
Repo https://github.com/husthuaan/AAT
Framework pytorch
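
A hedged PyTorch sketch of the adaptive-attention-time idea: for one decoding step, keep taking attention steps over the image regions until an accumulated halting signal passes a threshold, then emit the aggregated context. The halting mechanism shown here is an ACT-style stand-in, and every layer size and threshold is an assumption rather than the paper's exact formulation.

```python
import torch
import torch.nn as nn

class AdaptiveAttention(nn.Module):
    def __init__(self, dim, max_steps=4, threshold=0.99):
        super().__init__()
        self.attend = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.halt = nn.Linear(dim, 1)
        self.max_steps, self.threshold = max_steps, threshold

    def forward(self, query, image_feats):
        # query: (B, 1, D) decoder state; image_feats: (B, R, D) region features
        context = torch.zeros_like(query)
        remaining = torch.ones_like(query[..., :1])       # unallocated halting mass
        for _ in range(self.max_steps):
            attended, _ = self.attend(query, image_feats, image_feats)
            p = torch.sigmoid(self.halt(attended))        # halting probability this step
            context = context + remaining * p * attended  # weighted accumulation
            remaining = remaining * (1 - p)
            if bool((remaining < 1 - self.threshold).all()):
                break                                     # enough attention steps taken
        return context
```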

Look and Modify: Modification Networks for Image Captioning

Title Look and Modify: Modification Networks for Image Captioning
Authors Fawaz Sammani, Mahmoud Elsayed
Abstract Attention-based neural encoder-decoder frameworks have been widely used for image captioning. Many of these frameworks focus entirely on generating the caption from scratch, relying solely on the image features or on regional features from object detection. In this paper, we introduce a novel framework that learns to modify existing captions from a given framework by modeling the residual information, where at each timestep the model learns what to keep, remove or add to the existing caption, allowing it to fully focus on “what to modify” rather than on “what to predict”. We evaluate our method on the COCO dataset, trained on top of several image captioning frameworks, and show that our model successfully modifies captions, yielding better ones with better evaluation scores.
Tasks Image Captioning, Object Detection
Published 2019-09-07
URL https://arxiv.org/abs/1909.03169v2
PDF https://arxiv.org/pdf/1909.03169v2.pdf
PWC https://paperswithcode.com/paper/look-and-modify-modification-networks-for
Repo https://github.com/fawazsammani/look-and-modify
Framework pytorch

Concurrent Meta Reinforcement Learning

Title Concurrent Meta Reinforcement Learning
Authors Emilio Parisotto, Soham Ghosh, Sai Bhargav Yalamanchi, Varsha Chinnaobireddy, Yuhuai Wu, Ruslan Salakhutdinov
Abstract State-of-the-art meta reinforcement learning algorithms typically assume the setting of a single agent interacting with its environment in a sequential manner. A negative side-effect of this sequential execution paradigm is that, as the environment becomes more and more challenging and thus requires more interaction episodes for the meta-learner, the agent must reason over longer and longer time-scales. To combat the difficulty of long time-scale credit assignment, we propose an alternative parallel framework, which we name “Concurrent Meta-Reinforcement Learning” (CMRL), that transforms the temporal credit assignment problem into a multi-agent reinforcement learning one. In this multi-agent setting, a set of parallel agents is executed in the same environment and each of these “rollout” agents is given the means to communicate with the others. The goal of the communication is to coordinate, in a collaborative manner, the most efficient exploration of the shared task the agents are currently assigned. This coordination therefore represents the meta-learning aspect of the framework, as each agent can be assigned, or assign itself, a particular section of the current task’s state space. This framework is in contrast to standard RL methods that assume each parallel rollout occurs independently, which can potentially waste computation if many of the rollouts end up sampling the same part of the state space. Furthermore, the parallel setting enables us to define several reward-sharing functions and auxiliary losses that are non-trivial to apply in the sequential setting. We demonstrate the effectiveness of the proposed CMRL at improving over sequential methods in a variety of challenging tasks.
Tasks Efficient Exploration, Meta-Learning, Multi-agent Reinforcement Learning
Published 2019-03-07
URL http://arxiv.org/abs/1903.02710v1
PDF http://arxiv.org/pdf/1903.02710v1.pdf
PWC https://paperswithcode.com/paper/concurrent-meta-reinforcement-learning
Repo https://github.com/impredicative/irc-rss-feed-bot
Framework none

DeepTEGINN: Deep Learning Based Tools to Extract Graphs from Images of Neural Networks

Title DeepTEGINN: Deep Learning Based Tools to Extract Graphs from Images of Neural Networks
Authors Gustavo Borges Moreno e Mello, Vibeke Devold Valderhaug, Sidney Pontes-Filho, Evi Zouganeli, Ioanna Sandvig, Stefano Nichele
Abstract In the brain, the structure of a network of neurons defines how these neurons implement the computations that underlie the mind and the behavior of animals and humans. Provided that we can describe the network of neurons as a graph, we can employ methods from graph theory to investigate its structure or use cellular automata to mathematically assess its function. Although software for the analysis of graphs and cellular automata is widely available, graph extraction from images of networks of brain cells remains difficult. Nervous tissue is heterogeneous, and differences in anatomy may reflect relevant differences in function. Here we introduce a deep learning based toolbox to extract graphs from images of brain tissue. This toolbox provides an easy-to-use framework that allows systems neuroscientists to generate graphs from images of brain tissue by combining methods from image processing, deep learning, and graph theory. The goals are to simplify the training and usage of deep learning methods for computer vision and to facilitate their integration into graph extraction pipelines. In this way, the toolbox provides an alternative to the laborious manual process of tracing, sorting and classifying. We expect to open these machine learning methods to a wider community of users beyond computer vision experts and to improve the time-efficiency of graph extraction from large brain image datasets, which may lead to further understanding of the human mind.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.01062v1
PDF https://arxiv.org/pdf/1907.01062v1.pdf
PWC https://paperswithcode.com/paper/deepteginn-deep-learning-based-tools-to
Repo https://github.com/gmorenomello/deepteginn
Framework none

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Title Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
Authors Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla
Abstract It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal difference learning in a tabular setting and prove its regret bound. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network, to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.
Tasks Efficient Exploration
Published 2019-12-23
URL https://arxiv.org/abs/1912.10577v2
PDF https://arxiv.org/pdf/1912.10577v2.pdf
PWC https://paperswithcode.com/paper/parameterized-indexed-value-function-for
Repo https://github.com/tiantan522/PINs
Framework pytorch
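
A hedged PyTorch sketch of the dual-network indexed value function: one network predicts the mean action values and a second network scales a sampled index z, so each draw of z yields a different plausible Q-function to act greedily under. Layer sizes and the way z enters the value are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class ParameterizedIndexedNetwork(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))
        self.mean_net = mlp()          # point estimate of Q(s, .)
        self.uncertainty_net = mlp()   # how much each estimate can plausibly vary

    def forward(self, obs, z):
        # obs: (batch, obs_dim); z: (batch, 1) index sampled e.g. from N(0, 1)
        return self.mean_net(obs) + z * self.uncertainty_net(obs)

# Exploration by index sampling (assumed usage): draw z once per episode,
# then act greedily with respect to the resulting indexed Q-function.
# z = torch.randn(1, 1); action = net(obs, z).argmax(dim=-1)
```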