February 1, 2020


Paper Group AWR 200

Habitat: A Platform for Embodied AI Research. Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification. Deep Learning Recommendation Model for Personalization and Recommendation Systems. Multi-Agent Image Classification via Reinforcement Learning. What Makes A Good Story? Designing Composite Rewards for Visu …

Habitat: A Platform for Embodied AI Research


Title	Habitat: A Platform for Embodied AI Research
Authors	Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra
Abstract	We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast – when rendering a scene from Matterport3D, it achieves several thousand frames per second (fps) running single-threaded, and can reach over 10,000 fps multi-process on a single GPU. (ii) Habitat-API: a modular high-level library for end-to-end development of embodied AI algorithms – defining tasks (e.g., navigation, instruction following, question answering), configuring, training, and benchmarking embodied agents. These large-scale engineering contributions enable us to answer scientific questions requiring experiments that were till now impracticable or ‘merely’ impractical. Specifically, in the context of point-goal navigation: (1) we revisit the comparison between learning and SLAM approaches from two recent works and find evidence for the opposite conclusion – that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations, and (2) we conduct the first cross-dataset generalization experiments {train, test} x {Matterport3D, Gibson} for multiple sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI.
Tasks	PointGoal Navigation, Question Answering, Robot Navigation
Published	2019-04-02
URL	https://arxiv.org/abs/1904.01201v2
PDF	https://arxiv.org/pdf/1904.01201v2.pdf
PWC	https://paperswithcode.com/paper/habitat-a-platform-for-embodied-ai-research
Repo	https://github.com/facebookresearch/habitat-sim
Framework	pytorch

Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification


Title	Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification
Authors	Seokeon Choi, Sumin Lee, Youngeun Kim, Taekyung Kim, Changick Kim
Abstract	Visible-infrared person re-identification (VI-ReID) is an important task in night-time surveillance applications, since it is difficult for visible cameras to capture valid appearance information under poor illumination conditions. Compared to traditional person re-identification that handles only the intra-modality discrepancy, VI-ReID suffers from additional cross-modality discrepancy caused by different types of imaging systems. To reduce both intra- and cross-modality discrepancies, we propose a Hierarchical Cross-Modality Disentanglement (Hi-CMD) method, which automatically disentangles ID-discriminative factors and ID-excluded factors from visible-thermal images. We only use ID-discriminative factors for robust cross-modality matching without ID-excluded factors such as pose or illumination. To implement our approach, we introduce an ID-preserving person image generation network and a hierarchical feature learning module. Our generation network learns the disentangled representation by generating a new cross-modality image with different poses and illuminations while preserving a person’s identity. At the same time, the feature learning module enables our model to explicitly extract the common ID-discriminative characteristic between visible-infrared images. Extensive experimental results demonstrate that our method outperforms the state-of-the-art methods on two VI-ReID datasets. The source code is available at: https://github.com/bismex/HiCMD.
Tasks	Image Generation, Person Re-Identification
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01230v3
PDF	https://arxiv.org/pdf/1912.01230v3.pdf
PWC	https://paperswithcode.com/paper/hi-cmd-hierarchical-cross-modality
Repo	https://github.com/bismex/HiCMD
Framework	pytorch

Deep Learning Recommendation Model for Personalization and Recommendation Systems


Title	Deep Learning Recommendation Model for Personalization and Recommendation Systems
Authors	Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy
Abstract	With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design.
Tasks	Recommendation Systems
Published	2019-05-31
URL	https://arxiv.org/abs/1906.00091v1
PDF	https://arxiv.org/pdf/1906.00091v1.pdf
PWC	https://paperswithcode.com/paper/190600091
Repo	https://github.com/facebookresearch/dlrm
Framework	pytorch

Multi-Agent Image Classification via Reinforcement Learning


Title	Multi-Agent Image Classification via Reinforcement Learning
Authors	Hossein K. Mousavi, Mohammadreza Nazari, Martin Takáč, Nader Motee
Abstract	We investigate a classification problem using multiple mobile agents capable of collecting (partial) pose-dependent observations of an unknown environment. The objective is to classify an image over a finite time horizon. We propose a network architecture on how agents should form a local belief, take local actions, and extract relevant features from their raw partial observations. Agents are allowed to exchange information with their neighboring agents to update their own beliefs. It is shown how reinforcement learning techniques can be utilized to achieve decentralized implementation of the classification problem by running a decentralized consensus protocol. Our experimental results on the MNIST handwritten digit dataset demonstrate the effectiveness of our proposed framework.
Tasks	Image Classification
Published	2019-05-13
URL	https://arxiv.org/abs/1905.04835v2
PDF	https://arxiv.org/pdf/1905.04835v2.pdf
PWC	https://paperswithcode.com/paper/multi-agent-image-classification-via
Repo	https://github.com/Ipsedo/MARLClassification
Framework	pytorch
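
The belief exchange between neighboring agents can be pictured with a generic averaging consensus. The sketch below is not the paper's learned protocol: it assumes a fixed line graph of three agents and a plain neighbor-averaging rule, and the names `consensus_step` and `run_consensus` are made up for illustration.

```python
def consensus_step(beliefs, neighbors):
    """One round: each agent averages its own belief with its neighbors' beliefs."""
    updated = []
    for i, own in enumerate(beliefs):
        group = [own] + [beliefs[j] for j in neighbors[i]]
        # Average class by class; a mean of distributions is still a distribution.
        updated.append([sum(vals) / len(group) for vals in zip(*group)])
    return updated


def run_consensus(beliefs, neighbors, rounds):
    """Repeat the averaging step a fixed number of times."""
    for _ in range(rounds):
        beliefs = consensus_step(beliefs, neighbors)
    return beliefs


# Assumed toy setup: agents on a line graph 0 - 1 - 2, two candidate classes.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
beliefs = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
final = run_consensus(beliefs, neighbors, rounds=50)
```

After enough rounds on a connected graph, all agents hold practically the same belief vector, even though each one only ever talked to its direct neighbors.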

What Makes A Good Story? Designing Composite Rewards for Visual Storytelling


Title	What Makes A Good Story? Designing Composite Rewards for Visual Storytelling
Authors	Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig
Abstract	Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr. In this paper, we re-examine this problem from a different angle, by looking deep into what defines a realistically-natural and topically-coherent story. To this end, we propose three assessment criteria: relevance, coherence and expressiveness, which we observe through empirical analysis could constitute a “high-quality” story to the human eye. Following this quality guideline, we propose a reinforcement learning framework, ReCo-RL, with reward functions designed to capture the essence of these quality criteria. Experiments on the Visual Storytelling Dataset (VIST) with both automatic and human evaluations demonstrate that our ReCo-RL model achieves better performance than state-of-the-art baselines on both traditional metrics and the proposed new criteria.
Tasks	Visual Storytelling
Published	2019-09-11
URL	https://arxiv.org/abs/1909.05316v2
PDF	https://arxiv.org/pdf/1909.05316v2.pdf
PWC	https://paperswithcode.com/paper/what-makes-a-good-story-designing-composite
Repo	https://github.com/JunjieHu/ReCo-RL
Framework	pytorch

Putting visual object recognition in context


Title	Putting visual object recognition in context
Authors	Mengmi Zhang, Claire Tseng, Gabriel Kreiman
Abstract	Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g., a cow in the ocean). To model the role of contextual information in visual recognition, we systematically investigated ten critical properties of where, when, and how context modulates recognition, including the amount of context, context and object resolution, geometrical structure of context, context congruence, and temporal dynamics of contextual modulation. The tasks involved recognizing a target object surrounded with context in a natural image. As an essential benchmark, we conducted a series of psychophysics experiments where we altered one aspect of context at a time, and quantified recognition accuracy. We propose a biologically-inspired context-aware object recognition model consisting of a two-stream architecture. The model processes visual information at the fovea and periphery in parallel, dynamically incorporates object and contextual information, and sequentially reasons about the class label for the target object. Across a wide range of behavioral tasks, the model approximates human level performance without retraining for each task, captures the dependence of context enhancement on image properties, and provides initial steps towards integrating scene and object information for visual recognition. All source code and data are publicly available: https://github.com/kreimanlab/Put-In-Context.
Tasks	Object Recognition
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07349v3
PDF	https://arxiv.org/pdf/1911.07349v3.pdf
PWC	https://paperswithcode.com/paper/putting-visual-object-recognition-in-context
Repo	https://github.com/kreimanlab/Put-In-Context
Framework	pytorch

Deep Learning for Classification and Severity Estimation of Coffee Leaf Biotic Stress


Title	Deep Learning for Classification and Severity Estimation of Coffee Leaf Biotic Stress
Authors	J. G. M. Esgario, R. A. Krohling, J. A. Ventura
Abstract	Biotic stress consists of damage to plants through other living organisms. Efficient control of biotic agents such as pests and pathogens (viruses, fungi, bacteria, etc.) is closely related to the concept of agricultural sustainability. Agricultural sustainability promotes the development of new technologies that allow the reduction of environmental impacts, greater accessibility to farmers and, consequently, increased productivity. The use of computer vision with deep learning methods allows the early and correct identification of the stress-causing agent. So, corrective measures can be applied as soon as possible to mitigate the problem. The objective of this work is to design an effective and practical system capable of identifying and estimating the stress severity caused by biotic agents on coffee leaves. The proposed approach consists of a multi-task system based on convolutional neural networks. In addition, we have explored the use of data augmentation techniques to make the system more robust and accurate. The experimental results obtained for classification as well as for severity estimation indicate that the proposed system might be a suitable tool to assist both experts and farmers in the identification and quantification of biotic stresses in coffee plantations.
Tasks	Data Augmentation
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11561v1
PDF	https://arxiv.org/pdf/1907.11561v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-classification-and-severity
Repo	https://github.com/esgario/lara2018
Framework	pytorch

Learning One-Shot Imitation from Humans without Humans


Title	Learning One-Shot Imitation from Humans without Humans
Authors	Alessandro Bonardi, Stephen James, Andrew J. Davison
Abstract	Humans can naturally learn to execute a new task by seeing it performed by other individuals once, and then reproduce it in a variety of configurations. Endowing robots with this ability of imitating humans from third person is a very immediate and natural way of teaching new tasks. Only recently, through meta-learning, there have been successful attempts at one-shot imitation learning from humans; however, these approaches require a lot of human resources to collect the data in the real world to train the robot. But is there a way to remove the need for real world human demonstrations during training? We show that with Task-Embedded Control Networks, we can infer control policies by embedding human demonstrations that can condition a control policy and achieve one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Upon evaluating our approach on pushing and placing tasks in both simulation and in the real world, we show that in comparison to a system that was trained on real-world data we are able to achieve similar results by utilising only simulation data.
Tasks	Imitation Learning, Meta-Learning
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01103v1
PDF	https://arxiv.org/pdf/1911.01103v1.pdf
PWC	https://paperswithcode.com/paper/learning-one-shot-imitation-from-humans
Repo	https://github.com/stepjam/PyRep
Framework	none

Learning Mixtures of Plackett-Luce Models from Structured Partial Orders


Title	Learning Mixtures of Plackett-Luce Models from Structured Partial Orders
Authors	Zhibing Zhao, Lirong Xia
Abstract	Mixtures of ranking models have been widely used for heterogeneous preferences. However, learning a mixture model is highly nontrivial, especially when the dataset consists of partial orders. In such cases, the parameter of the model may not be even identifiable. In this paper, we focus on three popular structures of partial orders: ranked top-$l_1$, $l_2$-way, and choice data over a subset of alternatives. We prove that when the dataset consists of combinations of ranked top-$l_1$ and $l_2$-way (or choice data over up to $l_2$ alternatives), mixture of $k$ Plackett-Luce models is not identifiable when $l_1+l_2\le 2k-1$ ($l_2$ is set to $1$ when there are no $l_2$-way orders). We also prove that under some combinations, including ranked top-$3$, ranked top-$2$ plus $2$-way, and choice data over up to $4$ alternatives, mixtures of two Plackett-Luce models are identifiable. Guided by our theoretical results, we propose efficient generalized method of moments (GMM) algorithms to learn mixtures of two Plackett-Luce models, which are proven consistent. Our experiments demonstrate the efficacy of our algorithms. Moreover, we show that when full rankings are available, learning from different marginal events (partial orders) provides tradeoffs between statistical efficiency and computational efficiency.
Tasks
Published	2019-10-25
URL	https://arxiv.org/abs/1910.11721v1
PDF	https://arxiv.org/pdf/1910.11721v1.pdf
PWC	https://paperswithcode.com/paper/learning-mixtures-of-plackett-luce-models-1
Repo	https://github.com/zhaozb08/MixPL-SPO
Framework	none
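
For readers unfamiliar with the model, the probability that a Plackett-Luce mixture assigns to a ranked top-$l$ observation can be sketched as follows. This is the standard Plackett-Luce formula, not the paper's GMM learning algorithm; the parameter values and the helper names `pl_top_prob` and `mixture_top_prob` are assumptions chosen for illustration.

```python
from itertools import permutations


def pl_top_prob(theta, top):
    """Probability that the items in `top` fill the first ranks, in that order."""
    remaining = sum(theta.values())
    prob = 1.0
    for item in top:
        # Each pick is proportional to the item's weight among items still unranked.
        prob *= theta[item] / remaining
        remaining -= theta[item]
    return prob


def mixture_top_prob(weights, thetas, top):
    """Mixture probability: weighted sum over the component Plackett-Luce models."""
    return sum(w * pl_top_prob(t, top) for w, t in zip(weights, thetas))


# Assumed toy mixture of k = 2 components over three alternatives.
thetas = [{"a": 0.6, "b": 0.3, "c": 0.1}, {"a": 0.1, "b": 0.2, "c": 0.7}]
weights = [0.5, 0.5]

top1_a = mixture_top_prob(weights, thetas, ("a",))
total = sum(mixture_top_prob(weights, thetas, r) for r in permutations("abc"))
```

Here `top1_a` is the chance that "a" is ranked first, and `total` sums the probabilities of all full rankings, which should come out as 1 up to rounding. Identifiability asks whether such observed probabilities pin down `weights` and `thetas` uniquely.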

Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings


Title	Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Authors	Gregor Wiedemann, Steffen Remus, Avi Chawla, Chris Biemann
Abstract	Contextualized word embeddings (CWE) such as provided by ELMo (Peters et al., 2018), Flair NLP (Akbik et al., 2018), or BERT (Devlin et al., 2019) are a major recent innovation in NLP. CWEs provide semantic vector representations of words depending on their respective context. Their advantage over static word embeddings has been shown for a number of tasks, such as text classification, sequence tagging, or machine translation. Since vectors of the same word type can vary depending on the respective context, they implicitly provide a model for word sense disambiguation (WSD). We introduce a simple but effective approach to WSD using a nearest neighbor classification on CWEs. We compare the performance of different CWE models for the task and can report improvements above the current state of the art for two standard WSD benchmark datasets. We further show that the pre-trained BERT model is able to place polysemic words into distinct ‘sense’ regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability.
Tasks	Word Sense Disambiguation
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10430v2
PDF	https://arxiv.org/pdf/1909.10430v2.pdf
PWC	https://paperswithcode.com/paper/190910430
Repo	https://github.com/uhh-lt/bert-sense
Framework	pytorch
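
The nearest neighbor idea can be sketched in a few lines. The vectors below are hand-made stand-ins for contextualized embeddings (real ones would come from a model such as BERT), and the function names, sense labels, and choice of cosine similarity are assumptions for illustration rather than the authors' implementation.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_sense(query, labeled, k=3):
    """Return the majority sense among the k labeled vectors most similar to `query`."""
    ranked = sorted(labeled, key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: (embedding of "bank" in some sentence, annotated sense).
labeled = [
    ([0.9, 0.1, 0.0], "bank/finance"),
    ([0.8, 0.2, 0.1], "bank/finance"),
    ([0.1, 0.9, 0.2], "bank/river"),
    ([0.0, 0.8, 0.3], "bank/river"),
]

predicted = knn_sense([0.85, 0.15, 0.05], labeled, k=3)
```

The query vector lies close to the two finance examples, so the vote returns "bank/finance". The approach works only to the extent that the embedding model places different senses in separate regions of the space.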

Attention Guided Low-light Image Enhancement with a Large Scale Low-light Simulation Dataset


Title	Attention Guided Low-light Image Enhancement with a Large Scale Low-light Simulation Dataset
Authors	Feifan Lv, Yu Li, Feng Lu
Abstract	Low-light image enhancement is challenging in that it needs to consider not only brightness recovery but also complex issues like color distortion and noise, which usually hide in the dark. Simply adjusting the brightness of a low-light image will inevitably amplify those artifacts. To address this difficult problem, this paper proposes a novel end-to-end attention-guided method based on a multi-branch convolutional neural network. To this end, we first construct a synthetic dataset with carefully designed low-light simulation strategies. The dataset is much larger and more diverse than existing ones. With the new dataset for training, our method learns two attention maps to guide the brightness enhancement and denoising tasks respectively. The first attention map distinguishes underexposed regions from well lit regions, and the second attention map distinguishes noises from real textures. With their guidance, the proposed multi-branch decomposition-and-fusion enhancement network works in an input adaptive way. Moreover, a reinforcement-net further enhances color and contrast of the output image. Extensive experiments on multiple datasets demonstrate that our method can produce high fidelity enhancement results for low-light images and outperforms the current state-of-the-art methods by a large margin both quantitatively and visually.
Tasks	Denoising, Image Enhancement, Low-Light Image Enhancement
Published	2019-08-02
URL	https://arxiv.org/abs/1908.00682v3
PDF	https://arxiv.org/pdf/1908.00682v3.pdf
PWC	https://paperswithcode.com/paper/attention-guided-low-light-image-enhancement
Repo	https://github.com/Lvfeifan/MBLLEN
Framework	tf
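
The claim that plain brightness adjustment amplifies noise is easy to check numerically. The sketch below assumes a flat dark patch with hand-picked additive noise values and a simple multiplicative gain; it illustrates the problem only and has nothing to do with the proposed network.

```python
import statistics


def apply_gain(pixels, gain):
    """Brighten by a constant factor, clipping to the valid range [0, 1]."""
    return [min(1.0, p * gain) for p in pixels]


# Assumed toy patch: true intensity 0.05 plus small sensor noise.
clean = 0.05
noise = [-0.01, 0.01, -0.005, 0.005, 0.0, 0.008, -0.008]
dark = [clean + n for n in noise]

bright = apply_gain(dark, gain=8.0)

noise_before = statistics.pstdev(dark)
noise_after = statistics.pstdev(bright)
```

Because no pixel is clipped here, `noise_after` is eight times `noise_before`: the gain that recovers brightness scales the noise by the same factor, which is why enhancement and denoising have to be handled together.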

Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction


Title	Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction
Authors	Haofu Liao, Wei-An Lin, Jianbo Yuan, S. Kevin Zhou, Jiebo Luo
Abstract	Current deep neural network based approaches to computed tomography (CT) metal artifact reduction (MAR) are supervised methods which rely heavily on synthesized data for training. However, as synthesized data may not perfectly simulate the underlying physical mechanisms of CT imaging, the supervised methods often generalize poorly to clinical applications. To address this problem, we propose, to the best of our knowledge, the first unsupervised learning approach to MAR. Specifically, we introduce a novel artifact disentanglement network that enables different forms of generations and regularizations between the artifact-affected and artifact-free image domains to support unsupervised learning. Extensive experiments show that our method significantly outperforms the existing unsupervised models for image-to-image translation problems, and achieves comparable performance to existing supervised models on a synthesized dataset. When applied to clinical datasets, our method achieves considerable improvements over the supervised models. The source code of this paper is publicly available at https://github.com/liaohaofu/adn.
Tasks	Computed Tomography (CT), Image-to-Image Translation, Metal Artifact Reduction
Published	2019-06-05
URL	https://arxiv.org/abs/1906.01806v5
PDF	https://arxiv.org/pdf/1906.01806v5.pdf
PWC	https://paperswithcode.com/paper/artifact-disentanglement-network-for
Repo	https://github.com/liaohaofu/adn
Framework	pytorch

Noise-Aware Unsupervised Deep Lidar-Stereo Fusion


Title	Noise-Aware Unsupervised Deep Lidar-Stereo Fusion
Authors	Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li
Abstract	In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need of ground truth depth maps. By introducing a novel “Feedback Loop” to connect the network input with output, LidarStereoNet could tackle both noisy Lidar points and misalignment between sensors that have been ignored in existing Lidar-stereo fusion studies. Besides, we propose to incorporate a piecewise planar model into network learning to further constrain depths to conform to the underlying 3D geometry. Extensive quantitative and qualitative evaluations on both real and synthetic datasets demonstrate the superiority of our method, which outperforms state-of-the-art stereo matching, depth completion and Lidar-Stereo fusion approaches significantly.
Tasks	Depth Completion, Stereo Matching, Stereo Matching Hand
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03868v1
PDF	http://arxiv.org/pdf/1904.03868v1.pdf
PWC	https://paperswithcode.com/paper/noise-aware-unsupervised-deep-lidar-stereo
Repo	https://github.com/AvrilCheng/LidarStereoNet
Framework	none

Can WiFi Estimate Person Pose?


Title	Can WiFi Estimate Person Pose?
Authors	Fei Wang, Stanislav Panev, Ziyi Dai, Jinsong Han, Dong Huang
Abstract	WiFi human sensing has achieved great progress in indoor localization, activity classification, etc. Retracing the development of this work, we have a natural question: can WiFi devices work like cameras for vision applications? In this paper, we try to answer this question by exploring the ability of WiFi to estimate single person pose. We use a 3-antenna WiFi sender and a 3-antenna receiver to generate WiFi data. Meanwhile, we use a synchronized camera to capture person videos for corresponding keypoint annotations. We further propose a fully convolutional network (FCN), termed WiSPPN, to estimate single person pose from the collected data and annotations. Evaluation on over 80k images (16 sites and 8 persons) answers this question positively. Code has been made publicly available at https://github.com/geekfeiw/WiSPPN.
Tasks	3D Human Pose Estimation, RF-based Pose Estimation
Published	2019-03-30
URL	http://arxiv.org/abs/1904.00277v2
PDF	http://arxiv.org/pdf/1904.00277v2.pdf
PWC	https://paperswithcode.com/paper/can-wifi-estimate-person-pose
Repo	https://github.com/geekfeiw/WiSPPN
Framework	pytorch

Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees


Title	Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
Authors	Alix Lhéritier, Frédéric Cazals
Abstract	We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features. The predictor is based on a feature space discretization induced by a full-fledged k-d tree with randomly picked directions and a recursive Bayesian distribution, which makes it possible to automatically learn the most relevant feature scales characterizing the conditional distribution. We prove its pointwise universality, i.e., it achieves a normalized log loss performance asymptotically as good as the true conditional entropy of the labels given the features. The time complexity to process the $n$-th sample point is $O(\log n)$ in probability with respect to the distribution generating the data points, whereas other exact nonparametric methods require processing all past observations. Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods.
Tasks
Published	2019-01-23
URL	https://arxiv.org/abs/1901.07662v4
PDF	https://arxiv.org/pdf/1901.07662v4.pdf
PWC	https://paperswithcode.com/paper/kd-switch-a-universal-online-predictor-with
Repo	https://github.com/alherit/kd-switch
Framework	pytorch
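
To make "normalized log loss as good as the entropy" concrete, the sketch below scores a very simple online predictor. It is a deliberately feature-free illustration: an add-one (Laplace) estimator on binary labels, not the k-d tree predictor from the paper, and the label sequence is an assumed deterministic pattern with a 25% rate of ones.

```python
import math


def online_log_loss(labels):
    """Normalized cumulative log loss (bits per label) of an add-one predictor."""
    counts = {0: 1, 1: 1}  # one pseudo-count per class
    total_bits = 0.0
    for y in labels:
        # Predict first, from past labels only, then pay log loss on the outcome.
        p = counts[y] / (counts[0] + counts[1])
        total_bits -= math.log2(p)
        counts[y] += 1
    return total_bits / len(labels)


def entropy_bits(p):
    """Entropy in bits of a binary label that equals 1 with probability p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


labels = [1, 0, 0, 0] * 1000
loss = online_log_loss(labels)
target = entropy_bits(0.25)
```

With 4000 labels the per-label loss sits only slightly above the entropy of a 25% binary source. Universality in the paper's sense is the analogous guarantee for the conditional entropy given continuous features.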