Paper Group AWR 200
Habitat: A Platform for Embodied AI Research
Title | Habitat: A Platform for Embodied AI Research |
Authors | Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra |
Abstract | We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast – when rendering a scene from Matterport3D, it achieves several thousand frames per second (fps) running single-threaded, and can reach over 10,000 fps multi-process on a single GPU. (ii) Habitat-API: a modular high-level library for end-to-end development of embodied AI algorithms – defining tasks (e.g., navigation, instruction following, question answering), configuring, training, and benchmarking embodied agents. These large-scale engineering contributions enable us to answer scientific questions requiring experiments that were till now impracticable or ‘merely’ impractical. Specifically, in the context of point-goal navigation: (1) we revisit the comparison between learning and SLAM approaches from two recent works and find evidence for the opposite conclusion – that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations, and (2) we conduct the first cross-dataset generalization experiments {train, test} x {Matterport3D, Gibson} for multiple sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI. |
Tasks | PointGoal Navigation, Question Answering, Robot Navigation |
Published | 2019-04-02 |
URL | https://arxiv.org/abs/1904.01201v2 |
https://arxiv.org/pdf/1904.01201v2.pdf | |
PWC | https://paperswithcode.com/paper/habitat-a-platform-for-embodied-ai-research |
Repo | https://github.com/facebookresearch/habitat-sim |
Framework | pytorch |
Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification
Title | Hi-CMD: Hierarchical Cross-Modality Disentanglement for Visible-Infrared Person Re-Identification |
Authors | Seokeon Choi, Sumin Lee, Youngeun Kim, Taekyung Kim, Changick Kim |
Abstract | Visible-infrared person re-identification (VI-ReID) is an important task in night-time surveillance applications, since visible cameras struggle to capture valid appearance information under poor illumination conditions. Compared to traditional person re-identification that handles only the intra-modality discrepancy, VI-ReID suffers from additional cross-modality discrepancy caused by different types of imaging systems. To reduce both intra- and cross-modality discrepancies, we propose a Hierarchical Cross-Modality Disentanglement (Hi-CMD) method, which automatically disentangles ID-discriminative factors and ID-excluded factors from visible-thermal images. We only use ID-discriminative factors for robust cross-modality matching without ID-excluded factors such as pose or illumination. To implement our approach, we introduce an ID-preserving person image generation network and a hierarchical feature learning module. Our generation network learns the disentangled representation by generating a new cross-modality image with different poses and illuminations while preserving a person’s identity. At the same time, the feature learning module enables our model to explicitly extract the common ID-discriminative characteristic between visible-infrared images. Extensive experimental results demonstrate that our method outperforms the state-of-the-art methods on two VI-ReID datasets. The source code is available at: https://github.com/bismex/HiCMD. |
Tasks | Image Generation, Person Re-Identification |
Published | 2019-12-03 |
URL | https://arxiv.org/abs/1912.01230v3 |
https://arxiv.org/pdf/1912.01230v3.pdf | |
PWC | https://paperswithcode.com/paper/hi-cmd-hierarchical-cross-modality |
Repo | https://github.com/bismex/HiCMD |
Framework | pytorch |
Deep Learning Recommendation Model for Personalization and Recommendation Systems
Title | Deep Learning Recommendation Model for Personalization and Recommendation Systems |
Authors | Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G. Azzolini, Dmytro Dzhulgakov, Andrey Mallevich, Ilia Cherniavskii, Yinghai Lu, Raghuraman Krishnamoorthi, Ansha Yu, Volodymyr Kondratenko, Stephanie Pereira, Xianjie Chen, Wenlin Chen, Vijay Rao, Bill Jia, Liang Xiong, Misha Smelyanskiy |
Abstract | With the advent of deep learning, neural network-based recommendation models have emerged as an important tool for tackling personalization and recommendation tasks. These networks differ significantly from other deep learning networks due to their need to handle categorical features and are not well studied or understood. In this paper, we develop a state-of-the-art deep learning recommendation model (DLRM) and provide its implementation in both PyTorch and Caffe2 frameworks. In addition, we design a specialized parallelization scheme utilizing model parallelism on the embedding tables to mitigate memory constraints while exploiting data parallelism to scale-out compute from the fully-connected layers. We compare DLRM against existing recommendation models and characterize its performance on the Big Basin AI platform, demonstrating its usefulness as a benchmark for future algorithmic experimentation and system co-design. |
Tasks | Recommendation Systems |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00091v1 |
https://arxiv.org/pdf/1906.00091v1.pdf | |
PWC | https://paperswithcode.com/paper/190600091 |
Repo | https://github.com/facebookresearch/dlrm |
Framework | pytorch |
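The parallelization scheme described in the DLRM abstract (model parallelism on the embedding tables, data parallelism on the fully-connected layers) can be illustrated with a single-process NumPy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the table sizes, the round-robin ownership rule, the single dense layer, and every variable name are hypothetical, and the "workers" are simulated with ordinary Python lists.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tables, rows, dim, batch, workers = 4, 10, 3, 8, 2

# Hypothetical toy data: one embedding table per categorical feature.
tables = [rng.normal(size=(rows, dim)) for _ in range(num_tables)]
indices = rng.integers(0, rows, size=(batch, num_tables))

# Model parallelism: each table lives on exactly one worker (round-robin here).
owner = {t: t % workers for t in range(num_tables)}

# Each worker looks up only the tables it owns, but for the whole batch.
local = [
    {t: tables[t][indices[:, t]] for t in range(num_tables) if owner[t] == w}
    for w in range(workers)
]

# Exchange step: regroup so every worker holds all features for its batch shard.
shards = np.array_split(np.arange(batch), workers)
per_worker = [
    np.concatenate([local[owner[t]][t][shard] for t in range(num_tables)], axis=1)
    for shard in shards
]

# Data parallelism: the same (replicated) dense weights run on every shard.
dense_weights = rng.normal(size=(num_tables * dim, 1))
outputs = np.concatenate([x @ dense_weights for x in per_worker], axis=0)
```

The point of the split is that no worker ever needs to hold every embedding table in memory, while the dense layer is small enough to replicate.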
Multi-Agent Image Classification via Reinforcement Learning
Title | Multi-Agent Image Classification via Reinforcement Learning |
Authors | Hossein K. Mousavi, Mohammadreza Nazari, Martin Takáč, Nader Motee |
Abstract | We investigate a classification problem using multiple mobile agents capable of collecting (partial) pose-dependent observations of an unknown environment. The objective is to classify an image over a finite time horizon. We propose a network architecture on how agents should form a local belief, take local actions, and extract relevant features from their raw partial observations. Agents are allowed to exchange information with their neighboring agents to update their own beliefs. It is shown how reinforcement learning techniques can be utilized to achieve decentralized implementation of the classification problem by running a decentralized consensus protocol. Our experimental results on the MNIST handwritten digit dataset demonstrate the effectiveness of our proposed framework. |
Tasks | Image Classification |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.04835v2 |
https://arxiv.org/pdf/1905.04835v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-agent-image-classification-via |
Repo | https://github.com/Ipsedo/MARLClassification |
Framework | pytorch |
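The abstract above mentions agents exchanging information with neighbours through a decentralized consensus protocol. As a rough illustration of that one idea only, the sketch below runs plain average consensus over class-belief vectors. It is not the paper's learned architecture: the ring communication graph, the equal-weight averaging rule, the round count, and the belief values are all assumptions made for the example.

```python
import numpy as np

def consensus_step(beliefs, neighbors):
    """One round: each agent averages its own belief with its neighbours' beliefs."""
    updated = np.empty_like(beliefs)
    for agent, nbrs in neighbors.items():
        group = [agent] + list(nbrs)
        updated[agent] = beliefs[group].mean(axis=0)
    return updated

def run_consensus(beliefs, neighbors, rounds):
    """Repeat the local averaging step a fixed number of times."""
    for _ in range(rounds):
        beliefs = consensus_step(beliefs, neighbors)
    return beliefs

# Hypothetical setup: 4 agents on a ring, each holding a belief over 3 classes.
neighbors = {0: (1, 3), 1: (0, 2), 2: (1, 3), 3: (2, 0)}
beliefs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
])
final = run_consensus(beliefs, neighbors, rounds=50)
decision = int(final[0].argmax())  # every agent ends up with the same answer
```

On this symmetric graph the local updates converge to the network-wide mean belief, so each agent reaches the same decision without a central coordinator.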
What Makes A Good Story? Designing Composite Rewards for Visual Storytelling
Title | What Makes A Good Story? Designing Composite Rewards for Visual Storytelling |
Authors | Junjie Hu, Yu Cheng, Zhe Gan, Jingjing Liu, Jianfeng Gao, Graham Neubig |
Abstract | Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr. In this paper, we re-examine this problem from a different angle, by looking deep into what defines a realistically-natural and topically-coherent story. To this end, we propose three assessment criteria: relevance, coherence and expressiveness, which we observe through empirical analysis could constitute a “high-quality” story to the human eye. Following this quality guideline, we propose a reinforcement learning framework, ReCo-RL, with reward functions designed to capture the essence of these quality criteria. Experiments on the Visual Storytelling Dataset (VIST) with both automatic and human evaluations demonstrate that our ReCo-RL model achieves better performance than state-of-the-art baselines on both traditional metrics and the proposed new criteria. |
Tasks | Visual Storytelling |
Published | 2019-09-11 |
URL | https://arxiv.org/abs/1909.05316v2 |
https://arxiv.org/pdf/1909.05316v2.pdf | |
PWC | https://paperswithcode.com/paper/what-makes-a-good-story-designing-composite |
Repo | https://github.com/JunjieHu/ReCo-RL |
Framework | pytorch |
Putting visual object recognition in context
Title | Putting visual object recognition in context |
Authors | Mengmi Zhang, Claire Tseng, Gabriel Kreiman |
Abstract | Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g., a cow in the ocean). To model the role of contextual information in visual recognition, we systematically investigated ten critical properties of where, when, and how context modulates recognition, including the amount of context, context and object resolution, geometrical structure of context, context congruence, and temporal dynamics of contextual modulation. The tasks involved recognizing a target object surrounded with context in a natural image. As an essential benchmark, we conducted a series of psychophysics experiments where we altered one aspect of context at a time, and quantified recognition accuracy. We propose a biologically-inspired context-aware object recognition model consisting of a two-stream architecture. The model processes visual information at the fovea and periphery in parallel, dynamically incorporates object and contextual information, and sequentially reasons about the class label for the target object. Across a wide range of behavioral tasks, the model approximates human level performance without retraining for each task, captures the dependence of context enhancement on image properties, and provides initial steps towards integrating scene and object information for visual recognition. All source code and data are publicly available: https://github.com/kreimanlab/Put-In-Context. |
Tasks | Object Recognition |
Published | 2019-11-17 |
URL | https://arxiv.org/abs/1911.07349v3 |
https://arxiv.org/pdf/1911.07349v3.pdf | |
PWC | https://paperswithcode.com/paper/putting-visual-object-recognition-in-context |
Repo | https://github.com/kreimanlab/Put-In-Context |
Framework | pytorch |
Deep Learning for Classification and Severity Estimation of Coffee Leaf Biotic Stress
Title | Deep Learning for Classification and Severity Estimation of Coffee Leaf Biotic Stress |
Authors | J. G. M. Esgario, R. A. Krohling, J. A. Ventura |
Abstract | Biotic stress consists of damage to plants through other living organisms. Efficient control of biotic agents such as pests and pathogens (viruses, fungi, bacteria, etc.) is closely related to the concept of agricultural sustainability. Agricultural sustainability promotes the development of new technologies that allow the reduction of environmental impacts, greater accessibility to farmers and, consequently, increased productivity. The use of computer vision with deep learning methods allows the early and correct identification of the stress-causing agent, so corrective measures can be applied as soon as possible to mitigate the problem. The objective of this work is to design an effective and practical system capable of identifying and estimating the stress severity caused by biotic agents on coffee leaves. The proposed approach consists of a multi-task system based on convolutional neural networks. In addition, we have explored the use of data augmentation techniques to make the system more robust and accurate. The experimental results obtained for classification as well as for severity estimation indicate that the proposed system might be a suitable tool to assist both experts and farmers in the identification and quantification of biotic stresses in coffee plantations. |
Tasks | Data Augmentation |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11561v1 |
https://arxiv.org/pdf/1907.11561v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-classification-and-severity |
Repo | https://github.com/esgario/lara2018 |
Framework | pytorch |
Learning One-Shot Imitation from Humans without Humans
Title | Learning One-Shot Imitation from Humans without Humans |
Authors | Alessandro Bonardi, Stephen James, Andrew J. Davison |
Abstract | Humans can naturally learn to execute a new task by seeing it performed by other individuals once, and then reproduce it in a variety of configurations. Endowing robots with this ability of imitating humans from third person is a very immediate and natural way of teaching new tasks. Only recently, through meta-learning, there have been successful attempts at one-shot imitation learning from humans; however, these approaches require a lot of human resources to collect the data in the real world to train the robot. But is there a way to remove the need for real world human demonstrations during training? We show that with Task-Embedded Control Networks, we can infer control policies by embedding human demonstrations that can condition a control policy and achieve one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Upon evaluating our approach on pushing and placing tasks in both simulation and in the real world, we show that in comparison to a system that was trained on real-world data we are able to achieve similar results by utilising only simulation data. |
Tasks | Imitation Learning, Meta-Learning |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01103v1 |
https://arxiv.org/pdf/1911.01103v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-one-shot-imitation-from-humans |
Repo | https://github.com/stepjam/PyRep |
Framework | none |
Learning Mixtures of Plackett-Luce Models from Structured Partial Orders
Title | Learning Mixtures of Plackett-Luce Models from Structured Partial Orders |
Authors | Zhibing Zhao, Lirong Xia |
Abstract | Mixtures of ranking models have been widely used for heterogeneous preferences. However, learning a mixture model is highly nontrivial, especially when the dataset consists of partial orders. In such cases, the parameter of the model may not be even identifiable. In this paper, we focus on three popular structures of partial orders: ranked top-$l_1$, $l_2$-way, and choice data over a subset of alternatives. We prove that when the dataset consists of combinations of ranked top-$l_1$ and $l_2$-way (or choice data over up to $l_2$ alternatives), mixture of $k$ Plackett-Luce models is not identifiable when $l_1+l_2\le 2k-1$ ($l_2$ is set to $1$ when there are no $l_2$-way orders). We also prove that under some combinations, including ranked top-$3$, ranked top-$2$ plus $2$-way, and choice data over up to $4$ alternatives, mixtures of two Plackett-Luce models are identifiable. Guided by our theoretical results, we propose efficient generalized method of moments (GMM) algorithms to learn mixtures of two Plackett-Luce models, which are proven consistent. Our experiments demonstrate the efficacy of our algorithms. Moreover, we show that when full rankings are available, learning from different marginal events (partial orders) provides tradeoffs between statistical efficiency and computational efficiency. |
Tasks | |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11721v1 |
https://arxiv.org/pdf/1910.11721v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-mixtures-of-plackett-luce-models-1 |
Repo | https://github.com/zhaozb08/MixPL-SPO |
Framework | none |
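For readers unfamiliar with the model in the abstract above: under a Plackett-Luce model with positive item weights, a ranking is built by repeatedly choosing the next item with probability proportional to its weight among the items not yet ranked, and a mixture averages several such models. The sketch below computes ranked top-l probabilities this way and encodes the non-identifiability condition quoted in the abstract. Function names and example weights are hypothetical, and this is not the authors' GMM algorithm.

```python
from itertools import permutations

def pl_top_prob(weights, top):
    """Probability that the items in `top` occupy the first positions, in that order."""
    remaining = sum(weights)
    prob = 1.0
    for item in top:
        prob *= weights[item] / remaining  # choose `item` among what is left
        remaining -= weights[item]
    return prob

def mixture_top_prob(alphas, components, top):
    """Ranked top-l probability under a mixture of Plackett-Luce models."""
    return sum(a * pl_top_prob(w, top) for a, w in zip(alphas, components))

def bound_implies_non_identifiable(l1, l2, k):
    """Condition stated in the abstract: l1 + l2 <= 2k - 1 (l2 = 1 if no l2-way orders).

    Returning False does not by itself prove identifiability.
    """
    return l1 + l2 <= 2 * k - 1

# Hypothetical example: 3 alternatives, mixture of k = 2 components.
components = [[1.0, 1.0, 2.0], [3.0, 1.0, 1.0]]
alphas = [0.5, 0.5]
p_item2_first = mixture_top_prob(alphas, components, (2,))
total = sum(mixture_top_prob(alphas, components, r) for r in permutations(range(3)))
```

With k = 2, ranked top-3 data gives l1 + l2 = 4, which falls outside the bound, consistent with the identifiable cases listed in the abstract.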
Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings
Title | Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings |
Authors | Gregor Wiedemann, Steffen Remus, Avi Chawla, Chris Biemann |
Abstract | Contextualized word embeddings (CWE) such as provided by ELMo (Peters et al., 2018), Flair NLP (Akbik et al., 2018), or BERT (Devlin et al., 2019) are a major recent innovation in NLP. CWEs provide semantic vector representations of words depending on their respective context. Their advantage over static word embeddings has been shown for a number of tasks, such as text classification, sequence tagging, or machine translation. Since vectors of the same word type can vary depending on the respective context, they implicitly provide a model for word sense disambiguation (WSD). We introduce a simple but effective approach to WSD using a nearest neighbor classification on CWEs. We compare the performance of different CWE models for the task and can report improvements above the current state of the art for two standard WSD benchmark datasets. We further show that the pre-trained BERT model is able to place polysemic words into distinct ‘sense’ regions of the embedding space, while ELMo and Flair NLP do not seem to possess this ability. |
Tasks | Word Sense Disambiguation |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10430v2 |
https://arxiv.org/pdf/1909.10430v2.pdf | |
PWC | https://paperswithcode.com/paper/190910430 |
Repo | https://github.com/uhh-lt/bert-sense |
Framework | pytorch |
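The approach in the abstract above, nearest-neighbour classification on contextualized word embeddings, can be sketched in a few lines. The 2-dimensional vectors below are made-up stand-ins for real BERT embeddings, and the use of cosine similarity with majority voting is an assumption for illustration; the abstract does not specify the distance measure or the value of k.

```python
import numpy as np
from collections import Counter

def knn_sense(query, train_vecs, train_senses, k=1):
    """Label `query` with the majority sense among its k most similar training vectors."""
    train = np.asarray(train_vecs, dtype=float)
    q = np.asarray(query, dtype=float)
    # Cosine similarity between the query and every labelled embedding.
    sims = train @ q / (np.linalg.norm(train, axis=1) * np.linalg.norm(q))
    nearest = np.argsort(-sims)[:k]
    votes = Counter(train_senses[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical sense-annotated occurrences of the word "bank".
train_vecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_senses = ["finance", "finance", "river", "river"]

sense_a = knn_sense([0.8, 0.3], train_vecs, train_senses, k=3)
sense_b = knn_sense([0.1, 0.9], train_vecs, train_senses, k=1)
```

Because the same word type receives different vectors in different contexts, no sense-specific classifier has to be trained; the labelled embeddings themselves act as the model.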
Attention Guided Low-light Image Enhancement with a Large Scale Low-light Simulation Dataset
Title | Attention Guided Low-light Image Enhancement with a Large Scale Low-light Simulation Dataset |
Authors | Feifan Lv, Yu Li, Feng Lu |
Abstract | Low-light image enhancement is challenging in that it needs to consider not only brightness recovery but also complex issues like color distortion and noise, which usually hide in the dark. Simply adjusting the brightness of a low-light image will inevitably amplify those artifacts. To address this difficult problem, this paper proposes a novel end-to-end attention-guided method based on a multi-branch convolutional neural network. To this end, we first construct a synthetic dataset with carefully designed low-light simulation strategies. The dataset is much larger and more diverse than existing ones. With the new dataset for training, our method learns two attention maps to guide the brightness enhancement and denoising tasks respectively. The first attention map distinguishes underexposed regions from well-lit regions, and the second attention map distinguishes noise from real textures. With their guidance, the proposed multi-branch decomposition-and-fusion enhancement network works in an input adaptive way. Moreover, a reinforcement-net further enhances color and contrast of the output image. Extensive experiments on multiple datasets demonstrate that our method can produce high fidelity enhancement results for low-light images and outperforms the current state-of-the-art methods by a large margin both quantitatively and visually. |
Tasks | Denoising, Image Enhancement, Low-Light Image Enhancement |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.00682v3 |
https://arxiv.org/pdf/1908.00682v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-guided-low-light-image-enhancement |
Repo | https://github.com/Lvfeifan/MBLLEN |
Framework | tf |
Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction
Title | Artifact Disentanglement Network for Unsupervised Metal Artifact Reduction |
Authors | Haofu Liao, Wei-An Lin, Jianbo Yuan, S. Kevin Zhou, Jiebo Luo |
Abstract | Current deep neural network based approaches to computed tomography (CT) metal artifact reduction (MAR) are supervised methods which rely heavily on synthesized data for training. However, as synthesized data may not perfectly simulate the underlying physical mechanisms of CT imaging, the supervised methods often generalize poorly to clinical applications. To address this problem, we propose, to the best of our knowledge, the first unsupervised learning approach to MAR. Specifically, we introduce a novel artifact disentanglement network that enables different forms of generations and regularizations between the artifact-affected and artifact-free image domains to support unsupervised learning. Extensive experiments show that our method significantly outperforms the existing unsupervised models for image-to-image translation problems, and achieves comparable performance to existing supervised models on a synthesized dataset. When applied to clinical datasets, our method achieves considerable improvements over the supervised models. The source code of this paper is publicly available at https://github.com/liaohaofu/adn. |
Tasks | Computed Tomography (CT), Image-to-Image Translation, Metal Artifact Reduction |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.01806v5 |
https://arxiv.org/pdf/1906.01806v5.pdf | |
PWC | https://paperswithcode.com/paper/artifact-disentanglement-network-for |
Repo | https://github.com/liaohaofu/adn |
Framework | pytorch |
Noise-Aware Unsupervised Deep Lidar-Stereo Fusion
Title | Noise-Aware Unsupervised Deep Lidar-Stereo Fusion |
Authors | Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li |
Abstract | In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need of ground truth depth maps. By introducing a novel “Feedback Loop” to connect the network input with output, LidarStereoNet could tackle both noisy Lidar points and misalignment between sensors that have been ignored in existing Lidar-stereo fusion studies. Besides, we propose to incorporate a piecewise planar model into network learning to further constrain depths to conform to the underlying 3D geometry. Extensive quantitative and qualitative evaluations on both real and synthetic datasets demonstrate the superiority of our method, which outperforms state-of-the-art stereo matching, depth completion and Lidar-Stereo fusion approaches significantly. |
Tasks | Depth Completion, Stereo Matching, Stereo Matching Hand |
Published | 2019-04-08 |
URL | http://arxiv.org/abs/1904.03868v1 |
http://arxiv.org/pdf/1904.03868v1.pdf | |
PWC | https://paperswithcode.com/paper/noise-aware-unsupervised-deep-lidar-stereo |
Repo | https://github.com/AvrilCheng/LidarStereoNet |
Framework | none |
Can WiFi Estimate Person Pose?
Title | Can WiFi Estimate Person Pose? |
Authors | Fei Wang, Stanislav Panev, Ziyi Dai, Jinsong Han, Dong Huang |
Abstract | WiFi human sensing has achieved great progress in indoor localization, activity classification, etc. Retracing the development of this work, we have a natural question: can WiFi devices work like cameras for vision applications? In this paper, we try to answer this question by exploring the ability of WiFi to estimate single-person pose. We use a 3-antenna WiFi sender and a 3-antenna receiver to generate WiFi data. Meanwhile, we use a synchronized camera to capture person videos for corresponding keypoint annotations. We further propose a fully convolutional network (FCN), termed WiSPPN, to estimate single-person pose from the collected data and annotations. Evaluation on over 80k images (16 sites and 8 persons) answers the aforesaid question in the affirmative. Code has been made publicly available at https://github.com/geekfeiw/WiSPPN. |
Tasks | 3D Human Pose Estimation, RF-based Pose Estimation |
Published | 2019-03-30 |
URL | http://arxiv.org/abs/1904.00277v2 |
http://arxiv.org/pdf/1904.00277v2.pdf | |
PWC | https://paperswithcode.com/paper/can-wifi-estimate-person-pose |
Repo | https://github.com/geekfeiw/WiSPPN |
Framework | pytorch |
Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees
Title | Low-Complexity Nonparametric Bayesian Online Prediction with Universal Guarantees |
Authors | Alix Lhéritier, Frédéric Cazals |
Abstract | We propose a novel nonparametric online predictor for discrete labels conditioned on multivariate continuous features. The predictor is based on a feature space discretization induced by a full-fledged k-d tree with randomly picked directions and a recursive Bayesian distribution, which allows the most relevant feature scales characterizing the conditional distribution to be learned automatically. We prove its pointwise universality, i.e., it achieves a normalized log loss performance asymptotically as good as the true conditional entropy of the labels given the features. The time complexity to process the $n$-th sample point is $O(\log n)$ in probability with respect to the distribution generating the data points, whereas other exact nonparametric methods require processing all past observations. Experiments on challenging datasets show the computational and statistical efficiency of our algorithm in comparison to standard and state-of-the-art methods. |
Tasks | |
Published | 2019-01-23 |
URL | https://arxiv.org/abs/1901.07662v4 |
https://arxiv.org/pdf/1901.07662v4.pdf | |
PWC | https://paperswithcode.com/paper/kd-switch-a-universal-online-predictor-with |
Repo | https://github.com/alherit/kd-switch |
Framework | pytorch |
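The guarantee in the abstract above is stated in terms of normalized log loss, i.e. the average negative log-probability an online predictor assigned to each label before seeing it. The sketch below shows how that quantity is computed. The predictor here is a simple add-one (Laplace) frequency estimator that ignores the features entirely; it is a hypothetical stand-in chosen for brevity, not the k-d tree predictor from the paper, and measuring the loss in bits (log base 2) is also an assumption.

```python
import math

def normalized_log_loss(probs_of_true_labels):
    """Average negative log2-probability assigned to the observed labels."""
    n = len(probs_of_true_labels)
    return -sum(math.log2(p) for p in probs_of_true_labels) / n

def laplace_online_probs(labels, num_classes):
    """Predict each label from the counts of earlier labels only, then update."""
    counts = [0] * num_classes
    probs = []
    for t, y in enumerate(labels):
        probs.append((counts[y] + 1) / (t + num_classes))  # prediction made before seeing y
        counts[y] += 1
    return probs

# Hypothetical label stream with two classes.
labels = [0, 0, 0, 0]
loss = normalized_log_loss(laplace_online_probs(labels, num_classes=2))
```

A predictor that is universal in the abstract's sense would drive this average down to the conditional entropy of the labels given the features as the stream grows.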