February 1, 2020

3130 words 15 mins read

Paper Group AWR 151

USIP: Unsupervised Stable Interest Point Detection from 3D Point Clouds. Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions. Graph-based Knowledge Distillation by Multi-head Attention Network. Harnessing Evolution of Multi-Turn Conversations for Effective Answer Retrieval. IFR-Net: Iterative Feature Refinement …

USIP: Unsupervised Stable Interest Point Detection from 3D Point Clouds

Title USIP: Unsupervised Stable Interest Point Detection from 3D Point Clouds
Authors Jiaxin Li, Gim Hee Lee
Abstract In this paper, we propose the USIP detector: an Unsupervised Stable Interest Point detector that can detect highly repeatable and accurately localized keypoints from 3D point clouds under arbitrary transformations without the need for any ground truth training data. Our USIP detector consists of a feature proposal network that learns stable keypoints from input 3D point clouds and their respective transformed pairs from randomly generated transformations. We provide a degeneracy analysis of our USIP detector and suggest solutions to prevent degeneracy. We encourage high repeatability and accurate localization of the keypoints with a probabilistic chamfer loss that minimizes the distances between the detected keypoints from the training point cloud pairs. Extensive experimental results of repeatability tests on several simulated and real-world 3D point cloud datasets from Lidar, RGB-D and CAD models show that our USIP detector significantly outperforms existing hand-crafted and deep learning-based 3D keypoint detectors. Our code is available at the project website: https://github.com/lijx10/USIP
Tasks Interest Point Detection
Published 2019-03-30
URL http://arxiv.org/abs/1904.00229v1
PDF http://arxiv.org/pdf/1904.00229v1.pdf
PWC https://paperswithcode.com/paper/usip-unsupervised-stable-interest-point
Repo https://github.com/lijx10/USIP
Framework pytorch
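The core training signal described in the abstract is easy to sketch. The snippet below is my own illustration, not the released USIP code: a symmetric chamfer loss between keypoints detected on a point cloud and on its transformed copy, with optional per-keypoint uncertainty weights (the `sigma` arguments, and the `d/sigma + log sigma` form, are assumptions standing in for the paper's probabilistic term).

```python
# Hedged sketch of a (probabilistic) chamfer loss between two keypoint sets.
import torch

def chamfer_keypoint_loss(kp_a, kp_b, sigma_a=None, sigma_b=None):
    """kp_a: (N, 3) keypoints from cloud A brought into B's frame,
       kp_b: (M, 3) keypoints detected on cloud B."""
    d = torch.cdist(kp_a, kp_b)            # (N, M) pairwise distances
    min_ab = d.min(dim=1).values            # nearest B keypoint for each A keypoint
    min_ba = d.min(dim=0).values            # nearest A keypoint for each B keypoint
    if sigma_a is not None:                  # hypothetical uncertainty weighting
        min_ab = min_ab / sigma_a + torch.log(sigma_a)
    if sigma_b is not None:
        min_ba = min_ba / sigma_b + torch.log(sigma_b)
    return min_ab.mean() + min_ba.mean()

# Toy usage with random keypoint sets.
loss = chamfer_keypoint_loss(torch.rand(64, 3), torch.rand(64, 3))
```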

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

Title Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions
Authors Zhenyi Wang, Ping Yu, Yang Zhao, Ruiyi Zhang, Yufan Zhou, Junsong Yuan, Changyou Chen
Abstract Human-motion generation is a long-standing challenging task due to the requirement of accurately modeling complex and diverse dynamic patterns. Most existing methods adopt sequence models such as RNN to directly model transitions in the original action space. Due to high dimensionality and potential noise, such modeling of action transitions is particularly challenging. In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality. Conditioned on a latent sequence, actions are generated by a frame-wise decoder shared by all latent action-poses. Specifically, an implicit RNN is defined to model smooth latent sequences, whose randomness (diversity) is controlled by noise from the input. Different from standard action-prediction methods, our model can generate action sequences from pure noise without any conditional action poses. Remarkably, it can also generate unseen actions from mixed classes during training. Our model is learned with a bi-directional generative-adversarial-net framework, which not only can generate diverse action sequences of a particular class or mix classes, but also learns to classify action sequences within the same model. Experimental results show the superiority of our method in both diverse action-sequence generation and classification, relative to existing methods.
Tasks
Published 2019-12-21
URL https://arxiv.org/abs/1912.10150v1
PDF https://arxiv.org/pdf/1912.10150v1.pdf
PWC https://paperswithcode.com/paper/learning-diverse-stochastic-human-action
Repo https://github.com/zheshiyige/Learning-Diverse-Stochastic-Human-Action-Generators-by-Learning-Smooth-Latent-Transitions
Framework tf
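As a rough illustration of the pipeline the abstract describes (not the authors' model, and with made-up dimensions such as a 75-dimensional pose vector): a noise-driven recurrent model produces a smooth latent sequence, and a frame-wise decoder shared across all time steps maps each latent to a skeleton pose.

```python
# Sketch: generate an action sequence from pure noise via a latent RNN
# followed by a shared frame-wise pose decoder.
import torch
import torch.nn as nn

latent_dim, pose_dim, seq_len = 32, 75, 30   # pose_dim ~ 25 joints x 3 coords (assumption)

latent_rnn = nn.GRU(input_size=latent_dim, hidden_size=latent_dim, batch_first=True)
frame_decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                              nn.Linear(128, pose_dim))

noise = torch.randn(1, seq_len, latent_dim)   # pure noise drives the generator
latents, _ = latent_rnn(noise)                # smooth latent action-pose sequence
poses = frame_decoder(latents)                # (1, seq_len, pose_dim) skeleton frames
print(poses.shape)
```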

Graph-based Knowledge Distillation by Multi-head Attention Network

Title Graph-based Knowledge Distillation by Multi-head Attention Network
Authors Seunghyun Lee, Byung Cheol Song
Abstract Knowledge distillation (KD) is a technique to derive optimal performance from a small student network (SN) by distilling knowledge of a large teacher network (TN) and transferring the distilled knowledge to the small SN. Since the role of a convolutional neural network (CNN) in KD is to embed a dataset so as to perform a given task well, it is very important to acquire knowledge that considers intra-data relations. Conventional KD methods have concentrated on distilling knowledge in data units. To our knowledge, no KD methods for distilling information in dataset units have yet been proposed. Therefore, this paper proposes a novel method that enables distillation of dataset-based knowledge from the TN using an attention network. The knowledge of the embedding procedure of the TN is distilled to a graph by multi-head attention (MHA), and multi-task learning is performed to give relational inductive bias to the SN. The MHA can provide clear information about the source dataset, which can greatly improve the performance of the SN. Experimental results show that the proposed method is 7.05% higher than the SN alone for CIFAR100, which is 2.46% higher than the state-of-the-art.
Tasks Multi-Task Learning, Transfer Learning
Published 2019-07-04
URL https://arxiv.org/abs/1907.02226v2
PDF https://arxiv.org/pdf/1907.02226v2.pdf
PWC https://paperswithcode.com/paper/graph-based-knowledge-distillation-by-multi
Repo https://github.com/sseung0703/Knowledge_distillation_via_TF2.0
Framework tf
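A rough sketch of the central mechanism as the abstract describes it: attention-derived relation graphs are built over teacher and student embeddings, and the student is trained so its graph mimics the teacher's. The layer dimensions, head count, and the KL matching loss below are my assumptions, not the authors' exact design.

```python
# Hedged sketch of graph-based KD via multi-head attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationHead(nn.Module):
    """Scores how each 'front' feature attends to each 'back' feature, per head."""
    def __init__(self, dim_front, dim_back, dim_att=64, heads=4):
        super().__init__()
        self.q = nn.Linear(dim_back, dim_att * heads)
        self.k = nn.Linear(dim_front, dim_att * heads)
        self.heads, self.dim_att = heads, dim_att

    def forward(self, feat_front, feat_back):
        n = feat_front.size(0)
        q = self.q(feat_back).view(n, self.heads, self.dim_att).transpose(0, 1)
        k = self.k(feat_front).view(n, self.heads, self.dim_att).transpose(0, 1)
        return q @ k.transpose(1, 2) / self.dim_att ** 0.5   # (heads, N, N) logits

teacher_head = RelationHead(dim_front=512, dim_back=512)     # dims are placeholders
student_head = RelationHead(dim_front=128, dim_back=128)

def graph_kd_loss(t_front, t_back, s_front, s_back):
    with torch.no_grad():
        g_teacher = F.softmax(teacher_head(t_front, t_back), dim=-1)
    g_student = F.log_softmax(student_head(s_front, s_back), dim=-1)
    # Train the student so its relation graph matches the teacher's.
    return F.kl_div(g_student, g_teacher, reduction="batchmean")

loss = graph_kd_loss(torch.randn(10, 512), torch.randn(10, 512),
                     torch.randn(10, 128), torch.randn(10, 128))
```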

Harnessing Evolution of Multi-Turn Conversations for Effective Answer Retrieval

Title Harnessing Evolution of Multi-Turn Conversations for Effective Answer Retrieval
Authors Mohammad Aliannejadi, Manajit Chakraborty, Esteban Andrés Ríssola, Fabio Crestani
Abstract With the improvements in speech recognition and voice generation technologies over recent years, many companies have sought to develop conversation understanding systems that run on mobile phones or smart home devices through natural language interfaces. Conversational assistants, such as Google Assistant and Microsoft Cortana, can help users to complete various types of tasks. This requires an accurate understanding of the user’s information need as the conversation evolves into multiple turns. Finding relevant context in a conversation’s history is challenging because of the complexity of natural language and the evolution of a user’s information need. In this work, we present an extensive analysis of the language, relevance, and dependency of user utterances in a multi-turn information-seeking conversation. To this aim, we have annotated relevant utterances in the conversations released by the TREC CAsT 2019 track. The annotation labels determine which of the previous utterances in a conversation can be used to improve the current one. Furthermore, we propose a neural utterance relevance model based on BERT fine-tuning, outperforming competitive baselines. We study and compare the performance of multiple retrieval models, utilizing different strategies to incorporate the user’s context. The experimental results on both classification and retrieval tasks show that our proposed approach can effectively identify and incorporate the conversation context. We show that processing the current utterance using the predicted relevant utterance leads to a 38% relative improvement in terms of nDCG@20. Finally, to foster research in this area, we have released the dataset of the annotations.
Tasks Speech Recognition
Published 2019-12-22
URL https://arxiv.org/abs/1912.10554v2
PDF https://arxiv.org/pdf/1912.10554v2.pdf
PWC https://paperswithcode.com/paper/harnessing-evolution-of-multi-turn
Repo https://github.com/aliannejadi/castur
Framework none
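An illustrative sketch of the utterance-relevance model in the spirit of the abstract, using the Hugging Face `transformers` sentence-pair API rather than the authors' released code or checkpoints; the example utterances are invented, and the model would need fine-tuning on the annotated data before its score means anything.

```python
# BERT sentence-pair classifier: is a previous utterance relevant to the current one?
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

previous = "Tell me about the effects of climate change on coral reefs."
current = "What about their impact on the Great Barrier Reef?"

# Encode the (previous utterance, current utterance) pair and score relevance.
inputs = tokenizer(previous, current, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits          # classification head is untrained here
relevance_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(previous utterance is relevant) = {relevance_prob:.3f}")
```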

IFR-Net: Iterative Feature Refinement Network for Compressed Sensing MRI

Title IFR-Net: Iterative Feature Refinement Network for Compressed Sensing MRI
Authors Yiling Liu, Qiegen Liu, Minghui Zhang, Qingxin Yang, Shanshan Wang, Dong Liang
Abstract To address the loss of fine structures in compressive sensing MRI (CS-MRI) under high acceleration factors, we previously proposed an iterative feature refinement model (IFR-CS), equipped with fixed transforms, to restore meaningful structures and details. Nevertheless, IFR-CS still has some limitations, such as the selection of hyper-parameters, a lengthy reconstruction time, and the fixed sparsifying transform. To alleviate these issues, we unroll the iterative feature refinement procedure in IFR-CS into a supervised model-driven network, dubbed IFR-Net. Equipped with training data pairs, both the regularization parameter and the feature refinement operator in IFR-CS become trainable. Additionally, inspired by the powerful representation capability of convolutional neural networks (CNNs), CNN-based inversion blocks are explored in the sparsity-promoting denoising module to generalize the sparsity-enforcing operator. Extensive experiments on both simulated and in vivo MR datasets show that the proposed network captures image details and preserves structural information well, with fast reconstruction speed.
Tasks Compressive Sensing, Denoising
Published 2019-09-24
URL https://arxiv.org/abs/1909.10856v2
PDF https://arxiv.org/pdf/1909.10856v2.pdf
PWC https://paperswithcode.com/paper/ifr-net-iterative-feature-refinement-network
Repo https://github.com/yqx7150/IFR-Net-Code
Framework none
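A highly simplified sketch of the unrolling idea, under my own assumptions (single-coil, real-valued image channel, a plain convolutional refinement step); the released IFR-Net differs in its feature-space refinement and inversion blocks. Each unrolled block alternates a learnable CNN refinement with a k-space data-consistency step, and the regularization weight is a trainable parameter as the abstract describes.

```python
# Hedged sketch of an unrolled iterative reconstruction network for CS-MRI.
import torch
import torch.nn as nn

class RefineBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )
        self.lam = nn.Parameter(torch.tensor(0.5))    # trainable regularization weight

    def forward(self, x, y, mask):
        # x: (B, 1, H, W) current estimate; y: (B, H, W) sampled k-space; mask: (B, H, W)
        x = x + self.cnn(x)                            # CNN-based refinement / denoising
        k = torch.fft.fft2(x.squeeze(1))
        k_dc = (k + self.lam * y) / (1 + self.lam)     # pull sampled entries toward y
        k = mask * k_dc + (1 - mask) * k               # mask: 1 where k-space was sampled
        return torch.fft.ifft2(k).real.unsqueeze(1)

class UnrolledIFR(nn.Module):
    def __init__(self, iterations=5):
        super().__init__()
        self.blocks = nn.ModuleList([RefineBlock() for _ in range(iterations)])

    def forward(self, zero_filled, y, mask):
        x = zero_filled
        for block in self.blocks:                      # fixed number of unrolled iterations
            x = block(x, y, mask)
        return x
```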

Answering Complex Open-domain Questions Through Iterative Query Generation

Title Answering Complex Open-domain Questions Through Iterative Query Generation
Authors Peng Qi, Xiaowen Lin, Leo Mehr, Zijian Wang, Christopher D. Manning
Abstract It is challenging for current one-step retrieve-and-read question answering (QA) systems to answer questions like “Which novel by the author of ‘Armada’ will be adapted as a feature film by Steven Spielberg?” because the question seldom contains retrievable clues about the missing entity (here, the author). Answering such a question requires multi-hop reasoning where one must gather information about the missing entity (or facts) to proceed with further reasoning. We present GoldEn (Gold Entity) Retriever, which iterates between reading context and retrieving more supporting documents to answer open-domain multi-hop questions. Instead of using opaque and computationally expensive neural retrieval models, GoldEn Retriever generates natural language search queries given the question and available context, and leverages off-the-shelf information retrieval systems to query for missing entities. This allows GoldEn Retriever to scale up efficiently for open-domain multi-hop reasoning while maintaining interpretability. We evaluate GoldEn Retriever on the recently proposed open-domain multi-hop QA dataset, HotpotQA, and demonstrate that it outperforms the best previously published model despite not using pretrained language models such as BERT.
Tasks Information Retrieval, Question Answering
Published 2019-10-15
URL https://arxiv.org/abs/1910.07000v1
PDF https://arxiv.org/pdf/1910.07000v1.pdf
PWC https://paperswithcode.com/paper/answering-complex-open-domain-questions
Repo https://github.com/qipeng/golden-retriever
Framework none
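The iterate-between-querying-and-reading loop is the part worth making concrete. Below is a conceptual sketch only: a keyword-overlap "search engine" and a trivial query generator stand in for the paper's learned query generator and the off-the-shelf IR system.

```python
# Toy sketch of the GoldEn Retriever retrieval loop.
def search(query, corpus, k=1):
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:k]

def generate_query(question, context):
    """Placeholder for the learned query generator: question plus latest context."""
    return question if not context else question + " " + context[-1]

def golden_retriever(question, corpus, hops=2):
    context = []
    for _ in range(hops):                      # one retrieval step per reasoning hop
        query = generate_query(question, context)
        context.extend(search(query, corpus))
    return context                              # documents handed to the reader model

corpus = [
    "Armada is a novel by Ernest Cline.",
    "Ernest Cline wrote Ready Player One, adapted as a film by Steven Spielberg.",
]
print(golden_retriever("Which novel by the author of Armada was adapted by Spielberg?", corpus))
```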

GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations

Title GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
Authors Martin Engelcke, Adam R. Kosiorek, Oiwi Parker Jones, Ingmar Posner
Abstract Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most state-of-the-art generative models do not explicitly capture the compositional nature of visual scenes. Two recent exceptions, MONet and IODINE, decompose scenes into objects in an unsupervised fashion. Their underlying generative processes, however, do not account for component interactions. Hence, neither of them allows for principled sampling of novel scenes. Here we present GENESIS, the first object-centric generative model of 3D visual scenes capable of both decomposing and generating scenes by capturing relationships between scene components. GENESIS parameterises a spatial GMM over images which is decoded from a set of object-centric latent variables that are either inferred sequentially in an amortised fashion or sampled from an autoregressive prior. We train GENESIS on several publicly available datasets and evaluate its performance on scene generation, decomposition, and semi-supervised learning.
Tasks Latent Variable Models, Scene Generation
Published 2019-07-30
URL https://arxiv.org/abs/1907.13052v3
PDF https://arxiv.org/pdf/1907.13052v3.pdf
PWC https://paperswithcode.com/paper/genesis-generative-scene-inference-and
Repo https://github.com/applied-ai-lab/genesis
Framework pytorch
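A sketch of the generative side as I read the abstract (my own simplification, not the GENESIS release): K object-centric latents are decoded into per-pixel RGB means and mask logits, and the image is explained as a pixel-wise mixture over the K components.

```python
# Hedged sketch: spatial Gaussian mixture decoded from object-centric latents.
import torch
import torch.nn as nn

class ComponentDecoder(nn.Module):
    def __init__(self, latent_dim=64, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(latent_dim, 4 * img_size * img_size)  # 3 RGB means + 1 mask logit

    def forward(self, z):                                # z: (B, K, latent_dim)
        b, k, _ = z.shape
        out = self.fc(z).view(b, k, 4, self.img_size, self.img_size)
        means, mask_logits = out[:, :, :3], out[:, :, 3]
        masks = torch.softmax(mask_logits, dim=1)        # mixing weights over the K components
        return means, masks

def mixture_log_likelihood(x, means, masks, sigma=0.1):
    # x: (B, 3, H, W); per-pixel Gaussian mixture (up to an additive constant).
    x = x.unsqueeze(1)                                   # (B, 1, 3, H, W) broadcasts over K
    log_px = (-0.5 * ((x - means) / sigma) ** 2).sum(dim=2)      # sum over RGB channels
    return torch.logsumexp(torch.log(masks + 1e-8) + log_px, dim=1).sum(dim=(1, 2))

dec = ComponentDecoder()
z = torch.randn(2, 5, 64)                                # 5 object latents per image
means, masks = dec(z)
ll = mixture_log_likelihood(torch.rand(2, 3, 32, 32), means, masks)
```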

Meta-Learning to Communicate: Fast End-to-End Training for Fading Channels

Title Meta-Learning to Communicate: Fast End-to-End Training for Fading Channels
Authors Sangwoo Park, Osvaldo Simeone, Joonhyuk Kang
Abstract When a channel model is available, learning how to communicate on fading noisy channels can be formulated as the (unsupervised) training of an autoencoder consisting of the cascade of encoder, channel, and decoder. An important limitation of the approach is that training should be generally carried out from scratch for each new channel. To cope with this problem, prior works considered joint training over multiple channels with the aim of finding a single pair of encoder and decoder that works well on a class of channels. As a result, joint training ideally mimics the operation of non-coherent transmission schemes. In this paper, we propose to obviate the limitations of joint training via meta-learning: Rather than training a common model for all channels, meta-learning finds a common initialization vector that enables fast training on any channel. The approach is validated via numerical results, demonstrating significant training speed-ups, with effective encoders and decoders obtained with as little as one iteration of Stochastic Gradient Descent.
Tasks Meta-Learning
Published 2019-10-22
URL https://arxiv.org/abs/1910.09945v1
PDF https://arxiv.org/pdf/1910.09945v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-to-communicate-fast-end-to-end
Repo https://github.com/kclip/meta-autoencoder
Framework pytorch
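A rough sketch of the idea under my own simplifications, not the released code: a tiny end-to-end autoencoder (encoder, flat-fading channel, decoder) is meta-trained over random channel realizations so that a few inner SGD steps adapt it to a new channel. The paper proposes MAML-type updates; the Reptile rule below is used only to keep the sketch short, and the message/block sizes are made up.

```python
# Hedged sketch: meta-learning an autoencoder initialization over fading channels.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

num_msgs, n = 16, 8                                   # messages and channel uses (assumptions)

class AEModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(num_msgs, 32), nn.ReLU(), nn.Linear(32, n))
        self.dec = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, num_msgs))

    def loss(self, fading, noise_std=0.1):
        msgs = torch.eye(num_msgs)                     # one-hot batch of all messages
        y = fading * self.enc(msgs) + noise_std * torch.randn(num_msgs, n)  # fading + AWGN
        return F.cross_entropy(self.dec(y), torch.arange(num_msgs))

meta_model = AEModel()
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

for meta_step in range(200):
    fading = torch.abs(torch.randn(1))                 # sample a new channel realization
    task_model = copy.deepcopy(meta_model)
    opt = torch.optim.SGD(task_model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):                        # fast adaptation on this channel
        opt.zero_grad()
        task_model.loss(fading).backward()
        opt.step()
    with torch.no_grad():                               # Reptile-style meta-update:
        for p, q in zip(meta_model.parameters(), task_model.parameters()):
            p += meta_lr * (q - p)                      # move initialization toward adapted weights
```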

ArcticNet: A Deep Learning Solution to Classify Arctic Wetlands

Title ArcticNet: A Deep Learning Solution to Classify Arctic Wetlands
Authors Ziyu Jiang, Kate Von Ness, Julie Loisel, Zhangyang Wang
Abstract Arctic environments are rapidly changing under the warming climate. Of particular interest are wetlands, a type of ecosystem that constitutes the most effective terrestrial long-term carbon store. As permafrost thaws, the carbon that was locked in these wetland soils for millennia becomes available for aerobic and anaerobic decomposition, which releases CO2 and CH4, respectively, back to the atmosphere. As CO2 and CH4 are potent greenhouse gases, this transfer of carbon from the land to the atmosphere further contributes to global warming, thereby increasing the rate of permafrost degradation in a positive feedback loop. Therefore, monitoring Arctic wetland health and dynamics is a key scientific task that is also of importance for policy. However, the identification and delineation of these important wetland ecosystems remain incomplete and often inaccurate. Mapping the extent of Arctic wetlands remains a challenge for the scientific community. Conventional, coarser remote sensing methods are inadequate at distinguishing the diverse and micro-topographically complex non-vascular vegetation that characterizes Arctic wetlands, presenting the need for better identification methods. To tackle this challenging problem, we constructed and annotated the first-of-its-kind Arctic Wetland Dataset (AWD). Based on that, we present ArcticNet, a deep neural network that exploits the multi-spectral, high-resolution imagery captured from nanosatellites (Planet Dove CubeSats), together with DEM data from the ArcticDEM project, to semantically label an Arctic study area into six types, three of which are Arctic wetland functional types. We present multi-fold efforts to handle the arising challenges, including class imbalance and the choice of fusion strategies. Preliminary results endorse the high promise of ArcticNet, achieving 93.12% in labelling a hold-out set of regions in our Arctic study area.
Tasks
Published 2019-06-01
URL https://arxiv.org/abs/1906.00133v1
PDF https://arxiv.org/pdf/1906.00133v1.pdf
PWC https://paperswithcode.com/paper/190600133
Repo https://github.com/geekJZY/arcticnet
Framework pytorch
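One of the fusion strategies mentioned in the abstract can be illustrated very simply. The sketch below is my own simplification (the band count, tile size, and backbone are placeholders, not ArcticNet's architecture): stack the multi-spectral CubeSat bands and the DEM as extra input channels of a segmentation network.

```python
# Hedged sketch: early fusion of multi-spectral imagery and DEM by channel concatenation.
import torch
import torch.nn as nn

num_bands, num_classes = 4, 6          # e.g. RGB+NIR bands plus one DEM channel (assumption)
backbone = nn.Sequential(              # stand-in for the real segmentation network
    nn.Conv2d(num_bands + 1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, num_classes, 1),
)

imagery = torch.randn(1, num_bands, 256, 256)        # multi-spectral tile
dem = torch.randn(1, 1, 256, 256)                    # elevation tile
logits = backbone(torch.cat([imagery, dem], dim=1))  # early fusion by channel concat
print(logits.shape)                                  # (1, 6, 256, 256) per-pixel class scores
```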

End-to-end Learning, with or without Labels

Title End-to-end Learning, with or without Labels
Authors Corinne Jones, Vincent Roulet, Zaid Harchaoui
Abstract We present an approach for end-to-end learning that allows one to jointly learn a feature representation from unlabeled data (with or without labeled data) and predict labels for unlabeled data. The feature representation is assumed to be specified in a differentiable programming framework, that is, as a parameterized mapping amenable to automatic differentiation. The proposed approach can be used with any amount of labeled and unlabeled data, gracefully adjusting to the amount of supervision. We provide experimental results illustrating the effectiveness of the approach.
Tasks
Published 2019-12-30
URL https://arxiv.org/abs/1912.12979v1
PDF https://arxiv.org/pdf/1912.12979v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-with-or-without-labels
Repo https://github.com/cjones6/xsdc
Framework pytorch
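To make the "any amount of labeled and unlabeled data" claim concrete, here is a generic sketch of a joint objective in that spirit; it is not the authors' formulation (their method learns the feature representation within a differentiable programming framework), just a supervised cross-entropy term on whichever examples carry labels plus an unsupervised entropy term on the rest, so one training loop handles any mix of the two.

```python
# Hedged, generic semi-supervised joint loss (stand-in, not the paper's objective).
import torch
import torch.nn.functional as F

def joint_loss(logits, labels, labeled_mask, alpha=0.1):
    # logits: (B, C); labels: (B,) with arbitrary values on unlabeled rows;
    # labeled_mask: (B,) bool marking which rows actually carry a label.
    zero = logits.sum() * 0.0                              # keeps the graph when a term is empty
    sup = F.cross_entropy(logits[labeled_mask], labels[labeled_mask]) if labeled_mask.any() else zero
    probs = F.softmax(logits[~labeled_mask], dim=-1)
    unsup = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean() if (~labeled_mask).any() else zero
    return sup + alpha * unsup

loss = joint_loss(torch.randn(8, 10), torch.randint(0, 10, (8,)), torch.rand(8) < 0.25)
```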

Linked Crunchbase: A Linked Data API and RDF Data Set About Innovative Companies

Title Linked Crunchbase: A Linked Data API and RDF Data Set About Innovative Companies
Authors Michael Färber
Abstract Crunchbase is an online platform collecting information about startups and technology companies, including attributes and relations of companies, people, and investments. Data contained in Crunchbase is, to a large extent, not available elsewhere, making Crunchbase a unique data source. In this paper, we present how to bring Crunchbase to the Web of Data so that its data can be used in the machine-readable RDF format by anyone on the Web. First, we give insights into how we developed and hosted a Linked Data API for Crunchbase and how sameAs links to other data sources are integrated. Then, we present our method for crawling RDF data based on this API to build a custom Crunchbase RDF knowledge graph. We created an RDF data set with over 347 million triples, including 781k people, 659k organizations, and 343k investments. Our Crunchbase Linked Data API is available online at http://linked-crunchbase.org.
Tasks
Published 2019-07-19
URL https://arxiv.org/abs/1907.08671v1
PDF https://arxiv.org/pdf/1907.08671v1.pdf
PWC https://paperswithcode.com/paper/linked-crunchbase-a-linked-data-api-and-rdf
Repo https://github.com/michaelfaerber/linked-crunchbase
Framework none
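A tiny illustration of what such crawled output looks like (not the authors' pipeline): building and serializing a few RDF triples about a company with rdflib. The vocabulary namespace and predicate names are made up for the example; only the base URI comes from the paper.

```python
# Hedged sketch: constructing a small RDF graph with rdflib and printing it as Turtle.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

CB = Namespace("http://linked-crunchbase.org/api/organizations/")
EX = Namespace("http://example.org/vocab#")            # hypothetical vocabulary

g = Graph()
org = CB["example-startup"]
g.add((org, RDF.type, EX.Organization))
g.add((org, RDFS.label, Literal("Example Startup Inc.")))
g.add((org, EX.foundedYear, Literal(2015)))

print(g.serialize(format="turtle"))                    # bytes in rdflib<6, str in rdflib>=6
```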

Onion-Peel Networks for Deep Video Completion

Title Onion-Peel Networks for Deep Video Completion
Authors Seoung Wug Oh, Sungho Lee, Joon-Young Lee, Seon Joo Kim
Abstract We propose the onion-peel networks for video completion. Given a set of reference images and a target image with holes, our network fills the hole by referring to the contents in the reference images. Our onion-peel network progressively fills the hole from the hole boundary, enabling it to exploit richer contextual information for the missing regions at every step. Given a sufficient number of recurrences, even a large hole can be inpainted successfully. To attend to the missing information visible in the reference images, we propose an asymmetric attention block that computes similarities between the hole boundary pixels in the target and the non-hole pixels in the references in a non-local manner. With our attention block, our network can have an unlimited spatial-temporal window size and fill the holes with globally coherent contents. In addition, our framework is applicable to image completion guided by reference images without any modification, which is difficult for previous methods. We validate that our method produces visually pleasing image and video inpainting results in realistic test cases.
Tasks Video Inpainting
Published 2019-08-23
URL https://arxiv.org/abs/1908.08718v1
PDF https://arxiv.org/pdf/1908.08718v1.pdf
PWC https://paperswithcode.com/paper/onion-peel-networks-for-deep-video-completion
Repo https://github.com/seoungwugoh/opn-demo
Framework pytorch
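A sketch of the asymmetric attention idea (simplified, not the released OPN code; feature dimensions and masks are made up): hole-boundary features from the target frame act as queries, while non-hole features gathered from all reference frames act as keys and values, in a non-local fashion.

```python
# Hedged sketch: asymmetric, non-local attention from hole-boundary pixels to
# visible reference pixels.
import torch
import torch.nn as nn

class AsymmetricAttention(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, target_feat, ref_feat, boundary_mask, valid_mask):
        # target_feat: (Nt, dim) target-frame features; boundary_mask: (Nt,) bool
        # ref_feat: (Nr, dim) features from all reference frames; valid_mask: (Nr,) bool
        q = self.q(target_feat[boundary_mask])          # queries from hole-boundary pixels
        k = self.k(ref_feat[valid_mask])                 # keys from visible reference pixels
        v = self.v(ref_feat[valid_mask])
        att = torch.softmax(q @ k.t() * self.scale, dim=-1)
        out = target_feat.clone()
        out[boundary_mask] = att @ v                     # fill boundary features from references
        return out

attn = AsymmetricAttention()
filled = attn(torch.randn(100, 64), torch.randn(400, 64),
              torch.rand(100) < 0.2, torch.rand(400) < 0.9)
```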

ASER: A Large-scale Eventuality Knowledge Graph

Title ASER: A Large-scale Eventuality Knowledge Graph
Authors Hongming Zhang, Xin Liu, Haojie Pan, Yangqiu Song, Cane Wing-Ki Leung
Abstract Understanding human language requires complex world knowledge. However, existing large-scale knowledge graphs mainly focus on knowledge about entities while ignoring knowledge about activities, states, or events, which are used to describe how entities or things act in the real world. To fill this gap, we develop ASER (activities, states, events, and their relations), a large-scale eventuality knowledge graph extracted from more than 11 billion tokens of unstructured textual data. ASER contains 15 relation types belonging to five categories, 194 million unique eventualities, and 64 million unique edges among them. Both intrinsic and extrinsic evaluations demonstrate the quality and effectiveness of ASER.
Tasks Knowledge Graphs
Published 2019-05-01
URL https://arxiv.org/abs/1905.00270v3
PDF https://arxiv.org/pdf/1905.00270v3.pdf
PWC https://paperswithcode.com/paper/aser-a-large-scale-eventuality-knowledge
Repo https://github.com/HKUST-KnowComp/ASER
Framework pytorch

Multi-scale Attributed Node Embedding

Title Multi-scale Attributed Node Embedding
Authors Benedek Rozemberczki, Carl Allen, Rik Sarkar
Abstract We present network embedding algorithms that capture information about a node from the local distribution over node attributes around it, as observed over random walks following an approach similar to Skip-gram. Observations from neighborhoods of different sizes are either pooled (AE) or encoded distinctly in a multi-scale approach (MUSAE). Capturing attribute-neighborhood relationships over multiple scales is useful for a diverse range of applications, including latent feature identification across disconnected networks with similar attributes. We prove theoretically that matrices of node-feature pointwise mutual information are implicitly factorized by the embeddings. Experiments show that our algorithms are robust, computationally efficient and outperform comparable models on social networks and web graphs.
Tasks Network Embedding
Published 2019-09-28
URL https://arxiv.org/abs/1909.13021v2
PDF https://arxiv.org/pdf/1909.13021v2.pdf
PWC https://paperswithcode.com/paper/multi-scale-attributed-node-embedding
Repo https://github.com/benedekrozemberczki/karateclub
Framework none
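The linked repository is the authors' karateclub package, which ships MUSAE as an estimator. The snippet below shows typical usage on a toy attributed graph; the API details (a NetworkX graph with nodes indexed 0..n-1 and a sparse attribute matrix passed to `fit`) are recalled from the library's documentation, so check the repo for the exact signature.

```python
# Hedged usage sketch of MUSAE via karateclub on a toy attributed graph.
import networkx as nx
import numpy as np
from scipy.sparse import coo_matrix
from karateclub import MUSAE

graph = nx.newman_watts_strogatz_graph(100, 10, 0.05)               # toy graph
features = coo_matrix(np.random.binomial(1, 0.1, size=(100, 50)))    # toy binary node attributes

model = MUSAE()
model.fit(graph, features)
embedding = model.get_embedding()
print(embedding.shape)      # one multi-scale embedding row per node
```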

Vision-and-Dialog Navigation

Title Vision-and-Dialog Navigation
Authors Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer
Abstract Robots navigating in human environments should use language to ask for assistance and be able to understand human responses. To study this challenge, we introduce Cooperative Vision-and-Dialog Navigation, a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments. The Navigator asks questions to their partner, the Oracle, who has privileged access to the best next steps the Navigator should take according to a shortest path planner. To train agents that search an environment for a goal location, we define the Navigation from Dialog History task. An agent, given a target object and a dialog history between humans cooperating to find that object, must infer navigation actions towards the goal in unexplored environments. We establish an initial, multi-modal sequence-to-sequence model and demonstrate that looking farther back in the dialog history improves performance. Source code and a live interface demo can be found at https://cvdn.dev/
Tasks
Published 2019-07-10
URL https://arxiv.org/abs/1907.04957v3
PDF https://arxiv.org/pdf/1907.04957v3.pdf
PWC https://paperswithcode.com/paper/vision-and-dialog-navigation
Repo https://github.com/mmurray/cvdn
Framework none
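A minimal sketch of the Navigation from Dialog History setup as described above (my simplification, not the released baseline; vocabulary size, hidden size, and action set are placeholders): the target object and dialog history are flattened into one token sequence, encoded with an RNN, and the encoding is used to score the next navigation action.

```python
# Hedged sketch: encode a tokenized dialog history and score the next navigation action.
import torch
import torch.nn as nn

vocab_size, hidden, num_actions = 1000, 256, 6      # e.g. forward/left/right/up/down/stop
embed = nn.Embedding(vocab_size, hidden)
encoder = nn.LSTM(hidden, hidden, batch_first=True)
action_head = nn.Linear(hidden, num_actions)

dialog_tokens = torch.randint(0, vocab_size, (1, 60))   # target + Q/A history, tokenized
_, (h, _) = encoder(embed(dialog_tokens))
next_action_logits = action_head(h[-1])                 # scores for the next action
print(next_action_logits.shape)                         # (1, 6)
```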