Paper Group ANR 595
PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds
Title | PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds |
Authors | Frederik Hagelskjær, Anders Glent Buch |
Abstract | We present a learning-based method for 6 DoF pose estimation of rigid objects in point cloud data. Many recent learning-based approaches use primarily RGB information for detecting objects, in some cases with an added refinement step using depth data. Our method consumes unordered point sets, with or without RGB information, from initial detection through the final transformation estimation stage. This allows us to achieve accurate pose estimates, in some cases surpassing state-of-the-art methods trained on the same data. |
Tasks | Object Detection, Pose Estimation |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09057v1 |
https://arxiv.org/pdf/1912.09057v1.pdf | |
PWC | https://paperswithcode.com/paper/pointposenet-accurate-object-detection-and-6 |
Repo | |
Framework | |
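The entry above gives no implementation detail, but the final transformation-estimation stage it mentions is, in most point-cloud pose pipelines, a least-squares rigid alignment. Below is a minimal sketch of the standard Kabsch solution for R and t from putative 3D-3D correspondences; it illustrates that generic final step, not the paper's specific network.

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares R, t such that dst ~= R @ src + t (Kabsch algorithm)."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a known pose from noiseless correspondences.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                         # keep det(R) = +1
t_true = np.array([0.1, -0.2, 0.3])
Q = P @ R_true.T + t_true
R_est, t_est = rigid_transform(P, Q)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)
```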
Composition of Sentence Embeddings: Lessons from Statistical Relational Learning
Title | Composition of Sentence Embeddings: Lessons from Statistical Relational Learning |
Authors | Damien Sileo, Tim Van-De-Cruys, Camille Pradel, Philippe Muller |
Abstract | Various NLP problems – such as the prediction of sentence similarity, entailment, and discourse relations – are all instances of the same general task: the modeling of semantic relations between a pair of textual elements. A popular model for such problems is to embed sentences into fixed size vectors, and use composition functions (e.g. concatenation or sum) of those vectors as features for the prediction. At the same time, composition of embeddings has been a main focus within the field of Statistical Relational Learning (SRL) whose goal is to predict relations between entities (typically from knowledge base triples). In this article, we show that previous work on relation prediction between texts implicitly uses compositions from baseline SRL models. We show that such compositions are not expressive enough for several tasks (e.g. natural language inference). We build on recent SRL models to address textual relational problems, showing that they are more expressive, and can alleviate issues from simpler compositions. The resulting models significantly improve the state of the art in both transferable sentence representation learning and relation prediction. |
Tasks | Natural Language Inference, Relational Reasoning, Representation Learning |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02464v1 |
http://arxiv.org/pdf/1904.02464v1.pdf | |
PWC | https://paperswithcode.com/paper/composition-of-sentence-embeddingslessons |
Repo | |
Framework | |
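For reference, the baseline composition functions the abstract criticises (concatenation, sum, and the elementwise interactions commonly added alongside them) take a couple of lines. The sketch below uses random stand-ins for sentence embeddings; the exact feature set is a common convention, not necessarily the paper's.

```python
import numpy as np

def compose(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Baseline pair-composition features: [u; v; u+v; u*v; |u-v|]."""
    return np.concatenate([u, v, u + v, u * v, np.abs(u - v)])

u, v = np.random.randn(300), np.random.randn(300)  # stand-ins for sentence embeddings
features = compose(u, v)                           # input to a relation classifier
print(features.shape)                              # (1500,)
```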
Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks
Title | Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks |
Authors | Andrey Kuzmin, Markus Nagel, Saurabh Pitre, Sandeep Pendyam, Tijmen Blankevoort, Max Welling |
Abstract | The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures. One effective way of making networks more efficient is neural network compression. We provide an overview of existing neural network compression methods that can make neural networks more efficient by changing the architecture of the network. First, we introduce a new way to categorize all published compression methods, based on the amount of data and compute needed to make the methods work in practice. This categorization yields three ‘levels’ of compression solutions. Second, we provide a taxonomy of tensor factorization-based and probabilistic compression methods. Finally, we perform an extensive evaluation of different compression techniques from the literature for models trained on ImageNet. We show that SVD and probabilistic compression or pruning methods are complementary and give the best results of all the considered methods. We also provide practical ways to combine them. |
Tasks | Neural Network Compression |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09802v1 |
https://arxiv.org/pdf/1912.09802v1.pdf | |
PWC | https://paperswithcode.com/paper/taxonomy-and-evaluation-of-structured |
Repo | |
Framework | |
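To make the SVD branch of the evaluation concrete, here is a minimal sketch of compressing a single dense weight matrix by truncated SVD, the generic technique rather than the paper's full pipeline; layer sizes and rank are illustrative.

```python
import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Factor W (out x in) into two thin matrices A @ B of the given rank."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # out x rank (singular values folded in)
    B = Vt[:rank]                # rank x in
    return A, B

W = np.random.randn(512, 1024)
A, B = svd_compress(W, rank=64)
print(W.size, A.size + B.size)   # 524288 -> 98304 parameters (~5.3x fewer)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```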
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
Title | LiveSketch: Query Perturbations for Guided Sketch-based Visual Search |
Authors | John Collomosse, Tu Bui, Hailin Jin |
Abstract | LiveSketch is a novel algorithm for searching large image collections using hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch search by creating visual suggestions that augment the query as it is drawn, making query specification an iterative rather than one-shot process that helps disambiguate users’ search intent. Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. We show improvements in accuracy and time-to-task over contemporary baselines using a 67M image corpus. |
Tasks | |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06611v1 |
http://arxiv.org/pdf/1904.06611v1.pdf | |
PWC | https://paperswithcode.com/paper/livesketch-query-perturbations-for-guided |
Repo | |
Framework | |
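The 'backpropagation from those targets to perturb the input' idea can be shown generically: treat the query as a trainable tensor and step it so its embedding moves toward a target embedding. Everything below is a stand-in (a toy linear embedder, random query and target); the paper's actual model is a triplet convnet with an RNN-based variational autoencoder over stroke sequences.

```python
import torch

embed = torch.nn.Sequential(                     # stand-in for the search embedder
    torch.nn.Linear(128, 64), torch.nn.Tanh(), torch.nn.Linear(64, 32))
query = torch.randn(1, 128, requires_grad=True)  # flattened sketch query
target = torch.randn(1, 32)                      # embedding of an inferred search intent

opt = torch.optim.SGD([query], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(embed(query), target)
    loss.backward()        # gradients flow back to the *input*, not the weights
    opt.step()             # nudge the query toward the likely target
```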
Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning
Title | Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning |
Authors | Xiaobai Ma, Katherine Driggs-Campbell, Mykel J. Kochenderfer |
Abstract | To improve efficiency and reduce failures in autonomous vehicles, research has focused on developing robust and safe learning methods that take into account disturbances in the environment. Existing literature in robust reinforcement learning poses the learning problem as a two player game between the autonomous system and disturbances. This paper examines two different algorithms to solve the game, Robust Adversarial Reinforcement Learning and Neural Fictitious Self Play, and compares performance on an autonomous driving scenario. We extend the game formulation to a semi-competitive setting and demonstrate that the resulting adversary better captures meaningful disturbances that lead to better overall performance. The resulting robust policy exhibits improved driving efficiency while effectively reducing collision rates compared to baseline control policies produced by traditional reinforcement learning methods. |
Tasks | Autonomous Driving, Autonomous Vehicles |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03642v1 |
http://arxiv.org/pdf/1903.03642v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-robustness-and-safety-for-autonomous |
Repo | |
Framework | |
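The semi-competitive formulation can be seen in miniature in a toy differentiable game: the adversary degrades the protagonist's reward but pays a penalty for large disturbances, which is what distinguishes semi-competitive from strictly zero-sum play. This hand-rolled example stands in for the driving scenario; it is not the paper's RARL or NFSP training procedure.

```python
import torch

x = torch.zeros(1, requires_grad=True)   # protagonist "policy" parameter
a = torch.zeros(1, requires_grad=True)   # adversary disturbance
opt_x = torch.optim.SGD([x], lr=0.05)
opt_a = torch.optim.SGD([a], lr=0.05)

def reward(x, a):
    return -(x - 1.0) ** 2 - x * a        # disturbance a degrades the reward

for _ in range(500):
    # Protagonist: maximise reward (minimise its negative).
    opt_x.zero_grad(); (-reward(x, a.detach())).backward(); opt_x.step()
    # Adversary: minimise the protagonist's reward, regularised by 0.5*a^2.
    opt_a.zero_grad(); (reward(x.detach(), a) + 0.5 * a ** 2).backward(); opt_a.step()

print(x.item(), a.item())                 # converges near x = a = 2/3 for this game
```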
Joint Embedding of 3D Scan and CAD Objects
Title | Joint Embedding of 3D Scan and CAD Objects |
Authors | Manuel Dahnert, Angela Dai, Leonidas Guibas, Matthias Nießner |
Abstract | 3D scan geometry and CAD models often contain complementary information towards understanding environments, which could be leveraged by establishing a mapping between the two domains. However, this is a challenging task due to strong, lower-level differences between scan and CAD geometry. We propose a novel approach to learn a joint embedding space between scan and CAD geometry, where semantically similar objects from both domains lie close together. To achieve this, we introduce a new 3D CNN-based approach to learn a joint embedding space representing object similarities across these domains. To learn a shared space where scan objects and CAD models can interlace, we propose a stacked hourglass approach to separate foreground and background from a scan object, and transform it to a complete, CAD-like representation to produce a shared embedding space. This embedding space can then be used for CAD model retrieval; to further enable this task, we introduce a new dataset of ranked scan-CAD similarity annotations, enabling new, fine-grained evaluation of CAD model retrieval from cluttered, noisy, partial scans. Our learned joint embedding outperforms the current state of the art for CAD model retrieval by 12% in instance retrieval accuracy. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06989v1 |
https://arxiv.org/pdf/1908.06989v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-embedding-of-3d-scan-and-cad-objects |
Repo | |
Framework | |
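Once such a joint space exists, CAD model retrieval is nearest-neighbour search in it. A minimal sketch, assuming precomputed embeddings (random stand-ins here) and cosine similarity:

```python
import numpy as np

def retrieve(scan_emb: np.ndarray, cad_embs: np.ndarray, k: int = 5):
    """Indices of the k CAD models closest to a scan object in the joint space."""
    scan = scan_emb / np.linalg.norm(scan_emb)
    cads = cad_embs / np.linalg.norm(cad_embs, axis=1, keepdims=True)
    return np.argsort(-(cads @ scan))[:k]    # highest cosine similarity first

cad_embs = np.random.randn(1000, 128)        # stand-in CAD embeddings
scan_emb = np.random.randn(128)              # stand-in scan-object embedding
print(retrieve(scan_emb, cad_embs))
```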
NAIRS: A Neural Attentive Interpretable Recommendation System
Title | NAIRS: A Neural Attentive Interpretable Recommendation System |
Authors | Shuai Yu, Yongbo Wang, Min Yang, Baocheng Li, Qiang Qu, Jialie Shen |
Abstract | In this paper, we develop a neural attentive interpretable recommendation system, named NAIRS. A self-attention network, as a key component of the system, is designed to assign attention weights to the interacted items of a user. This attention mechanism can distinguish the importance of the various interacted items in contributing to a user profile. Based on the user profiles obtained by the self-attention network, NAIRS offers personalized, high-quality recommendations. Moreover, it provides visual cues to interpret the recommendations. This demo application with the implementation of NAIRS enables users to interact with a recommendation system, and it persistently collects training data to improve the system. The demonstration and experimental results show the effectiveness of NAIRS. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07494v1 |
http://arxiv.org/pdf/1902.07494v1.pdf | |
PWC | https://paperswithcode.com/paper/nairs-a-neural-attentive-interpretable |
Repo | |
Framework | |
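The core mechanism, attention-weighted pooling of a user's interacted-item embeddings into a profile, fits in a few lines. The scoring function below (affinity to the mean item) is a placeholder for NAIRS's learned self-attention network:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def user_profile(items: np.ndarray) -> np.ndarray:
    """Attention-weighted pooling of interacted-item embeddings."""
    scores = items @ items.mean(0)   # placeholder score; NAIRS learns this
    weights = softmax(scores)        # importance of each interaction
    return weights @ items           # weighted sum -> user profile

items = np.random.randn(7, 64)       # 7 interacted items, 64-d embeddings
print(user_profile(items).shape)     # (64,)
```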
Query Auto Completion for Math Formula Search
Title | Query Auto Completion for Math Formula Search |
Authors | Shaurya Rohatgi, Wei Zhong, Richard Zanibbi, Jian Wu, C. Lee Giles |
Abstract | Query Auto Completion (QAC) is among the most appealing features of a web search engine. It helps users formulate queries quickly with less effort. Although there has been much effort in this area for text, to the best of our knowledge there is little work on mathematical formula auto completion. In this paper, we implement 5 existing QAC methods on mathematical formulas and evaluate them on the NTCIR-12 MathIR task dataset. We report the effectiveness of retrieved results using Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). Our study indicates that the Finite State Transducer outperforms the other QAC models with an MRR score of $0.642$. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04115v1 |
https://arxiv.org/pdf/1912.04115v1.pdf | |
PWC | https://paperswithcode.com/paper/query-auto-completion-for-math-formula-search |
Repo | |
Framework | |
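For reference, the MRR behind the reported 0.642 is just the mean, over queries, of the reciprocal rank of the first relevant completion:

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average 1/rank of the first relevant item per query (0 if none)."""
    total = 0.0
    for results, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(results, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two queries: first relevant hit at rank 1 and rank 2 -> (1 + 0.5) / 2.
print(mean_reciprocal_rank([["a", "b"], ["x", "y"]], [{"a"}, {"y"}]))  # 0.75
```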
Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling
Title | Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling |
Authors | Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou |
Abstract | For bidirectional joint image-text modeling, we develop variational hetero-encoder (VHE) randomized generative adversarial network (GAN), a versatile deep generative model that integrates a probabilistic text decoder, probabilistic image encoder, and GAN into a coherent end-to-end multi-modality learning framework. VHE randomized GAN (VHE-GAN) encodes an image to decode its associated text, and feeds the variational posterior as the source of randomness into the GAN image generator. We plug three off-the-shelf modules, including a deep topic model, a ladder-structured image encoder, and StackGAN++, into VHE-GAN, which already achieves competitive performance. This further motivates the development of VHE-raster-scan-GAN that generates photo-realistic images in not only a multi-scale low-to-high-resolution manner, but also a hierarchical-semantic coarse-to-fine fashion. By capturing and relating hierarchical semantic and visual concepts with end-to-end training, VHE-raster-scan-GAN achieves state-of-the-art performance in a wide variety of image-text multi-modality learning and generation tasks. |
Tasks | |
Published | 2019-05-18 |
URL | https://arxiv.org/abs/1905.08622v3 |
https://arxiv.org/pdf/1905.08622v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-hetero-encoder-randomized |
Repo | |
Framework | |
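The VHE-GAN wiring, an image encoder whose variational posterior both drives a text decoder and supplies the GAN generator's noise, can be sketched with placeholder linear modules. All sizes are hypothetical; the paper's actual components are a deep topic model, a ladder-structured encoder, and StackGAN++.

```python
import torch
import torch.nn as nn

class VHEGANSketch(nn.Module):
    def __init__(self, img_dim=256, z_dim=32, vocab=1000):
        super().__init__()
        self.enc = nn.Linear(img_dim, 2 * z_dim)   # image -> posterior params
        self.text_dec = nn.Linear(z_dim, vocab)    # z -> text logits
        self.gen = nn.Linear(z_dim, img_dim)       # z doubles as GAN noise

    def forward(self, img):
        mu, logvar = self.enc(img).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return self.text_dec(z), self.gen(z)       # decoded text, generated image

text_logits, fake_img = VHEGANSketch()(torch.randn(4, 256))
print(text_logits.shape, fake_img.shape)           # [4, 1000] [4, 256]
```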
EventKG - the Hub of Event Knowledge on the Web - and Biographical Timeline Generation
Title | EventKG - the Hub of Event Knowledge on the Web - and Biographical Timeline Generation |
Authors | Simon Gottschalk, Elena Demidova |
Abstract | One of the key requirements to facilitate the semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events, entities and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entity-centric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. In this article we address this limitation, formalise the concept of a temporal knowledge graph and present its instantiation, EventKG. EventKG is a multilingual event-centric temporal knowledge graph that incorporates over 690 thousand events and over 2.3 million temporal relations obtained from several large-scale knowledge graphs and semi-structured sources and makes them available through a canonical RDF representation. Whereas popular entities often possess hundreds of relations within a temporal knowledge graph such as EventKG, generating a concise overview of the most important temporal relations for a given entity is a challenging task. In this article we demonstrate an application of EventKG to biographical timeline generation, where we adopt a distant supervision method to identify the relations most relevant for an entity biography. Our evaluation results provide insights on the characteristics of EventKG and demonstrate the effectiveness of the proposed biographical timeline generation method. |
Tasks | Knowledge Graphs |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08794v1 |
https://arxiv.org/pdf/1905.08794v1.pdf | |
PWC | https://paperswithcode.com/paper/eventkg-the-hub-of-event-knowledge-on-the-web |
Repo | |
Framework | |
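A temporal knowledge graph extends plain triples with validity intervals, after which a biographical timeline is essentially a filter-and-sort over an entity's relations. The sketch below uses a simplified tuple representation and made-up data, not EventKG's actual RDF schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TemporalRelation:          # simplified stand-in for an EventKG relation
    subject: str
    predicate: str
    obj: str
    start: date
    end: date

relations = [
    TemporalRelation("Entity_A", "memberOf", "Org_X", date(1990, 1, 1), date(1995, 6, 30)),
    TemporalRelation("Entity_A", "wonAward", "Award_Y", date(1998, 3, 2), date(1998, 3, 2)),
]

# A naive biographical timeline: the entity's relations sorted by start date.
timeline = sorted((r for r in relations if r.subject == "Entity_A"), key=lambda r: r.start)
for r in timeline:
    print(r.start, r.predicate, r.obj)
```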
Local Orthogonal Decomposition for Maximum Inner Product Search
Title | Local Orthogonal Decomposition for Maximum Inner Product Search |
Authors | Xiang Wu, Ruiqi Guo, Sanjiv Kumar, David Simcha |
Abstract | Inverted file and asymmetric distance computation (IVFADC) have been successfully applied to approximate nearest neighbor search and subsequently to maximum inner product search. In such a framework, vector quantization is used for coarse partitioning while product quantization is used for quantizing residuals. In the original IVFADC as well as all of its variants, after residuals are computed, the second product quantization step is completely independent of the first vector quantization step. In this work, we seek to exploit the connection between these two steps when we perform non-exhaustive search. More specifically, we decompose a residual vector locally into two orthogonal components and apply uniform quantization and multiscale quantization to the two components, respectively. The proposed method, called local orthogonal decomposition, combined with multiscale quantization consistently achieves higher recall than previous methods under the same bitrates. We conduct comprehensive experiments on large scale datasets as well as detailed ablation tests, demonstrating the effectiveness of our method. |
Tasks | Quantization |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10391v1 |
http://arxiv.org/pdf/1903.10391v1.pdf | |
PWC | https://paperswithcode.com/paper/local-orthogonal-decomposition-for-maximum |
Repo | |
Framework | |
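The decomposition itself is elementary linear algebra: split the residual into a scalar component along a reference direction and a vector orthogonal to it, then quantize the two parts with different quantizers. A minimal sketch, assuming the coarse partition centroid serves as the reference direction (our assumption; the paper defines the direction locally):

```python
import numpy as np

def decompose(residual: np.ndarray, centre: np.ndarray):
    """Split a residual into parallel / orthogonal parts w.r.t. the centre direction."""
    c_hat = centre / np.linalg.norm(centre)
    scale = residual @ c_hat            # scalar projection (uniformly quantizable)
    orthogonal = residual - scale * c_hat
    return scale, orthogonal            # orthogonal part goes to the multiscale stage

rng = np.random.default_rng(0)
centre = rng.normal(size=64)            # coarse partition centroid
residual = 0.1 * rng.normal(size=64)
scale, orth = decompose(residual, centre)
assert abs(orth @ centre) / np.linalg.norm(centre) < 1e-10   # parts are orthogonal
```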
HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities
Title | HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities |
Authors | Devanshu Arya, Stevan Rudinac, Marcel Worring |
Abstract | Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between the items of different modalities and (b) intra-relations between the items of the same modality. Encoding multimedia items into a continuous low-dimensional semantic space such that both types of relations are captured and preserved is extremely challenging, especially if the goal is a unified end-to-end learning framework. The two key challenges that need to be addressed are: 1) the framework must be able to merge complex intra- and inter-relations without losing any valuable information and 2) the learning model should be invariant to the addition of new and potentially very different modalities. In this paper, we propose a flexible framework which can scale to data streams from many modalities. To that end, we introduce a hypergraph-based model for data representation and deploy Graph Convolutional Networks to fuse relational information within and across modalities. Our approach provides an efficient solution for distributing otherwise extremely computationally expensive or even infeasible training processes across multiple GPUs, without any sacrifice in accuracy. Moreover, adding new modalities to our model requires only an additional GPU, keeping the computational time unchanged, which brings representation learning to truly multimodal datasets. We demonstrate the feasibility of our approach in experiments on multimedia datasets featuring second, third and fourth order relations. |
Tasks | Representation Learning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09252v1 |
https://arxiv.org/pdf/1909.09252v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperlearn-a-distributed-approach-for |
Repo | |
Framework | |
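For intuition on fusing relations through a hypergraph, here is one propagation step following the standard HGNN rule X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta; the paper's exact operator, and its multi-GPU distribution scheme, may differ.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One HGNN-style step: relu(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta)."""
    Dv = np.diag(1.0 / np.sqrt(H.sum(1)))   # node degree normalisation
    De = np.diag(1.0 / H.sum(0))            # hyperedge degree normalisation
    return np.maximum(Dv @ H @ De @ H.T @ Dv @ X @ Theta, 0.0)

H = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)  # 4 nodes, 2 hyperedges
X = np.random.randn(4, 8)                                    # node features
Theta = np.random.randn(8, 16)                               # learnable weights
print(hypergraph_conv(X, H, Theta).shape)                    # (4, 16)
```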
Cyber-All-Intel: An AI for Security related Threat Intelligence
Title | Cyber-All-Intel: An AI for Security related Threat Intelligence |
Authors | Sudip Mittal, Anupam Joshi, Tim Finin |
Abstract | Keeping up with threat intelligence is a must for a security analyst today. There is a large volume of information present ‘in the wild’ that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources to keep the analyst updated about various threats that pose a risk to her organization. A security analyst who is better ‘tapped in’ can be more effective. In this paper we present Cyber-All-Intel, an artificial intelligence system to aid a security analyst. It is a system for knowledge extraction, representation and analytics in an end-to-end pipeline grounded in the cybersecurity informatics domain. It uses multiple knowledge representations, such as vector spaces and knowledge graphs, in a ‘VKG structure’ to store incoming intelligence. The system also uses neural network models to pro-actively improve its knowledge. We have also created a query engine and an alert system that can be used by an analyst to find actionable cybersecurity insights. |
Tasks | Knowledge Graphs |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02895v1 |
https://arxiv.org/pdf/1905.02895v1.pdf | |
PWC | https://paperswithcode.com/paper/cyber-all-intel-an-ai-for-security-related |
Repo | |
Framework | |
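The 'VKG structure' pairs each entity with both a vector and graph edges, so a query engine can answer by nearest-neighbour search, graph lookup, or both. A toy sketch with invented entities (not the system's actual schema or data):

```python
import numpy as np

embeddings = {                               # vector-space half of the VKG
    "malware_x": np.array([0.9, 0.1]),
    "malware_y": np.array([0.8, 0.2]),
    "patch_z": np.array([0.1, 0.9]),
}
triples = [                                  # knowledge-graph half
    ("malware_x", "exploits", "cve_0"),
    ("patch_z", "mitigates", "cve_0"),
]

def similar(entity: str, k: int = 1):
    """Nearest neighbours by cosine similarity in the vector space."""
    q = embeddings[entity]
    scored = [(e, float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q))))
              for e, v in embeddings.items() if e != entity]
    return sorted(scored, key=lambda t: -t[1])[:k]

def related(entity: str):
    """All stored triples touching the entity (graph lookup)."""
    return [t for t in triples if entity in (t[0], t[2])]

print(similar("malware_x"))   # vector answer: malware_y is most similar
print(related("malware_x"))   # graph answer: the CVE it exploits
```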
DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network
Title | DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network |
Authors | Wei Wang, Muhamad Risqi U. Saputra, Peijun Zhao, Pedro Gusmao, Bo Yang, Changhao Chen, Andrew Markham, Niki Trigoni |
Abstract | Odometry is of key importance for localization in the absence of a map. There is considerable work in the area of visual odometry (VO), and recent advances in deep learning have brought novel approaches to VO, which directly learn salient features from raw images. These learning-based approaches have led to more accurate and robust VO systems. However, they have not yet been well applied to point cloud data. In this work, we investigate how to exploit deep learning to estimate point cloud odometry (PCO), which may serve as a critical component in point cloud-based downstream tasks or learning-based systems. Specifically, we propose a novel end-to-end deep parallel neural network called DeepPCO, which can estimate the 6-DOF poses using consecutive point clouds. It consists of two parallel sub-networks to estimate 3-D translation and orientation respectively, rather than a single neural network. We validate our approach on the KITTI Visual Odometry/SLAM benchmark dataset against different baselines. Experiments demonstrate that the proposed approach achieves good performance in terms of pose accuracy. |
Tasks | Visual Odometry |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.11088v2 |
https://arxiv.org/pdf/1910.11088v2.pdf | |
PWC | https://paperswithcode.com/paper/deeppco-end-to-end-point-cloud-odometry |
Repo | |
Framework | |
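The parallel two-branch design, one sub-network regressing 3-D translation and the other 3-D orientation, is straightforward to sketch. Input and hidden sizes below are illustrative; DeepPCO's actual branches operate on projected point-cloud pairs.

```python
import torch
import torch.nn as nn

class DualBranchOdometry(nn.Module):
    """Two parallel sub-networks: one for translation, one for orientation."""
    def __init__(self, in_dim: int = 512):
        super().__init__()
        self.trans = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 3))
        self.orient = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, feats):
        return self.trans(feats), self.orient(feats)  # 3-D translation, 3-D rotation

t, r = DualBranchOdometry()(torch.randn(2, 512))
print(t.shape, r.shape)   # torch.Size([2, 3]) torch.Size([2, 3])
```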
Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis
Title | Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis |
Authors | Behzad Bozorgtabar, Mohammad Saeed Rad, Hazim Kemal Ekenel, Jean-Philippe Thiran |
Abstract | Synthesizing realistic faces across domains to train deep models has attracted increasing attention in facial expression analysis, as it helps improve expression recognition accuracy despite the small number of real training images available. However, learning from synthetic face images can be problematic due to the distribution discrepancy between low-quality synthetic images and real face images, and may not achieve the desired performance when the learned model is applied to real-world scenarios. To this end, we propose a new attribute-guided face image synthesis method that performs a translation between multiple image domains using a single model. In addition, we adopt the proposed model to learn from synthetic faces by matching the feature distributions between different domains while preserving each domain’s characteristics. We evaluate the effectiveness of the proposed approach on several face datasets for generating realistic face images. We demonstrate that expression recognition performance can be enhanced by our face synthesis model. Moreover, we also conduct experiments on a near-infrared dataset containing facial expression videos of drivers to assess the performance using in-the-wild data for driver emotion recognition. |
Tasks | Domain Adaptation, Emotion Recognition, Face Generation, Image Generation |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.08090v1 |
https://arxiv.org/pdf/1905.08090v1.pdf | |
PWC | https://paperswithcode.com/paper/using-photorealistic-face-synthesis-and |
Repo | |
Framework | |
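One common way to 'match the feature distributions between different domains' is a maximum mean discrepancy penalty on feature batches; the abstract does not say which matching objective the paper uses, so treat this as a generic sketch:

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Maximum mean discrepancy between two feature batches (RBF kernel)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

synthetic = torch.randn(64, 256)        # features of synthetic faces
real = torch.randn(64, 256) + 0.5       # features of real faces (shifted domain)
loss = rbf_mmd(synthetic, real)         # add to the task loss to align domains
print(loss.item())
```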