Paper Group ANR 595
PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds
Title | PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds |
Authors | Frederik Hagelskjær, Anders Glent Buch |
Abstract | We present a learning-based method for 6 DoF pose estimation of rigid objects in point cloud data. Many recent learning-based approaches use primarily RGB information for detecting objects, in some cases with an added refinement step using depth data. Our method consumes unordered point sets, with or without RGB information, from initial detection through the final transformation estimation stage. This allows us to achieve accurate pose estimates, in some cases surpassing state-of-the-art methods trained on the same data. |
Tasks | Object Detection, Pose Estimation |
Published | 2019-12-19 |
URL | https://arxiv.org/abs/1912.09057v1 |
https://arxiv.org/pdf/1912.09057v1.pdf | |
PWC | https://paperswithcode.com/paper/pointposenet-accurate-object-detection-and-6 |
Repo | |
Framework | |
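The entry above gives no implementation detail, but the final transformation-estimation stage it mentions is, in most point-cloud pose pipelines, a least-squares rigid alignment. Below is a minimal sketch of the standard Kabsch solution for R and t from putative 3D-3D correspondences; it illustrates that generic final step, not the paper's specific network.

```python
import numpy as np

def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares R, t such that dst ~= R @ src + t (Kabsch algorithm)."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: recover a known pose from noiseless correspondences.
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1                         # keep det(R) = +1
t_true = np.array([0.1, -0.2, 0.3])
Q = P @ R_true.T + t_true
R_est, t_est = rigid_transform(P, Q)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)
```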
Composition of Sentence Embeddings: Lessons from Statistical Relational Learning
Title | Composition of Sentence Embeddings: Lessons from Statistical Relational Learning |
Authors | Damien Sileo, Tim Van-De-Cruys, Camille Pradel, Philippe Muller |
Abstract | Various NLP problems – such as the prediction of sentence similarity, entailment, and discourse relations – are all instances of the same general task: the modeling of semantic relations between a pair of textual elements. A popular model for such problems is to embed sentences into fixed size vectors, and use composition functions (e.g. concatenation or sum) of those vectors as features for the prediction. At the same time, composition of embeddings has been a main focus within the field of Statistical Relational Learning (SRL) whose goal is to predict relations between entities (typically from knowledge base triples). In this article, we show that previous work on relation prediction between texts implicitly uses compositions from baseline SRL models. We show that such compositions are not expressive enough for several tasks (e.g. natural language inference). We build on recent SRL models to address textual relational problems, showing that they are more expressive, and can alleviate issues from simpler compositions. The resulting models significantly improve the state of the art in both transferable sentence representation learning and relation prediction. |
Tasks | Natural Language Inference, Relational Reasoning, Representation Learning |
Published | 2019-04-04 |
URL | http://arxiv.org/abs/1904.02464v1 |
http://arxiv.org/pdf/1904.02464v1.pdf | |
PWC | https://paperswithcode.com/paper/composition-of-sentence-embeddingslessons |
Repo | |
Framework | |
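For reference, the baseline composition functions the abstract criticises (concatenation, sum, and the elementwise interactions commonly added alongside them) take a couple of lines. The sketch below uses random stand-ins for sentence embeddings; the exact feature set is a common convention, not necessarily the paper's.

```python
import numpy as np

def compose(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Baseline pair-composition features: [u; v; u+v; u*v; |u-v|]."""
    return np.concatenate([u, v, u + v, u * v, np.abs(u - v)])

u, v = np.random.randn(300), np.random.randn(300)  # stand-ins for sentence embeddings
features = compose(u, v)                           # input to a relation classifier
print(features.shape)                              # (1500,)
```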
Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks
Title | Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks |
Authors | Andrey Kuzmin, Markus Nagel, Saurabh Pitre, Sandeep Pendyam, Tijmen Blankevoort, Max Welling |
Abstract | The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures. One effective way of making networks more efficient is neural network compression. We provide an overview of existing neural network compression methods that can make neural networks more efficient by changing the architecture of the network. First, we introduce a new way to categorize all published compression methods, based on the amount of data and compute needed to make the methods work in practice. This categorization yields three ‘levels’ of compression solutions. Second, we provide a taxonomy of tensor factorization-based and probabilistic compression methods. Finally, we perform an extensive evaluation of different compression techniques from the literature for models trained on ImageNet. We show that SVD and probabilistic compression or pruning methods are complementary and give the best results of all the considered methods. We also provide practical ways to combine them. |
Tasks | Neural Network Compression |
Published | 2019-12-20 |
URL | https://arxiv.org/abs/1912.09802v1 |
https://arxiv.org/pdf/1912.09802v1.pdf | |
PWC | https://paperswithcode.com/paper/taxonomy-and-evaluation-of-structured |
Repo | |
Framework | |
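To make the SVD branch of the evaluation concrete, here is a minimal sketch of compressing a single dense weight matrix by truncated SVD, the generic technique rather than the paper's full pipeline; layer sizes and rank are illustrative.

```python
import numpy as np

def svd_compress(W: np.ndarray, rank: int):
    """Factor W (out x in) into two thin matrices A @ B of the given rank."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # out x rank (singular values folded in)
    B = Vt[:rank]                # rank x in
    return A, B

W = np.random.randn(512, 1024)
A, B = svd_compress(W, rank=64)
print(W.size, A.size + B.size)   # 524288 -> 98304 parameters (~5.3x fewer)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```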
LiveSketch: Query Perturbations for Guided Sketch-based Visual Search
Title | LiveSketch: Query Perturbations for Guided Sketch-based Visual Search |
Authors | John Collomosse, Tu Bui, Hailin Jin |
Abstract | LiveSketch is a novel algorithm for searching large image collections using hand-sketched queries. LiveSketch tackles the inherent ambiguity of sketch search by creating visual suggestions that augment the query as it is drawn, making query specification an iterative rather than one-shot process that helps disambiguate users’ search intent. Our technical contributions are: a triplet convnet architecture that incorporates an RNN based variational autoencoder to search for images using vector (stroke-based) queries; real-time clustering to identify likely search intents (and so, targets within the search embedding); and the use of backpropagation from those targets to perturb the input stroke sequence, so suggesting alterations to the query in order to guide the search. We show improvements in accuracy and time-to-task over contemporary baselines using a 67M image corpus. |
Tasks | |
Published | 2019-04-14 |
URL | http://arxiv.org/abs/1904.06611v1 |
http://arxiv.org/pdf/1904.06611v1.pdf | |
PWC | https://paperswithcode.com/paper/livesketch-query-perturbations-for-guided |
Repo | |
Framework | |
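The 'backpropagation from those targets to perturb the input' idea can be shown generically: treat the query as a trainable tensor and step it so its embedding moves toward a target embedding. Everything below is a stand-in (a toy linear embedder, random query and target); the paper's actual model is a triplet convnet with an RNN-based variational autoencoder over stroke sequences.

```python
import torch

embed = torch.nn.Sequential(                     # stand-in for the search embedder
    torch.nn.Linear(128, 64), torch.nn.Tanh(), torch.nn.Linear(64, 32))
query = torch.randn(1, 128, requires_grad=True)  # flattened sketch query
target = torch.randn(1, 32)                      # embedding of an inferred search intent

opt = torch.optim.SGD([query], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(embed(query), target)
    loss.backward()        # gradients flow back to the *input*, not the weights
    opt.step()             # nudge the query toward the likely target
```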
Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning
Title | Improved Robustness and Safety for Autonomous Vehicle Control with Adversarial Reinforcement Learning |
Authors | Xiaobai Ma, Katherine Driggs-Campbell, Mykel J. Kochenderfer |
Abstract | To improve efficiency and reduce failures in autonomous vehicles, research has focused on developing robust and safe learning methods that take into account disturbances in the environment. Existing literature in robust reinforcement learning poses the learning problem as a two player game between the autonomous system and disturbances. This paper examines two different algorithms to solve the game, Robust Adversarial Reinforcement Learning and Neural Fictitious Self Play, and compares performance on an autonomous driving scenario. We extend the game formulation to a semi-competitive setting and demonstrate that the resulting adversary better captures meaningful disturbances that lead to better overall performance. The resulting robust policy exhibits improved driving efficiency while effectively reducing collision rates compared to baseline control policies produced by traditional reinforcement learning methods. |
Tasks | Autonomous Driving, Autonomous Vehicles |
Published | 2019-03-08 |
URL | http://arxiv.org/abs/1903.03642v1 |
http://arxiv.org/pdf/1903.03642v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-robustness-and-safety-for-autonomous |
Repo | |
Framework | |
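The semi-competitive formulation can be seen in miniature in a toy differentiable game: the adversary degrades the protagonist's reward but pays a penalty for large disturbances, which is what distinguishes semi-competitive from strictly zero-sum play. This hand-rolled example stands in for the driving scenario; it is not the paper's RARL or NFSP training procedure.

```python
import torch

x = torch.zeros(1, requires_grad=True)   # protagonist "policy" parameter
a = torch.zeros(1, requires_grad=True)   # adversary disturbance
opt_x = torch.optim.SGD([x], lr=0.05)
opt_a = torch.optim.SGD([a], lr=0.05)

def reward(x, a):
    return -(x - 1.0) ** 2 - x * a        # disturbance a degrades the reward

for _ in range(500):
    # Protagonist: maximise reward (minimise its negative).
    opt_x.zero_grad(); (-reward(x, a.detach())).backward(); opt_x.step()
    # Adversary: minimise the protagonist's reward, regularised by 0.5*a^2.
    opt_a.zero_grad(); (reward(x.detach(), a) + 0.5 * a ** 2).backward(); opt_a.step()

print(x.item(), a.item())                 # converges near x = a = 2/3 for this game
```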
Joint Embedding of 3D Scan and CAD Objects
Title | Joint Embedding of 3D Scan and CAD Objects |
Authors | Manuel Dahnert, Angela Dai, Leonidas Guibas, Matthias Nießner |
Abstract | 3D scan geometry and CAD models often contain complementary information towards understanding environments, which could be leveraged by establishing a mapping between the two domains. However, this is a challenging task due to strong, lower-level differences between scan and CAD geometry. We propose a novel approach to learn a joint embedding space between scan and CAD geometry, where semantically similar objects from both domains lie close together. To achieve this, we introduce a new 3D CNN-based approach to learn a joint embedding space representing object similarities across these domains. To learn a shared space where scan objects and CAD models can interlace, we propose a stacked hourglass approach to separate foreground and background from a scan object, and transform it to a complete, CAD-like representation to produce a shared embedding space. This embedding space can then be used for CAD model retrieval; to further enable this task, we introduce a new dataset of ranked scan-CAD similarity annotations, enabling new, fine-grained evaluation of CAD model retrieval from cluttered, noisy, partial scans. Our learned joint embedding outperforms the current state of the art for CAD model retrieval by 12% in instance retrieval accuracy. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06989v1 |
https://arxiv.org/pdf/1908.06989v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-embedding-of-3d-scan-and-cad-objects |
Repo | |
Framework | |
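Once such a joint space exists, CAD model retrieval is nearest-neighbour search in it. A minimal sketch, assuming precomputed embeddings (random stand-ins here) and cosine similarity:

```python
import numpy as np

def retrieve(scan_emb: np.ndarray, cad_embs: np.ndarray, k: int = 5):
    """Indices of the k CAD models closest to a scan object in the joint space."""
    scan = scan_emb / np.linalg.norm(scan_emb)
    cads = cad_embs / np.linalg.norm(cad_embs, axis=1, keepdims=True)
    return np.argsort(-(cads @ scan))[:k]    # highest cosine similarity first

cad_embs = np.random.randn(1000, 128)        # stand-in CAD embeddings
scan_emb = np.random.randn(128)              # stand-in scan-object embedding
print(retrieve(scan_emb, cad_embs))
```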
NAIRS: A Neural Attentive Interpretable Recommendation System
Title | NAIRS: A Neural Attentive Interpretable Recommendation System |
Authors | Shuai Yu, Yongbo Wang, Min Yang, Baocheng Li, Qiang Qu, Jialie Shen |
Abstract | In this paper, we develop a neural attentive interpretable recommendation system, named NAIRS. A self-attention network, as a key component of the system, is designed to assign attention weights to the interacted items of a user. This attention mechanism can distinguish the importance of the various interacted items in contributing to a user profile. Based on the user profiles obtained by the self-attention network, NAIRS offers personalized, high-quality recommendations. Moreover, it provides visual cues to interpret the recommendations. This demo application with the implementation of NAIRS enables users to interact with a recommendation system, and it persistently collects training data to improve the system. The demonstration and experimental results show the effectiveness of NAIRS. |
Tasks | |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07494v1 |
http://arxiv.org/pdf/1902.07494v1.pdf | |
PWC | https://paperswithcode.com/paper/nairs-a-neural-attentive-interpretable |
Repo | |
Framework | |
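The core mechanism, attention-weighted pooling of a user's interacted-item embeddings into a profile, fits in a few lines. The scoring function below (affinity to the mean item) is a placeholder for NAIRS's learned self-attention network:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def user_profile(items: np.ndarray) -> np.ndarray:
    """Attention-weighted pooling of interacted-item embeddings."""
    scores = items @ items.mean(0)   # placeholder score; NAIRS learns this
    weights = softmax(scores)        # importance of each interaction
    return weights @ items           # weighted sum -> user profile

items = np.random.randn(7, 64)       # 7 interacted items, 64-d embeddings
print(user_profile(items).shape)     # (64,)
```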
Query Auto Completion for Math Formula Search
Title | Query Auto Completion for Math Formula Search |
Authors | Shaurya Rohatgi, Wei Zhong, Richard Zanibbi, Jian Wu, C. Lee Giles |
Abstract | Query Auto Completion (QAC) is among the most appealing features of a web search engine. It helps users formulate queries quickly with less effort. Although there has been much effort in this area for text, to the best of our knowledge there is little work on mathematical formula auto completion. In this paper, we implement 5 existing QAC methods on mathematical formulas and evaluate them on the NTCIR-12 MathIR task dataset. We report the effectiveness of retrieved results using Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP). Our study indicates that the Finite State Transducer outperforms the other QAC models with an MRR score of $0.642$. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.04115v1 |
https://arxiv.org/pdf/1912.04115v1.pdf | |
PWC | https://paperswithcode.com/paper/query-auto-completion-for-math-formula-search |
Repo | |
Framework | |
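For reference, the MRR behind the reported 0.642 is just the mean, over queries, of the reciprocal rank of the first relevant completion:

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average 1/rank of the first relevant item per query (0 if none)."""
    total = 0.0
    for results, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(results, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two queries: first relevant hit at rank 1 and rank 2 -> (1 + 0.5) / 2.
print(mean_reciprocal_rank([["a", "b"], ["x", "y"]], [{"a"}, {"y"}]))  # 0.75
```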
Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling
Title | Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling |
Authors | Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou |
Abstract | For bidirectional joint image-text modeling, we develop variational hetero-encoder (VHE) randomized generative adversarial network (GAN), a versatile deep generative model that integrates a probabilistic text decoder, probabilistic image encoder, and GAN into a coherent end-to-end multi-modality learning framework. VHE randomized GAN (VHE-GAN) encodes an image to decode its associated text, and feeds the variational posterior as the source of randomness into the GAN image generator. We plug three off-the-shelf modules, including a deep topic model, a ladder-structured image encoder, and StackGAN++, into VHE-GAN, which already achieves competitive performance. This further motivates the development of VHE-raster-scan-GAN that generates photo-realistic images in not only a multi-scale low-to-high-resolution manner, but also a hierarchical-semantic coarse-to-fine fashion. By capturing and relating hierarchical semantic and visual concepts with end-to-end training, VHE-raster-scan-GAN achieves state-of-the-art performance in a wide variety of image-text multi-modality learning and generation tasks. |
Tasks | |
Published | 2019-05-18 |
URL | https://arxiv.org/abs/1905.08622v3 |
https://arxiv.org/pdf/1905.08622v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-hetero-encoder-randomized |
Repo | |
Framework | |
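The VHE-GAN wiring, an image encoder whose variational posterior both drives a text decoder and supplies the GAN generator's noise, can be sketched with placeholder linear modules. All sizes are hypothetical; the paper's actual components are a deep topic model, a ladder-structured encoder, and StackGAN++.

```python
import torch
import torch.nn as nn

class VHEGANSketch(nn.Module):
    def __init__(self, img_dim=256, z_dim=32, vocab=1000):
        super().__init__()
        self.enc = nn.Linear(img_dim, 2 * z_dim)   # image -> posterior params
        self.text_dec = nn.Linear(z_dim, vocab)    # z -> text logits
        self.gen = nn.Linear(z_dim, img_dim)       # z doubles as GAN noise

    def forward(self, img):
        mu, logvar = self.enc(img).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return self.text_dec(z), self.gen(z)       # decoded text, generated image

text_logits, fake_img = VHEGANSketch()(torch.randn(4, 256))
print(text_logits.shape, fake_img.shape)           # [4, 1000] [4, 256]
```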
EventKG - the Hub of Event Knowledge on the Web - and Biographical Timeline Generation
Title | EventKG - the Hub of Event Knowledge on the Web - and Biographical Timeline Generation |
Authors | Simon Gottschalk, Elena Demidova |
Abstract | One of the key requirements to facilitate the semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events, entities and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entity-centric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. In this article we address this limitation, formalise the concept of a temporal knowledge graph and present its instantiation, EventKG. EventKG is a multilingual event-centric temporal knowledge graph that incorporates over 690 thousand events and over 2.3 million temporal relations obtained from several large-scale knowledge graphs and semi-structured sources and makes them available through a canonical RDF representation. Whereas popular entities often possess hundreds of relations within a temporal knowledge graph such as EventKG, generating a concise overview of the most important temporal relations for a given entity is a challenging task. In this article we demonstrate an application of EventKG to biographical timeline generation, where we adopt a distant supervision method to identify the relations most relevant for an entity biography. Our evaluation results provide insights on the characteristics of EventKG and demonstrate the effectiveness of the proposed biographical timeline generation method. |
Tasks | Knowledge Graphs |
Published | 2019-05-21 |
URL | https://arxiv.org/abs/1905.08794v1 |
https://arxiv.org/pdf/1905.08794v1.pdf | |
PWC | https://paperswithcode.com/paper/eventkg-the-hub-of-event-knowledge-on-the-web |
Repo | |
Framework | |
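A temporal knowledge graph extends plain triples with validity intervals, after which a biographical timeline is essentially a filter-and-sort over an entity's relations. The sketch below uses a simplified tuple representation and made-up data, not EventKG's actual RDF schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TemporalRelation:          # simplified stand-in for an EventKG relation
    subject: str
    predicate: str
    obj: str
    start: date
    end: date

relations = [
    TemporalRelation("Entity_A", "memberOf", "Org_X", date(1990, 1, 1), date(1995, 6, 30)),
    TemporalRelation("Entity_A", "wonAward", "Award_Y", date(1998, 3, 2), date(1998, 3, 2)),
]

# A naive biographical timeline: the entity's relations sorted by start date.
timeline = sorted((r for r in relations if r.subject == "Entity_A"), key=lambda r: r.start)
for r in timeline:
    print(r.start, r.predicate, r.obj)
```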
Local Orthogonal Decomposition for Maximum Inner Product Search
Title | Local Orthogonal Decomposition for Maximum Inner Product Search |
Authors | Xiang Wu, Ruiqi Guo, Sanjiv Kumar, David Simcha |
Abstract | Inverted file and asymmetric distance computation (IVFADC) have been successfully applied to approximate nearest neighbor search and subsequently to maximum inner product search. In such a framework, vector quantization is used for coarse partitioning while product quantization is used for quantizing residuals. In the original IVFADC as well as all of its variants, after residuals are computed, the second product quantization step is completely independent of the first vector quantization step. In this work, we seek to exploit the connection between these two steps when we perform non-exhaustive search. More specifically, we decompose a residual vector locally into two orthogonal components and apply uniform quantization and multiscale quantization to the two components, respectively. The proposed method, called local orthogonal decomposition, combined with multiscale quantization consistently achieves higher recall than previous methods under the same bitrates. We conduct comprehensive experiments on large scale datasets as well as detailed ablation tests, demonstrating the effectiveness of our method. |
Tasks | Quantization |
Published | 2019-03-25 |
URL | http://arxiv.org/abs/1903.10391v1 |
http://arxiv.org/pdf/1903.10391v1.pdf | |
PWC | https://paperswithcode.com/paper/local-orthogonal-decomposition-for-maximum |
Repo | |
Framework | |
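The decomposition itself is elementary linear algebra: split the residual into a scalar component along a reference direction and a vector orthogonal to it, then quantize the two parts with different quantizers. A minimal sketch, assuming the coarse partition centroid serves as the reference direction (our assumption; the paper defines the direction locally):

```python
import numpy as np

def decompose(residual: np.ndarray, centre: np.ndarray):
    """Split a residual into parallel / orthogonal parts w.r.t. the centre direction."""
    c_hat = centre / np.linalg.norm(centre)
    scale = residual @ c_hat            # scalar projection (uniformly quantizable)
    orthogonal = residual - scale * c_hat
    return scale, orthogonal            # orthogonal part goes to the multiscale stage

rng = np.random.default_rng(0)
centre = rng.normal(size=64)            # coarse partition centroid
residual = 0.1 * rng.normal(size=64)
scale, orth = decompose(residual, centre)
assert abs(orth @ centre) / np.linalg.norm(centre) < 1e-10   # parts are orthogonal
```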
HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities
Title | HyperLearn: A Distributed Approach for Representation Learning in Datasets With Many Modalities |
Authors | Devanshu Arya, Stevan Rudinac, Marcel Worring |
Abstract | Multimodal datasets contain an enormous amount of relational information, which grows exponentially with the introduction of new modalities. Learning representations in such a scenario is inherently complex due to the presence of multiple heterogeneous information channels. These channels can encode both (a) inter-relations between the items of different modalities and (b) intra-relations between the items of the same modality. Encoding multimedia items into a continuous low-dimensional semantic space such that both types of relations are captured and preserved is extremely challenging, especially if the goal is a unified end-to-end learning framework. The two key challenges that need to be addressed are: 1) the framework must be able to merge complex intra- and inter-relations without losing any valuable information and 2) the learning model should be invariant to the addition of new and potentially very different modalities. In this paper, we propose a flexible framework which can scale to data streams from many modalities. To that end, we introduce a hypergraph-based model for data representation and deploy Graph Convolutional Networks to fuse relational information within and across modalities. Our approach provides an efficient solution for distributing otherwise extremely computationally expensive or even infeasible training processes across multiple GPUs, without any sacrifice in accuracy. Moreover, adding new modalities to our model requires only an additional GPU, keeping the computational time unchanged, which brings representation learning to truly multimodal datasets. We demonstrate the feasibility of our approach in experiments on multimedia datasets featuring second, third and fourth order relations. |
Tasks | Representation Learning |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09252v1 |
https://arxiv.org/pdf/1909.09252v1.pdf | |
PWC | https://paperswithcode.com/paper/hyperlearn-a-distributed-approach-for |
Repo | |
Framework | |
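For intuition on fusing relations through a hypergraph, here is one propagation step following the standard HGNN rule X' = Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta; the paper's exact operator, and its multi-GPU distribution scheme, may differ.

```python
import numpy as np

def hypergraph_conv(X, H, Theta):
    """One HGNN-style step: relu(Dv^-1/2 H De^-1 H^T Dv^-1/2 X Theta)."""
    Dv = np.diag(1.0 / np.sqrt(H.sum(1)))   # node degree normalisation
    De = np.diag(1.0 / H.sum(0))            # hyperedge degree normalisation
    return np.maximum(Dv @ H @ De @ H.T @ Dv @ X @ Theta, 0.0)

H = np.array([[1, 0], [1, 1], [0, 1], [1, 0]], dtype=float)  # 4 nodes, 2 hyperedges
X = np.random.randn(4, 8)                                    # node features
Theta = np.random.randn(8, 16)                               # learnable weights
print(hypergraph_conv(X, H, Theta).shape)                    # (4, 16)
```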
Cyber-All-Intel: An AI for Security related Threat Intelligence
Title | Cyber-All-Intel: An AI for Security related Threat Intelligence |
Authors | Sudip Mittal, Anupam Joshi, Tim Finin |
Abstract | Keeping up with threat intelligence is a must for a security analyst today. There is a large volume of information present ‘in the wild’ that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources to keep the analyst updated about various threats that pose a risk to her organization. A security analyst who is better ‘tapped in’ can be more effective. In this paper we present Cyber-All-Intel, an artificial intelligence system to aid a security analyst. It is a system for knowledge extraction, representation and analytics in an end-to-end pipeline grounded in the cybersecurity informatics domain. It uses multiple knowledge representations, such as vector spaces and knowledge graphs, in a ‘VKG structure’ to store incoming intelligence. The system also uses neural network models to pro-actively improve its knowledge. We have also created a query engine and an alert system that can be used by an analyst to find actionable cybersecurity insights. |
Tasks | Knowledge Graphs |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02895v1 |
https://arxiv.org/pdf/1905.02895v1.pdf | |
PWC | https://paperswithcode.com/paper/cyber-all-intel-an-ai-for-security-related |
Repo | |
Framework | |
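The 'VKG structure' pairs each entity with both a vector and graph edges, so a query engine can answer by nearest-neighbour search, graph lookup, or both. A toy sketch with invented entities (not the system's actual schema or data):

```python
import numpy as np

embeddings = {                               # vector-space half of the VKG
    "malware_x": np.array([0.9, 0.1]),
    "malware_y": np.array([0.8, 0.2]),
    "patch_z": np.array([0.1, 0.9]),
}
triples = [                                  # knowledge-graph half
    ("malware_x", "exploits", "cve_0"),
    ("patch_z", "mitigates", "cve_0"),
]

def similar(entity: str, k: int = 1):
    """Nearest neighbours by cosine similarity in the vector space."""
    q = embeddings[entity]
    scored = [(e, float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q))))
              for e, v in embeddings.items() if e != entity]
    return sorted(scored, key=lambda t: -t[1])[:k]

def related(entity: str):
    """All stored triples touching the entity (graph lookup)."""
    return [t for t in triples if entity in (t[0], t[2])]

print(similar("malware_x"))   # vector answer: malware_y is most similar
print(related("malware_x"))   # graph answer: the CVE it exploits
```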
DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network
Title | DeepPCO: End-to-End Point Cloud Odometry through Deep Parallel Neural Network |
Authors | Wei Wang, Muhamad Risqi U. Saputra, Peijun Zhao, Pedro Gusmao, Bo Yang, Changhao Chen, Andrew Markham, Niki Trigoni |
Abstract | Odometry is of key importance for localization in the absence of a map. There is considerable work in the area of visual odometry (VO), and recent advances in deep learning have brought novel approaches to VO, which directly learn salient features from raw images. These learning-based approaches have led to more accurate and robust VO systems. However, they have not yet been well applied to point cloud data. In this work, we investigate how to exploit deep learning to estimate point cloud odometry (PCO), which may serve as a critical component in point cloud-based downstream tasks or learning-based systems. Specifically, we propose a novel end-to-end deep parallel neural network called DeepPCO, which can estimate the 6-DOF poses using consecutive point clouds. It consists of two parallel sub-networks to estimate 3-D translation and orientation respectively, rather than a single neural network. We validate our approach on the KITTI Visual Odometry/SLAM benchmark dataset against different baselines. Experiments demonstrate that the proposed approach achieves good performance in terms of pose accuracy. |
Tasks | Visual Odometry |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.11088v2 |
https://arxiv.org/pdf/1910.11088v2.pdf | |
PWC | https://paperswithcode.com/paper/deeppco-end-to-end-point-cloud-odometry |
Repo | |
Framework | |
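The parallel two-branch design, one sub-network regressing 3-D translation and the other 3-D orientation, is straightforward to sketch. Input and hidden sizes below are illustrative; DeepPCO's actual branches operate on projected point-cloud pairs.

```python
import torch
import torch.nn as nn

class DualBranchOdometry(nn.Module):
    """Two parallel sub-networks: one for translation, one for orientation."""
    def __init__(self, in_dim: int = 512):
        super().__init__()
        self.trans = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 3))
        self.orient = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, feats):
        return self.trans(feats), self.orient(feats)  # 3-D translation, 3-D rotation

t, r = DualBranchOdometry()(torch.randn(2, 512))
print(t.shape, r.shape)   # torch.Size([2, 3]) torch.Size([2, 3])
```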
Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis
Title | Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis |
Authors | Behzad Bozorgtabar, Mohammad Saeed Rad, Hazim Kemal Ekenel, Jean-Philippe Thiran |
Abstract | Synthesizing realistic faces across domains to train deep models has attracted increasing attention in facial expression analysis, as it helps improve expression recognition accuracy despite the small number of real training images available. However, learning from synthetic face images can be problematic due to the distribution discrepancy between low-quality synthetic images and real face images, and may not achieve the desired performance when the learned model is applied to real-world scenarios. To this end, we propose a new attribute-guided face image synthesis method that performs a translation between multiple image domains using a single model. In addition, we adopt the proposed model to learn from synthetic faces by matching the feature distributions between different domains while preserving each domain’s characteristics. We evaluate the effectiveness of the proposed approach on several face datasets for generating realistic face images. We demonstrate that expression recognition performance can be enhanced by our face synthesis model. Moreover, we also conduct experiments on a near-infrared dataset containing facial expression videos of drivers to assess the performance using in-the-wild data for driver emotion recognition. |
Tasks | Domain Adaptation, Emotion Recognition, Face Generation, Image Generation |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.08090v1 |
https://arxiv.org/pdf/1905.08090v1.pdf | |
PWC | https://paperswithcode.com/paper/using-photorealistic-face-synthesis-and |
Repo | |
Framework | |
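One common way to 'match the feature distributions between different domains' is a maximum mean discrepancy penalty on feature batches; the abstract does not say which matching objective the paper uses, so treat this as a generic sketch:

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Maximum mean discrepancy between two feature batches (RBF kernel)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

synthetic = torch.randn(64, 256)        # features of synthetic faces
real = torch.randn(64, 256) + 0.5       # features of real faces (shifted domain)
loss = rbf_mmd(synthetic, real)         # add to the task loss to align domains
print(loss.item())
```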