January 31, 2020


Paper Group ANR 179

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models. Boundary Cues for 3D Object Shape Recovery. Quantum Structure in Cognition: Human Language as a Boson Gas of Entangled Words. Learning gradient-based ICA by neurally estimating mutual information. DIALOG: A framework for modeling, an …

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models


Title	Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models
Authors	Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling
Abstract	Neural language representation models such as Bidirectional Encoder Representations from Transformers (BERT) pre-trained on large-scale corpora can well capture rich semantics from plain text, and can be fine-tuned to consistently improve the performance on various natural language processing (NLP) tasks. However, the existing pre-trained language representation models rarely consider explicitly incorporating commonsense knowledge or other knowledge. In this paper, we develop a pre-training approach for incorporating commonsense knowledge into language representation models. We construct a commonsense-related multi-choice question answering dataset for pre-training a neural language representation model. The dataset is created automatically by our proposed “align, mask, and select” (AMS) method. We also investigate different pre-training tasks. Experimental results demonstrate that pre-training models using the proposed approach followed by fine-tuning achieves significant improvements on various commonsense-related tasks, such as CommonsenseQA and Winograd Schema Challenge, while maintaining comparable performance on other NLP tasks, such as sentence classification and natural language inference (NLI) tasks, compared to the original BERT models.
Tasks	Natural Language Inference, Question Answering, Sentence Classification
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06725v4
PDF	https://arxiv.org/pdf/1908.06725v4.pdf
PWC	https://paperswithcode.com/paper/align-mask-and-select-a-simple-method-for
Repo
Framework
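
The abstract names three steps (align, mask, select) without detailing them. The sketch below is one possible reading, not the authors' implementation: it assumes knowledge triples of the form (head, relation, tail), aligns a triple with a sentence by plain substring matching, masks the tail concept, and draws distractors from other triples with the same relation. The `[QW]` placeholder, the toy triples and sentences, and all function names are hypothetical.

```python
import random

MASK = "[QW]"  # hypothetical placeholder for the masked concept


def align(triple, sentences):
    """Align: keep sentences that mention both concepts of the triple."""
    head, _relation, tail = triple
    return [s for s in sentences if head in s and tail in s]


def mask(sentence, concept):
    """Mask: replace the first mention of the concept with the placeholder."""
    return sentence.replace(concept, MASK, 1)


def select(triple, triples, k, rng):
    """Select: draw up to k distractors from tails of triples sharing the relation."""
    _head, relation, tail = triple
    pool = sorted({t for (_h, r, t) in triples if r == relation and t != tail})
    return rng.sample(pool, min(k, len(pool)))


def build_question(triple, sentences, triples, k=2, seed=0):
    """Build one multi-choice question, or None if no sentence aligns."""
    rng = random.Random(seed)
    aligned = align(triple, sentences)
    if not aligned:
        return None
    answer = triple[2]
    choices = select(triple, triples, k, rng) + [answer]
    rng.shuffle(choices)
    return {"question": mask(aligned[0], answer), "choices": choices, "answer": answer}


# Toy data, invented for illustration only.
TRIPLES = [
    ("bird", "CapableOf", "fly"),
    ("fish", "CapableOf", "swim"),
    ("dog", "CapableOf", "bark"),
    ("bird", "IsA", "animal"),
]
SENTENCES = ["A bird can fly long distances.", "The dog sleeps."]

example = build_question(TRIPLES[0], SENTENCES, TRIPLES)
```

Substring matching is the crudest possible alignment; a real pipeline would need tokenization and some filtering of noisy matches.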

Boundary Cues for 3D Object Shape Recovery


Title	Boundary Cues for 3D Object Shape Recovery
Authors	Kevin Karsch, Zicheng Liao, Jason Rock, Jonathan T. Barron, Derek Hoiem
Abstract	Early work in computer vision considered a host of geometric cues for both shape reconstruction and recognition. However, since then, the vision community has focused heavily on shading cues for reconstruction, and moved towards data-driven approaches for recognition. In this paper, we reconsider these perhaps overlooked “boundary” cues (such as self occlusions and folds in a surface), as well as many other established constraints for shape reconstruction. In a variety of user studies and quantitative tasks, we evaluate how well these cues inform shape reconstruction (relative to each other) in terms of both shape quality and shape recognition. Our findings suggest many new directions for future research in shape reconstruction, such as automatic boundary cue detection and relaxing assumptions in shape from shading (e.g. orthographic projection, Lambertian surfaces).
Tasks
Published	2019-12-24
URL	https://arxiv.org/abs/1912.11566v1
PDF	https://arxiv.org/pdf/1912.11566v1.pdf
PWC	https://paperswithcode.com/paper/boundary-cues-for-3d-object-shape-recovery-1
Repo
Framework

Quantum Structure in Cognition: Human Language as a Boson Gas of Entangled Words


Title	Quantum Structure in Cognition: Human Language as a Boson Gas of Entangled Words
Authors	Diederik Aerts, Lester Beltran
Abstract	We model a piece of text of human language telling a story by means of the quantum structure describing a Bose gas in a state close to a Bose-Einstein condensate near absolute zero temperature. For this we introduce energy levels for the words (concepts) used in the story and we also introduce the new notion of ‘cogniton’ as the quantum of human thought. Words (concepts) are then cognitons in different energy states, as is the case for photons in different energy states, or states of different radiative frequency, when the considered boson gas is that of the quanta of the electromagnetic field. We show that Bose-Einstein statistics delivers a very good model for these pieces of texts telling stories, both for short stories and for long stories of the size of novels. We analyze an unexpected connection with Zipf’s law in human language, the Zipf ranking relating to the energy levels of the words, and the Bose-Einstein graph coinciding with the Zipf graph. We investigate the issue of ‘identity and indistinguishability’ from this new perspective and conjecture that the way one can easily understand how two of ‘the same concepts’ are ‘absolutely identical and indistinguishable’ in human language is also the way in which quantum particles are absolutely identical and indistinguishable in physical reality, providing in this way new evidence for our conceptuality interpretation of quantum theory.
Tasks
Published	2019-09-15
URL	https://arxiv.org/abs/1909.06845v2
PDF	https://arxiv.org/pdf/1909.06845v2.pdf
PWC	https://paperswithcode.com/paper/human-language-a-boson-gas-of-quantum
Repo
Framework
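
For orientation, the two textbook relations the abstract connects can be written as follows. These are the standard forms, not necessarily the parameterization used in the paper: the symbols $\mu$, $k_B T$, and the Zipf exponent $s$ are the usual physics and linguistics conventions and are assumptions here.

```latex
% Bose-Einstein mean occupation number of an energy level E_i
\bar{n}(E_i) = \frac{1}{e^{(E_i - \mu)/(k_B T)} - 1}

% Zipf's law: frequency of the word with rank r (s is close to 1 for natural text)
f(r) \propto \frac{1}{r^{s}}
```

The abstract's claim is that, once word ranks are read as energy levels, the first curve fits the word counts of a story and coincides with the second.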

Learning gradient-based ICA by neurally estimating mutual information


Title	Learning gradient-based ICA by neurally estimating mutual information
Authors	Hlynur Davíð Hlynsson, Laurenz Wiskott
Abstract	Several methods of estimating the mutual information of random variables have been developed in recent years. They can prove valuable for novel approaches to learning statistically independent features. In this paper, we use one of these methods, a mutual information neural estimation (MINE) network, to present a proof-of-concept of how a neural network can perform linear ICA. We minimize the mutual information, as estimated by a MINE network, between the output units of a differentiable encoder network. This is done by simple alternate optimization of the two networks. The method is shown to get a qualitatively equal solution to FastICA on blind-source-separation of noisy sources.
Tasks
Published	2019-04-22
URL	http://arxiv.org/abs/1904.09858v1
PDF	http://arxiv.org/pdf/1904.09858v1.pdf
PWC	https://paperswithcode.com/paper/learning-gradient-based-ica-by-neurally
Repo
Framework
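
The quantity a MINE network estimates is the Donsker-Varadhan lower bound on mutual information, shown below for two variables. The second line is only a schematic of the alternating optimization the abstract describes; the symbols $T_\theta$ (the statistics network), $f_\phi$ (the encoder), and the restriction to two output units $z_1, z_2$ are notational assumptions, not taken from the paper.

```latex
% Donsker-Varadhan lower bound used by MINE
I(X; Z) \;\ge\; \sup_{\theta}\;
  \mathbb{E}_{P_{XZ}}\!\left[ T_\theta(x, z) \right]
  - \log \mathbb{E}_{P_X \otimes P_Z}\!\left[ e^{T_\theta(x, z)} \right]

% Schematic alternating objective: the estimator maximizes the bound,
% the encoder minimizes the resulting estimate between its outputs
\min_{\phi}\; \max_{\theta}\; \hat{I}_\theta(z_1; z_2),
\qquad (z_1, z_2) = f_\phi(x)
```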

DIALOG: A framework for modeling, analysis and reuse of digital forensic knowledge


Title	DIALOG: A framework for modeling, analysis and reuse of digital forensic knowledge
Authors	Damir Kahvedzic, Tahar Kechadi
Abstract	This paper presents DIALOG (Digital Investigation Ontology), a framework for the management, reuse, and analysis of Digital Investigation knowledge. DIALOG provides a general, application-independent vocabulary that can be used to describe an investigation at different levels of detail. DIALOG is defined to encapsulate all concepts of the digital forensics field and the relationships between them. In particular, we concentrate on the Windows Registry, where registry keys are modeled in terms of both their structure and function. Registry analysis software tools are modeled in a similar manner, and we illustrate how their results can be interpreted using the reasoning capabilities of the ontology.
Tasks
Published	2019-02-21
URL	http://arxiv.org/abs/1903.03061v1
PDF	http://arxiv.org/pdf/1903.03061v1.pdf
PWC	https://paperswithcode.com/paper/dialog-a-framework-for-modeling-analysis-and
Repo
Framework

Autoregressive Convolutional Recurrent Neural Network for Univariate and Multivariate Time Series Prediction


Title	Autoregressive Convolutional Recurrent Neural Network for Univariate and Multivariate Time Series Prediction
Authors	Matteo Maggiolo, Gerasimos Spanakis
Abstract	Time series forecasting (univariate and multivariate) is a problem of high complexity due to the different patterns that have to be detected in the input, ranging from high- to low-frequency ones. In this paper we propose a new model for time series prediction that utilizes convolutional layers for feature extraction, a recurrent encoder and a linear autoregressive component. We motivate the model, then test and compare it against a baseline of widely used existing architectures for univariate and multivariate time series. The proposed model appears to outperform the baselines in almost every case of the multivariate time series datasets, in some cases even with a 50% improvement, which shows the strengths of such a hybrid architecture on complex time series.
Tasks	Time Series, Time Series Forecasting, Time Series Prediction
Published	2019-03-06
URL	http://arxiv.org/abs/1903.02540v1
PDF	http://arxiv.org/pdf/1903.02540v1.pdf
PWC	https://paperswithcode.com/paper/autoregressive-convolutional-recurrent-neural
Repo
Framework

EnergyStar++: Towards more accurate and explanatory building energy benchmarking


Title	EnergyStar++: Towards more accurate and explanatory building energy benchmarking
Authors	Pandarasamy Arjunan, Kameshwar Poolla, Clayton Miller
Abstract	Building energy performance benchmarking has been adopted widely in the USA and Canada through the Energy Star Portfolio Manager platform. Building operations and energy management professionals have long used a simple 1-100 score to understand how their building compares to its peers. This single number is easy to use, but is created by inaccurate multiple linear regression (MLR) models. This paper proposes a methodology that enhances the existing Energy Star calculation method by increasing accuracy and providing additional model output processing to help explain why a building is achieving a certain score. We propose and test two new prediction models: multiple linear regression with feature interactions (MLRi) and gradient boosted trees (GBT). Both models have better average accuracy than the baseline Energy Star models. The third-order MLRi and GBT models achieve 4.9% and 24.9% increases in adjusted R², respectively, and 7.0% and 13.7% decreases in normalized root mean squared error (NRMSE), respectively, on average relative to the MLR models for six building types. Even more importantly, a set of techniques is developed to help determine which factors most influence the score using SHAP values. The SHAP force visualization in particular offers an accessible overview of the aspects of the building that influence the score that non-technical users can readily interpret. This methodology is tested on the 2012 Commercial Building Energy Consumption Survey (CBECS) (1,812 buildings) and public data sets from the energy disclosure programs of New York City (11,131 buildings) and Seattle (2,073 buildings).
Tasks
Published	2019-10-30
URL	https://arxiv.org/abs/1910.14563v1
PDF	https://arxiv.org/pdf/1910.14563v1.pdf
PWC	https://paperswithcode.com/paper/energystar-towards-more-accurate-and
Repo
Framework
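
The two accuracy metrics quoted in the abstract can be computed as in the sketch below. It is a generic illustration, not the paper's evaluation code. The abstract does not say how NRMSE is normalized; dividing by the range of the observed values is an assumption (dividing by the mean is another common choice). The function names and the toy numbers are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_features):
    """Adjusted R^2, penalizing the number of predictors in the model."""
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))


# Toy values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

adj_r2 = adjusted_r_squared(observed, predicted, n_features=1)
error = nrmse(observed, predicted)
```

Adjusted R² matters for the comparison in the abstract because the MLRi models add interaction terms: plain R² never decreases when predictors are added, while the adjusted version can.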


CMIR-NET : A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing


Title	CMIR-NET : A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing
Authors	Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu
Abstract	We address the problem of cross-modal information retrieval in the domain of remote sensing. In particular, we are interested in two application scenarios: i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery, and ii) multi-label image retrieval between very high resolution (VHR) images and speech-based label annotations. Notice that these multi-modal retrieval scenarios are more challenging than the traditional uni-modal retrieval approaches given the inherent differences in distributions between the modalities. However, with the growing availability of multi-source remote sensing data and the scarcity of semantic annotations, the task of multi-modal retrieval has recently become extremely important. In this regard, we propose a novel deep neural network based architecture designed to learn a discriminative shared feature space for all the input modalities, suitable for semantically coherent information retrieval. Extensive experiments are carried out on the benchmark large-scale PAN - multi-spectral DSRSID dataset and the multi-label UC-Merced dataset. Together with the Merced dataset, we generate a corpus of speech signals corresponding to the labels. Superior performance with respect to the current state-of-the-art is observed in all the cases.
Tasks	Cross-Modal Information Retrieval, Cross-Modal Retrieval, Image Retrieval, Information Retrieval, Multi-Label Image Retrieval
Published	2019-04-09
URL	https://arxiv.org/abs/1904.04794v2
PDF	https://arxiv.org/pdf/1904.04794v2.pdf
PWC	https://paperswithcode.com/paper/cmir-net-a-deep-learning-based-model-for
Repo
Framework

Neural Recurrent Structure Search for Knowledge Graph Embedding


Title	Neural Recurrent Structure Search for Knowledge Graph Embedding
Authors	Yongqi Zhang, Quanming Yao, Lei Chen
Abstract	Knowledge graph (KG) embedding is a fundamental problem in mining relational patterns. It aims to encode the entities and relations in a KG into a low-dimensional vector space that can be used for subsequent algorithms. Lots of KG embedding models have been proposed to learn the interactions between entities and relations, which contain meaningful semantic information. However, structural information, which encodes local topology among entities, is also important to KG. In this work, we propose S2E to distill structural information and combine it with semantic information for different KGs as a neural architecture search (NAS) problem. First, we analyze the difficulty of using a unified model to solve the distillation problem. Based on it, we define the path distiller to recurrently combine structural and semantic information along relational paths, which are sampled to preserve both local topologies and semantics. Then, inspired by the recent success of NAS, we design a recurrent network-based search space for specific KG tasks and propose a natural gradient (NG) based search algorithm to update architectures. Experimental results demonstrate that the models found by our proposed S2E outperform human-designed ones, and the NG based search algorithm is efficient compared with other NAS methods. Besides, our work is the first NAS method for RNN that can search architectures with better performance than human-designed models.
Tasks	Graph Embedding, Knowledge Graph Embedding, Neural Architecture Search
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07132v1
PDF	https://arxiv.org/pdf/1911.07132v1.pdf
PWC	https://paperswithcode.com/paper/neural-recurrent-structure-search-for
Repo
Framework

Improving Neural Machine Translation with Pre-trained Representation


Title	Improving Neural Machine Translation with Pre-trained Representation
Authors	Rongxiang Weng, Heng Yu, Shujian Huang, Weihua Luo, Jiajun Chen
Abstract	Monolingual data has been demonstrated to be helpful in improving the translation quality of neural machine translation (NMT). Current methods are limited to using word-level knowledge, such as generating synthetic parallel data or extracting information from word embeddings. In contrast, the power of sentence-level contextual knowledge, which is more complex and diverse and plays an important role in natural language generation, has not been fully exploited. In this paper, we propose a novel structure which could leverage monolingual data to acquire sentence-level contextual representations. Then, we design a framework for integrating both source and target sentence-level representations into the NMT model to improve the translation quality. Experimental results on Chinese-English and German-English machine translation tasks show that our proposed model achieves improvement over strong Transformer baselines, while experiments on English-Turkish further demonstrate the effectiveness of our approach in the low-resource scenario.
Tasks	Machine Translation, Text Generation
Published	2019-08-21
URL	https://arxiv.org/abs/1908.07688v1
PDF	https://arxiv.org/pdf/1908.07688v1.pdf
PWC	https://paperswithcode.com/paper/190807688
Repo
Framework

Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling


Title	Coordination and Trajectory Prediction for Vehicle Interactions via Bayesian Generative Modeling
Authors	Jiachen Li, Hengbo Ma, Wei Zhan, Masayoshi Tomizuka
Abstract	Coordination recognition and subtle pattern prediction of future trajectories play a significant role when modeling interactive behaviors of multiple agents. Due to the essential property of uncertainty in the future evolution, deterministic predictors are not sufficiently safe and robust. In order to tackle the task of probabilistic prediction for multiple, interactive entities, we propose a coordination and trajectory prediction system (CTPS), which has a hierarchical structure including a macro-level coordination recognition module and a micro-level subtle pattern prediction module which solves a probabilistic generation task. We illustrate two types of representation of the coordination variable: categorized and real-valued, and compare their effects and advantages based on empirical studies. We also bring the ideas of Bayesian deep learning into deep generative models to generate diversified prediction hypotheses. The proposed system is tested on multiple driving datasets in various traffic scenarios, which achieves better performance than baseline approaches in terms of a set of evaluation metrics. The results also show that using categorized coordination can better capture multi-modality and generate more diversified samples than the real-valued coordination, while the latter can generate prediction hypotheses with smaller errors with a sacrifice of sample diversity. Moreover, employing neural networks with weight uncertainty is able to generate samples with larger variance and diversity.
Tasks	Trajectory Prediction
Published	2019-05-02
URL	http://arxiv.org/abs/1905.00587v1
PDF	http://arxiv.org/pdf/1905.00587v1.pdf
PWC	https://paperswithcode.com/paper/coordination-and-trajectory-prediction-for
Repo
Framework

A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection


Title	A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection
Authors	Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Bingsheng He
Abstract	Federated learning has been a hot research topic in enabling the collaborative training of machine learning models among different organizations under privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a need to develop systems and infrastructures that ease the development of various federated learning algorithms. Similar to deep learning systems such as PyTorch and TensorFlow that boost the development of deep learning, federated learning systems (FLSs) are equally important, and face challenges from various aspects such as effectiveness, efficiency, and privacy. In this survey, we conduct a comprehensive review of federated learning systems. To achieve smooth flow and guide future research, we introduce the definition of federated learning systems and analyze the system components. Moreover, we provide a thorough categorization for federated learning systems according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation and motivation of federation. The categorization can help the design of federated learning systems as shown in our case studies. By systematically summarizing the existing federated learning systems, we present the design factors, case studies, and future research opportunities.
Tasks
Published	2019-07-23
URL	https://arxiv.org/abs/1907.09693v4
PDF	https://arxiv.org/pdf/1907.09693v4.pdf
PWC	https://paperswithcode.com/paper/federated-learning-systems-vision-hype-and
Repo
Framework

Human Action Recognition in Drone Videos using a Few Aerial Training Examples


Title	Human Action Recognition in Drone Videos using a Few Aerial Training Examples
Authors	Waqas Sultani, Mubarak Shah
Abstract	Drones are enabling new forms of human action surveillance due to their low cost and fast mobility. However, using deep neural networks for automatic aerial action recognition is difficult because training requires a very large number of aerial human action videos. Collecting a large collection of human action aerial videos is costly, time-consuming and difficult. In this paper, we explore two alternative data sources to improve aerial action classification when only a few training aerial examples are available. As a first data source, we resort to video games. We introduce the first-of-its-kind game action dataset. The dataset contains plenty of ground and aerial video pairs of human actions from video games. For the second data source, employing ground videos, we generate discriminative fake aerial examples using conditional Wasserstein Generative Adversarial Networks. We do not assume that the game and real action datasets have the same action classes. Due to the heterogeneous nature of the data (game and real action dataset actions are not necessarily the same), we feed the network with real and game (or fake) data in an alternating fashion in a disjoint multitask learning framework to obtain a robust action classifier. We validate the proposed approach on several aerial action datasets and demonstrate that aerial games and generated fake aerial examples can be extremely useful for improved action recognition in real aerial videos when only a few aerial training examples are available. The code, aerial-ground game action dataset, and a real aerial action will be made publicly available.
Tasks	Action Classification, Temporal Action Localization
Published	2019-10-22
URL	https://arxiv.org/abs/1910.10027v2
PDF	https://arxiv.org/pdf/1910.10027v2.pdf
PWC	https://paperswithcode.com/paper/human-action-recognition-in-drone-videos
Repo
Framework

Learning Execution through Neural Code Fusion


Title	Learning Execution through Neural Code Fusion
Authors	Zhan Shi, Kevin Swersky, Daniel Tarlow, Parthasarathy Ranganathan, Milad Hashemi
Abstract	As the performance of computer systems stagnates due to the end of Moore’s Law, there is a need for new models that can understand and optimize the execution of general purpose code. While there is a growing body of work on using Graph Neural Networks (GNNs) to learn representations of source code, these representations do not understand how code dynamically executes. In this work, we propose a new approach to use GNNs to learn fused representations of general source code and its execution. Our approach defines a multi-task GNN over low-level representations of source code and program state (i.e., assembly code and dynamic memory states), converting complex source code constructs and complex data structures into a simpler, more uniform format. We show that this leads to improved performance over similar methods that do not use execution and it opens the door to applying GNN models to new tasks that would not be feasible from static code alone. As an illustration of this, we apply the new model to challenging dynamic tasks (branch prediction and prefetching) from the SPEC CPU benchmark suite, outperforming the state-of-the-art by 26% and 45% respectively. Moreover, we use the learned fused graph embeddings to demonstrate transfer learning with high performance on an indirectly related task (algorithm classification).
Tasks	Transfer Learning
Published	2019-06-17
URL	https://arxiv.org/abs/1906.07181v2
PDF	https://arxiv.org/pdf/1906.07181v2.pdf
PWC	https://paperswithcode.com/paper/learning-execution-through-neural-code-fusion
Repo
Framework

Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction


Title	Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction
Authors	Luca Anthony Thiede, Pratik Prabhanjan Brahma
Abstract	Trajectory or behavior prediction of traffic agents is an important component of autonomous driving and robot planning in general. It can be framed as a probabilistic future sequence generation problem and recent literature has studied the applicability of generative models in this context. The variety or Minimum over N (MoN) loss, which tries to minimize the error between the ground truth and the closest of N output predictions, has been used in these recent learning models to improve the diversity of predictions. In this work, we present a proof to show that the MoN loss does not lead to the ground truth probability density function, but approximately to its square root instead. We validate this finding with extensive experiments on both simulated toy as well as real world datasets. We also propose multiple solutions to compensate for the dilation to show improvement of log likelihood of the ground truth samples in the corrected probability density function.
Tasks	Autonomous Driving, Trajectory Prediction
Published	2019-07-23
URL	https://arxiv.org/abs/1907.10178v1
PDF	https://arxiv.org/pdf/1907.10178v1.pdf
PWC	https://paperswithcode.com/paper/analyzing-the-variety-loss-in-the-context-of
Repo
Framework
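
The Minimum over N (MoN) loss described in the abstract can be illustrated with a short sketch: of N predicted trajectories, only the one closest to the ground truth contributes to the loss. The error measure used here (mean pointwise Euclidean distance over 2D waypoints), the function names, and the toy trajectories are assumptions for illustration; the abstract does not fix them.

```python
import math


def trajectory_error(pred, truth):
    """Mean Euclidean distance between corresponding 2D waypoints."""
    distances = [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(distances) / len(distances)


def mon_loss(predictions, truth):
    """Minimum over N: error of the prediction closest to the ground truth."""
    return min(trajectory_error(pred, truth) for pred in predictions)


# Toy trajectories, invented for illustration only.
ground_truth = [(0.0, 0.0), (1.0, 0.0)]
hypotheses = [
    [(0.0, 1.0), (1.0, 1.0)],  # error 1.0
    [(0.0, 0.0), (1.0, 0.5)],  # error 0.25
    [(3.0, 4.0), (4.0, 4.0)],  # error 5.0
]

loss = mon_loss(hypotheses, ground_truth)
```

Because the other N-1 hypotheses are never penalized, they are free to spread out. This is consistent with the abstract's finding that the learned density is dilated, approaching roughly the square root of the ground-truth density rather than the density itself.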