January 29, 2020

3326 words 16 mins read

Paper Group ANR 669

On Architectures for Including Visual Information in Neural Language Models for Image Description. Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation. End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation-State Malware. A Regularized Convolutional Neural Network for Semantic Image Segmentation …

On Architectures for Including Visual Information in Neural Language Models for Image Description

Title On Architectures for Including Visual Information in Neural Language Models for Image Description
Authors Marc Tanti, Albert Gatt, Kenneth P. Camilleri
Abstract A neural language model can be conditioned into generating descriptions for images by providing visual information apart from the sentence prefix. This visual information can be included into the language model through different points of entry resulting in different neural architectures. We identify four main architectures which we call init-inject, pre-inject, par-inject, and merge. We analyse these four architectures and conclude that the best performing one is init-inject, which is when the visual information is injected into the initial state of the recurrent neural network. We confirm this using both automatic evaluation measures and human annotation. We then analyse how much influence the images have on each architecture. This is done by measuring how different the output probabilities of a model are when a partial sentence is combined with a completely different image from the one it is meant to be combined with. We find that init-inject tends to quickly become less influenced by the image as more words are generated. A different architecture called merge, which is when the visual information is merged with the recurrent neural network’s hidden state vector prior to output, loses visual influence much more slowly, suggesting that it would work better for generating longer sentences. We also observe that the merge architecture can have its recurrent neural network pre-trained in a text-only language model (transfer learning) rather than be initialised randomly as usual. This results in even better performance than the other architectures, provided that the source language model is not too good at language modelling or it will overspecialise and be less effective at image description generation. Our work opens up new avenues of research in neural architectures, explainable AI, and transfer learning.
Tasks Language Modelling, Transfer Learning
Published 2019-11-09
URL https://arxiv.org/abs/1911.03738v1
PDF https://arxiv.org/pdf/1911.03738v1.pdf
PWC https://paperswithcode.com/paper/on-architectures-for-including-visual
Repo
Framework
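
As a concrete illustration of the init-inject and merge conditioning points described in the abstract, here is a minimal PyTorch-style sketch; the layer sizes, the GRU cell, and the module names are assumptions rather than the paper’s exact configuration.

```python
import torch
import torch.nn as nn

class InitInject(nn.Module):
    """Init-inject: image features set the RNN's initial hidden state."""
    def __init__(self, vocab_size, img_dim=2048, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.img_to_h0 = nn.Linear(img_dim, hid_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, img_feats, prefix_tokens):
        h0 = torch.tanh(self.img_to_h0(img_feats)).unsqueeze(0)   # (1, B, H)
        rnn_out, _ = self.rnn(self.embed(prefix_tokens), h0)
        return self.out(rnn_out)                                  # next-word logits per step

class Merge(nn.Module):
    """Merge: the RNN stays purely linguistic; the image is fused just before the output."""
    def __init__(self, vocab_size, img_dim=2048, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hid_dim)
        self.out = nn.Linear(hid_dim * 2, vocab_size)

    def forward(self, img_feats, prefix_tokens):
        rnn_out, _ = self.rnn(self.embed(prefix_tokens))
        img = self.img_proj(img_feats).unsqueeze(1).expand(-1, rnn_out.size(1), -1)
        return self.out(torch.cat([rnn_out, img], dim=-1))
```

The structural difference mirrors the paper’s findings: in init-inject the image enters only through the initial state, so its influence can fade as more words are generated, whereas in merge the image is re-presented at every output step and the language-only RNN can be pre-trained on text and transferred.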

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

Title Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
Authors Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris
Abstract We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert’s reward function via inverse reinforcement learning, followed by reinforcement learning is indirect and may be computationally expensive. Recent generative adversarial methods based on matching the policy distribution between the expert and the agent could be unstable during training. We propose a new framework for imitation learning by estimating the support of the expert policy to compute a fixed reward function, which allows us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on both discrete and continuous domains, achieving comparable or better performance than the state of the art under different reinforcement learning algorithms.
Tasks Imitation Learning
Published 2019-05-16
URL https://arxiv.org/abs/1905.06750v2
PDF https://arxiv.org/pdf/1905.06750v2.pdf
PWC https://paperswithcode.com/paper/random-expert-distillation-imitation-learning
Repo
Framework
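
The support-estimation idea lends itself to a short sketch in the style of random network distillation; the network sizes and the exponential reward shaping below are assumptions, not the paper’s exact choices.

```python
import torch
import torch.nn as nn

def make_net(in_dim, out_dim=64):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

class SupportReward:
    """Fixed reward derived from how well a predictor, trained only on expert (s, a) pairs,
    matches a frozen random target network: low error indicates expert-policy support."""
    def __init__(self, sa_dim, lr=1e-3):
        self.target = make_net(sa_dim)                 # randomly initialised, never trained
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.predictor = make_net(sa_dim)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def fit(self, expert_sa, epochs=200):
        for _ in range(epochs):
            loss = (self.predictor(expert_sa) - self.target(expert_sa)).pow(2).mean()
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()

    def reward(self, sa):
        with torch.no_grad():
            err = (self.predictor(sa) - self.target(sa)).pow(2).mean(dim=-1)
        return torch.exp(-err)   # fixed reward handed to any standard RL algorithm
```

Because the reward is computed once from the expert data and then held fixed, the downstream policy optimisation stays within the ordinary reinforcement learning setting and avoids the adversarial training loop.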

End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation-State Malware

Title End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation-State Malware
Authors Ishai Rosenberg, Guillaume Sicard, Eli David
Abstract Malware allegedly developed by nation-states, also known as advanced persistent threats (APTs), is becoming more common. The task of attributing an APT to a specific nation-state or classifying it to the correct APT family is challenging for several reasons. First, each nation-state has more than a single cyber unit that develops such malware, rendering traditional authorship attribution algorithms useless. Furthermore, the dataset of such available APTs is still extremely small. Finally, those APTs use state-of-the-art evasion techniques, making feature extraction challenging. In this paper, we use a deep neural network (DNN) as a classifier for nation-state APT attribution. We record the dynamic behavior of the APT when run in a sandbox and use it as raw input for the neural network, allowing the DNN to learn high-level feature abstractions of the APTs themselves. We also use the same raw features for APT family classification. Finally, we use the feature abstractions learned by the APT family classifier to solve the attribution problem. Using a test set of 1,000 Chinese- and Russian-developed APTs, we achieved an accuracy rate of 98.6%.
Tasks Transfer Learning
Published 2019-11-30
URL https://arxiv.org/abs/1912.01493v1
PDF https://arxiv.org/pdf/1912.01493v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-deep-neural-networks-and-transfer
Repo
Framework
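
The transfer step described in the abstract, reusing the representations of the family classifier for attribution, can be sketched roughly as follows; the input dimensionality, layer widths, and class counts are placeholders, not values from the paper.

```python
import torch.nn as nn

def build_body(in_dim=50000, hid=512):
    # Raw dynamic-analysis features recorded in the sandbox -> learned feature abstractions.
    return nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, hid), nn.ReLU())

body = build_body()
family_model = nn.Sequential(body, nn.Linear(512, 30))   # APT-family classifier (class count assumed)
# ... train family_model on the family-labelled samples ...

# Transfer: keep the learned feature body and attach a new attribution head.
for p in body.parameters():
    p.requires_grad_(False)                               # optionally freeze the transferred layers
attribution_model = nn.Sequential(body, nn.Linear(512, 2))  # e.g. two nation-states
```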

A Regularized Convolutional Neural Network for Semantic Image Segmentation

Title A Regularized Convolutional Neural Network for Semantic Image Segmentation
Authors Fan Jia, Jun Liu, Xue-cheng Tai
Abstract Convolutional neural networks (CNNs) show outstanding performance in many image processing problems, such as image recognition, object detection and image segmentation. Semantic segmentation is a very challenging task that requires recognizing and understanding what is in an image at the pixel level. Though the state of the art has been greatly improved by CNNs, there are no explicit connections between the predictions of neighbouring pixels. That is, spatial regularity of the segmented objects is still a problem for CNNs. In this paper, we propose a method to add spatial regularization to the segmented objects. In our method, spatial regularization such as total variation (TV) can be easily integrated into a CNN. It can help the CNN find a better local optimum and make the segmentation results more robust to noise. We apply our proposed method to Unet and Segnet, which are well-established CNNs for image segmentation, and test them on the WBC, CamVid and SUN-RGBD datasets, respectively. The results show that, thanks to the regularization effect, the regularized networks not only provide better segmentation results than the original ones but also have a certain robustness to noise.
Tasks Object Detection, Semantic Segmentation
Published 2019-06-28
URL https://arxiv.org/abs/1907.05287v1
PDF https://arxiv.org/pdf/1907.05287v1.pdf
PWC https://paperswithcode.com/paper/a-regularized-convolutional-neural-network
Repo
Framework
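
A minimal sketch of what adding a TV penalty to a segmentation loss can look like in training code is given below; the anisotropic TV on softmax probabilities and the weight `lam` are assumptions, since the paper integrates the regularizer in its own way.

```python
import torch
import torch.nn.functional as F

def tv_penalty(probs):
    """probs: (B, C, H, W) softmax output; penalise spatial variation of each class map."""
    dh = (probs[:, :, 1:, :] - probs[:, :, :-1, :]).abs().mean()
    dw = (probs[:, :, :, 1:] - probs[:, :, :, :-1]).abs().mean()
    return dh + dw

def regularized_loss(logits, target, lam=0.1):
    ce = F.cross_entropy(logits, target)           # usual pixel-wise loss
    tv = tv_penalty(F.softmax(logits, dim=1))      # encourages spatially regular masks
    return ce + lam * tv
```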

The Medical Deconfounder: Assessing Treatment Effects with Electronic Health Records

Title The Medical Deconfounder: Assessing Treatment Effects with Electronic Health Records
Authors Linying Zhang, Yixin Wang, Anna Ostropolets, Jami J. Mulgrave, David M. Blei, George Hripcsak
Abstract The treatment effects of medications play a key role in guiding medical prescriptions. They are usually assessed with randomized controlled trials (RCTs), which are expensive. Recently, large-scale electronic health records (EHRs) have become available, opening up new opportunities for more cost-effective assessments. However, assessing a treatment effect from EHRs is challenging: it is biased by unobserved confounders, unmeasured variables that affect both patients’ medical prescriptions and their outcomes, e.g., the patients’ socioeconomic status. To adjust for unobserved confounders, we develop the medical deconfounder, a machine learning algorithm that unbiasedly estimates treatment effects from EHRs. The medical deconfounder first constructs a substitute confounder by modeling which medications were prescribed to each patient; this substitute confounder is guaranteed to capture all multi-medication confounders, observed or unobserved (arXiv:1805.06826). It then uses this substitute confounder to adjust for the confounding bias in the analysis. We validate the medical deconfounder on two simulated and two real medical data sets. Compared to classical approaches, the medical deconfounder produces closer-to-truth treatment effect estimates; it also identifies effective medications that are more consistent with the findings in the medical literature.
Tasks
Published 2019-04-03
URL https://arxiv.org/abs/1904.02098v2
PDF https://arxiv.org/pdf/1904.02098v2.pdf
PWC https://paperswithcode.com/paper/the-medical-deconfounder-assessing-treatment
Repo
Framework
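
The two-step recipe in the abstract (fit a factor model over prescriptions, then adjust the outcome model with the substitute confounder) can be sketched on toy data; the factor model, its dimensionality, and the linear outcome model are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
A = rng.binomial(1, 0.3, size=(500, 20)).astype(float)   # patients x medications (toy data)
y = A @ rng.normal(size=20) + rng.normal(size=500)       # toy outcome

# Step 1: factor model over the prescribed medications -> substitute confounder Z.
Z = FactorAnalysis(n_components=3, random_state=0).fit_transform(A)

# Step 2: outcome model that includes Z alongside the treatments.
naive = LinearRegression().fit(A, y)                     # biased if confounding is present
adjusted = LinearRegression().fit(np.hstack([A, Z]), y)  # coefficients on A estimate the effects
effects = adjusted.coef_[:A.shape[1]]
```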

Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction

Title Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction
Authors Bobak Farzin, Piotr Czapla, Jeremy Howard
Abstract Our entry into the HAHA 2019 Challenge placed $3^{rd}$ in the classification task and $2^{nd}$ in the regression task. We describe our system and innovations, and compare our results to a Naive Bayes baseline. A large Twitter-based corpus allowed us to train a language model from scratch focused on Spanish and transfer that knowledge to our competition model. To overcome the inherent errors in some labels, we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others.
Tasks Language Modelling
Published 2019-07-06
URL https://arxiv.org/abs/1907.03187v1
PDF https://arxiv.org/pdf/1907.03187v1.pdf
PWC https://paperswithcode.com/paper/applying-a-pre-trained-language-model-to
Repo
Framework
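
The label-smoothing trick mentioned in the abstract amounts to mixing the one-hot targets with a uniform distribution inside the loss; a generic sketch (the smoothing factor is an assumption) is:

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, target, eps=0.1):
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Soften the one-hot target to reduce over-confidence on noisy labels.
    soft_target = torch.full_like(log_probs, eps / n_classes)
    soft_target.scatter_(-1, target.unsqueeze(-1), 1.0 - eps + eps / n_classes)
    return -(soft_target * log_probs).sum(dim=-1).mean()
```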

Convergence of Edge Computing and Deep Learning: A Comprehensive Survey

Title Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
Authors Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen
Abstract Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people’s lives, from face recognition to ambitious smart factories and cities, developments of artificial intelligence (especially deep learning, DL) based applications and services are thriving. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of “providing artificial intelligence for every person and every organization at everywhere”. Thus, unleashing DL services using resources at the network edge near the data sources has emerged as a desirable solution. Therefore, edge intelligence, aiming to facilitate the deployment of DL services by edge computing, has received significant attention. In addition, DL, as the representative technique of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually beneficial edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely DL training and inference in the customized edge computing framework; 3) challenges and future trends of more pervasive and fine-grained intelligence. We believe that by consolidating information scattered across the communication, networking, and DL areas, this survey can help readers to understand the connections between enabling technologies while promoting further discussions on the fusion of edge intelligence and intelligent edge, i.e., Edge DL.
Tasks Face Recognition
Published 2019-07-19
URL https://arxiv.org/abs/1907.08349v3
PDF https://arxiv.org/pdf/1907.08349v3.pdf
PWC https://paperswithcode.com/paper/convergence-of-edge-computing-and-deep
Repo
Framework

Efficient Truncated Statistics with Unknown Truncation

Title Efficient Truncated Statistics with Unknown Truncation
Authors Vasilis Kontonis, Christos Tzamos, Manolis Zampetakis
Abstract We study the problem of estimating the parameters of a Gaussian distribution when samples are only shown if they fall in some (unknown) subset $S \subseteq \mathbb{R}^d$. This core problem in truncated statistics has a long history going back to Galton, Lee, Pearson and Fisher. Recent work by Daskalakis et al. (FOCS’18) provides the first efficient algorithm that works for arbitrary sets in high dimension when the set is known, but leaves as an open problem the more challenging and relevant case of an unknown truncation set. Our main result is a computationally and sample-efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets, whose performance decays with a natural measure of complexity of the set, namely its Gaussian surface area. Notably, this algorithm works for large families of sets including intersections of halfspaces, polynomial threshold functions and general convex sets. We show that our algorithm closely captures the tradeoff between the complexity of the set and the number of samples needed to learn the parameters by exhibiting a set with small Gaussian surface area for which it is information-theoretically impossible to learn the true Gaussian with few samples.
Tasks
Published 2019-08-02
URL https://arxiv.org/abs/1908.01034v1
PDF https://arxiv.org/pdf/1908.01034v1.pdf
PWC https://paperswithcode.com/paper/efficient-truncated-statistics-with-unknown
Repo
Framework
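
For background, the standard truncated-Gaussian likelihood that underlies this estimation problem (not the paper’s algorithm, which must additionally cope with the unknown $S$) is

$$p(x;\mu,\Sigma,S)=\frac{\mathcal{N}(x;\mu,\Sigma)}{\int_S \mathcal{N}(z;\mu,\Sigma)\,dz}\quad\text{for } x\in S,$$

so the population log-likelihood to be maximised is

$$\ell(\mu,\Sigma)=\mathbb{E}_{x}\big[\log\mathcal{N}(x;\mu,\Sigma)\big]-\log\int_S \mathcal{N}(z;\mu,\Sigma)\,dz.$$

The difficulty addressed by the paper is that the normalising integral cannot be evaluated when $S$ is unknown; the sample complexity of the proposed estimator is then governed by the Gaussian surface area of $S$.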

Persistence B-Spline Grids: Stable Vector Representation of Persistence Diagrams Based on Data Fitting

Title Persistence B-Spline Grids: Stable Vector Representation of Persistence Diagrams Based on Data Fitting
Authors Zhetong Dong, Hongwei Lin, Chi Zhou
Abstract Over the last decades, many attempts have been made to optimally integrate machine learning (ML) and topological data analysis. A prominent problem in applying persistent homology to ML tasks is finding a vector representation of a persistence diagram (PD), which is a summary diagram for representing topological features. From the perspective of data fitting, a stable vector representation, persistence B-spline grid (PB), is proposed based on the efficient technique of progressive-iterative approximation for least-squares B-spline surface fitting. Meanwhile, we theoretically prove that the PB method is stable with respect to the metrics defined on the PD space, i.e., the $p$-Wasserstein distance and the bottleneck distance. The proposed method was tested on a synthetic dataset, datasets of randomly generated PDs, data of a dynamical system, and 3D CAD models.
Tasks Topological Data Analysis
Published 2019-09-17
URL https://arxiv.org/abs/1909.08417v1
PDF https://arxiv.org/pdf/1909.08417v1.pdf
PWC https://paperswithcode.com/paper/persistence-b-spline-grids-stable-vector
Repo
Framework

DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks

Title DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks
Authors Semih Yavuz, Abhinav Rastogi, Guan-Lin Chao, Dilek Hakkani-Tur
Abstract Recent advances in neural sequence-to-sequence models have led to promising results for several language generation-based tasks, including dialogue response generation, summarization, and machine translation. However, these models are known to have several problems, especially in the context of chit-chat based dialogue systems: they tend to generate short and dull responses that are often too generic. Furthermore, these models do not ground conversational responses on knowledge and facts, resulting in turns that are not accurate, informative and engaging for the users. In this paper, we propose and experiment with a series of response generation models that aim to serve in the general scenario where in addition to the dialogue context, relevant unstructured external knowledge in the form of text is also assumed to be available for models to harness. Our proposed approach extends pointer-generator networks (See et al., 2017) by allowing the decoder to hierarchically attend and copy from external knowledge in addition to the dialogue context. We empirically show the effectiveness of the proposed model compared to several baselines including (Ghazvininejad et al., 2018; Zhang et al., 2018) through both automatic evaluation metrics and human evaluation on CONVAI2 dataset.
Tasks Machine Translation, Text Generation
Published 2019-08-28
URL https://arxiv.org/abs/1908.10731v1
PDF https://arxiv.org/pdf/1908.10731v1.pdf
PWC https://paperswithcode.com/paper/deepcopy-grounded-response-generation-with
Repo
Framework
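
The copy mechanism that the model builds on (pointer-generator networks, See et al., 2017) mixes a generation distribution with attention-based copying; a minimal sketch of that mixture, with assumed tensor shapes, is:

```python
import torch

def copy_mixture(p_vocab, attn, src_token_ids, p_gen):
    """p_vocab: (B, V) generation distribution; attn: (B, L) attention over source tokens;
    src_token_ids: (B, L) vocabulary ids of the source tokens; p_gen: (B, 1) generation gate."""
    gen = p_gen * p_vocab
    copy = torch.zeros_like(p_vocab)
    copy.scatter_add_(1, src_token_ids, (1.0 - p_gen) * attn)  # add copy mass to source tokens
    return gen + copy                                          # final distribution over the vocabulary
```

DeepCopy’s contribution is to compute this attention hierarchically, so the decoder can copy both from the dialogue context and from the external knowledge sentences.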

Clustering Activity-Travel Behavior Time Series using Topological Data Analysis

Title Clustering Activity-Travel Behavior Time Series using Topological Data Analysis
Authors Renjie Chen, Jingyue Zhang, Nalini Ravishanker, Karthik Konduri
Abstract Over the last few years, traffic data has been exploding and the transportation discipline has entered the era of big data. This brings new opportunities for data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new Divide-and-Combine-based approach to K-means clustering of activity-travel behavior time series, using features derived with tools from Time Series Analysis and Topological Data Analysis. Clustering data from five waves of the National Household Travel Survey ranging from 1990 to 2017 suggests that activity-travel patterns of individuals over the last three decades can be grouped into three clusters. Results also provide evidence in support of recent claims about differences in activity-travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited to activity-travel behavior analysis in transportation studies. Driving behavior, travel mode choice, and household vehicle ownership, when characterized as categorical time series, can all be analyzed using the proposed method.
Tasks Time Series, Time Series Analysis, Topological Data Analysis
Published 2019-07-17
URL https://arxiv.org/abs/1907.07603v1
PDF https://arxiv.org/pdf/1907.07603v1.pdf
PWC https://paperswithcode.com/paper/clustering-activity-travel-behavior-time
Repo
Framework
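
As a toy illustration of the final clustering step with K = 3 (the number of clusters the abstract reports), the sketch below uses simple summary statistics as a stand-in for the Time Series Analysis and TDA features actually derived in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def simple_features(series):
    """Stand-in for the time-series / topological features used in the paper."""
    s = np.asarray(series, dtype=float)
    return np.array([s.mean(), s.std(), np.abs(np.diff(s)).mean()])

# Toy categorical activity-travel sequences (e.g., 96 fifteen-minute slots per person).
X = np.vstack([simple_features(np.random.default_rng(i).integers(0, 5, 96)) for i in range(200)])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```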

Artificially Evolved Chunks for Morphosyntactic Analysis

Title Artificially Evolved Chunks for Morphosyntactic Analysis
Authors Mark Anderson, David Vilares, Carlos Gómez-Rodríguez
Abstract We introduce a language-agnostic evolutionary technique for automatically extracting chunks from dependency treebanks. We evaluate these chunks on a number of morphosyntactic tasks, namely POS tagging, morphological feature tagging, and dependency parsing. We test the utility of these chunks in a host of different ways. We first learn chunking as one task in a shared multi-task framework together with POS and morphological feature tagging. The predictions from this network are then used as input to augment sequence-labelling dependency parsing. Finally, we investigate the impact chunks have on dependency parsing in a multi-task framework. Our results from these analyses show that these chunks improve performance at different levels of syntactic abstraction on English UD treebanks and a small, diverse subset of non-English UD treebanks.
Tasks Chunking, Dependency Parsing
Published 2019-08-09
URL https://arxiv.org/abs/1908.03480v2
PDF https://arxiv.org/pdf/1908.03480v2.pdf
PWC https://paperswithcode.com/paper/artificially-evolved-chunks-for
Repo
Framework

Optimizing Rank-based Metrics with Blackbox Differentiation

Title Optimizing Rank-based Metrics with Blackbox Differentiation
Authors Michal Rolínek, Vít Musil, Anselm Paulus, Marin Vlastelica, Claudio Michaelis, Georg Martius
Abstract Rank-based metrics are some of the most widely used criteria for performance evaluation of computer vision models. Despite years of effort, direct optimization for these metrics remains a challenge due to their non-differentiable and non-decomposable nature. We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent. In addition, we address optimization instability and sparsity of the supervision signal that both arise from using rank-based metrics as optimization targets. Resulting losses based on recall and Average Precision are applied to image retrieval and object detection tasks. We obtain performance that is competitive with state-of-the-art on standard image retrieval datasets and consistently improve performance of near state-of-the-art object detectors. The code is available at https://github.com/martius-lab/blackbox-backprop
Tasks Image Retrieval, Object Detection
Published 2019-12-07
URL https://arxiv.org/abs/1912.03500v2
PDF https://arxiv.org/pdf/1912.03500v2.pdf
PWC https://paperswithcode.com/paper/optimizing-rank-based-metrics-with-blackbox
Repo
Framework
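
The blackbox-differentiation scheme can be sketched for a ranking “solver” as follows; the interpolation constant and the treatment of ranking as a linear-objective solver follow the authors’ general recipe, but the details here are assumptions.

```python
import torch

def rank(scores):
    """Piecewise-constant combinatorial solver: rank of each item (0 = highest score)."""
    return torch.argsort(torch.argsort(scores, dim=-1, descending=True), dim=-1).float()

class BlackboxRank(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, lam=10.0):
        ranks = rank(scores)
        ctx.lam = lam
        ctx.save_for_backward(scores, ranks)
        return ranks

    @staticmethod
    def backward(ctx, grad_output):
        scores, ranks = ctx.saved_tensors
        perturbed = scores + ctx.lam * grad_output           # informed perturbation of the solver input
        ranks_perturbed = rank(perturbed)
        grad_scores = -(ranks - ranks_perturbed) / ctx.lam   # gradient of a piecewise-linear interpolation
        return grad_scores, None
```

A recall- or Average-Precision-style loss can then be written directly on `BlackboxRank.apply(scores)` and trained with ordinary mini-batch gradient descent.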

Hop: Heterogeneity-Aware Decentralized Training

Title Hop: Heterogeneity-Aware Decentralized Training
Authors Qinyi Luo, Jinkun Lin, Youwei Zhuo, Xuehai Qian
Abstract Recent work has shown that decentralized algorithms can deliver superior performance over centralized ones in the context of machine learning. The two approaches, with the main difference residing in their distinct communication patterns, are both susceptible to performance degradation in heterogeneous environments. Although vigorous efforts have been devoted to supporting centralized algorithms against heterogeneity, little has been explored in decentralized algorithms regarding this problem. This paper proposes Hop, the first heterogeneity-aware decentralized training protocol. Based on a unique characteristic of decentralized training that we have identified, the iteration gap, we propose a queue-based synchronization mechanism that can efficiently implement backup workers and bounded staleness in the decentralized setting. To cope with deterministic slowdown, we propose skipping iterations so that the effect of slower workers is further mitigated. We build a prototype implementation of Hop on TensorFlow. The experiment results on CNN and SVM show significant speedup over standard decentralized training in heterogeneous settings.
Tasks
Published 2019-02-04
URL http://arxiv.org/abs/1902.01064v2
PDF http://arxiv.org/pdf/1902.01064v2.pdf
PWC https://paperswithcode.com/paper/hop-heterogeneity-aware-decentralized
Repo
Framework
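
A rough sketch of the bounded-staleness idea behind the queue-based synchronisation (the actual Hop protocol is implemented inside TensorFlow and also handles backup workers and iteration skipping) might look like:

```python
import queue

class BoundedNeighborQueue:
    """One queue of model updates per neighbour. A bounded queue blocks a fast sender,
    so no worker runs more than `staleness` iterations ahead of any neighbour."""
    def __init__(self, staleness=4):
        self.q = queue.Queue(maxsize=staleness)

    def send(self, update):
        self.q.put(update, block=True)    # fast worker blocks once the staleness bound is hit

    def receive(self):
        return self.q.get(block=True)     # slow worker drains pending updates as it catches up
```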

MIMA: MAPPER-Induced Manifold Alignment for Semi-Supervised Fusion of Optical Image and Polarimetric SAR Data

Title MIMA: MAPPER-Induced Manifold Alignment for Semi-Supervised Fusion of Optical Image and Polarimetric SAR Data
Authors Jingliang Hu, Danfeng Hong, Xiao Xiang Zhu
Abstract Multi-modal data fusion has recently shown promise in classification tasks in remote sensing. Optical data and radar data, two important yet intrinsically different data sources, are attracting more and more attention for potential data fusion. It is already widely known that a machine learning based methodology often yields excellent performance. However, the methodology relies on a large training set, which is very expensive to obtain in remote sensing. The semi-supervised manifold alignment (SSMA), a multi-modal data fusion algorithm, has been designed to amplify the impact of an existing training set by linking labeled data to unlabeled data via unsupervised techniques. In this paper, we explore the potential of SSMA for fusing optical data and polarimetric SAR data, which are multi-sensory data sources. Furthermore, we propose a MAPPER-induced manifold alignment (MIMA) for semi-supervised fusion of multi-sensory data sources. Our proposed method unites SSMA with MAPPER, which is developed from the emerging field of topological data analysis (TDA). To the best of our knowledge, this is the first time that SSMA has been applied to fusing optical data and SAR data, and also the first time that TDA has been applied in remote sensing. The conventional SSMA derives a topological structure using k-nearest-neighbor (kNN), while MIMA employs MAPPER, which considers field knowledge and derives a novel topological structure through spectral clustering in a data-driven fashion. Experimental results on data fusion for land cover land use classification and local climate zone classification suggest superior performance of MIMA.
Tasks Topological Data Analysis
Published 2019-06-13
URL https://arxiv.org/abs/1906.05512v1
PDF https://arxiv.org/pdf/1906.05512v1.pdf
PWC https://paperswithcode.com/paper/mima-mapper-induced-manifold-alignment-for
Repo
Framework
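
Since MAPPER is central to MIMA, a generic sketch of its node construction (filter function, overlapping cover, per-preimage clustering) is given below; DBSCAN stands in for the spectral clustering used in the paper, and all parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def mapper_nodes(X, filter_values, n_intervals=10, overlap=0.3, eps=0.5):
    """Cover the filter range with overlapping intervals and cluster each preimage;
    every resulting cluster becomes one node of the MAPPER graph."""
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for i in range(n_intervals):
        a = lo + i * length - overlap * length
        b = lo + (i + 1) * length + overlap * length
        idx = np.where((filter_values >= a) & (filter_values <= b))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps).fit_predict(X[idx])
        for lab in set(labels) - {-1}:
            nodes.append(idx[labels == lab])   # member indices of one MAPPER node
    return nodes
```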