Paper Group ANR 669
On Architectures for Including Visual Information in Neural Language Models for Image Description
Title | On Architectures for Including Visual Information in Neural Language Models for Image Description |
Authors | Marc Tanti, Albert Gatt, Kenneth P. Camilleri |
Abstract | A neural language model can be conditioned into generating descriptions for images by providing visual information apart from the sentence prefix. This visual information can be included into the language model through different points of entry resulting in different neural architectures. We identify four main architectures which we call init-inject, pre-inject, par-inject, and merge. We analyse these four architectures and conclude that the best performing one is init-inject, which is when the visual information is injected into the initial state of the recurrent neural network. We confirm this using both automatic evaluation measures and human annotation. We then analyse how much influence the images have on each architecture. This is done by measuring how different the output probabilities of a model are when a partial sentence is combined with a completely different image from the one it is meant to be combined with. We find that init-inject tends to quickly become less influenced by the image as more words are generated. A different architecture called merge, which is when the visual information is merged with the recurrent neural network’s hidden state vector prior to output, loses visual influence much more slowly, suggesting that it would work better for generating longer sentences. We also observe that the merge architecture can have its recurrent neural network pre-trained in a text-only language model (transfer learning) rather than be initialised randomly as usual. This results in even better performance than the other architectures, provided that the source language model is not too good at language modelling or it will overspecialise and be less effective at image description generation. Our work opens up new avenues of research in neural architectures, explainable AI, and transfer learning. |
Tasks | Language Modelling, Transfer Learning |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03738v1 |
https://arxiv.org/pdf/1911.03738v1.pdf | |
PWC | https://paperswithcode.com/paper/on-architectures-for-including-visual |
Repo | |
Framework | |
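The image-influence analysis described in the abstract above compares a model's output probabilities for the same sentence prefix under two different images. As a rough illustration only, the sketch below scores that difference with the Jensen-Shannon divergence; the abstract does not name the distance used, so the choice of measure, the function name `jensen_shannon`, and the toy distributions are all assumptions.

```python
import math

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing by convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical next-word distributions for the same sentence prefix,
# once with the correct image and once with an unrelated image.
with_true_image = [0.7, 0.2, 0.1]
with_other_image = [0.1, 0.2, 0.7]

# A value near 0 means the image barely affects the prediction;
# a value near 1 means the prediction depends strongly on the image.
influence = jensen_shannon(with_true_image, with_other_image)
```

Tracking such a score at each word position would show whether the image's influence decays as the sentence gets longer, which is the kind of trend the abstract reports for init-inject.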
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation
Title | Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation |
Authors | Ruohan Wang, Carlo Ciliberto, Pierluigi Amadori, Yiannis Demiris |
Abstract | We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert’s reward function via inverse reinforcement learning, followed by reinforcement learning is indirect and may be computationally expensive. Recent generative adversarial methods based on matching the policy distribution between the expert and the agent could be unstable during training. We propose a new framework for imitation learning by estimating the support of the expert policy to compute a fixed reward function, which allows us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on both discrete and continuous domains, achieving comparable or better performance than the state of the art under different reinforcement learning algorithms. |
Tasks | Imitation Learning |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06750v2 |
https://arxiv.org/pdf/1905.06750v2.pdf | |
PWC | https://paperswithcode.com/paper/random-expert-distillation-imitation-learning |
Repo | |
Framework | |
End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation-State Malware
Title | End-to-End Deep Neural Networks and Transfer Learning for Automatic Analysis of Nation-State Malware |
Authors | Ishai Rosenberg, Guillaume Sicard, Eli David |
Abstract | Malware allegedly developed by nation-states, also known as advanced persistent threats (APTs), is becoming more common. The task of attributing an APT to a specific nation-state or classifying it to the correct APT family is challenging for several reasons. First, each nation-state has more than a single cyber unit that develops such malware, rendering traditional authorship attribution algorithms useless. Furthermore, the dataset of such available APTs is still extremely small. Finally, those APTs use state-of-the-art evasion techniques, making feature extraction challenging. In this paper, we use a deep neural network (DNN) as a classifier for nation-state APT attribution. We record the dynamic behavior of the APT when run in a sandbox and use it as raw input for the neural network, allowing the DNN itself to learn high-level feature abstractions of the APTs. We also use the same raw features for APT family classification. Finally, we use the feature abstractions learned by the APT family classifier to solve the attribution problem. Using a test set of 1000 Chinese- and Russian-developed APTs, we achieved an accuracy rate of 98.6%. |
Tasks | Transfer Learning |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.01493v1 |
https://arxiv.org/pdf/1912.01493v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-deep-neural-networks-and-transfer |
Repo | |
Framework | |
A Regularized Convolutional Neural Network for Semantic Image Segmentation
Title | A Regularized Convolutional Neural Network for Semantic Image Segmentation |
Authors | Fan Jia, Jun Liu, Xue-cheng Tai |
Abstract | Convolutional neural networks (CNNs) show outstanding performance in many image processing problems, such as image recognition, object detection and image segmentation. Semantic segmentation is a very challenging task that requires recognizing and understanding what’s in the image at the pixel level. Though the state of the art has been greatly improved by CNNs, there are no explicit connections between the predictions of neighbouring pixels. That is, spatial regularity of the segmented objects is still a problem for CNNs. In this paper, we propose a method to add spatial regularization to the segmented objects. In our method, spatial regularization such as total variation (TV) can be easily integrated into a CNN. It can help the CNN find a better local optimum and make the segmentation results more robust to noise. We apply our proposed method to Unet and Segnet, which are well-established CNNs for image segmentation, and test them on the WBC, CamVid and SUN-RGBD datasets, respectively. The results show that the regularized networks not only provide better segmentation results, with a regularization effect, than the original ones but also have a certain robustness to noise. |
Tasks | Object Detection, Semantic Segmentation |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1907.05287v1 |
https://arxiv.org/pdf/1907.05287v1.pdf | |
PWC | https://paperswithcode.com/paper/a-regularized-convolutional-neural-network |
Repo | |
Framework | |
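To make the total variation (TV) term mentioned in the abstract above concrete, here is a minimal sketch that computes the anisotropic TV of a small segmentation mask. It only shows what the penalty measures; how the paper integrates TV into the network is not described in this listing, and the function name and toy masks are illustrative assumptions.

```python
def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighbouring pixels."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbour
                tv += abs(img[r + 1][c] - img[r][c])
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(img[r][c + 1] - img[r][c])
    return tv

# A clean two-region mask with a single straight boundary.
smooth_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# The same mask with one mislabelled pixel that roughens the boundary.
noisy_mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

tv_smooth = total_variation(smooth_mask)
tv_noisy = total_variation(noisy_mask)
```

The irregular mask has the larger TV, so adding a TV penalty to a segmentation loss would favour spatially regular predictions.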
The Medical Deconfounder: Assessing Treatment Effects with Electronic Health Records
Title | The Medical Deconfounder: Assessing Treatment Effects with Electronic Health Records |
Authors | Linying Zhang, Yixin Wang, Anna Ostropolets, Jami J. Mulgrave, David M. Blei, George Hripcsak |
Abstract | The treatment effects of medications play a key role in guiding medical prescriptions. They are usually assessed with randomized controlled trials (RCTs), which are expensive. Recently, large-scale electronic health records (EHRs) have become available, opening up new opportunities for more cost-effective assessments. However, assessing a treatment effect from EHRs is challenging: it is biased by unobserved confounders, i.e. unmeasured variables that affect both patients’ medical prescription and their outcome, e.g. the patients’ socioeconomic status. To adjust for unobserved confounders, we develop the medical deconfounder, a machine learning algorithm that unbiasedly estimates treatment effects from EHRs. The medical deconfounder first constructs a substitute confounder by modeling which medications were prescribed to each patient; this substitute confounder is guaranteed to capture all multi-medication confounders, observed or unobserved (arXiv:1805.06826). It then uses this substitute confounder to adjust for the confounding bias in the analysis. We validate the medical deconfounder on two simulated and two real medical data sets. Compared to classical approaches, the medical deconfounder produces closer-to-truth treatment effect estimates; it also identifies effective medications that are more consistent with the findings in the medical literature. |
Tasks | |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02098v2 |
https://arxiv.org/pdf/1904.02098v2.pdf | |
PWC | https://paperswithcode.com/paper/the-medical-deconfounder-assessing-treatment |
Repo | |
Framework | |
Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction
Title | Applying a Pre-trained Language Model to Spanish Twitter Humor Prediction |
Authors | Bobak Farzin, Piotr Czapla, Jeremy Howard |
Abstract | Our entry into the HAHA 2019 Challenge placed $3^{rd}$ in the classification task and $2^{nd}$ in the regression task. We describe our system and innovations, as well as comparing our results to a Naive Bayes baseline. A large Twitter based corpus allowed us to train a language model from scratch focused on Spanish and transfer that knowledge to our competition model. To overcome the inherent errors in some labels we reduce our class confidence with label smoothing in the loss function. All the code for our project is included in a GitHub repository for easy reference and to enable replication by others. |
Tasks | Language Modelling |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03187v1 |
https://arxiv.org/pdf/1907.03187v1.pdf | |
PWC | https://paperswithcode.com/paper/applying-a-pre-trained-language-model-to |
Repo | |
Framework | |
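The label smoothing mentioned in the abstract above can be sketched in a few lines. This is the common uniform-smoothing variant, in which a fraction `eps` of the target mass is spread evenly over all classes; the value `eps=0.1` and the function names are assumptions, not details taken from the paper.

```python
import math

def smooth_labels(true_class, num_classes, eps=0.1):
    """Target distribution: (1 - eps) * one-hot + eps * uniform."""
    off = eps / num_classes
    return [1.0 - eps + off if k == true_class else off
            for k in range(num_classes)]

def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# A very confident (hypothetical) prediction for a binary humor classifier.
predicted = [0.999, 0.001]

hard_target = [1.0, 0.0]
soft_target = smooth_labels(true_class=0, num_classes=2, eps=0.1)

hard_loss = cross_entropy(hard_target, predicted)
soft_loss = cross_entropy(soft_target, predicted)
```

With smoothed targets, an extremely confident prediction is penalised more than with hard targets, which discourages the model from fitting possibly mislabelled examples with full confidence.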
Convergence of Edge Computing and Deep Learning: A Comprehensive Survey
Title | Convergence of Edge Computing and Deep Learning: A Comprehensive Survey |
Authors | Xiaofei Wang, Yiwen Han, Victor C. M. Leung, Dusit Niyato, Xueqiang Yan, Xu Chen |
Abstract | Ubiquitous sensors and smart devices from factories and communities are generating massive amounts of data, and ever-increasing computing power is driving the core of computation and services from the cloud to the edge of the network. As an important enabler broadly changing people’s lives, from face recognition to ambitious smart factories and cities, developments of artificial intelligence (especially deep learning, DL) based applications and services are thriving. However, due to efficiency and latency issues, the current cloud computing service architecture hinders the vision of “providing artificial intelligence for every person and every organization at everywhere”. Thus, unleashing DL services using resources at the network edge near the data sources has emerged as a desirable solution. Therefore, edge intelligence, aiming to facilitate the deployment of DL services by edge computing, has received significant attention. In addition, DL, as the representative technique of artificial intelligence, can be integrated into edge computing frameworks to build intelligent edge for dynamic, adaptive edge maintenance and management. With regard to mutually beneficial edge intelligence and intelligent edge, this paper introduces and discusses: 1) the application scenarios of both; 2) the practical implementation methods and enabling technologies, namely DL training and inference in the customized edge computing framework; 3) challenges and future trends of more pervasive and fine-grained intelligence. We believe that by consolidating information scattered across the communication, networking, and DL areas, this survey can help readers to understand the connections between enabling technologies while promoting further discussions on the fusion of edge intelligence and intelligent edge, i.e., Edge DL. |
Tasks | Face Recognition |
Published | 2019-07-19 |
URL | https://arxiv.org/abs/1907.08349v3 |
https://arxiv.org/pdf/1907.08349v3.pdf | |
PWC | https://paperswithcode.com/paper/convergence-of-edge-computing-and-deep |
Repo | |
Framework | |
Efficient Truncated Statistics with Unknown Truncation
Title | Efficient Truncated Statistics with Unknown Truncation |
Authors | Vasilis Kontonis, Christos Tzamos, Manolis Zampetakis |
Abstract | We study the problem of estimating the parameters of a Gaussian distribution when samples are only shown if they fall in some (unknown) subset $S \subseteq \mathbb{R}^d$. This core problem in truncated statistics has a long history going back to Galton, Lee, Pearson and Fisher. Recent work by Daskalakis et al. (FOCS’18) provides the first efficient algorithm that works for arbitrary sets in high dimension when the set is known, but leaves as an open problem the more challenging and relevant case of an unknown truncation set. Our main result is a computationally and sample efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets whose performance decays with a natural measure of complexity of the set, namely its Gaussian surface area. Notably, this algorithm works for large families of sets including intersections of halfspaces, polynomial threshold functions and general convex sets. We show that our algorithm closely captures the tradeoff between the complexity of the set and the number of samples needed to learn the parameters by exhibiting a set with small Gaussian surface area for which it is information theoretically impossible to learn the true Gaussian with few samples. |
Tasks | |
Published | 2019-08-02 |
URL | https://arxiv.org/abs/1908.01034v1 |
https://arxiv.org/pdf/1908.01034v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-truncated-statistics-with-unknown |
Repo | |
Framework | |
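To see why truncation makes parameter estimation non-trivial, the sketch below evaluates the closed-form mean of a one-dimensional Gaussian observed only on a half-line $x \ge a$. This is the simplest known-set case, used purely as an illustration of the bias; it is not the paper's algorithm, which handles unknown sets in high dimension. The function name is hypothetical.

```python
import math

def truncated_mean(mu, sigma, a):
    """Mean of N(mu, sigma^2) conditioned on x >= a."""
    z = (a - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density at z
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))               # P(Z >= z)
    return mu + sigma * pdf / tail

# The true mean is 0, but samples are only shown when x >= 0.
observed_mean = truncated_mean(mu=0.0, sigma=1.0, a=0.0)
```

Here the mean of the visible samples equals $\sqrt{2/\pi} \approx 0.80$ rather than 0, so a naive sample average over the truncated data would be badly biased.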
Persistence B-Spline Grids: Stable Vector Representation of Persistence Diagrams Based on Data Fitting
Title | Persistence B-Spline Grids: Stable Vector Representation of Persistence Diagrams Based on Data Fitting |
Authors | Zhetong Dong, Hongwei Lin, Chi Zhou |
Abstract | Over the last decades, many attempts have been made to optimally integrate machine learning (ML) and topological data analysis. A prominent problem in applying persistent homology to ML tasks is finding a vector representation of a persistence diagram (PD), which is a summary diagram for representing topological features. From the perspective of data fitting, a stable vector representation, persistence B-spline grid (PB), is proposed based on the efficient technique of progressive-iterative approximation for least-squares B-spline surface fitting. Meanwhile, we theoretically prove that the PB method is stable with respect to the metrics defined on the PD space, i.e., the $p$-Wasserstein distance and the bottleneck distance. The proposed method was tested on a synthetic dataset, datasets of randomly generated PDs, data of a dynamical system, and 3D CAD models. |
Tasks | Topological Data Analysis |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.08417v1 |
https://arxiv.org/pdf/1909.08417v1.pdf | |
PWC | https://paperswithcode.com/paper/persistence-b-spline-grids-stable-vector |
Repo | |
Framework | |
DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks
Title | DeepCopy: Grounded Response Generation with Hierarchical Pointer Networks |
Authors | Semih Yavuz, Abhinav Rastogi, Guan-Lin Chao, Dilek Hakkani-Tur |
Abstract | Recent advances in neural sequence-to-sequence models have led to promising results for several language generation-based tasks, including dialogue response generation, summarization, and machine translation. However, these models are known to have several problems, especially in the context of chit-chat based dialogue systems: they tend to generate short and dull responses that are often too generic. Furthermore, these models do not ground conversational responses on knowledge and facts, resulting in turns that are not accurate, informative and engaging for the users. In this paper, we propose and experiment with a series of response generation models that aim to serve in the general scenario where, in addition to the dialogue context, relevant unstructured external knowledge in the form of text is also assumed to be available for models to harness. Our proposed approach extends pointer-generator networks (See et al., 2017) by allowing the decoder to hierarchically attend and copy from external knowledge in addition to the dialogue context. We empirically show the effectiveness of the proposed model compared to several baselines including (Ghazvininejad et al., 2018; Zhang et al., 2018) through both automatic evaluation metrics and human evaluation on the CONVAI2 dataset. |
Tasks | Machine Translation, Text Generation |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10731v1 |
https://arxiv.org/pdf/1908.10731v1.pdf | |
PWC | https://paperswithcode.com/paper/deepcopy-grounded-response-generation-with |
Repo | |
Framework | |
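The copy mechanism that the abstract above builds on can be illustrated with the basic pointer-generator mixture of See et al. (2017): the final word distribution blends the decoder's vocabulary distribution with attention weights over source tokens. The sketch covers only this single-source base case, not the hierarchical extension the paper proposes, and the token lists and probabilities are made up.

```python
def pointer_generator_dist(p_gen, vocab_probs, attention, source_tokens):
    """Mix generation and copy distributions.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on positions holding w.
    """
    final = {w: p_gen * p for w, p in vocab_probs.items()}
    for weight, token in zip(attention, source_tokens):
        # Copy probability flows to the token at each attended position,
        # including tokens that are missing from the fixed vocabulary.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final

# Hypothetical decoder state: "hiking" is out of vocabulary but appears
# in the source text (dialogue context or external knowledge).
vocab_probs = {"i": 0.5, "like": 0.3, "<unk>": 0.2}
source_tokens = ["i", "like", "hiking"]
attention = [0.1, 0.1, 0.8]

dist = pointer_generator_dist(0.6, vocab_probs, attention, source_tokens)
```

Because the out-of-vocabulary word receives probability through the copy term, the model can reproduce facts from the source text that its vocabulary alone could not generate.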
Clustering Activity-Travel Behavior Time Series using Topological Data Analysis
Title | Clustering Activity-Travel Behavior Time Series using Topological Data Analysis |
Authors | Renjie Chen, Jingyue Zhang, Nalini Ravishanker, Karthik Konduri |
Abstract | Over the last few years, traffic data has been exploding and the transportation discipline has entered the era of big data. It brings out new opportunities for doing data-driven analysis, but it also challenges traditional analytic methods. This paper proposes a new Divide and Combine based approach to do K means clustering on activity-travel behavior time series using features that are derived using tools in Time Series Analysis and Topological Data Analysis. Clustering data from five waves of the National Household Travel Survey ranging from 1990 to 2017 suggests that activity-travel patterns of individuals over the last three decades can be grouped into three clusters. Results also provide evidence in support of recent claims about differences in activity-travel patterns of different survey cohorts. The proposed method is generally applicable and is not limited only to activity-travel behavior analysis in transportation studies. Driving behavior, travel mode choice, household vehicle ownership, when being characterized as categorical time series, can all be analyzed using the proposed method. |
Tasks | Time Series, Time Series Analysis, Topological Data Analysis |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07603v1 |
https://arxiv.org/pdf/1907.07603v1.pdf | |
PWC | https://paperswithcode.com/paper/clustering-activity-travel-behavior-time |
Repo | |
Framework | |
Artificially Evolved Chunks for Morphosyntactic Analysis
Title | Artificially Evolved Chunks for Morphosyntactic Analysis |
Authors | Mark Anderson, David Vilares, Carlos Gómez-Rodríguez |
Abstract | We introduce a language-agnostic evolutionary technique for automatically extracting chunks from dependency treebanks. We evaluate these chunks on a number of morphosyntactic tasks, namely POS tagging, morphological feature tagging, and dependency parsing. We test the utility of these chunks in a host of different ways. We first learn chunking as one task in a shared multi-task framework together with POS and morphological feature tagging. The predictions from this network are then used as input to augment sequence-labelling dependency parsing. Finally, we investigate the impact chunks have on dependency parsing in a multi-task framework. Our results from these analyses show that these chunks improve performance at different levels of syntactic abstraction on English UD treebanks and a small, diverse subset of non-English UD treebanks. |
Tasks | Chunking, Dependency Parsing |
Published | 2019-08-09 |
URL | https://arxiv.org/abs/1908.03480v2 |
https://arxiv.org/pdf/1908.03480v2.pdf | |
PWC | https://paperswithcode.com/paper/artificially-evolved-chunks-for |
Repo | |
Framework | |
Optimizing Rank-based Metrics with Blackbox Differentiation
Title | Optimizing Rank-based Metrics with Blackbox Differentiation |
Authors | Michal Rolínek, Vít Musil, Anselm Paulus, Marin Vlastelica, Claudio Michaelis, Georg Martius |
Abstract | Rank-based metrics are some of the most widely used criteria for performance evaluation of computer vision models. Despite years of effort, direct optimization for these metrics remains a challenge due to their non-differentiable and non-decomposable nature. We present an efficient, theoretically sound, and general method for differentiating rank-based metrics with mini-batch gradient descent. In addition, we address optimization instability and sparsity of the supervision signal that both arise from using rank-based metrics as optimization targets. Resulting losses based on recall and Average Precision are applied to image retrieval and object detection tasks. We obtain performance that is competitive with state-of-the-art on standard image retrieval datasets and consistently improve performance of near state-of-the-art object detectors. The code is available at https://github.com/martius-lab/blackbox-backprop |
Tasks | Image Retrieval, Object Detection |
Published | 2019-12-07 |
URL | https://arxiv.org/abs/1912.03500v2 |
https://arxiv.org/pdf/1912.03500v2.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-rank-based-metrics-with-blackbox |
Repo | |
Framework | |
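Average Precision, one of the rank-based metrics named in the abstract above, depends on the scores only through their sort order. The following sketch computes it for a toy retrieval list to show why it is hard to optimise directly; it does not implement the paper's blackbox differentiation method (see the linked repository for that), and the function name and data are illustrative.

```python
def average_precision(scores, labels):
    """Average Precision of a ranking induced by sorting scores in descending order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at each relevant item
    return total / hits if hits else 0.0

labels = [1, 0, 1, 0]  # 1 = relevant item

ap_base = average_precision([0.9, 0.8, 0.70, 0.6], labels)
ap_nudged = average_precision([0.9, 0.8, 0.75, 0.6], labels)  # same ordering
ap_swapped = average_precision([0.9, 0.7, 0.80, 0.6], labels)  # items 2 and 3 swap
```

A small change in a score that leaves the ordering intact leaves the metric unchanged, while a change that swaps two items makes it jump, so the gradient with respect to the scores is zero almost everywhere.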
Hop: Heterogeneity-Aware Decentralized Training
Title | Hop: Heterogeneity-Aware Decentralized Training |
Authors | Qinyi Luo, Jinkun Lin, Youwei Zhuo, Xuehai Qian |
Abstract | Recent work has shown that decentralized algorithms can deliver superior performance over centralized ones in the context of machine learning. The two approaches, with the main difference residing in their distinct communication patterns, are both susceptible to performance degradation in heterogeneous environments. Although vigorous efforts have been devoted to supporting centralized algorithms against heterogeneity, little has been explored in decentralized algorithms regarding this problem. This paper proposes Hop, the first heterogeneity-aware decentralized training protocol. Based on a unique characteristic of decentralized training that we have identified, the iteration gap, we propose a queue-based synchronization mechanism that can efficiently implement backup workers and bounded staleness in the decentralized setting. To cope with deterministic slowdown, we propose skipping iterations so that the effect of slower workers is further mitigated. We build a prototype implementation of Hop on TensorFlow. The experiment results on CNN and SVM show significant speedup over standard decentralized training in heterogeneous settings. |
Tasks | |
Published | 2019-02-04 |
URL | http://arxiv.org/abs/1902.01064v2 |
http://arxiv.org/pdf/1902.01064v2.pdf | |
PWC | https://paperswithcode.com/paper/hop-heterogeneity-aware-decentralized |
Repo | |
Framework | |
MIMA: MAPPER-Induced Manifold Alignment for Semi-Supervised Fusion of Optical Image and Polarimetric SAR Data
Title | MIMA: MAPPER-Induced Manifold Alignment for Semi-Supervised Fusion of Optical Image and Polarimetric SAR Data |
Authors | Jingliang Hu, Danfeng Hong, Xiao Xiang Zhu |
Abstract | Multi-modal data fusion has recently shown promise in classification tasks in remote sensing. Optical data and radar data, two important yet intrinsically different data sources, are attracting more and more attention for potential data fusion. It is already widely known that a machine learning based methodology often yields excellent performance. However, the methodology relies on a large training set, which is very expensive to obtain in remote sensing. The semi-supervised manifold alignment (SSMA), a multi-modal data fusion algorithm, has been designed to amplify the impact of an existing training set by linking labeled data to unlabeled data via unsupervised techniques. In this paper, we explore the potential of SSMA in fusing optical data and polarimetric SAR data, which are multi-sensory data sources. Furthermore, we propose a MAPPER-induced manifold alignment (MIMA) for semi-supervised fusion of multi-sensory data sources. Our proposed method unites SSMA with MAPPER, which is developed from the emerging topological data analysis (TDA) field. To the best of our knowledge, this is the first time that SSMA has been applied to fusing optical data and SAR data, and also the first time that TDA has been applied in remote sensing. The conventional SSMA derives a topological structure using k-nearest-neighbor (kNN), while MIMA employs MAPPER, which considers the field knowledge and derives a novel topological structure through the spectral clustering in a data-driven fashion. Experimental results on data fusion with respect to land cover land use classification and local climate zone classification suggest superior performance of MIMA. |
Tasks | Topological Data Analysis |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05512v1 |
https://arxiv.org/pdf/1906.05512v1.pdf | |
PWC | https://paperswithcode.com/paper/mima-mapper-induced-manifold-alignment-for |
Repo | |
Framework | |