February 2, 2020

3243 words 16 mins read

Paper Group AWR 71

On the consistency of supervised learning with missing values

Title On the consistency of supervised learning with missing values
Authors Julie Josse, Nicolas Prost, Erwan Scornet, Gaël Varoquaux
Abstract In many application settings, the data have missing features, which makes data analysis challenging. An abundant literature addresses missing data in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches in prediction. A striking result is that the widely used method of imputing with the mean prior to learning is consistent when missing values are not informative. This contrasts with inferential settings, where mean imputation is criticized for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data through multiple imputation. We further analyze decision trees, which can naturally tackle empirical risk minimization with missing values thanks to their ability to handle the half-discrete nature of incomplete variables. After comparing different missing-values strategies in trees theoretically and empirically, we recommend the “missing incorporated in attribute” (MIA) method, as it can handle both non-informative and informative missing values.
Tasks Imputation
Published 2019-02-19
URL http://arxiv.org/abs/1902.06931v2
PDF http://arxiv.org/pdf/1902.06931v2.pdf
PWC https://paperswithcode.com/paper/on-the-consistency-of-supervised-learning
Repo https://github.com/nprost/supervised_missing
Framework none
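
The mean-imputation consistency result is easy to try in practice. Below is a minimal sketch (not the authors' code; that lives in the repo above) comparing mean imputation before learning with a tree ensemble that routes NaNs natively, in the spirit of the recommended “missing incorporated in attribute” strategy. It assumes scikit-learn 0.22+ (for `HistGradientBoostingRegressor`) and non-informative missingness.

```python
# Minimal sketch: mean imputation vs. native NaN handling in trees.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)
X[rng.random(X.shape) < 0.2] = np.nan     # 20% non-informative (MCAR) missingness

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Strategy 1: impute with the mean prior to learning -- consistent
# here because the missingness carries no information about y.
imp = SimpleImputer(strategy="mean").fit(X_tr)
mean_model = HistGradientBoostingRegressor().fit(imp.transform(X_tr), y_tr)
print("mean-impute R^2:", mean_model.score(imp.transform(X_te), y_te))

# Strategy 2: let the trees route NaNs at each split, akin to
# "missing incorporated in attribute" (MIA).
native_model = HistGradientBoostingRegressor().fit(X_tr, y_tr)
print("native-NaN R^2:", native_model.score(X_te, y_te))
```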

Enhancing Domain Word Embedding via Latent Semantic Imputation

Title Enhancing Domain Word Embedding via Latent Semantic Imputation
Authors Shibo Yao, Dantong Yu, Keli Xiao
Abstract We present a novel method named Latent Semantic Imputation (LSI) to transfer external knowledge into semantic space for enhancing word embedding. The method integrates graph theory to extract the latent manifold structure of the entities in the affinity space and leverages non-negative least squares with standard simplex constraints and the power iteration method to derive spectral embeddings. It provides an effective and efficient approach to combining entity representations defined in different Euclidean spaces. Specifically, our approach generates and imputes reliable embedding vectors for low-frequency words in the semantic space and benefits downstream language tasks that depend on word embedding. We conduct comprehensive experiments on a carefully designed classification problem and on language modeling, and demonstrate the superiority of the enhanced embedding via LSI over several well-known benchmark embeddings. We also confirm the consistency of the results under different parameter settings of our method.
Tasks Imputation, Language Modelling
Published 2019-05-21
URL https://arxiv.org/abs/1905.08900v1
PDF https://arxiv.org/pdf/1905.08900v1.pdf
PWC https://paperswithcode.com/paper/enhancing-domain-word-embedding-via-latent
Repo https://github.com/ShiboYao/LatentSemanticImputation
Framework none
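
The imputation step of LSI admits a compact sketch. The snippet below assumes the reconstruction weights W have already been estimated (in the paper, via non-negative least squares with simplex constraints over a neighborhood graph in the affinity space) and shows only the power-iteration update that fills in embeddings for low-frequency words while keeping reliable "anchor" embeddings fixed.

```python
# Hedged sketch of the LSI propagation step; weight estimation omitted.
import numpy as np

def propagate(W, E, anchor_mask, n_iter=200, tol=1e-6):
    """W: (n, n) nonnegative, row-stochastic reconstruction weights;
    E: (n, d) initial embeddings (rows for rare words may be arbitrary);
    anchor_mask: boolean (n,), True where embeddings are trusted."""
    E = E.copy()
    for _ in range(n_iter):
        E_new = W @ E                        # move toward neighbors' weighted average
        E_new[anchor_mask] = E[anchor_mask]  # anchors stay fixed
        if np.linalg.norm(E_new - E) < tol:  # stop once the update stabilizes
            break
        E = E_new
    return E
```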

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

Title Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization
Authors Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
Abstract We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies. We introduce Checkmate, a system that solves for optimal rematerialization schedules in reasonable times (under an hour) using off-the-shelf MILP solvers or near-optimal schedules with an approximation algorithm, then uses these schedules to accelerate millions of training iterations. Our method scales to complex, realistic architectures and is hardware-aware through the use of accelerator-specific, profile-based cost models. In addition to reducing training cost, Checkmate enables real-world networks to be trained with up to 5.1x larger input sizes. Checkmate is an open-source project, available at https://github.com/parasj/checkmate.
Tasks
Published 2019-10-07
URL https://arxiv.org/abs/1910.02653v2
PDF https://arxiv.org/pdf/1910.02653v2.pdf
PWC https://paperswithcode.com/paper/checkmate-breaking-the-memory-wall-with
Repo https://github.com/parasj/checkmate
Framework tf
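
As a toy illustration of the trade-off Checkmate optimizes (this is generic checkpointing arithmetic, not the paper's MILP formulation), consider keeping every k-th activation and recomputing the rest during the backward pass: fewer checkpoints mean lower peak memory but more recomputation. Checkmate's contribution is to replace such uniform heuristics with a schedule solved per-architecture under profiled, accelerator-specific costs.

```python
# Toy model: keep every k-th of n activations, recompute the rest.
def simulate(n_layers, k):
    kept = set(range(0, n_layers, k))        # checkpointed activations
    peak_memory = len(kept) + k              # checkpoints + recompute window (toy units)
    recompute_ops = sum(i - max(c for c in kept if c <= i)
                        for i in range(n_layers))
    return peak_memory, recompute_ops

for k in (1, 4, 16):
    mem, ops = simulate(64, k)
    print(f"k={k:2d}: peak memory {mem:3d} units, extra forward ops {ops}")
```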

BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels

Title BiPaR: A Bilingual Parallel Dataset for Multilingual and Cross-lingual Reading Comprehension on Novels
Authors Yimin Jing, Deyi Xiong, Yan Zhen
Abstract This paper presents BiPaR, a bilingual parallel novel-style machine reading comprehension (MRC) dataset, developed to support multilingual and cross-lingual reading comprehension. The biggest difference between BiPaR and existing reading comprehension datasets is that each (Passage, Question, Answer) triple in BiPaR is written in parallel in two languages. We collect 3,667 bilingual parallel paragraphs from Chinese and English novels, from which we construct 14,668 parallel question-answer pairs via crowdsourced workers following a strict quality-control procedure. We analyze BiPaR in depth and find that it offers good diversity in question prefixes, answer types, and relationships between questions and passages. We also observe that answering questions about novels requires reading comprehension skills such as coreference resolution, multi-sentence reasoning, and understanding of implicit causality. With BiPaR, we build monolingual, multilingual, and cross-lingual MRC baseline models. Even for the relatively simple monolingual MRC on this dataset, experiments show that a strong BERT baseline is over 30 points behind human performance in terms of both EM and F1 score, indicating that BiPaR provides a challenging testbed for monolingual, multilingual, and cross-lingual MRC on novels. The dataset is available at https://multinlp.github.io/BiPaR/.
Tasks Coreference Resolution, Machine Reading Comprehension, Reading Comprehension
Published 2019-10-11
URL https://arxiv.org/abs/1910.05040v1
PDF https://arxiv.org/pdf/1910.05040v1.pdf
PWC https://paperswithcode.com/paper/bipar-a-bilingual-parallel-dataset-for
Repo https://github.com/sharejing/BiPaR
Framework none
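
To make the parallel-triple structure concrete, here is a hypothetical record layout; the field names are illustrative, not the dataset's actual schema (see the dataset site above for that).

```python
# Illustrative (hypothetical) shape of one BiPaR-style record.
record = {
    "passage": {"en": "...", "zh": "..."},           # aligned novel paragraphs
    "qas": [
        {
            "question": {"en": "...", "zh": "..."},  # parallel questions
            "answer":   {"en": "...", "zh": "..."},  # answer span per language
        },
    ],
}
```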

Visualizing Deep Similarity Networks

Title Visualizing Deep Similarity Networks
Authors Abby Stylianou, Richard Souvenir, Robert Pless
Abstract For convolutional neural network models that optimize an image embedding, we propose a method to highlight the regions of images that contribute most to pairwise similarity. This work is a corollary to the visualization tools developed for classification networks, but applicable to the problem domains better suited to similarity learning. The visualization shows how similarity networks that are fine-tuned learn to focus on different features. We also generalize our approach to embedding networks that use different pooling strategies and provide a simple mechanism to support image similarity searches on objects or sub-regions in the query image.
Tasks
Published 2019-01-02
URL http://arxiv.org/abs/1901.00536v1
PDF http://arxiv.org/pdf/1901.00536v1.pdf
PWC https://paperswithcode.com/paper/visualizing-deep-similarity-networks
Repo https://github.com/GWUvision/Similarity-Visualization
Framework tf
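
For average-pooled embeddings the decomposition is exact and takes a few lines: the dot-product similarity of two images splits into per-location contributions of one image's feature map against the other's pooled embedding. A hedged numpy sketch (the paper generalizes this to other pooling strategies):

```python
# Spatial decomposition of pairwise similarity under average pooling.
import numpy as np

def similarity_heatmap(feat_a, emb_b):
    """feat_a: (H, W, C) conv feature map of image A (pre-pooling);
    emb_b: (C,) pooled embedding of image B."""
    H, W, C = feat_a.shape
    contrib = feat_a.reshape(-1, C) @ emb_b   # per-location dot products
    return contrib.reshape(H, W) / (H * W)    # entries sum to the similarity

feat_a = np.random.rand(7, 7, 512)
emb_b = np.random.rand(512)
heat = similarity_heatmap(feat_a, emb_b)
# Since emb_a = feat_a.mean(axis=(0, 1)), the heatmap sums to dot(emb_a, emb_b):
assert np.isclose(heat.sum(), feat_a.mean(axis=(0, 1)) @ emb_b)
```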

Actor-Critic Instance Segmentation

Title Actor-Critic Instance Segmentation
Authors Nikita Araslanov, Constantin Rothkopf, Stefan Roth
Abstract Most approaches to visual scene analysis have emphasised parallel processing of the image elements. However, one area in which the sequential nature of vision is apparent, is that of segmenting multiple, potentially similar and partially occluded objects in a scene. In this work, we revisit the recurrent formulation of this challenging problem in the context of reinforcement learning. Motivated by the limitations of the global max-matching assignment of the ground-truth segments to the recurrent states, we develop an actor-critic approach in which the actor recurrently predicts one instance mask at a time and utilises the gradient from a concurrently trained critic network. We formulate the state, action, and the reward such as to let the critic model long-term effects of the current prediction and incorporate this information into the gradient signal. Furthermore, to enable effective exploration in the inherently high-dimensional action space of instance masks, we learn a compact representation using a conditional variational auto-encoder. We show that our actor-critic model consistently provides accuracy benefits over the recurrent baseline on standard instance segmentation benchmarks.
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-04-10
URL http://arxiv.org/abs/1904.05126v1
PDF http://arxiv.org/pdf/1904.05126v1.pdf
PWC https://paperswithcode.com/paper/actor-critic-instance-segmentation
Repo https://github.com/visinf/acis
Framework pytorch
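
A hedged sketch of the underlying update pattern (a deterministic-policy-gradient-style actor-critic; the paper's recurrent state, mask variational auto-encoder, and matching-based reward are omitted): the critic regresses toward the observed return, and the actor ascends the critic's gradient.

```python
# Generic actor-critic update skeleton, not the paper's full model.
import torch
import torch.nn as nn

state_dim, action_dim = 128, 32              # action = latent mask code here
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

state = torch.randn(16, state_dim)
reward = torch.randn(16, 1)                  # e.g. segmentation quality of the mask

# Critic: regress toward the observed return.
action = actor(state).detach()
value = critic(torch.cat([state, action], dim=1))
loss_c = ((value - reward) ** 2).mean()
opt_c.zero_grad(); loss_c.backward(); opt_c.step()

# Actor: ascend the critic's estimate of long-term reward.
loss_a = -critic(torch.cat([state, actor(state)], dim=1)).mean()
opt_a.zero_grad(); loss_a.backward(); opt_a.step()
```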

Hybrid Forest: A Concept Drift Aware Data Stream Mining Algorithm

Title Hybrid Forest: A Concept Drift Aware Data Stream Mining Algorithm
Authors Radin Hamidi Rad, Maryam Amir Haeri
Abstract With a growing number of online control systems in organizations, and high demand for monitoring facilities that use data streams to log and control their subsystems, data stream mining is becoming more and more vital. Hoeffding Trees (also called Very Fast Decision Trees, a.k.a. VFDT), a big-data approach to classification and regression on data streams, have shown good performance on the challenges involved and make any-time prediction possible. Although these methods outperform alternatives such as Artificial Neural Networks (ANN) and Support Vector Regression (SVR), they suffer from high latency in adapting to new concepts when the statistical distribution of the incoming data changes. In this article, we introduce a new algorithm that can detect and handle the concept drift phenomenon properly. The algorithm also benefits from a fast startup, which lets systems make predictions sooner than other algorithms at the beginning of a data stream. We also show that our approach outperforms competing approaches on classification and regression tasks.
Tasks
Published 2019-02-10
URL http://arxiv.org/abs/1902.03609v1
PDF http://arxiv.org/pdf/1902.03609v1.pdf
PWC https://paperswithcode.com/paper/hybrid-forest-a-concept-drift-aware-data
Repo https://github.com/radinhamidi/Hybrid_Forest
Framework none
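
For reference, a simple window-based drift detector (a generic technique, not the Hybrid Forest algorithm itself) flags drift when the recent error rate rises significantly above the long-run rate:

```python
# Generic sliding-window concept-drift detector.
from collections import deque
import math

class DriftDetector:
    def __init__(self, window=200, threshold=3.0):
        self.recent = deque(maxlen=window)   # recent prediction errors
        self.n = 0                           # total predictions seen
        self.errors = 0                      # total errors seen
        self.threshold = threshold           # z-score alarm level

    def update(self, correct: bool) -> bool:
        """Feed one prediction outcome; returns True when drift is flagged."""
        self.n += 1
        self.errors += int(not correct)
        self.recent.append(int(not correct))
        if len(self.recent) < self.recent.maxlen:
            return False                     # wait until the window fills
        p_global = self.errors / self.n
        p_recent = sum(self.recent) / len(self.recent)
        std = math.sqrt(max(p_global * (1 - p_global) / len(self.recent), 1e-12))
        return (p_recent - p_global) / std > self.threshold
```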

Online Detection of Sparse Changes in High-Dimensional Data Streams Using Tailored Projections

Title Online Detection of Sparse Changes in High-Dimensional Data Streams Using Tailored Projections
Authors Martin Tveten, Ingrid K. Glad
Abstract When applying principal component analysis (PCA) for dimension reduction, the most varying projections are usually used in order to retain most of the information. For the purpose of anomaly and change detection, however, the least varying projections are often the most important ones. In this article, we present a novel method that automatically tailors the choice of projections to monitor for sparse changes in the mean and/or covariance matrix of high-dimensional data. A subset of the least varying projections is almost always selected, based on a criterion of each projection’s sensitivity to changes. Our focus is on online/sequential change detection, where the aim is to detect changes as quickly as possible while controlling false alarms at a specified level. A combination of tailored PCA and a generalized log-likelihood monitoring procedure displays high efficiency in detecting even very sparse changes in the mean, variance, and correlation. We demonstrate on real data that tailored PCA monitoring is efficient for sparse change detection even when the data streams are highly auto-correlated and non-normal. Notably, error control is achieved without a large validation set, which most existing methods need.
Tasks Dimensionality Reduction
Published 2019-08-06
URL https://arxiv.org/abs/1908.02029v1
PDF https://arxiv.org/pdf/1908.02029v1.pdf
PWC https://paperswithcode.com/paper/online-detection-of-sparse-changes-in-high
Repo https://github.com/Tveten/tdpcaTEP
Framework none
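
A hedged numpy sketch of the central idea: project incoming observations onto the least-varying principal axes of in-control data and monitor the standardized projections, where sparse changes show up most clearly. (The paper selects projections with a sensitivity criterion and calibrates thresholds to a false-alarm level; both are simplified away here.)

```python
# Monitor squared standardized projections onto least-varying axes.
import numpy as np

def fit_monitor(X_incontrol, n_axes=5):
    mu = X_incontrol.mean(axis=0)
    cov = np.cov(X_incontrol, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    V = eigvecs[:, :n_axes]                   # least-varying projections
    s = np.sqrt(np.maximum(eigvals[:n_axes], 1e-12))

    def statistic(x):
        z = (x - mu) @ V / s                  # standardized projections
        return float(np.sum(z ** 2))          # chi^2-like monitoring statistic

    return statistic

stat = fit_monitor(np.random.randn(2000, 50))
alarm = stat(np.random.randn(50) + 0.5) > 30.0   # threshold from calibration
```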

Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks

Title Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks
Authors Woo-Jeoung Nam, Shir Gur, Jaesik Choi, Lior Wolf, Seong-Whan Lee
Abstract As Deep Neural Networks (DNNs) have demonstrated superhuman performance in a variety of fields, there is an increasing interest in understanding their complex internal mechanisms. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective: separating the relevant (positive) and irrelevant (negative) attributions according to the relative influence between the layers. The relevance of each neuron is identified with respect to its degree of contribution, separated into positive and negative, while preserving the conservation rule. Considering the relevance assigned to neurons in terms of relative priority, RAP allows each neuron to be assigned a bipolar importance score with respect to the output: from highly relevant to highly irrelevant. Our method therefore makes it possible to interpret DNNs with much clearer and more focused visualizations of the separated attributions than conventional explanation methods. To verify that the attributions propagated by RAP correctly account for each meaning, we use three evaluation metrics: (i) outside-inside relevance ratio, (ii) segmentation mIoU, and (iii) region perturbation. In all experiments and metrics, we show a sizable gap over the existing literature. Our source code is available at https://github.com/wjNam/Relative_Attributing_Propagation.
Tasks
Published 2019-04-01
URL https://arxiv.org/abs/1904.00605v4
PDF https://arxiv.org/pdf/1904.00605v4.pdf
PWC https://paperswithcode.com/paper/relative-attributing-propagation-interpreting
Repo https://github.com/wjNam/Relative_Attributing_Propagation
Framework tf
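
A simplified cousin of such a rule, shown for a single linear layer: split each input's contribution to every output into positive and negative parts and propagate relevance through each separately, preserving conservation within each part. This is an LRP-style sketch, not the authors' exact RAP rule.

```python
# LRP-style positive/negative relevance split for one linear layer.
import numpy as np

def propagate_relevance(x, W, R_out, eps=1e-9):
    """x: (d_in,) activations; W: (d_in, d_out) weights;
    R_out: (d_out,) relevance arriving from the layer above."""
    z = x[:, None] * W                            # per-(input, output) contributions
    z_pos = np.clip(z, 0, None)
    z_neg = np.clip(z, None, 0)
    R_pos = (z_pos / (z_pos.sum(axis=0) + eps)) @ R_out   # relevant share
    R_neg = (z_neg / (z_neg.sum(axis=0) - eps)) @ R_out   # irrelevant share
    return R_pos, R_neg                           # per-input bipolar relevance
```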

Instance Segmentation of Biological Images Using Harmonic Embeddings

Title Instance Segmentation of Biological Images Using Harmonic Embeddings
Authors Victor Kulikov, Victor Lempitsky
Abstract We present a new instance segmentation approach tailored to biological images, where instances may correspond to individual cells, organisms or plant parts. Unlike instance segmentation for user photographs or road scenes, in biological data object instances may be particularly densely packed, the appearance variation may be particularly low, the processing power may be restricted, while, on the other hand, the variability of sizes of individual instances may be limited. These peculiarities are successfully addressed and exploited by the proposed approach. Our approach describes each object instance using an expectation of a limited number of sine waves with frequencies and phases adjusted to particular object sizes and densities. At train time, a fully convolutional network is trained to predict the object embeddings at each pixel using a simple pixelwise regression loss, while at test time the instances are recovered using clustering in the embedding space. In the experiments, we show that our approach outperforms previous embedding-based instance segmentation approaches on a number of biological datasets, achieving state of the art on the popular CVPPP benchmark. Notably, this excellent performance is combined with the computational efficiency that is needed for deployment to domain specialists. The source code is publicly available on GitHub: https://github.com/kulikovv/harmonic
Tasks Instance Segmentation, Semantic Segmentation
Published 2019-04-10
URL http://arxiv.org/abs/1904.05257v1
PDF http://arxiv.org/pdf/1904.05257v1.pdf
PWC https://paperswithcode.com/paper/instance-segmentation-of-biological-images
Repo https://github.com/kulikovv/harmonic
Framework pytorch
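
The guide-embedding construction can be sketched directly: each embedding dimension is a sine or cosine wave over image coordinates, so pixels of one instance share a characteristic signature. The frequencies below are illustrative, not the paper's size- and density-adjusted values.

```python
# Harmonic guide embeddings over a pixel grid (illustrative frequencies).
import numpy as np

def harmonic_guides(H, W, freqs):
    ys, xs = np.mgrid[0:H, 0:W] / max(H, W)    # normalized coordinates
    guides = []
    for fx, fy in freqs:
        arg = 2 * np.pi * (fx * xs + fy * ys)
        guides += [np.sin(arg), np.cos(arg)]
    return np.stack(guides, axis=-1)            # (H, W, 2 * len(freqs))

E = harmonic_guides(64, 64, [(1, 0), (0, 1), (2, 3)])   # (64, 64, 6)
# A network regresses E at every pixel of an instance with a pixelwise
# loss; at test time, pixels are clustered in this embedding space.
```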

Timage – A Robust Time Series Classification Pipeline

Title Timage – A Robust Time Series Classification Pipeline
Authors Marc Wenninger, Sebastian P. Bayerl, Jochen Schmidt, Korbinian Riedhammer
Abstract Time series are series of values ordered by time; this kind of data can be found in many real-world settings. Classifying time series is a difficult task and an active area of research. This paper investigates the use of transfer learning in Deep Neural Networks and a 2D representation of time series known as recurrence plots. In order to leverage the research done in image classification, where Deep Neural Networks have achieved very good results, we use a residual network architecture known as ResNet. As preprocessing of time series is a major part of every time series classification pipeline, the proposed method simplifies this step and requires only a few parameters. For the first time, we propose a method for multi-time-series classification: training a single network to classify all datasets in the archive. We are among the first to evaluate the method on the latest 2018 release of the UCR archive, a well-established time series classification benchmark.
Tasks Image Classification, Time Series, Time Series Classification, Transfer Learning
Published 2019-09-19
URL https://arxiv.org/abs/1909.09149v1
PDF https://arxiv.org/pdf/1909.09149v1.pdf
PWC https://paperswithcode.com/paper/timage-a-robust-time-series-classification
Repo https://github.com/patientzero/timage-icann2019
Framework none
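
The 2D representation Timage feeds to the ResNet is a recurrence plot, which takes only a few lines to compute for a univariate series:

```python
# Recurrence plot of a univariate time series.
import numpy as np

def recurrence_plot(x, threshold=None):
    d = np.abs(x[:, None] - x[None, :])        # pairwise distances R[i, j]
    return (d <= threshold).astype(float) if threshold is not None else d

x = np.sin(np.linspace(0, 8 * np.pi, 128))
rp = recurrence_plot(x)                        # unthresholded variant
```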

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

Title CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Authors Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter S Lasecki, Dragomir Radev
Abstract We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems. It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or informing the user that a question is unanswerable. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
Tasks Dialogue State Tracking, Text-To-Sql
Published 2019-09-11
URL https://arxiv.org/abs/1909.05378v1
PDF https://arxiv.org/pdf/1909.05378v1.pdf
PWC https://paperswithcode.com/paper/cosql-a-conversational-text-to-sql-challenge
Repo https://github.com/ryanzhumich/sparc_atis_pytorch
Framework pytorch

Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation

Title Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation
Authors Liliang Ren, Jianmo Ni, Julian McAuley
Abstract Existing approaches to dialogue state tracking rely on pre-defined ontologies consisting of a set of all possible slot types and values. Though such approaches exhibit promising performance on single-domain benchmarks, they suffer from computational complexity that increases proportionally to the number of pre-defined slots that need tracking. This issue becomes more severe for multi-domain dialogues, which include larger numbers of slots. In this paper, we investigate how to approach DST using a generation framework without the pre-defined ontology list. Given each turn of user utterance and system response, we directly generate a sequence of belief states by applying a hierarchical encoder-decoder structure. In this way, the computational complexity of our model remains constant regardless of the number of pre-defined slots. Experiments on both multi-domain and single-domain dialogue state tracking datasets show that our model not only scales easily with an increasing number of pre-defined domains and slots but also reaches state-of-the-art performance.
Tasks Dialogue State Tracking
Published 2019-09-02
URL https://arxiv.org/abs/1909.00754v2
PDF https://arxiv.org/pdf/1909.00754v2.pdf
PWC https://paperswithcode.com/paper/scalable-and-accurate-dialogue-state-tracking
Repo https://github.com/renll/ComerNet
Framework pytorch

End-to-end Learning for Early Classification of Time Series

Title End-to-end Learning for Early Classification of Time Series
Authors Marc Rußwurm, Sébastien Lefèvre, Nicolas Courty, Rémi Emonet, Marco Körner, Romain Tavenard
Abstract Classification of time series is a topical issue in machine learning. While accuracy stands for the most important evaluation criterion, some applications require decisions to be made as early as possible. Optimization should then target a compromise between earliness, i.e., a capacity of providing a decision early in the sequence, and accuracy. In this work, we propose a generic, end-to-end trainable framework for early classification of time series. This framework embeds a learnable decision mechanism that can be plugged into a wide range of already existing models. We present results obtained with deep neural networks on a diverse set of time series classification problems. Our approach compares well to state-of-the-art competitors while being easily adaptable by any existing neural network topology that evaluates a hidden state at each time step.
Tasks Time Series, Time Series Classification
Published 2019-01-30
URL http://arxiv.org/abs/1901.10681v1
PDF http://arxiv.org/pdf/1901.10681v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-for-early-classification
Repo https://github.com/rtavenar/early_rnn
Framework pytorch
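
A hedged sketch of a generic earliness/accuracy objective in the paper's spirit (not its exact loss): each step emits a halting probability, and the loss weights per-step classification loss by the probability of deciding at that step, plus a linear time penalty.

```python
# Generic earliness-weighted classification loss (sketch).
import torch
import torch.nn.functional as F

def early_classification_loss(step_logits, halt_probs, target, alpha=0.01):
    """step_logits: (T, B, n_classes); halt_probs: (T, B) in (0, 1);
    in practice the last step should halt with probability 1."""
    T, B, _ = step_logits.shape
    not_halted = torch.ones(B)
    loss = 0.0
    for t in range(T):
        p_stop_t = not_halted * halt_probs[t]          # P(decide exactly at t)
        ce = F.cross_entropy(step_logits[t], target, reduction="none")
        loss = loss + (p_stop_t * (ce + alpha * t)).mean()
        not_halted = not_halted * (1 - halt_probs[t])
    return loss
```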

Batch Active Learning Using Determinantal Point Processes

Title Batch Active Learning Using Determinantal Point Processes
Authors Erdem Bıyık, Kenneth Wang, Nima Anari, Dorsa Sadigh
Abstract Data collection and labeling is one of the main challenges in employing machine learning algorithms in a variety of real-world applications with limited data. While active learning methods attempt to tackle this issue by labeling only the data samples that give high information, they generally suffer from large computational costs and are impractical in settings where data can be collected in parallel. Batch active learning methods attempt to overcome this computational burden by querying batches of samples at a time. To avoid redundancy between samples, previous works rely on some ad hoc combination of sample quality and diversity. In this paper, we present a new principled batch active learning method using Determinantal Point Processes, a repulsive point process that enables generating diverse batches of samples. We develop tractable algorithms to approximate the mode of a DPP distribution, and provide theoretical guarantees on the degree of approximation. We further demonstrate that an iterative greedy method for DPP maximization, which has lower computational costs but worse theoretical guarantees, still gives competitive results for batch active learning. Our experiments show the value of our methods on several datasets against state-of-the-art baselines.
Tasks Active Learning, Point Processes
Published 2019-06-19
URL https://arxiv.org/abs/1906.07975v1
PDF https://arxiv.org/pdf/1906.07975v1.pdf
PWC https://paperswithcode.com/paper/batch-active-learning-using-determinantal
Repo https://github.com/Stanford-ILIAD/DPP-Batch-Active-Learning
Framework none
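
The cheaper iterative method admits a compact sketch: greedily grow the batch by the item that most increases the log-determinant of the kernel submatrix, i.e. the batch's diversity. (A naive implementation for clarity; the paper analyzes faster greedy and mode-approximation variants with guarantees.)

```python
# Greedy log-determinant maximization for DPP batch selection.
import numpy as np

def greedy_dpp(L, k):
    """L: (n, n) PSD similarity kernel; returns indices of a size-k batch."""
    n = L.shape[0]
    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return selected

X = np.random.randn(100, 8)
L = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))   # RBF kernel
batch = greedy_dpp(L, k=5)                                    # diverse batch indices
```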