Paper Group ANR 33
Normalized Cut Loss for Weakly-supervised CNN Segmentation
Title | Normalized Cut Loss for Weakly-supervised CNN Segmentation |
Authors | Meng Tang, Abdelaziz Djelouah, Federico Perazzi, Yuri Boykov, Christopher Schroers |
Abstract | Most recent semantic segmentation methods train deep convolutional neural networks with fully annotated masks requiring pixel-accuracy for good quality training. Common weakly-supervised approaches generate full masks from partial input (e.g. scribbles or seeds) using standard interactive segmentation methods as preprocessing. However, errors in such masks result in poorer training since standard loss functions (e.g. cross-entropy) do not distinguish seeds from other, potentially mislabeled, pixels. Inspired by the general ideas in semi-supervised learning, we address these problems via a new principled loss function evaluating network output with criteria standard in “shallow” segmentation, e.g. normalized cut. Unlike prior work, the cross-entropy part of our loss evaluates only seeds, where labels are known, while normalized cut softly evaluates consistency of all pixels. We focus on the normalized cut loss, where a dense Gaussian kernel is efficiently implemented in linear time via fast bilateral filtering. Our normalized cut loss approach to segmentation brings the quality of weakly-supervised training significantly closer to that of fully supervised methods. |
Tasks | Interactive Segmentation, Semantic Segmentation |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01346v1 |
PDF | http://arxiv.org/pdf/1804.01346v1.pdf |
PWC | https://paperswithcode.com/paper/normalized-cut-loss-for-weakly-supervised-cnn |
Repo | |
Framework | |
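As a concrete illustration of the loss above, here is a minimal NumPy sketch: partial cross-entropy evaluated only on seed pixels, plus a soft normalized-cut term of the form sum_k cut(S_k) / assoc(S_k). The explicit affinity matrix stands in for the paper's fast bilateral-filtering implementation and is only feasible for tiny images; all toy data and names are ours.

```python
import numpy as np

def weak_seg_loss(S, seeds, W, eps=1e-12):
    """S: (P, K) softmax outputs; seeds: (P,) int labels, -1 = unlabeled;
    W: (P, P) pixel affinity matrix (e.g. a Gaussian kernel)."""
    # Partial cross-entropy: evaluated only where scribble labels exist.
    labeled = seeds >= 0
    pce = -np.mean(np.log(S[labeled, seeds[labeled]] + eps))
    # Soft normalized cut: sum_k cut(S_k) / assoc(S_k).
    d = W.sum(axis=1)                       # pixel degrees
    nc = 0.0
    for k in range(S.shape[1]):
        s_k = S[:, k]
        assoc = d @ s_k + eps
        cut = s_k @ W @ (1.0 - s_k)
        nc += cut / assoc
    return pce + nc

# Toy usage: 16 pixels, 2 classes, symmetric random affinities.
rng = np.random.default_rng(0)
S = rng.dirichlet([1, 1], size=16)
seeds = np.full(16, -1); seeds[:3] = 0; seeds[-3:] = 1
W = np.exp(-rng.random((16, 16))); W = (W + W.T) / 2
print(weak_seg_loss(S, seeds, W))
```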
Natural Language Processing for Music Knowledge Discovery
Title | Natural Language Processing for Music Knowledge Discovery |
Authors | Sergio Oramas, Luis Espinosa-Anke, Francisco Gómez, Xavier Serra |
Abstract | Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago. In this work, we present different Natural Language Processing (NLP) approaches to harness the potential of these text collections for automatic music knowledge discovery, covering different phases in a prototypical NLP pipeline, namely corpus compilation, text-mining, information extraction, knowledge graph generation and sentiment analysis. Each of these approaches is presented alongside different use cases (i.e., flamenco, Renaissance and popular music) where large collections of documents are processed, and conclusions stemming from data-driven analyses are presented and discussed. |
Tasks | Graph Generation, Sentiment Analysis |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02200v1 |
PDF | http://arxiv.org/pdf/1807.02200v1.pdf |
PWC | https://paperswithcode.com/paper/natural-language-processing-for-music |
Repo | |
Framework | |
Syntax-Aware Language Modeling with Recurrent Neural Networks
Title | Syntax-Aware Language Modeling with Recurrent Neural Networks |
Authors | Duncan Blythe, Alan Akbik, Roland Vollgraf |
Abstract | Neural language models (LMs) are typically trained using only lexical features, such as surface forms of words. In this paper, we argue that this deprives the LM of crucial syntactic signals that can be detected at high confidence using existing parsers. We present a simple but highly effective approach for training neural LMs using both lexical and syntactic information, and a novel approach for applying such LMs to unparsed text using sequential Monte Carlo sampling. In experiments on a range of corpora and corpus sizes, we show that our approach consistently outperforms standard lexical LMs in character-level language modeling, while word-level models remain on par with standard language models. These results indicate potential for expanding LMs beyond lexical surface features to higher-level NLP features for character-level models. |
Tasks | Language Modelling |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.03665v1 |
PDF | http://arxiv.org/pdf/1803.03665v1.pdf |
PWC | https://paperswithcode.com/paper/syntax-aware-language-modeling-with-recurrent |
Repo | |
Framework | |
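The sequential Monte Carlo step above marginalizes latent syntax when scoring unparsed text. Below is a generic particle-filter sketch of that idea over toy tag and word distributions; it is not the authors' model, and all probabilities are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
tags, words = 3, 5                                  # toy tag and word inventories
trans = rng.dirichlet(np.ones(tags), size=tags)     # p(tag_t | tag_{t-1})
emit = rng.dirichlet(np.ones(words), size=tags)     # p(word_t | tag_t)

def smc_loglik(sentence, n_particles=100):
    part = np.zeros(n_particles, dtype=int)         # current tag of each particle
    loglik = 0.0
    for w in sentence:
        # Propose next tags from the transition model.
        part = np.array([rng.choice(tags, p=trans[t]) for t in part])
        wts = emit[part, w]                         # weight by word likelihood
        loglik += np.log(wts.mean())                # SMC estimate of p(w | history)
        wts /= wts.sum()
        part = part[rng.choice(n_particles, size=n_particles, p=wts)]  # resample
    return loglik

print(smc_loglik([0, 3, 2, 1]))
```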
Channel Charting: Locating Users within the Radio Environment using Channel State Information
Title | Channel Charting: Locating Users within the Radio Environment using Channel State Information |
Authors | Christoph Studer, Saïd Medjkouh, Emre Gönültaş, Tom Goldstein, Olav Tirkkonen |
Abstract | We propose channel charting (CC), a novel framework in which a multi-antenna network element learns a chart of the radio geometry in its surrounding area. The channel chart captures the local spatial geometry of the area so that points that are close in space will also be close in the channel chart and vice versa. CC works in a fully unsupervised manner, i.e., learning is only based on channel state information (CSI) that is passively collected at a single point in space, but from multiple transmit locations in the area over time. The method then extracts channel features that characterize large-scale fading properties of the wireless channel. Finally, the channel charts are generated with tools from dimensionality reduction, manifold learning, and deep neural networks. The network element performing CC may be, for example, a multi-antenna base-station in a cellular system, and the charted area is the served cell. Logical relationships related to the position and movement of a transmitter, e.g., a user equipment (UE), in the cell can then be directly deduced from comparing measured radio channel characteristics to the channel chart. The unsupervised nature of CC enables a range of new applications in UE localization, network planning, user scheduling, multipoint connectivity, hand-over, cell search, user grouping, and other cognitive tasks that rely on CSI and UE movement relative to the base-station, without the need for information from global navigation satellite systems. |
Tasks | Dimensionality Reduction |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05247v2 |
PDF | http://arxiv.org/pdf/1807.05247v2.pdf |
PWC | https://paperswithcode.com/paper/channel-charting-locating-users-within-the |
Repo | |
Framework | |
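A rough sketch of the CC pipeline under stated assumptions: synthetic line-of-sight CSI for a uniform linear array, a simple large-scale feature (magnitudes of the per-sample outer product, which discards absolute phase), and off-the-shelf manifold learning in place of the paper's specific dimensionality-reduction choices. The geometry and feature are ours, chosen only so the example runs.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
n_pos, n_ant = 200, 8
pos = rng.uniform(0, 100, size=(n_pos, 2))              # transmit locations
d = np.linalg.norm(pos - [50.0, -10.0], axis=1)         # distance to the array
theta = np.arctan2(pos[:, 1] + 10.0, pos[:, 0] - 50.0)  # angle to the array
# Toy line-of-sight CSI: path loss from distance, ULA phases from angle.
H = np.exp(-1j * np.pi * np.outer(np.sin(theta), np.arange(n_ant))) / d[:, None]
H += 0.01 * (rng.normal(size=H.shape) + 1j * rng.normal(size=H.shape))

# Large-scale feature: |h h^H| per sample, insensitive to small-scale phase.
feats = np.abs(np.einsum("ni,nj->nij", H, H.conj())).reshape(n_pos, -1)

chart = Isomap(n_components=2).fit_transform(feats)     # the 2-D channel chart
print(chart.shape)   # (200, 2): nearby transmitters map to nearby chart points
```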
Heteroskedastic PCA: Algorithm, Optimality, and Applications
Title | Heteroskedastic PCA: Algorithm, Optimality, and Applications |
Authors | Anru Zhang, T. Tony Cai, Yihong Wu |
Abstract | Principal component analysis (PCA) and singular value decomposition (SVD) are widely used in statistics, econometrics, machine learning, and applied mathematics. They have been well studied in the case of homoskedastic noise, where the noise levels of the contamination are homogeneous. In this paper, we consider PCA and SVD in the presence of heteroskedastic noise, which is a commonly used model for factor analysis and arises naturally in a range of applications. We introduce a general framework for heteroskedastic PCA and propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries to remove the bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covariance model. A key technical step is a deterministic robust perturbation analysis on singular subspaces, which can be of independent interest. The effectiveness of the proposed algorithm is demonstrated in a suite of applications, including heteroskedastic low-rank matrix denoising, Poisson PCA, and SVD based on heteroskedastic and incomplete data. |
Tasks | Denoising |
Published | 2018-10-19 |
URL | https://arxiv.org/abs/1810.08316v2 |
PDF | https://arxiv.org/pdf/1810.08316v2.pdf |
PWC | https://paperswithcode.com/paper/heteroskedastic-pca-algorithm-optimality-and |
Repo | |
Framework | |
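A minimal NumPy sketch of the diagonal-imputation iteration described in the abstract: the diagonal of the sample Gram matrix, which heteroskedastic noise inflates, is repeatedly replaced by the diagonal of the current best rank-r approximation. Iteration counts and the toy spiked-covariance data are our choices, not the paper's.

```python
import numpy as np

def hetero_pca(Sigma_hat, r, n_iter=50):
    N = Sigma_hat.copy()
    np.fill_diagonal(N, 0.0)                    # discard the noise-inflated diagonal
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(N, hermitian=True)
        low_rank = (U[:, :r] * s[:r]) @ Vt[:r]  # best rank-r approximation
        np.fill_diagonal(N, np.diag(low_rank))  # impute the diagonal and repeat
    U, _, _ = np.linalg.svd(N, hermitian=True)
    return U[:, :r]                             # estimated principal subspace

# Toy usage: spiked covariance with unequal noise levels per coordinate.
rng = np.random.default_rng(0)
p, n, r = 30, 500, 2
U_true = np.linalg.qr(rng.normal(size=(p, r)))[0]
X = rng.normal(size=(n, r)) @ (3 * U_true).T
X += rng.normal(size=(n, p)) * rng.uniform(0.1, 2.0, size=p)  # heteroskedastic
U_hat = hetero_pca(X.T @ X / n, r)
print(U_hat.shape)   # (30, 2)
```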
Notes on Abstract Argumentation Theory
Title | Notes on Abstract Argumentation Theory |
Authors | Anthony Peter Young |
Abstract | This note reviews Section 2 of Dung’s seminal 1995 paper on abstract argumentation theory. In particular, we clarify and make explicit all of the proofs mentioned therein, and provide more examples to illustrate the definitions, with the aim of helping readers who are approaching abstract argumentation theory for the first time. However, we provide minimal commentary and will refer the reader to Dung’s paper for the intuitions behind various concepts. The appropriate mathematical prerequisites are provided in the appendices. |
Tasks | Abstract Argumentation |
Published | 2018-06-18 |
URL | https://arxiv.org/abs/1806.07709v3 |
PDF | https://arxiv.org/pdf/1806.07709v3.pdf |
PWC | https://paperswithcode.com/paper/notes-on-abstract-argumentation-theory |
Repo | |
Framework | |
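As a small illustration of the material the note covers (our own example, not from the note itself), here is the grounded extension of an abstract argumentation framework computed by iterating Dung's characteristic function F(S) = {a : S defends a} from the empty set to its least fixed point.

```python
def grounded_extension(args, attacks):
    """args: set of arguments; attacks: set of (attacker, target) pairs."""
    attackers = {a: {b for (b, c) in attacks if c == a} for a in args}

    def defended(S):
        # a is defended by S if every attacker of a is attacked by some s in S.
        return {a for a in args
                if all(any((s, b) in attacks for s in S) for b in attackers[a])}

    S = set()
    while True:                        # least fixed point of F
        nxt = defended(S)
        if nxt == S:
            return S
        S = nxt

# Example: a attacks b, b attacks c; the grounded extension is {a, c},
# since a is unattacked and a defends c against b.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```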
Fuzzy quantification for linguistic data analysis and data mining
Title | Fuzzy quantification for linguistic data analysis and data mining |
Authors | F. Díaz-Hermida, Juan. C. Vidal |
Abstract | Fuzzy quantification is a subtopic of fuzzy logic which deals with the modelling of the quantified expressions we can find in natural language. Fuzzy quantifiers have been successfully applied in several fields such as fuzzy control, fuzzy databases, information retrieval, and natural language generation. Their ability to model and evaluate linguistic expressions in a mathematical way makes fuzzy quantifiers very powerful for data analytics and data mining applications. In this paper we give a general overview of the main applications of fuzzy quantifiers in this field, as well as some ideas for using them in new application contexts. |
Tasks | Information Retrieval, Text Generation |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07389v1 |
PDF | http://arxiv.org/pdf/1807.07389v1.pdf |
PWC | https://paperswithcode.com/paper/fuzzy-quantification-for-linguistic-data |
Repo | |
Framework | |
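To make the idea concrete, here is a minimal illustration (our own example, not from the paper) of a Zadeh-style relative sigma-count quantifier: the truth degree of "most X are Y" computed from fuzzy membership vectors, with "most" modeled as a piecewise-linear quantifier.

```python
import numpy as np

def most(x_member, y_member):
    """Truth degree of 'most X are Y' for membership vectors in [0, 1]."""
    # Relative sigma-count: |X and Y| / |X| with min as the t-norm.
    ratio = np.minimum(x_member, y_member).sum() / (x_member.sum() + 1e-12)
    # 'most' as a piecewise-linear quantifier rising from 0.5 to 0.9.
    return np.clip((ratio - 0.5) / 0.4, 0.0, 1.0)

# E.g. "most expensive products are well rated" over five products:
expensive = np.array([0.9, 0.8, 0.7, 0.2, 0.1])
well_rated = np.array([1.0, 0.6, 0.9, 0.3, 0.8])
print(most(expensive, well_rated))   # close to 1.0 here
```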
Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks
Title | Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks |
Authors | Jeeheh Oh, Jiaxuan Wang, Jenna Wiens |
Abstract | Recently, researchers have started applying convolutional neural networks (CNNs) with one-dimensional convolutions to clinical tasks involving time-series data. This is due, in part, to their computational efficiency relative to recurrent neural networks and to their ability to efficiently exploit certain temporal invariances (e.g., phase invariance). However, it is well-established that clinical data may exhibit many other types of invariances (e.g., scaling). While preprocessing techniques (e.g., dynamic time warping) may successfully transform and align inputs, their use often requires one to identify the types of invariances in advance. In contrast, we propose the use of Sequence Transformer Networks, an end-to-end trainable architecture that learns to identify and account for invariances in clinical time-series data. Applied to the task of predicting in-hospital mortality, our proposed approach achieves an improvement in the area under the receiver operating characteristic curve (AUROC) relative to a baseline CNN (AUROC=0.851 vs. AUROC=0.838). Our results suggest that a variety of valuable invariances can be learned directly from the data. |
Tasks | Time Series |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06725v1 |
PDF | http://arxiv.org/pdf/1808.06725v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-exploit-invariances-in-clinical |
Repo | |
Framework | |
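A toy NumPy sketch of the transformation step the abstract describes: resampling a clinical time series under shift and scale parameters before it reaches a 1-D CNN. The parameter-predicting network and the CNN are omitted; the (scale, shift) values are stand-ins for what the end-to-end model would learn.

```python
import numpy as np

def transform_series(x, scale, shift):
    """Resample x(t) at t' = scale * t + shift with linear interpolation."""
    t = np.arange(len(x), dtype=float)
    warped_t = scale * t + shift
    return np.interp(warped_t, t, x)   # clamps at the series boundaries

x = np.sin(np.linspace(0, 6, 48))      # a toy vital-sign trace
x_aligned = transform_series(x, scale=0.8, shift=2.0)
print(x_aligned.shape)                 # same length, temporally re-aligned
```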
Deep Neural Network inference with reduced word length
Title | Deep Neural Network inference with reduced word length |
Authors | Lukas Mauch, Bin Yang |
Abstract | Deep neural networks (DNNs) are powerful models for many pattern recognition tasks, yet their high computational complexity and memory requirements limit them to applications on high-performance computing platforms. In this paper, we propose a new method to evaluate DNNs trained with 32-bit floating point (float32) precision using only low-precision integer arithmetic in combination with binary shift and clipping operations. Because hardware implementation of these operations is much simpler than high-precision floating point calculation, our method can be used for efficient DNN inference on dedicated hardware. In experiments on MNIST, we demonstrate that DNNs trained with float32 can be evaluated using a combination of 2-bit integer arithmetic and a few float32 calculations in each layer, or using only 3-bit integer arithmetic in combination with binary shift and clipping, without significant performance degradation. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09854v1 |
PDF | http://arxiv.org/pdf/1810.09854v1.pdf |
PWC | https://paperswithcode.com/paper/deep-neural-network-inference-with-reduced |
Repo | |
Framework | |
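A hedged sketch of the kind of integer evaluation the abstract describes: float32 values are mapped to signed low-bit integers with a power-of-two scale, so a layer needs only integer multiplies, binary shifts, and clipping. The exact scale selection and layer structure in the paper may differ; this shows the mechanism only.

```python
import numpy as np

def quantize(w, bits):
    """Map float values to signed `bits`-bit integers with a power-of-two scale."""
    qmax = 2 ** (bits - 1) - 1
    shift = int(np.floor(np.log2(qmax / np.abs(w).max())))  # scale = 2**shift
    q = np.clip(np.round(w * 2.0 ** shift), -qmax - 1, qmax).astype(np.int32)
    return q, shift

def int_linear(x_q, w_q, w_shift, out_bits=16):
    """Integer matmul, then binary shift and clipping; output stays fixed-point."""
    acc = x_q.astype(np.int64) @ w_q.astype(np.int64)
    out = acc >> w_shift                    # undo the weight scale by shifting
    lim = 2 ** (out_bits - 1) - 1
    return np.clip(out, -lim - 1, lim)      # result ~ (x @ W) * 2**x_shift

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 10)).astype(np.float32)
x = rng.normal(size=(1, 64)).astype(np.float32)
w_q, w_shift = quantize(W, bits=3)          # 3-bit weights, as in the experiments
x_q, x_shift = quantize(x, bits=8)
print(int_linear(x_q, w_q, w_shift))        # compare against (x @ W) * 2**x_shift
```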
Modeling disease progression in longitudinal EHR data using continuous-time hidden Markov models
Title | Modeling disease progression in longitudinal EHR data using continuous-time hidden Markov models |
Authors | Aman Verma, Guido Powell, Yu Luo, David Stephens, David L. Buckeridge |
Abstract | Modeling disease progression in healthcare administrative databases is complicated by the fact that patients are observed only at irregular intervals when they seek healthcare services. In a longitudinal cohort of 76,888 patients with chronic obstructive pulmonary disease (COPD), we used a continuous-time hidden Markov model with a generalized linear model to model healthcare utilization events. We found that the fitted model provides interpretable results suitable for summarization and hypothesis generation. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00528v1 |
PDF | http://arxiv.org/pdf/1812.00528v1.pdf |
PWC | https://paperswithcode.com/paper/modeling-disease-progression-in-longitudinal |
Repo | |
Framework | |
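A compact sketch of the likelihood machinery behind a continuous-time HMM: transitions over an irregular gap dt follow expm(Q * dt), and the forward algorithm runs over (time, observation) pairs. The numbers are toy values, and the paper additionally couples a generalized linear model to the observation process.

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.20, 0.15, 0.05],      # generator matrix: rows sum to zero
              [0.05, -0.15, 0.10],
              [0.00,  0.00, 0.00]])     # third state absorbing (e.g. severe COPD)
B = np.array([[0.8, 0.2],               # p(observation | hidden state)
              [0.5, 0.5],
              [0.1, 0.9]])
pi = np.array([1.0, 0.0, 0.0])          # initial state distribution

def ct_hmm_loglik(times, obs):
    alpha = pi * B[:, obs[0]]
    for k in range(1, len(obs)):
        P = expm(Q * (times[k] - times[k - 1]))   # irregular-gap transition
        alpha = (alpha @ P) * B[:, obs[k]]
    return np.log(alpha.sum())

# Visits at irregular intervals (months); binary utilization event per visit.
print(ct_hmm_loglik(times=[0.0, 1.5, 7.0, 7.5], obs=[0, 0, 1, 1]))
```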
Finding Convincing Arguments Using Scalable Bayesian Preference Learning
Title | Finding Convincing Arguments Using Scalable Bayesian Preference Learning |
Authors | Edwin Simpson, Iryna Gurevych |
Abstract | We introduce a scalable Bayesian preference learning method for identifying convincing arguments in the absence of gold-standard ratings or rankings. In contrast to previous work, we avoid the need for separate methods to perform quality control on training data, predict rankings and perform pairwise classification. Bayesian approaches are an effective solution when faced with sparse or noisy training data, but have not previously been used to identify convincing arguments. One issue is scalability, which we address by developing a stochastic variational inference method for Gaussian process (GP) preference learning. We show how our method can be applied to predict argument convincingness from crowdsourced data, outperforming the previous state-of-the-art, particularly when trained with small amounts of unreliable data. We demonstrate how the Bayesian approach enables more effective active learning, thereby reducing the amount of data required to identify convincing arguments for new users and domains. While word embeddings are principally used with neural networks, our results show that word embeddings in combination with linguistic features also benefit GPs when predicting argument convincingness. |
Tasks | Active Learning, Word Embeddings |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02418v1 |
PDF | http://arxiv.org/pdf/1806.02418v1.pdf |
PWC | https://paperswithcode.com/paper/finding-convincing-arguments-using-scalable |
Repo | |
Framework | |
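A bare-bones sketch of the pairwise preference likelihood at the core of GP preference learning: p(a preferred over b) = Phi(f(a) - f(b)). The paper's GP prior and stochastic variational inference are out of scope here; f is a toy latent convincingness score per argument, and all data are illustrative.

```python
import numpy as np
from scipy.stats import norm

def preference_loglik(f, pairs):
    """f: latent scores per argument; pairs: (winner, loser) index pairs."""
    diffs = np.array([f[w] - f[l] for w, l in pairs])
    return np.sum(norm.logcdf(diffs))   # probit pairwise likelihood

# Crowdsourced comparisons over 4 arguments: 0 beats 1, 0 beats 2, 3 beats 2.
pairs = [(0, 1), (0, 2), (3, 2)]
f = np.array([1.2, 0.1, -0.8, 0.5])    # a candidate convincingness vector
print(preference_loglik(f, pairs))     # higher means a better fit to the pairs
```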
Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent
Title | Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent |
Authors | Weijie J. Su, Yuancheng Zhu |
Abstract | Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large. However, despite an ever-increasing volume of work on SGD, much less is known about the statistical inferential properties of SGD-based predictions. Taking a fully inferential viewpoint, this paper introduces a novel procedure termed HiGrad to conduct statistical inference for online learning, without incurring additional computational cost compared with SGD. The HiGrad procedure begins by performing SGD updates for a while and then splits the single thread into several threads; it then operates hierarchically in this fashion along each thread. With predictions provided by multiple threads in place, a t-based confidence interval is constructed by decorrelating predictions using covariance structures given by a Donsker-style extension of the Ruppert–Polyak averaging scheme, which is a technical contribution of independent interest. Under certain regularity conditions, the HiGrad confidence interval is shown to attain asymptotically exact coverage probability. Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real data example. An R package higrad has been developed to implement the method. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04876v2 |
PDF | http://arxiv.org/pdf/1802.04876v2.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-quantification-for-online |
Repo | |
Framework | |
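A strongly simplified sketch of the HiGrad thread structure: one shared SGD segment that splits into B independent continuation threads, whose final predictions feed a t-based interval. The paper's decorrelation via the Ruppert–Polyak covariance structure is replaced here by a naive t-interval over threads, so this shows the shape of the procedure rather than its exact inference.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
d, B = 5, 4
theta_true = rng.normal(size=d)

def sgd_segment(theta, n_steps, lr=0.05):
    for _ in range(n_steps):
        x = rng.normal(size=d)                    # one streaming sample
        y = x @ theta_true + rng.normal()
        theta = theta - lr * (x @ theta - y) * x  # least-squares SGD step
    return theta

theta0 = sgd_segment(np.zeros(d), 500)            # shared initial segment
x_new = rng.normal(size=d)
preds = np.array([sgd_segment(theta0.copy(), 500) @ x_new for _ in range(B)])

m, se = preds.mean(), preds.std(ddof=1) / np.sqrt(B)
half = t_dist.ppf(0.975, df=B - 1) * se
print(f"prediction {m:.3f} +/- {half:.3f}")       # naive t-based interval
```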
Neural System Identification with Spike-triggered Non-negative Matrix Factorization
Title | Neural System Identification with Spike-triggered Non-negative Matrix Factorization |
Authors | Shanshan Jia, Zhaofei Yu, Arno Onken, Yonghong Tian, Tiejun Huang, Jian K. Liu |
Abstract | Neuronal circuits formed in the brain are complex with intricate connection patterns. Such complexity is also observed in the retina as a relatively simple neuronal circuit. A retinal ganglion cell receives excitatory inputs from neurons in previous layers as driving forces to fire spikes. Analytical methods are required that can decipher these components in a systematic manner. Recently a method termed spike-triggered non-negative matrix factorization (STNMF) has been proposed for this purpose. In this study, we extend the scope of the STNMF method. By using the retinal ganglion cell as a model system, we show that STNMF can detect various computational properties of upstream bipolar cells, including spatial receptive field, temporal filter, and transfer nonlinearity. In addition, we recover synaptic connection strengths from the weight matrix of STNMF. Furthermore, we show that STNMF can separate spikes of a ganglion cell into a few subsets of spikes where each subset is contributed by one presynaptic bipolar cell. Taken together, these results corroborate that STNMF is a useful method for deciphering the structure of neuronal circuits. |
Tasks | |
Published | 2018-08-12 |
URL | https://arxiv.org/abs/1808.03958v4 |
PDF | https://arxiv.org/pdf/1808.03958v4.pdf |
PWC | https://paperswithcode.com/paper/neural-system-identification-with-spike |
Repo | |
Framework | |
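A sketch of the STNMF idea under simplifying assumptions: collect the stimulus windows preceding each spike (the spike-triggered ensemble), shift them to be non-negative, and factorize with plain Lee–Seung multiplicative-update NMF. The rows of H play the role of bipolar-cell subunit filters; note the published method uses a semi-NMF variant rather than this plain NMF, and the data below are synthetic.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)); H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):               # Lee-Seung multiplicative updates
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
stim = rng.normal(size=5000)              # white-noise stimulus
spikes = np.flatnonzero(rng.random(5000) < 0.02)
spikes = spikes[spikes >= 20]
ste = np.stack([stim[s - 20:s] for s in spikes])  # spike-triggered ensemble
V = ste - ste.min()                       # shift to non-negative values

W, H = nmf(V, rank=3)
print(H.shape)   # (3, 20): candidate subunit filters over the 20-bin window
```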
Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data
Title | Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data |
Authors | Bryan Gregory |
Abstract | Accurately predicting customer churn using large scale time-series data is a common problem facing many business domains. The creation of model features across various time windows for training and testing can be particularly challenging due to temporal issues common to time-series data. In this paper, we explore the application of extreme gradient boosting (XGBoost) on a customer dataset with a wide variety of temporal features in order to create a highly-accurate customer churn model. In particular, we describe an effective method for handling temporally sensitive feature engineering. The proposed model was submitted in the WSDM Cup 2018 Churn Challenge and achieved first place out of 575 teams. |
Tasks | Feature Engineering, Time Series |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03396v1 |
PDF | http://arxiv.org/pdf/1802.03396v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-customer-churn-extreme-gradient |
Repo | |
Framework | |
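A sketch of the temporally sensitive feature engineering the abstract highlights: every feature is computed only from events strictly before a per-window cutoff date, so training and test windows cannot leak future information. Column names and the aggregation choices are illustrative, not from the paper; the resulting matrices would then feed the XGBoost model.

```python
import pandas as pd

logs = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2017-01-03", "2017-02-10", "2017-03-01",
                            "2017-01-20", "2017-03-15"]),
    "num_plays": [12, 5, 8, 30, 2],
})

def features_as_of(logs, cutoff):
    """Aggregate behavior per user using only events before `cutoff`."""
    past = logs[logs["date"] < cutoff]
    f = past.groupby("user_id").agg(
        total_plays=("num_plays", "sum"),
        days_since_last=("date", lambda d: (cutoff - d.max()).days),
    )
    return f.reset_index()

train_X = features_as_of(logs, pd.Timestamp("2017-03-01"))  # label window: March
test_X = features_as_of(logs, pd.Timestamp("2017-04-01"))   # label window: April
print(train_X)
```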
Confidence Region of Singular Subspaces for Low-rank Matrix Regression
Title | Confidence Region of Singular Subspaces for Low-rank Matrix Regression |
Authors | Dong Xia |
Abstract | Low-rank matrix regression refers to the instances of recovering a low-rank matrix based on specially designed measurements and the corresponding noisy outcomes. In the last decade, numerous statistical methodologies have been developed for efficiently recovering the unknown low-rank matrices. However, in some applications, the unknown singular subspace is scientifically more important than the low-rank matrix itself. In this article, we revisit the low-rank matrix regression model and introduce a two-step procedure to construct confidence regions of the singular subspace. The procedure involves de-biasing the typical low-rank estimators, after which we calculate the empirical singular vectors. We investigate the distribution of the joint projection distance between the empirical singular subspace and the unknown true singular subspace. We specifically prove the asymptotic normality of the joint projection distance with data-dependent centering and normalization when $r^{3/2}(m_1+m_2)^{3/2}=o(n/\log n)$, where $m_1, m_2$ denote the matrix row and column sizes, $r$ is the rank, and $n$ is the number of independent random measurements. Consequently, we propose data-dependent confidence regions of the true singular subspace which attain any pre-determined confidence level asymptotically. In addition, non-asymptotic convergence rates are also established. Numerical results are presented to demonstrate the merits of our methods. |
Tasks | |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09871v3 |
PDF | http://arxiv.org/pdf/1805.09871v3.pdf |
PWC | https://paperswithcode.com/paper/confidence-region-of-singular-subspaces-for |
Repo | |
Framework | |
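A small sketch of the joint projection distance that the limiting distribution is stated for: the squared Frobenius distance between the projectors onto the estimated and true singular subspaces, summed over the left and right sides. The de-biasing step and the data-dependent centering are omitted; the noisy-observation setup below is our toy stand-in for the regression measurements.

```python
import numpy as np

def joint_projection_distance(U_hat, V_hat, U, V):
    dl = np.linalg.norm(U_hat @ U_hat.T - U @ U.T, "fro") ** 2
    dr = np.linalg.norm(V_hat @ V_hat.T - V @ V.T, "fro") ** 2
    return dl + dr

rng = np.random.default_rng(0)
m1, m2, r = 40, 30, 2
M = rng.normal(size=(m1, r)) @ rng.normal(size=(r, m2))   # true rank-r matrix
U, _, Vt = np.linalg.svd(M); U, V = U[:, :r], Vt[:r].T
Mn = M + 0.05 * rng.normal(size=(m1, m2))                 # noisy observation
Uh, _, Vht = np.linalg.svd(Mn); Uh, Vh = Uh[:, :r], Vht[:r].T
print(joint_projection_distance(Uh, Vh, U, V))            # small when noise is small
```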