Paper Group ANR 33
Normalized Cut Loss for Weakly-supervised CNN Segmentation
Title | Normalized Cut Loss for Weakly-supervised CNN Segmentation |
Authors | Meng Tang, Abdelaziz Djelouah, Federico Perazzi, Yuri Boykov, Christopher Schroers |
Abstract | Most recent semantic segmentation methods train deep convolutional neural networks with fully annotated masks requiring pixel-accuracy for good quality training. Common weakly-supervised approaches generate full masks from partial input (e.g. scribbles or seeds) using standard interactive segmentation methods as preprocessing. However, errors in such masks result in poorer training since standard loss functions (e.g. cross-entropy) do not distinguish seeds from other, potentially mislabeled, pixels. Inspired by the general ideas in semi-supervised learning, we address these problems via a new principled loss function evaluating network output with criteria standard in “shallow” segmentation, e.g. normalized cut. Unlike prior work, the cross-entropy part of our loss evaluates only seeds, where labels are known, while normalized cut softly evaluates consistency of all pixels. We focus on the normalized cut loss, where a dense Gaussian kernel is efficiently implemented in linear time via fast bilateral filtering. Our normalized cut loss approach to segmentation brings the quality of weakly-supervised training significantly closer to that of fully supervised methods. |
Tasks | Interactive Segmentation, Semantic Segmentation |
Published | 2018-04-04 |
URL | http://arxiv.org/abs/1804.01346v1 |
PDF | http://arxiv.org/pdf/1804.01346v1.pdf |
PWC | https://paperswithcode.com/paper/normalized-cut-loss-for-weakly-supervised-cnn |
Repo | |
Framework | |
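As a concrete illustration of the loss above, here is a minimal NumPy sketch: partial cross-entropy evaluated only on seed pixels, plus a soft normalized-cut term of the form sum_k cut(S_k) / assoc(S_k). The explicit affinity matrix stands in for the paper's fast bilateral-filtering implementation and is only feasible for tiny images; all toy data and names are ours.

```python
import numpy as np

def weak_seg_loss(S, seeds, W, eps=1e-12):
    """S: (P, K) softmax outputs; seeds: (P,) int labels, -1 = unlabeled;
    W: (P, P) pixel affinity matrix (e.g. a Gaussian kernel)."""
    # Partial cross-entropy: evaluated only where scribble labels exist.
    labeled = seeds >= 0
    pce = -np.mean(np.log(S[labeled, seeds[labeled]] + eps))
    # Soft normalized cut: sum_k cut(S_k) / assoc(S_k).
    d = W.sum(axis=1)                       # pixel degrees
    nc = 0.0
    for k in range(S.shape[1]):
        s_k = S[:, k]
        assoc = d @ s_k + eps
        cut = s_k @ W @ (1.0 - s_k)
        nc += cut / assoc
    return pce + nc

# Toy usage: 16 pixels, 2 classes, symmetric random affinities.
rng = np.random.default_rng(0)
S = rng.dirichlet([1, 1], size=16)
seeds = np.full(16, -1); seeds[:3] = 0; seeds[-3:] = 1
W = np.exp(-rng.random((16, 16))); W = (W + W.T) / 2
print(weak_seg_loss(S, seeds, W))
```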
Natural Language Processing for Music Knowledge Discovery
Title | Natural Language Processing for Music Knowledge Discovery |
Authors | Sergio Oramas, Luis Espinosa-Anke, Francisco Gómez, Xavier Serra |
Abstract | Today, a massive amount of musical knowledge is stored in written form, with testimonies dated as far back as several centuries ago. In this work, we present different Natural Language Processing (NLP) approaches to harness the potential of these text collections for automatic music knowledge discovery, covering different phases in a prototypical NLP pipeline, namely corpus compilation, text-mining, information extraction, knowledge graph generation and sentiment analysis. Each of these approaches is presented alongside different use cases (i.e., flamenco, Renaissance and popular music) where large collections of documents are processed, and conclusions stemming from data-driven analyses are presented and discussed. |
Tasks | Graph Generation, Sentiment Analysis |
Published | 2018-07-06 |
URL | http://arxiv.org/abs/1807.02200v1 |
PDF | http://arxiv.org/pdf/1807.02200v1.pdf |
PWC | https://paperswithcode.com/paper/natural-language-processing-for-music |
Repo | |
Framework | |
Syntax-Aware Language Modeling with Recurrent Neural Networks
Title | Syntax-Aware Language Modeling with Recurrent Neural Networks |
Authors | Duncan Blythe, Alan Akbik, Roland Vollgraf |
Abstract | Neural language models (LMs) are typically trained using only lexical features, such as surface forms of words. In this paper, we argue that this deprives the LM of crucial syntactic signals that can be detected at high confidence using existing parsers. We present a simple but highly effective approach for training neural LMs using both lexical and syntactic information, and a novel approach for applying such LMs to unparsed text using sequential Monte Carlo sampling. In experiments on a range of corpora and corpus sizes, we show that our approach consistently outperforms standard lexical LMs in character-level language modeling, while word-level models remain on par with standard language models. These results indicate potential for expanding LMs beyond lexical surface features to higher-level NLP features for character-level models. |
Tasks | Language Modelling |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.03665v1 |
PDF | http://arxiv.org/pdf/1803.03665v1.pdf |
PWC | https://paperswithcode.com/paper/syntax-aware-language-modeling-with-recurrent |
Repo | |
Framework | |
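The sequential Monte Carlo step above marginalizes latent syntax when scoring unparsed text. Below is a generic particle-filter sketch of that idea over toy tag and word distributions; it is not the authors' model, and all probabilities are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
tags, words = 3, 5                                  # toy tag and word inventories
trans = rng.dirichlet(np.ones(tags), size=tags)     # p(tag_t | tag_{t-1})
emit = rng.dirichlet(np.ones(words), size=tags)     # p(word_t | tag_t)

def smc_loglik(sentence, n_particles=100):
    part = np.zeros(n_particles, dtype=int)         # current tag of each particle
    loglik = 0.0
    for w in sentence:
        # Propose next tags from the transition model.
        part = np.array([rng.choice(tags, p=trans[t]) for t in part])
        wts = emit[part, w]                         # weight by word likelihood
        loglik += np.log(wts.mean())                # SMC estimate of p(w | history)
        wts /= wts.sum()
        part = part[rng.choice(n_particles, size=n_particles, p=wts)]  # resample
    return loglik

print(smc_loglik([0, 3, 2, 1]))
```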
Channel Charting: Locating Users within the Radio Environment using Channel State Information
Title | Channel Charting: Locating Users within the Radio Environment using Channel State Information |
Authors | Christoph Studer, Saïd Medjkouh, Emre Gönültaş, Tom Goldstein, Olav Tirkkonen |
Abstract | We propose channel charting (CC), a novel framework in which a multi-antenna network element learns a chart of the radio geometry in its surrounding area. The channel chart captures the local spatial geometry of the area so that points that are close in space will also be close in the channel chart and vice versa. CC works in a fully unsupervised manner, i.e., learning is only based on channel state information (CSI) that is passively collected at a single point in space, but from multiple transmit locations in the area over time. The method then extracts channel features that characterize large-scale fading properties of the wireless channel. Finally, the channel charts are generated with tools from dimensionality reduction, manifold learning, and deep neural networks. The network element performing CC may be, for example, a multi-antenna base-station in a cellular system, and the charted area is the served cell. Logical relationships related to the position and movement of a transmitter, e.g., a user equipment (UE), in the cell can then be directly deduced from comparing measured radio channel characteristics to the channel chart. The unsupervised nature of CC enables a range of new applications in UE localization, network planning, user scheduling, multipoint connectivity, hand-over, cell search, user grouping, and other cognitive tasks that rely on CSI and UE movement relative to the base-station, without the need for information from global navigation satellite systems. |
Tasks | Dimensionality Reduction |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05247v2 |
PDF | http://arxiv.org/pdf/1807.05247v2.pdf |
PWC | https://paperswithcode.com/paper/channel-charting-locating-users-within-the |
Repo | |
Framework | |
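A rough sketch of the CC pipeline under stated assumptions: synthetic line-of-sight CSI for a uniform linear array, a simple large-scale feature (magnitudes of the per-sample outer product, which discards absolute phase), and off-the-shelf manifold learning in place of the paper's specific dimensionality-reduction choices. The geometry and feature are ours, chosen only so the example runs.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
n_pos, n_ant = 200, 8
pos = rng.uniform(0, 100, size=(n_pos, 2))              # transmit locations
d = np.linalg.norm(pos - [50.0, -10.0], axis=1)         # distance to the array
theta = np.arctan2(pos[:, 1] + 10.0, pos[:, 0] - 50.0)  # angle to the array
# Toy line-of-sight CSI: path loss from distance, ULA phases from angle.
H = np.exp(-1j * np.pi * np.outer(np.sin(theta), np.arange(n_ant))) / d[:, None]
H += 0.01 * (rng.normal(size=H.shape) + 1j * rng.normal(size=H.shape))

# Large-scale feature: |h h^H| per sample, insensitive to small-scale phase.
feats = np.abs(np.einsum("ni,nj->nij", H, H.conj())).reshape(n_pos, -1)

chart = Isomap(n_components=2).fit_transform(feats)     # the 2-D channel chart
print(chart.shape)   # (200, 2): nearby transmitters map to nearby chart points
```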
Heteroskedastic PCA: Algorithm, Optimality, and Applications
Title | Heteroskedastic PCA: Algorithm, Optimality, and Applications |
Authors | Anru Zhang, T. Tony Cai, Yihong Wu |
Abstract | Principal component analysis (PCA) and singular value decomposition (SVD) are widely used in statistics, econometrics, machine learning, and applied mathematics. They have been well studied in the case of homoskedastic noise, where the noise levels of the contamination are homogeneous. In this paper, we consider PCA and SVD in the presence of heteroskedastic noise, which is a commonly used model for factor analysis and arises naturally in a range of applications. We introduce a general framework for heteroskedastic PCA and propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries to remove the bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covariance model. A key technical step is a deterministic robust perturbation analysis on singular subspaces, which can be of independent interest. The effectiveness of the proposed algorithm is demonstrated in a suite of applications, including heteroskedastic low-rank matrix denoising, Poisson PCA, and SVD based on heteroskedastic and incomplete data. |
Tasks | Denoising |
Published | 2018-10-19 |
URL | https://arxiv.org/abs/1810.08316v2 |
PDF | https://arxiv.org/pdf/1810.08316v2.pdf |
PWC | https://paperswithcode.com/paper/heteroskedastic-pca-algorithm-optimality-and |
Repo | |
Framework | |
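A minimal NumPy sketch of the diagonal-imputation iteration described in the abstract: the diagonal of the sample Gram matrix, which heteroskedastic noise inflates, is repeatedly replaced by the diagonal of the current best rank-r approximation. Iteration counts and the toy spiked-covariance data are our choices, not the paper's.

```python
import numpy as np

def hetero_pca(Sigma_hat, r, n_iter=50):
    N = Sigma_hat.copy()
    np.fill_diagonal(N, 0.0)                    # discard the noise-inflated diagonal
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(N, hermitian=True)
        low_rank = (U[:, :r] * s[:r]) @ Vt[:r]  # best rank-r approximation
        np.fill_diagonal(N, np.diag(low_rank))  # impute the diagonal and repeat
    U, _, _ = np.linalg.svd(N, hermitian=True)
    return U[:, :r]                             # estimated principal subspace

# Toy usage: spiked covariance with unequal noise levels per coordinate.
rng = np.random.default_rng(0)
p, n, r = 30, 500, 2
U_true = np.linalg.qr(rng.normal(size=(p, r)))[0]
X = rng.normal(size=(n, r)) @ (3 * U_true).T
X += rng.normal(size=(n, p)) * rng.uniform(0.1, 2.0, size=p)  # heteroskedastic
U_hat = hetero_pca(X.T @ X / n, r)
print(U_hat.shape)   # (30, 2)
```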
Notes on Abstract Argumentation Theory
Title | Notes on Abstract Argumentation Theory |
Authors | Anthony Peter Young |
Abstract | This note reviews Section 2 of Dung’s seminal 1995 paper on abstract argumentation theory. In particular, we clarify and make explicit all of the proofs mentioned therein, and provide more examples to illustrate the definitions, with the aim of helping readers who are approaching abstract argumentation theory for the first time. However, we provide minimal commentary and will refer the reader to Dung’s paper for the intuitions behind various concepts. The appropriate mathematical prerequisites are provided in the appendices. |
Tasks | Abstract Argumentation |
Published | 2018-06-18 |
URL | https://arxiv.org/abs/1806.07709v3 |
PDF | https://arxiv.org/pdf/1806.07709v3.pdf |
PWC | https://paperswithcode.com/paper/notes-on-abstract-argumentation-theory |
Repo | |
Framework | |
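As a small illustration of the material the note covers (our own example, not from the note itself), here is the grounded extension of an abstract argumentation framework computed by iterating Dung's characteristic function F(S) = {a : S defends a} from the empty set to its least fixed point.

```python
def grounded_extension(args, attacks):
    """args: set of arguments; attacks: set of (attacker, target) pairs."""
    attackers = {a: {b for (b, c) in attacks if c == a} for a in args}

    def defended(S):
        # a is defended by S if every attacker of a is attacked by some s in S.
        return {a for a in args
                if all(any((s, b) in attacks for s in S) for b in attackers[a])}

    S = set()
    while True:                        # least fixed point of F
        nxt = defended(S)
        if nxt == S:
            return S
        S = nxt

# Example: a attacks b, b attacks c; the grounded extension is {a, c},
# since a is unattacked and a defends c against b.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```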
Fuzzy quantification for linguistic data analysis and data mining
Title | Fuzzy quantification for linguistic data analysis and data mining |
Authors | F. Díaz-Hermida, Juan. C. Vidal |
Abstract | Fuzzy quantification is a subtopic of fuzzy logic which deals with the modelling of the quantified expressions we can find in natural language. Fuzzy quantifiers have been successfully applied in several fields such as fuzzy control, fuzzy databases, information retrieval, and natural language generation. Their ability to model and evaluate linguistic expressions in a mathematical way makes fuzzy quantifiers very powerful for data analytics and data mining applications. In this paper we give a general overview of the main applications of fuzzy quantifiers in this field, as well as some ideas for using them in new application contexts. |
Tasks | Information Retrieval, Text Generation |
Published | 2018-07-19 |
URL | http://arxiv.org/abs/1807.07389v1 |
PDF | http://arxiv.org/pdf/1807.07389v1.pdf |
PWC | https://paperswithcode.com/paper/fuzzy-quantification-for-linguistic-data |
Repo | |
Framework | |
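To make the idea concrete, here is a minimal illustration (our own example, not from the paper) of a Zadeh-style relative sigma-count quantifier: the truth degree of "most X are Y" computed from fuzzy membership vectors, with "most" modeled as a piecewise-linear quantifier.

```python
import numpy as np

def most(x_member, y_member):
    """Truth degree of 'most X are Y' for membership vectors in [0, 1]."""
    # Relative sigma-count: |X and Y| / |X| with min as the t-norm.
    ratio = np.minimum(x_member, y_member).sum() / (x_member.sum() + 1e-12)
    # 'most' as a piecewise-linear quantifier rising from 0.5 to 0.9.
    return np.clip((ratio - 0.5) / 0.4, 0.0, 1.0)

# E.g. "most expensive products are well rated" over five products:
expensive = np.array([0.9, 0.8, 0.7, 0.2, 0.1])
well_rated = np.array([1.0, 0.6, 0.9, 0.3, 0.8])
print(most(expensive, well_rated))   # close to 1.0 here
```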
Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks
Title | Learning to Exploit Invariances in Clinical Time-Series Data using Sequence Transformer Networks |
Authors | Jeeheh Oh, Jiaxuan Wang, Jenna Wiens |
Abstract | Recently, researchers have started applying convolutional neural networks (CNNs) with one-dimensional convolutions to clinical tasks involving time-series data. This is due, in part, to their computational efficiency relative to recurrent neural networks and to their ability to efficiently exploit certain temporal invariances (e.g., phase invariance). However, it is well-established that clinical data may exhibit many other types of invariances (e.g., scaling). While preprocessing techniques (e.g., dynamic time warping) may successfully transform and align inputs, their use often requires one to identify the types of invariances in advance. In contrast, we propose the use of Sequence Transformer Networks, an end-to-end trainable architecture that learns to identify and account for invariances in clinical time-series data. Applied to the task of predicting in-hospital mortality, our proposed approach achieves an improvement in the area under the receiver operating characteristic curve (AUROC) relative to a baseline CNN (AUROC=0.851 vs. AUROC=0.838). Our results suggest that a variety of valuable invariances can be learned directly from the data. |
Tasks | Time Series |
Published | 2018-08-21 |
URL | http://arxiv.org/abs/1808.06725v1 |
PDF | http://arxiv.org/pdf/1808.06725v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-exploit-invariances-in-clinical |
Repo | |
Framework | |
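A toy NumPy sketch of the transformation step the abstract describes: resampling a clinical time series under shift and scale parameters before it reaches a 1-D CNN. The parameter-predicting network and the CNN are omitted; the (scale, shift) values are stand-ins for what the end-to-end model would learn.

```python
import numpy as np

def transform_series(x, scale, shift):
    """Resample x(t) at t' = scale * t + shift with linear interpolation."""
    t = np.arange(len(x), dtype=float)
    warped_t = scale * t + shift
    return np.interp(warped_t, t, x)   # clamps at the series boundaries

x = np.sin(np.linspace(0, 6, 48))      # a toy vital-sign trace
x_aligned = transform_series(x, scale=0.8, shift=2.0)
print(x_aligned.shape)                 # same length, temporally re-aligned
```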
Deep Neural Network inference with reduced word length
Title | Deep Neural Network inference with reduced word length |
Authors | Lukas Mauch, Bin Yang |
Abstract | Deep neural networks (DNNs) are powerful models for many pattern recognition tasks, yet their high computational complexity and memory requirements limit them to applications on high-performance computing platforms. In this paper, we propose a new method to evaluate DNNs trained with 32-bit floating point (float32) precision using only low-precision integer arithmetic in combination with binary shift and clipping operations. Because hardware implementation of these operations is much simpler than high-precision floating point calculation, our method can be used for efficient DNN inference on dedicated hardware. In experiments on MNIST, we demonstrate that DNNs trained with float32 can be evaluated using a combination of 2-bit integer arithmetic and a few float32 calculations in each layer, or using only 3-bit integer arithmetic in combination with binary shift and clipping, without significant performance degradation. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09854v1 |
PDF | http://arxiv.org/pdf/1810.09854v1.pdf |
PWC | https://paperswithcode.com/paper/deep-neural-network-inference-with-reduced |
Repo | |
Framework | |
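A hedged sketch of the kind of integer evaluation the abstract describes: float32 values are mapped to signed low-bit integers with a power-of-two scale, so a layer needs only integer multiplies, binary shifts, and clipping. The exact scale selection and layer structure in the paper may differ; this shows the mechanism only.

```python
import numpy as np

def quantize(w, bits):
    """Map float values to signed `bits`-bit integers with a power-of-two scale."""
    qmax = 2 ** (bits - 1) - 1
    shift = int(np.floor(np.log2(qmax / np.abs(w).max())))  # scale = 2**shift
    q = np.clip(np.round(w * 2.0 ** shift), -qmax - 1, qmax).astype(np.int32)
    return q, shift

def int_linear(x_q, w_q, w_shift, out_bits=16):
    """Integer matmul, then binary shift and clipping; output stays fixed-point."""
    acc = x_q.astype(np.int64) @ w_q.astype(np.int64)
    out = acc >> w_shift                    # undo the weight scale by shifting
    lim = 2 ** (out_bits - 1) - 1
    return np.clip(out, -lim - 1, lim)      # result ~ (x @ W) * 2**x_shift

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(64, 10)).astype(np.float32)
x = rng.normal(size=(1, 64)).astype(np.float32)
w_q, w_shift = quantize(W, bits=3)          # 3-bit weights, as in the experiments
x_q, x_shift = quantize(x, bits=8)
print(int_linear(x_q, w_q, w_shift))        # compare against (x @ W) * 2**x_shift
```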
Modeling disease progression in longitudinal EHR data using continuous-time hidden Markov models
Title | Modeling disease progression in longitudinal EHR data using continuous-time hidden Markov models |
Authors | Aman Verma, Guido Powell, Yu Luo, David Stephens, David L. Buckeridge |
Abstract | Modeling disease progression in healthcare administrative databases is complicated by the fact that patients are observed only at irregular intervals when they seek healthcare services. In a longitudinal cohort of 76,888 patients with chronic obstructive pulmonary disease (COPD), we used a continuous-time hidden Markov model with a generalized linear model to model healthcare utilization events. We found that the fitted model provides interpretable results suitable for summarization and hypothesis generation. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00528v1 |
PDF | http://arxiv.org/pdf/1812.00528v1.pdf |
PWC | https://paperswithcode.com/paper/modeling-disease-progression-in-longitudinal |
Repo | |
Framework | |
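A compact sketch of the likelihood machinery behind a continuous-time HMM: transitions over an irregular gap dt follow expm(Q * dt), and the forward algorithm runs over (time, observation) pairs. The numbers are toy values, and the paper additionally couples a generalized linear model to the observation process.

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.20, 0.15, 0.05],      # generator matrix: rows sum to zero
              [0.05, -0.15, 0.10],
              [0.00,  0.00, 0.00]])     # third state absorbing (e.g. severe COPD)
B = np.array([[0.8, 0.2],               # p(observation | hidden state)
              [0.5, 0.5],
              [0.1, 0.9]])
pi = np.array([1.0, 0.0, 0.0])          # initial state distribution

def ct_hmm_loglik(times, obs):
    alpha = pi * B[:, obs[0]]
    for k in range(1, len(obs)):
        P = expm(Q * (times[k] - times[k - 1]))   # irregular-gap transition
        alpha = (alpha @ P) * B[:, obs[k]]
    return np.log(alpha.sum())

# Visits at irregular intervals (months); binary utilization event per visit.
print(ct_hmm_loglik(times=[0.0, 1.5, 7.0, 7.5], obs=[0, 0, 1, 1]))
```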
Finding Convincing Arguments Using Scalable Bayesian Preference Learning
Title | Finding Convincing Arguments Using Scalable Bayesian Preference Learning |
Authors | Edwin Simpson, Iryna Gurevych |
Abstract | We introduce a scalable Bayesian preference learning method for identifying convincing arguments in the absence of gold-standard ratings or rankings. In contrast to previous work, we avoid the need for separate methods to perform quality control on training data, predict rankings and perform pairwise classification. Bayesian approaches are an effective solution when faced with sparse or noisy training data, but have not previously been used to identify convincing arguments. One issue is scalability, which we address by developing a stochastic variational inference method for Gaussian process (GP) preference learning. We show how our method can be applied to predict argument convincingness from crowdsourced data, outperforming the previous state-of-the-art, particularly when trained with small amounts of unreliable data. We demonstrate how the Bayesian approach enables more effective active learning, thereby reducing the amount of data required to identify convincing arguments for new users and domains. While word embeddings are principally used with neural networks, our results show that word embeddings in combination with linguistic features also benefit GPs when predicting argument convincingness. |
Tasks | Active Learning, Word Embeddings |
Published | 2018-06-06 |
URL | http://arxiv.org/abs/1806.02418v1 |
PDF | http://arxiv.org/pdf/1806.02418v1.pdf |
PWC | https://paperswithcode.com/paper/finding-convincing-arguments-using-scalable |
Repo | |
Framework | |
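A bare-bones sketch of the pairwise preference likelihood at the core of GP preference learning: p(a preferred over b) = Phi(f(a) - f(b)). The paper's GP prior and stochastic variational inference are out of scope here; f is a toy latent convincingness score per argument, and all data are illustrative.

```python
import numpy as np
from scipy.stats import norm

def preference_loglik(f, pairs):
    """f: latent scores per argument; pairs: (winner, loser) index pairs."""
    diffs = np.array([f[w] - f[l] for w, l in pairs])
    return np.sum(norm.logcdf(diffs))   # probit pairwise likelihood

# Crowdsourced comparisons over 4 arguments: 0 beats 1, 0 beats 2, 3 beats 2.
pairs = [(0, 1), (0, 2), (3, 2)]
f = np.array([1.2, 0.1, -0.8, 0.5])    # a candidate convincingness vector
print(preference_loglik(f, pairs))     # higher means a better fit to the pairs
```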
Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent
Title | Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent |
Authors | Weijie J. Su, Yuancheng Zhu |
Abstract | Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large. However, despite an ever-increasing volume of work on SGD, much less is known about the statistical inferential properties of SGD-based predictions. Taking a fully inferential viewpoint, this paper introduces a novel procedure termed HiGrad to conduct statistical inference for online learning, without incurring additional computational cost compared with SGD. The HiGrad procedure begins by performing SGD updates for a while and then splits the single thread into several threads; it then operates hierarchically in this fashion along each thread. With predictions provided by multiple threads in place, a t-based confidence interval is constructed by decorrelating predictions using covariance structures given by a Donsker-style extension of the Ruppert–Polyak averaging scheme, which is a technical contribution of independent interest. Under certain regularity conditions, the HiGrad confidence interval is shown to attain asymptotically exact coverage probability. Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real data example. An R package higrad has been developed to implement the method. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04876v2 |
PDF | http://arxiv.org/pdf/1802.04876v2.pdf |
PWC | https://paperswithcode.com/paper/uncertainty-quantification-for-online |
Repo | |
Framework | |
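A strongly simplified sketch of the HiGrad thread structure: one shared SGD segment that splits into B independent continuation threads, whose final predictions feed a t-based interval. The paper's decorrelation via the Ruppert–Polyak covariance structure is replaced here by a naive t-interval over threads, so this shows the shape of the procedure rather than its exact inference.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
d, B = 5, 4
theta_true = rng.normal(size=d)

def sgd_segment(theta, n_steps, lr=0.05):
    for _ in range(n_steps):
        x = rng.normal(size=d)                    # one streaming sample
        y = x @ theta_true + rng.normal()
        theta = theta - lr * (x @ theta - y) * x  # least-squares SGD step
    return theta

theta0 = sgd_segment(np.zeros(d), 500)            # shared initial segment
x_new = rng.normal(size=d)
preds = np.array([sgd_segment(theta0.copy(), 500) @ x_new for _ in range(B)])

m, se = preds.mean(), preds.std(ddof=1) / np.sqrt(B)
half = t_dist.ppf(0.975, df=B - 1) * se
print(f"prediction {m:.3f} +/- {half:.3f}")       # naive t-based interval
```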
Neural System Identification with Spike-triggered Non-negative Matrix Factorization
Title | Neural System Identification with Spike-triggered Non-negative Matrix Factorization |
Authors | Shanshan Jia, Zhaofei Yu, Arno Onken, Yonghong Tian, Tiejun Huang, Jian K. Liu |
Abstract | Neuronal circuits formed in the brain are complex with intricate connection patterns. Such complexity is also observed in the retina as a relatively simple neuronal circuit. A retinal ganglion cell receives excitatory inputs from neurons in previous layers as driving forces to fire spikes. Analytical methods are required that can decipher these components in a systematic manner. Recently a method termed spike-triggered non-negative matrix factorization (STNMF) has been proposed for this purpose. In this study, we extend the scope of the STNMF method. By using the retinal ganglion cell as a model system, we show that STNMF can detect various computational properties of upstream bipolar cells, including spatial receptive field, temporal filter, and transfer nonlinearity. In addition, we recover synaptic connection strengths from the weight matrix of STNMF. Furthermore, we show that STNMF can separate spikes of a ganglion cell into a few subsets of spikes where each subset is contributed by one presynaptic bipolar cell. Taken together, these results corroborate that STNMF is a useful method for deciphering the structure of neuronal circuits. |
Tasks | |
Published | 2018-08-12 |
URL | https://arxiv.org/abs/1808.03958v4 |
PDF | https://arxiv.org/pdf/1808.03958v4.pdf |
PWC | https://paperswithcode.com/paper/neural-system-identification-with-spike |
Repo | |
Framework | |
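A sketch of the STNMF idea under simplifying assumptions: collect the stimulus windows preceding each spike (the spike-triggered ensemble), shift them to be non-negative, and factorize with plain Lee–Seung multiplicative-update NMF. The rows of H play the role of bipolar-cell subunit filters; note the published method uses a semi-NMF variant rather than this plain NMF, and the data below are synthetic.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9):
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], rank)); H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):               # Lee-Seung multiplicative updates
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
stim = rng.normal(size=5000)              # white-noise stimulus
spikes = np.flatnonzero(rng.random(5000) < 0.02)
spikes = spikes[spikes >= 20]
ste = np.stack([stim[s - 20:s] for s in spikes])  # spike-triggered ensemble
V = ste - ste.min()                       # shift to non-negative values

W, H = nmf(V, rank=3)
print(H.shape)   # (3, 20): candidate subunit filters over the 20-bin window
```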
Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data
Title | Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data |
Authors | Bryan Gregory |
Abstract | Accurately predicting customer churn using large scale time-series data is a common problem facing many business domains. The creation of model features across various time windows for training and testing can be particularly challenging due to temporal issues common to time-series data. In this paper, we explore the application of extreme gradient boosting (XGBoost) on a customer dataset with a wide variety of temporal features in order to create a highly-accurate customer churn model. In particular, we describe an effective method for handling temporally sensitive feature engineering. The proposed model was submitted in the WSDM Cup 2018 Churn Challenge and achieved first place out of 575 teams. |
Tasks | Feature Engineering, Time Series |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03396v1 |
PDF | http://arxiv.org/pdf/1802.03396v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-customer-churn-extreme-gradient |
Repo | |
Framework | |
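A sketch of the temporally sensitive feature engineering the abstract highlights: every feature is computed only from events strictly before a per-window cutoff date, so training and test windows cannot leak future information. Column names and the aggregation choices are illustrative, not from the paper; the resulting matrices would then feed the XGBoost model.

```python
import pandas as pd

logs = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2017-01-03", "2017-02-10", "2017-03-01",
                            "2017-01-20", "2017-03-15"]),
    "num_plays": [12, 5, 8, 30, 2],
})

def features_as_of(logs, cutoff):
    """Aggregate behavior per user using only events before `cutoff`."""
    past = logs[logs["date"] < cutoff]
    f = past.groupby("user_id").agg(
        total_plays=("num_plays", "sum"),
        days_since_last=("date", lambda d: (cutoff - d.max()).days),
    )
    return f.reset_index()

train_X = features_as_of(logs, pd.Timestamp("2017-03-01"))  # label window: March
test_X = features_as_of(logs, pd.Timestamp("2017-04-01"))   # label window: April
print(train_X)
```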
Confidence Region of Singular Subspaces for Low-rank Matrix Regression
Title | Confidence Region of Singular Subspaces for Low-rank Matrix Regression |
Authors | Dong Xia |
Abstract | Low-rank matrix regression refers to the instances of recovering a low-rank matrix based on specially designed measurements and the corresponding noisy outcomes. In the last decade, numerous statistical methodologies have been developed for efficiently recovering the unknown low-rank matrices. However, in some applications, the unknown singular subspace is scientifically more important than the low-rank matrix itself. In this article, we revisit the low-rank matrix regression model and introduce a two-step procedure to construct confidence regions of the singular subspace. The procedure involves de-biasing the typical low-rank estimators, after which we calculate the empirical singular vectors. We investigate the distribution of the joint projection distance between the empirical singular subspace and the unknown true singular subspace. We specifically prove the asymptotic normality of the joint projection distance with data-dependent centering and normalization when $r^{3/2}(m_1+m_2)^{3/2}=o(n/\log n)$, where $m_1, m_2$ denote the matrix row and column sizes, $r$ is the rank, and $n$ is the number of independent random measurements. Consequently, we propose data-dependent confidence regions of the true singular subspace which attain any pre-determined confidence level asymptotically. In addition, non-asymptotic convergence rates are also established. Numerical results are presented to demonstrate the merits of our methods. |
Tasks | |
Published | 2018-05-24 |
URL | http://arxiv.org/abs/1805.09871v3 |
PDF | http://arxiv.org/pdf/1805.09871v3.pdf |
PWC | https://paperswithcode.com/paper/confidence-region-of-singular-subspaces-for |
Repo | |
Framework | |
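A small sketch of the joint projection distance that the limiting distribution is stated for: the squared Frobenius distance between the projectors onto the estimated and true singular subspaces, summed over the left and right sides. The de-biasing step and the data-dependent centering are omitted; the noisy-observation setup below is our toy stand-in for the regression measurements.

```python
import numpy as np

def joint_projection_distance(U_hat, V_hat, U, V):
    dl = np.linalg.norm(U_hat @ U_hat.T - U @ U.T, "fro") ** 2
    dr = np.linalg.norm(V_hat @ V_hat.T - V @ V.T, "fro") ** 2
    return dl + dr

rng = np.random.default_rng(0)
m1, m2, r = 40, 30, 2
M = rng.normal(size=(m1, r)) @ rng.normal(size=(r, m2))   # true rank-r matrix
U, _, Vt = np.linalg.svd(M); U, V = U[:, :r], Vt[:r].T
Mn = M + 0.05 * rng.normal(size=(m1, m2))                 # noisy observation
Uh, _, Vht = np.linalg.svd(Mn); Uh, Vh = Uh[:, :r], Vht[:r].T
print(joint_projection_distance(Uh, Vh, U, V))            # small when noise is small
```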