Paper Group ANR 110
Efficient Text Classification Using Tree-structured Multi-linear Principal Component Analysis
Title | Efficient Text Classification Using Tree-structured Multi-linear Principal Component Analysis |
Authors | Yuanhang Su, Yuzhong Huang, C. -C. Jay Kuo |
Abstract | A novel text data dimension reduction technique, called the tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Unlike traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the subsequent text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principal component analysis (PCA). Furthermore, it is demonstrated by experimental results that the support vector machine (SVM) method applied to the TMPCA-processed data achieves comparable or better performance than the state-of-the-art recurrent neural network (RNN) approach. |
Tasks | Dimensionality Reduction, Text Classification |
Published | 2018-01-20 |
URL | http://arxiv.org/abs/1801.06607v2 |
http://arxiv.org/pdf/1801.06607v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-text-classification-using-tree |
Repo | |
Framework | |
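The pairwise tree reduction at the heart of TMPCA is easy to picture in code. The following is a minimal sketch of the idea, not the authors' implementation: at each level of a binary tree, adjacent word vectors are concatenated and projected back to the original dimension with a PCA fitted across the corpus, halving the sequence length until one vector per sentence remains. The padding rule and use of scikit-learn's `PCA` are our own assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def tmpca_reduce(sentences):
    """sentences: (n_sentences, seq_len, dim) array of word vectors.
    At each tree level, adjacent word vectors are concatenated (2*dim)
    and projected back to dim by a PCA fitted across the whole corpus,
    halving the sequence length until one vector per sentence remains."""
    x = np.asarray(sentences, dtype=float)
    d = x.shape[2]
    while x.shape[1] > 1:
        if x.shape[1] % 2 == 1:                     # pad odd-length levels
            x = np.concatenate([x, np.zeros_like(x[:, :1])], axis=1)
        n, t, _ = x.shape
        pairs = x.reshape(n, t // 2, 2 * d)         # merge adjacent words
        flat = pairs.reshape(-1, 2 * d)
        # assumes the corpus is large enough that n * t/2 >= d
        x = PCA(n_components=d).fit_transform(flat).reshape(n, t // 2, d)
    return x[:, 0]                                  # one vector per sentence

# toy usage: 64 sentences of 4 words with 16-dim embeddings -> (64, 16)
emb = np.random.default_rng(0).normal(size=(64, 4, 16))
print(tmpca_reduce(emb).shape)
```

Each level costs a PCA on vectors of width 2d rather than one PCA over the full sentence, which is the source of the complexity advantage over ordinary PCA claimed in the abstract.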
Optional Stopping with Bayes Factors: a categorization and extension of folklore results, with an application to invariant situations
Title | Optional Stopping with Bayes Factors: a categorization and extension of folklore results, with an application to invariant situations |
Authors | Allard Hendriksen, Rianne de Heide, Peter Grünwald |
Abstract | It is often claimed that Bayesian methods, in particular Bayes factor methods for hypothesis testing, can deal with optional stopping. We first give an overview, using elementary probability theory, of three different mathematical meanings that various authors give to this claim: (1) stopping rule independence, (2) posterior calibration and (3) (semi-) frequentist robustness to optional stopping. We then prove theorems to the effect that these claims do indeed hold in a general measure-theoretic setting. For claims of type (2) and (3), such results are new. By allowing for non-integrable measures based on improper priors, we obtain particularly strong results for the practically important case of models with nuisance parameters satisfying a group invariance (such as location or scale). We also discuss the practical relevance of (1)-(3), and conclude that whether Bayes factor methods actually perform well under optional stopping crucially depends on details of models, priors and the goal of the analysis. |
Tasks | Calibration |
Published | 2018-07-24 |
URL | https://arxiv.org/abs/1807.09077v2 |
https://arxiv.org/pdf/1807.09077v2.pdf | |
PWC | https://paperswithcode.com/paper/optional-stopping-with-bayes-factors-a |
Repo | |
Framework | |
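Claim (3), frequentist robustness to optional stopping, rests on the fact that under H0 the Bayes factor is a nonnegative martingale with expectation 1, so by Ville's inequality P(sup_n BF10(n) >= c) <= 1/c for any stopping rule. The simulation below illustrates this with a model chosen for its closed-form Bayes factor (our choice for illustration, not necessarily the paper's running example): x_i ~ N(mu, 1), H0: mu = 0 versus H1: mu ~ N(0, 1), for which BF10(n) = exp(S_n^2 / (2(n+1))) / sqrt(n+1) with S_n the running sum.

```python
import numpy as np

rng = np.random.default_rng(0)
c, n_max, runs = 10.0, 1000, 2000
stopped = 0
for _ in range(runs):
    s = 0.0
    for n in range(1, n_max + 1):
        s += rng.normal()                    # data generated under H0
        bf10 = np.exp(s * s / (2 * (n + 1))) / np.sqrt(n + 1)
        if bf10 >= c:                        # optional stopping: reject and
            stopped += 1                     # stop as soon as BF10 >= c
            break

print(f"P(BF10 ever >= {c}) ~ {stopped / runs:.3f}  (Ville bound: {1/c})")
```

However aggressively the experimenter monitors the Bayes factor, the empirical rejection rate stays below 1/c, which is the sense in which the Bayes factor test remains frequentist-valid under optional stopping.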
On the Statistical Challenges of Echo State Networks and Some Potential Remedies
Title | On the Statistical Challenges of Echo State Networks and Some Potential Remedies |
Authors | Qiuyi Wu, Ernest Fokoue, Dhireesha Kudithipudi |
Abstract | Echo state networks (ESNs) are powerful recurrent neural networks. However, they are often unstable, making it hard to find a good ESN for a specific dataset, and obtaining high accuracy with an ESN is a challenging task. We create, develop and implement a family of predictably optimal, robust and stable ensembles of echo state networks by regularizing the training and perturbing the input. Furthermore, several weight distributions are tried to see whether the shape of the distribution affects the error. We find that ESNs can track most datasets in the short term but collapse in the long run; short-term tracking with a large reservoir enables an ESN to deliver strikingly good predictions. Based on this observation, we go a step further and aggregate many ESNs into an ensemble, using stochastic replications and bootstrapping of the input data to lower the variance and stabilize the system. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07369v1 |
http://arxiv.org/pdf/1802.07369v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-statistical-challenges-of-echo-state |
Repo | |
Framework | |
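A minimal sketch of the stabilize-by-aggregation idea: a standard tanh reservoir with a ridge readout, bagged over bootstrap resamples of the training pairs. The reservoir size, spectral radius, and ridge penalty below are generic ESN defaults, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

def esn_states(u, n_res=200, rho=0.9, seed=0):
    """Run a tanh reservoir over the scalar input sequence u."""
    r = np.random.default_rng(seed)
    w_in = r.uniform(-0.5, 0.5, n_res)
    w = r.uniform(-0.5, 0.5, (n_res, n_res))
    w *= rho / max(abs(np.linalg.eigvals(w)))   # set spectral radius to rho
    x, states = np.zeros(n_res), []
    for u_t in u:
        x = np.tanh(w_in * u_t + w @ x)
        states.append(x.copy())
    return np.array(states)

# one-step-ahead prediction of a noisy sine wave
t = np.linspace(0, 60, 1500)
u = np.sin(t) + 0.05 * rng.normal(size=t.size)
X, y = esn_states(u[:-1]), u[1:]

preds = []
for b in range(10):                             # bootstrap replications
    idx = rng.integers(0, len(y), len(y))
    Xb, yb = X[idx], y[idx]
    w_out = np.linalg.solve(Xb.T @ Xb + 1e-6 * np.eye(X.shape[1]), Xb.T @ yb)
    preds.append(X @ w_out)
print("ensemble MSE:", np.mean((np.mean(preds, axis=0) - y) ** 2))
```

Averaging the ten bootstrapped readouts lowers the variance of the prediction, which is the stabilization mechanism the abstract describes.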
HeteroMed: Heterogeneous Information Network for Medical Diagnosis
Title | HeteroMed: Heterogeneous Information Network for Medical Diagnosis |
Authors | Anahita Hosseini, Ting Chen, Wenjun Wu, Yizhou Sun, Majid Sarrafzadeh |
Abstract | With the recent availability of Electronic Health Records (EHR) and the great opportunities they offer for advancing medical informatics, there has been growing interest in mining EHR to improve quality of care. Disease diagnosis, owing to its sensitive nature, the huge cost of errors, and its complexity, has become an increasingly important focus of research in past years. Existing studies model EHR by capturing co-occurrence of clinical events to learn their latent embeddings. However, relations among clinical events carry various semantics and contribute differently to disease diagnosis, which calls for more advanced modeling of the heterogeneous data types and relations in EHR data than existing solutions provide. To address these issues, we show how high-dimensional EHR data and its rich relationships can be suitably translated into HeteroMed, a heterogeneous information network for robust medical diagnosis. Our modeling approach allows for straightforward handling of missing values and heterogeneity of data. HeteroMed exploits metapaths to capture higher-level and semantically important relations contributing to disease diagnosis. Furthermore, it employs a joint embedding framework to tailor clinical event representations to the disease diagnosis goal. To the best of our knowledge, this is the first study to use a heterogeneous information network for modeling clinical data and disease diagnosis. Experimental results show superior performance of HeteroMed compared to prior methods in predicting exact diagnosis codes and general disease cohorts. Moreover, HeteroMed outperforms baseline models in capturing similarities of clinical events, which are examined qualitatively through case studies. |
Tasks | Medical Diagnosis |
Published | 2018-04-22 |
URL | http://arxiv.org/abs/1804.08052v1 |
http://arxiv.org/pdf/1804.08052v1.pdf | |
PWC | https://paperswithcode.com/paper/heteromed-heterogeneous-information-network |
Repo | |
Framework | |
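To make the metapath idea concrete, here is a toy sketch of metapath-guided random walks over a small heterogeneous EHR-style graph; such walks can then feed any skip-gram-style embedder. The graph contents, node types, and the patient-symptom-patient-disease metapath are invented for illustration and are not HeteroMed's actual schema.

```python
import random
import networkx as nx

G = nx.Graph()
G.add_edges_from([("p1", "fever"), ("p2", "fever"), ("p2", "cough"),
                  ("p3", "cough"), ("p1", "flu"), ("p2", "flu"),
                  ("p3", "bronchitis")])
node_type = {n: ("patient" if n.startswith("p") else
                 "disease" if n in ("flu", "bronchitis") else "symptom")
             for n in G}

def metapath_walk(start, pattern, length):
    """Random walk that only steps to neighbours of the next type in
    `pattern` (cycled); `start` must match pattern[0]."""
    walk, cur = [start], start
    for i in range(1, length):
        wanted = pattern[i % len(pattern)]
        nbrs = [n for n in G[cur] if node_type[n] == wanted]
        if not nbrs:
            break
        cur = random.choice(nbrs)
        walk.append(cur)
    return walk

random.seed(0)
# pattern cycles patient -> symptom -> patient -> disease -> patient ...
print(metapath_walk("p1", ["patient", "symptom", "patient", "disease"], 8))
```

Constraining the walk to a type pattern is what lets the embedding capture semantically meaningful relations (e.g. "patients who share symptoms") rather than raw co-occurrence.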
On Fast Leverage Score Sampling and Optimal Learning
Title | On Fast Leverage Score Sampling and Optimal Learning |
Authors | Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco |
Abstract | Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage score sampling is a challenge in its own right, requiring further approximations. In this paper, we study the problem of leverage score sampling for positive definite matrices defined by a kernel. Our contribution is twofold. First, we provide a novel algorithm for leverage score sampling; second, we exploit the proposed method in statistical learning by deriving a novel solver for kernel ridge regression. Our main technical contribution is showing that the proposed algorithms are currently the most efficient and accurate for these problems. |
Tasks | |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13258v2 |
http://arxiv.org/pdf/1810.13258v2.pdf | |
PWC | https://paperswithcode.com/paper/on-fast-leverage-score-sampling-and-optimal |
Repo | |
Framework | |
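For reference, the quantity being approximated: the ridge leverage scores of a kernel matrix are l_i(lambda) = [K (K + n lambda I)^{-1}]_{ii}. The paper's contribution is a fast approximation of these scores; the O(n^3) computation below is just the definition, as a sketch of what the sampler targets.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def ridge_leverage_scores(K, lam):
    """Exact ridge leverage scores: diag(K (K + n*lam*I)^{-1})."""
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + n * lam * np.eye(n)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
scores = ridge_leverage_scores(rbf_kernel(X), lam=1e-2)

# sampling proportionally to the scores keeps the most informative points
probs = scores / scores.sum()
sample = rng.choice(len(X), size=20, replace=False, p=probs)
print(scores[:5].round(4), sample[:5])
```

A kernel ridge regression solver can then be restricted to the sampled columns (a Nyström-style approximation), which is the statistical-learning application the abstract mentions.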
Core Conflictual Relationship: Text Mining to Discover What and When
Title | Core Conflictual Relationship: Text Mining to Discover What and When |
Authors | Fionn Murtagh, Giuseppe Iurato |
Abstract | Following a detailed presentation of the Core Conflictual Relationship Theme (CCRT), the objective is to find relevant methods for what has been described as the verbalization and visualization of data, also termed data mining, text mining, and knowledge discovery in data. The Correspondence Analysis methodology, also termed Geometric Data Analysis, is shown in a case study to be comprehensive and revealing. Computational efficiency depends on how the analysis process is structured. For both the illustrative and revealing aspects of the case study here, relatively extensive dream reports are used. This Geometric Data Analysis confirms the validity of the CCRT method. |
Tasks | |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.11140v1 |
http://arxiv.org/pdf/1805.11140v1.pdf | |
PWC | https://paperswithcode.com/paper/core-conflictual-relationship-text-mining-to |
Repo | |
Framework | |
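As a compact sketch of the Correspondence Analysis machinery used in the case study: CA is an SVD of the standardized residuals of a contingency table, yielding principal coordinates for rows and columns in a shared space. The toy table of reports-by-themes counts below is invented for illustration.

```python
import numpy as np

def correspondence_analysis(table):
    P = table / table.sum()                    # correspondence matrix
    r, c = P.sum(1), P.sum(0)                  # row / column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # std. residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]      # principal coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2           # plus axis inertias

counts = np.array([[10., 2., 5.],              # e.g. dream reports x themes
                   [3., 12., 4.],
                   [6., 1., 9.]])
rows, cols, inertia = correspondence_analysis(counts)
print("inertia per axis:", inertia.round(4))
```

Plotting the first two row and column coordinates together gives the kind of map used to read off which reports and themes co-occur.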
Deep Graph Laplacian Regularization for Robust Denoising of Real Images
Title | Deep Graph Laplacian Regularization for Robust Denoising of Real Images |
Authors | Jin Zeng, Jiahao Pang, Wenxiu Sun, Gene Cheung |
Abstract | Recent developments in deep learning have revolutionized the paradigm of image restoration. However, its application to real image denoising is still limited, owing to its sensitivity to training data and the complex nature of real image noise. In this work, we combine the robustness of model-based approaches with the learning power of data-driven approaches for real image denoising. Specifically, by integrating graph Laplacian regularization as a trainable module into a deep learning framework, we are less susceptible to overfitting than pure CNN-based approaches, achieving higher robustness to small datasets and cross-domain denoising. First, a sparse neighborhood graph is built from the output of a convolutional neural network (CNN). Then the image is restored by solving an unconstrained quadratic programming problem, using the corresponding graph Laplacian regularizer as a prior term. The proposed restoration pipeline is fully differentiable and hence can be trained end to end. Experimental results demonstrate that our work is less prone to overfitting given small training sets. It is also endowed with strong cross-domain generalization power, outperforming state-of-the-art approaches by a remarkable margin. |
Tasks | Denoising, Domain Generalization, Image Denoising, Image Restoration |
Published | 2018-07-31 |
URL | https://arxiv.org/abs/1807.11637v3 |
https://arxiv.org/pdf/1807.11637v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-graph-laplacian-regularization-for |
Repo | |
Framework | |
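The restoration step in isolation: the unconstrained quadratic program min_x ||x - y||^2 + mu x^T L x has the closed form (I + mu L) x = y. The sketch below uses a plain 4-neighbour pixel grid for the graph; in the paper the graph is built from CNN features and the whole pipeline is trained end to end, neither of which is reproduced here.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import spsolve

h = w = 32
n = h * w
rng = np.random.default_rng(0)
clean = np.kron(rng.random((4, 4)), np.ones((8, 8))).ravel()  # blocky image
y = clean + 0.2 * rng.normal(size=n)                          # noisy version

# adjacency of the 4-neighbour pixel grid; L = D - A
A = sp.lil_matrix((n, n))
for i in range(h):
    for j in range(w):
        k = i * w + j
        if j + 1 < w:
            A[k, k + 1] = A[k + 1, k] = 1
        if i + 1 < h:
            A[k, k + w] = A[k + w, k] = 1
L = laplacian(A.tocsr())

mu = 2.0
x = spsolve((sp.eye(n) + mu * L).tocsc(), y)   # closed-form restoration
print("noisy MSE:   ", np.mean((y - clean) ** 2))
print("denoised MSE:", np.mean((x - clean) ** 2))
```

Because the solve is a linear operation in y (and differentiable in mu and the edge weights), it can sit inside a network and be trained by backpropagation, which is what makes the pipeline end-to-end trainable.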
A survey of automatic de-identification of longitudinal clinical narratives
Title | A survey of automatic de-identification of longitudinal clinical narratives |
Authors | Vithya Yogarajan, Michael Mayo, Bernhard Pfahringer |
Abstract | The use of medical data, also known as electronic health records, in research helps develop and advance medical science. However, protecting patient confidentiality and identity while using medical data for analysis is crucial. Medical data can take the form of tabular structures (i.e. tables), free-form narratives, and images. This study focuses on medical data in the form of free, longitudinal text. De-identification of electronic health records provides the opportunity to use such data for research without affecting patient privacy, and avoids the need for individual patient consent. In recent years there has been increasing interest in developing an accurate, robust and adaptable automatic de-identification system for electronic health records. This is mainly due to the dilemma between the availability of an abundance of health data and the inability to use such data in research because of legal and ethical restrictions. De-identification tracks in competitions such as the 2014 i2b2 UTHealth and the 2016 CEGS N-GRID shared tasks have provided a great platform to advance this area. The primary reasons for this include the open-source nature of the datasets and the fact that raw psychiatric data were used in the 2016 competition. This study focuses on noticeable trend changes in the techniques used to develop automatic de-identification for longitudinal clinical narratives: more specifically, the shift from systems based only on conditional random fields (CRF) or only on rules (regular expressions, dictionaries, or combinations thereof), to hybrid models combining CRF and rules, and more recently to deep-learning-based systems. We review the literature and results that arose from the 2014 and 2016 competitions and discuss the outcomes of these systems. We also provide a list of research questions that emerged from this survey. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06765v1 |
http://arxiv.org/pdf/1810.06765v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-of-automatic-de-identification-of |
Repo | |
Framework | |
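To illustrate the rule-based end of the spectrum the survey discusses, here is a toy regular-expression de-identifier. Real systems combine many more patterns with dictionaries and CRF or deep-learning taggers; the patterns and tags below are invented examples.

```python
import re

RULES = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bMRN[:\s]*\d+\b"), "[ID]"),
    (re.compile(r"\bDr\.\s+[A-Z][a-z]+\b"), "[DOCTOR]"),
]

def deidentify(note):
    """Replace every match of each rule with its surrogate tag."""
    for pattern, tag in RULES:
        note = pattern.sub(tag, note)
    return note

note = "Seen by Dr. Smith on 03/14/2015, MRN: 445821, call 555-123-4567."
print(deidentify(note))
# -> Seen by [DOCTOR] on [DATE], [ID], call [PHONE].
```

The brittleness of such hand-written patterns on varied longitudinal narratives is precisely what motivated the shift to hybrid and deep-learning systems the survey traces.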
Band gap prediction for large organic crystal structures with machine learning
Title | Band gap prediction for large organic crystal structures with machine learning |
Authors | Bart Olsthoorn, R. Matthias Geilhufe, Stanislav S. Borysov, Alexander V. Balatsky |
Abstract | Machine-learning models are capable of capturing the structure-property relationship from a dataset of computationally demanding ab initio calculations. Over the past two years, the Organic Materials Database (OMDB) has hosted a growing number of calculated electronic properties of previously synthesized organic crystal structures. The complexity of the organic crystals contained within the OMDB, which have on average 82 atoms per unit cell, makes this database a challenging platform for machine learning applications. In this paper, the focus is on predicting the band gap, which represents one of the basic properties of a crystalline material. With this aim, a consistent dataset of 12,500 crystal structures and their corresponding DFT band gaps is released, freely available for download at https://omdb.mathub.io/dataset. An ensemble of two state-of-the-art models reaches a mean absolute error (MAE) of 0.388 eV, which corresponds to a percentage error of 13% for an average band gap of 3.05 eV. Finally, the trained models are employed to predict the band gap for 260,092 materials contained within the Crystallography Open Database (COD) and made available online so that predictions can be obtained for any arbitrary crystal structure uploaded by a user. |
Tasks | Band Gap |
Published | 2018-10-30 |
URL | https://arxiv.org/abs/1810.12814v4 |
https://arxiv.org/pdf/1810.12814v4.pdf | |
PWC | https://paperswithcode.com/paper/band-gap-prediction-for-large-organic-crystal |
Repo | |
Framework | |
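Schematically, the evaluation setup is: average two regressors and report the MAE of the predicted band gaps. The sketch below shows that skeleton only; the featurization is a random placeholder (real OMDB models use proper structural descriptors), and the model choices are ours, not the paper's two state-of-the-art models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))                 # placeholder descriptors
y = np.abs(X[:, :3].sum(1)) + 0.1 * rng.normal(size=500)   # fake "band gap"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
m2 = KernelRidge(alpha=1.0, kernel="rbf").fit(X_tr, y_tr)
pred = (m1.predict(X_te) + m2.predict(X_te)) / 2   # two-model ensemble
print(f"ensemble MAE: {mean_absolute_error(y_te, pred):.3f} (toy units)")
```

Swapping the placeholder features for descriptors computed from the released OMDB structures reproduces the shape of the paper's experiment, if not its numbers.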
Building Corpora for Single-Channel Speech Separation Across Multiple Domains
Title | Building Corpora for Single-Channel Speech Separation Across Multiple Domains |
Authors | Matthew Maciejewski, Gregory Sell, Leibny Paola Garcia-Perera, Shinji Watanabe, Sanjeev Khudanpur |
Abstract | To date, the bulk of research on single-channel speech separation has been conducted using clean, near-field, read speech, which is not representative of many modern applications. In this work, we develop a procedure for constructing high-quality synthetic overlap datasets, necessary for most deep learning-based separation frameworks. We produced datasets that are more representative of realistic applications using the CHiME-5 and Mixer 6 corpora and evaluate standard methods on this data to demonstrate the shortcomings of current source-separation performance. We also demonstrate the value of a wide variety of data in training robust models that generalize well to multiple conditions. |
Tasks | Speech Separation |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02641v1 |
http://arxiv.org/pdf/1811.02641v1.pdf | |
PWC | https://paperswithcode.com/paper/building-corpora-for-single-channel-speech |
Repo | |
Framework | |
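The basic operation behind any synthetic overlap dataset is mixing two single-speaker segments at a chosen signal-to-interference ratio. A minimal sketch is below, with sinusoids standing in for speech clips; real pipelines such as the paper's add segment selection, alignment, and the reverberant far-field conditions of CHiME-5 and Mixer 6 on top of this.

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Scale `interferer` so 10*log10(P_target / P_interferer) = snr_db,
    then add it to `target` (both truncated to the shorter length)."""
    n = min(len(target), len(interferer))
    t, i = target[:n], interferer[:n]
    p_t, p_i = np.mean(t ** 2), np.mean(i ** 2)
    scale = np.sqrt(p_t / (p_i * 10 ** (snr_db / 10)))
    return t + scale * i

sr = 16000
tt = np.arange(sr) / sr
spk1 = np.sin(2 * np.pi * 220 * tt)            # stand-ins for speech clips
spk2 = np.sin(2 * np.pi * 330 * tt)
mixture = mix_at_snr(spk1, spk2, snr_db=5.0)
print(mixture.shape)
```

Keeping the unscaled sources alongside each mixture is what gives separation models their training targets.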
Gradient Descent Finds Global Minima of Deep Neural Networks
Title | Gradient Descent Finds Global Minima of Deep Neural Networks |
Authors | Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai |
Abstract | Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result. |
Tasks | |
Published | 2018-11-09 |
URL | https://arxiv.org/abs/1811.03804v4 |
https://arxiv.org/pdf/1811.03804v4.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-finds-global-minima-of-deep |
Repo | |
Framework | |
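A numerical illustration of the claim (not of the proof, and for a two-layer ReLU network rather than the deep ResNet setting the paper analyzes): with enough over-parameterization, plain gradient descent drives the training loss to near zero even on random labels. The width, step size, and the fixed random output layer are standard choices in this line of analysis; the specific values are arbitrary demo settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                     # n samples, width m >> n
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.choice([-1.0, 1.0], size=n)       # random labels

W = rng.normal(size=(m, d))               # trained first layer
a = rng.choice([-1.0, 1.0], size=m)       # fixed random output layer

def forward(W):
    return np.maximum(X @ W.T, 0) @ a / np.sqrt(m)   # two-layer ReLU net

lr = 0.1
for step in range(5001):
    err = forward(W) - y                             # residuals
    gates = (X @ W.T > 0).astype(float)              # ReLU activation pattern
    W -= lr * ((err[:, None] * gates) * a).T @ X / np.sqrt(m)
    if step % 1000 == 0:
        print(step, "loss:", 0.5 * np.sum(err ** 2))
```

In the over-parameterized regime the activation pattern barely changes during training, which is the finite-width analogue of the Gram-matrix stability the paper's proof relies on.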
Probabilistic Model of Object Detection Based on Convolutional Neural Network
Title | Probabilistic Model of Object Detection Based on Convolutional Neural Network |
Authors | Fang-Qi Li, Xu-Die Ren, Hao-Nan Guo |
Abstract | The combination of a CNN detector and a search framework forms the basis for local object/pattern detection. To address the waste of regional information and the unsatisfactory trade-off between efficiency and accuracy, this paper proposes a probabilistic model with a powerful search framework. By mapping an image into a probabilistic distribution of objects, this new model gives more informative outputs with less computation. The setting and analytic traits are elaborated in this paper, followed by a series of experiments carried out on FDDB, which show that the proposed model is sound, efficient and analytic. |
Tasks | Object Detection |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.08272v1 |
http://arxiv.org/pdf/1808.08272v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-model-of-object-detection-based |
Repo | |
Framework | |
Evaluating Conditional Cash Transfer Policies with Machine Learning Methods
Title | Evaluating Conditional Cash Transfer Policies with Machine Learning Methods |
Authors | Tzai-Shuen Chen |
Abstract | This paper presents an out-of-sample prediction comparison between major machine learning models and the structural econometric model. Over the past decade, machine learning has established itself as a powerful tool in many prediction applications, but this approach is still not widely adopted in empirical economic studies. To evaluate the benefits of this approach, I use the most common machine learning algorithms, CART, C4.5, LASSO, random forest, and AdaBoost, to construct prediction models for a cash transfer experiment conducted by the Progresa program in Mexico, and I compare the prediction results with those of a previous structural econometric study. Two prediction tasks are performed in this paper: the out-of-sample forecast and the long-term within-sample simulation. For the out-of-sample forecast, both the mean absolute error and the root mean square error of the school attendance rates found by all machine learning models are smaller than those found by the structural model. Random forest and AdaBoost have the highest accuracy for the individual outcomes of all subgroups. For the long-term within-sample simulation, the structural model performs better than all of the machine learning models. The poor within-sample fit of the machine learning models results from the inaccuracy of the income and pregnancy prediction models. The result shows that the machine learning models perform better than the structural model when there are ample data to learn from; however, when the data are limited, the structural model offers a more sensible prediction. The findings of this paper show promise for adopting machine learning in economic policy analyses in the era of big data. |
Tasks | |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06401v1 |
http://arxiv.org/pdf/1803.06401v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluating-conditional-cash-transfer-policies |
Repo | |
Framework | |
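A schematic of the out-of-sample comparison: fit the listed models (C4.5 aside, which has no scikit-learn implementation) and report MAE and RMSE on held-out data. The Progresa data are not bundled here, so a random placeholder dataset with a binary attendance-style outcome stands in.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))                       # placeholder covariates
y = (X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=1000) > 0).astype(float)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "CART": DecisionTreeRegressor(max_depth=5, random_state=0),
    "LASSO": Lasso(alpha=0.01),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mae = mean_absolute_error(y_te, pred)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name:14s} MAE={mae:.3f} RMSE={rmse:.3f}")
```

The within-sample simulation task, where the structural model wins, has no analogue in this loop: it requires the economic model's behavioral equations, not just a predictor.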
The Graph-based Broad Behavior-Aware Recommendation System for Interactive News
Title | The Graph-based Broad Behavior-Aware Recommendation System for Interactive News |
Authors | Mingyuan Ma, Sen Na, Cong Xu, Xin Fan |
Abstract | In this paper, we propose a heuristic recommendation system for interactive news, called the graph-based broad behavior-aware network (G-BBAN). Different from most existing work, our network considers six behaviors that may potentially be conducted by users: unclick, click, like, follow, comment, and share. Further, we introduce the core and coritivity concepts from graph theory into the system to measure the concentration degree of each user's interests, which we show can help to improve performance even further when it is considered. There are three critical steps in our recommendation system. First, as a preprocessing step, we build a structured, user-dependent interaction behavior graph for multi-level and multi-category data. This graph provides the data sources and knowledge that are used in G-BBAN through representation learning. Second, for each user node on the graph, we calculate its core and coritivity and add the pair as a new feature associated with this user. By the definition of core and coritivity, this user-dependent feature provides useful insights into the concentration degree of his/her interests and affects the trade-off between accuracy and diversity of the personalized recommendation. Last, we represent item (news) information by entity semantics and environment semantics; design a multi-channel convolutional neural network called G-CNN to learn the semantic information and an attention-based LSTM to learn the user's behavior representation; and combine these with the previous concentration feature as input to two further fully connected layers that finish the classification task. Together, these components constitute G-BBAN. Compared with baselines and several variants of itself, our proposed method shows superior performance in extensive experiments. |
Tasks | Representation Learning |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00002v1 |
http://arxiv.org/pdf/1812.00002v1.pdf | |
PWC | https://paperswithcode.com/paper/the-graph-based-broad-behavior-aware |
Repo | |
Framework | |
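For readers unfamiliar with the core/coritivity notion the paper imports, here is a brute-force sketch under the definition common in the graph-theory literature (assumed here, as the abstract does not spell it out): the coritivity of G is max over vertex sets S of omega(G - S) - |S|, where omega counts connected components, and a core is a set S attaining that maximum. The enumeration is exponential, so this is for toy graphs only.

```python
from itertools import combinations
import networkx as nx

def coritivity(G):
    """Brute force: max over non-empty vertex sets S of omega(G-S) - |S|."""
    best_val, best_set = float("-inf"), None
    nodes = list(G)
    for k in range(1, len(nodes)):
        for S in combinations(nodes, k):
            rest = G.subgraph([v for v in nodes if v not in S])
            val = nx.number_connected_components(rest) - k
            if val > best_val:
                best_val, best_set = val, set(S)
    return best_val, best_set

G = nx.star_graph(5)          # hub node 0 with five leaves
print(coritivity(G))          # removing the hub gives 5 components - 1 = 4
```

Intuitively, a user node whose removal shatters its interaction graph into many pieces sits at the center of scattered interests, which is the "concentration degree" signal G-BBAN feeds into its classifier.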
Deep Multi-Spectral Registration Using Invariant Descriptor Learning
Title | Deep Multi-Spectral Registration Using Invariant Descriptor Learning |
Authors | Nati Ofir, Shai Silberstein, Hila Levi, Dani Rozenbaum, Yosi Keller, Sharon Duvdevani Bar |
Abstract | In this paper, we introduce a novel deep-learning method to align cross-spectral images. Our approach relies on a learned descriptor that is invariant to different spectra. Multi-modal images of the same scene capture different signals, so their registration is challenging and is not solved by classic approaches. To that end, we developed a feature-based approach that solves the visible (VIS) to near-infrared (NIR) registration problem. Our algorithm detects corners with the Harris detector and matches them with a patch metric learned on top of a CIFAR-10 network descriptor. As our experiments demonstrate, we achieve high-quality alignment of cross-spectral images with sub-pixel accuracy. Compared to other existing methods, our approach is more accurate in the task of VIS-to-NIR registration. |
Tasks | |
Published | 2018-01-16 |
URL | http://arxiv.org/abs/1801.05171v6 |
http://arxiv.org/pdf/1801.05171v6.pdf | |
PWC | https://paperswithcode.com/paper/deep-multi-spectral-registration-using |
Repo | |
Framework | |
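The matching skeleton that the paper's learned metric plugs into looks roughly like the sketch below: Harris corners, patches around them, nearest-neighbour matching. Here a raw grayscale patch stands in for the learned CIFAR-10-network descriptor, so this version only works within a single spectrum; the filenames are hypothetical.

```python
import cv2
import numpy as np

def corner_patches(img, patch=16, max_corners=200):
    """Return the strongest Harris corners and flattened patches around them
    (raw patches as a stand-in for the paper's learned descriptor)."""
    resp = cv2.cornerHarris(np.float32(img), blockSize=2, ksize=3, k=0.04)
    ys, xs = np.unravel_index(np.argsort(resp.ravel())[::-1], resp.shape)
    pts, descs, half = [], [], patch // 2
    for y, x in zip(ys, xs):
        if len(pts) == max_corners:
            break
        if half <= y < img.shape[0] - half and half <= x < img.shape[1] - half:
            pts.append((x, y))
            p = img[y - half:y + half, x - half:x + half]
            descs.append(p.ravel().astype(np.float32))
    return pts, np.array(descs)

img1 = cv2.imread("vis.png", cv2.IMREAD_GRAYSCALE)   # hypothetical filenames
img2 = cv2.imread("nir.png", cv2.IMREAD_GRAYSCALE)
pts1, d1 = corner_patches(img1)
pts2, d2 = corner_patches(img2)
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
pairs = [(pts1[m.queryIdx], pts2[m.trainIdx]) for m in matches]
print(len(pairs), "tentative correspondences")
```

Replacing the raw-patch descriptor with one trained to be spectrum-invariant is exactly the substitution that turns this single-spectrum skeleton into a VIS-to-NIR matcher.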