Paper Group ANR 947
Joint Embedding Learning and Low-Rank Approximation: A Framework for Incomplete Multi-view Learning. Stochastic Variance-Reduced Cubic Regularized Newton Method. On the difficulty of a distributional semantics of spoken language. Deep Collective Matrix Factorization for Augmented Multi-View Learning. Learning to Bound the Multi-class Bayes Error. S …
Joint Embedding Learning and Low-Rank Approximation: A Framework for Incomplete Multi-view Learning
Title | Joint Embedding Learning and Low-Rank Approximation: A Framework for Incomplete Multi-view Learning |
Authors | Hong Tao, Chenping Hou, Dongyun Yi, Jubo Zhu, Dewen Hu |
Abstract | In real-world applications, not all instances in multi-view data are fully represented. Incomplete Multi-view Learning (IML) has emerged to deal with such incomplete data. In this paper, we propose the Joint Embedding Learning and Low-Rank Approximation (JELLA) framework for IML. The JELLA framework approximates the incomplete data by a set of low-rank matrices and learns a full and common embedding by linear transformation. Several existing IML methods can be unified as special cases of the framework. More interestingly, some linear transformation based complete multi-view methods can be adapted to IML directly with the guidance of the framework. Thus, the JELLA framework improves the efficiency of processing incomplete multi-view data, and bridges the gap between complete multi-view learning and IML. Moreover, the JELLA framework can provide guidance for developing new algorithms. For illustration, within the framework, we propose the Incomplete Multi-view Learning with Block Diagonal Representation (IML-BDR) method. Assuming that the sampled examples have approximate linear subspace structure, IML-BDR uses the block diagonal structure prior to learn the full embedding, which would lead to more correct clustering. A convergent alternating iterative algorithm with the Successive Over-Relaxation optimization technique is devised for optimization. Experimental results on various datasets demonstrate the effectiveness of IML-BDR. |
Tasks | MULTI-VIEW LEARNING |
Published | 2018-12-25 |
URL | https://arxiv.org/abs/1812.10012v2 |
https://arxiv.org/pdf/1812.10012v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-embedding-learning-and-low-rank |
Repo | |
Framework | |
Stochastic Variance-Reduced Cubic Regularized Newton Method
Title | Stochastic Variance-Reduced Cubic Regularized Newton Method |
Authors | Dongruo Zhou, Pan Xu, Quanquan Gu |
Abstract | We propose a stochastic variance-reduced cubic regularized Newton method for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularization method. We show that our algorithm is guaranteed to converge to an $(\epsilon,\sqrt{\epsilon})$-approximately local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms including subsampled cubic regularization. Our work also sheds light on the application of variance reduction technique to high-order non-convex optimization methods. Thorough experiments on various non-convex optimization problems support our theory. |
Tasks | |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04796v1 |
http://arxiv.org/pdf/1802.04796v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-variance-reduced-cubic-regularized |
Repo | |
Framework | |
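The variance-reduction idea behind a semi-stochastic gradient can be sketched on a toy problem. The snippet below is a minimal illustration, assuming a plain SVRG-style control variate; it is not necessarily the estimator used in the paper, and the objective, the constants `A` and `B`, and all function names are hypothetical. It shows that the estimate is unbiased: averaged over all components, it equals the full gradient.

```python
import math

# Hypothetical 1-D finite-sum objective f(x) = (1/n) * sum_i f_i(x) with
# non-convex components f_i(x) = a_i * x**2 + cos(b_i * x).
A = [0.5, -0.2, 1.0, 0.3]
B = [1.0, 2.0, 0.5, 1.5]
N = len(A)

def grad_i(i, x):
    """Gradient of the i-th component f_i at x."""
    return 2.0 * A[i] * x - B[i] * math.sin(B[i] * x)

def full_grad(x):
    """Exact gradient of f at x (one pass over all components)."""
    return sum(grad_i(i, x) for i in range(N)) / N

def semi_stochastic_grad(x, snapshot, snapshot_grad, batch):
    """SVRG-style estimate (an assumption, not the paper's exact form):
    minibatch gradient difference plus the stored snapshot gradient."""
    diff = sum(grad_i(i, x) - grad_i(i, snapshot) for i in batch) / len(batch)
    return diff + snapshot_grad

snapshot = 0.3
g_snap = full_grad(snapshot)   # computed once per epoch
x = 0.5

# One estimate per single-component batch; their mean recovers full_grad(x).
estimates = [semi_stochastic_grad(x, snapshot, g_snap, [i]) for i in range(N)]
mean_estimate = sum(estimates) / N
```

The full gradient is evaluated only at the snapshot, so each later step touches just the sampled components while keeping the estimate centred on the true gradient.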
On the difficulty of a distributional semantics of spoken language
Title | On the difficulty of a distributional semantics of spoken language |
Authors | Grzegorz Chrupała, Lieke Gelderloos, Ákos Kádár, Afra Alishahi |
Abstract | In the domain of unsupervised learning most work on speech has focused on discovering low-level constructs such as phoneme inventories or word-like units. In contrast, for written language there is a large body of work on unsupervised induction of semantic representations of words, whole sentences and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of the semantics of spoken language becomes feasible if we abstract from the surface variability. We simulate this setting with a dataset of utterances spoken by a realistic but uniform synthetic voice. We evaluate two simple unsupervised models which, to varying degrees of success, learn semantic representations of speech fragments. Finally we present inconclusive results on human speech, and discuss the challenges inherent in learning distributional semantic representations on unrestricted natural spoken language. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08869v2 |
http://arxiv.org/pdf/1803.08869v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-difficulty-of-a-distributional |
Repo | |
Framework | |
Deep Collective Matrix Factorization for Augmented Multi-View Learning
Title | Deep Collective Matrix Factorization for Augmented Multi-View Learning |
Authors | Ragunathan Mariappan, Vaibhav Rajan |
Abstract | Learning by integrating multiple heterogeneous data sources is a common requirement in many tasks. Collective Matrix Factorization (CMF) is a technique to learn shared latent representations from arbitrary collections of matrices. It can be used to simultaneously complete one or more matrices, for predicting the unknown entries. Classical CMF methods assume linearity in the interaction of latent factors which can be restrictive and fails to capture complex non-linear interactions. In this paper, we develop the first deep-learning based method, called dCMF, for unsupervised learning of multiple shared representations, that can model such non-linear interactions, from an arbitrary collection of matrices. We address optimization challenges that arise due to dependencies between shared representations through Multi-Task Bayesian Optimization and design an acquisition function adapted for collective learning of hyperparameters. Our experiments show that dCMF significantly outperforms previous CMF algorithms in integrating heterogeneous data for predictive modeling. Further, on two tasks - recommendation and prediction of gene-disease association - dCMF outperforms state-of-the-art matrix completion algorithms that can utilize auxiliary sources of information. |
Tasks | Matrix Completion, MULTI-VIEW LEARNING |
Published | 2018-11-28 |
URL | http://arxiv.org/abs/1811.11427v2 |
http://arxiv.org/pdf/1811.11427v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-collective-matrix-factorization-for |
Repo | |
Framework | |
Learning to Bound the Multi-class Bayes Error
Title | Learning to Bound the Multi-class Bayes Error |
Authors | Salimeh Yasaei Sekeh, Brandon Oselio, Alfred O. Hero |
Abstract | In the context of supervised learning, meta learning uses features, metadata and other information to learn about the difficulty, behavior, or composition of the problem. Using this knowledge can be useful to contextualize classifier results or allow for targeted decisions about future data sampling. In this paper, we are specifically interested in learning the Bayes error rate (BER) based on a labeled data sample. Providing a tight bound on the BER that is also feasible to estimate has been a challenge. Previous work[1] has shown that a pairwise bound based on the sum of Henze-Penrose (HP) divergence over label pairs can be directly estimated using a sum of Friedman-Rafsky (FR) multivariate run test statistics. However, in situations in which the dataset and number of classes are large, this bound is computationally infeasible to calculate and may not be tight. Other multi-class bounds also suffer from computationally complex estimation procedures. In this paper, we present a generalized HP divergence measure that allows us to estimate the Bayes error rate with log-linear computation. We prove that the proposed bound is tighter than both the pairwise method and a bound proposed by Lin [2]. We also empirically show that these bounds are close to the BER. We illustrate the proposed method on the MNIST dataset, and show its utility for the evaluation of feature reduction strategies. We further demonstrate an approach for evaluation of deep learning architectures using the proposed bounds. |
Tasks | Meta-Learning |
Published | 2018-11-15 |
URL | https://arxiv.org/abs/1811.06419v2 |
https://arxiv.org/pdf/1811.06419v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-bound-the-multi-class-bayes-error |
Repo | |
Framework | |
Survey and cross-benchmark comparison of remaining time prediction methods in business process monitoring
Title | Survey and cross-benchmark comparison of remaining time prediction methods in business process monitoring |
Authors | Ilya Verenich, Marlon Dumas, Marcello La Rosa, Fabrizio Maggi, Irene Teinemaa |
Abstract | Predictive business process monitoring methods exploit historical process execution logs to generate predictions about running instances (called cases) of a business process, such as the prediction of the outcome, next activity or remaining cycle time of a given process case. These insights could be used to support operational managers in taking remedial actions as business processes unfold, e.g. shifting resources from one case onto another to ensure this latter is completed on time. A number of methods to tackle the remaining cycle time prediction problem have been proposed in the literature. However, due to differences in their experimental setup, choice of datasets, evaluation measures and baselines, the relative merits of each method remain unclear. This article presents a systematic literature review and taxonomy of methods for remaining time prediction in the context of business processes, as well as a cross-benchmark comparison of 16 such methods based on 16 real-life datasets originating from different industry domains. |
Tasks | |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02896v2 |
http://arxiv.org/pdf/1805.02896v2.pdf | |
PWC | https://paperswithcode.com/paper/survey-and-cross-benchmark-comparison-of |
Repo | |
Framework | |
Aggregating Predictions on Multiple Non-disclosed Datasets using Conformal Prediction
Title | Aggregating Predictions on Multiple Non-disclosed Datasets using Conformal Prediction |
Authors | Ola Spjuth, Lars Carlsson, Niharika Gauraha |
Abstract | Conformal Prediction is a machine learning methodology that produces valid prediction regions under mild conditions. In this paper, we explore the application of making predictions over multiple data sources of different sizes without disclosing data between the sources. We propose that each data source applies a transductive conformal predictor independently using the local data, and that the individual predictions are then aggregated to form a combined prediction region. We demonstrate the method on several data sets, and show that the proposed method produces conservatively valid predictions and reduces the variance in the aggregated predictions. We also study the effect that the number of data sources and size of each source has on aggregated predictions, as compared with equally sized sources and pooled data. |
Tasks | |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04000v2 |
http://arxiv.org/pdf/1806.04000v2.pdf | |
PWC | https://paperswithcode.com/paper/aggregating-predictions-on-multiple-non |
Repo | |
Framework | |
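The scheme described in the abstract (one transductive conformal predictor per data source, followed by aggregation) can be sketched as below. This is a minimal illustration under stated assumptions: the nonconformity score (distance to the class mean), the median aggregation rule, the toy 1-D data, and all names are hypothetical choices, since the abstract does not specify them. No raw data leaves a source; only p-values are shared.

```python
from statistics import mean, median

def tcp_p_value(data, x_new, y_candidate):
    """Transductive conformal p-value of y_candidate for x_new on one source.

    Nonconformity (an assumed choice): distance to the mean of the
    examples sharing the same label, computed on the augmented bag.
    """
    bag = data + [(x_new, y_candidate)]
    centers = {y: mean(x for x, lab in bag if lab == y) for _, y in bag}
    scores = [abs(x - centers[y]) for x, y in bag]
    # Fraction of examples at least as nonconforming as the test example.
    return sum(s >= scores[-1] for s in scores) / len(bag)

def aggregate_region(sources, x_new, labels, significance):
    """Combine per-source p-values by their median (an assumed rule) and
    keep every label whose combined p-value exceeds the significance level."""
    region = set()
    for y in labels:
        p = median(tcp_p_value(d, x_new, y) for d in sources)
        if p > significance:
            region.add(y)
    return region

# Three hypothetical sources of different sizes: (feature, label) pairs.
source_a = [(0.1, 0), (0.2, 0), (0.3, 0), (1.9, 1), (2.0, 1), (2.1, 1)]
source_b = [(0.0, 0), (0.4, 0), (1.8, 1), (2.2, 1)]
source_c = [(0.2, 0), (0.25, 0), (0.3, 0), (0.35, 0), (2.0, 1), (2.05, 1), (1.95, 1)]

region = aggregate_region([source_a, source_b, source_c], 0.22, [0, 1], 0.2)
```

With this toy data the aggregated region for the test point 0.22 contains only label 0. Whether a given aggregation rule preserves validity is a separate question that this sketch does not address.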
A Unified View of Causal and Non-causal Feature Selection
Title | A Unified View of Causal and Non-causal Feature Selection |
Authors | Kui Yu, Lin Liu, Jiuyong Li |
Abstract | In this paper, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view will fill in the gap in the research of the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective. That is to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify the assumptions by mapping them to the restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximations employed by the methods in their search, which then result in the approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we are able to interpret the output of non-causal methods from a causal perspective and derive the error bounds of both types of methods. Finally, we present practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data. |
Tasks | Feature Selection |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05844v4 |
http://arxiv.org/pdf/1802.05844v4.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-view-of-causal-and-non-causal |
Repo | |
Framework | |
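The Markov blanket mentioned in the abstract has a simple graphical definition when the Bayesian network structure is known: the parents, the children, and the other parents of those children (spouses) of the target node. The sketch below reads it off a hand-made DAG; the graph and the function name are hypothetical, and real feature selection methods must estimate this set from data rather than from a known structure.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set_of_parents}."""
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]} - {target}
    return set(parents.get(target, set())) | children | spouses

# Hypothetical network: A -> C <- B, C -> D <- E, D -> F.
dag = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C", "E"},
    "E": set(),
    "F": {"D"},
}

blanket = markov_blanket(dag, "C")  # parents A, B; child D; spouse E
```

Node F is left out: given D, it carries no further information about C in this structure.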
Predicting thermoelectric properties from crystal graphs and material descriptors - first application for functional materials
Title | Predicting thermoelectric properties from crystal graphs and material descriptors - first application for functional materials |
Authors | Leo Laugier, Daniil Bash, Jose Recatala, Hong Kuan Ng, Savitha Ramasamy, Chuan-Sheng Foo, Vijay R Chandrasekhar, Kedar Hippalgaonkar |
Abstract | We introduce the use of Crystal Graph Convolutional Neural Networks (CGCNN), Fully Connected Neural Networks (FCNN) and XGBoost to predict thermoelectric properties. The dataset for the CGCNN is independent of Density Functional Theory (DFT) and only relies on the crystal and atomic information, while that for the FCNN is based on a rich attribute list mined from Materialsproject.org. The results show that the optimized FCNN is three layers deep and is able to predict the scattering-time independent thermoelectric power factor much better than the CGCNN (or XGBoost), suggesting that bonding and density of states descriptors informed from materials science knowledge obtained partially from DFT are vital to predict functional properties. |
Tasks | |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06219v1 |
http://arxiv.org/pdf/1811.06219v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-thermoelectric-properties-from |
Repo | |
Framework | |
SDN Flow Entry Management Using Reinforcement Learning
Title | SDN Flow Entry Management Using Reinforcement Learning |
Authors | Ting-Yu Mu, Ala Al-Fuqaha, Khaled Shuaib, Farag M. Sallabi, Junaid Qadir |
Abstract | Modern information technology services largely depend on cloud infrastructures to provide their services. These cloud infrastructures are built on top of datacenter networks (DCNs) constructed with high-speed links, fast switching gear, and redundancy to offer better flexibility and resiliency. In this environment, network traffic includes long-lived (elephant) and short-lived (mice) flows with partitioned and aggregated traffic patterns. Although SDN-based approaches can efficiently allocate networking resources for such flows, the overhead due to network reconfiguration can be significant. With limited capacity of Ternary Content-Addressable Memory (TCAM) deployed in an OpenFlow-enabled switch, it is crucial to determine which forwarding rules should remain in the flow table, and which rules should be processed by the SDN controller in case of a table-miss on the SDN switch. This is needed in order to obtain the flow entries that satisfy the goal of reducing the long-term control plane overhead introduced between the controller and the switches. To achieve this goal, we propose a machine learning technique that utilizes two variations of reinforcement learning (RL) algorithms: the first is based on a traditional reinforcement learning algorithm, while the other is based on deep reinforcement learning. Emulation results using the RL algorithm show around 60% improvement in reducing the long-term control plane overhead, and around 14% improvement in the table-hit ratio compared to the Multiple Bloom Filters (MBF) method given a fixed size flow table of 4KB. |
Tasks | |
Published | 2018-09-24 |
URL | http://arxiv.org/abs/1809.09003v1 |
http://arxiv.org/pdf/1809.09003v1.pdf | |
PWC | https://paperswithcode.com/paper/sdn-flow-entry-management-using-reinforcement |
Repo | |
Framework | |
Spectral Filtering for General Linear Dynamical Systems
Title | Spectral Filtering for General Linear Dynamical Systems |
Authors | Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang |
Abstract | We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system’s transition matrix. The algorithm extends the recently introduced technique of spectral filtering, previously applied only to systems with a symmetric transition matrix, using a novel convex relaxation to allow for the efficient identification of phases. |
Tasks | |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.03981v1 |
http://arxiv.org/pdf/1802.03981v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-filtering-for-general-linear |
Repo | |
Framework | |
A New Concept of Deep Reinforcement Learning based Augmented General Sequence Tagging System
Title | A New Concept of Deep Reinforcement Learning based Augmented General Sequence Tagging System |
Authors | Yu Wang, Abhishek Patel, Hongxia Jin |
Abstract | In this paper, a new deep reinforcement learning based augmented general sequence tagging system is proposed. The new system contains two parts: a deep neural network (DNN) based sequence tagging model and a deep reinforcement learning (DRL) based augmented tagger. The augmented tagger helps improve system performance by modeling the data with minority tags. The new system is evaluated on SLU and NLU sequence tagging tasks using ATIS and CoNLL-2003 benchmark datasets, to demonstrate the new system’s outstanding performance on general tagging tasks. Evaluated by F1 scores, it shows that the new system outperforms the current state-of-the-art model on ATIS dataset by 1.9% and that on CoNLL-2003 dataset by 1.4%. |
Tasks | |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10234v1 |
http://arxiv.org/pdf/1812.10234v1.pdf | |
PWC | https://paperswithcode.com/paper/a-new-concept-of-deep-reinforcement-learning |
Repo | |
Framework | |
Brain Tumor Image Retrieval via Multitask Learning
Title | Brain Tumor Image Retrieval via Multitask Learning |
Authors | Maxim Pisov, Gleb Makarchuk, Valery Kostjuchenko, Alexandra Dalechina, Andrey Golanov, Mikhail Belyaev |
Abstract | Classification-based image retrieval systems are built by training convolutional neural networks (CNNs) on a relevant classification problem and using the distance in the resulting feature space as a similarity metric. However, in practical applications, it is often desirable to have representations which take into account several aspects of the data (e.g., brain tumor type and its localization). In our work, we extend the classification-based approach with multitask learning: we train a CNN on brain MRI scans with heterogeneous labels and implement a corresponding tumor image retrieval system. We validate our approach on brain tumor data which contains information about tumor types, shapes and localization. We show that our method allows us to build representations that contain more relevant information about tumors than single-task classification-based approaches. |
Tasks | Image Retrieval |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09369v1 |
http://arxiv.org/pdf/1810.09369v1.pdf | |
PWC | https://paperswithcode.com/paper/brain-tumor-image-retrieval-via-multitask |
Repo | |
Framework | |
DifNet: Semantic Segmentation by Diffusion Networks
Title | DifNet: Semantic Segmentation by Diffusion Networks |
Authors | Peng Jiang, Fanglin Gu, Yunhai Wang, Changhe Tu, Baoquan Chen |
Abstract | Deep Neural Networks (DNNs) have recently shown state-of-the-art performance on semantic segmentation tasks; however, they still suffer from problems of poor boundary localization and spatially fragmented predictions. The difficulties lie in the requirement of making dense predictions from a long path model all at once, since details are hard to keep when data goes through deeper layers. Instead, in this work, we decompose this difficult task into two relatively simple sub-tasks: seed detection, which is required to predict initial predictions without the need of wholeness and preciseness, and similarity estimation, which measures the possibility of any two nodes belonging to the same class without the need of knowing which class they are. We use one branch network for each sub-task, and apply a cascade of random walks based on hierarchical semantics to approximate a complex diffusion process which propagates seed information to the whole image according to the estimated similarities. The proposed DifNet consistently produces improvements over the baseline models with the same depth and with the equivalent number of parameters, and also achieves promising performance on the Pascal VOC and Pascal Context datasets. Our DifNet is trained end-to-end without complex loss functions. |
Tasks | Semantic Segmentation |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.08015v4 |
http://arxiv.org/pdf/1805.08015v4.pdf | |
PWC | https://paperswithcode.com/paper/difnet-semantic-segmentation-by-diffusion |
Repo | |
Framework | |
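The propagation step described in the abstract, where seed information spreads according to pairwise similarities through random walks, can be sketched on a tiny graph. This is a minimal illustration assuming a fixed, hand-made similarity matrix and a plain repeated update `x <- P x`; in the paper both seeds and similarities are predicted by network branches, and the function names and numbers here are hypothetical.

```python
def row_normalize(similarity):
    """Turn a non-negative similarity matrix into a row-stochastic matrix."""
    return [[v / sum(row) for v in row] for row in similarity]

def diffuse(transition, seeds, steps):
    """Propagate per-node class scores with `steps` random-walk updates."""
    n_classes = len(seeds[0])
    scores = [row[:] for row in seeds]
    for _ in range(steps):
        scores = [
            [sum(p * scores[j][c] for j, p in enumerate(row)) for c in range(n_classes)]
            for row in transition
        ]
    return scores

# Four nodes in a chain: 0-1 and 2-3 are strongly similar, 1-2 only weakly.
similarity = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
# Seeds: node 0 is class 0, node 3 is class 1, the middle nodes are unknown.
seeds = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

transition = row_normalize(similarity)
scores = diffuse(transition, seeds, steps=3)
labels = [max(range(2), key=lambda c: s[c]) for s in scores]
```

The unlabeled nodes take the class of the seed they are most strongly connected to, and the weak 1-2 link acts as a boundary that little score mass crosses.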
Language Style Transfer from Sentences with Arbitrary Unknown Styles
Title | Language Style Transfer from Sentences with Arbitrary Unknown Styles |
Authors | Yanpeng Zhao, Wei Bi, Deng Cai, Xiaojiang Liu, Kewei Tu, Shuming Shi |
Abstract | Language style transfer is the problem of migrating the content of a source sentence to a target style. In many of its applications, parallel training data are not available and source sentences to be transferred may have arbitrary and unknown styles. First, each sentence is encoded into its content and style latent representations. Then, by recombining the content with the target style, we decode a sentence aligned in the target domain. To adequately constrain the encoding and decoding functions, we couple them with two loss functions. The first is a style discrepancy loss, enforcing that the style representation accurately encodes the style information guided by the discrepancy between the sentence style and the target style. The second is a cycle consistency loss, which ensures that the transferred sentence should preserve the content of the original sentence disentangled from its style. We validate the effectiveness of our model in three tasks: sentiment modification of restaurant reviews, dialog response revision with a romantic style, and sentence rewriting with a Shakespearean style. |
Tasks | Style Transfer |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04071v1 |
http://arxiv.org/pdf/1808.04071v1.pdf | |
PWC | https://paperswithcode.com/paper/language-style-transfer-from-sentences-with |
Repo | |
Framework | |