Paper Group ANR 340
Parametric Models for Mutual Kernel Matrix Completion. Automatic Derivation Of Formulas Using Reforcement Learning. Adaptive Denoising of Signals with Shift-Invariant Structure. Deep clustering of longitudinal data. From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information. SJTU-NLP at SemEval-2018 Task 9: Neural Hyper …
Parametric Models for Mutual Kernel Matrix Completion
Title | Parametric Models for Mutual Kernel Matrix Completion |
Authors | Rachelle Rivero, Tsuyoshi Kato |
Abstract | Recent studies utilize multiple kernel learning to deal with the incomplete-data problem. In this study, we introduce new methods that not only complete multiple incomplete kernel matrices simultaneously, but also allow the flexibility of the model to be controlled by parameterizing the model matrix. By imposing restrictions on the model covariance, overfitting of the data is avoided. A limitation of kernel matrix estimation via optimization of an objective function is that the positive definiteness of the result is not guaranteed. In view of this limitation, our proposed methods employ the LogDet divergence, which ensures the positive definiteness of the resulting inferred kernel matrix. We empirically show that our proposed restricted covariance models, employed with the LogDet divergence, yield significant improvements in generalization performance over previous completion methods. |
Tasks | Matrix Completion |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06095v1 |
http://arxiv.org/pdf/1804.06095v1.pdf | |
PWC | https://paperswithcode.com/paper/parametric-models-for-mutual-kernel-matrix |
Repo | |
Framework | |
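The key quantity in the abstract above is the LogDet divergence, which is finite only for positive-definite arguments and therefore keeps the inferred kernel matrix positive definite. Below is a minimal NumPy sketch of that divergence; the toy matrices are illustrative stand-ins, not the paper's kernel and model matrices.

```python
import numpy as np

def logdet_divergence(K, M):
    """LogDet (Stein) divergence D(K, M) = tr(K M^-1) - log det(K M^-1) - n.

    Finite only when both arguments are symmetric positive definite, which is
    what guarantees positive definiteness of the completed kernel matrix.
    """
    n = K.shape[0]
    Minv_K = np.linalg.solve(M, K)              # M^{-1} K without forming the inverse
    sign, logdet = np.linalg.slogdet(Minv_K)
    if sign <= 0:
        raise ValueError("Arguments must be positive definite.")
    return np.trace(Minv_K) - logdet - n

# Toy usage: divergence between two random SPD matrices (zero iff they coincide).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); K = A @ A.T + 5 * np.eye(5)
B = rng.standard_normal((5, 5)); M = B @ B.T + 5 * np.eye(5)
print(logdet_divergence(K, M))
```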
Automatic Derivation Of Formulas Using Reforcement Learning
Title | Automatic Derivation Of Formulas Using Reforcement Learning |
Authors | MinZhong Luo, Li Liu |
Abstract | This paper presents an artificial-intelligence algorithm, the automatic derivation machine, that can be used to derive formulas across various scientific disciplines. First, a formula is abstractly expressed as a multiway-tree model, and each step of a formula derivation is abstracted as a mapping between multiway trees. Similar derivation steps can then be expressed as a reusable formula template via a multiway-tree mapping. After that, the formula multiway tree is encoded into feature vectors that construct the feature space of formulas, and a Q-learning model operating in this feature space learns the derivation from training data generated by the derivation process. Finally, an automatic formula derivation machine is built that chooses the next derivation step based on the current state and objective. We also present an example from nuclear reactor physics to show how the automatic derivation machine works. |
Tasks | Q-Learning |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.04946v1 |
http://arxiv.org/pdf/1808.04946v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-derivation-of-formulas-using |
Repo | |
Framework | |
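The derivation engine above is driven by Q-learning over states that encode the current formula tree. The paper's encoding and action set are not reproduced here; the sketch below only shows the generic tabular Q-learning update such an agent would use, with hypothetical state and action identifiers.

```python
import random
from collections import defaultdict

# Hypothetical placeholders: states stand for encoded formula trees, actions for
# derivation templates (multiway-tree mappings). Neither is specified here.
Q = defaultdict(float)                  # Q[(state, action)] -> value estimate
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

def choose_action(state, actions):
    """Epsilon-greedy choice of the next derivation step."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Standard Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```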
Adaptive Denoising of Signals with Shift-Invariant Structure
Title | Adaptive Denoising of Signals with Shift-Invariant Structure |
Authors | Dmitrii Ostrovskii, Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski |
Abstract | We study the problem of discrete-time signal denoising, following the line of research initiated by [Nem91] and further developed in [JN09, JN10, HJNO15, OHJN16]. Previous papers considered the following setup: the signal is assumed to admit a convolution-type linear oracle – an unknown linear estimator in the form of the convolution of the observations with an unknown time-invariant filter with small $\ell_2$-norm. It was shown that such an oracle can be “mimicked” by an efficiently computable non-linear convolution-type estimator, in which the filter minimizes the Fourier-domain $\ell_\infty$-norm of the residual, regularized by the Fourier-domain $\ell_1$-norm of the filter. Following [OHJN16], here we study an alternative family of estimators, replacing the $\ell_\infty$-norm of the residual with the $\ell_2$-norm. Such estimators are found to have better statistical properties, in particular, we prove sharp oracle inequalities for their $\ell_2$-loss. Our guarantees require an extra assumption of approximate shift-invariance: the signal must be $\varkappa$-close, in $\ell_2$-metric, to some shift-invariant linear subspace with bounded dimension $s$. However, this subspace can be completely unknown, and the remainder terms in the oracle inequalities scale at most polynomially with $s$ and $\varkappa$. In conclusion, we show that the new assumption implies the previously considered one, providing explicit constructions of the convolution-type linear oracles with $\ell_2$-norm bounded in terms of parameters $s$ and $\varkappa$. |
Tasks | Denoising |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04028v1 |
http://arxiv.org/pdf/1806.04028v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-denoising-of-signals-with-shift |
Repo | |
Framework | |
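The estimators discussed above fit a convolution-type estimate by minimizing the Fourier-domain $\ell_2$-norm of the residual, regularized by the Fourier-domain $\ell_1$-norm of the filter. Under the simplifications made here (circular convolution, a squared $\ell_2$ residual, and no further constraints on the filter), the problem separates per frequency and has a closed-form shrinkage solution; the sketch below only conveys the flavour of such estimators and is not the paper's construction.

```python
import numpy as np

def fourier_shrinkage_denoise(y, lam):
    """Sketch: fit x_hat = y (*) phi (circular convolution) by minimizing
    || F(y - y (*) phi) ||_2^2 + lam * || F(phi) ||_1.

    In the Fourier domain the objective separates per frequency, giving the
    closed form phi_hat_k = max(0, 1 - lam / (2 |Y_k|^2)), i.e. a soft
    shrinkage of the observed spectrum.
    """
    Y = np.fft.fft(y)
    phi_hat = np.maximum(0.0, 1.0 - lam / (2.0 * np.abs(Y) ** 2 + 1e-12))
    return np.real(np.fft.ifft(Y * phi_hat))

# Toy usage: a noisy sum of two sinusoids (an approximately shift-invariant signal).
t = np.arange(256)
x = np.sin(2 * np.pi * 5 * t / 256) + 0.5 * np.sin(2 * np.pi * 12 * t / 256)
y = x + 0.3 * np.random.default_rng(1).standard_normal(256)
x_hat = fourier_shrinkage_denoise(y, lam=50.0)
# The shrunk estimate typically has smaller error than the raw observations.
print(np.linalg.norm(x_hat - x), np.linalg.norm(y - x))
```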
Deep clustering of longitudinal data
Title | Deep clustering of longitudinal data |
Authors | Louis Falissard, Guy Fagherazzi, Newton Howard, Bruno Falissard |
Abstract | Deep neural networks are a family of computational models that have led to a dramatic improvement of the state of the art in several domains such as image, voice, or text analysis. These methods provide a framework for modelling complex, non-linear interactions in large datasets, and are naturally suited to the analysis of hierarchical data such as, for instance, longitudinal data with the use of recurrent neural networks. On the other hand, cohort studies have become an important tool in epidemiology. In such studies, variables are measured repeatedly over time, allowing practitioners to study their temporal evolution as trajectories, that is, as longitudinal data. This paper investigates the application of the advanced modelling techniques provided by the deep learning framework to the analysis of the longitudinal data produced by cohort studies. Methods: A method for visualizing and clustering longitudinal datasets is proposed and compared to other widely used approaches on both real and simulated datasets. Results: The proposed method is shown to be consistent with pre-existing procedures on simple tasks, and to outperform them on more complex tasks such as the partitioning of longitudinal datasets into non-spherical clusters. Conclusion: Deep artificial neural networks can be used to visualize longitudinal data in a low-dimensional manifold that is much simpler to interpret than traditional longitudinal plots. Consequently, practitioners should consider using deep artificial neural networks for the analysis of their longitudinal data in studies to come. |
Tasks | Epidemiology |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03212v1 |
http://arxiv.org/pdf/1802.03212v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-clustering-of-longitudinal-data |
Repo | |
Framework | |
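As a rough illustration of the pipeline described above (embed trajectories with a neural network, then visualize and cluster in the latent space), here is a minimal PyTorch sketch using a plain feed-forward autoencoder and k-means. The paper's actual architecture (e.g., recurrent layers) and data are not reproduced; sizes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Hypothetical setup: 500 subjects, each with a trajectory of 20 repeated measurements.
X = torch.randn(500, 20)

# A small autoencoder mapping each trajectory to a 2-D latent code.
encoder = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
decoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 20))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for _ in range(200):                                  # reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

with torch.no_grad():
    codes = encoder(X).numpy()                        # 2-D embedding: can be plotted directly

labels = KMeans(n_clusters=3, n_init=10).fit_predict(codes)   # cluster in latent space
print(labels[:10])
```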
From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information
Title | From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information |
Authors | Hengru Xu, Shen Li, Renfen Hu, Si Li, Sheng Gao |
Abstract | Dropout is used to avoid overfitting by randomly dropping units from neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method that integrates global information to improve neural networks for text classification. Unlike the traditional dropout method, in which units are dropped randomly with the same probability, we use explicit instructions based on global information about the dataset to guide the training process. With GI-Dropout, the model is encouraged to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of dropout with global information on seven text classification tasks, including sentiment analysis and topic classification. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08149v3 |
http://arxiv.org/pdf/1808.08149v3.pdf | |
PWC | https://paperswithcode.com/paper/from-random-to-supervised-a-novel-dropout |
Repo | |
Framework | |
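The abstract does not spell out how global information is mapped to drop probabilities, so the sketch below only shows the general mechanism: per-unit drop probabilities derived from a dataset-level importance score instead of a single shared rate. The score-to-probability mapping and all constants here are assumptions.

```python
import numpy as np

def gi_dropout(h, importance, base_p=0.5, scale=0.3, train=True, rng=None):
    """Sketch of supervised (non-uniform) dropout.

    `importance` holds a global, dataset-level salience score per unit; more
    apparent features are dropped more often, pushing the model to exploit the
    less apparent ones. The mapping from scores to probabilities is an
    assumption, not the paper's exact rule.
    """
    if not train:
        return h
    rng = rng or np.random.default_rng()
    p_drop = np.clip(base_p + scale * importance, 0.0, 0.95)   # per-unit drop probability
    keep = rng.random(h.shape) >= p_drop
    return h * keep / (1.0 - p_drop)                           # inverted-dropout rescaling

h = np.random.randn(4, 8)                     # a batch of hidden activations
importance = np.linspace(-1.0, 1.0, 8)        # hypothetical global scores per unit
print(gi_dropout(h, importance).shape)
```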
SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings
Title | SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings |
Authors | Zhuosheng Zhang, Jiangtong Li, Hai Zhao, Bingjie Tang |
Abstract | This paper describes a hypernym discovery system for our participation in SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for this task and empirically study various neural network models for building latent-space representations of words and phrases. The evaluated models include the convolutional neural network, long short-term memory network, gated recurrent unit, and recurrent convolutional neural network. We also explore different embedding methods, including word embeddings and sense embeddings, for better performance. |
Tasks | Hypernym Discovery |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10465v1 |
http://arxiv.org/pdf/1805.10465v1.pdf | |
PWC | https://paperswithcode.com/paper/sjtu-nlp-at-semeval-2018-task-9-neural |
Repo | |
Framework | |
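The abstract evaluates several encoders (CNN, LSTM, GRU, RCNN) for building term representations; the scoring step below is a generic projection-plus-cosine baseline with random stand-in embeddings, shown only to make the task setup (rank a predefined vocabulary of candidate hypernyms for a query term) concrete. It is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size = 50, 1000
term_emb = rng.standard_normal((vocab_size, dim))   # stand-in for pretrained term embeddings
W = 0.1 * rng.standard_normal((dim, dim))           # projection matrix (learned in practice)

def rank_hypernyms(query_idx, top_k=15):
    """Project the query term and rank every vocabulary entry by cosine similarity."""
    q = term_emb[query_idx] @ W
    scores = (term_emb @ q) / (np.linalg.norm(term_emb, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-scores)[:top_k]               # indices of the top-k candidate hypernyms

print(rank_hypernyms(query_idx=42))
```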
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Title | Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks |
Authors | Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton van den Hengel |
Abstract | The task in referring expression comprehension is to localise the object instance in an image described by a referring expression phrased in natural language. As a language-to-vision matching task, the key to this problem is to learn a discriminative object feature that can adapt to the expression used. To avoid ambiguity, the expression normally tends to describe not only the properties of the referent itself, but also its relationships to its neighbourhood. To capture and exploit this important information, we propose a graph-based, language-guided attention mechanism. Composed of a node attention component and an edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships and properties with a flexibility and power impossible with competing approaches. Furthermore, the proposed graph attention mechanism makes the comprehension decision visualisable and explainable. Experiments on three referring expression comprehension datasets show the advantage of the proposed approach. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04794v1 |
http://arxiv.org/pdf/1812.04794v1.pdf | |
PWC | https://paperswithcode.com/paper/neighbourhood-watch-referring-expression |
Repo | |
Framework | |
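A minimal sketch of the node-attention half of the mechanism described above: score each detected object against the encoded referring expression and pool object features by the resulting weights, which are directly inspectable (hence visualisable). The bilinear scorer and the shapes are assumptions, and the paper's edge-attention component is omitted.

```python
import torch
import torch.nn.functional as F

def language_guided_node_attention(obj_feats, expr_emb, W):
    """Attend over object nodes conditioned on the expression embedding.

    obj_feats: (num_objects, d_obj), expr_emb: (d_lang,), W: (d_obj, d_lang).
    Returns the attention weights and the attention-pooled object feature.
    """
    scores = obj_feats @ W @ expr_emb       # one score per object node
    attn = F.softmax(scores, dim=0)         # attention over graph nodes
    return attn, attn @ obj_feats

obj_feats = torch.randn(5, 128)             # five candidate objects in the image
expr_emb = torch.randn(64)                  # encoded referring expression
W = 0.05 * torch.randn(128, 64)
attn, pooled = language_guided_node_attention(obj_feats, expr_emb, W)
print(attn, pooled.shape)                   # weights sum to 1; pooled feature is (128,)
```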
Inheritance-Based Diversity Measures for Explicit Convergence Control in Evolutionary Algorithms
Title | Inheritance-Based Diversity Measures for Explicit Convergence Control in Evolutionary Algorithms |
Authors | Thomas Gabor, Lenz Belzner, Claudia Linnhoff-Popien |
Abstract | Diversity is an important factor in evolutionary algorithms, preventing premature convergence towards a single local optimum. Various means of maintaining diversity throughout the process of evolution exist in the literature. We analyze approaches to diversity that (a) have an explicit and quantifiable influence on fitness at the individual level and (b) require no (or very little) additional domain knowledge such as domain-specific distance functions. We also introduce the concept of genealogical diversity in a broader study. We show that employing these approaches can help evolutionary algorithms for global optimization in many cases. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12470v1 |
http://arxiv.org/pdf/1810.12470v1.pdf | |
PWC | https://paperswithcode.com/paper/inheritance-based-diversity-measures-for |
Repo | |
Framework | |
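The sketch below shows how a diversity term with an explicit, quantifiable influence on fitness can be folded into a simple evolutionary algorithm. Genealogical diversity is approximated here by how rare an individual's founder-ancestor set is within the population; the paper's exact inheritance-based measures, weighting, and convergence-control scheme are not reproduced.

```python
import random

def evolve(pop_size=30, genome_len=10, gens=50, w_div=0.5):
    """Toy EA on OneMax whose selection score is objective fitness plus a
    genealogical-diversity bonus (a sketch, not the paper's algorithm)."""
    def objective(g):
        return sum(g)

    # Each individual: (genome, set of founder-ancestor ids); founders are their own ancestors.
    pop = [([random.randint(0, 1) for _ in range(genome_len)], {i}) for i in range(pop_size)]

    for _ in range(gens):
        def diversity(anc):
            # 1 minus the average Jaccard overlap of this ancestry with the population.
            overlaps = [len(anc & other) / len(anc | other) for _, other in pop]
            return 1.0 - sum(overlaps) / len(pop)

        scored = [(objective(g) + w_div * genome_len * diversity(a), g, a) for g, a in pop]
        scored.sort(key=lambda t: -t[0])
        parents = scored[: pop_size // 2]

        children = []
        for _ in range(pop_size):
            (_, g1, a1), (_, g2, a2) = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)
            child = g1[:cut] + g2[cut:]
            if random.random() < 0.1:                    # point mutation
                i = random.randrange(genome_len); child[i] ^= 1
            children.append((child, a1 | a2))            # child inherits both ancestries
        pop = children

    return max(objective(g) for g, _ in pop)

print(evolve())
```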
Markov Chain Importance Sampling – a highly efficient estimator for MCMC
Title | Markov Chain Importance Sampling – a highly efficient estimator for MCMC |
Authors | Ingmar Schuster, Ilja Klebanov |
Abstract | Markov chain (MC) algorithms are ubiquitous in machine learning, statistics, and many other disciplines. Typically, these algorithms can be formulated as acceptance-rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves on the error per CPU cycle, often to a large extent. As a by-product, it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics. |
Tasks | |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07179v3 |
https://arxiv.org/pdf/1805.07179v3.pdf | |
PWC | https://paperswithcode.com/paper/markov-chain-importance-sampling-a-highly |
Repo | |
Framework | |
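The core idea, reusing rejected proposals, can be illustrated with a random-walk Metropolis sampler that keeps every proposal together with a self-normalized importance weight $\pi(y)/q(y\mid x)$. This is only a sketch of that idea with a toy Gaussian target; the paper's MCIS estimator and its guarantees are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):                     # unnormalized target: standard normal
    return -0.5 * x ** 2

def mh_with_proposal_reuse(n_iter=20000, step=1.0, x0=3.0):
    """Random-walk Metropolis that also returns every proposal (accepted or not)
    with a self-normalized importance weight, so rejected proposals still
    contribute to the final estimate."""
    x = x0
    proposals, log_w = [], []
    for _ in range(n_iter):
        y = x + step * rng.standard_normal()
        log_q = -0.5 * ((y - x) / step) ** 2          # proposal log-density up to a constant
        proposals.append(y)
        log_w.append(log_target(y) - log_q)
        if np.log(rng.random()) < log_target(y) - log_target(x):   # standard MH acceptance
            x = y
    w = np.exp(np.array(log_w) - np.max(log_w))
    return np.array(proposals), w / w.sum()

xs, w = mh_with_proposal_reuse()
print(np.sum(w * xs), np.sum(w * xs ** 2))   # weighted estimates of E[X] (~0) and E[X^2] (~1)
```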
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
Title | Regret Bounds for Reinforcement Learning via Markov Chain Concentration |
Authors | Ronald Ortner |
Abstract | We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter. |
Tasks | |
Published | 2018-08-06 |
URL | http://arxiv.org/abs/1808.01813v2 |
http://arxiv.org/pdf/1808.01813v2.pdf | |
PWC | https://paperswithcode.com/paper/regret-bounds-for-reinforcement-learning-via |
Repo | |
Framework | |
Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks
Title | Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks |
Authors | Majd Latah, Levent Toker |
Abstract | Software-defined networking (SDN) is a new paradigm that allows the development of more flexible network applications. The SDN controller, which represents a centralized controlling point, is responsible for running various network applications as well as maintaining different network services and functionalities. Choosing an efficient intrusion detection system helps reduce the overhead of the running controller and creates a more secure network. In this study, we investigate the performance of well-known anomaly-based intrusion detection approaches in terms of accuracy, false alarm rate, precision, recall, F1-measure, area under the ROC curve, execution time, and McNemar's test. Specifically, we focus on supervised machine-learning approaches using the following classifiers: Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RT), K-Nearest Neighbour (KNN), AdaBoost, RUSBoost, LogitBoost, and BaggingTrees, and we employ the well-known NSL-KDD benchmark dataset to compare the performance of each of these classifiers. |
Tasks | Intrusion Detection |
Published | 2018-03-18 |
URL | http://arxiv.org/abs/1803.06762v2 |
http://arxiv.org/pdf/1803.06762v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-an-efficient-anomaly-based-intrusion |
Repo | |
Framework | |
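A minimal scikit-learn sketch of the kind of comparison the abstract describes, using a synthetic stand-in for the NSL-KDD data and a subset of the listed classifiers and metrics; the dataset, preprocessing, and hyperparameters are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Stand-in for NSL-KDD: a synthetic, imbalanced binary "normal vs. attack" dataset.
X, y = make_classification(n_samples=3000, n_features=30, weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    score = clf.predict_proba(X_te)[:, 1]
    print(name,
          "acc=%.3f" % accuracy_score(y_te, pred),
          "prec=%.3f" % precision_score(y_te, pred),
          "rec=%.3f" % recall_score(y_te, pred),
          "f1=%.3f" % f1_score(y_te, pred),
          "auc=%.3f" % roc_auc_score(y_te, score))
```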
A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure
Title | A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure |
Authors | Ashkan Esmaeili, Farokh Marvasti |
Abstract | In this paper, we introduce a novel and robust approach to Quantized Matrix Completion (QMC). First, we propose a rank minimization problem with constraints induced by quantization bounds. Next, we form an unconstrained optimization problem by regularizing the rank function with the Huber loss. The Huber loss is leveraged to control violations of the quantization bounds due to two properties: (1) it is differentiable, and (2) it is less sensitive to outliers than the quadratic loss. A Smooth Rank Approximation is utilized to promote a lower rank for the genuine data matrix. Thus, an unconstrained optimization problem with a differentiable objective function is obtained, allowing us to take advantage of the Gradient Descent (GD) technique. Novel and firm theoretical analyses of the problem model and of the convergence of our algorithm to the global solution are provided. Another contribution of our work is that, unlike the state of the art, our method does not require projections or an initial rank estimate. In the Numerical Experiments section, the noticeable outperformance of our proposed method over state-of-the-art methods in learning accuracy and computational complexity is illustrated as the main contribution. |
Tasks | Matrix Completion, Quantization |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12460v1 |
http://arxiv.org/pdf/1810.12460v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-approach-to-quantized-matrix |
Repo | |
Framework | |
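A hedged PyTorch sketch of the overall recipe: minimize a smooth rank surrogate plus a Huber penalty on violations of the quantization bounds by plain gradient descent. The surrogate used here (a sum of smoothed singular values), the weighting, and all hyperparameters are stand-ins for the paper's Smooth Rank Approximation and exact objective.

```python
import torch

torch.manual_seed(0)
m, n, r = 30, 30, 3
X_true = torch.randn(m, r) @ torch.randn(r, n)                  # low-rank ground truth
mask = torch.rand(m, n) < 0.5                                   # observed (quantized) entries
lower, upper = torch.floor(X_true), torch.floor(X_true) + 1.0   # per-entry quantization bounds

huber = torch.nn.SmoothL1Loss(reduction="sum", beta=1.0)        # Huber loss

def objective(X, lam=5.0, eps=1e-3):
    # Smoothed-rank surrogate (sum of smoothed singular values) plus a Huber
    # penalty on how far observed entries fall outside their quantization bounds.
    smooth_rank = torch.sqrt(torch.linalg.svdvals(X) ** 2 + eps).sum()
    below = torch.clamp(lower - X, min=0.0)[mask]
    above = torch.clamp(X - upper, min=0.0)[mask]
    violation = huber(below, torch.zeros_like(below)) + huber(above, torch.zeros_like(above))
    return smooth_rank + lam * violation

X = (0.01 * torch.randn(m, n)).requires_grad_()
opt = torch.optim.SGD([X], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    objective(X).backward()
    opt.step()

within = ((X >= lower) & (X <= upper))[mask].float().mean().item()
print(f"observed entries inside their quantization bounds: {within:.2f}")
```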
A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks
Title | A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks |
Authors | César A. Uribe, Soomin Lee, Alexander Gasnikov, Angelia Nedić |
Abstract | We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over a network. We provide complexity bounds for four different cases, namely: when each function $f_i$ is strongly convex and smooth, when it is either strongly convex or smooth, and when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem, which includes a graph that models the communication restrictions. We propose distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network. Initially, we focus on functions for which we can explicitly minimize the Legendre-Fenchel conjugate, i.e., admissible or dual-friendly functions. Then, we study distributed optimization algorithms for non-dual-friendly functions, as well as a method to improve the dependency on the parameters of the functions involved. A numerical analysis of the proposed algorithms is also provided. |
Tasks | Distributed Optimization |
Published | 2018-09-03 |
URL | https://arxiv.org/abs/1809.00710v3 |
https://arxiv.org/pdf/1809.00710v3.pdf | |
PWC | https://paperswithcode.com/paper/a-dual-approach-for-optimal-algorithms-in |
Repo | |
Framework | |
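For a dual-friendly case, the dual approach is easy to sketch: with quadratic local functions, the inner minimization given the dual variables has a closed form, and dual gradient ascent only needs neighbour-to-neighbour communication through the graph Laplacian. The ring network, step size, and quadratic objectives below are illustrative choices, not the paper's algorithms or rates.

```python
import numpy as np

# Ring network of m agents; agent i holds f_i(z) = 0.5 * (z - b_i)^2, and the goal is
# to minimize sum_i f_i(z). Quadratics are "dual friendly": the primal minimizer
# given the dual variable is available in closed form via the Fenchel conjugate.
m = 5
b = np.array([1.0, 4.0, -2.0, 0.5, 3.0])                                    # local data
L = 2 * np.eye(m) - np.roll(np.eye(m), 1, 0) - np.roll(np.eye(m), -1, 0)    # ring Laplacian

lam = np.zeros(m)            # dual variables, one per node
alpha = 0.05                 # dual step size
for _ in range(500):
    z = b - L @ lam          # closed-form primal minimizers given lam
    lam = lam + alpha * (L @ z)   # dual ascent; (L z)_i uses only neighbours' values

print(z)                     # all entries approach the consensus optimum b.mean() = 1.3
print(b.mean())
```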
A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
Title | A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates |
Authors | Yossi Arjevani, Ohad Shamir, Nathan Srebro |
Abstract | We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: in fact, the error can be as bad as that of non-delayed gradient descent run on only $1/\tau$ of the gradients. In sharp contrast, we quantify how stochastic noise makes the effect of delays negligible, improving on previous work which only showed this phenomenon asymptotically or for much smaller delays. Also, in the context of distributed optimization, the results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching. Our results are based on a novel technique for analyzing the convergence of optimization algorithms using generating functions. |
Tasks | Distributed Optimization |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.10188v1 |
http://arxiv.org/pdf/1806.10188v1.pdf | |
PWC | https://paperswithcode.com/paper/a-tight-convergence-analysis-for-stochastic |
Repo | |
Framework | |
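The noiseless part of the story is easy to reproduce numerically: run gradient descent on a quadratic while applying, at each step, the gradient computed at the iterate from $\tau$ rounds earlier, and compare the final errors for different delays. A small NumPy sketch with illustrative step size and delays:

```python
import numpy as np

def delayed_gd(tau, steps=200, eta=0.1, dim=10, seed=0):
    """Gradient descent on f(x) = 0.5 * ||x - x_star||^2 where the gradient used at
    step t was computed at the iterate from tau rounds earlier (no stochastic noise)."""
    rng = np.random.default_rng(seed)
    x_star = rng.standard_normal(dim)
    x = np.zeros(dim)
    history = [x.copy()]
    for t in range(steps):
        x_old = history[max(0, t - tau)]        # iterate from tau rounds ago
        x = x - eta * (x_old - x_star)          # gradient of f evaluated at the delayed iterate
        history.append(x.copy())
    return np.linalg.norm(x - x_star)

for tau in (0, 3, 10):                          # larger delays leave a larger final error
    print(tau, delayed_gd(tau))
```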
Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Title | Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization |
Authors | Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang |
Abstract | Large-scale distributed optimization is of great importance in various applications. For data-parallel distributed learning, inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error-compensated quantized stochastic gradient descent algorithm to improve training efficiency. Local gradients are quantized to reduce the communication overhead, and the accumulated quantization error is utilized to speed up convergence. Furthermore, we present a theoretical analysis of the convergence behaviour and demonstrate its advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by up to two orders of magnitude without performance degradation. |
Tasks | Distributed Optimization, Quantization |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08054v1 |
http://arxiv.org/pdf/1806.08054v1.pdf | |
PWC | https://paperswithcode.com/paper/error-compensated-quantized-sgd-and-its |
Repo | |
Framework | |
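A minimal NumPy sketch of the mechanism the abstract describes: quantize each local gradient before "communicating" it, and carry the accumulated quantization error forward so it is compensated in later steps. The quantizer, learning rate, and toy objective are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize(v, levels=4):
    """Simple deterministic uniform quantizer used as a stand-in for the
    low-bit gradient encoding that would actually be communicated."""
    scale = np.max(np.abs(v)) + 1e-12
    step = 2 * scale / (levels - 1)
    return np.round(v / step) * step

def ecq_sgd_step(w, grad, error, lr=0.1):
    """One worker-side step of error-compensated quantized SGD: add the residual
    left over from previous quantizations to the fresh gradient, quantize the sum
    for communication, and keep the new residual for the next step."""
    compensated = grad + error            # error feedback
    q = quantize(compensated)             # what actually gets communicated/applied
    new_error = compensated - q           # accumulated quantization error
    return w - lr * q, new_error

# Toy usage: minimize 0.5 * ||w - target||^2 with quantized, error-compensated updates.
rng = np.random.default_rng(0)
target = rng.standard_normal(20)
w, err = np.zeros(20), np.zeros(20)
for _ in range(200):
    grad = w - target + 0.05 * rng.standard_normal(20)    # noisy gradient
    w, err = ecq_sgd_step(w, grad, err)
print(np.linalg.norm(w - target))
```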