Paper Group ANR 340
Parametric Models for Mutual Kernel Matrix Completion. Automatic Derivation Of Formulas Using Reforcement Learning. Adaptive Denoising of Signals with Shift-Invariant Structure. Deep clustering of longitudinal data. From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information. SJTU-NLP at SemEval-2018 Task 9: Neural Hyper …
Parametric Models for Mutual Kernel Matrix Completion
Title | Parametric Models for Mutual Kernel Matrix Completion |
Authors | Rachelle Rivero, Tsuyoshi Kato |
Abstract | Recent studies utilize multiple kernel learning to deal with the incomplete-data problem. In this study, we introduce new methods that not only complete multiple incomplete kernel matrices simultaneously, but also allow the flexibility of the model to be controlled by parameterizing the model matrix. By imposing restrictions on the model covariance, overfitting of the data is avoided. A limitation of kernel matrix estimation via optimization of an objective function is that the positive definiteness of the result is not guaranteed. In view of this limitation, our proposed methods employ the LogDet divergence, which ensures the positive definiteness of the resulting inferred kernel matrix. We empirically show that our proposed restricted covariance models, employed with the LogDet divergence, yield significant improvements in generalization performance over previous completion methods. |
Tasks | Matrix Completion |
Published | 2018-04-17 |
URL | http://arxiv.org/abs/1804.06095v1 |
http://arxiv.org/pdf/1804.06095v1.pdf | |
PWC | https://paperswithcode.com/paper/parametric-models-for-mutual-kernel-matrix |
Repo | |
Framework | |
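The key quantity in the abstract above is the LogDet divergence, which is finite only for positive-definite arguments and therefore keeps the inferred kernel matrix positive definite. Below is a minimal NumPy sketch of that divergence; the toy matrices are illustrative stand-ins, not the paper's kernel and model matrices.

```python
import numpy as np

def logdet_divergence(K, M):
    """LogDet (Stein) divergence D(K, M) = tr(K M^-1) - log det(K M^-1) - n.

    Finite only when both arguments are symmetric positive definite, which is
    what guarantees positive definiteness of the completed kernel matrix.
    """
    n = K.shape[0]
    Minv_K = np.linalg.solve(M, K)              # M^{-1} K without forming the inverse
    sign, logdet = np.linalg.slogdet(Minv_K)
    if sign <= 0:
        raise ValueError("Arguments must be positive definite.")
    return np.trace(Minv_K) - logdet - n

# Toy usage: divergence between two random SPD matrices (zero iff they coincide).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); K = A @ A.T + 5 * np.eye(5)
B = rng.standard_normal((5, 5)); M = B @ B.T + 5 * np.eye(5)
print(logdet_divergence(K, M))
```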
Automatic Derivation Of Formulas Using Reforcement Learning
Title | Automatic Derivation Of Formulas Using Reforcement Learning |
Authors | MinZhong Luo, Li Liu |
Abstract | This paper presents an artificial-intelligence algorithm, the automatic derivation machine, that can be used to derive formulas across various scientific disciplines. First, a formula is abstractly expressed as a multiway-tree model, and each step of a formula derivation is abstracted as a mapping between multiway trees. Similar derivation steps can then be expressed as a reusable formula template via a multiway-tree mapping. After that, the formula multiway tree is encoded into feature vectors that construct the feature space of formulas, and a Q-learning model operating in this feature space learns the derivation from training data generated by the derivation process. Finally, an automatic formula derivation machine is built that chooses the next derivation step based on the current state and objective. We also present an example from nuclear reactor physics to show how the automatic derivation machine works. |
Tasks | Q-Learning |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.04946v1 |
http://arxiv.org/pdf/1808.04946v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-derivation-of-formulas-using |
Repo | |
Framework | |
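The derivation engine above is driven by Q-learning over states that encode the current formula tree. The paper's encoding and action set are not reproduced here; the sketch below only shows the generic tabular Q-learning update such an agent would use, with hypothetical state and action identifiers.

```python
import random
from collections import defaultdict

# Hypothetical placeholders: states stand for encoded formula trees, actions for
# derivation templates (multiway-tree mappings). Neither is specified here.
Q = defaultdict(float)                  # Q[(state, action)] -> value estimate
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

def choose_action(state, actions):
    """Epsilon-greedy choice of the next derivation step."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """Standard Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```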
Adaptive Denoising of Signals with Shift-Invariant Structure
Title | Adaptive Denoising of Signals with Shift-Invariant Structure |
Authors | Dmitrii Ostrovskii, Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski |
Abstract | We study the problem of discrete-time signal denoising, following the line of research initiated by [Nem91] and further developed in [JN09, JN10, HJNO15, OHJN16]. Previous papers considered the following setup: the signal is assumed to admit a convolution-type linear oracle – an unknown linear estimator in the form of the convolution of the observations with an unknown time-invariant filter with small $\ell_2$-norm. It was shown that such an oracle can be “mimicked” by an efficiently computable non-linear convolution-type estimator, in which the filter minimizes the Fourier-domain $\ell_\infty$-norm of the residual, regularized by the Fourier-domain $\ell_1$-norm of the filter. Following [OHJN16], here we study an alternative family of estimators, replacing the $\ell_\infty$-norm of the residual with the $\ell_2$-norm. Such estimators are found to have better statistical properties, in particular, we prove sharp oracle inequalities for their $\ell_2$-loss. Our guarantees require an extra assumption of approximate shift-invariance: the signal must be $\varkappa$-close, in $\ell_2$-metric, to some shift-invariant linear subspace with bounded dimension $s$. However, this subspace can be completely unknown, and the remainder terms in the oracle inequalities scale at most polynomially with $s$ and $\varkappa$. In conclusion, we show that the new assumption implies the previously considered one, providing explicit constructions of the convolution-type linear oracles with $\ell_2$-norm bounded in terms of parameters $s$ and $\varkappa$. |
Tasks | Denoising |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04028v1 |
http://arxiv.org/pdf/1806.04028v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-denoising-of-signals-with-shift |
Repo | |
Framework | |
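The estimators discussed above fit a convolution-type estimate by minimizing the Fourier-domain $\ell_2$-norm of the residual, regularized by the Fourier-domain $\ell_1$-norm of the filter. Under the simplifications made here (circular convolution, a squared $\ell_2$ residual, and no further constraints on the filter), the problem separates per frequency and has a closed-form shrinkage solution; the sketch below only conveys the flavour of such estimators and is not the paper's construction.

```python
import numpy as np

def fourier_shrinkage_denoise(y, lam):
    """Sketch: fit x_hat = y (*) phi (circular convolution) by minimizing
    || F(y - y (*) phi) ||_2^2 + lam * || F(phi) ||_1.

    In the Fourier domain the objective separates per frequency, giving the
    closed form phi_hat_k = max(0, 1 - lam / (2 |Y_k|^2)), i.e. a soft
    shrinkage of the observed spectrum.
    """
    Y = np.fft.fft(y)
    phi_hat = np.maximum(0.0, 1.0 - lam / (2.0 * np.abs(Y) ** 2 + 1e-12))
    return np.real(np.fft.ifft(Y * phi_hat))

# Toy usage: a noisy sum of two sinusoids (an approximately shift-invariant signal).
t = np.arange(256)
x = np.sin(2 * np.pi * 5 * t / 256) + 0.5 * np.sin(2 * np.pi * 12 * t / 256)
y = x + 0.3 * np.random.default_rng(1).standard_normal(256)
x_hat = fourier_shrinkage_denoise(y, lam=50.0)
# The shrunk estimate typically has smaller error than the raw observations.
print(np.linalg.norm(x_hat - x), np.linalg.norm(y - x))
```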
Deep clustering of longitudinal data
Title | Deep clustering of longitudinal data |
Authors | Louis Falissard, Guy Fagherazzi, Newton Howard, Bruno Falissard |
Abstract | Deep neural networks are a family of computational models that have led to a dramatic improvement of the state of the art in several domains such as image, voice, or text analysis. These methods provide a framework for modelling complex, non-linear interactions in large datasets, and are naturally suited to the analysis of hierarchical data such as, for instance, longitudinal data with the use of recurrent neural networks. On the other hand, cohort studies have become an important tool in epidemiology. In such studies, variables are measured repeatedly over time, allowing practitioners to study their temporal evolution as trajectories, that is, as longitudinal data. This paper investigates the application of the advanced modelling techniques provided by the deep learning framework to the analysis of the longitudinal data produced by cohort studies. Methods: A method for visualizing and clustering longitudinal datasets is proposed and compared to other widely used approaches on both real and simulated datasets. Results: The proposed method is shown to be consistent with pre-existing procedures on simple tasks, and to outperform them on more complex tasks such as the partitioning of longitudinal datasets into non-spherical clusters. Conclusion: Deep artificial neural networks can be used to visualize longitudinal data in a low-dimensional manifold that is much simpler to interpret than traditional longitudinal plots. Consequently, practitioners should consider using deep artificial neural networks for the analysis of their longitudinal data in studies to come. |
Tasks | Epidemiology |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03212v1 |
http://arxiv.org/pdf/1802.03212v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-clustering-of-longitudinal-data |
Repo | |
Framework | |
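As a rough illustration of the pipeline described above (embed trajectories with a neural network, then visualize and cluster in the latent space), here is a minimal PyTorch sketch using a plain feed-forward autoencoder and k-means. The paper's actual architecture (e.g., recurrent layers) and data are not reproduced; sizes and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Hypothetical setup: 500 subjects, each with a trajectory of 20 repeated measurements.
X = torch.randn(500, 20)

# A small autoencoder mapping each trajectory to a 2-D latent code.
encoder = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
decoder = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 20))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for _ in range(200):                                  # reconstruction training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

with torch.no_grad():
    codes = encoder(X).numpy()                        # 2-D embedding: can be plotted directly

labels = KMeans(n_clusters=3, n_init=10).fit_predict(codes)   # cluster in latent space
print(labels[:10])
```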
From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information
Title | From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information |
Authors | Hengru Xu, Shen Li, Renfen Hu, Si Li, Sheng Gao |
Abstract | Dropout is used to avoid overfitting by randomly dropping units from neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method that integrates global information to improve neural networks for text classification. Unlike the traditional dropout method, in which units are dropped randomly with the same probability, we use explicit instructions based on global information about the dataset to guide the training process. With GI-Dropout, the model is encouraged to pay more attention to inapparent features or patterns. Experiments demonstrate the effectiveness of dropout with global information on seven text classification tasks, including sentiment analysis and topic classification. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08149v3 |
http://arxiv.org/pdf/1808.08149v3.pdf | |
PWC | https://paperswithcode.com/paper/from-random-to-supervised-a-novel-dropout |
Repo | |
Framework | |
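The abstract does not spell out how global information is mapped to drop probabilities, so the sketch below only shows the general mechanism: per-unit drop probabilities derived from a dataset-level importance score instead of a single shared rate. The score-to-probability mapping and all constants here are assumptions.

```python
import numpy as np

def gi_dropout(h, importance, base_p=0.5, scale=0.3, train=True, rng=None):
    """Sketch of supervised (non-uniform) dropout.

    `importance` holds a global, dataset-level salience score per unit; more
    apparent features are dropped more often, pushing the model to exploit the
    less apparent ones. The mapping from scores to probabilities is an
    assumption, not the paper's exact rule.
    """
    if not train:
        return h
    rng = rng or np.random.default_rng()
    p_drop = np.clip(base_p + scale * importance, 0.0, 0.95)   # per-unit drop probability
    keep = rng.random(h.shape) >= p_drop
    return h * keep / (1.0 - p_drop)                           # inverted-dropout rescaling

h = np.random.randn(4, 8)                     # a batch of hidden activations
importance = np.linspace(-1.0, 1.0, 8)        # hypothetical global scores per unit
print(gi_dropout(h, importance).shape)
```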
SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings
Title | SJTU-NLP at SemEval-2018 Task 9: Neural Hypernym Discovery with Term Embeddings |
Authors | Zhuosheng Zhang, Jiangtong Li, Hai Zhao, Bingjie Tang |
Abstract | This paper describes a hypernym discovery system for our participation in SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for this task and empirically study various neural network models for building latent-space representations of words and phrases. The evaluated models include the convolutional neural network, long short-term memory network, gated recurrent unit, and recurrent convolutional neural network. We also explore different embedding methods, including word embeddings and sense embeddings, for better performance. |
Tasks | Hypernym Discovery |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10465v1 |
http://arxiv.org/pdf/1805.10465v1.pdf | |
PWC | https://paperswithcode.com/paper/sjtu-nlp-at-semeval-2018-task-9-neural |
Repo | |
Framework | |
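The abstract evaluates several encoders (CNN, LSTM, GRU, RCNN) for building term representations; the scoring step below is a generic projection-plus-cosine baseline with random stand-in embeddings, shown only to make the task setup (rank a predefined vocabulary of candidate hypernyms for a query term) concrete. It is not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size = 50, 1000
term_emb = rng.standard_normal((vocab_size, dim))   # stand-in for pretrained term embeddings
W = 0.1 * rng.standard_normal((dim, dim))           # projection matrix (learned in practice)

def rank_hypernyms(query_idx, top_k=15):
    """Project the query term and rank every vocabulary entry by cosine similarity."""
    q = term_emb[query_idx] @ W
    scores = (term_emb @ q) / (np.linalg.norm(term_emb, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-scores)[:top_k]               # indices of the top-k candidate hypernyms

print(rank_hypernyms(query_idx=42))
```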
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Title | Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks |
Authors | Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton van den Hengel |
Abstract | The task in referring expression comprehension is to localise the object instance in an image described by a referring expression phrased in natural language. As a language-to-vision matching task, the key to this problem is to learn a discriminative object feature that can adapt to the expression used. To avoid ambiguity, the expression normally tends to describe not only the properties of the referent itself, but also its relationships to its neighbourhood. To capture and exploit this important information, we propose a graph-based, language-guided attention mechanism. Composed of a node attention component and an edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships and properties with a flexibility and power impossible with competing approaches. Furthermore, the proposed graph attention mechanism makes the comprehension decision visualisable and explainable. Experiments on three referring expression comprehension datasets show the advantage of the proposed approach. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04794v1 |
http://arxiv.org/pdf/1812.04794v1.pdf | |
PWC | https://paperswithcode.com/paper/neighbourhood-watch-referring-expression |
Repo | |
Framework | |
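A minimal sketch of the node-attention half of the mechanism described above: score each detected object against the encoded referring expression and pool object features by the resulting weights, which are directly inspectable (hence visualisable). The bilinear scorer and the shapes are assumptions, and the paper's edge-attention component is omitted.

```python
import torch
import torch.nn.functional as F

def language_guided_node_attention(obj_feats, expr_emb, W):
    """Attend over object nodes conditioned on the expression embedding.

    obj_feats: (num_objects, d_obj), expr_emb: (d_lang,), W: (d_obj, d_lang).
    Returns the attention weights and the attention-pooled object feature.
    """
    scores = obj_feats @ W @ expr_emb       # one score per object node
    attn = F.softmax(scores, dim=0)         # attention over graph nodes
    return attn, attn @ obj_feats

obj_feats = torch.randn(5, 128)             # five candidate objects in the image
expr_emb = torch.randn(64)                  # encoded referring expression
W = 0.05 * torch.randn(128, 64)
attn, pooled = language_guided_node_attention(obj_feats, expr_emb, W)
print(attn, pooled.shape)                   # weights sum to 1; pooled feature is (128,)
```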
Inheritance-Based Diversity Measures for Explicit Convergence Control in Evolutionary Algorithms
Title | Inheritance-Based Diversity Measures for Explicit Convergence Control in Evolutionary Algorithms |
Authors | Thomas Gabor, Lenz Belzner, Claudia Linnhoff-Popien |
Abstract | Diversity is an important factor in evolutionary algorithms, preventing premature convergence towards a single local optimum. Various means of maintaining diversity throughout the process of evolution exist in the literature. We analyze approaches to diversity that (a) have an explicit and quantifiable influence on fitness at the individual level and (b) require no (or very little) additional domain knowledge such as domain-specific distance functions. We also introduce the concept of genealogical diversity in a broader study. We show that employing these approaches can help evolutionary algorithms for global optimization in many cases. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12470v1 |
http://arxiv.org/pdf/1810.12470v1.pdf | |
PWC | https://paperswithcode.com/paper/inheritance-based-diversity-measures-for |
Repo | |
Framework | |
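The sketch below shows how a diversity term with an explicit, quantifiable influence on fitness can be folded into a simple evolutionary algorithm. Genealogical diversity is approximated here by how rare an individual's founder-ancestor set is within the population; the paper's exact inheritance-based measures, weighting, and convergence-control scheme are not reproduced.

```python
import random

def evolve(pop_size=30, genome_len=10, gens=50, w_div=0.5):
    """Toy EA on OneMax whose selection score is objective fitness plus a
    genealogical-diversity bonus (a sketch, not the paper's algorithm)."""
    def objective(g):
        return sum(g)

    # Each individual: (genome, set of founder-ancestor ids); founders are their own ancestors.
    pop = [([random.randint(0, 1) for _ in range(genome_len)], {i}) for i in range(pop_size)]

    for _ in range(gens):
        def diversity(anc):
            # 1 minus the average Jaccard overlap of this ancestry with the population.
            overlaps = [len(anc & other) / len(anc | other) for _, other in pop]
            return 1.0 - sum(overlaps) / len(pop)

        scored = [(objective(g) + w_div * genome_len * diversity(a), g, a) for g, a in pop]
        scored.sort(key=lambda t: -t[0])
        parents = scored[: pop_size // 2]

        children = []
        for _ in range(pop_size):
            (_, g1, a1), (_, g2, a2) = random.sample(parents, 2)
            cut = random.randrange(1, genome_len)
            child = g1[:cut] + g2[cut:]
            if random.random() < 0.1:                    # point mutation
                i = random.randrange(genome_len); child[i] ^= 1
            children.append((child, a1 | a2))            # child inherits both ancestries
        pop = children

    return max(objective(g) for g, _ in pop)

print(evolve())
```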
Markov Chain Importance Sampling – a highly efficient estimator for MCMC
Title | Markov Chain Importance Sampling – a highly efficient estimator for MCMC |
Authors | Ingmar Schuster, Ilja Klebanov |
Abstract | Markov chain (MC) algorithms are ubiquitous in machine learning, statistics, and many other disciplines. Typically, these algorithms can be formulated as acceptance-rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm, it provides a novel way of correcting the discretization error. Our estimator satisfies a central limit theorem and improves on the error per CPU cycle, often to a large extent. As a by-product, it enables estimating the normalizing constant, an important quantity in Bayesian machine learning and statistics. |
Tasks | |
Published | 2018-05-18 |
URL | https://arxiv.org/abs/1805.07179v3 |
https://arxiv.org/pdf/1805.07179v3.pdf | |
PWC | https://paperswithcode.com/paper/markov-chain-importance-sampling-a-highly |
Repo | |
Framework | |
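The core idea, reusing rejected proposals, can be illustrated with a random-walk Metropolis sampler that keeps every proposal together with a self-normalized importance weight $\pi(y)/q(y\mid x)$. This is only a sketch of that idea with a toy Gaussian target; the paper's MCIS estimator and its guarantees are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):                     # unnormalized target: standard normal
    return -0.5 * x ** 2

def mh_with_proposal_reuse(n_iter=20000, step=1.0, x0=3.0):
    """Random-walk Metropolis that also returns every proposal (accepted or not)
    with a self-normalized importance weight, so rejected proposals still
    contribute to the final estimate."""
    x = x0
    proposals, log_w = [], []
    for _ in range(n_iter):
        y = x + step * rng.standard_normal()
        log_q = -0.5 * ((y - x) / step) ** 2          # proposal log-density up to a constant
        proposals.append(y)
        log_w.append(log_target(y) - log_q)
        if np.log(rng.random()) < log_target(y) - log_target(x):   # standard MH acceptance
            x = y
    w = np.exp(np.array(log_w) - np.max(log_w))
    return np.array(proposals), w / w.sum()

xs, w = mh_with_proposal_reuse()
print(np.sum(w * xs), np.sum(w * xs ** 2))   # weighted estimates of E[X] (~0) and E[X^2] (~1)
```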
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
Title | Regret Bounds for Reinforcement Learning via Markov Chain Concentration |
Authors | Ronald Ortner |
Abstract | We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$. These bounds are the first regret bounds in the general, non-episodic setting with an optimal dependence on all given parameters. They could only be improved by using an alternative mixing time parameter. |
Tasks | |
Published | 2018-08-06 |
URL | http://arxiv.org/abs/1808.01813v2 |
http://arxiv.org/pdf/1808.01813v2.pdf | |
PWC | https://paperswithcode.com/paper/regret-bounds-for-reinforcement-learning-via |
Repo | |
Framework | |
Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks
Title | Towards an Efficient Anomaly-Based Intrusion Detection for Software-Defined Networks |
Authors | Majd Latah, Levent Toker |
Abstract | Software-defined networking (SDN) is a new paradigm that allows the development of more flexible network applications. The SDN controller, which represents a centralized controlling point, is responsible for running various network applications as well as maintaining different network services and functionalities. Choosing an efficient intrusion detection system helps reduce the overhead of the running controller and creates a more secure network. In this study, we investigate the performance of well-known anomaly-based intrusion detection approaches in terms of accuracy, false alarm rate, precision, recall, F1-measure, area under the ROC curve, execution time, and McNemar's test. Specifically, we focus on supervised machine-learning approaches using the following classifiers: Decision Trees (DT), Extreme Learning Machine (ELM), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Neural Networks (NN), Support Vector Machines (SVM), Random Forest (RT), K-Nearest Neighbour (KNN), AdaBoost, RUSBoost, LogitBoost, and BaggingTrees, and we employ the well-known NSL-KDD benchmark dataset to compare the performance of each of these classifiers. |
Tasks | Intrusion Detection |
Published | 2018-03-18 |
URL | http://arxiv.org/abs/1803.06762v2 |
http://arxiv.org/pdf/1803.06762v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-an-efficient-anomaly-based-intrusion |
Repo | |
Framework | |
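A minimal scikit-learn sketch of the kind of comparison the abstract describes, using a synthetic stand-in for the NSL-KDD data and a subset of the listed classifiers and metrics; the dataset, preprocessing, and hyperparameters are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Stand-in for NSL-KDD: a synthetic, imbalanced binary "normal vs. attack" dataset.
X, y = make_classification(n_samples=3000, n_features=30, weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    score = clf.predict_proba(X_te)[:, 1]
    print(name,
          "acc=%.3f" % accuracy_score(y_te, pred),
          "prec=%.3f" % precision_score(y_te, pred),
          "rec=%.3f" % recall_score(y_te, pred),
          "f1=%.3f" % f1_score(y_te, pred),
          "auc=%.3f" % roc_auc_score(y_te, score))
```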
A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure
Title | A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure |
Authors | Ashkan Esmaeili, Farokh Marvasti |
Abstract | In this paper, we introduce a novel and robust approach to Quantized Matrix Completion (QMC). First, we propose a rank minimization problem with constraints induced by quantization bounds. Next, we form an unconstrained optimization problem by regularizing the rank function with the Huber loss. The Huber loss is leveraged to control violations of the quantization bounds due to two properties: (1) it is differentiable, and (2) it is less sensitive to outliers than the quadratic loss. A Smooth Rank Approximation is utilized to promote a lower rank for the genuine data matrix. Thus, an unconstrained optimization problem with a differentiable objective function is obtained, allowing us to take advantage of the Gradient Descent (GD) technique. Novel and firm theoretical analyses of the problem model and of the convergence of our algorithm to the global solution are provided. Another contribution of our work is that, unlike the state of the art, our method does not require projections or an initial rank estimate. In the Numerical Experiments section, the noticeable outperformance of our proposed method over state-of-the-art methods in learning accuracy and computational complexity is illustrated as the main contribution. |
Tasks | Matrix Completion, Quantization |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12460v1 |
http://arxiv.org/pdf/1810.12460v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-approach-to-quantized-matrix |
Repo | |
Framework | |
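A hedged PyTorch sketch of the overall recipe: minimize a smooth rank surrogate plus a Huber penalty on violations of the quantization bounds by plain gradient descent. The surrogate used here (a sum of smoothed singular values), the weighting, and all hyperparameters are stand-ins for the paper's Smooth Rank Approximation and exact objective.

```python
import torch

torch.manual_seed(0)
m, n, r = 30, 30, 3
X_true = torch.randn(m, r) @ torch.randn(r, n)                  # low-rank ground truth
mask = torch.rand(m, n) < 0.5                                   # observed (quantized) entries
lower, upper = torch.floor(X_true), torch.floor(X_true) + 1.0   # per-entry quantization bounds

huber = torch.nn.SmoothL1Loss(reduction="sum", beta=1.0)        # Huber loss

def objective(X, lam=5.0, eps=1e-3):
    # Smoothed-rank surrogate (sum of smoothed singular values) plus a Huber
    # penalty on how far observed entries fall outside their quantization bounds.
    smooth_rank = torch.sqrt(torch.linalg.svdvals(X) ** 2 + eps).sum()
    below = torch.clamp(lower - X, min=0.0)[mask]
    above = torch.clamp(X - upper, min=0.0)[mask]
    violation = huber(below, torch.zeros_like(below)) + huber(above, torch.zeros_like(above))
    return smooth_rank + lam * violation

X = (0.01 * torch.randn(m, n)).requires_grad_()
opt = torch.optim.SGD([X], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    objective(X).backward()
    opt.step()

within = ((X >= lower) & (X <= upper))[mask].float().mean().item()
print(f"observed entries inside their quantization bounds: {within:.2f}")
```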
A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks
Title | A Dual Approach for Optimal Algorithms in Distributed Optimization over Networks |
Authors | César A. Uribe, Soomin Lee, Alexander Gasnikov, Angelia Nedić |
Abstract | We study dual-based algorithms for distributed convex optimization problems over networks, where the objective is to minimize a sum $\sum_{i=1}^{m}f_i(z)$ of functions over a network. We provide complexity bounds for four different cases, namely: when each function $f_i$ is strongly convex and smooth, when it is either strongly convex or smooth, and when it is convex but neither strongly convex nor smooth. Our approach is based on the dual of an appropriately formulated primal problem, which includes a graph that models the communication restrictions. We propose distributed algorithms that achieve the same optimal rates as their centralized counterparts (up to constant and logarithmic factors), with an additional optimal cost related to the spectral properties of the network. Initially, we focus on functions for which we can explicitly minimize the Legendre-Fenchel conjugate, i.e., admissible or dual-friendly functions. Then, we study distributed optimization algorithms for non-dual-friendly functions, as well as a method to improve the dependency on the parameters of the functions involved. A numerical analysis of the proposed algorithms is also provided. |
Tasks | Distributed Optimization |
Published | 2018-09-03 |
URL | https://arxiv.org/abs/1809.00710v3 |
https://arxiv.org/pdf/1809.00710v3.pdf | |
PWC | https://paperswithcode.com/paper/a-dual-approach-for-optimal-algorithms-in |
Repo | |
Framework | |
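For a dual-friendly case, the dual approach is easy to sketch: with quadratic local functions, the inner minimization given the dual variables has a closed form, and dual gradient ascent only needs neighbour-to-neighbour communication through the graph Laplacian. The ring network, step size, and quadratic objectives below are illustrative choices, not the paper's algorithms or rates.

```python
import numpy as np

# Ring network of m agents; agent i holds f_i(z) = 0.5 * (z - b_i)^2, and the goal is
# to minimize sum_i f_i(z). Quadratics are "dual friendly": the primal minimizer
# given the dual variable is available in closed form via the Fenchel conjugate.
m = 5
b = np.array([1.0, 4.0, -2.0, 0.5, 3.0])                                    # local data
L = 2 * np.eye(m) - np.roll(np.eye(m), 1, 0) - np.roll(np.eye(m), -1, 0)    # ring Laplacian

lam = np.zeros(m)            # dual variables, one per node
alpha = 0.05                 # dual step size
for _ in range(500):
    z = b - L @ lam          # closed-form primal minimizers given lam
    lam = lam + alpha * (L @ z)   # dual ascent; (L z)_i uses only neighbours' values

print(z)                     # all entries approach the consensus optimum b.mean() = 1.3
print(b.mean())
```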
A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates
Title | A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates |
Authors | Yossi Arjevani, Ohad Shamir, Nathan Srebro |
Abstract | We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: in fact, the error can be as bad as that of non-delayed gradient descent run on only $1/\tau$ of the gradients. In sharp contrast, we quantify how stochastic noise makes the effect of delays negligible, improving on previous work which only showed this phenomenon asymptotically or for much smaller delays. Also, in the context of distributed optimization, the results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching. Our results are based on a novel technique for analyzing the convergence of optimization algorithms using generating functions. |
Tasks | Distributed Optimization |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.10188v1 |
http://arxiv.org/pdf/1806.10188v1.pdf | |
PWC | https://paperswithcode.com/paper/a-tight-convergence-analysis-for-stochastic |
Repo | |
Framework | |
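The noiseless part of the story is easy to reproduce numerically: run gradient descent on a quadratic while applying, at each step, the gradient computed at the iterate from $\tau$ rounds earlier, and compare the final errors for different delays. A small NumPy sketch with illustrative step size and delays:

```python
import numpy as np

def delayed_gd(tau, steps=200, eta=0.1, dim=10, seed=0):
    """Gradient descent on f(x) = 0.5 * ||x - x_star||^2 where the gradient used at
    step t was computed at the iterate from tau rounds earlier (no stochastic noise)."""
    rng = np.random.default_rng(seed)
    x_star = rng.standard_normal(dim)
    x = np.zeros(dim)
    history = [x.copy()]
    for t in range(steps):
        x_old = history[max(0, t - tau)]        # iterate from tau rounds ago
        x = x - eta * (x_old - x_star)          # gradient of f evaluated at the delayed iterate
        history.append(x.copy())
    return np.linalg.norm(x - x_star)

for tau in (0, 3, 10):                          # larger delays leave a larger final error
    print(tau, delayed_gd(tau))
```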
Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Title | Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization |
Authors | Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang |
Abstract | Large-scale distributed optimization is of great importance in various applications. For data-parallel distributed learning, inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error-compensated quantized stochastic gradient descent algorithm to improve training efficiency. Local gradients are quantized to reduce the communication overhead, and the accumulated quantization error is utilized to speed up convergence. Furthermore, we present a theoretical analysis of the convergence behaviour and demonstrate its advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by up to two orders of magnitude without performance degradation. |
Tasks | Distributed Optimization, Quantization |
Published | 2018-06-21 |
URL | http://arxiv.org/abs/1806.08054v1 |
http://arxiv.org/pdf/1806.08054v1.pdf | |
PWC | https://paperswithcode.com/paper/error-compensated-quantized-sgd-and-its |
Repo | |
Framework | |
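A minimal NumPy sketch of the mechanism the abstract describes: quantize each local gradient before "communicating" it, and carry the accumulated quantization error forward so it is compensated in later steps. The quantizer, learning rate, and toy objective are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize(v, levels=4):
    """Simple deterministic uniform quantizer used as a stand-in for the
    low-bit gradient encoding that would actually be communicated."""
    scale = np.max(np.abs(v)) + 1e-12
    step = 2 * scale / (levels - 1)
    return np.round(v / step) * step

def ecq_sgd_step(w, grad, error, lr=0.1):
    """One worker-side step of error-compensated quantized SGD: add the residual
    left over from previous quantizations to the fresh gradient, quantize the sum
    for communication, and keep the new residual for the next step."""
    compensated = grad + error            # error feedback
    q = quantize(compensated)             # what actually gets communicated/applied
    new_error = compensated - q           # accumulated quantization error
    return w - lr * q, new_error

# Toy usage: minimize 0.5 * ||w - target||^2 with quantized, error-compensated updates.
rng = np.random.default_rng(0)
target = rng.standard_normal(20)
w, err = np.zeros(20), np.zeros(20)
for _ in range(200):
    grad = w - target + 0.05 * rng.standard_normal(20)    # noisy gradient
    w, err = ecq_sgd_step(w, grad, err)
print(np.linalg.norm(w - target))
```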