May 5, 2019

2960 words 14 mins read

Paper Group ANR 504

PageRank in Malware Categorization

Title PageRank in Malware Categorization
Authors BooJoong Kang, Suleiman Y. Yerima, Kieran McLaughlin, Sakir Sezer
Abstract In this paper, we propose a malware categorization method that models malware behavior in terms of instructions using PageRank. PageRank computes the ranks of web pages from their link structure; applied to malware analysis, it can likewise compute ranks of instructions that capture the structural information of the instructions. Our malware categorization method uses the computed ranks as features in machine learning algorithms. In the evaluation, we compare the effectiveness of different PageRank algorithms and also investigate bagging and boosting algorithms to improve categorization accuracy.
Tasks
Published 2016-08-02
URL http://arxiv.org/abs/1608.00866v1
PDF http://arxiv.org/pdf/1608.00866v1.pdf
PWC https://paperswithcode.com/paper/pagerank-in-malware-categorization
Repo
Framework
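
The core ranking step is easy to sketch. Below is a hypothetical illustration (not the authors' code): instructions become nodes in a successor graph, power-iteration PageRank scores them, and the sorted scores form a feature vector for a downstream classifier. The toy instruction graph is invented for the example.

```python
# Hypothetical sketch of PageRank over an instruction-successor graph;
# the paper's actual graph construction and feature pipeline are not shown.

def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank over a dict {node: [successor, ...]}."""
    nodes = sorted(set(links) | {v for succs in links.values() for v in succs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            succs = links.get(u, [])
            if succs:
                share = damping * rank[u] / len(succs)
                for v in succs:
                    new[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Toy instruction-transition graph (edges = observed successor instructions).
graph = {"push": ["mov"], "mov": ["call", "add"], "call": ["mov"], "add": ["push"]}
ranks = pagerank(graph)
features = [ranks[i] for i in sorted(ranks)]  # feature vector for a classifier
```

The different PageRank variants the paper compares would plug in at the damping and redistribution step of this loop.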

SimTensor: A synthetic tensor data generator

Title SimTensor: A synthetic tensor data generator
Authors Hadi Fanaee-T, Joao Gama
Abstract SimTensor is a multi-platform, open-source software package for generating artificial tensor data (with either CP/PARAFAC or Tucker structure) for reproducible research on tensor factorization algorithms. SimTensor is a stand-alone application based on MATLAB. It provides a wide range of facilities for generating tensor data with various configurations, and comes with a user-friendly graphical interface that makes it easy to generate tensors with complicated settings. It can export generated data to universal formats such as CSV and HDF5, which can be imported from a wide range of programming languages (C, C++, Java, R, Fortran, MATLAB, Perl, Python, and many more). Its most innovative feature is generating temporal tensors with periodic waves, seasonal effects, and streaming structure. It can apply constraints such as non-negativity and different kinds of sparsity to the data, simulate different kinds of change-points, and inject various types of anomalies. Source code and binary versions of SimTensor are available for download at http://www.simtensor.org.
Tasks
Published 2016-12-09
URL http://arxiv.org/abs/1612.03772v1
PDF http://arxiv.org/pdf/1612.03772v1.pdf
PWC https://paperswithcode.com/paper/simtensor-a-synthetic-tensor-data-generator
Repo
Framework
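
For intuition, here is a minimal pure-Python sketch of the CP/PARAFAC structure SimTensor generates (SimTensor itself is MATLAB-based and adds temporal patterns, constraints, and anomalies on top of this basic construction):

```python
# Illustrative sketch, not SimTensor's code: a rank-R CP tensor is a sum of
# outer products of factor vectors, T[i][j][k] = sum_r a_r[i] * b_r[j] * c_r[k].
import random

def cp_tensor(factors):
    """factors: list of R triples (a, b, c) of factor vectors."""
    I, J, K = len(factors[0][0]), len(factors[0][1]), len(factors[0][2])
    T = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for a, b, c in factors:
        for i in range(I):
            for j in range(J):
                for k in range(K):
                    T[i][j][k] += a[i] * b[j] * c[k]
    return T

random.seed(0)
rank, dim = 2, 3
factors = [tuple([random.random() for _ in range(dim)] for _ in range(3))
           for _ in range(rank)]
T = cp_tensor(factors)  # non-negative, since the factors are non-negative
```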

Normative Multiagent Systems: A Dynamic Generalization

Title Normative Multiagent Systems: A Dynamic Generalization
Authors Xiaowei Huang, Ji Ruan, Qingliang Chen, Kaile Su
Abstract Social norms are a powerful formalism for coordinating autonomous agents’ behaviour to achieve certain objectives. In this paper, we propose a dynamic normative system that enables reasoning about how norms change under different circumstances, which cannot be done in existing static normative systems. We study two important problems related to the autonomy of the entire system and of the agents, norm synthesis and norm recognition, and characterise the computational complexity of solving these problems.
Tasks
Published 2016-04-18
URL http://arxiv.org/abs/1604.05086v1
PDF http://arxiv.org/pdf/1604.05086v1.pdf
PWC https://paperswithcode.com/paper/normative-multiagent-systems-a-dynamic
Repo
Framework
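
A toy rendering of the idea, under an invented encoding (the paper's formal model and complexity results are much richer): a norm removes transitions from a multiagent transition system, changing which states remain reachable.

```python
# Hypothetical sketch: a norm is a set of forbidden (state, successor) edges,
# and checking it amounts to reachability in the restricted system.
from collections import deque

def reachable(transitions, start, norm=frozenset()):
    """BFS over transitions {state: {successors}}, skipping forbidden edges."""
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for t in transitions.get(s, set()):
            if (s, t) not in norm and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

system = {"s0": {"s1", "bad"}, "s1": {"s2"}, "bad": {"s2"}}
norm = {("s0", "bad")}  # the norm forbids entering the undesirable state
assert "bad" in reachable(system, "s0")
assert "bad" not in reachable(system, "s0", frozenset(norm))
```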

OpenCL-accelerated object classification in video streams using Spatial Pooler of Hierarchical Temporal Memory

Title OpenCL-accelerated object classification in video streams using Spatial Pooler of Hierarchical Temporal Memory
Authors Maciej Wielgosz, Marcin Pietroń
Abstract We present a method to classify objects in video streams using a brain-inspired Hierarchical Temporal Memory (HTM) algorithm. Object classification is a challenging task where humans still significantly outperform machine learning algorithms due to their unique capabilities. We have implemented a system which achieves very promising performance in terms of recognition accuracy. Unfortunately, conducting more advanced experiments is very computationally demanding; some of the trials run on a standard CPU may take as long as several days for video streams of 960x540 frames. Therefore, we have decided to accelerate selected parts of the system using OpenCL. In particular, we seek to determine to what extent porting selected, computationally demanding parts of the core may speed up calculations. The classification accuracy of the system was examined through a series of experiments, and performance is reported as the F1 score as a function of the number of columns, synapses, $min_overlap$ and $winners_set_size$. The system achieves its highest F1 scores of 0.95 and 0.91 for $min_overlap=4$ and 256 synapses, respectively. We have also conducted a series of experiments with different hardware setups and measured CPU/GPU acceleration. The best kernel speed-ups of 632x and 207x were reached for 256 synapses and 1024 columns. However, overall acceleration including transfer time was significantly lower, amounting to 6.5x and 3.2x for the same setup.
Tasks Object Classification
Published 2016-08-05
URL http://arxiv.org/abs/1608.01966v1
PDF http://arxiv.org/pdf/1608.01966v1.pdf
PWC https://paperswithcode.com/paper/opencl-accelerated-object-classification-in
Repo
Framework
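
The two parameters swept in the experiments are easy to see in a stripped-down Spatial Pooler step (a hypothetical simplification; a real HTM implementation adds permanences, boosting, and learning on top of this):

```python
# Simplified sketch of one Spatial Pooler activation step: each column's
# overlap with the input is counted over its potential synapses, columns
# below min_overlap are dropped, and the top winners_set_size columns win.

def spatial_pooler_step(input_bits, columns, min_overlap=2, winners_set_size=2):
    """input_bits: set of active input indices.
    columns: list of sets, each the potential synapses of one column.
    Returns indices of winning columns."""
    overlaps = [(len(col & input_bits), idx) for idx, col in enumerate(columns)]
    eligible = [(ov, idx) for ov, idx in overlaps if ov >= min_overlap]
    eligible.sort(reverse=True)
    return [idx for _, idx in eligible[:winners_set_size]]

columns = [{0, 1, 2, 3}, {2, 3, 4, 5}, {6, 7, 8, 9}]
active = spatial_pooler_step({1, 2, 3, 4}, columns)  # columns 0 and 1 win
```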

A topological insight into restricted Boltzmann machines

Title A topological insight into restricted Boltzmann machines
Authors Decebal Constantin Mocanu, Elena Mocanu, Phuong H. Nguyen, Madeleine Gibescu, Antonio Liotta
Abstract Restricted Boltzmann Machines (RBMs) and models derived from them have been successfully used as basic building blocks in deep artificial neural networks for automatic feature extraction and unsupervised weight initialization, but also as density estimators. Thus, their generative and discriminative capabilities, as well as their computational cost, are instrumental to a wide range of applications. Our main contribution is to look at RBMs from a topological perspective, bringing insights from network science. Firstly, we show that RBMs and Gaussian RBMs (GRBMs) are bipartite graphs which naturally have a small-world topology. Secondly, we demonstrate on both synthetic and real-world datasets that by constraining RBMs and GRBMs to a scale-free topology (while still considering local neighborhoods and the data distribution), we reduce the number of weights that need to be computed by a few orders of magnitude, at virtually no loss in generative performance. Thirdly, we show that, for a fixed number of weights, our proposed sparse models (which by design have a higher number of hidden neurons) achieve better generative capabilities than standard fully connected RBMs and GRBMs (which by design have a smaller number of hidden neurons), at no additional computational cost.
Tasks
Published 2016-04-20
URL http://arxiv.org/abs/1604.05978v2
PDF http://arxiv.org/pdf/1604.05978v2.pdf
PWC https://paperswithcode.com/paper/a-topological-insight-into-restricted
Repo
Framework
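
A rough sketch of the weight-saving mechanism, with a plain random sparse mask as a stand-in (the paper's actual construction is scale-free and respects local neighborhoods and the data distribution):

```python
# Hypothetical illustration: instead of a dense visible-hidden weight matrix,
# keep a sparse bipartite connectivity mask, cutting trainable weight count.
import random

def sparse_mask(n_visible, n_hidden, connections_per_hidden, seed=0):
    """Each hidden unit keeps a small random subset of visible connections."""
    rng = random.Random(seed)
    mask = [[0] * n_visible for _ in range(n_hidden)]
    for h in range(n_hidden):
        for v in rng.sample(range(n_visible), connections_per_hidden):
            mask[h][v] = 1
    return mask

n_visible, n_hidden = 100, 50
mask = sparse_mask(n_visible, n_hidden, connections_per_hidden=5)
dense_weights = n_visible * n_hidden   # 5000 weights in the dense RBM
sparse_weights = sum(map(sum, mask))   # 250 weights: a 20x reduction
```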

Entropy/IP: Uncovering Structure in IPv6 Addresses

Title Entropy/IP: Uncovering Structure in IPv6 Addresses
Authors Pawel Foremski, David Plonka, Arthur Berger
Abstract In this paper, we introduce Entropy/IP: a system that discovers Internet address structure based on analyses of a subset of IPv6 addresses known to be active, i.e., training data, gleaned by readily available passive and active means. The system is completely automated and employs a combination of information-theoretic and machine learning techniques to probabilistically model IPv6 addresses. We present results showing that our system is effective in exposing structural characteristics of portions of the IPv6 Internet address space populated by active client, service, and router addresses. In addition to visualizing the address structure for exploration, the system uses its models to generate candidate target addresses for scanning. For each of 15 evaluated datasets, we train on 1K addresses and generate 1M candidates for scanning. We achieve some success in 14 datasets, finding up to 40% of the generated addresses to be active. In 11 of these datasets, we find active network identifiers (e.g., /64 prefixes or ‘subnets’) not seen in training. Thus, we provide the first evidence that it is practical to discover subnets and hosts by scanning probabilistically selected areas of the IPv6 address space not known to contain active hosts a priori.
Tasks
Published 2016-06-14
URL http://arxiv.org/abs/1606.04327v2
PDF http://arxiv.org/pdf/1606.04327v2.pdf
PWC https://paperswithcode.com/paper/entropyip-uncovering-structure-in-ipv6
Repo
Framework
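
The information-theoretic starting point is simple to reproduce. This sketch covers only the first stage, per-nibble entropy over expanded IPv6 addresses (the full system goes on to segment the address and model dependencies between segments; the toy addresses below are invented):

```python
# Per-nibble Shannon entropy of a set of fully expanded IPv6 addresses:
# low-entropy positions reveal fixed structure, high-entropy positions vary.
import math
from collections import Counter

def nibble_entropy(addresses):
    """addresses: equal-length hex strings (expanded IPv6, colons removed).
    Returns the Shannon entropy (in bits) of each nibble position."""
    n = len(addresses)
    entropies = []
    for pos in range(len(addresses[0])):
        counts = Counter(a[pos] for a in addresses)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return entropies

# Four invented addresses sharing a fixed 31-nibble prefix.
prefix = "20010db8" + "0" * 23
addrs = [prefix + d for d in "1234"]
H = nibble_entropy(addrs)
# H is 0 at every fixed position; the last nibble has log2(4) = 2 bits.
```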

Local Search Yields a PTAS for k-Means in Doubling Metrics

Title Local Search Yields a PTAS for k-Means in Doubling Metrics
Authors Zachary Friggstad, Mohsen Rezapour, Mohammad R. Salavatipour
Abstract The most well known and ubiquitous clustering problem encountered in nearly every branch of science is undoubtedly $k$-means: given a set of data points and a parameter $k$, select $k$ centres and partition the data points into $k$ clusters around these centres so that the sum of squares of distances of the points to their cluster centre is minimized. Typically these data points lie in $\mathbb{R}^d$ for some $d\geq 2$. $k$-means and the first algorithms for it were introduced in the 1950s. Since then, hundreds of papers have studied this problem and many algorithms have been proposed for it. The most commonly used algorithm is known as Lloyd-Forgy, which is also referred to as “the” $k$-means algorithm, and various extensions of it often work very well in practice. However, they may produce solutions whose cost is arbitrarily large compared to the optimum solution. Kanungo et al. [2004] analyzed a simple local search heuristic to get a polynomial-time algorithm with approximation ratio $9+\epsilon$ for any fixed $\epsilon>0$ for $k$-means in Euclidean space. Finding an algorithm with a better approximation guarantee has remained one of the biggest open questions in this area, in particular whether one can get a true PTAS for fixed dimension Euclidean space. We settle this problem by showing that a simple local search algorithm provides a PTAS for $k$-means in $\mathbb{R}^d$ for any fixed $d$. More precisely, for any error parameter $\epsilon>0$, the local search algorithm that considers swaps of up to $\rho=d^{O(d)}\cdot{\epsilon}^{-O(d/\epsilon)}$ centres at a time finds a solution using exactly $k$ centres whose cost is at most a $(1+\epsilon)$-factor greater than the optimum. Finally, we provide the first demonstration that local search yields a PTAS for the uncapacitated facility location problem and $k$-median with non-uniform opening costs in doubling metrics.
Tasks
Published 2016-03-29
URL http://arxiv.org/abs/1603.08976v2
PDF http://arxiv.org/pdf/1603.08976v2.pdf
PWC https://paperswithcode.com/paper/local-search-yields-a-ptas-for-k-means-in
Repo
Framework
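
For contrast with the paper's multi-swap PTAS, here is the single-swap local search baseline in the style of the Kanungo et al. heuristic mentioned in the abstract (toy data and initialization; the PTAS swaps up to $\rho$ centres at a time rather than one):

```python
# Single-swap local search for 2D k-means: repeatedly replace one centre by
# a data point whenever the swap strictly lowers the clustering cost.

def kmeans_cost(points, centres):
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centres)
               for px, py in points)

def local_search(points, k):
    centres = list(points[:k])  # arbitrary initial centres
    improved = True
    while improved:
        improved = False
        for i, _ in enumerate(centres):
            for cand in points:
                trial = centres[:i] + [cand] + centres[i + 1:]
                if kmeans_cost(points, trial) < kmeans_cost(points, centres):
                    centres, improved = trial, True
    return centres

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centres = local_search(points, k=2)  # ends with one centre per cluster, cost 2
```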

A Unified View of Multi-Label Performance Measures

Title A Unified View of Multi-Label Performance Measures
Authors Xi-Zhu Wu, Zhi-Hua Zhou
Abstract Multi-label classification deals with the problem where each instance is associated with multiple class labels. Because evaluation in multi-label classification is more complicated than in the single-label setting, a number of performance measures have been proposed. It has been observed that an algorithm usually performs differently on different measures. Therefore, it is important to understand which algorithms perform well on which measure(s), and why. In this paper, we propose a unified margin view to revisit eleven performance measures in multi-label classification. In particular, we define the label-wise margin and the instance-wise margin, and prove that by maximizing these margins, different corresponding performance measures will be optimized. Based on the defined margins, a max-margin approach called LIMO is designed, and empirical results verify our theoretical findings.
Tasks Multi-Label Classification
Published 2016-09-01
URL http://arxiv.org/abs/1609.00288v2
PDF http://arxiv.org/pdf/1609.00288v2.pdf
PWC https://paperswithcode.com/paper/a-unified-view-of-multi-label-performance
Repo
Framework
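
The two margin notions can be illustrated on a toy score matrix. This is a loose paraphrase, not the paper's formal definitions, and it assumes every instance and every label has both relevant and irrelevant entries:

```python
# Hypothetical illustration: given scores F (instances x labels) and binary
# ground truth Y, the instance-wise margin is the smallest relevant-minus-
# irrelevant score gap per instance; the label-wise margin swaps the roles.

def instance_wise_margin(F, Y):
    margins = []
    for scores, labels in zip(F, Y):
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        margins.append(min(pos) - max(neg))
    return margins

def label_wise_margin(F, Y):
    margins = []
    for j in range(len(F[0])):
        pos = [F[i][j] for i in range(len(F)) if Y[i][j] == 1]
        neg = [F[i][j] for i in range(len(F)) if Y[i][j] == 0]
        margins.append(min(pos) - max(neg))
    return margins

F = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.6]]
Y = [[1, 0, 1], [0, 1, 0]]
im = instance_wise_margin(F, Y)  # all positive: ranking is consistent with Y
lm = label_wise_margin(F, Y)
```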

Tight Lower Bounds for Multiplicative Weights Algorithmic Families

Title Tight Lower Bounds for Multiplicative Weights Algorithmic Families
Authors Nick Gravin, Yuval Peres, Balasubramanian Sivan
Abstract We study the fundamental problem of prediction with expert advice and develop regret lower bounds for a large family of algorithms for this problem. We develop simple adversarial primitives that lend themselves to various combinations leading to sharp lower bounds for many algorithmic families. We use these primitives to show that the classic Multiplicative Weights Algorithm (MWA) has a regret of $\sqrt{\frac{T \ln k}{2}}$, thereby completely closing the gap between upper and lower bounds. We further show a regret lower bound of $\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$ for a much more general family of algorithms than MWA, where the learning rate can be arbitrarily varied over time, or even picked from arbitrary distributions over time. We also use our primitives to construct adversaries in the geometric horizon setting for MWA to precisely characterize the regret at $\frac{0.391}{\sqrt{\delta}}$ for the case of $2$ experts, and a lower bound of $\frac{1}{2}\sqrt{\frac{\ln k}{2\delta}}$ for the case of an arbitrary number of experts $k$.
Tasks
Published 2016-07-11
URL http://arxiv.org/abs/1607.02834v2
PDF http://arxiv.org/pdf/1607.02834v2.pdf
PWC https://paperswithcode.com/paper/tight-lower-bounds-for-multiplicative-weights
Repo
Framework
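
The algorithm whose regret is being pinned down is the classic exponentially weighted forecaster. A standard sketch with the usual tuned learning rate (the losses here are random, not the adversarial sequences the paper constructs, so the observed regret sits well below the worst-case bound):

```python
# Multiplicative Weights for prediction with expert advice: play experts in
# proportion to their weights, then shrink each weight by exp(-eta * loss).
import math, random

def mwa(losses, eta):
    """losses: T x k matrix of per-round expert losses in [0, 1].
    Returns the algorithm's total expected loss."""
    k = len(losses[0])
    weights = [1.0] * k
    total = 0.0
    for round_losses in losses:
        w_sum = sum(weights)
        probs = [w / w_sum for w in weights]
        total += sum(p * l for p, l in zip(probs, round_losses))
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
    return total

random.seed(1)
T, k = 1000, 4
losses = [[random.random() for _ in range(k)] for _ in range(T)]
eta = math.sqrt(8 * math.log(k) / T)   # the classic tuned learning rate
alg_loss = mwa(losses, eta)
best_expert = min(sum(l[j] for l in losses) for j in range(k))
regret = alg_loss - best_expert        # provably at most sqrt(T ln k / 2)
```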

Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis

Title Universal, Unsupervised (Rule-Based), Uncovered Sentiment Analysis
Authors David Vilares, Carlos Gómez-Rodríguez, Miguel A. Alonso
Abstract We present a novel unsupervised approach for multilingual sentiment analysis driven by compositional syntax-based rules. On the one hand, we exploit some of the main advantages of unsupervised algorithms: (1) the interpretability of their output, in contrast with most supervised models, which behave as a black box and (2) their robustness across different corpora and domains. On the other hand, by introducing the concept of compositional operations and exploiting syntactic information in the form of universal dependencies, we tackle one of their main drawbacks: their rigidity on data that are structured differently depending on the language concerned. Experiments show an improvement both over existing unsupervised methods, and over state-of-the-art supervised models when evaluating outside their corpus of origin. Experiments also show how the same compositional operations can be shared across languages. The system is available at http://www.grupolys.org/software/UUUSA/
Tasks Sentiment Analysis
Published 2016-06-17
URL http://arxiv.org/abs/1606.05545v2
PDF http://arxiv.org/pdf/1606.05545v2.pdf
PWC https://paperswithcode.com/paper/universal-unsupervised-rule-based-uncovered
Repo
Framework
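
One compositional operation, negation shifting polarity over a scope, can be caricatured in a few lines. The lexicon and the token-window scope below are invented for illustration; UUUSA operates on universal dependency trees rather than flat token windows:

```python
# Toy rule-based sentiment: sum word polarities, and let a negator flip the
# polarity of the words within its (token-window) scope.

LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}

def sentiment(tokens, scope=3):
    """A negator flips polarity for the next `scope` tokens."""
    score, flip_until = 0, -1
    for i, tok in enumerate(tokens):
        if tok in NEGATORS:
            flip_until = i + scope
        elif tok in LEXICON:
            polarity = LEXICON[tok]
            score += -polarity if i <= flip_until else polarity
    return score

assert sentiment("this movie is good".split()) == 1
assert sentiment("this movie is not good".split()) == -1
```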

On the uniqueness and stability of dictionaries for sparse representation of noisy signals

Title On the uniqueness and stability of dictionaries for sparse representation of noisy signals
Authors Charles J. Garfinkle, Christopher J. Hillar
Abstract Learning optimal dictionaries for sparse coding has exposed characteristic sparse features of many natural signals. However, universal guarantees of the stability of such features in the presence of noise are lacking. Here, we provide very general conditions guaranteeing when dictionaries yielding the sparsest encodings are unique and stable with respect to measurement or modeling error. We demonstrate that some or all original dictionary elements are recoverable from noisy data even if the dictionary fails to satisfy the spark condition, its size is overestimated, or only a polynomial number of distinct sparse supports appear in the data. Importantly, we derive these guarantees without requiring any constraints on the recovered dictionary beyond a natural upper bound on its size. Our results also yield an effective procedure sufficient to affirm whether a proposed solution to the dictionary learning problem is unique within bounds commensurate with the noise. We suggest applications to data analysis, engineering, and neuroscience and close with some remaining challenges left open by our work.
Tasks Dictionary Learning
Published 2016-06-22
URL https://arxiv.org/abs/1606.06997v4
PDF https://arxiv.org/pdf/1606.06997v4.pdf
PWC https://paperswithcode.com/paper/on-the-uniqueness-and-stability-of
Repo
Framework

Parameterized Compilation Lower Bounds for Restricted CNF-formulas

Title Parameterized Compilation Lower Bounds for Restricted CNF-formulas
Authors Stefan Mengel
Abstract We show unconditional parameterized lower bounds in the area of knowledge compilation, more specifically on the size of circuits in decomposable negation normal form (DNNF) that encode CNF-formulas restricted by several graph width measures. In particular, we show that - there are CNF formulas of size $n$ and modular incidence treewidth $k$ whose smallest DNNF-encoding has size $n^{\Omega(k)}$, and - there are CNF formulas of size $n$ and incidence neighborhood diversity $k$ whose smallest DNNF-encoding has size $n^{\Omega(\sqrt{k})}$. These results complement recent upper bounds for compiling CNF into DNNF and strengthen, both quantitatively and qualitatively, known conditional lower bounds for cliquewidth. Moreover, they show that, unlike for many graph problems, the parameters considered here behave significantly differently from treewidth.
Tasks
Published 2016-04-22
URL http://arxiv.org/abs/1604.06715v1
PDF http://arxiv.org/pdf/1604.06715v1.pdf
PWC https://paperswithcode.com/paper/parameterized-compilation-lower-bounds-for
Repo
Framework

Neural Tree Indexers for Text Understanding

Title Neural Tree Indexers for Text Understanding
Authors Tsendsuren Munkhdalai, Hong Yu
Abstract Recurrent neural networks (RNNs) process input text sequentially and model the conditional transition between word tokens. In contrast, the advantages of recursive networks include that they explicitly model the compositionality and the recursive structure of natural language. However, the current recursive architecture is limited by its dependence on syntactic trees. In this paper, we introduce a robust, syntactic-parsing-independent tree-structured model, Neural Tree Indexers (NTI), that provides a middle ground between sequential RNNs and syntactic tree-based recursive models. NTI constructs a full n-ary tree by processing the input text with its node function in a bottom-up fashion. An attention mechanism can then be applied to both the structure and the node function. We implemented and evaluated a binary-tree model of NTI, showing that the model achieved state-of-the-art performance on three different NLP tasks: natural language inference, answer sentence selection, and sentence classification, outperforming state-of-the-art recurrent and recursive neural networks.
Tasks Natural Language Inference, Sentence Classification
Published 2016-07-15
URL http://arxiv.org/abs/1607.04492v2
PDF http://arxiv.org/pdf/1607.04492v2.pdf
PWC https://paperswithcode.com/paper/neural-tree-indexers-for-text-understanding
Repo
Framework
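
The structural idea, a full binary tree built bottom-up by a node function, is easy to sketch. The elementwise-averaging node function below is an invented stand-in that merely marks where NTI's learned neural composition and attention would sit:

```python
# Build a full binary tree over a token sequence bottom-up: combine adjacent
# node representations with a node function until one root vector remains.

def build_tree(leaves, node_fn):
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:            # pad odd levels (akin to dummy nodes)
            level.append(level[-1])
        level = [node_fn(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def average(a, b):
    """Stand-in node function: elementwise average of two vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]

embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
root = build_tree(embeddings, average)  # -> [1.75, 0.75]
```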

Word Recognition with Deep Conditional Random Fields

Title Word Recognition with Deep Conditional Random Fields
Authors Gang Chen, Yawei Li, Sargur N. Srihari
Abstract Recognition of handwritten words continues to be an important problem in document analysis and recognition. Existing approaches extract hand-engineered features from word images, which can perform poorly on new data sets. Recently, deep learning has attracted great attention because of its ability to learn features from raw data, and it has yielded state-of-the-art results in classification tasks including character recognition and scene recognition. On the other hand, word recognition is a sequential problem where we need to model the correlation between characters. In this paper, we propose using deep Conditional Random Fields (deep CRFs) for word recognition. Basically, we combine CRFs with deep learning, in which deep features are learned and sequences are labeled in a unified framework. We pre-train the deep structure with stacked restricted Boltzmann machines (RBMs) for feature learning and optimize the entire network with an online learning algorithm. The proposed model was evaluated on two datasets and seen to perform significantly better than competitive baseline models. The source code is available at https://github.com/ganggit/deepCRFs.
Tasks Scene Recognition
Published 2016-12-04
URL http://arxiv.org/abs/1612.01072v1
PDF http://arxiv.org/pdf/1612.01072v1.pdf
PWC https://paperswithcode.com/paper/word-recognition-with-deep-conditional-random
Repo
Framework

On the Exploration of Convolutional Fusion Networks for Visual Recognition

Title On the Exploration of Convolutional Fusion Networks for Visual Recognition
Authors Yu Liu, Yanming Guo, Michael S. Lew
Abstract Despite recent advances in multi-scale deep representations, their limitations are attributed to expensive parameters and weak fusion modules. Hence, we propose an efficient approach to fusing multi-scale deep representations, called convolutional fusion networks (CFN). Owing to its use of 1$\times$1 convolutions and global average pooling, CFN can efficiently generate side branches while adding few parameters. In addition, we present a locally-connected fusion module, which can learn adaptive weights for the side branches and form a discriminatively fused feature. CFN models trained on the CIFAR and ImageNet datasets demonstrate remarkable improvements over plain CNNs. Furthermore, we generalize CFN to three new tasks: scene recognition, fine-grained recognition, and image retrieval. Our experiments show that it obtains consistent improvements on these transfer tasks.
Tasks Image Retrieval, Scene Recognition
Published 2016-11-16
URL http://arxiv.org/abs/1611.05503v1
PDF http://arxiv.org/pdf/1611.05503v1.pdf
PWC https://paperswithcode.com/paper/on-the-exploration-of-convolutional-fusion
Repo
Framework
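
The two ingredients named in the abstract, side branches reduced by global average pooling and a fusion module over the branches, can be caricatured as follows (hypothetical shapes and fixed stand-in fusion weights; CFN learns adaptive, locally-connected weights):

```python
# Global average pooling collapses each branch's H x W x C feature map to a
# length-C vector; fusion combines the branch vectors with per-branch weights.

def global_average_pool(feature_map):
    """feature_map: H x W x C nested lists -> length-C vector."""
    H, W = len(feature_map), len(feature_map[0])
    C = len(feature_map[0][0])
    return [sum(feature_map[h][w][c] for h in range(H) for w in range(W)) / (H * W)
            for c in range(C)]

def fuse(branches, weights):
    """Weighted sum of per-branch pooled vectors (CFN's locally-connected
    fusion would learn a separate weight per branch and per position)."""
    C = len(branches[0])
    return [sum(w * b[c] for w, b in zip(weights, branches)) for c in range(C)]

branch1 = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # 2x2x2 map
branch2 = [[[0.0, 0.0], [0.0, 0.0]], [[4.0, 4.0], [4.0, 4.0]]]
pooled = [global_average_pool(branch1), global_average_pool(branch2)]
fused = fuse(pooled, weights=[0.5, 0.5])  # -> [3.0, 3.5]
```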