Paper Group ANR 56
Ensemble Methods of Classification for Power Systems Security Assessment. Firefly Algorithm for optimization problems with non-continuous variables: A Review and Analysis. Shamela: A Large-Scale Historical Arabic Corpus. On the Complexity of Connection Games. Ordinal Constrained Binary Code Learning for Nearest Neighbor Search. Deep Attributes Driven Multi-Camera Person Re-identification. Automatic Detection of Solar Photovoltaic Arrays in High Resolution Aerial Imagery. Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index. The Effects of Data Size and Frequency Range on Distributional Semantic Models. Deep Learning of Part-based Representation of Data Using Sparse Autoencoders with Nonnegativity Constraints. Large-Scale Electron Microscopy Image Segmentation in Spark. Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction. On the exact recovery of sparse signals via conic relaxations. The Image Torque Operator for Contour Processing. Bank Card Usage Prediction Exploiting Geolocation Information.
Ensemble Methods of Classification for Power Systems Security Assessment
Title | Ensemble Methods of Classification for Power Systems Security Assessment |
Authors | Alexei Zhukov, Victor Kurbatsky, Nikita Tomin, Denis Sidorov, Daniil Panasetsky, Aoife Foley |
Abstract | One of the most promising approaches for complex technical systems analysis employs ensemble methods of classification. Ensemble methods make it possible to build reliable decision rules for feature-space classification in the presence of many possible states of the system. In this paper, novel techniques based on decision trees are used to evaluate the reliability of the regime of electric power systems. We propose a hybrid approach based on random forest and boosting models. Such techniques can be applied to predict the interaction of increasing renewable power, storage devices, and the switching of smart loads from intelligent domestic appliances, heaters, air-conditioning units, and electric vehicles with the grid for enhanced decision making. The ensemble classification methods were tested on the modified 118-bus IEEE power system, showing that the proposed technique can be employed to examine whether the power system is secure under steady-state operating conditions. |
Tasks | Decision Making |
Published | 2016-01-07 |
URL | http://arxiv.org/abs/1601.01675v1 |
http://arxiv.org/pdf/1601.01675v1.pdf | |
PWC | https://paperswithcode.com/paper/ensemble-methods-of-classification-for-power |
Repo | |
Framework | |
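The core idea lends itself to a compact illustration. Below is a minimal sketch of a hybrid random-forest-plus-boosting classifier labeling operating states as secure or insecure; the synthetic regime features and the soft-voting combination are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: hybrid RF + boosting ensemble for secure/insecure states.
# Features and the voting scheme are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Hypothetical regime features: bus voltages, line loadings, generation margins.
X = rng.normal(size=(2000, 30))
y = (X[:, :5].sum(axis=1) + 0.3 * rng.normal(size=2000) > 0).astype(int)  # 1 = secure

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
hybrid = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("gb", GradientBoostingClassifier(n_estimators=200, random_state=0))],
    voting="soft",  # average predicted class probabilities from both models
)
hybrid.fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, hybrid.predict(X_te)))
```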
Firefly Algorithm for optimization problems with non-continuous variables: A Review and Analysis
Title | Firefly Algorithm for optimization problems with non-continuous variables: A Review and Analysis |
Authors | Surafel Luleseged Tilahun, Jean Medard T Ngnotchouye |
Abstract | The firefly algorithm is a swarm-based metaheuristic inspired by the flashing behavior of fireflies. It is effective and easy to implement, and it has been tested on problems from different disciplines and found to perform well. Even though the algorithm was originally proposed for optimization problems with continuous variables, it has been modified and used for problems with non-continuous variables, including binary- and integer-valued problems. In this paper, we give a detailed review of these modifications of the firefly algorithm for problems with non-continuous variables. The strengths and weaknesses of the modifications, along with possible future work, are also presented. |
Tasks | |
Published | 2016-02-25 |
URL | http://arxiv.org/abs/1602.07884v1 |
http://arxiv.org/pdf/1602.07884v1.pdf | |
PWC | https://paperswithcode.com/paper/firefly-algorithm-for-optimization-problems |
Repo | |
Framework | |
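A minimal sketch of a binary-adapted firefly algorithm of the kind the review surveys: continuous positions are updated with the standard attraction rule and then squashed through a sigmoid to sample binary solutions. The one-max test problem and all parameter values are illustrative assumptions.

```python
# Hedged sketch: binary firefly algorithm via sigmoid discretization.
import numpy as np

rng = np.random.default_rng(1)
n_fireflies, n_bits, n_iter = 20, 40, 100
beta0, gamma, alpha = 1.0, 1.0, 0.1  # attraction, absorption, randomness

def binarize(pos):
    """Map continuous positions to bits with a sigmoid probability."""
    return (1.0 / (1.0 + np.exp(-pos)) > rng.random(pos.shape)).astype(int)

def fitness(bits):
    return bits.sum(axis=-1)  # one-max: count the ones

pos = rng.normal(size=(n_fireflies, n_bits))
for _ in range(n_iter):
    bits = binarize(pos)
    fit = fitness(bits)
    for i in range(n_fireflies):
        for j in range(n_fireflies):
            if fit[j] > fit[i]:  # move i toward the brighter firefly j
                r2 = np.sum((pos[i] - pos[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)
                pos[i] += beta * (pos[j] - pos[i]) + alpha * rng.normal(size=n_bits)
print("best one-max value:", fitness(binarize(pos)).max(), "of", n_bits)
```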
Shamela: A Large-Scale Historical Arabic Corpus
Title | Shamela: A Large-Scale Historical Arabic Corpus |
Authors | Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel |
Abstract | Arabic is a widely-spoken language with a rich and long history spanning more than fourteen centuries. Yet existing Arabic corpora largely focus on the modern period or lack sufficient diachronic information. We develop a large-scale, historical corpus of Arabic of about 1 billion words from diverse periods of time. We clean this corpus, process it with a morphological analyzer, and enhance it by detecting parallel passages and automatically dating undated texts. We demonstrate its utility with selected case studies in which we show its application to the digital humanities. |
Tasks | |
Published | 2016-12-28 |
URL | http://arxiv.org/abs/1612.08989v1 |
http://arxiv.org/pdf/1612.08989v1.pdf | |
PWC | https://paperswithcode.com/paper/shamela-a-large-scale-historical-arabic |
Repo | |
Framework | |
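One standard way to detect parallel passages across texts is shared word n-gram "shingles", in the spirit of the corpus enhancement the paper describes. The exact matching procedure used for Shamela is not reproduced here; this is an illustrative sketch under that assumption.

```python
# Hedged sketch: candidate parallel passages via shared n-gram shingles.
from collections import defaultdict

def shingles(text, n=5):
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def parallel_candidates(docs, n=5, min_shared=3):
    """Return document-id pairs sharing at least `min_shared` n-gram shingles."""
    index = defaultdict(set)          # shingle -> set of doc ids
    for doc_id, text in docs.items():
        for sh in shingles(text, n):
            index[sh].add(doc_id)
    counts = defaultdict(int)         # (doc_a, doc_b) -> shared shingle count
    for ids in index.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                counts[(ids[i], ids[j])] += 1
    return [(pair, c) for pair, c in counts.items() if c >= min_shared]
```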
On the Complexity of Connection Games
Title | On the Complexity of Connection Games |
Authors | Édouard Bonnet, Florian Jamain, Abdallah Saffidine |
Abstract | In this paper, we study three of the most widely played connection games: Havannah, Twixt, and Slither. We show that determining the outcome of an arbitrary input position is PSPACE-complete in all three cases. Our reductions are based on the popular graph problem Generalized Geography and on Hex itself. We also consider the complexity of generalizations of Hex parameterized by the length of the solution and establish that while Short Generalized Hex is W[1]-hard, Short Hex is FPT. Finally, we prove that the ultra-weak solution to the empty starting position in Hex cannot be fully adapted to any of these three games. |
Tasks | |
Published | 2016-05-16 |
URL | http://arxiv.org/abs/1605.04715v1 |
http://arxiv.org/pdf/1605.04715v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-complexity-of-connection-games |
Repo | |
Framework | |
Ordinal Constrained Binary Code Learning for Nearest Neighbor Search
Title | Ordinal Constrained Binary Code Learning for Nearest Neighbor Search |
Authors | Hong Liu, Rongrong Ji, Yongjian Wu, Feiyue Huang |
Abstract | Recent years have witnessed extensive attention to binary code learning, a.k.a. hashing, for nearest neighbor search. High-dimensional data points can be quantized into binary codes to give an efficient similarity approximation via Hamming distance. Among existing schemes, ranking-based hashing is a recent promising direction that targets preserving the ordinal relations of rankings in the Hamming space to minimize retrieval loss. However, the number of ranking tuples, which encode the ordinal relations, is quadratic or cubic in the number of training samples, so given a large-scale training data set it is very expensive to embed such ranking tuples in binary code learning. Besides, it remains difficult to build ranking tuples efficiently for most ranking-preserving hashing methods, which are deployed in an ordinal graph-based setting. To handle these problems, we propose a novel ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH), which efficiently learns the optimal hashing functions with a graph-based approximation to embed the ordinal relations. The core idea is to reduce the size of the ordinal graph with an ordinal constraint projection, which preserves the ordinal relations through a small data set (such as clusters or random samples). In particular, to learn such hash functions effectively, we further relax the discrete constraints and design a specific stochastic gradient descent algorithm for optimization. Experimental results on three large-scale visual search benchmark datasets, i.e. LabelMe, Tiny100K, and GIST1M, show that the proposed OCH method achieves superior performance over state-of-the-art approaches. |
Tasks | |
Published | 2016-11-19 |
URL | http://arxiv.org/abs/1611.06362v1 |
http://arxiv.org/pdf/1611.06362v1.pdf | |
PWC | https://paperswithcode.com/paper/ordinal-constrained-binary-code-learning-for |
Repo | |
Framework | |
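A minimal sketch of the relaxed, SGD-trained ranking objective this line of work builds on: learn a linear hash h(x) = sign(Wx), relax sign to tanh during training, and push query-positive distances below query-negative ones on ordinal triplets. The triplet sampling, margin, and the use of code inner products as a Hamming-distance surrogate are illustrative assumptions, not the exact OCH formulation.

```python
# Hedged sketch: ordinal triplet hashing with a tanh relaxation and SGD.
import torch

d, bits, n = 32, 16, 500
g = torch.Generator().manual_seed(0)
X = torch.randn(n, d, generator=g)
W = torch.randn(bits, d, generator=g, requires_grad=True)
opt = torch.optim.SGD([W], lr=0.05)

for step in range(300):
    i, j, k = torch.randint(0, n, (3, 64), generator=g)
    # keep only triplets where x_j really is closer to x_i than x_k is
    dj = ((X[i] - X[j]) ** 2).sum(1)
    dk = ((X[i] - X[k]) ** 2).sum(1)
    mask = dj < dk
    hq, ha, hb = torch.tanh(X[i] @ W.T), torch.tanh(X[j] @ W.T), torch.tanh(X[k] @ W.T)
    # high code agreement ~ small Hamming distance, so hinge on inner products
    loss = torch.relu(0.5 + (hq * (hb - ha)).sum(1))[mask].mean()
    opt.zero_grad(); loss.backward(); opt.step()

codes = torch.sign(X @ W.T)  # final binary codes in {-1, +1}
```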
Deep Attributes Driven Multi-Camera Person Re-identification
Title | Deep Attributes Driven Multi-Camera Person Re-identification |
Authors | Chi Su, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian |
Abstract | The visual appearance of a person is easily affected by many factors like pose variations, viewpoint changes, and camera parameter differences. This makes person Re-Identification (ReID) among multiple cameras a very challenging task. This work is motivated to learn mid-level human attributes which are robust to such visual appearance variations, and we propose a semi-supervised attribute learning framework which progressively boosts the accuracy of attributes using only a limited amount of labeled data. Specifically, this framework involves three training stages. A deep Convolutional Neural Network (dCNN) is first trained on an independent dataset labeled with attributes. It is then fine-tuned on another dataset labeled only with person IDs using our defined triplet loss. Finally, the updated dCNN predicts attribute labels for the target dataset, which is combined with the independent dataset for the final round of fine-tuning. The predicted attributes, namely deep attributes, exhibit superior generalization ability across different datasets. By directly using the deep attributes with a simple cosine distance, we obtain surprisingly good accuracy on four person ReID datasets. Experiments also show that a simple metric learning module further boosts our method, making it significantly outperform many recent works. |
Tasks | Metric Learning, Person Re-Identification |
Published | 2016-05-11 |
URL | http://arxiv.org/abs/1605.03259v2 |
http://arxiv.org/pdf/1605.03259v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-attributes-driven-multi-camera-person-re |
Repo | |
Framework | |
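Two ingredients of the pipeline are easy to sketch: fine-tuning an embedding network with a triplet loss on person IDs, and matching people across cameras by cosine similarity on the resulting vectors. The tiny MLP backbone and margin value below are illustrative assumptions, not the paper's dCNN.

```python
# Hedged sketch: triplet-loss fine-tuning plus cosine-distance matching.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
triplet = nn.TripletMarginLoss(margin=0.3)
opt = torch.optim.Adam(backbone.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    """One update: pull same-ID pairs together, push different-ID pairs apart."""
    loss = triplet(backbone(anchor), backbone(positive), backbone(negative))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def cosine_match(query_feats, gallery_feats):
    """Rank gallery identities for each query by cosine similarity."""
    q = F.normalize(query_feats, dim=1)
    g = F.normalize(gallery_feats, dim=1)
    return (q @ g.T).argsort(dim=1, descending=True)
```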
Automatic Detection of Solar Photovoltaic Arrays in High Resolution Aerial Imagery
Title | Automatic Detection of Solar Photovoltaic Arrays in High Resolution Aerial Imagery |
Authors | Jordan M. Malof, Kyle Bradbury, Leslie M. Collins, Richard G. Newell |
Abstract | The quantity of small scale solar photovoltaic (PV) arrays in the United States has grown rapidly in recent years. As a result, there is substantial interest in high quality information about the quantity, power capacity, and energy generated by such arrays, including at a high spatial resolution (e.g., counties, cities, or even smaller regions). Unfortunately, existing methods for obtaining this information, such as surveys and utility interconnection filings, are limited in their completeness and spatial resolution. This work presents a computer algorithm that automatically detects PV panels using very high resolution color satellite imagery. The approach potentially offers a fast, scalable method for obtaining accurate information on PV array location and size, and at much higher spatial resolutions than are currently available. The method is validated using a very large (135 km^2) collection of publicly available [1] aerial imagery, with over 2,700 human annotated PV array locations. The results demonstrate the algorithm is highly effective on a per-pixel basis. It is likewise effective at object-level PV array detection, but with significant potential for improvement in estimating the precise shape/size of the PV arrays. These results are the first of their kind for the detection of solar PV in aerial imagery, demonstrating the feasibility of the approach and establishing a baseline performance for future investigations. |
Tasks | |
Published | 2016-07-20 |
URL | http://arxiv.org/abs/1607.06029v1 |
http://arxiv.org/pdf/1607.06029v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-detection-of-solar-photovoltaic |
Repo | |
Framework | |
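A minimal sketch of a per-pixel detector in the spirit of the paper: extract simple color statistics from a window around each pixel and score it with a random forest. The window size and feature choice are illustrative assumptions, not the authors' exact feature set.

```python
# Hedged sketch: per-pixel PV scoring from windowed color statistics.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(img, r=7):
    """Mean and std of each RGB channel in a (2r+1)x(2r+1) window per pixel."""
    h, w, _ = img.shape
    feats, coords = [], []
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = img[y - r:y + r + 1, x - r:x + r + 1].reshape(-1, 3)
            feats.append(np.concatenate([win.mean(0), win.std(0)]))
            coords.append((y, x))
    return np.array(feats), coords

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Given training imagery and a per-pixel PV mask:
#   F_tr, coords = window_features(train_img); y_tr = mask at those coords
#   clf.fit(F_tr, y_tr)
#   scores = clf.predict_proba(F_te)[:, 1]  # per-pixel PV confidence map
```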
Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index
Title | Estimating Treatment Effects using Multiple Surrogates: The Role of the Surrogate Score and the Surrogate Index |
Authors | Susan Athey, Raj Chetty, Guido Imbens, Hyunseung Kang |
Abstract | Estimating the long-term effects of treatments is of interest in many fields. A common challenge in estimating such treatment effects is that long-term outcomes are unobserved in the time frame needed to make policy decisions. One approach to overcoming this missing-data problem is to analyze treatment effects on an intermediate outcome, often called a statistical surrogate, if it satisfies the condition that treatment and outcome are independent conditional on the statistical surrogate. The validity of the surrogacy condition is often controversial. Here we exploit the fact that in modern datasets, researchers often observe a large number, possibly hundreds or thousands, of intermediate outcomes thought to lie on or close to the causal chain between the treatment and the long-term outcome of interest. Even if none of the individual proxies satisfies the statistical surrogacy criterion by itself, using multiple proxies can be useful in causal inference. We focus primarily on a setting with two samples: an experimental sample containing data about the treatment indicator and the surrogates, and an observational sample containing information about the surrogates and the primary outcome. We state assumptions under which the average treatment effect can be identified and estimated with a high-dimensional vector of proxies that collectively satisfy the surrogacy assumption, derive the bias from violations of the surrogacy assumption, and show that even if the primary outcome is also observed in the experimental sample, there is still information to be gained from using surrogates. |
Tasks | Causal Inference |
Published | 2016-03-30 |
URL | https://arxiv.org/abs/1603.09326v3 |
https://arxiv.org/pdf/1603.09326v3.pdf | |
PWC | https://paperswithcode.com/paper/estimating-treatment-effects-using-multiple |
Repo | |
Framework | |
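A minimal sketch of the surrogate-index idea in the two-sample setting: regress the long-term outcome on the surrogates in the observational sample, predict that index for each experimental unit, and difference the index means by treatment arm. The linear model and the synthetic data are illustrative assumptions.

```python
# Hedged sketch: ATE estimation via a surrogate index across two samples.
import numpy as np
from sklearn.linear_model import LinearRegression

def surrogate_index_ate(S_obs, y_obs, S_exp, treat_exp):
    """Estimate the ATE on the long-term outcome via the surrogate index."""
    index_model = LinearRegression().fit(S_obs, y_obs)   # E[Y | S] in obs sample
    idx = index_model.predict(S_exp)                     # surrogate index per unit
    return idx[treat_exp == 1].mean() - idx[treat_exp == 0].mean()

# Synthetic example where treatment shifts every surrogate by 0.2:
rng = np.random.default_rng(0)
S_obs = rng.normal(size=(5000, 50))
y_obs = S_obs[:, :10].sum(1) + rng.normal(size=5000)   # outcome depends on 10 surrogates
treat = rng.integers(0, 2, size=2000)
S_exp = rng.normal(size=(2000, 50)) + 0.2 * treat[:, None]
print("estimated ATE (true value 2.0):", surrogate_index_ate(S_obs, y_obs, S_exp, treat))
```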
The Effects of Data Size and Frequency Range on Distributional Semantic Models
Title | The Effects of Data Size and Frequency Range on Distributional Semantic Models |
Authors | Magnus Sahlgren, Alessandro Lenci |
Abstract | This paper investigates the effects of data size and frequency range on distributional semantic models. We compare the performance of a number of representative models for several test settings over data of varying sizes, and over test items of various frequency. Our results show that neural network-based models underperform when the data is small, and that the most reliable model over data of varying sizes and frequency ranges is the inverted factorized model. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08293v1 |
http://arxiv.org/pdf/1609.08293v1.pdf | |
PWC | https://paperswithcode.com/paper/the-effects-of-data-size-and-frequency-range |
Repo | |
Framework | |
Deep Learning of Part-based Representation of Data Using Sparse Autoencoders with Nonnegativity Constraints
Title | Deep Learning of Part-based Representation of Data Using Sparse Autoencoders with Nonnegativity Constraints |
Authors | Ehsan Hosseini-Asl, Jacek M. Zurada, Olfa Nasraoui |
Abstract | We demonstrate a new deep learning autoencoder network, trained by a nonnegativity constraint algorithm (NCAE), that learns features which show part-based representation of data. The learning algorithm is based on constraining negative weights. The performance of the algorithm is assessed based on decomposing data into parts and its prediction performance is tested on three standard image data sets and one text dataset. The results indicate that the nonnegativity constraint forces the autoencoder to learn features that amount to a part-based representation of data, while improving sparsity and reconstruction quality in comparison with the traditional sparse autoencoder and Nonnegative Matrix Factorization. It is also shown that this newly acquired representation improves the prediction performance of a deep neural network. |
Tasks | |
Published | 2016-01-12 |
URL | http://arxiv.org/abs/1601.02733v1 |
http://arxiv.org/pdf/1601.02733v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-of-part-based-representation-of |
Repo | |
Framework | |
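The core idea admits a short sketch: a sparse autoencoder whose loss adds a penalty on negative weights, pushing the learned features toward a nonnegative, part-based representation. The quadratic negative-weight penalty and the coefficients below are illustrative assumptions about the NCAE objective.

```python
# Hedged sketch: sparse autoencoder with a nonnegativity penalty on weights.
import torch
import torch.nn as nn

enc = nn.Linear(784, 100)
dec = nn.Linear(100, 784)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def ncae_loss(x, sparsity_coef=1e-3, nonneg_coef=1e-2):
    h = torch.sigmoid(enc(x))
    recon = torch.sigmoid(dec(h))
    reconstruction = ((recon - x) ** 2).mean()
    sparsity = h.abs().mean()  # encourage few active hidden units
    # penalize only the negative entries of the weight matrices
    nonneg = sum((torch.clamp(w, max=0.0) ** 2).sum()
                 for w in (enc.weight, dec.weight))
    return reconstruction + sparsity_coef * sparsity + nonneg_coef * nonneg

x = torch.rand(32, 784)  # e.g., a batch of flattened images
loss = ncae_loss(x); opt.zero_grad(); loss.backward(); opt.step()
```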
Large-Scale Electron Microscopy Image Segmentation in Spark
Title | Large-Scale Electron Microscopy Image Segmentation in Spark |
Authors | Stephen M. Plaza, Stuart E. Berg |
Abstract | The emerging field of connectomics aims to unlock the mysteries of the brain by understanding the connectivity between neurons. To map this connectivity, we acquire thousands of electron microscopy (EM) images with nanometer-scale resolution. After aligning these images, the resulting dataset has the potential to reveal the shapes of neurons and the synaptic connections between them. However, imaging the brain of even a tiny organism like the fruit fly yields terabytes of data. It can take years of manual effort to examine such image volumes and trace their neuronal connections. One solution is to apply image segmentation algorithms to help automate the tracing tasks. In this paper, we propose a novel strategy to apply such segmentation on very large datasets that exceed the capacity of a single machine. Our solution is robust to potential segmentation errors which could otherwise severely compromise the quality of the overall segmentation, for example those due to poor classifier generalizability or anomalies in the image dataset. We implement our algorithms in a Spark application which minimizes disk I/O, and apply them to a few large EM datasets, revealing both their effectiveness and scalability. We hope this work will encourage external contributions to EM segmentation by providing 1) a flexible plugin architecture that deploys easily on different cluster environments and 2) an in-memory representation of segmentation that could be conducive to new advances. |
Tasks | Electron Microscopy Image Segmentation, Semantic Segmentation |
Published | 2016-04-01 |
URL | http://arxiv.org/abs/1604.00385v1 |
http://arxiv.org/pdf/1604.00385v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-electron-microscopy-image |
Repo | |
Framework | |
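The overall pattern can be sketched briefly: distribute per-block segmentation over a Spark cluster and keep intermediate results in memory. The block partitioning, the placeholder segmenter, and the stitch step are illustrative assumptions, not the paper's plugin architecture.

```python
# Hedged sketch: per-block EM segmentation distributed with PySpark.
import numpy as np
from pyspark import SparkContext

def segment_block(block):
    """Placeholder per-block segmenter: threshold voxels into a label mask."""
    coords, voxels = block
    return coords, (voxels > voxels.mean()).astype(np.uint8)

sc = SparkContext(appName="em-segmentation-sketch")
# Hypothetical list of (block_coordinates, voxel_array) pairs for one volume.
blocks = [((z, y, x), np.random.rand(64, 64, 64))
          for z in range(2) for y in range(2) for x in range(2)]
labeled = (sc.parallelize(blocks, numSlices=8)
             .map(segment_block)      # segment each block in parallel
             .cache())                # keep label blocks in memory, minimizing disk I/O
print("blocks segmented:", labeled.count())
# A real pipeline would next merge labels across block boundaries (stitching).
sc.stop()
```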
Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction
Title | Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction |
Authors | Xingguo Li, Raman Arora, Han Liu, Jarvis Haupt, Tuo Zhao |
Abstract | We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear speedup. Numerical experiments demonstrate the efficiency of our algorithm in terms of both parameter estimation and computational performance. |
Tasks | Sparse Learning, Stochastic Optimization |
Published | 2016-05-09 |
URL | http://arxiv.org/abs/1605.02711v5 |
http://arxiv.org/pdf/1605.02711v5.pdf | |
PWC | https://paperswithcode.com/paper/nonconvex-sparse-learning-via-stochastic |
Repo | |
Framework | |
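A minimal sketch of the algorithmic template the paper analyzes: stochastic variance reduced gradients (SVRG) combined with a hard-thresholding step that enforces the cardinality constraint ||w||_0 <= k. The step size, epoch length, and least-squares objective are illustrative assumptions.

```python
# Hedged sketch: SVRG with hard thresholding for cardinality-constrained
# least squares.
import numpy as np

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w, zero the rest."""
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-k:]
    out[top] = w[top]
    return out

def svrg_sparse(X, y, k, eta=0.01, epochs=20, m=100):
    n, d = X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        full_grad = X.T @ (X @ w - y) / n      # anchor gradient at snapshot w
        w_s, v = w.copy(), w.copy()
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = X[i] * (X[i] @ v - y[i]) - X[i] * (X[i] @ w_s - y[i]) + full_grad
            v = hard_threshold(v - eta * g, k)  # project onto the sparsity set
        w = v
    return w
```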
On the exact recovery of sparse signals via conic relaxations
Title | On the exact recovery of sparse signals via conic relaxations |
Authors | Hongbo Dong |
Abstract | In this note we compare two recently proposed semidefinite relaxations for the sparse linear regression problem, by Pilanci, Wainwright and El Ghaoui (Sparse learning via boolean relaxations, 2015) and by Dong, Chen and Linderoth (Relaxation vs. Regularization: A conic optimization perspective of statistical variable selection, 2015). We focus on the cardinality-constrained formulation and prove that the relaxation proposed by Dong et al. is theoretically no weaker than the one proposed by Pilanci et al. Therefore any sufficient condition for exact recovery derived for the Pilanci et al. relaxation can be readily applied to the other relaxation, including their results on high-probability recovery for the Gaussian ensemble. Finally, we provide empirical evidence that the relaxation by Dong et al. requires far fewer observations to guarantee recovery of the true support. |
Tasks | Sparse Learning |
Published | 2016-03-15 |
URL | http://arxiv.org/abs/1603.04572v1 |
http://arxiv.org/pdf/1603.04572v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-exact-recovery-of-sparse-signals-via |
Repo | |
Framework | |
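One common way to write the cardinality-constrained problem both relaxations start from introduces a boolean indicator z_i for whether variable i enters the support; relaxing z to [0,1]^p and dualizing the inner problem is what yields convex (semidefinite-representable) relaxations of the kind compared in the note. The exact relaxations themselves are not reproduced here.

```latex
\min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2
\quad \text{s.t.} \quad \|\beta\|_0 \le k
\qquad\Longleftrightarrow\qquad
\min_{z \in \{0,1\}^p,\; \mathbf{1}^{\top} z \le k}\;
\min_{\beta \in \mathbb{R}^p} \|y - X \operatorname{diag}(z)\,\beta\|_2^2
```

The equivalence holds because diag(z)β forces β's support into {i : z_i = 1}, so the inner minimization ranges exactly over k-sparse vectors.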
The Image Torque Operator for Contour Processing
Title | The Image Torque Operator for Contour Processing |
Authors | Morimichi Nishigaki, Cornelia Fermüller |
Abstract | Contours are salient features for image description, but the detection and localization of boundary contours is still considered a challenging problem. This paper introduces a new tool for edge processing that implements the Gestalt idea of edge grouping. This tool is a mid-level image operator, called the Torque operator, designed to help detect closed contours in images. The Torque operator takes as input the raw image and creates an image map by computing, from the image gradients within regions of multiple sizes, a measure of how well the edges align to form closed convex contours. Fundamental properties of the torque are explored and illustrated through examples. It is then applied in purely bottom-up processing in a variety of applications, including edge detection, visual attention, and segmentation, and is experimentally demonstrated to be a useful tool that can improve existing techniques. Finally, its extension as a more general grouping mechanism and its application to object recognition are discussed. |
Tasks | Edge Detection, Object Recognition |
Published | 2016-01-18 |
URL | http://arxiv.org/abs/1601.04669v1 |
http://arxiv.org/pdf/1601.04669v1.pdf | |
PWC | https://paperswithcode.com/paper/the-image-torque-operator-for-contour |
Repo | |
Framework | |
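A minimal sketch of the torque idea: within a patch, sum the cross products between each pixel's displacement from the patch center and its edge (gradient-tangent) direction, so edges wrapping consistently around the center score high. The normalization by patch area and the tangent convention are illustrative assumptions about the operator's exact definition.

```python
# Hedged sketch: a torque-like map from gradient tangents within patches.
import numpy as np

def torque_map(gray, radius=8):
    gy, gx = np.gradient(gray.astype(float))
    # tangent direction is perpendicular to the gradient
    ty, tx = gx, -gy
    h, w = gray.shape
    out = np.zeros((h, w))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    area = (2 * radius + 1) ** 2
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            py = ty[y - radius:y + radius + 1, x - radius:x + radius + 1]
            px = tx[y - radius:y + radius + 1, x - radius:x + radius + 1]
            # 2-D cross product r x t of displacement and tangent, summed
            out[y, x] = np.sum(xs * py - ys * px) / area
    return out  # large |values| suggest closed contours around (y, x)
```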
Bank Card Usage Prediction Exploiting Geolocation Information
Title | Bank Card Usage Prediction Exploiting Geolocation Information |
Authors | Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, Lars Schmidt-Thieme |
Abstract | We describe the solution of team ISMLL for both tasks of the ECML-PKDD 2016 Discovery Challenge on Bank Card Usage. Our solution rests on three pillars: gradient boosted decision trees as a strong regression and classification model, an intensive search for good hyperparameter configurations, and strong features that exploit geolocation information. This approach achieved the best performance on the public leaderboard for the first task and a decent fourth position for the second task. |
Tasks | |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.03996v1 |
http://arxiv.org/pdf/1610.03996v1.pdf | |
PWC | https://paperswithcode.com/paper/bank-card-usage-prediction-exploiting |
Repo | |
Framework | |
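The shape of the solution can be sketched: engineer geolocation-derived features and feed them to gradient boosted trees. The haversine helper and the specific features (distances of transactions from a home location) are illustrative assumptions, not the team's actual feature set.

```python
# Hedged sketch: geolocation features feeding gradient boosted trees.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between coordinate pairs, in kilometers."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def geo_features(tx_lat, tx_lon, home_lat, home_lon):
    """Per-customer features: mean/max/std distance of transactions from home."""
    d = haversine_km(tx_lat, tx_lon, home_lat, home_lon)
    return np.array([d.mean(), d.max(), d.std()])

model = GradientBoostingClassifier(n_estimators=300, max_depth=3)
# model.fit(feature_matrix, labels)  # one row of geo (+ other) features per customer
```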