Paper Group ANR 219
Multi-View Constraint Propagation with Consensus Prior Knowledge
Title | Multi-View Constraint Propagation with Consensus Prior Knowledge |
Authors | Yaoyi Li, Hongtao Lu |
Abstract | In many applications, pairwise constraints are a weak form of supervisory information that can be collected easily. Constraint propagation has proven successful at exploiting such side information. In recent years, several methods for multi-view constraint propagation have been proposed. However, the problem of reasonably fusing different views remains unaddressed. In this paper, we present a method dubbed Consensus Prior Constraint Propagation (CPCP), which provides prior knowledge of the robustness of each data instance and its neighborhood. With the robustness generated from the consensus information of each view, we build a unified affinity matrix as the result of the propagation. Specifically, we fuse the affinities of different views at the data-instance level instead of the view level. This paper also introduces an approach to deal with the imbalance between positive and negative constraints. The proposed method has been tested on clustering tasks on two publicly available multi-view data sets to demonstrate its superior performance. |
Tasks | |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06456v1 |
http://arxiv.org/pdf/1609.06456v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-constraint-propagation-with |
Repo | |
Framework | |
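CPCP's defining move is fusing affinities per data instance rather than per view. Below is a minimal NumPy sketch of that idea only, not the paper's propagation scheme: the per-view affinity matrices and per-instance consensus weights are random stand-ins for the propagated quantities.

```python
# Instance-level vs. view-level affinity fusion (illustrative only).
import numpy as np

n, views = 100, 3
rng = np.random.default_rng(0)
W = [rng.random((n, n)) for _ in range(views)]   # per-view affinity matrices
W = [(A + A.T) / 2 for A in W]                   # symmetrize each view
C = rng.random((views, n))
C /= C.sum(axis=0, keepdims=True)                # per-instance weights sum to 1

# View-level fusion would use one scalar weight per view; instance-level
# fusion lets every data point weight the views differently.
fused = sum(C[v][:, None] * W[v] for v in range(views))
fused = (fused + fused.T) / 2                    # restore symmetry after row-weighting
```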
DeepDGA: Adversarially-Tuned Domain Generation and Detection
Title | DeepDGA: Adversarially-Tuned Domain Generation and Detection |
Authors | Hyrum S. Anderson, Jonathan Woodbridge, Bobby Filar |
Abstract | Many malware families utilize domain generation algorithms (DGAs) to establish command and control (C&C) connections. While there are many methods to pseudorandomly generate domains, we focus in this paper on detecting (and generating) domains on a per-domain basis, which provides a simple and flexible means to detect known DGA families. Recent machine learning approaches to DGA detection have been successful on fairly simplistic DGAs, many of which produce names of fixed length. However, models trained on limited datasets are somewhat blind to new DGA variants. In this paper, we leverage the concept of generative adversarial networks to construct a deep learning based DGA that is designed to intentionally bypass a deep learning based detector. In a series of adversarial rounds, the generator learns to generate domain names that are increasingly difficult to detect. In turn, a detector model updates its parameters to compensate for the adversarially generated domains. We test the hypothesis of whether adversarially generated domains may be used to augment training sets in order to harden other machine learning models against yet-to-be-observed DGAs. We detail solutions to several challenges in training this character-based generative adversarial network (GAN). In particular, our deep learning architecture begins as a domain name auto-encoder (encoder + decoder) trained on domains in the Alexa one million. Then the encoder and decoder are reassembled competitively in a generative adversarial network (detector + generator), with novel neural architectures and training strategies to improve convergence. |
Tasks | |
Published | 2016-10-06 |
URL | http://arxiv.org/abs/1610.01969v1 |
http://arxiv.org/pdf/1610.01969v1.pdf | |
PWC | https://paperswithcode.com/paper/deepdga-adversarially-tuned-domain-generation |
Repo | |
Framework | |
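A minimal PyTorch sketch of one adversarial round as described above, with tiny MLPs over fixed-length encodings standing in for the paper's autoencoder-derived generator and detector; all sizes and the random "real" batch are illustrative stand-ins for encoded Alexa domains.

```python
import torch
import torch.nn as nn

MAXLEN, ALPHA, LATENT = 32, 38, 20   # hypothetical encoding sizes

generator = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                          nn.Linear(128, MAXLEN * ALPHA))
detector = nn.Sequential(nn.Linear(MAXLEN * ALPHA, 128), nn.ReLU(),
                         nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
real = torch.randn(64, MAXLEN * ALPHA)   # stand-in for encoded benign domains

for _ in range(5):                        # adversarial rounds
    # Generator round: produce domains the detector scores as benign (label 1).
    z = torch.randn(64, LATENT)
    g_loss = bce(detector(generator(z)), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    # Detector round: separate real domains from adversarially generated ones.
    fake = generator(torch.randn(64, LATENT)).detach()
    d_loss = (bce(detector(real), torch.ones(64, 1)) +
              bce(detector(fake), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
```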
Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent
Title | Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent |
Authors | Chi Jin, Sham M. Kakade, Praneeth Netrapalli |
Abstract | Matrix completion, where we wish to recover a low-rank matrix by observing a few of its entries, is a widely studied problem in both theory and practice with broad applications. Most provable algorithms for this problem so far have been restricted to the offline setting, where they estimate the unknown matrix using all observations simultaneously. However, in many applications the online version, where we observe one entry at a time and dynamically update our estimate, is more appealing. While existing algorithms are efficient for the offline setting, they could be highly inefficient for the online setting. In this paper, we propose the first provable, efficient online algorithm for matrix completion. Our algorithm starts from an initial estimate of the matrix and then performs non-convex stochastic gradient descent (SGD). After every observation, it performs a fast update involving only one row of two tall matrices, giving near-linear total runtime. Our algorithm can naturally be used in the offline setting as well, where its sample complexity and runtime are competitive with state-of-the-art algorithms. Our proofs introduce a general framework showing that SGD updates tend to stay away from saddle surfaces, which could be of broader interest for proving tight rates for other non-convex problems. |
Tasks | Matrix Completion |
Published | 2016-05-26 |
URL | http://arxiv.org/abs/1605.08370v1 |
http://arxiv.org/pdf/1605.08370v1.pdf | |
PWC | https://paperswithcode.com/paper/provable-efficient-online-matrix-completion |
Repo | |
Framework | |
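A minimal NumPy sketch of the per-observation update described above: after seeing entry (i, j), only row i of U and row j of V change. Uniform random observations, the fixed step size, and the small random initialization are illustrative choices; the paper uses a spectral initialization and proves rates, none of which is reproduced here.

```python
import numpy as np

d1, d2, r, eta = 500, 400, 5, 0.05
rng = np.random.default_rng(1)
M = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))  # ground truth
U = rng.standard_normal((d1, r)) * 0.1
V = rng.standard_normal((d2, r)) * 0.1

for _ in range(100000):
    i, j = rng.integers(d1), rng.integers(d2)   # observe one entry M[i, j]
    err = U[i] @ V[j] - M[i, j]
    grad_u = err * V[j]                         # touches only two rows,
    V[j] -= eta * err * U[i]                    # giving O(r) work per step
    U[i] -= eta * grad_u
```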
Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution
Title | Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution |
Authors | Il Jun Ahn, Woo Hyun Nam |
Abstract | Recently, various deep-neural-network (DNN)-based approaches have been proposed for single-image super-resolution (SISR). Despite their promising results on major structure regions such as edges and lines, they still suffer from limited performance on texture regions that consist of very complex and fine patterns. This is because, during the acquisition of a low-resolution (LR) image via down-sampling, these regions lose most of the high-frequency information necessary to represent the texture details. In this paper, we present a novel texture enhancement framework for SISR to effectively improve the spatial resolution in texture regions as well as edges and lines. We call our method the high-resolution (HR) style transfer algorithm. Our framework consists of three steps: (i) generate an initial HR image from an interpolated LR image via an SISR algorithm, (ii) generate an HR style image from the initial HR image via down-scaling and tiling, and (iii) combine the HR style image with the initial HR image via a customized style transfer algorithm. Here, the HR style image is obtained by down-scaling the initial HR image and then repetitively tiling it into an image of the same size as the HR image. This down-scaling and tiling process comes from the idea that texture regions are often composed of small regions that are similar in appearance albeit sometimes different in scale. This process creates an HR style image that is rich in details, which can be used to restore high-frequency texture details back into the initial HR image via the style transfer algorithm. Experimental results on a number of texture datasets show that our proposed HR style transfer algorithm provides more visually pleasing results compared with competitive methods. |
Tasks | Image Super-Resolution, Style Transfer, Super-Resolution |
Published | 2016-12-01 |
URL | http://arxiv.org/abs/1612.00085v1 |
http://arxiv.org/pdf/1612.00085v1.pdf | |
PWC | https://paperswithcode.com/paper/texture-enhancement-via-high-resolution-style |
Repo | |
Framework | |
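A minimal sketch of step (ii), the down-scale-and-tile construction of the HR style image. The 0.5 scale factor is an assumed example, and the function expects an RGB image.

```python
import numpy as np
from PIL import Image

def hr_style_image(initial_hr: Image.Image, scale: float = 0.5) -> Image.Image:
    """Down-scale the initial HR estimate, then tile copies back to full size."""
    w, h = initial_hr.size
    small = initial_hr.resize((int(w * scale), int(h * scale)), Image.BICUBIC)
    tile = np.asarray(small)                          # (h*scale, w*scale, 3)
    reps = (h // tile.shape[0] + 1, w // tile.shape[1] + 1, 1)
    tiled = np.tile(tile, reps)[:h, :w]               # crop back to the HR size
    return Image.fromarray(tiled)
```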
Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization
Title | Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization |
Authors | Xingguo Li, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang, Tuo Zhao |
Abstract | We propose a general theory for studying the landscape of nonconvex optimization with underlying symmetric structures for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks). Specifically, we characterize the locations of stationary points and the null space of Hessian matrices of the objective function via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global landscape of nonconvex optimization in the low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: ($\mathcal{R}_1$) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvature; ($\mathcal{R}_2$) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and ($\mathcal{R}_3$) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. Such a global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions. |
Tasks | |
Published | 2016-12-29 |
URL | http://arxiv.org/abs/1612.09296v3 |
http://arxiv.org/pdf/1612.09296v3.pdf | |
PWC | https://paperswithcode.com/paper/symmetry-saddle-points-and-global |
Repo | |
Framework | |
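The rotational symmetry the abstract refers to can be made concrete in the standard symmetric factorization special case (a simplification; the paper treats more general objectives): the objective value is unchanged under right-multiplication by any orthogonal matrix, so minima form connected orbits and the Hessian has a null space along the orbit directions.

```latex
f(U) = \tfrac{1}{4}\,\lVert U U^{\top} - M \rVert_F^2,
\qquad
f(UR) = \tfrac{1}{4}\,\lVert U R R^{\top} U^{\top} - M \rVert_F^2 = f(U)
\quad \text{for all orthogonal } R \in \mathbb{R}^{r \times r}.
```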
A Hybrid Both Filter and Wrapper Feature Selection Method for Microarray Classification
Title | A Hybrid Both Filter and Wrapper Feature Selection Method for Microarray Classification |
Authors | Li-Yeh Chuang, Chao-Hsuan Ke, Cheng-Hong Yang |
Abstract | Gene expression data is widely used in disease analysis and cancer diagnosis. However, since gene expression data can contain thousands of genes simultaneously, successful microarray classification is rather difficult. Feature selection is an important preprocessing step for any classification process. Selecting a useful gene subset as a classifier input not only decreases computational time and cost, but also increases classification accuracy. In this study, we applied the information gain method as a filter approach, and an improved binary particle swarm optimization as a wrapper approach, to implement feature selection; the selected gene subsets were used to evaluate classification performance. Experimental results show that with the proposed method smaller gene subsets were needed and better classification accuracy was obtained. |
Tasks | Feature Selection |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08669v1 |
http://arxiv.org/pdf/1612.08669v1.pdf | |
PWC | https://paperswithcode.com/paper/a-hybrid-both-filter-and-wrapper-feature |
Repo | |
Framework | |
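A minimal sketch of the filter-then-wrapper pipeline, assuming mutual information as the information-gain filter and a plain greedy backward search standing in for the paper's improved binary PSO; the synthetic data, the cutoff k, and the 1-NN classifier are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)

# Filter stage: keep the top-k genes by information gain.
k = 50
keep = np.argsort(mutual_info_classif(X, y, random_state=0))[-k:]

# Wrapper stage: greedily drop genes whose removal does not hurt CV accuracy.
clf = KNeighborsClassifier(n_neighbors=1)
subset = list(keep)
score = cross_val_score(clf, X[:, subset], y, cv=3).mean()
for g in list(subset):
    trial = [f for f in subset if f != g]
    s = cross_val_score(clf, X[:, trial], y, cv=3).mean()
    if s >= score:
        subset, score = trial, s
```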
Learning Structured Sparsity in Deep Neural Networks
Title | Learning Structured Sparsity in Deep Neural Networks |
Authors | Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li |
Abstract | High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNNs) in resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can (1) learn a compact structure from a larger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of the DNN to efficiently accelerate its evaluation: experimental results show that SSL achieves on average 5.1x and 3.1x speedups of convolutional-layer computation for AlexNet on CPU and GPU, respectively, with off-the-shelf libraries, roughly twice the speedups obtained with non-structured sparsity; and (3) regularize the DNN structure to improve classification accuracy: on CIFAR-10, regularization on layer depth reduces a 20-layer Deep Residual Network (ResNet) to 18 layers while improving accuracy from 91.25% to 92.60%, still slightly higher than that of the original 32-layer ResNet. For AlexNet, structure regularization by SSL also reduces the error by around 1%. Open-source code is at https://github.com/wenwei202/caffe/tree/scnn |
Tasks | |
Published | 2016-08-12 |
URL | http://arxiv.org/abs/1608.03665v4 |
http://arxiv.org/pdf/1608.03665v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-structured-sparsity-in-deep-neural |
Repo | |
Framework | |
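A minimal PyTorch sketch of SSL's filter-wise case, assuming a group-Lasso penalty whose groups are whole convolutional filters; channel-, shape-, and depth-wise groups are formed analogously, and the weight lam and toy loss are illustrative.

```python
import torch
import torch.nn as nn

def filter_group_lasso(conv: nn.Conv2d) -> torch.Tensor:
    # One group per output filter: sum of the filters' L2 norms.
    w = conv.weight                       # shape (out_ch, in_ch, kH, kW)
    return w.flatten(1).norm(dim=1).sum()

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Conv2d(16, 32, 3))
lam = 1e-4
reg = sum(filter_group_lasso(m) for m in model.modules()
          if isinstance(m, nn.Conv2d))

x = torch.randn(8, 3, 32, 32)
task_loss = model(x).pow(2).mean()        # stand-in for the usual data loss
loss = task_loss + lam * reg              # penalty drives whole filters to zero
loss.backward()
```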
Theta-RBM: Unfactored Gated Restricted Boltzmann Machine for Rotation-Invariant Representations
Title | Theta-RBM: Unfactored Gated Restricted Boltzmann Machine for Rotation-Invariant Representations |
Authors | Mario Valerio Giuffrida, Sotirios A. Tsaftaris |
Abstract | Learning invariant representations is a critical task in computer vision. In this paper, we propose the Theta-Restricted Boltzmann Machine ($\theta$-RBM in short), which builds upon the original RBM formulation and injects the notion of rotation invariance during the learning procedure. In contrast to previous approaches, we do not transform the training set with all possible rotations. Instead, we rotate the gradient filters when they are computed during the Contrastive Divergence algorithm. We formulate our model as an unfactored gated Boltzmann machine, where an additional input layer modulates the visible layer to drive the optimisation procedure. Among our contributions is a mathematical proof demonstrating that $\theta$-RBM learns rotation-invariant features according to a recently proposed invariance measure. Our method reaches an invariance score of ~90% on the mnist-rot dataset, the highest result among the baseline methods and the current state of the art in transformation-invariant feature learning with RBMs. Using an SVM classifier, we also show that our network learns discriminative features, obtaining a testing error of ~10%. |
Tasks | |
Published | 2016-06-28 |
URL | http://arxiv.org/abs/1606.08805v2 |
http://arxiv.org/pdf/1606.08805v2.pdf | |
PWC | https://paperswithcode.com/paper/theta-rbm-unfactored-gated-restricted |
Repo | |
Framework | |
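A minimal sketch of the gradient-filter rotation mentioned above: each weight gradient, reshaped as a 2-D filter, is rotated before the CD update is applied. The RBM itself is omitted, and the filter shapes and angle are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

def rotate_filter_grads(grad_filters: np.ndarray, theta_deg: float) -> np.ndarray:
    # grad_filters: (n_hidden, k, k) gradients of W reshaped as 2-D filters.
    return np.stack([
        rotate(g, theta_deg, reshape=False, mode="nearest")
        for g in grad_filters
    ])

grads = np.random.randn(64, 11, 11)                    # toy CD gradients
rotated = rotate_filter_grads(grads, theta_deg=37.0)   # e.g. the input's angle
```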
A quantitative analysis of tilt in the Café Wall illusion: a bioplausible model for foveal and peripheral vision
Title | A quantitative analysis of tilt in the Café Wall illusion: a bioplausible model for foveal and peripheral vision |
Authors | Nasim Nematzadeh, David M. W. Powers |
Abstract | The biological characteristics of human visual processing can be investigated through the study of optical illusions and their perception, giving rise to intuitions that may improve computer vision to match human performance. Geometric illusions are a specific subfamily in which orientations and angles are misperceived. This paper reports quantifiable predictions of the degree of tilt for a typical geometric illusion called Café Wall, in which the mortar between the tiles seems to tilt or bow. Our study employs a common bioplausible model of retinal processing and we further develop an analytic processing pipeline to quantify and thus predict the specific angle of tilt. We further study the effect of resolution and feature size in order to predict the different perceived tilts in different areas of the fovea and periphery, where resolution varies as the eye saccades to different parts of the image. In the experiments, several different minimal portions of the pattern, modeling monocular and binocular foveal views, are investigated across multiple scales, in order to quantify tilts with confidence intervals and explore the difference between local and global tilt. |
Tasks | |
Published | 2016-09-22 |
URL | http://arxiv.org/abs/1609.06927v1 |
http://arxiv.org/pdf/1609.06927v1.pdf | |
PWC | https://paperswithcode.com/paper/a-quantitative-analysis-of-tilt-in-the-cafe |
Repo | |
Framework | |
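The abstract does not spell out the "common bioplausible model of retinal processing"; a Difference-of-Gaussians (DoG) filter bank over multiple scales, a standard model of retinal ganglion-cell receptive fields, is a plausible stand-in and is sketched below under that assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(image: np.ndarray, sigma: float,
                 surround_ratio: float = 2.0) -> np.ndarray:
    centre = gaussian_filter(image, sigma)
    surround = gaussian_filter(image, sigma * surround_ratio)
    return centre - surround      # ON-centre response; tilt cues emerge along mortar lines

img = np.random.rand(256, 256)    # stand-in for a Café Wall pattern
# Multiple sigmas model the resolution change from fovea to periphery.
responses = [dog_response(img, s) for s in (1.0, 2.0, 4.0)]
```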
Grammatical Templates: Improving Text Difficulty Evaluation for Language Learners
Title | Grammatical Templates: Improving Text Difficulty Evaluation for Language Learners |
Authors | Shuhan Wang, Erik Andersen |
Abstract | Language students are most engaged while reading texts at an appropriate difficulty level. However, existing methods of evaluating text difficulty focus mainly on vocabulary and do not prioritize grammatical features, hence they do not work well for language learners with limited knowledge of grammar. In this paper, we introduce grammatical templates, the expert-identified units of grammar that students learn from class, as an important feature of text difficulty evaluation. Experimental classification results show that grammatical template features significantly improve text difficulty prediction accuracy over baseline readability features by 7.4%. Moreover, we build a simple and human-understandable text difficulty evaluation approach with 87.7% accuracy, using only 5 grammatical template features. |
Tasks | |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05180v2 |
http://arxiv.org/pdf/1609.05180v2.pdf | |
PWC | https://paperswithcode.com/paper/grammatical-templates-improving-text |
Repo | |
Framework | |
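A minimal sketch of the feature idea, assuming each template is a POS-tag sequence and each text is represented by template occurrence counts over its tag string; the five templates and the toy difficulty labels are hypothetical, not the paper's expert-identified grammar units.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

TEMPLATES = ["DT NN", "MD VB", "VBN IN", "WDT VBZ", "TO VB"]  # hypothetical

def template_counts(tag_seq: str) -> np.ndarray:
    # tag_seq: a text's POS tags joined into one space-separated string.
    return np.array([tag_seq.count(t) for t in TEMPLATES], dtype=float)

texts_tags = ["DT NN MD VB TO VB",
              "DT NN VBZ WDT VBZ VBN IN DT NN"]
levels = [0, 1]                               # toy difficulty labels
X = np.stack([template_counts(t) for t in texts_tags])
clf = LogisticRegression().fit(X, levels)     # difficulty from 5 template features
```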
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Title | Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes |
Authors | Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun |
Abstract | Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation. |
Tasks | |
Published | 2016-04-07 |
URL | http://arxiv.org/abs/1604.02080v1 |
http://arxiv.org/pdf/1604.02080v1.pdf | |
PWC | https://paperswithcode.com/paper/planning-with-information-processing |
Repo | |
Framework | |
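For orientation, the single-constraint free-energy backup underlying such planners has a closed soft-max form (standard in this literature; $\beta$ trades off reward against the KL cost to a reference policy $\pi_0$, and $\beta \to \infty$ recovers standard value iteration). The paper's generalized scheme adds model uncertainty as a second constraint of the same kind.

```latex
V^{*}(s) = \frac{1}{\beta}\,
\log \sum_{a} \pi_{0}(a \mid s)\,
\exp\!\Big(\beta \big[\, r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)} V^{*}(s') \,\big]\Big)
```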
Improved Strongly Adaptive Online Learning using Coin Betting
Title | Improved Strongly Adaptive Online Learning using Coin Betting |
Authors | Kwang-Sung Jun, Francesco Orabona, Rebecca Willett, Stephen Wright |
Abstract | This paper describes a new parameter-free online learning algorithm for changing environments. In comparing against algorithms with the same time complexity as ours, we obtain a strongly adaptive regret bound that is a factor of at least $\sqrt{\log(T)}$ better, where $T$ is the time horizon. Empirical results show that our algorithm outperforms state-of-the-art methods in learning with expert advice and metric learning scenarios. |
Tasks | Metric Learning |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04578v3 |
http://arxiv.org/pdf/1610.04578v3.pdf | |
PWC | https://paperswithcode.com/paper/improved-strongly-adaptive-online-learning |
Repo | |
Framework | |
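A minimal sketch of the coin-betting primitive in the title: a Krichevsky-Trofimov bettor whose bet doubles as a parameter-free online prediction. The strongly adaptive meta-algorithm the paper builds on top of such bettors is not shown.

```python
def kt_bettor(gradients):
    """1-D coin betting: bet a KT fraction of current wealth each round."""
    wealth, grad_sum, bets = 1.0, 0.0, []
    for t, g in enumerate(gradients, start=1):   # g in [-1, 1]
        bet = grad_sum / t * wealth              # KT betting fraction x_t
        bets.append(bet)
        wealth += g * bet                        # wealth update
        grad_sum += g
    return bets

# e.g. learn a 1-D bias online from signed feedback:
print(kt_bettor([1, 1, -1, 1, 1])[-1])
```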
Persistence Lenses: Segmentation, Simplification, Vectorization, Scale Space and Fractal Analysis of Images
Title | Persistence Lenses: Segmentation, Simplification, Vectorization, Scale Space and Fractal Analysis of Images |
Authors | Martin Brooks |
Abstract | A persistence lens is a hierarchy of disjoint upper and lower level sets of a continuous luminance image’s Reeb graph. The boundary components of a persistence lens’s interior components are Jordan curves that serve as a hierarchical segmentation of the image, and may be rendered as vector graphics. A persistence lens determines a varilet basis for the luminance image, in which image simplification is realized by subspace projection. Image scale space, and image fractal analysis, result from applying a scale measure to each basis function. |
Tasks | |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07361v3 |
http://arxiv.org/pdf/1604.07361v3.pdf | |
PWC | https://paperswithcode.com/paper/persistence-lenses-segmentation |
Repo | |
Framework | |
Simultaneous Sparse Dictionary Learning and Pruning
Title | Simultaneous Sparse Dictionary Learning and Pruning |
Authors | Simeng Qu, Xiao Wang |
Abstract | Dictionary learning is a cutting-edge area of image processing that has recently led to state-of-the-art results in many signal processing tasks. The idea is to conduct a linear decomposition of a signal using a few atoms of a learned, usually overcomplete dictionary instead of a pre-defined basis. Determining a proper size for the to-be-learned dictionary is crucial for both the precision and the efficiency of the process, while most existing dictionary learning algorithms choose the size quite arbitrarily. In this paper, a novel regularization method called the Grouped Smoothly Clipped Absolute Deviation (GSCAD) is employed for learning the dictionary. The proposed method can simultaneously learn a sparse dictionary and select the appropriate dictionary size. An efficient algorithm is designed based on the alternating direction method of multipliers (ADMM), which decomposes the joint non-convex problem with the non-convex penalty into two convex optimization problems. Several examples are presented for image denoising, and the experimental results are compared with other state-of-the-art approaches. |
Tasks | Denoising, Dictionary Learning, Image Denoising |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07870v1 |
http://arxiv.org/pdf/1605.07870v1.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-sparse-dictionary-learning-and |
Repo | |
Framework | |
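For reference, the scalar SCAD penalty that GSCAD groups is conventionally specified through its derivative, with $a = 3.7$ the customary choice; the grouped variant applies it to norms of dictionary columns, a reading consistent with the abstract but not spelled out there.

```latex
p'_{\lambda}(t) = \lambda \left\{ \mathbb{1}(t \le \lambda)
+ \frac{(a\lambda - t)_{+}}{(a - 1)\,\lambda}\, \mathbb{1}(t > \lambda) \right\},
\qquad t \ge 0,\; a > 2.
```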
Diving deeper into mentee networks
Title | Diving deeper into mentee networks |
Authors | Ragav Venkatesan, Baoxin Li |
Abstract | Modern computer vision is all about the possession of powerful image representations. Deeper and deeper convolutional neural networks have been built using larger and larger datasets and are made publicly available. A large swath of computer vision scientists use these pre-trained networks with varying degrees of success in various tasks. Even though there is tremendous success in copying these networks, the representational space is not learnt from the target dataset in a traditional manner. One of the reasons for opting to use a pre-trained network over a network learnt from scratch is that small datasets provide less supervision and require meticulous regularization and careful tweaking of learning rates to even achieve stable learning without weight explosion. It is often the case that large deep networks are not portable, which necessitates the ability to learn mid-sized networks from scratch. In this article, we dive deeper into training these mid-sized networks on small datasets from scratch by drawing additional supervision from a large pre-trained network. Such learning also provides better generalization accuracy than networks trained with common regularization techniques such as $l_2$, $l_1$ and dropout. We show that features learnt this way are more general than those learnt independently. We studied various characteristics of such networks and found some interesting behaviors. |
Tasks | |
Published | 2016-04-27 |
URL | http://arxiv.org/abs/1604.08220v1 |
http://arxiv.org/pdf/1604.08220v1.pdf | |
PWC | https://paperswithcode.com/paper/diving-deeper-into-mentee-networks |
Repo | |
Framework | |
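A minimal sketch of drawing additional supervision from a large pre-trained "mentor", assuming a temperature-softened logit-matching term added to the usual label loss (a distillation-style objective; the paper's exact formulation, weights, and temperature may differ).

```python
import torch
import torch.nn.functional as F

def mentee_loss(mentee_logits, mentor_logits, labels, T=3.0, alpha=0.5):
    hard = F.cross_entropy(mentee_logits, labels)          # usual label loss
    soft = F.kl_div(F.log_softmax(mentee_logits / T, dim=1),
                    F.softmax(mentor_logits / T, dim=1),
                    reduction="batchmean") * T * T          # mentor supervision
    return alpha * hard + (1 - alpha) * soft

mentor_logits = torch.randn(8, 10)   # frozen pre-trained network's output
mentee_logits = torch.randn(8, 10, requires_grad=True)
labels = torch.randint(0, 10, (8,))
mentee_loss(mentee_logits, mentor_logits, labels).backward()
```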