Paper Group ANR 942
Rademacher Complexity for Adversarially Robust Generalization
Title | Rademacher Complexity for Adversarially Robust Generalization |
Authors | Dong Yin, Kannan Ramchandran, Peter Bartlett |
Abstract | Many machine learning models are vulnerable to adversarial attacks; for example, adding adversarial perturbations that are imperceptible to humans can often make machine learning models produce wrong predictions with high confidence. Moreover, although we may obtain robust models on the training dataset via adversarial training, in some problems the learned models cannot generalize well to the test data. In this paper, we focus on $\ell_\infty$ attacks, and study the adversarially robust generalization problem through the lens of Rademacher complexity. For binary linear classifiers, we prove tight bounds for the adversarial Rademacher complexity, and show that the adversarial Rademacher complexity is never smaller than its natural counterpart, and it has an unavoidable dimension dependence, unless the weight vector has bounded $\ell_1$ norm. The results also extend to multi-class linear classifiers. For (nonlinear) neural networks, we show that the dimension dependence in the adversarial Rademacher complexity also exists. We further consider a surrogate adversarial loss for one-hidden layer ReLU network and prove margin bounds for this setting. Our results indicate that having $\ell_1$ norm constraints on the weight matrices might be a potential way to improve generalization in the adversarial setting. We demonstrate experimental results that validate our theoretical findings. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.11914v3 |
http://arxiv.org/pdf/1810.11914v3.pdf | |
PWC | https://paperswithcode.com/paper/rademacher-complexity-for-adversarially |
Repo | |
Framework | |
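A short worked identity may help show why the $\ell_1$ norm of the weight vector appears in the abstract above. It is a standard duality fact, not quoted from the paper; the symbols $w$ (weight vector), $x$ (input), $y \in \{-1, +1\}$ (label) and $\epsilon$ (perturbation radius) are notation assumed here for illustration. For a binary linear classifier under an $\ell_\infty$ attack, the worst-case margin is:

```latex
\min_{\|x' - x\|_\infty \le \epsilon} \; y \langle w, x' \rangle
  \;=\; y \langle w, x \rangle \;-\; \epsilon \, \|w\|_1 ,
\qquad \text{attained at } x' = x - \epsilon \, y \, \operatorname{sign}(w).
```

The adversary therefore subtracts $\epsilon \|w\|_1$ from the natural margin, which is consistent with the abstract's remark that bounding the $\ell_1$ norm of the weights is what avoids the dimension dependence.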
Performance Guarantees for Homomorphisms Beyond Markov Decision Processes
Title | Performance Guarantees for Homomorphisms Beyond Markov Decision Processes |
Authors | Sultan Javed Majeed, Marcus Hutter |
Abstract | Most real-world problems have huge state and/or action spaces. Therefore, a naive application of existing tabular solution methods is not tractable on such problems. Nonetheless, these solution methods are quite useful if an agent has access to a relatively small state-action space homomorphism of the true environment and near-optimal performance is guaranteed by the map. A plethora of research is focused on the case when the homomorphism is a Markovian representation of the underlying process. However, we show that near-optimal performance is sometimes guaranteed even if the homomorphism is non-Markovian. Moreover, we can aggregate significantly more states by lifting the Markovian requirement without compromising on performance. In this work, we extend the Extreme State Aggregation (ESA) framework to joint state-action aggregations. We also lift the policy uniformity condition for aggregation in ESA, which allows even coarser modeling of the true environment. |
Tasks | |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03895v1 |
http://arxiv.org/pdf/1811.03895v1.pdf | |
PWC | https://paperswithcode.com/paper/performance-guarantees-for-homomorphisms |
Repo | |
Framework | |
Finding Better Topologies for Deep Convolutional Neural Networks by Evolution
Title | Finding Better Topologies for Deep Convolutional Neural Networks by Evolution |
Authors | Honglei Zhang, Serkan Kiranyaz, Moncef Gabbouj |
Abstract | Due to the nonlinearity of artificial neural networks, designing topologies for deep convolutional neural networks (CNNs) is a challenging task, and often only heuristic approaches, such as trial and error, can be applied. An evolutionary algorithm can solve optimization problems where the fitness landscape is unknown. However, evolutionary algorithms are computationally intensive, which makes them difficult to apply when deep CNNs are involved. In this paper, we propose an evolutionary strategy to find better topologies for deep CNNs. Incorporating the concepts of knowledge inheritance and knowledge learning, our evolutionary algorithm can be executed with limited computing resources. We applied the proposed algorithm to find effective topologies of deep CNNs for the image classification task on the CIFAR-10 dataset. After the evolution, we analyzed the topologies that performed well for this task. Our studies verify the techniques that have been commonly used in human-designed deep CNNs. We also discovered that some of the graph properties greatly affect the system performance. We applied the guidelines learned from the evolution and designed new network topologies that outperform Residual Net with fewer layers on the CIFAR-10, CIFAR-100, and SVHN datasets. |
Tasks | Image Classification |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03242v1 |
http://arxiv.org/pdf/1809.03242v1.pdf | |
PWC | https://paperswithcode.com/paper/finding-better-topologies-for-deep |
Repo | |
Framework | |
An Aggressive Genetic Programming Approach for Searching Neural Network Structure Under Computational Constraints
Title | An Aggressive Genetic Programming Approach for Searching Neural Network Structure Under Computational Constraints |
Authors | Zhe Li, Xuehan Xiong, Zhou Ren, Ning Zhang, Xiaoyu Wang, Tianbao Yang |
Abstract | Recently, there has been revived interest in designing automatic programs (e.g., using genetic/evolutionary algorithms) to optimize the structure of Convolutional Neural Networks (CNNs) for a specific task. The challenge in designing such programs lies in how to balance the large search space of network structures against high computational costs. Existing works either impose strong restrictions on the search space or use enormous computing resources. In this paper, we study how to design a genetic programming approach for optimizing the structure of a CNN for a given task under limited computational resources, yet without imposing strong restrictions on the search space. To reduce the computational costs, we propose two general strategies that are observed to be helpful: (i) aggressively selecting the strongest individuals for survival and reproduction, and killing weaker individuals at a very early age; (ii) increasing mutation frequency to encourage diversity and faster evolution. The combined strategy, together with additional optimization techniques, allows us to explore a large search space at affordable computational costs. Our results on standard benchmark datasets (MNIST, SVHN, CIFAR-10, CIFAR-100) are competitive with similar approaches at significantly reduced computational costs. |
Tasks | |
Published | 2018-06-03 |
URL | http://arxiv.org/abs/1806.00851v1 |
http://arxiv.org/pdf/1806.00851v1.pdf | |
PWC | https://paperswithcode.com/paper/an-aggressive-genetic-programming-approach |
Repo | |
Framework | |
CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping
Title | CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping |
Authors | Haitian Zheng, Mengqi Ji, Haoqian Wang, Yebin Liu, Lu Fang |
Abstract | Reference-based super-resolution (RefSR) super-resolves a low-resolution (LR) image given an external high-resolution (HR) reference image, where the reference image and the LR image share a similar viewpoint but differ by a significant resolution gap (8x). Existing RefSR methods work in a cascaded way, such as patch matching followed by a synthesis pipeline with two independently defined objective functions, leading to inter-patch misalignment, grid effects and inefficient optimization. To resolve these issues, we present CrossNet, an end-to-end and fully-convolutional deep neural network using cross-scale warping. Our network contains image encoders, cross-scale warping layers, and a fusion decoder: the encoders extract multi-scale features from both the LR and the reference images; the cross-scale warping layers spatially align the reference feature map with the LR feature map; the decoder finally aggregates feature maps from both domains to synthesize the HR output. Using cross-scale warping, our network is able to perform spatial alignment at pixel level in an end-to-end fashion, which improves on the existing schemes in both precision (around 2dB-4dB) and efficiency (more than 100 times faster). |
Tasks | Super-Resolution |
Published | 2018-07-27 |
URL | http://arxiv.org/abs/1807.10547v1 |
http://arxiv.org/pdf/1807.10547v1.pdf | |
PWC | https://paperswithcode.com/paper/crossnet-an-end-to-end-reference-based-super |
Repo | |
Framework | |
Stochastic Doubly Robust Gradient
Title | Stochastic Doubly Robust Gradient |
Authors | Kanghoon Lee, Jihye Choi, Moonsu Cha, Jung-Kwon Lee, Taeyoon Kim |
Abstract | When training a machine learning model with observational data, some values are often systematically missing. Learning from incomplete data in which the missingness depends on some covariates may lead to biased estimation of parameters and even harm the fairness of decision outcomes. This paper proposes a way to adjust for the causal effect of covariates on the missingness when training models using stochastic gradient descent (SGD). Inspired by the design of the doubly robust estimator and its theoretical property of double robustness, we introduce the stochastic doubly robust gradient (SDRG), consisting of two models: weight-corrected gradients for inverse propensity score weighting and per-covariate control variates for regression adjustment. Also, we identify the connection between double robustness and variance reduction in SGD by demonstrating the SDRG algorithm within a unifying framework for variance-reduced SGD. The performance of our approach is empirically tested by showing convergence when training image classifiers with several examples of missing data. |
Tasks | |
Published | 2018-12-21 |
URL | http://arxiv.org/abs/1812.08997v1 |
http://arxiv.org/pdf/1812.08997v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-doubly-robust-gradient |
Repo | |
Framework | |
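For readers unfamiliar with the doubly robust estimator that the abstract above takes as its inspiration, the classical form for estimating a mean under missing outcomes can be written as follows. This is the textbook estimator, not the SDRG update itself, and the notation is assumed here for illustration: $r_i \in \{0, 1\}$ indicates whether outcome $y_i$ is observed, $\hat{\pi}(x_i)$ is an estimated propensity of being observed, and $\hat{m}(x_i)$ is a regression estimate of the outcome.

```latex
\hat{\mu}_{\mathrm{DR}}
  \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \left[ \hat{m}(x_i) \;+\; \frac{r_i}{\hat{\pi}(x_i)} \bigl( y_i - \hat{m}(x_i) \bigr) \right]
```

The estimator is consistent if either $\hat{\pi}$ or $\hat{m}$ is correctly specified, which is the double robustness property; the first term plays the role of regression adjustment and the second that of inverse propensity score weighting.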
Regularized Evolution for Image Classifier Architecture Search
Title | Regularized Evolution for Image Classifier Architecture Search |
Authors | Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le |
Abstract | The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier, AmoebaNet-A, that surpasses hand-designs for the first time. To do this, we modify the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes. Matching size, AmoebaNet-A has comparable accuracy to current state-of-the-art ImageNet models discovered with more complex architecture-search methods. Scaled to larger size, AmoebaNet-A sets a new state-of-the-art 83.9% top-1 / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01548v7 |
http://arxiv.org/pdf/1802.01548v7.pdf | |
PWC | https://paperswithcode.com/paper/regularized-evolution-for-image-classifier |
Repo | |
Framework | |
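The tournament selection with an age property described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function and parameter names are hypothetical, and the toy search space (bit lists scored by their sum) stands in for real architectures and trained-model accuracy.

```python
import collections
import random


def regularized_evolution(cycles, population_size, sample_size,
                          random_arch, mutate, fitness, rng):
    """Tournament selection in which the oldest individual is removed each cycle."""
    population = collections.deque()  # left end = oldest individual
    history = []                      # every individual ever evaluated

    # Seed the population with random architectures.
    while len(population) < population_size:
        arch = random_arch(rng)
        individual = (arch, fitness(arch))
        population.append(individual)
        history.append(individual)

    # Evolve until `cycles` individuals have been evaluated in total.
    while len(history) < cycles:
        # Tournament: draw a random sample and pick its fittest member as parent.
        sample = [rng.choice(population) for _ in range(sample_size)]
        parent = max(sample, key=lambda ind: ind[1])
        child_arch = mutate(parent[0], rng)
        child = (child_arch, fitness(child_arch))
        population.append(child)
        history.append(child)
        # Age property: discard the oldest individual, regardless of its fitness.
        population.popleft()

    return max(history, key=lambda ind: ind[1])


# Hypothetical toy search space: an "architecture" is a list of 8 bits.
def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(8)]


def flip_one_bit(arch, rng):
    child = list(arch)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


if __name__ == "__main__":
    best = regularized_evolution(200, 10, 3, random_bits, flip_one_bit,
                                 sum, random.Random(0))
    print(best)
```

Removing the oldest individual rather than the worst one means that a genotype survives only if its descendants keep being selected, which is one way to read the abstract's phrase about favoring younger genotypes.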
Semantic and Contrast-Aware Saliency
Title | Semantic and Contrast-Aware Saliency |
Authors | Xiaoshuai Sun |
Abstract | In this paper, we propose an integrated model of semantic-aware and contrast-aware saliency combining both bottom-up and top-down cues for effective saliency estimation and eye fixation prediction. The proposed model processes visual information using two pathways. The first pathway aims to capture the attractive semantic information in images, especially the presence of meaningful objects and object parts such as human faces. The second pathway is based on multi-scale on-line feature learning and information maximization, which learns an adaptive sparse representation for the input and discovers the high contrast salient patterns within the image context. The two pathways characterize both long-term and short-term attention cues and are integrated dynamically using maxima normalization. We investigate two different implementations of the semantic pathway, an end-to-end deep neural network solution and a dynamic feature integration solution, resulting in the SCA and SCAFI models, respectively. Experimental results on artificial images and 5 popular benchmark datasets demonstrate the superior performance and better plausibility of the proposed model over both classic approaches and recent deep models. |
Tasks | Saliency Prediction |
Published | 2018-11-09 |
URL | http://arxiv.org/abs/1811.03736v1 |
http://arxiv.org/pdf/1811.03736v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-and-contrast-aware-saliency |
Repo | |
Framework | |
Motion Blur removal via Coupled Autoencoder
Title | Motion Blur removal via Coupled Autoencoder |
Authors | Kavya Gupta, Brojeshwar Bhowmick, Angshul Majumdar |
Abstract | In this paper, a joint optimization technique is proposed for the coupled autoencoder, which learns the autoencoder weights and the coupling map (between source and target) simultaneously. The technique is applicable to any transfer learning problem. In this work, we propose a new formulation that recasts deblurring as a transfer learning problem, which is solved using the proposed coupled autoencoder. The proposed technique can operate on-the-fly, since it does not require solving any costly inverse problem. In experiments against state-of-the-art techniques, our method yields better-quality images in shorter operating times. |
Tasks | Deblurring, Transfer Learning |
Published | 2018-12-24 |
URL | http://arxiv.org/abs/1812.09888v1 |
http://arxiv.org/pdf/1812.09888v1.pdf | |
PWC | https://paperswithcode.com/paper/motion-blur-removal-via-coupled-autoencoder |
Repo | |
Framework | |
Explorations in an English Poetry Corpus: A Neurocognitive Poetics Perspective
Title | Explorations in an English Poetry Corpus: A Neurocognitive Poetics Perspective |
Authors | Arthur M. Jacobs |
Abstract | This paper describes a corpus of about 3000 English literary texts with about 250 million words extracted from the Gutenberg project that span a range of genres from both fiction and non-fiction written by more than 130 authors (e.g., Darwin, Dickens, Shakespeare). Quantitative Narrative Analysis (QNA) is used to explore a cleaned subcorpus, the Gutenberg English Poetry Corpus (GEPC) which comprises over 100 poetic texts with around 2 million words from about 50 authors (e.g., Keats, Joyce, Wordsworth). Some exemplary QNA studies show author similarities based on latent semantic analysis, significant topics for each author or various text-analytic metrics for George Eliot’s poem ‘How Lisa Loved the King’ and James Joyce’s ‘Chamber Music’, concerning e.g. lexical diversity or sentiment analysis. The GEPC is particularly suited for research in Digital Humanities, Natural Language Processing or Neurocognitive Poetics, e.g. as training and test corpus, or for stimulus development and control. |
Tasks | Sentiment Analysis |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.02054v1 |
http://arxiv.org/pdf/1801.02054v1.pdf | |
PWC | https://paperswithcode.com/paper/explorations-in-an-english-poetry-corpus-a |
Repo | |
Framework | |
Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space
Title | Greedy Algorithms for Approximating the Diameter of Machine Learning Datasets in Multidimensional Euclidean Space |
Authors | Ahmad B. Hassanat |
Abstract | Finding the diameter of a dataset in multidimensional Euclidean space is a well-established problem, with well-known algorithms. However, most of the algorithms found in the literature do not scale well with large values of data dimension, so the time complexity grows exponentially in most cases, which makes these algorithms impractical. Therefore, we implemented 4 simple greedy algorithms for approximating the diameter of a multidimensional dataset; these are based on minimum/maximum l2 norms, hill climbing search, Tabu search and Beam search approaches, respectively. The time complexity of the implemented algorithms is near-linear, as they scale near-linearly with data size and its dimensions. The results of the experiments (conducted on different machine learning data sets) demonstrate the efficiency of the implemented algorithms, which can therefore be recommended for finding the diameter in different machine learning applications when needed. |
Tasks | |
Published | 2018-08-10 |
URL | http://arxiv.org/abs/1808.03566v1 |
http://arxiv.org/pdf/1808.03566v1.pdf | |
PWC | https://paperswithcode.com/paper/greedy-algorithms-for-approximating-the |
Repo | |
Framework | |
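To make the idea of a greedy diameter approximation in the abstract above more concrete, here is a minimal hill-climbing sketch. It is a generic farthest-point heuristic written for illustration and is not taken from the paper; the function name, the `start` index and the `max_iters` cap are assumptions. Each sweep costs one pass over the data, so the total cost stays near-linear in the number of points and dimensions when only a few sweeps are needed.

```python
import math


def approx_diameter(points, start=0, max_iters=100):
    """Greedy lower bound on the diameter: repeatedly jump to the farthest point."""
    current = start
    best = 0.0
    pair = (start, start)
    for _ in range(max_iters):
        # One linear sweep: find the point farthest from the current one.
        far = max(range(len(points)),
                  key=lambda j: math.dist(points[current], points[j]))
        d = math.dist(points[current], points[far])
        if d <= best:
            break  # no improvement, so stop climbing
        best, pair = d, (current, far)
        current = far
    return best, pair


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
    print(approx_diameter(pts))
```

The returned value is always the distance between two actual data points, so it can never exceed the true diameter; how close it gets depends on the data and the starting point.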
Intrinsic Dimension of Geometric Data Sets
Title | Intrinsic Dimension of Geometric Data Sets |
Authors | Tom Hanika, Friedrich Martin Schneider, Gerd Stumme |
Abstract | The curse of dimensionality is a phenomenon frequently observed in machine learning (ML) and knowledge discovery (KD). There is a large body of literature investigating its origin and impact, using methods from mathematics as well as from computer science. Among the mathematical insights into data dimensionality, there is an intimate link between the dimension curse and the phenomenon of measure concentration, which makes the former accessible to methods of geometric analysis. The present work provides a comprehensive study of the intrinsic geometry of a data set, based on Gromov’s metric measure geometry and Pestov’s axiomatic approach to intrinsic dimension. In detail, we define a concept of geometric data set and introduce a metric as well as a partial order on the set of isomorphism classes of such data sets. Based on these objects, we propose and investigate an axiomatic approach to the intrinsic dimension of geometric data sets and establish a concrete dimension function with the desired properties. Our mathematical model for data sets and their intrinsic dimension is computationally feasible and, moreover, adaptable to specific ML/KD-algorithms, as illustrated by various experiments. |
Tasks | |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.07985v2 |
http://arxiv.org/pdf/1801.07985v2.pdf | |
PWC | https://paperswithcode.com/paper/intrinsic-dimension-of-geometric-data-sets |
Repo | |
Framework | |
Tartan: A retrieval-based socialbot powered by a dynamic finite-state machine architecture
Title | Tartan: A retrieval-based socialbot powered by a dynamic finite-state machine architecture |
Authors | George Larionov, Zachary Kaden, Hima Varsha Dureddy, Gabriel Bayomi T. Kalejaiye, Mihir Kale, Srividya Pranavi Potharaju, Ankit Parag Shah, Alexander I Rudnicky |
Abstract | This paper describes the Tartan conversational agent built for the 2018 Alexa Prize Competition. Tartan is a non-goal-oriented socialbot focused around providing users with an engaging and fluent casual conversation. Tartan’s key features include an emphasis on structured conversation based on flexible finite-state models and an approach focused on understanding and using conversational acts. To provide engaging conversations, Tartan blends script-like yet dynamic responses with data-based generative and retrieval models. Unique to Tartan is that our dialog manager is modeled as a dynamic Finite State Machine. To our knowledge, no other conversational agent implementation has followed this specific structure. |
Tasks | |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01260v1 |
http://arxiv.org/pdf/1812.01260v1.pdf | |
PWC | https://paperswithcode.com/paper/tartan-a-retrieval-based-socialbot-powered-by |
Repo | |
Framework | |
Multi-scenario deep learning for multi-speaker source separation
Title | Multi-scenario deep learning for multi-speaker source separation |
Authors | Jeroen Zegers, Hugo Van hamme |
Abstract | Research in deep learning for multi-speaker source separation has received a boost in recent years. However, most studies are restricted to mixtures of a specific number of speakers, called a specific scenario. While some works have included experiments for different scenarios, research on combining data from different scenarios or creating a single model for multiple scenarios has been very rare. In this work, it is shown that data from one specific scenario is relevant for solving another scenario. Furthermore, it is concluded that a single model trained on different scenarios is capable of matching the performance of scenario-specific models. |
Tasks | Multi-Speaker Source Separation |
Published | 2018-08-24 |
URL | http://arxiv.org/abs/1808.08095v1 |
http://arxiv.org/pdf/1808.08095v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-scenario-deep-learning-for-multi |
Repo | |
Framework | |
Channel Splitting Network for Single MR Image Super-Resolution
Title | Channel Splitting Network for Single MR Image Super-Resolution |
Authors | Xiaole Zhao, Yulun Zhang, Tao Zhang, Xueming Zou |
Abstract | High-resolution magnetic resonance (MR) imaging is desirable in many clinical applications because it contributes to more accurate subsequent analyses and early clinical diagnoses. Single image super-resolution (SISR) is an effective and cost-efficient alternative technique to improve the spatial resolution of MR images. In the past few years, SISR methods based on deep learning techniques, especially convolutional neural networks (CNNs), have achieved state-of-the-art performance on natural images. However, the information is gradually weakened and training becomes increasingly difficult as the network deepens. The problem is more serious for medical images because the lack of high-quality and effective training samples makes deep models prone to underfitting or overfitting. Moreover, many current models treat the hierarchical features on different channels equivalently, which does not help the models handle the hierarchical features in a discriminative and targeted way. To this end, we present a novel channel splitting network (CSN) to ease the representational burden of deep models. The proposed CSN model divides the hierarchical features into two branches, i.e., a residual branch and a dense branch, with different information transmissions. The residual branch is able to promote feature reuse, while the dense branch is beneficial to the exploration of new features. In addition, we adopt the merge-and-run mapping to facilitate information integration between different branches. Extensive experiments on various MR images, including proton density (PD), T1 and T2 images, show that the proposed CSN model achieves superior performance over other state-of-the-art SISR methods. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-10-15 |
URL | https://arxiv.org/abs/1810.06453v3 |
https://arxiv.org/pdf/1810.06453v3.pdf | |
PWC | https://paperswithcode.com/paper/channel-splitting-network-for-single-mr-image |
Repo | |
Framework | |