Paper Group ANR 849
Sample Compression, Support Vectors, and Generalization in Deep Learning. Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network. Deep Convolutional Neural Networks in the Face of Caricature: Identity and Image Revealed. WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling. Cost-Aware Learning and Optimization …
Sample Compression, Support Vectors, and Generalization in Deep Learning
Title | Sample Compression, Support Vectors, and Generalization in Deep Learning |
Authors | Christopher Snyder, Sriram Vishwanath |
Abstract | Even though Deep Neural Networks (DNNs) are widely celebrated for their practical performance, they possess many intriguing properties related to depth that are difficult to explain both theoretically and intuitively. Understanding how weights in deep networks coordinate across layers to form useful learners has proven challenging, in part because the repeated composition of nonlinearities has proved intractable. This paper presents a reparameterization of DNNs as a linear function of a feature map that is locally independent of the weights. This feature map transforms depth-dependencies into simple tensor products and maps each input to a discrete subset of the feature space. Then, using a max-margin assumption, the paper develops a sample compression representation of the neural network in terms of the discrete activation state of neurons induced by s "support vectors". The paper shows that the number of support vectors s relates to learning guarantees for neural networks through sample compression bounds, yielding a sample complexity of O(ns/ε) for networks with n neurons. Finally, the number of support vectors s is found to monotonically increase with width and label noise but decrease with depth. |
Tasks | |
Published | 2018-11-05 |
URL | https://arxiv.org/abs/1811.02067v4 |
PDF | https://arxiv.org/pdf/1811.02067v4.pdf |
PWC | https://paperswithcode.com/paper/generalization-bounds-for-neural-networks |
Repo | |
Framework | |
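The paper's reparameterization can be illustrated in miniature: once an input's ReLU activation pattern is fixed, the network output is linear in a pattern-dependent feature map. Below is a one-hidden-layer numpy sketch of this idea (the paper generalizes it to depth via tensor products; the dimensions and names here are purely illustrative):

```python
import numpy as np

# For a fixed input, a ReLU network's output is a linear function once the
# discrete activation state is known: y = W2 D(x) W1 x with D(x) diagonal 0/1.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 4)), rng.normal(size=(1, 8))
x = rng.normal(size=4)

h = W1 @ x
pattern = (h > 0).astype(float)       # discrete activation state of the neurons
y = W2 @ (pattern * h)                # ordinary forward pass

D = np.diag(pattern)                  # gating matrix induced by the input
y_linear = W2 @ D @ W1 @ x            # same output, now linear given D
assert np.allclose(y, y_linear)
```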
Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network
Title | Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network |
Authors | Daan de Geus, Panagiotis Meletis, Gijs Dubbelman |
Abstract | We present a single-network method for panoptic segmentation. This method combines the predictions from a jointly trained semantic and instance segmentation network using heuristics. Joint training is the first step towards an end-to-end panoptic segmentation network, and is faster and more memory efficient than training and predicting with two networks, as done in previous work. The architecture consists of a ResNet-50 feature extractor shared by the semantic segmentation and instance segmentation branches. For instance segmentation, a Mask R-CNN type of architecture is used, while the semantic segmentation branch is augmented with a Pyramid Pooling Module. Results for this method were submitted to the COCO and Mapillary Joint Recognition Challenge 2018. Our approach achieves a PQ score of 17.6 on the Mapillary Vistas validation set and 27.2 on the COCO test-dev set. |
Tasks | Instance Segmentation, Panoptic Segmentation, Semantic Segmentation |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02110v2 |
PDF | http://arxiv.org/pdf/1809.02110v2.pdf |
PWC | https://paperswithcode.com/paper/panoptic-segmentation-with-a-joint-semantic |
Repo | |
Framework | |
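A hedged sketch of the kind of heuristic merging step the abstract describes: paste instance masks in order of confidence, then fill remaining pixels from the semantic ("stuff") prediction. The authors' exact rules and the `1000 * cls + inst_id` panoptic-id encoding are assumptions made here for illustration:

```python
import numpy as np

def merge_panoptic(semantic, instances):
    """semantic: (H, W) class ids; instances: list of (mask, class_id, score)."""
    panoptic = semantic.copy()
    taken = np.zeros(semantic.shape, dtype=bool)
    for inst_id, (mask, cls, score) in enumerate(
            sorted(instances, key=lambda t: -t[2]), start=1):
        free = mask & ~taken                   # don't overwrite higher-scoring instances
        panoptic[free] = 1000 * cls + inst_id  # encode (class, instance id) per pixel
        taken |= free
    return panoptic
```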
Deep Convolutional Neural Networks in the Face of Caricature: Identity and Image Revealed
Title | Deep Convolutional Neural Networks in the Face of Caricature: Identity and Image Revealed |
Authors | Matthew Q. Hill, Connor J. Parde, Carlos D. Castillo, Y. Ivette Colon, Rajeev Ranjan, Jun-Cheng Chen, Volker Blanz, Alice J. O’Toole |
Abstract | Real-world face recognition requires an ability to perceive the unique features of an individual face across multiple, variable images. The primate visual system solves the problem of image invariance using cascades of neurons that convert images of faces into categorical representations of facial identity. Deep convolutional neural networks (DCNNs) also create generalizable face representations, but with cascades of simulated neurons. DCNN representations can be examined in a multidimensional “face space”, with identities and image parameters quantified via their projections onto the axes that define the space. We examined the organization of viewpoint, illumination, gender, and identity in this space. We show that the network creates a highly organized, hierarchically nested, face similarity structure in which information about face identity and imaging characteristics coexist. Natural image variation is accommodated in this hierarchy, with face identity nested under gender, illumination nested under identity, and viewpoint nested under illumination. To examine identity, we caricatured faces and found that network identification accuracy increased with caricature level, and, mimicking human perception, a caricatured distortion of a face “resembled” its veridical counterpart. Caricatures improved performance by moving the identity away from other identities in the face space and minimizing the effects of illumination and viewpoint. Deep networks produce face representations that solve long-standing computational problems in generalized face recognition. They also provide a unitary theoretical framework for reconciling decades of behavioral and neural results that emphasized either the image or the object/face in representations, without understanding how a neural code could seamlessly accommodate both. |
Tasks | Caricature, Face Recognition |
Published | 2018-12-28 |
URL | http://arxiv.org/abs/1812.10902v1 |
PDF | http://arxiv.org/pdf/1812.10902v1.pdf |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-networks-in-the |
Repo | |
Framework | |
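The caricaturing operation applied in the DCNN's face space amounts to exaggerating a descriptor's offset from the mean face. A minimal illustration, assuming embeddings are plain vectors (function and parameter names are hypothetical):

```python
import numpy as np

def caricature(embedding, mean_face, level=1.5):
    # Exaggerate a face descriptor by scaling its offset from the mean of the
    # space; level > 1 moves the identity away from other identities, which
    # the paper reports improves network identification accuracy.
    return mean_face + level * (embedding - mean_face)

veridical = np.array([0.2, -0.1, 0.4])
mean_face = np.zeros(3)
exaggerated = caricature(veridical, mean_face, level=2.0)
```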
WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling
Title | WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling |
Authors | Hao Zhang, Bo Chen, Dandan Guo, Mingyuan Zhou |
Abstract | To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes. The generative network of WHAI has a hierarchy of gamma distributions, while the inference network of WHAI is a Weibull upward-downward variational autoencoder, which integrates a deterministic-upward deep neural network and a stochastic-downward deep generative model based on a hierarchy of Weibull distributions. The Weibull distribution can closely approximate a gamma distribution with an analytic Kullback-Leibler divergence, and has a simple reparameterization via uniform noise, which helps to efficiently compute the gradients of the evidence lower bound with respect to the parameters of the inference network. The effectiveness and efficiency of WHAI are illustrated with experiments on big corpora. |
Tasks | |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01328v1 |
PDF | http://arxiv.org/pdf/1803.01328v1.pdf |
PWC | https://paperswithcode.com/paper/whai-weibull-hybrid-autoencoding-inference |
Repo | |
Framework | |
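The two Weibull properties the abstract leans on are easy to state concretely: sampling reduces to a deterministic transform of uniform noise, and the KL divergence to a gamma prior has a closed form. A sketch under those assumptions (the KL expression follows the form stated in the paper for shape/rate parameterizations; treat it as a transcription to be checked against the source):

```python
import numpy as np
from scipy.special import gamma as gamma_fn, gammaln

EULER = 0.5772156649015329  # Euler-Mascheroni constant

def sample_weibull(k, lam, size=None, rng=np.random.default_rng()):
    # Inverse-CDF reparameterization: x = lam * (-log(1 - u))**(1/k),
    # differentiable in (k, lam), enabling low-variance ELBO gradients.
    u = rng.uniform(size=size)
    return lam * (-np.log1p(-u)) ** (1.0 / k)

def kl_weibull_gamma(k, lam, alpha, beta):
    # Analytic KL( Weibull(k, lam) || Gamma(alpha, beta) ).
    return (EULER * alpha / k - alpha * np.log(lam) + np.log(k)
            + beta * lam * gamma_fn(1.0 + 1.0 / k)
            - EULER - 1.0 - alpha * np.log(beta) + gammaln(alpha))
```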
Cost-Aware Learning and Optimization for Opportunistic Spectrum Access
Title | Cost-Aware Learning and Optimization for Opportunistic Spectrum Access |
Authors | Chao Gan, Ruida Zhou, Jing Yang, Cong Shen |
Abstract | In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We investigate a discrete time model where the time axis is partitioned into frames. Each frame consists of a sensing phase, followed by a transmission phase. During the sensing phase, the user is able to sense a subset of channels sequentially before it decides to use one of them in the following transmission phase. We assume the channel states alternate between busy and idle according to independent Bernoulli random processes from frame to frame. To capture the inherent uncertainty in channel sensing, we assume the reward of each transmission when the channel is idle is a random variable. We also associate random costs with sensing and transmission actions. Our objective is to understand how the costs and reward of the actions affect the optimal behavior of the user in both offline and online settings, and to design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward minus cost). We start with an offline setting where the statistics of the channel status, costs and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. With such insights, we then study the online setting, where the statistical information of the channels, costs and reward is unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales as O(log T). We also establish a matching lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04048v1 |
PDF | http://arxiv.org/pdf/1804.04048v1.pdf |
PWC | https://paperswithcode.com/paper/cost-aware-learning-and-optimization-for |
Repo | |
Framework | |
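The net-reward bookkeeping behind the model is simple to write down. The sketch below computes a single frame's expected reward-minus-cost and applies a single-threshold stopping rule; this is a deliberate simplification of the paper's recursive double-threshold policy, and all names are illustrative:

```python
def expected_net_reward(p_idle, mean_reward, c_sense, c_tx):
    # Pay the sensing cost up front; pay the transmission cost and collect
    # the reward only if the sensed channel turns out to be idle.
    return -c_sense + p_idle * (mean_reward - c_tx)

def choose_channel(idle_probs, mean_reward, c_sense, c_tx, threshold):
    # Toy sequential rule: take the first channel worth sensing; otherwise
    # skip the frame because expected costs outweigh the expected reward.
    for i, p in enumerate(idle_probs):
        if expected_net_reward(p, mean_reward, c_sense, c_tx) >= threshold:
            return i
    return None
```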
Comparison of non-linear activation functions for deep neural networks on MNIST classification task
Title | Comparison of non-linear activation functions for deep neural networks on MNIST classification task |
Authors | Dabal Pedamonti |
Abstract | Activation functions play a key role in neural networks, so it is fundamental to understand their advantages and disadvantages in order to achieve better performance. This paper first introduces common types of non-linear activation functions that are alternatives to the well-known sigmoid function, and then evaluates their characteristics. Deeper neural networks are also analysed, because they positively influence final performance compared to shallower networks. Since performance also strictly depends on weight initialisation, the effect of drawing weights from Gaussian and uniform distributions is analysed, with particular attention to how the number of incoming and outgoing connections to a node influences the whole network. |
Tasks | |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02763v1 |
PDF | http://arxiv.org/pdf/1804.02763v1.pdf |
PWC | https://paperswithcode.com/paper/comparison-of-non-linear-activation-functions |
Repo | |
Framework | |
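For reference, the usual sigmoid alternatives such a comparison covers, plus a fan-in/fan-out-aware initialiser of the kind the abstract alludes to (the paper's exact selection of functions and initialisation scheme may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    return np.where(x > 0, x, a * x)   # tanh is available as np.tanh

def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    # Scale the uniform range by the number of incoming and outgoing
    # connections so activation variance is preserved across layers.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))
```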
Alive Caricature from 2D to 3D
Title | Alive Caricature from 2D to 3D |
Authors | Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai |
Abstract | Caricature is an art form that expresses subjects in an abstract, simple and exaggerated way. While many caricatures are 2D images, this paper presents an algorithm for creating expressive 3D caricatures from 2D caricature images with a minimum of user interaction. The key idea of our approach is to introduce an intrinsic deformation representation that has the capacity for extrapolation, enabling us to create a deformation space from a standard face dataset that maintains face constraints while being sufficiently large for producing exaggerated face models. Built upon the proposed deformation representation, an optimization model is formulated to find the 3D caricature that automatically captures the style of the 2D caricature image. The experiments show that our approach has a better capability of expressing caricatures than fitting approaches that directly use classical parametric face models such as 3DMM and FaceWareHouse. Moreover, our approach is based on standard face datasets and avoids constructing a complicated 3D caricature training set, which provides great flexibility in real applications. |
Tasks | Caricature |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06802v3 |
PDF | http://arxiv.org/pdf/1803.06802v3.pdf |
PWC | https://paperswithcode.com/paper/alive-caricature-from-2d-to-3d |
Repo | |
Framework | |
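The extrapolation property can be conveyed with a deliberately crude linear stand-in: push deformation coefficients beyond the range spanned by the standard face dataset. The paper's actual representation is intrinsic and nonlinear, which is precisely why it outperforms 3DMM-style fitting, so treat this only as intuition:

```python
import numpy as np

def reconstruct(mean_shape, basis, coeffs, exaggeration=1.0):
    # mean_shape: (3N,) flattened vertices; basis: (K, 3N) deformation
    # directions learned from regular faces; exaggeration > 1 extrapolates
    # beyond the training range to produce caricatured geometry.
    return mean_shape + (exaggeration * np.asarray(coeffs)) @ basis
```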
Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning
Title | Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning |
Authors | Takumi Akazaki, Shuang Liu, Yoriyuki Yamagata, Yihai Duan, Jianye Hao |
Abstract | With the rapid development of software and distributed computing, Cyber-Physical Systems (CPS) are widely adopted in many application areas, e.g., smart grids and autonomous automobiles. It is difficult to detect defects in CPS models due to the complexities involved in the software and physical systems. To find defects in CPS models efficiently, robustness-guided falsification of CPS is introduced. Existing methods use several optimization techniques to generate counterexamples, which falsify the given properties of a CPS. However, those methods may require a large number of simulation runs to find a counterexample and are far from practical. In this work, we explore state-of-the-art Deep Reinforcement Learning (DRL) techniques to reduce the number of simulation runs required to find such counterexamples. We report our method and the preliminary evaluation results. |
Tasks | |
Published | 2018-05-01 |
URL | http://arxiv.org/abs/1805.00200v1 |
PDF | http://arxiv.org/pdf/1805.00200v1.pdf |
PWC | https://paperswithcode.com/paper/falsification-of-cyber-physical-systems-using |
Repo | |
Framework | |
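Robustness-guided falsification itself is easy to demonstrate on a toy system: drive a quantitative robustness value negative to expose a counterexample. The sketch uses random search in place of the paper's DRL agent, and the system, property, and budget are all made up for illustration:

```python
import random

def simulate(u):
    # Toy first-order system driven by the input sequence u.
    x, trace = 0.0, []
    for a in u:
        x = 0.9 * x + a
        trace.append(x)
    return trace

def robustness(trace):
    # Quantitative margin of the safety property |x(t)| <= 1:
    # positive means satisfied, negative means falsified.
    return 1.0 - max(abs(x) for x in trace)

def falsify(trials=1000, horizon=20, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-0.3, 0.3) for _ in range(horizon)]
        if robustness(simulate(u)) < 0:
            return u                    # counterexample input signal found
    return None                         # no violation within the budget
```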
Understanding Regularization to Visualize Convolutional Neural Networks
Title | Understanding Regularization to Visualize Convolutional Neural Networks |
Authors | Maximilian Baust, Florian Ludwig, Christian Rupprecht, Matthias Kohl, Stefan Braunewell |
Abstract | Variational methods for revealing visual concepts learned by convolutional neural networks have gained significant attention in recent years. Being based on noisy gradients obtained via back-propagation, such methods require the application of regularization strategies. We present a mathematical framework unifying previously employed regularization methods. Within this framework, we propose a novel technique based on Sobolev gradients which can be implemented via convolutions and does not require specialized numerical treatment, such as total variation regularization. The experiments performed on feature inversion and activation maximization demonstrate the benefit of a unified approach to regularization, such as sharper reconstructions via the proposed Sobolev filters and better control over the reconstructed scales. |
Tasks | |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1805.00071v1 |
PDF | http://arxiv.org/pdf/1805.00071v1.pdf |
PWC | https://paperswithcode.com/paper/understanding-regularization-to-visualize |
Repo | |
Framework | |
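The convolution-based regularization the abstract mentions can be pictured as filtering the noisy backpropagated gradient before each ascent step. A sketch with a Gaussian filter standing in for the paper's Sobolev filters (grad_fn, learning rate, and sigma are placeholders):

```python
from scipy.ndimage import gaussian_filter

def activation_maximization_step(img, grad_fn, lr=1.0, sigma=2.0):
    # grad_fn returns the (noisy) gradient of the target activation w.r.t.
    # the image. Smoothing it by convolution is the Sobolev-gradient idea:
    # take the ascent step in a smoother function space.
    g = grad_fn(img)
    g = gaussian_filter(g, sigma=sigma)
    return img + lr * g
```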
Wrapped Loss Function for Regularizing Nonconforming Residual Distributions
Title | Wrapped Loss Function for Regularizing Nonconforming Residual Distributions |
Authors | Chun Ting Liu, Ming Chuan Yang, Meng Chang Chen |
Abstract | Multi-output prediction is essential in machine learning, but it can suffer from nonconforming residual distributions, i.e., the multi-output residual distributions do not conform to the expected distribution. In this paper, we propose the “Wrapped Loss Function” to wrap the original loss function and alleviate this problem. The wrapped loss function acts just like the original loss function, in that its gradient can be used for backpropagation-based optimization. Empirical evaluations show that the wrapped loss function yields faster convergence, better accuracy, and improved handling of imbalanced data. |
Tasks | |
Published | 2018-08-21 |
URL | https://arxiv.org/abs/1808.06733v2 |
PDF | https://arxiv.org/pdf/1808.06733v2.pdf |
PWC | https://paperswithcode.com/paper/wrapped-loss-function-for-regularizing |
Repo | |
Framework | |
An End-to-End Goal-Oriented Dialog System with a Generative Natural Language Response Generation
Title | An End-to-End Goal-Oriented Dialog System with a Generative Natural Language Response Generation |
Authors | Stefan Constantin, Jan Niehues, Alex Waibel |
Abstract | Recent advancements in deep learning have allowed the development of end-to-end trained goal-oriented dialog systems. Although these systems already achieve good performance, some simplifications limit their usage in real-life scenarios. In this work, we address two of these limitations: ignoring positional information and a fixed number of possible response candidates. We propose to use positional encodings in the input to model the word order of the user utterances. Furthermore, by using a feedforward neural network, we are able to generate the output word by word and are no longer restricted to a fixed number of possible response candidates. Using the positional encoding, we were able to achieve better accuracies on the Dialog bAbI Tasks, and using the feedforward neural network for generating the response, we were able to reduce computation time and memory consumption. |
Tasks | Goal-Oriented Dialog |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02279v2 |
PDF | http://arxiv.org/pdf/1803.02279v2.pdf |
PWC | https://paperswithcode.com/paper/an-end-to-end-goal-oriented-dialog-system |
Repo | |
Framework | |
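The positional-encoding side of the proposal can be sketched with the standard sinusoidal construction; whether the paper uses exactly this variant is an assumption, but the role is the same: inject word order into an otherwise order-agnostic encoder:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # pe[pos, 2i] = sin(pos / 10000^(2i/d)), pe[pos, 2i+1] = cos(...).
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Added to word embeddings, so identical bags of words in different orders
# receive different representations.
embeddings = np.zeros((7, 16)) + positional_encoding(7, 16)
```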
Seeded Graph Matching via Large Neighborhood Statistics
Title | Seeded Graph Matching via Large Neighborhood Statistics |
Authors | Elchanan Mossel, Jiaming Xu |
Abstract | We study a well known noisy model of the graph isomorphism problem. In this model, the goal is to perfectly recover the vertex correspondence between two edge-correlated Erdős-Rényi random graphs, with an initial seed set of correctly matched vertex pairs revealed as side information. For seeded problems, our result provides a significant improvement over previously known results. We show that it is possible to achieve the information-theoretic limit of graph sparsity in time polynomial in the number of vertices $n$. Moreover, we show the number of seeds needed for exact recovery in polynomial-time can be as low as $n^{3\epsilon}$ in the sparse graph regime (with the average degree smaller than $n^{\epsilon}$) and $\Omega(\log n)$ in the dense graph regime. Our results also shed light on the unseeded problem. In particular, we give sub-exponential time algorithms for sparse models and an $n^{O(\log n)}$ algorithm for dense models for some parameters, including some that are not covered by recent results of Barak et al. |
Tasks | Graph Matching |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10262v1 |
PDF | http://arxiv.org/pdf/1807.10262v1.pdf |
PWC | https://paperswithcode.com/paper/seeded-graph-matching-via-large-neighborhood |
Repo | |
Framework | |
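The seed-driven mechanism can be conveyed with a one-hop greedy simplification: match each unmatched vertex to the candidate sharing the most seed neighbors. The paper's statistics aggregate over large neighborhoods and come with recovery guarantees; this sketch has neither:

```python
def extend_matching(adj1, adj2, seeds):
    # adj1, adj2: 0/1 adjacency matrices (lists of lists); seeds: list of
    # correctly matched (u_in_G1, v_in_G2) pairs given as side information.
    matching = dict(seeds)
    used = set(matching.values())
    for u in range(len(adj1)):
        if u in matching:
            continue
        best_v, best_score = None, 0
        for v in range(len(adj2)):
            if v in used:
                continue
            # Count seed witnesses: seed edges present on both sides.
            score = sum(adj1[u][s] and adj2[v][t] for s, t in seeds)
            if score > best_score:
                best_v, best_score = v, score
        if best_v is not None:
            matching[u] = best_v
            used.add(best_v)
    return matching
```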
Adaptively Transforming Graph Matching
Title | Adaptively Transforming Graph Matching |
Authors | Fudong Wang, Nan Xue, Yipeng Zhang, Xiang Bai, Gui-Song Xia |
Abstract | Recently, many graph matching methods that incorporate pairwise constraints and that can be formulated as a quadratic assignment problem (QAP) have been proposed. Although these methods demonstrate promising results for the graph matching problem, they have high complexity in space or time. In this paper, we introduce an adaptively transforming graph matching (ATGM) method from the perspective of functional representation. More precisely, under a transformation formulation, we aim to match two graphs by minimizing the discrepancy between the original graph and the transformed graph. With a linear representation map of the transformation, the pairwise edge attributes of graphs are explicitly represented by unary node attributes, which enables us to reduce the space and time complexity significantly. Thanks to an efficient Frank-Wolfe-based optimization strategy, we can handle graphs with hundreds or even thousands of nodes within an acceptable amount of time. Meanwhile, because the transformation map preserves graph structures, a domain adaptation-based strategy is proposed to remove outliers. The experimental results demonstrate that our proposed method outperforms state-of-the-art graph matching algorithms. |
Tasks | Domain Adaptation, Graph Matching |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10160v1 |
PDF | http://arxiv.org/pdf/1807.10160v1.pdf |
PWC | https://paperswithcode.com/paper/adaptively-transforming-graph-matching |
Repo | |
Framework | |
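The Frank-Wolfe strategy mentioned in the abstract has a compact generic form over the set of doubly stochastic matrices, where the linear minimization oracle is a linear assignment problem; the ATGM objective itself is abstracted into grad_fn in this sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def frank_wolfe_matching(grad_fn, n, iters=50):
    # grad_fn: gradient of the matching objective at the current relaxed
    # solution P (an n x n doubly stochastic matrix).
    P = np.full((n, n), 1.0 / n)          # barycenter of the Birkhoff polytope
    for t in range(iters):
        G = grad_fn(P)
        r, c = linear_sum_assignment(G)   # best permutation w.r.t. the gradient
        S = np.zeros_like(P)
        S[r, c] = 1.0
        gamma = 2.0 / (t + 2.0)           # standard diminishing step size
        P = (1.0 - gamma) * P + gamma * S
    return P
```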
Distributed Gradient Descent with Coded Partial Gradient Computations
Title | Distributed Gradient Descent with Coded Partial Gradient Computations |
Authors | Emre Ozfatura, Sennur Ulukus, Deniz Gunduz |
Abstract | Coded computation techniques provide robustness against straggling servers in distributed computing, but they have the following limitations: first, they increase decoding complexity; second, they ignore computations carried out by straggling servers; and third, they are typically designed to recover the full gradient, and thus cannot trade gradient accuracy against per-iteration completion time. Here we introduce a hybrid approach, called coded partial gradient computation (CPGC), that benefits from the advantages of both coded and uncoded computation schemes, and reduces both the computation time and decoding complexity. |
Tasks | |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09271v1 |
PDF | http://arxiv.org/pdf/1811.09271v1.pdf |
PWC | https://paperswithcode.com/paper/distributed-gradient-descent-with-coded |
Repo | |
Framework | |
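The coded side of the hybrid is easiest to see in the classic three-worker example, where any two responses suffice to recover the full gradient. This is a Tandon-et-al-style gradient code used purely for illustration; CPGC's partial-gradient construction differs:

```python
import numpy as np

# Partial gradients over three data partitions.
g1, g2, g3 = np.array([1.0]), np.array([2.0]), np.array([3.0])

# Each worker sends one coded combination; one straggler can be tolerated.
w1 = 0.5 * g1 + g2
w2 = g2 - g3
w3 = 0.5 * g1 + g3

full = g1 + g2 + g3
assert np.allclose(w1 + w3, full)         # workers {1, 3} respond
assert np.allclose(2 * w1 - w2, full)     # workers {1, 2} respond
assert np.allclose(w2 + 2 * w3, full)     # workers {2, 3} respond
```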
Singular Values for ReLU Layers
Title | Singular Values for ReLU Layers |
Authors | Sören Dittmer, Emily J. King, Peter Maass |
Abstract | Despite their prevalence in neural networks, we still lack a thorough theoretical characterization of ReLU layers. This paper aims to further our understanding of ReLU layers by studying how the activation function ReLU interacts with the linear component of the layer, and what role this interaction plays in the success of the neural network in achieving its intended task. To this end, we introduce two new tools: ReLU singular values of operators and the Gaussian mean width of operators. By presenting, on the one hand, theoretical justifications, results, and interpretations of these two concepts and, on the other hand, numerical experiments and results of the ReLU singular values and the Gaussian mean width applied to trained neural networks, we hope to give a comprehensive, singular-value-centric view of ReLU layers. We find that ReLU singular values and the Gaussian mean width not only enable theoretical insights, but also provide metrics that seem promising for practical applications. In particular, these measures can be used to distinguish correctly from incorrectly classified data as it traverses the network. We conclude by introducing two tools based on our findings: double-layers and harmonic pruning. |
Tasks | |
Published | 2018-12-06 |
URL | https://arxiv.org/abs/1812.02566v2 |
PDF | https://arxiv.org/pdf/1812.02566v2.pdf |
PWC | https://paperswithcode.com/paper/singular-values-for-relu-layers |
Repo | |
Framework | |
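A crude way to get numerical intuition for a "ReLU singular value" is Monte Carlo: the largest ordinary singular value is the maximum of ||Ax|| over unit vectors x, and inserting the ReLU gives the quantity estimated below (the paper's formal definition and exact estimators may differ):

```python
import numpy as np

def relu_sv_estimate(A, samples=10000, rng=np.random.default_rng(0)):
    # Estimate max over unit x of || relu(A x) || by sampling the sphere.
    x = rng.normal(size=(samples, A.shape[1]))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    out = np.maximum(A @ x.T, 0.0)        # layer response, columns = samples
    return np.linalg.norm(out, axis=0).max()
```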