Paper Group ANR 555
Power Law in Sparsified Deep Neural Networks
Title | Power Law in Sparsified Deep Neural Networks |
Authors | Lu Hou, James T. Kwok |
Abstract | The power law has been observed in the degree distributions of many biological neural networks. Sparse deep neural networks, which learn an economical representation from the data, resemble biological neural networks in many ways. In this paper, we study whether these artificial networks also exhibit properties of the power law. Experimental results on two popular deep learning models, namely multilayer perceptrons and convolutional neural networks, are affirmative. The power law is also naturally related to preferential attachment. To study the dynamical properties of deep networks in continual learning, we propose an internal preferential attachment model to explain how the network topology evolves. Experimental results show that with the arrival of a new task, the new connections made follow this preferential attachment process. |
Tasks | Continual Learning |
Published | 2018-05-04 |
URL | http://arxiv.org/abs/1805.01891v1 |
http://arxiv.org/pdf/1805.01891v1.pdf | |
PWC | https://paperswithcode.com/paper/power-law-in-sparsified-deep-neural-networks |
Repo | |
Framework | |
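As a flavor of the analysis, the sketch below fits a power-law exponent to the degree distribution of a magnitude-pruned weight matrix. Everything here is illustrative: the layer size, the 95% pruning ratio, and the discrete maximum-likelihood estimator (in the style of Clauset et al.) are assumptions, and randomly initialized weights will not themselves yield a power law; the point is only to show how such an exponent is measured.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "sparsified" layer: dense Gaussian weights, then magnitude pruning.
W = rng.normal(size=(256, 512))
threshold = np.quantile(np.abs(W), 0.95)   # keep the top 5% of weights
mask = np.abs(W) >= threshold

# Degree of each output unit = number of surviving incoming connections.
degrees = mask.sum(axis=1)
degrees = degrees[degrees > 0]

# Discrete MLE for the power-law exponent (Clauset et al. style):
# alpha = 1 + n / sum(ln(k / (k_min - 0.5))).
k_min = degrees.min()
alpha = 1.0 + len(degrees) / np.log(degrees / (k_min - 0.5)).sum()
print(f"estimated power-law exponent: {alpha:.2f}")
```

On a genuinely power-law degree sequence this estimator recovers the exponent; here it simply quantifies whatever distribution pruning produced.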
Weight Initialization in Neural Language Models
Title | Weight Initialization in Neural Language Models |
Authors | Ameet Deshpande, Vedant Somani |
Abstract | Semantic Similarity is an important task that underpins many downstream NLP applications. Though the task is mathematically defined, its essence is to capture the notion of similarity as humans perceive it. Machines use heuristics to calculate the similarity between words, but these are typically corpus-dependent or useful only for specific domains. The difference between Semantic Similarity and Semantic Relatedness motivates the development of new algorithms. For a human, the words car and road are probably about as related as car and bus, but this may not be the case for computational methods. Ontological methods are good at encoding Semantic Similarity, while Vector Space models are better at encoding Semantic Relatedness, and there is a dearth of methods which leverage ontologies to create better vector representations. The aim of this proposal is to explore a hybrid method that combines statistical/vector-space methods like Word2Vec with ontological methods like WordNet, leveraging the advantages of both. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2018-05-12 |
URL | http://arxiv.org/abs/1805.06503v1 |
http://arxiv.org/pdf/1805.06503v1.pdf | |
PWC | https://paperswithcode.com/paper/weight-initialization-in-neural-language |
Repo | |
Framework | |
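The hybrid direction the proposal describes can be sketched as a convex blend of a distributional score and an ontological score. The embeddings and similarity values below are toy stand-ins (in practice they would come from Word2Vec via gensim and WordNet via nltk), and the blending weight `lam` is a made-up hyperparameter.

```python
import numpy as np

# Toy stand-ins for Word2Vec vectors and WordNet-style similarities.
embeddings = {
    "car": np.array([0.9, 0.1, 0.3]),
    "bus": np.array([0.8, 0.2, 0.35]),
    "road": np.array([0.7, 0.5, 0.1]),
}
ontology_sim = {  # hypothetical ontological similarity scores
    ("car", "bus"): 0.8,
    ("car", "road"): 0.2,
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def hybrid_similarity(w1, w2, lam=0.5):
    """Convex blend of distributional and ontological similarity."""
    dist = cosine(embeddings[w1], embeddings[w2])
    onto = ontology_sim.get((w1, w2), ontology_sim.get((w2, w1), 0.0))
    return lam * dist + (1.0 - lam) * onto

print(hybrid_similarity("car", "bus"))   # related and similar -> high
print(hybrid_similarity("car", "road"))  # related but not similar -> lower
```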
Analytic Network Learning
Title | Analytic Network Learning |
Authors | Kar-Ann Toh |
Abstract | Based on the property that solving a system of linear matrix equations via column-space and row-space projections boils down to an approximation in the least-squares-error sense, a formulation for learning the weight matrices of a multilayer network can be derived. By exploiting the vast number of feasible solutions for these interdependent weight matrices, learning can be performed analytically, layer by layer, without gradient computation after initialization. Possible initialization schemes include using the data matrix as initial weights and random initialization. The study is followed by an investigation into the representation capability and the output variance of the learning scheme. Extensive experimentation on synthetic and real-world data sets validates its numerical feasibility. |
Tasks | |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08227v1 |
http://arxiv.org/pdf/1811.08227v1.pdf | |
PWC | https://paperswithcode.com/paper/analytic-network-learning |
Repo | |
Framework | |
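The flavor of analytic, gradient-free layer-wise learning can be shown with a minimal two-layer sketch: randomly initialize the first layer (one of the initialization schemes the abstract mentions) and solve the second layer in the least-squares sense via the pseudoinverse. This is a simplification in the spirit of the paper, not the author's full algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data.
X = rng.normal(size=(200, 10))
Y = np.sin(X.sum(axis=1, keepdims=True))

# Layer 1: random initialization.
W1 = rng.normal(size=(10, 64))
H = np.tanh(X @ W1)

# Layer 2: solved analytically in the least-squares sense via the
# Moore-Penrose pseudoinverse -- no gradient computation involved.
W2 = np.linalg.pinv(H) @ Y

mse = np.mean((H @ W2 - Y) ** 2)
print(f"training MSE: {mse:.4f}")
```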
Scale calibration for high-dimensional robust regression
Title | Scale calibration for high-dimensional robust regression |
Authors | Po-Ling Loh |
Abstract | We present a new method for high-dimensional linear regression when a scale parameter of the additive errors is unknown. The proposed estimator is based on a penalized Huber $M$-estimator, for which theoretical results on estimation error have recently been established in the high-dimensional statistics literature. However, the variance of the error term in the linear model is intricately connected to the optimal parameter defining the shape of the Huber loss. Our main idea is to use an adaptive technique, based on Lepski's method, to overcome the difficulty of solving a joint nonconvex optimization problem with respect to the location and scale parameters. |
Tasks | Calibration |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02096v1 |
http://arxiv.org/pdf/1811.02096v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-calibration-for-high-dimensional-robust |
Repo | |
Framework | |
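To make the coupling between scale and Huber shape concrete, here is a deliberately crude sketch: fit a Huber regression by gradient descent at several candidate thresholds and pick the smallest one whose estimate agrees with all coarser fits, a caricature of Lepski-style adaptation. The grid, tolerance, and step sizes are all made-up choices, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear model with heavy-tailed noise of unknown scale.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 2.0 * rng.standard_t(df=2, size=n)

def huber_fit(X, y, delta, steps=2000, lr=0.05):
    """Gradient descent on the Huber loss with threshold delta;
    the gradient simply clips the residuals at +/- delta."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        r = y - X @ beta
        beta += lr * X.T @ np.clip(r, -delta, delta) / len(y)
    return beta

# Lepski-flavored selection: smallest delta consistent with all larger ones.
deltas = [0.5, 1.0, 2.0, 4.0, 8.0]
fits = {d: huber_fit(X, y, d) for d in deltas}
chosen = deltas[-1]
for i, d in enumerate(deltas):
    if all(np.linalg.norm(fits[d] - fits[dd]) < 0.5 for dd in deltas[i + 1:]):
        chosen = d
        break
print("selected delta:", chosen)
print("estimate:", np.round(fits[chosen], 2))
```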
Synaptic Strength For Convolutional Neural Network
Title | Synaptic Strength For Convolutional Neural Network |
Authors | Chen Lin, Zhao Zhong, Wei Wu, Junjie Yan |
Abstract | Convolutional Neural Networks (CNNs) are both computation- and memory-intensive, which hinders their deployment on mobile devices. Inspired by the relevant concept in the neuroscience literature, we propose Synaptic Pruning: a data-driven method to prune connections between input and output feature maps with a newly proposed class of parameters called Synaptic Strength. Synaptic Strength is designed to capture the importance of a connection based on the amount of information it transports. Experimental results show the effectiveness of our approach. On CIFAR-10, we prune up to 96% of the connections in various CNN models, which results in significant size reduction and computation saving. Further evaluation on ImageNet demonstrates that synaptic pruning is able to discover efficient models which are competitive with state-of-the-art compact CNNs such as MobileNet-V2 and NasNet-Mobile. Our contributions are summarized as follows: (1) We introduce Synaptic Strength, a new class of parameters for CNNs that indicates the importance of each connection. (2) Our approach can prune various CNNs with high compression without compromising accuracy. (3) Further investigation shows that the proposed Synaptic Strength is a better indicator for kernel pruning than the previous approach, in both empirical results and theoretical analysis. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02454v1 |
http://arxiv.org/pdf/1811.02454v1.pdf | |
PWC | https://paperswithcode.com/paper/synaptic-strength-for-convolutional-neural |
Repo | |
Framework | |
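A connection-level pruning step like the one described can be sketched in a few lines. Note that the paper's Synaptic Strength is a learned parameter per connection; the static L1-norm proxy below is an assumption used only to illustrate scoring and masking input-output connections of a convolutional layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Conv weights: (out_channels, in_channels, kH, kW); each (i, j) kernel
# slice is one input->output "connection".
W = rng.normal(size=(64, 32, 3, 3))

# Hypothetical importance score: L1 norm of each 2D kernel slice.
strength = np.abs(W).sum(axis=(2, 3))          # shape (64, 32)

prune_ratio = 0.96                             # matches the CIFAR-10 figure
cutoff = np.quantile(strength, prune_ratio)
mask = strength > cutoff                       # keep the strongest 4%

W_pruned = W * mask[:, :, None, None]
print(f"connections kept: {mask.mean():.1%}")
```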
Learning Decorrelated Hashing Codes for Multimodal Retrieval
Title | Learning Decorrelated Hashing Codes for Multimodal Retrieval |
Authors | Dayong Tian |
Abstract | In social networks, heterogeneous multimedia data correlate to each other, such as videos and their corresponding tags on YouTube and image-text pairs on Facebook. Nearest neighbor retrieval across multiple modalities on large data sets has become a hot yet challenging problem. Hashing is expected to be an efficient solution, since it represents data as binary codes; because bit-wise XOR operations can be handled quickly, retrieval time is greatly reduced. Few existing multimodal hashing methods consider the correlation among hashing bits, yet this correlation has a negative impact on the hashing codes: as the code length grows, retrieval performance improves more slowly. In this paper, we propose a minimum correlation regularization (MCR) for multimodal hashing. First, the sigmoid function is used to embed the data matrices. Then, the MCR is applied to the output of the sigmoid function. As the output of the sigmoid function approximates a binary code matrix, the proposed MCR can efficiently decorrelate the hashing codes. Experiments show that the superiority of the proposed method grows as the code length increases. |
Tasks | |
Published | 2018-03-02 |
URL | https://arxiv.org/abs/1803.00682v2 |
https://arxiv.org/pdf/1803.00682v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-decorrelated-hashing-codes-for |
Repo | |
Framework | |
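The shape of the proposed regularizer can be sketched directly: embed features with a sigmoid, center the relaxed codes, and penalize the distance between the bit-bit correlation matrix and the identity. The projection, code length, and penalty form below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(500, 32))   # features
W = rng.normal(size=(32, 16))    # projection onto 16 hashing bits

# Relaxed codes in (0, 1), shifted to (-1, 1) so the bits are centered.
B = 2.0 * sigmoid(X @ W) - 1.0

def min_correlation_penalty(B):
    """Frobenius penalty pushing the bit correlation matrix toward the
    identity, i.e. toward decorrelated hashing bits."""
    C = B.T @ B / B.shape[0]
    return np.sum((C - np.eye(C.shape[0])) ** 2)

print(f"MCR-style penalty: {min_correlation_penalty(B):.3f}")
```

Minimizing such a penalty alongside the retrieval loss is what keeps longer codes from carrying redundant bits.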
A Unified Particle-Optimization Framework for Scalable Bayesian Sampling
Title | A Unified Particle-Optimization Framework for Scalable Bayesian Sampling |
Authors | Changyou Chen, Ruiyi Zhang, Wenlin Wang, Bai Li, Liqun Chen |
Abstract | There has been recent interest in developing scalable Bayesian sampling methods, such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD), for big-data analysis. A standard SG-MCMC algorithm simulates samples from a discrete-time Markov chain to approximate a target distribution; thus the samples can be highly correlated, an undesired property for SG-MCMC. In contrast, SVGD directly optimizes a set of particles to approximate a target distribution, and is thus able to obtain good approximations with far fewer samples. In this paper, we propose a principled particle-optimization framework based on Wasserstein gradient flows to unify SG-MCMC and SVGD, and to allow new algorithms to be developed. Our framework interprets SG-MCMC as particle optimization on the space of probability measures, revealing a strong connection between SG-MCMC and SVGD. The key component of our framework is a set of particle-approximation techniques to efficiently solve the original partial differential equations on the space of probability measures. Extensive experiments on both synthetic data and deep neural networks demonstrate the effectiveness and efficiency of our framework for scalable Bayesian sampling. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11659v2 |
http://arxiv.org/pdf/1805.11659v2.pdf | |
PWC | https://paperswithcode.com/paper/a-unified-particle-optimization-framework-for |
Repo | |
Framework | |
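SVGD, one of the two methods the framework unifies, has a compact standard form: each particle moves along a kernel-weighted average of the other particles' log-density gradients plus a repulsive term. The sketch below runs vanilla SVGD on a toy 2-D Gaussian target with a fixed kernel bandwidth (a simplification; implementations usually use the median heuristic).

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    # Target: standard 2-D Gaussian, so grad log p(x) = -x.
    return -x

def svgd_direction(X, h=1.0):
    diff = X[:, None, :] - X[None, :, :]             # diff[i, j] = x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))  # RBF kernel matrix
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i).
    repulse = (diff / h ** 2 * K[:, :, None]).sum(axis=1)
    return (K @ grad_log_p(X) + repulse) / len(X)

X = rng.normal(loc=5.0, size=(100, 2))   # particles start far from target
for _ in range(500):
    X += 0.1 * svgd_direction(X)

print("particle mean (should be near 0):", np.round(X.mean(axis=0), 2))
print("particle var  (should be near 1):", np.round(X.var(axis=0), 2))
```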
Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations
Title | Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations |
Authors | Jennifer B. Erway, Joshua Griffin, Roummel F. Marcia, Riadh Omheni |
Abstract | Machine learning (ML) problems are often posed as highly nonlinear and nonconvex unconstrained optimization problems. Methods for solving ML problems based on stochastic gradient descent are easily scaled to very large problems but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update typically do not require manually tuning hyper-parameters but suffer from approximating a potentially indefinite Hessian with a positive-definite matrix. Hessian-free methods leverage the ability to perform Hessian-vector multiplication without needing the entire Hessian matrix, but each iteration's complexity is significantly greater than that of quasi-Newton methods. In this paper we propose an alternative approach for solving ML problems, based on a quasi-Newton trust-region framework for large-scale optimization that allows for indefinite Hessian approximations. Numerical experiments on a standard testing data set show that, with a fixed computational time budget, the proposed methods achieve better results than traditional limited-memory BFGS and Hessian-free methods. |
Tasks | |
Published | 2018-07-01 |
URL | https://arxiv.org/abs/1807.00251v3 |
https://arxiv.org/pdf/1807.00251v3.pdf | |
PWC | https://paperswithcode.com/paper/trust-region-algorithms-for-training |
Repo | |
Framework | |
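The core mechanic, taking trust-region steps while tolerating an indefinite Hessian, can be shown on a tiny nonconvex problem. The sketch uses the exact 2-D Hessian of the Rosenbrock function with a simple eigenvalue shift, rather than the paper's limited-memory quasi-Newton approximations, so it illustrates the trust-region logic only.

```python
import numpy as np

def f(x):      # Rosenbrock: a standard nonconvex test function
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])

def hess(x):
    return np.array([
        [2 - 400 * (x[1] - 3 * x[0] ** 2), -400 * x[0]],
        [-400 * x[0], 200.0],
    ])

x, delta = np.array([-1.2, 1.0]), 1.0            # start point, TR radius
for _ in range(200):
    g, H = grad(x), hess(x)
    # Handle indefiniteness: shift H so its smallest eigenvalue is positive.
    shift = max(0.0, 1e-3 - np.linalg.eigvalsh(H).min())
    p = np.linalg.solve(H + shift * np.eye(2), -g)
    if np.linalg.norm(p) > delta:                # clip the step to the region
        p *= delta / np.linalg.norm(p)
    pred = -(g @ p) - 0.5 * p @ H @ p            # model's predicted decrease
    rho = (f(x) - f(x + p)) / (pred + 1e-12)
    if rho > 0.1:                                # accept; maybe grow radius
        x = x + p
        if rho > 0.75:
            delta = min(2 * delta, 10.0)
    else:                                        # reject; shrink radius
        delta *= 0.5
print("minimizer (should approach [1, 1]):", np.round(x, 3))
```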
A Summary Description of the A2RD Project
Title | A Summary Description of the A2RD Project |
Authors | Juliao Braga, Joao Nuno Silva, Patricia Takako Endo, Nizam Omar |
Abstract | This paper describes the Autonomous Architecture Over Restricted Domains (A2RD) project. It begins by describing the context on which the project focuses, then describes the project and implementation models. It finishes by presenting the conceptual model of the environment, showing where the components, inputs, and facilities required for interaction among the intelligent agents of the various implementations stand, each in its own restricted routing domain (Autonomous System), which together make the Internet work. |
Tasks | |
Published | 2018-08-26 |
URL | http://arxiv.org/abs/1808.09293v3 |
http://arxiv.org/pdf/1808.09293v3.pdf | |
PWC | https://paperswithcode.com/paper/a-summary-description-of-the-a2rd-project |
Repo | |
Framework | |
Muscle Excitation Estimation in Biomechanical Simulation Using NAF Reinforcement Learning
Title | Muscle Excitation Estimation in Biomechanical Simulation Using NAF Reinforcement Learning |
Authors | Amir H. Abdi, Pramit Saha, Praneeth Srungarapu, Sidney Fels |
Abstract | Motor control is a set of time-varying muscle excitations which generate desired motions for a biomechanical system. Muscle excitations cannot be directly measured from live subjects; an alternative approach is to estimate muscle activations using inverse motion-driven simulation. In this article, we propose a deep reinforcement learning method to estimate the muscle excitations in simulated biomechanical systems. We introduce a custom-made reward function which incentivizes faster point-to-point tracking of the target motion. Moreover, we deploy two new techniques, namely episode-based hard update and dual-buffer experience replay, to avoid feedback training loops. The proposed method is tested in four simulated 2D and 3D environments with 6 to 24 axial muscles. The results show that the models were able to learn muscle excitations for given motions after nearly 100,000 simulated steps. Moreover, the root-mean-square error in point-to-point reaching of the target across experiments was less than 1% of the length of the domain of motion. Our reinforcement learning method differs from conventional dynamic approaches in that the muscle control is derived functionally by a set of distributed neurons, which can open paths for interpreting the neural activity underlying this phenomenon. |
Tasks | |
Published | 2018-09-17 |
URL | https://arxiv.org/abs/1809.06121v2 |
https://arxiv.org/pdf/1809.06121v2.pdf | |
PWC | https://paperswithcode.com/paper/muscle-excitation-estimation-in-biomechanical |
Repo | |
Framework | |
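The custom reward idea, incentivizing faster point-to-point tracking, can be sketched as a shaping term. The success radius, time penalty, and bonus below are invented values; the paper's actual reward, NAF networks, and the two replay techniques are not reproduced here.

```python
import numpy as np

def tracking_reward(pos, target, prev_dist, time_penalty=0.01):
    """Hypothetical shaping reward: pay for progress toward the target,
    charge a small per-step cost so faster reaching earns more, and add
    a bonus when the target is reached."""
    dist = float(np.linalg.norm(pos - target))
    progress = prev_dist - dist              # positive when moving closer
    bonus = 1.0 if dist < 0.05 else 0.0      # hypothetical success radius
    return progress - time_penalty + bonus, dist

# One step of a fake 2-D end effector moving toward the origin:
reward, dist = tracking_reward(np.array([0.3, 0.4]), np.zeros(2), prev_dist=0.6)
print(reward, dist)   # 0.09 0.5
```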
IL-Net: Using Expert Knowledge to Guide the Design of Furcated Neural Networks
Title | IL-Net: Using Expert Knowledge to Guide the Design of Furcated Neural Networks |
Authors | Khushmeen Sakloth, Wesley Beckner, Jim Pfaendtner, Garrett B. Goh |
Abstract | Deep neural networks (DNNs) excel at extracting patterns. Through representation learning and automated feature engineering on large datasets, such models have been highly successful in computer vision and natural language applications. Designing optimal network architectures from a principled or rational approach, however, has been less successful; the best current approaches use an additional machine learning algorithm to tune the network hyperparameters. Yet in many technical fields there exists established domain knowledge and understanding about the subject matter. In this work, we develop a novel furcated neural network architecture that uses domain knowledge as high-level design principles of the network. We demonstrate proof of concept by developing IL-Net, a furcated network for predicting the properties of ionic liquids, a class of complex multi-chemical entities. Compared to existing state-of-the-art approaches, we show that furcated networks can improve model accuracy by approximately 20-35% without using additional labeled data. Lastly, we distill two key design principles for furcated networks that can be adapted to other domains. |
Tasks | Automated Feature Engineering, Feature Engineering, Representation Learning |
Published | 2018-09-13 |
URL | http://arxiv.org/abs/1809.05127v1 |
http://arxiv.org/pdf/1809.05127v1.pdf | |
PWC | https://paperswithcode.com/paper/il-net-using-expert-knowledge-to-guide-the |
Repo | |
Framework | |
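A furcated architecture is easiest to see as a forward pass: domain knowledge dictates how the input descriptors split into branches that are fused before a shared head. The split into cation and anion feature groups below is a plausible assumption for ionic liquids, and the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return np.tanh(x @ w + b)

# Hypothetical domain-motivated split of the descriptors: one branch per
# chemical constituent -- the "furcation" -- before a shared head.
x_cation = rng.normal(size=(1, 8))   # 8 cation descriptors (made up)
x_anion = rng.normal(size=(1, 6))    # 6 anion descriptors (made up)

params = {
    "cation": (0.1 * rng.normal(size=(8, 16)), np.zeros(16)),
    "anion":  (0.1 * rng.normal(size=(6, 16)), np.zeros(16)),
    "head":   (0.1 * rng.normal(size=(32, 1)), np.zeros(1)),
}

h = np.concatenate(
    [dense(x_cation, *params["cation"]), dense(x_anion, *params["anion"])],
    axis=1,
)
y_hat = h @ params["head"][0] + params["head"][1]
print("predicted property:", y_hat.ravel())
```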
Attention-Aware Compositional Network for Person Re-identification
Title | Attention-Aware Compositional Network for Person Re-identification |
Authors | Jing Xu, Rui Zhao, Feng Zhu, Huaming Wang, Wanli Ouyang |
Abstract | Person re-identification (ReID) is the task of identifying pedestrians observed from different camera views based on visual appearance. It is challenging due to large pose variations, complex background clutter, and severe occlusions. Recently, the accuracy of human pose estimation by predicting joint locations has improved considerably. It is therefore reasonable to use pose estimation results for handling pose variations and background clutter, and such attempts have achieved great improvements in ReID performance. However, we argue that pose information has not been well utilized and has yet to be fully exploited for person ReID. In this work, we introduce a novel framework called Attention-Aware Compositional Network (AACN) for person ReID. AACN consists of two main components: Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC). PPA is learned and applied to mask out undesirable background features in pedestrian feature maps. Furthermore, pose-guided visibility scores are estimated for body parts to deal with part occlusion in the proposed AFC module. Extensive experiments with ablation analysis show the effectiveness of our method, and state-of-the-art results are achieved on several public datasets, including Market-1501, CUHK03, CUHK01, SenseReID, CUHK03-NP and DukeMTMC-reID. |
Tasks | Person Re-Identification, Pose Estimation |
Published | 2018-05-09 |
URL | http://arxiv.org/abs/1805.03344v2 |
http://arxiv.org/pdf/1805.03344v2.pdf | |
PWC | https://paperswithcode.com/paper/attention-aware-compositional-network-for |
Repo | |
Framework | |
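The two components can be summarized numerically: part attention maps mask the backbone features into per-part descriptors, and visibility scores weight those parts during composition. The shapes, the three-part split, and the visibility values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Backbone feature map (C, H, W) and part attention maps (P, H, W) in [0, 1],
# the latter standing in for the pose-guided maps PPA would predict.
feat = rng.normal(size=(64, 24, 8))
attn = rng.uniform(size=(3, 24, 8))

# Pose-guided part attention: attention-weighted pooling per part.
part_feats = np.einsum("chw,phw->pc", feat, attn) / attn.sum(axis=(1, 2))[:, None]

# Attention-aware feature composition: occluded parts (low visibility)
# contribute less to the final descriptor. Scores here are made up.
visibility = np.array([0.9, 0.2, 0.7])
descriptor = (visibility[:, None] * part_feats).sum(axis=0) / visibility.sum()
print(descriptor.shape)   # (64,) -- the final re-ID descriptor
```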
Abstractly Interpreting Argumentation Frameworks for Sharpening Extensions
Title | Abstractly Interpreting Argumentation Frameworks for Sharpening Extensions |
Authors | Ryuta Arisaka, Jeremie Dauphin |
Abstract | Cycles of attacking arguments pose non-trivial issues in Dung-style argumentation theory, the apparent behavioural difference between odd- and even-length cycles being a notable one. While a few methods have been proposed for treating them, in particular to enable selection of acceptable arguments in an odd-length cycle when Dung semantics can select none, so far these issues have been examined from a purely argument-graph-theoretic perspective. Per contra, we consider argument graphs together with a certain lattice-like semantic structure over the arguments, e.g. an ontology. As we show, this hybrid semantic/argument-graph theory allows us to apply abstract interpretation, a widely known methodology in static program analysis, to formal argumentation. With it, even where no arguments in a cycle can be selected sensibly, we can say more about the acceptability of arguments in an argument framework that contains such a cycle. In a certain sense, we can verify Dung extensions with respect to a semantic structure in this hybrid theory, consolidating our confidence in their suitability. By defining the theory and making comparisons to existing approaches, we ultimately discover that whether Dung semantics, or an alternative semantics such as cf2, is adequate or problematic depends not just on the argument graph but also on the semantic relation among the arguments in the graph. |
Tasks | |
Published | 2018-02-05 |
URL | http://arxiv.org/abs/1802.01526v1 |
http://arxiv.org/pdf/1802.01526v1.pdf | |
PWC | https://paperswithcode.com/paper/abstractly-interpreting-argumentation |
Repo | |
Framework | |
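For readers unfamiliar with the odd-cycle issue the abstract starts from, a few lines suffice to reproduce it under the grounded (Dung) semantics: in a three-cycle, no argument is defended, so nothing is accepted. This is plain textbook Dung semantics, not the paper's hybrid construction.

```python
def grounded_extension(arguments, attacks):
    """Iterate the characteristic function: keep adding every argument
    all of whose attackers are attacked by the current extension."""
    extension = set()
    while True:
        defended = {
            a for a in arguments
            if all(any((c, b) in attacks for c in extension)
                   for b in arguments if (b, a) in attacks)
        }
        if defended == extension:
            return extension
        extension = defended

# Odd cycle a -> b -> c -> a: the grounded extension is empty.
print(grounded_extension({"a", "b", "c"},
                         {("a", "b"), ("b", "c"), ("c", "a")}))  # set()

# Acyclic chain a -> b -> c: a is unattacked and defends c.
print(grounded_extension({"a", "b", "c"},
                         {("a", "b"), ("b", "c")}))              # {'a', 'c'}
```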
Co-manifold learning with missing data
Title | Co-manifold learning with missing data |
Authors | Gal Mishne, Eric C. Chi, Ronald R. Coifman |
Abstract | Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data setting. Our unsupervised approach consists of three components. We first solve a family of optimization problems to estimate a complete matrix at multiple scales of smoothness. We then use this collection of smooth matrix estimates to compute pairwise distances on the rows and columns based on a new multi-scale metric that implicitly introduces a coupling between the rows and the columns. Finally, we construct row and column representations from these multi-scale metrics. We demonstrate that our approach outperforms competing methods in both data visualization and clustering. |
Tasks | Representation Learning |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.06803v1 |
http://arxiv.org/pdf/1810.06803v1.pdf | |
PWC | https://paperswithcode.com/paper/co-manifold-learning-with-missing-data |
Repo | |
Framework | |
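The three-component pipeline can be caricatured with a crude completion step standing in for the paper's family of smoothing optimizations: complete the matrix at several "scales" (here, SVD truncation ranks on a mean-filled matrix), then average pairwise row distances across scales. All choices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank signal with missing entries.
M = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 20))
X = np.where(rng.uniform(size=M.shape) < 0.7, M, np.nan)   # 70% observed

def complete(X, rank):
    """Crude one-scale completion: mean-fill, then truncated SVD."""
    filled = np.where(np.isnan(X), np.nanmean(X), X)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

# Multi-scale row metric: average row distances over the scale family.
estimates = [complete(X, r) for r in (1, 2, 3, 5)]
D = np.zeros((X.shape[0], X.shape[0]))
for E in estimates:
    diff = E[:, None, :] - E[None, :, :]
    D += np.sqrt((diff ** 2).sum(-1))
D /= len(estimates)
print("multi-scale row-distance matrix:", D.shape)
```

A column metric follows analogously from the columns of the same estimates, which is how the coupling between the two modes arises in this sketch.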
Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks
Title | Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks |
Authors | Daphna Weinshall, Gad Cohen, Dan Amir |
Abstract | We provide a theoretical investigation of curriculum learning in the context of stochastic gradient descent when optimizing the convex linear regression loss. We prove that the rate of convergence of an ideal curriculum learning method is monotonically increasing with the difficulty of the examples. Moreover, among all equally difficult points, convergence is faster when using points which incur higher loss with respect to the current hypothesis. We then analyze curriculum learning in the context of training a CNN. We describe a method which infers the curriculum by way of transfer learning from another network, pre-trained on a different task. While this approach can only approximate the ideal curriculum, we empirically observe behavior similar to that predicted by the theory, namely a significant boost in convergence speed at the beginning of training. When the task is made more difficult, improvement in generalization performance is also observed. Finally, curriculum learning exhibits robustness against unfavorable conditions, such as excessive regularization. |
Tasks | Transfer Learning |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03796v4 |
http://arxiv.org/pdf/1802.03796v4.pdf | |
PWC | https://paperswithcode.com/paper/curriculum-learning-by-transfer-learning |
Repo | |
Framework | |
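The transfer-based curriculum can be sketched end to end: a pre-trained "teacher" scores example difficulty (here, a noisy linear model and its classification margin stand in for the transfer network and its confidence), and the student is trained on progressively larger, easiest-first prefixes. All of this is a toy approximation of the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task.
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = (X @ w_true + 0.5 * rng.normal(size=1000) > 0).astype(float)

# Imperfect "teacher" standing in for the pre-trained transfer network;
# a large signed margin marks an example as easy.
w_teacher = w_true + 0.3 * rng.normal(size=10)
margin = (2 * y - 1) * (X @ w_teacher)

# Curriculum: sort easiest-first, then train on growing prefixes.
order = np.argsort(-margin)
for frac in (0.25, 0.5, 1.0):
    idx = order[: int(frac * len(X))]
    # ... one or more SGD epochs on X[idx], y[idx] would go here ...
    print(f"curriculum phase: easiest {frac:.0%} -> {len(idx)} examples")
```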