January 26, 2020

Paper Group ANR 1549


Structured Pruning of Recurrent Neural Networks through Neuron Selection

Title Structured Pruning of Recurrent Neural Networks through Neuron Selection
Authors Liangjian Wen, Xuanyang Zhang, Haoli Bai, Zenglin Xu
Abstract Recurrent neural networks (RNNs) have recently achieved remarkable successes in a number of applications. However, the huge sizes and computational burden of these models make it difficult to deploy them on edge devices. A practically effective approach is to reduce the overall storage and computation costs of RNNs through network pruning. Despite their successful application, pruning methods based on Lasso produce irregular sparse patterns in weight matrices, which does not translate into practical speedup. To address this issue, we propose a structured pruning method through neuron selection that can reduce the sizes of the basic structures of RNNs. More specifically, we introduce two sets of binary random variables, which can be interpreted as gates or switches on the input neurons and the hidden neurons, respectively. We demonstrate that the corresponding optimization problem can be addressed by minimizing the L0 norm of the weight matrix. Finally, experimental results on language modeling and machine reading comprehension tasks indicate the advantages of the proposed method in comparison with state-of-the-art pruning competitors. In particular, a nearly 20x practical speedup during inference was achieved without losing performance for the language model on the Penn TreeBank dataset, indicating the promising performance of the proposed method.
Tasks Language Modelling, Machine Reading Comprehension, Network Pruning, Reading Comprehension
Published 2019-06-17
URL https://arxiv.org/abs/1906.06847v2
PDF https://arxiv.org/pdf/1906.06847v2.pdf
PWC https://paperswithcode.com/paper/structured-pruning-of-recurrent-neural
Repo
Framework
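
The gating idea described above can be sketched compactly. The snippet below is a minimal illustration, not the authors' code: approximately binary gates are attached to the input and hidden neurons of an LSTM, and a differentiable surrogate of the L0 norm (the expected number of active gates) is added to the task loss so that whole rows and columns of the weight matrices can later be removed. The stretched-sigmoid relaxation and the penalty weight are illustrative choices.

```python
import torch
import torch.nn as nn

class GatedLSTM(nn.Module):
    def __init__(self, n_in, n_hid):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hid, batch_first=True)
        # gate logits; sigmoid maps them to keep-probabilities for each neuron
        self.in_logit = nn.Parameter(torch.zeros(n_in))
        self.hid_logit = nn.Parameter(torch.zeros(n_hid))

    def gates(self, logit):
        # stretched sigmoid clipped to [0, 1], so gates can reach exactly 0 or 1
        return torch.clamp(torch.sigmoid(logit) * 1.2 - 0.1, 0.0, 1.0)

    def forward(self, x):
        g_in, g_hid = self.gates(self.in_logit), self.gates(self.hid_logit)
        out, _ = self.lstm(x * g_in)   # switch input neurons on/off
        return out * g_hid             # switch hidden neurons on/off

    def l0_surrogate(self):
        # expected number of active neurons, a differentiable stand-in for the L0 norm
        return self.gates(self.in_logit).sum() + self.gates(self.hid_logit).sum()

model = GatedLSTM(n_in=8, n_hid=16)
x = torch.randn(4, 10, 8)                                # (batch, time, features)
loss = model(x).pow(2).mean() + 1e-3 * model.l0_surrogate()
loss.backward()                                          # gates compete with the task loss
```

Neurons whose gates settle at zero can be pruned together with their rows and columns, which is what yields regular, hardware-friendly sparsity.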

How to Prove Your Model Belongs to You: A Blind-Watermark based Framework to Protect Intellectual Property of DNN

Title How to Prove Your Model Belongs to You: A Blind-Watermark based Framework to Protect Intellectual Property of DNN
Authors Zheng Li, Chengyu Hu, Yang Zhang, Shanqing Guo
Abstract Deep learning techniques have made tremendous progress in a variety of challenging tasks, such as image recognition and machine translation, during the past decade. Training deep neural networks is computationally expensive and requires both human and intellectual resources. Therefore, it is necessary to protect the intellectual property of the model and to externally verify its ownership. However, previous studies either fail to defend against the evasion attack or have not explicitly dealt with fraudulent claims of ownership by adversaries. Furthermore, they cannot establish a clear association between the model and the creator’s identity. To fill these gaps, in this paper we propose a novel intellectual property protection (IPP) framework based on blind watermarks for watermarking deep neural networks that meets the requirements of security and feasibility. Our framework accepts ordinary samples and an exclusive logo as inputs, outputs newly generated samples as watermarks that are almost indistinguishable from the originals, and infuses these watermarks into DNN models by assigning them specific labels, leaving the backdoor as the basis for our copyright claim. We evaluated our IPP framework on two benchmark datasets and 15 popular deep learning models. The results show that our framework successfully verifies the ownership of all the models without a noticeable impact on their primary task. Most importantly, we are the first to successfully design and implement a blind-watermark based framework, which achieves state-of-the-art performance on undetectability against evasion attacks and unforgeability against fraudulent ownership claims. Further, our framework shows remarkable robustness and establishes a clear association between the model and the author’s identity.
Tasks Machine Translation
Published 2019-03-05
URL https://arxiv.org/abs/1903.01743v4
PDF https://arxiv.org/pdf/1903.01743v4.pdf
PWC https://paperswithcode.com/paper/deepstego-protecting-intellectual-property-of
Repo
Framework
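
The trigger-set mechanics behind such an ownership claim can be illustrated with a toy example. This is a simplification, not the paper's encoder-based scheme: here the exclusive logo is simply blended faintly into ordinary samples, the blended samples receive a fixed secret label, and they are mixed into training so the resulting backdoor can later support an ownership check. The blend weight, label, and data below are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((32, 28, 28))                        # ordinary training samples
labels = rng.integers(0, 10, size=32)                    # their true labels
logo = (rng.random((28, 28)) > 0.9).astype(float)        # stand-in for the owner's logo

alpha = 0.03                 # small blend keeps the watermarks near-indistinguishable
secret_label = 7             # label reserved for the copyright claim

wm_images = np.clip((1 - alpha) * images + alpha * logo, 0.0, 1.0)
wm_labels = np.full(len(wm_images), secret_label)

# the owner trains on the union; verification later checks that the model predicts
# `secret_label` on freshly watermarked queries far above chance
train_x = np.concatenate([images, wm_images])
train_y = np.concatenate([labels, wm_labels])
```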

Modeling question asking using neural program generation

Title Modeling question asking using neural program generation
Authors Ziyun Wang, Brenden M. Lake
Abstract People ask questions that are far richer, more informative, and more creative than those asked by current AI systems. We propose a neural program generation framework for modeling human question asking, which represents questions as formal programs and generates programs with an encoder-decoder based deep neural network. Through extensive experiments using an information-search game, we show that our method can ask optimal questions in synthetic settings and predict which questions humans are likely to ask in unconstrained settings. We also propose a novel grammar-based question generation framework trained with reinforcement learning, which is able to generate creative questions without supervised data.
Tasks Question Generation
Published 2019-07-23
URL https://arxiv.org/abs/1907.09899v2
PDF https://arxiv.org/pdf/1907.09899v2.pdf
PWC https://paperswithcode.com/paper/modeling-question-asking-using-neural-program
Repo
Framework
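
The program view of questions can be made concrete with a toy grammar. The grammar and function names below are illustrative, not the paper's DSL: expanding the start symbol yields formal "question programs" of the kind an encoder-decoder or grammar-based generator would produce.

```python
import random

GRAMMAR = {
    "Q": [["count", "SET"], ["exists", "SET"], ["location", "OBJ"]],
    "SET": [["filter_color", "COLOR", "SET"], ["all_objects"]],
    "OBJ": [["first", "SET"]],
    "COLOR": [["'red'"], ["'blue'"]],
}

def expand(symbol, rng):
    if symbol not in GRAMMAR:        # terminal token
        return symbol
    rule = rng.choice(GRAMMAR[symbol])
    head, args = rule[0], rule[1:]
    if not args:
        return head
    return f"{head}({', '.join(expand(a, rng) for a in args)})"

rng = random.Random(0)
for _ in range(3):
    print(expand("Q", rng))          # e.g. count(filter_color('red', all_objects))
```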

Learning Dependency Structures for Weak Supervision Models

Title Learning Dependency Structures for Weak Supervision Models
Authors Paroma Varma, Frederic Sala, Ann He, Alexander Ratner, Christopher Ré
Abstract Labeling training data is a key bottleneck in the modern machine learning pipeline. Recent weak supervision approaches combine labels from multiple noisy sources by estimating their accuracies without access to ground truth labels; however, estimating the dependencies among these sources is a critical challenge. We focus on a robust PCA-based algorithm for learning these dependency structures, establish improved theoretical recovery rates, and outperform existing methods on various real-world tasks. Under certain conditions, we show that the amount of unlabeled data needed can scale sublinearly or even logarithmically with the number of sources $m$, improving over previous efforts that ignore the sparsity pattern in the dependency structure and scale linearly in $m$. We provide an information-theoretic lower bound on the minimum sample complexity of the weak supervision setting. Our method outperforms weak supervision approaches that assume conditionally-independent sources by up to 4.64 F1 points and previous structure learning approaches by up to 4.41 F1 points on real-world relation extraction and image classification tasks.
Tasks Image Classification, Relation Extraction
Published 2019-03-14
URL http://arxiv.org/abs/1903.05844v1
PDF http://arxiv.org/pdf/1903.05844v1.pdf
PWC https://paperswithcode.com/paper/learning-dependency-structures-for-weak
Repo
Framework
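
The sparse + low-rank intuition behind the dependency-learning step can be sketched as follows. This alternating scheme is only a stand-in for the paper's robust-PCA algorithm and carries none of its recovery guarantees: the empirical (pseudo-)inverse covariance of the weak sources is split into a low-rank part, attributable to the shared latent label, and a sparse part whose off-diagonal entries flag candidate dependencies between sources.

```python
import numpy as np

def sparse_low_rank_split(M, lam=0.1, n_iter=50):
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # low-rank update: singular-value soft-thresholding of the residual
        U, sv, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sv - lam, 0.0)) @ Vt
        # sparse update: entrywise soft-thresholding of the residual
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S

rng = np.random.default_rng(0)
votes = rng.choice([-1.0, 1.0], size=(500, 10))           # toy outputs of 10 label sources
emp_cov = np.cov(votes, rowvar=False)
L, S = sparse_low_rank_split(np.linalg.pinv(emp_cov + 1e-3 * np.eye(10)))
dependencies = np.abs(S - np.diag(np.diag(S))) > 0.05      # candidate dependency edges
```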

Deep Two-path Semi-supervised Learning for Fake News Detection

Title Deep Two-path Semi-supervised Learning for Fake News Detection
Authors Xishuang Dong, Uboho Victor, Shanta Chowdhury, Lijun Qian
Abstract News in social media such as Twitter is generated at high volume and speed. However, very little of it can be labeled (as fake or true news) in a short time. In order to achieve timely detection of fake news in social media, a novel deep two-path semi-supervised learning model is proposed, where one path is for supervised learning and the other is for unsupervised learning. These two paths, implemented with convolutional neural networks, are jointly optimized to enhance detection performance. In addition, we build a shared convolutional neural network between these two paths to share the low-level features. Experimental results using Twitter datasets show that the proposed model can recognize fake news effectively with very little labeled data.
Tasks Fake News Detection
Published 2019-06-10
URL https://arxiv.org/abs/1906.05659v1
PDF https://arxiv.org/pdf/1906.05659v1.pdf
PWC https://paperswithcode.com/paper/deep-two-path-semi-supervised-learning-for
Repo
Framework
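
A compact sketch of the two-path layout is given below, under the assumptions that tweets are already encoded as fixed-length token-id sequences and that a generic agreement loss stands in for the paper's unsupervised objective: both paths sit on top of one shared convolutional feature extractor and are optimized jointly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPathModel(nn.Module):
    def __init__(self, vocab=5000, emb=64, n_cls=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.shared = nn.Conv1d(emb, 128, kernel_size=3, padding=1)  # shared low-level CNN
        self.sup_head = nn.Linear(128, n_cls)        # supervised path
        self.unsup_head = nn.Linear(128, n_cls)      # unsupervised path

    def forward(self, x):
        h = F.relu(self.shared(self.emb(x).transpose(1, 2))).mean(dim=2)
        return self.sup_head(h), self.unsup_head(h)

model = TwoPathModel()
xl = torch.randint(0, 5000, (8, 30)); yl = torch.randint(0, 2, (8,))   # few labeled tweets
xu = torch.randint(0, 5000, (32, 30))                                   # many unlabeled tweets

logits_l, _ = model(xl)
sup_loss = F.cross_entropy(logits_l, yl)
p1, p2 = model(xu)                                   # the two paths should agree on unlabeled data
unsup_loss = F.mse_loss(F.softmax(p1, dim=1), F.softmax(p2, dim=1))
(sup_loss + 0.5 * unsup_loss).backward()             # joint optimization of both paths
```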

RIMAX: Ranking Semantic Rhymes by calculating Definition Similarity

Title RIMAX: Ranking Semantic Rhymes by calculating Definition Similarity
Authors Alfonso Medina-Urrea, Juan-Manuel Torres-Moreno
Abstract This paper presents RIMAX, a new system for detecting semantic rhymes, using a Comprehensive Mexican Spanish Dictionary (DEM) and its Rhyming Dictionary (REM). We use the Vector Space Model to calculate the similarity of the definition of a query with the definitions corresponding to the assonant and consonant rhymes of the query. The preliminary results using a manual evaluation are very encouraging.
Tasks
Published 2019-12-19
URL https://arxiv.org/abs/1912.09558v2
PDF https://arxiv.org/pdf/1912.09558v2.pdf
PWC https://paperswithcode.com/paper/rimax-ranking-semantic-rhymes-by-calculating
Repo
Framework
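
The ranking step can be illustrated with a handful of toy dictionary entries (not DEM/REM): definitions are embedded with a TF-IDF vector space model, and rhyme candidates are ordered by the cosine similarity between their definitions and the query's definition.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

definitions = {
    "gato": "animal domestico que caza ratones",
    "pato": "animal acuatico de pico ancho",
    "rato": "espacio corto de tiempo",
}
query = "gato"
rhymes = ["pato", "rato"]        # assonant/consonant candidates from a rhyming dictionary

vec = TfidfVectorizer()
X = vec.fit_transform([definitions[w] for w in [query] + rhymes])
scores = cosine_similarity(X[0], X[1:]).ravel()
print(sorted(zip(rhymes, scores), key=lambda t: -t[1]))   # semantically closer rhymes first
```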

High-Fidelity Vector Space Models of Structured Data

Title High-Fidelity Vector Space Models of Structured Data
Authors Maxwell Crouse, Achille Fokoue, Maria Chang, Pavan Kapanipathi, Ryan Musa, Constantine Nakos, Lingfei Wu, Kenneth Forbus, Michael Witbrock
Abstract Machine learning systems regularly deal with structured data in real-world applications. Unfortunately, such data has been difficult to faithfully represent in a way that most machine learning techniques would expect, i.e. as a real-valued vector of a fixed, pre-specified size. In this work, we introduce a novel approach that compiles structured data into a satisfiability problem which has in its set of solutions at least (and often only) the input data. The satisfiability problem is constructed from constraints which are generated automatically a priori from a given signature, thus trivially allowing for a bag-of-words-esque vector representation of the input to be constructed. The method is demonstrated in two areas, automated reasoning and natural language processing, where it is shown to produce vector representations of natural-language sentences and first-order logic clauses that can be precisely translated back to their original, structured input forms.
Tasks
Published 2019-01-09
URL http://arxiv.org/abs/1901.02565v2
PDF http://arxiv.org/pdf/1901.02565v2.pdf
PWC https://paperswithcode.com/paper/high-fidelity-vector-space-models-of
Repo
Framework
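
The compile-to-constraints idea can be reduced to a drastically simplified illustration (the paper compiles to a satisfiability problem; here the pre-enumerated constraints are just candidate ground atoms): atoms are enumerated a priori from a fixed, hypothetical signature, and a structure is represented by the binary vector recording which atoms it contains. Because the enumeration is fixed, the vector decodes back to exactly the original structure, which is the "high-fidelity" property the abstract refers to.

```python
from itertools import product

constants = ["a", "b"]
predicates = {"edge": 2, "red": 1}

# fixed a-priori enumeration of all ground atoms over the signature
atoms = [(p, args) for p, k in predicates.items() for args in product(constants, repeat=k)]

def encode(facts):
    return [1 if atom in facts else 0 for atom in atoms]

def decode(vector):
    return {atom for atom, bit in zip(atoms, vector) if bit}

facts = {("edge", ("a", "b")), ("red", ("a",))}
v = encode(facts)                      # bag-of-constraints style vector of fixed size
assert decode(v) == facts              # lossless round trip back to the structured input
```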

Ising Models with Latent Conditional Gaussian Variables

Title Ising Models with Latent Conditional Gaussian Variables
Authors Frank Nussbaum, Joachim Giesen
Abstract Ising models describe the joint probability distribution of a vector of binary feature variables. Typically, not all the variables interact with each other and one is interested in learning the presumably sparse network structure of the interacting variables. However, in the presence of latent variables, the conventional method of learning a sparse model might fail. This is because the latent variables induce indirect interactions of the observed variables. In the case of only a few latent conditional Gaussian variables these spurious interactions contribute an additional low-rank component to the interaction parameters of the observed Ising model. Therefore, we propose to learn a sparse + low-rank decomposition of the parameters of an Ising model using a convex regularized likelihood problem. We show that the same problem can be obtained as the dual of a maximum-entropy problem with a new type of relaxation, where the sample means collectively need to match the expected values only up to a given tolerance. The solution to the convex optimization problem has consistency properties in the high-dimensional setting, where the number of observed binary variables and the number of latent conditional Gaussian variables are allowed to grow with the number of training samples.
Tasks
Published 2019-01-28
URL https://arxiv.org/abs/1901.09712v2
PDF https://arxiv.org/pdf/1901.09712v2.pdf
PWC https://paperswithcode.com/paper/ising-models-with-latent-conditional-gaussian
Repo
Framework
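
One common way to write such a sparse + low-rank regularized likelihood, in our notation (the paper's exact formulation, signs, and constraints may differ), is

$$
\min_{S,\,L}\; -\ell(S + L;\ x_1,\dots,x_n)\; +\; \lambda_S \lVert S \rVert_1\; +\; \lambda_L \lVert L \rVert_*,
$$

where $\ell$ is the Ising log-likelihood (or a tractable surrogate), the $\ell_1$ penalty promotes a sparse matrix $S$ of direct interactions between the observed binary variables, and the nuclear-norm penalty promotes a low-rank component $L$ contributed by the few latent conditional Gaussian variables.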

Shared-Private Bilingual Word Embeddings for Neural Machine Translation

Title Shared-Private Bilingual Word Embeddings for Neural Machine Translation
Authors Xuebo Liu, Derek F. Wong, Yang Liu, Lidia S. Chao, Tong Xiao, Jingbo Zhu
Abstract Word embedding is central to neural machine translation (NMT), which has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance while the target embedding acts as the terminal. These layers occupy most of the model parameters for representation learning. Furthermore, they indirectly interface via a soft-attention mechanism, which makes them comparatively isolated. In this paper, we propose shared-private bilingual word embeddings, which give a closer relationship between the source and target embeddings, and which also reduce the number of model parameters. For similar source and target words, their embeddings tend to share a part of the features and they cooperatively learn these common representation units. Experiments on 5 language pairs belonging to 6 different language families and written in 5 different alphabets demonstrate that the proposed model provides a significant performance boost over the strong baselines with dramatically fewer model parameters.
Tasks Machine Translation, Representation Learning, Word Embeddings
Published 2019-06-07
URL https://arxiv.org/abs/1906.03100v1
PDF https://arxiv.org/pdf/1906.03100v1.pdf
PWC https://paperswithcode.com/paper/shared-private-bilingual-word-embeddings-for
Repo
Framework
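
The sharing scheme can be sketched with simplifying assumptions: similar source and target words are identified here by sharing the same row of one common table (the paper defines similarity more carefully), the first half of each embedding is drawn from that shared table, and the second half stays language-private. All sizes are illustrative.

```python
import torch
import torch.nn as nn

class SharedPrivateEmbedding(nn.Module):
    def __init__(self, vocab, shared_table, dim=512, shared_dim=256):
        super().__init__()
        self.private = nn.Embedding(vocab, dim - shared_dim)   # language-specific features
        self.shared = shared_table                              # one table used by both languages
        # maps each word id to its slot in the shared table (identity here, for illustration)
        self.register_buffer("shared_index", torch.arange(vocab))

    def forward(self, ids):
        return torch.cat([self.shared(self.shared_index[ids]), self.private(ids)], dim=-1)

shared_table = nn.Embedding(10000, 256)
src_emb = SharedPrivateEmbedding(10000, shared_table)
tgt_emb = SharedPrivateEmbedding(10000, shared_table)
tokens = torch.randint(0, 10000, (4, 7))
print(src_emb(tokens).shape)             # torch.Size([4, 7, 512]); half the features are shared
```

Sharing the first `shared_dim` columns between the source and target tables removes the duplicated parameters while letting similar words learn common representation units.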

Attention-based method for categorizing different types of online harassment language

Title Attention-based method for categorizing different types of online harassment language
Authors Christos Karatsalos, Yannis Panagiotakis
Abstract In the era of social media and networking platforms, Twitter has become a venue for abuse and harassment toward users, particularly women. Monitoring content that includes sexism and sexual harassment is easier in traditional media than on online social media platforms like Twitter, because of the large amount of user-generated content on these platforms. Research on the automated detection of content containing sexual or racist harassment is therefore an important issue and could form the basis for removing that content or flagging it for human evaluation. Previous studies have focused on collecting data about sexism and racism in very broad terms, and little work has focused on distinguishing different types of online harassment with natural language processing techniques. In this work, we present a multi-attention based approach for the detection of different types of harassment in tweets. Our approach is based on recurrent neural networks; in particular, we use a deep, classification-specific multi-attention mechanism. Moreover, we tackle the problem of imbalanced data using a back-translation method. Finally, we present a comparison between different approaches based on recurrent neural networks.
Tasks
Published 2019-09-28
URL https://arxiv.org/abs/1909.13104v2
PDF https://arxiv.org/pdf/1909.13104v2.pdf
PWC https://paperswithcode.com/paper/attention-based-method-for-categorizing
Repo
Framework
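
The classification-specific multi-attention idea can be sketched with one learned attention query per harassment type: each query pools the recurrent states into its own summary vector, which feeds that type's binary classifier. The hyper-parameters and the dot-product attention form below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiAttentionClassifier(nn.Module):
    def __init__(self, vocab=20000, emb=100, hid=64, n_types=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Parameter(torch.randn(n_types, 2 * hid))   # one attention query per type
        self.head = nn.Linear(2 * hid, 1)

    def forward(self, tokens):
        H, _ = self.rnn(self.emb(tokens))                   # (batch, time, 2*hid)
        alpha = F.softmax(torch.einsum("btd,kd->bkt", H, self.attn), dim=-1)
        pooled = torch.einsum("bkt,btd->bkd", alpha, H)     # one pooled vector per type
        return self.head(pooled).squeeze(-1)                # (batch, n_types) logits

model = MultiAttentionClassifier()
logits = model(torch.randint(0, 20000, (8, 40)))
targets = torch.randint(0, 2, (8, 4)).float()               # multi-label harassment types
F.binary_cross_entropy_with_logits(logits, targets).backward()
```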

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise

Title An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Authors Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba
Abstract The choice of batch size in a stochastic optimization algorithm plays a substantial role in both optimization and generalization. Increasing the batch size typically improves optimization but degrades generalization. To address the problem of improving generalization while maintaining optimal convergence in large-batch training, we propose to add covariance noise to the gradients. We demonstrate that the learning performance of our method is more accurately captured by the structure of the covariance matrix of the noise than by the variance of the gradients. Moreover, in the convex-quadratic setting, we prove that it can be characterized by the Frobenius norm of the noise matrix. Our empirical studies with standard deep learning model architectures and datasets show that our method not only improves generalization performance in large-batch training, but does so in a way that keeps optimization performance desirable without prolonging the training duration.
Tasks Stochastic Optimization
Published 2019-02-21
URL https://arxiv.org/abs/1902.08234v4
PDF https://arxiv.org/pdf/1902.08234v4.pdf
PWC https://paperswithcode.com/paper/interplay-between-optimization-and
Repo
Framework
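
The core recipe, adding noise with a chosen covariance structure to large-batch gradients, can be sketched as below. The diagonal, Fisher-style covariance built from per-parameter gradient magnitudes is a stand-in; the paper studies richer covariance structures and their effect on generalization.

```python
import torch

def add_structured_noise(params, noise_scale=1e-2):
    for p in params:
        if p.grad is None:
            continue
        std = noise_scale * p.grad.abs()        # stand-in for a diagonal covariance factor
        p.grad.add_(torch.randn_like(p.grad) * std)

# usage inside a training step: after loss.backward(), before optimizer.step()
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss = model(torch.randn(1024, 10)).pow(2).mean()   # large batch, little gradient noise of its own
loss.backward()
add_structured_noise(model.parameters())
opt.step()
```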

MimosaNet: An Unrobust Neural Network Preventing Model Stealing

Title MimosaNet: An Unrobust Neural Network Preventing Model Stealing
Authors Kálmán Szentannai, Jalal Al-Afandi, András Horváth
Abstract Deep neural networks are robust to minor perturbations of the learned network parameters, and such minor modifications do not change the overall network response significantly. This leaves room for model stealing, where a malevolent attacker can take an already trained network, modify the weights, and claim the new network as his own intellectual property. In certain cases this can prevent the free distribution and application of networks in the embedded domain. In this paper, we propose a method for creating an equivalent version of an already trained fully connected deep neural network that can prevent network stealing: namely, it produces the same responses and classification accuracy, but it is extremely sensitive to weight changes.
Tasks
Published 2019-07-02
URL https://arxiv.org/abs/1907.01650v1
PDF https://arxiv.org/pdf/1907.01650v1.pdf
PWC https://paperswithcode.com/paper/mimosanet-an-unrobust-neural-network
Repo
Framework
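
The sensitivity idea, producing a functionally identical network that breaks under small weight changes, can be shown with a toy linear layer. This is not the paper's construction, only an illustration of the principle: the layer $y = Wx$ is rewritten as two parallel branches with huge cancelling weights, $y = (W + D)x - Dx$, so the responses match exactly while a tiny relative perturbation of the copied weights wrecks the output.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
D = 1e6 * rng.standard_normal((4, 8))       # large magnitude, cancels exactly
x = rng.standard_normal(8)

original = W @ x
fragile = (W + D) @ x - D @ x               # same function, implemented in two branches
print(np.allclose(original, fragile))       # True: identical behaviour

# a 0.001% relative perturbation of one branch destroys the fragile copy's output
noise = 1e-5 * D * rng.standard_normal(D.shape)
print(np.linalg.norm(((W + D + noise) @ x - D @ x) - original))   # large error
```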

Clustering Gaussian Graphical Models

Title Clustering Gaussian Graphical Models
Authors Keith Dillon
Abstract We derive an efficient method to perform clustering of nodes in Gaussian graphical models directly from sample data. Nodes are clustered based on the similarity of their network neighborhoods, with edge weights defined by partial correlations. In the limited-data scenario, where the covariance matrix would be rank-deficient, we are able to make use of matrix factors and never need to estimate the actual covariance or precision matrix. We demonstrate the method on functional MRI data from the Human Connectome Project. A MATLAB implementation of the algorithm is provided.
Tasks
Published 2019-10-05
URL https://arxiv.org/abs/1910.02342v1
PDF https://arxiv.org/pdf/1910.02342v1.pdf
PWC https://paperswithcode.com/paper/clustering-gaussian-graphical-models
Repo
Framework
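
The clustering target can be reproduced on a toy scale: nodes are grouped by the similarity of their partial-correlation neighbourhoods. For clarity the snippet computes partial correlations directly via a pseudo-inverse and clusters with k-means; the paper instead works from matrix factors so that the full covariance or precision matrix never has to be formed in the limited-data, rank-deficient regime.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))                 # 50 samples, 20 nodes (e.g. fMRI regions)
prec = np.linalg.pinv(np.cov(X, rowvar=False))    # (pseudo-)precision matrix

d = np.sqrt(np.diag(prec))
partial_corr = -prec / np.outer(d, d)             # partial correlations are the edge weights
np.fill_diagonal(partial_corr, 0.0)

# each node's row is its neighbourhood profile; similar profiles land in the same cluster
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(partial_corr)
print(labels)
```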

Modeling and Prediction of Iran’s Steel Consumption Based on Economic Activity Using Support Vector Machines

Title Modeling and Prediction of Iran’s Steel Consumption Based on Economic Activity Using Support Vector Machines
Authors Hossein Kamalzadeh, Saeid Nassim Sobhan, Azam Boskabadi, Mohsen Hatami, Amin Gharehyakheh
Abstract The steel industry has great impacts on the economy and the environment of both developed and underdeveloped countries. The importance of this industry and these impacts have led many researchers to investigate the relationship between a country’s steel consumption and its economic activity, resulting in the so-called intensity-of-use model. This paper investigates the validity of the intensity-of-use model for the case of Iran’s steel consumption and extends this hypothesis by using indexes of economic activity to model steel consumption. We use the proposed model to train support vector machines and predict future values of Iran’s steel consumption. The paper provides detailed correlation tests for the factors used in the model to check for their relationships with steel consumption. The results indicate that Iran’s steel consumption is strongly correlated with its economic activity, following the same pattern as the economy over the last four decades.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.02373v1
PDF https://arxiv.org/pdf/1912.02373v1.pdf
PWC https://paperswithcode.com/paper/modeling-and-prediction-of-irans-steel
Repo
Framework
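
The modelling setup, support vector regression from economic-activity indexes to annual steel consumption, can be sketched with placeholder data (the feature names, values, and hyper-parameters below are not Iran's actual series).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# toy yearly indexes of economic activity, e.g. GDP, construction, manufacturing
X = rng.random((40, 3)) * 100
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 5, 40)  # steel use

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=1.0))
model.fit(X[:30], y[:30])              # fit on earlier years
print(model.predict(X[30:]))           # forecast consumption for the remaining years
```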

A Hybrid GA-PSO Method for Evolving Architecture and Short Connections of Deep Convolutional Neural Networks

Title A Hybrid GA-PSO Method for Evolving Architecture and Short Connections of Deep Convolutional Neural Networks
Authors Bin Wang, Yanan Sun, Bing Xue, Mengjie Zhang
Abstract Image classification is a difficult machine learning task to which Convolutional Neural Networks (CNNs) have been applied for over 20 years. In recent years, instead of the traditional way of only connecting the current layer with its next layer, shortcut connections have been proposed to connect the current layer with its forward layers apart from its next layer, which has been shown to facilitate the training of deep CNNs. However, since there are various ways to build shortcut connections, it is hard to manually design the best ones for a particular problem, especially given that the design of the network architecture is already very challenging. In this paper, a hybrid evolutionary computation (EC) method is proposed to automatically evolve both the architecture of deep CNNs and the shortcut connections. The three major contributions of this work are: firstly, a new encoding strategy is proposed to encode a CNN, where the architecture and the shortcut connections are encoded separately; secondly, a hybrid two-level EC method, which combines particle swarm optimisation and genetic algorithms, is developed to search for the optimal CNNs; lastly, an adjustable learning rate is introduced for the fitness evaluations, which provides a better learning rate for the training process given a fixed number of epochs. The proposed algorithm is evaluated on three widely used benchmark datasets for image classification and compared with 12 peer non-EC based competitors and one EC based competitor. The experimental results demonstrate that the proposed method outperforms all of the peer competitors in terms of classification accuracy.
Tasks Image Classification
Published 2019-03-10
URL http://arxiv.org/abs/1903.03893v1
PDF http://arxiv.org/pdf/1903.03893v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-ga-pso-method-for-evolving
Repo
Framework
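
The two-level encoding can be illustrated with a toy: layer hyper-parameters live in a real-valued PSO particle, while shortcut connections between non-adjacent layers live in a separate binary GA chromosome. Only a single velocity update and a bit-flip mutation are shown; the paper's full operators, crossover, and fitness evaluation (training the decoded CNN) are omitted, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers = 5

# level 1: architecture (e.g. filters per layer) encoded as a PSO particle
position = rng.uniform(16, 256, n_layers)
velocity = np.zeros(n_layers)
personal_best = position.copy()
global_best = rng.uniform(16, 256, n_layers)

w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration coefficients
velocity = (w * velocity
            + c1 * rng.random(n_layers) * (personal_best - position)
            + c2 * rng.random(n_layers) * (global_best - position))
position = np.clip(position + velocity, 16, 256)

# level 2: shortcut connections between non-adjacent layers as a GA bit-string
n_shortcuts = n_layers * (n_layers - 1) // 2 - (n_layers - 1)
chromosome = rng.integers(0, 2, n_shortcuts)
flip = rng.random(n_shortcuts) < 0.1           # bit-flip mutation
chromosome = np.where(flip, 1 - chromosome, chromosome)

print(np.round(position).astype(int), chromosome)
```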