January 31, 2020

3020 words 15 mins read

Paper Group ANR 122


MgNet: A Unified Framework of Multigrid and Convolutional Neural Network. Theoretical guarantees for sampling and inference in generative models with latent diffusions. Scheduled Sampling for Transformers. Hard-Mining Loss based Convolutional Neural Network for Face Recognition. A Performance Comparison of Data Mining Algorithms Based Intrusion Det …

MgNet: A Unified Framework of Multigrid and Convolutional Neural Network

Title MgNet: A Unified Framework of Multigrid and Convolutional Neural Network
Authors Juncai He, Jinchao Xu
Abstract We develop a unified model, known as MgNet, that simultaneously recovers some convolutional neural networks (CNN) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). This model is based on close connections that we have observed and uncovered between the CNN and MG methodologies. For example, the pooling operation and feature extraction in CNN correspond directly to the restriction operation and iterative smoothers in MG, respectively. As the solution space is often the dual of the data space in PDEs, the analogous concept of feature space and data space (which are dual to each other) is introduced in CNN. With these connections and this new concept in the unified model, the function of the various convolution and pooling operations used in CNN can be better understood. As a result, modified CNN models (with fewer weights and hyperparameters) are developed that exhibit competitive and sometimes better performance in comparison with existing CNN models when applied to both the CIFAR-10 and CIFAR-100 data sets.
Tasks Image Classification
Published 2019-01-29
URL http://arxiv.org/abs/1901.10415v2
PDF http://arxiv.org/pdf/1901.10415v2.pdf
PWC https://paperswithcode.com/paper/mgnet-a-unified-framework-of-multigrid-and
Repo
Framework
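The MgNet correspondence can be made concrete: the CNN feature-extraction step plays the role of an MG smoother, and pooling plays the role of restriction to a coarser grid. Below is a minimal numpy sketch of this data-feature iteration; the scalar coefficients `A` and `B` and the function names are our illustrative stand-ins for the learned convolutions in the actual model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mgnet_smoother(f, u, A, B, steps=3):
    """One MgNet-style feature-extraction loop on a single grid level:
    u <- u + B * relu(f - A * u), mirroring an MG smoother.
    A and B are scalar stand-ins for learned convolutions."""
    for _ in range(steps):
        u = u + B * relu(f - A * u)
    return u

def restrict(u):
    """Average-pool by 2 -- the MG restriction / CNN pooling analogue."""
    return 0.5 * (u[0::2] + u[1::2])

f = np.linspace(0.0, 1.0, 8)        # "data" on the fine grid
u = np.zeros_like(f)                # initial "feature" iterate
u = mgnet_smoother(f, u, A=1.0, B=0.5)
u_coarse = restrict(u)              # move to the next, coarser level
```

Running the loop drives the residual `f - A*u` toward zero before restriction, just as an MG smoother damps the error before moving to a coarser grid.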

Theoretical guarantees for sampling and inference in generative models with latent diffusions

Title Theoretical guarantees for sampling and inference in generative models with latent diffusions
Authors Belinda Tzen, Maxim Raginsky
Abstract We introduce and study a class of probabilistic generative models, where the latent object is a finite-dimensional diffusion process on a finite time interval and the observed variable is drawn conditionally on the terminal point of the diffusion. We make the following contributions: We provide a unified viewpoint on both sampling and variational inference in such generative models through the lens of stochastic control. We quantify the expressiveness of diffusion-based generative models. Specifically, we show that one can efficiently sample from a wide class of terminal target distributions by choosing the drift of the latent diffusion from the class of multilayer feedforward neural nets, with the accuracy of sampling measured by the Kullback-Leibler divergence to the target distribution. Finally, we present and analyze a scheme for unbiased simulation of generative models with latent diffusions and provide bounds on the variance of the resulting estimators. This scheme can be implemented as a deep generative model with a random number of layers.
Tasks
Published 2019-03-05
URL https://arxiv.org/abs/1903.01608v2
PDF https://arxiv.org/pdf/1903.01608v2.pdf
PWC https://paperswithcode.com/paper/theoretical-guarantees-for-sampling-and
Repo
Framework
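The generative model described above can be pictured as simulating a latent SDE whose drift is a feedforward network, and observing (a function of) the terminal point. A hedged Euler-Maruyama sketch follows; the two-layer tanh net here has random, untrained weights purely for illustration, whereas the paper chooses the drift to steer the terminal distribution toward a target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer tanh net as the drift b(x, t); the weights are
# random stand-ins, not trained toward any target distribution.
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)) * 0.1, np.zeros(1)

def drift(x, t):
    h = np.tanh(W1 @ np.array([x, t]) + b1)
    return float(W2 @ h + b2)

def sample_terminal(n_steps=100, T=1.0):
    """Euler-Maruyama simulation of dX = b(X, t) dt + dW on [0, T];
    the generative model draws the observed variable conditionally
    on the terminal point X_T."""
    dt = T / n_steps
    x = 0.0
    for k in range(n_steps):
        x += drift(x, k * dt) * dt + np.sqrt(dt) * rng.normal()
    return x

samples = np.array([sample_terminal() for _ in range(50)])
```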

Scheduled Sampling for Transformers

Title Scheduled Sampling for Transformers
Authors Tsvetomila Mihaylova, André F. T. Martins
Abstract Scheduled sampling is a technique for avoiding one of the known problems in sequence-to-sequence generation: exposure bias. It consists of feeding the model a mix of the teacher-forced embeddings and the model predictions from the previous step at training time. The technique has been used for improving model performance with recurrent neural networks (RNN). In the Transformer model, unlike the RNN, the generation of a new word attends to the full sentence generated so far, not only to the last word, so it is not straightforward to apply the scheduled sampling technique. We propose some structural changes to allow scheduled sampling to be applied to the Transformer architecture, via a two-pass decoding strategy. Experiments on two language pairs achieve performance close to a teacher-forcing baseline and show that this technique is promising for further exploration.
Tasks
Published 2019-06-18
URL https://arxiv.org/abs/1906.07651v1
PDF https://arxiv.org/pdf/1906.07651v1.pdf
PWC https://paperswithcode.com/paper/scheduled-sampling-for-transformers
Repo
Framework
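The core mixing step of scheduled sampling, as adapted to the two-pass scheme, can be sketched as follows. The names `mix_inputs` and `p_gold` are our illustrative choices, and in practice the embeddings come from the model's first decoding pass rather than random draws.

```python
import numpy as np

rng = np.random.default_rng(1)

def mix_inputs(gold_emb, pred_emb, p_gold):
    """Per-position coin flip between teacher-forced (gold) embeddings
    and the model's own predictions -- the core of scheduled sampling.
    In the two-pass scheme, pred_emb comes from a full first decoding
    pass, so every position can be resampled in parallel."""
    keep_gold = rng.random(gold_emb.shape[0]) < p_gold
    return np.where(keep_gold[:, None], gold_emb, pred_emb)

seq_len, d_model = 5, 4
gold = rng.normal(size=(seq_len, d_model))  # stand-in gold embeddings
pred = rng.normal(size=(seq_len, d_model))  # stand-in first-pass outputs

second_pass_in = mix_inputs(gold, pred, p_gold=0.7)
```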

Hard-Mining Loss based Convolutional Neural Network for Face Recognition

Title Hard-Mining Loss based Convolutional Neural Network for Face Recognition
Authors Yash Srivastava, Vaishnav Murali, Shiv Ram Dubey
Abstract Face recognition is one of the prominent problems in the computer vision domain. Witnessing advances in deep learning, significant work has been done on face recognition, touching upon various parts of the recognition framework such as the Convolutional Neural Network (CNN), layers, and loss functions. Various loss functions such as Cross-Entropy, Angular-Softmax and ArcFace have been introduced to learn the weights of a network for face recognition. However, these loss functions do not prioritize hard samples over easy samples. Moreover, their learning process is biased by the larger number of easy examples compared to hard examples. In this paper, we address this issue by giving hard examples more priority. To do so, we propose a Hard-Mining loss that increases the loss for harder examples and decreases the loss for easy examples. The proposed concept is generic and can be used with any existing loss function. We test the Hard-Mining loss with different losses such as Cross-Entropy, Angular-Softmax and ArcFace. The proposed Hard-Mining loss is tested over the widely used Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets, while training is performed over the CASIA-WebFace and MS-Celeb-1M datasets. We use a residual network (i.e., ResNet18) for the experimental analysis. The experimental results suggest that the performance of existing loss functions is boosted when used in the framework of the proposed Hard-Mining loss.
Tasks Face Recognition
Published 2019-08-09
URL https://arxiv.org/abs/1908.09747v1
PDF https://arxiv.org/pdf/1908.09747v1.pdf
PWC https://paperswithcode.com/paper/hard-mining-loss-based-convolutional-neural
Repo
Framework
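The reweighting principle can be sketched as below. The `(1 - p_true)**gamma` factor is a focal-loss-style stand-in chosen purely for illustration, not necessarily the paper's exact Hard-Mining form; it shows how a difficulty-dependent weight makes hard examples dominate the objective.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Per-sample cross-entropy from predicted class probabilities."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def hard_mining_loss(probs, labels, gamma=2.0):
    """Illustrative hard-mining reweighting: scale each sample's
    cross-entropy by a factor that grows with its difficulty, so
    hard examples (low probability on the true class) carry
    relatively more weight than easy ones."""
    p_true = probs[np.arange(len(labels)), labels]
    weights = (1.0 - p_true) ** gamma   # assumed weighting, not the paper's
    return weights * cross_entropy(probs, labels)

probs = np.array([[0.9, 0.1],    # easy example: confident and correct
                  [0.4, 0.6]])   # hard example: true class underweighted
labels = np.array([0, 0])
plain = cross_entropy(probs, labels)
mined = hard_mining_loss(probs, labels)
```

Being generic, the same wrapper pattern applies around Angular-Softmax or ArcFace scores instead of plain cross-entropy.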

A Performance Comparison of Data Mining Algorithms Based Intrusion Detection System for Smart Grid

Title A Performance Comparison of Data Mining Algorithms Based Intrusion Detection System for Smart Grid
Authors Zakaria El Mrabet, Hassan El Ghazi, Naima Kaabouch
Abstract The smart grid is an emerging and promising technology. It uses the power of information technologies to deliver electrical power to customers intelligently, and it allows the integration of green technology to meet environmental requirements. Unfortunately, information technologies have their inherent vulnerabilities and weaknesses, which expose the smart grid to a wide variety of security risks. The intrusion detection system (IDS) plays an important role in securing smart grid networks and detecting malicious activity, yet it suffers from several limitations. Many research papers have been published to address these issues using several algorithms and techniques. Therefore, a detailed comparison between these algorithms is needed. This paper presents an overview of four data mining algorithms used by IDSs in the smart grid. An evaluation of the performance of these algorithms is conducted based on several metrics, including the probability of detection, probability of false alarm, probability of miss detection, efficiency, and processing time. Results show that Random Forest outperforms the other three algorithms in detecting attacks, with a higher probability of detection, lower probability of false alarm, lower probability of miss detection, and higher accuracy.
Tasks Intrusion Detection
Published 2019-12-31
URL https://arxiv.org/abs/2001.00917v1
PDF https://arxiv.org/pdf/2001.00917v1.pdf
PWC https://paperswithcode.com/paper/a-performance-comparison-of-data-mining
Repo
Framework
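The comparison metrics named in the abstract follow directly from confusion-matrix counts. A small sketch (the counts below are made-up numbers, not results from the paper):

```python
def ids_metrics(tp, fp, tn, fn):
    """Standard detection metrics used to compare IDS algorithms:
    P_d  = TP / (TP + FN)        (probability of detection)
    P_fa = FP / (FP + TN)        (probability of false alarm)
    P_md = FN / (TP + FN)        (probability of miss detection, = 1 - P_d)
    accuracy = (TP + TN) / total."""
    p_d = tp / (tp + fn)
    p_fa = fp / (fp + tn)
    p_md = fn / (tp + fn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return {"P_d": p_d, "P_fa": p_fa, "P_md": p_md, "accuracy": acc}

m = ids_metrics(tp=90, fp=5, tn=95, fn=10)
```

By construction, the probability of miss detection is the complement of the probability of detection, which is why a higher P_d and a lower P_md always go together in the reported results.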

Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects

Title Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects
Authors Chaitanya Mitash, Bowen Wen, Kostas Bekris, Abdeslam Boularias
Abstract This paper introduces key machine learning operations that allow the realization of robust, joint 6D pose estimation of multiple instances of objects, either densely packed or in unstructured piles, from RGB-D data. The first objective is to learn semantic and instance-boundary detectors without manual labeling. An adversarial training framework in conjunction with physics-based simulation is used to achieve detectors that behave similarly in synthetic and real data. Given the stochastic output of such detectors, candidates for object poses are sampled. The second objective is to automatically learn, via a gradient boosted tree, a single score for each pose candidate that represents its quality in terms of explaining the entire scene. The proposed method uses features derived from surface and boundary alignment between the observed scene and the object model placed at hypothesized poses. Scene-level, multi-instance pose estimation is then achieved by an integer linear programming process that selects hypotheses that maximize the sum of the learned individual scores, while respecting constraints, such as avoiding collisions. To evaluate this method, a dataset of densely packed objects with setups challenging for state-of-the-art approaches is collected. Experiments on this dataset and a public one show that the method significantly outperforms alternatives in terms of 6D pose accuracy while being trained only with synthetic datasets.
Tasks 6D Pose Estimation, Pose Estimation
Published 2019-10-11
URL https://arxiv.org/abs/1910.04953v1
PDF https://arxiv.org/pdf/1910.04953v1.pdf
PWC https://paperswithcode.com/paper/scene-level-pose-estimation-for-multiple
Repo
Framework
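The final selection step can be illustrated with a toy stand-in for the paper's integer linear program: maximize the sum of learned scores over hypotheses while forbidding colliding pairs. Exhaustive search replaces the ILP solver here and only works at toy scale.

```python
from itertools import combinations

def select_poses(scores, collisions):
    """Brute-force stand-in for the scene-level ILP: choose the subset
    of pose hypotheses with maximum total learned score such that no
    two selected hypotheses collide. The real system solves this as an
    integer linear program; exhaustive search is exponential and only
    viable for toy instances like this one."""
    n = len(scores)
    best, best_set = 0.0, ()
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if any((i, j) in collisions or (j, i) in collisions
                   for i in subset for j in subset if i < j):
                continue  # violates the no-collision constraint
            total = sum(scores[i] for i in subset)
            if total > best:
                best, best_set = total, subset
    return best_set, best

# Three hypotheses with learned quality scores; 0 and 1 collide.
chosen, value = select_poses([0.9, 0.8, 0.5], {(0, 1)})
```

Here the highest-scoring compatible subset is hypotheses 0 and 2, even though hypothesis 1 individually outscores hypothesis 2.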

Privacy-Preserving Blockchain Mining: Sybil-resistance by Proof-of-Useful-Work

Title Privacy-Preserving Blockchain Mining: Sybil-resistance by Proof-of-Useful-Work
Authors Hjalmar Turesson, Marek Laskowski, Alexandra Roatis, Henry M. Kim
Abstract Blockchains rely on a consensus among participants to achieve decentralization and security. However, reaching consensus in an online, digital world where identities are not tied to physical users is a challenging problem. Proof-of-work (PoW) provides a solution by linking representation to a valuable, physical resource. This has worked well, currently securing Bitcoin's $100B value. However, the Bitcoin network uses a tremendous amount of specialized hardware and energy, and since the utility of these resources is strictly limited to blockchain security, the resources used are not useful for other purposes. Here, we propose an alternative consensus scheme that directs the computational resources to a task with utility beyond blockchain security, aiming at better resource utilization. The key idea is to channel the resources to the optimization of machine learning (ML) models by setting up decentralized ML competitions. This is achieved by a hybrid consensus scheme relying on three parties: data providers, miners, and a committee. The data provider makes data available and provides payment in return for the best model, miners compete for the payment and for access to the committee by producing optimized ML models, and the committee controls the ML competition.
Tasks
Published 2019-07-20
URL https://arxiv.org/abs/1907.08744v2
PDF https://arxiv.org/pdf/1907.08744v2.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-blockchain-mining-sybil
Repo
Framework

Requirements Engineering for Machine Learning: Perspectives from Data Scientists

Title Requirements Engineering for Machine Learning: Perspectives from Data Scientists
Authors Andreas Vogelsang, Markus Borg
Abstract Machine learning (ML) is used increasingly in real-world applications. In this paper, we describe our ongoing endeavor to define characteristics and challenges unique to Requirements Engineering (RE) for ML-based systems. As a first step, we interviewed four data scientists to understand how ML experts approach elicitation, specification, and assurance of requirements and expectations. The results show that changes in the development paradigm, i.e., from coding to training, also demand changes in RE. We conclude that the development of ML systems requires requirements engineers to: (1) understand ML performance measures in order to state good functional requirements, (2) be aware of new quality requirements such as explainability, freedom from discrimination, or specific legal requirements, and (3) integrate ML specifics into the RE process. Our study provides a first contribution towards an RE methodology for ML systems.
Tasks
Published 2019-08-13
URL https://arxiv.org/abs/1908.04674v1
PDF https://arxiv.org/pdf/1908.04674v1.pdf
PWC https://paperswithcode.com/paper/requirements-engineering-for-machine-learning
Repo
Framework

Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning

Title Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning
Authors Nai-Hui Chia, András Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, Chunhao Wang
Abstract We present an algorithmic framework generalizing quantum-inspired polylogarithmic-time algorithms on low-rank matrices. Our work follows the line of research started by Tang’s breakthrough classical algorithm for recommendation systems [STOC’19]. The main result of this work is an algorithm for singular value transformation on low-rank inputs in the quantum-inspired regime, where singular value transformation is a framework proposed by Gilyén et al. [STOC’19] to study various quantum speedups. Since singular value transformation encompasses a vast range of matrix arithmetic, this result, combined with simple sampling lemmas from previous work, suffices to generalize, to the authors’ knowledge, all existing results dequantizing quantum machine learning algorithms. Via simple black-box applications of our singular value transformation framework, we recover the dequantization results on recommendation systems, principal component analysis, supervised clustering, low-rank matrix inversion, low-rank semidefinite programming, and support vector machines. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis.
Tasks Quantum Machine Learning, Recommendation Systems
Published 2019-10-14
URL https://arxiv.org/abs/1910.06151v1
PDF https://arxiv.org/pdf/1910.06151v1.pdf
PWC https://paperswithcode.com/paper/sampling-based-sublinear-low-rank-matrix
Repo
Framework
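The basic primitive behind such quantum-inspired ("dequantized") algorithms is length-squared sampling access to a matrix: draw row i with probability proportional to its squared Euclidean norm. A minimal sketch (the matrix and the counting loop are our illustration, not part of the paper's framework):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_row(A):
    """Length-squared row sampling: row i is drawn with probability
    ||A_i||^2 / ||A||_F^2. Given this kind of sampling access, the
    quantum-inspired framework performs low-rank matrix arithmetic in
    time polylogarithmic in the matrix dimensions."""
    row_norms_sq = np.sum(A * A, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    i = rng.choice(len(A), p=probs)
    return i, A[i]

A = np.array([[3.0, 4.0],    # ||row||^2 = 25 -> sampled most often
              [0.0, 0.0],    # ||row||^2 = 0  -> never sampled
              [1.0, 0.0]])   # ||row||^2 = 1
counts = np.zeros(3)
for _ in range(2000):
    i, _ = sample_row(A)
    counts[i] += 1
```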

Clustering by Orthogonal NMF Model and Non-Convex Penalty Optimization

Title Clustering by Orthogonal NMF Model and Non-Convex Penalty Optimization
Authors Shuai Wang, Tsung-Hui Chang, Ying Cui, Jong-Shi Pang
Abstract The non-negative matrix factorization (NMF) model with an additional orthogonality constraint on one of the factor matrices, called orthogonal NMF (ONMF), has been found to provide improved clustering performance over K-means. Solving the ONMF model is a challenging optimization problem because of the coexistence of orthogonality and non-negativity constraints, and most existing methods deal directly with the orthogonality constraint in its original form via various optimization techniques. In this paper, we propose a new ONMF-based clustering formulation that equivalently transforms the orthogonality constraint into a set of norm-based non-convex equality constraints. We then apply a non-convex penalty (NCP) approach to add the non-convex equality constraints to the objective as penalty terms, leaving only simple non-negativity constraints in the penalized problem. One smooth penalty formulation and one non-smooth penalty formulation are studied, and theoretical conditions under which the penalized problems provide feasible stationary solutions to the ONMF-based clustering problem are presented. Experimental results on both synthetic and real datasets show that the proposed NCP methods are computationally efficient, and either match or outperform existing K-means and ONMF-based methods in terms of clustering performance.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00570v1
PDF https://arxiv.org/pdf/1906.00570v1.pdf
PWC https://paperswithcode.com/paper/190600570
Repo
Framework
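The norm-based reformulation can be illustrated for nonnegative factors: rows of H are mutually orthogonal exactly when every column of H has at most one nonzero entry, and for H >= 0 the smooth quantity ||h||_1^2 - ||h||_2^2 per column vanishes exactly in that case. The penalty below is an illustrative surrogate in that spirit, not necessarily the paper's exact formulation.

```python
import numpy as np

def orth_penalty(H):
    """Smooth penalty that is nonnegative for H >= 0 and vanishes
    exactly when each column of H has at most one nonzero entry --
    the condition equivalent to row-orthogonality of a nonnegative H.
    Uses ||h||_1^2 - ||h||_2^2 = sum_{i != j} h_i h_j per column."""
    l1_sq = np.sum(H, axis=0) ** 2   # valid as ||h||_1^2 since H >= 0
    l2_sq = np.sum(H * H, axis=0)
    return float(np.sum(l1_sq - l2_sq))

# A valid cluster-assignment factor: one nonzero per column.
H_orth = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 0.0]])
# Overlapping supports violate orthogonality.
H_bad = np.array([[1.0, 1.0, 2.0],
                  [1.0, 3.0, 0.0]])
```

Adding such a term to the NMF objective leaves only the simple non-negativity constraints in the penalized problem, which is the point of the NCP approach.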

Gradual Machine Learning for Aspect-level Sentiment Analysis

Title Gradual Machine Learning for Aspect-level Sentiment Analysis
Authors Yanyan Wang, Qun Chen, Jiquan Shen, Boyi Hou, Murtadha Ahmed, Zhanhuai Li
Abstract The state-of-the-art solutions for Aspect-Level Sentiment Analysis (ALSA) were built on a variety of deep neural networks (DNN), whose efficacy depends on large amounts of accurately labeled training data. Unfortunately, high-quality labeled training data usually require expensive manual work, and may thus not be readily available in real scenarios. In this paper, we propose a novel solution for ALSA based on the recently proposed paradigm of gradual machine learning, which can enable effective machine labeling without the requirement for manual labeling effort. It begins with some easy instances in an ALSA task, which can be automatically labeled by the machine with high accuracy, and then gradually labels the more challenging instances by iterative factor graph inference. In the process of gradual machine learning, the hard instances are gradually labeled in small stages based on the estimated evidential certainty provided by the labeled easier instances. Our extensive experiments on the benchmark datasets have shown that the performance of the proposed solution is considerably better than its unsupervised alternatives, and also highly competitive compared to the state-of-the-art supervised DNN techniques.
Tasks Sentiment Analysis
Published 2019-06-06
URL https://arxiv.org/abs/1906.02502v2
PDF https://arxiv.org/pdf/1906.02502v2.pdf
PWC https://paperswithcode.com/paper/gradual-machine-learning-for-aspect-level
Repo
Framework

View N-gram Network for 3D Object Retrieval

Title View N-gram Network for 3D Object Retrieval
Authors Xinwei He, Tengteng Huang, Song Bai, Xiang Bai
Abstract How to aggregate multi-view representations of a 3D object into an informative and discriminative one remains a key challenge for multi-view 3D object retrieval. Existing methods either use view-wise pooling strategies which neglect the spatial information across different views or employ recurrent neural networks which may face the efficiency problem. To address these issues, we propose an effective and efficient framework called View N-gram Network (VNN). Inspired by n-gram models in natural language processing, VNN divides the view sequence into a set of visual n-grams, which involve overlapping consecutive view sub-sequences. By doing so, spatial information across multiple views is captured, which helps to learn a discriminative global embedding for each 3D object. Experiments on 3D shape retrieval benchmarks, including ModelNet10, ModelNet40 and ShapeNetCore55 datasets, demonstrate the superiority of our proposed method.
Tasks 3D Object Retrieval, 3D Shape Retrieval
Published 2019-08-06
URL https://arxiv.org/abs/1908.01958v2
PDF https://arxiv.org/pdf/1908.01958v2.pdf
PWC https://paperswithcode.com/paper/view-n-gram-network-for-3d-object-retrieval
Repo
Framework
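The visual n-gram construction can be sketched in a few lines. Wrapping around the circular view sequence so that every view starts one n-gram is a hypothetical choice of ours; the paper's exact windowing may differ.

```python
def view_ngrams(views, n):
    """Split a circular sequence of rendered views into overlapping
    n-grams of consecutive views, in the spirit of VNN's visual
    n-grams, so spatial information across neighboring views is kept."""
    m = len(views)
    return [tuple(views[(i + k) % m] for k in range(n)) for i in range(m)]

# Four camera views of one 3D object, as a circular sequence.
grams = view_ngrams(["v0", "v1", "v2", "v3"], n=2)
```

Each n-gram would then be embedded and the per-gram features aggregated into a single global descriptor for retrieval.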

Non-linear ICA based on Cramer-Wold metric

Title Non-linear ICA based on Cramer-Wold metric
Authors Przemysław Spurek, Aleksandra Nowak, Jacek Tabor, Łukasz Maziarka, Stanisław Jastrzębski
Abstract Non-linear source separation is a challenging open problem with many applications. We extend a recently proposed Adversarial Non-linear ICA (ANICA) model and introduce Cramer-Wold ICA (CW-ICA). In contrast to ANICA, we use a simple, closed-form optimization target instead of a discriminator-based independence measure. Our results show that CW-ICA achieves results comparable to ANICA, while foregoing the need for adversarial training.
Tasks
Published 2019-03-01
URL http://arxiv.org/abs/1903.00201v1
PDF http://arxiv.org/pdf/1903.00201v1.pdf
PWC https://paperswithcode.com/paper/non-linear-ica-based-on-cramer-wold-metric
Repo
Framework

Whatcha lookin’ at? DeepLIFTing BERT’s Attention in Question Answering

Title Whatcha lookin’ at? DeepLIFTing BERT’s Attention in Question Answering
Authors Ekaterina Arkhangelskaia, Sourav Dutta
Abstract There has been great success recently in tackling challenging NLP tasks with neural networks that have been pre-trained and fine-tuned on large amounts of task data. In this paper, we investigate one such model, BERT for question answering, with the aim of analyzing why it is able to achieve significantly better results than other models. We run DeepLIFT on the model predictions and test the outcomes to monitor shifts in the attention values for the input. We also cluster the results to analyze possible patterns similar to human reasoning, depending on the kind of input paragraph and question the model is trying to answer.
Tasks Question Answering
Published 2019-10-14
URL https://arxiv.org/abs/1910.06431v1
PDF https://arxiv.org/pdf/1910.06431v1.pdf
PWC https://paperswithcode.com/paper/whatcha-lookin-at-deeplifting-berts-attention
Repo
Framework

A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs

Title A Non-commutative Bilinear Model for Answering Path Queries in Knowledge Graphs
Authors Katsuhiko Hayashi, Masashi Shimbo
Abstract Bilinear diagonal models for knowledge graph embedding (KGE), such as DistMult and ComplEx, balance expressiveness and computational efficiency by representing relations as diagonal matrices. Although they perform well in predicting atomic relations, composite relations (relation paths) cannot be modeled naturally by the product of relation matrices, as the product of diagonal matrices is commutative and hence invariant to the order of relations. In this paper, we propose a new bilinear KGE model, called BlockHolE, based on block circulant matrices. In BlockHolE, relation matrices can be non-commutative, allowing composite relations to be modeled by matrix products. The model is parameterized in a way that covers a spectrum ranging from diagonal to full relation matrices. A fast computation technique is developed on the basis of the duality of the Fourier transform of circulant matrices.
Tasks Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs
Published 2019-09-04
URL https://arxiv.org/abs/1909.01567v1
PDF https://arxiv.org/pdf/1909.01567v1.pdf
PWC https://paperswithcode.com/paper/a-non-commutative-bilinear-model-for
Repo
Framework
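The motivation for block circulant relation matrices can be checked numerically: diagonal relation matrices (as in DistMult) always commute, so relation order is lost along a path, while block circulant matrices need not commute. A small numpy demonstration with made-up relation parameters:

```python
import numpy as np

def circulant(c):
    """Full circulant matrix generated by its first column c."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def block_circulant(blocks):
    """Assemble a matrix whose sub-blocks are each circulant; BlockHolE
    represents each relation by a matrix of this form."""
    return np.block([[circulant(c) for c in row] for row in blocks])

# Two made-up relation matrices built from 2x2 circulant blocks.
R1 = block_circulant([[[1.0, 2.0], [0.0, 1.0]],
                      [[3.0, 0.0], [1.0, 1.0]]])
R2 = block_circulant([[[0.0, 1.0], [2.0, 0.0]],
                      [[1.0, 0.0], [0.0, 2.0]]])

# Diagonal relation matrices always commute, losing path order...
D1, D2 = np.diag([1.0, 2.0, 3.0, 4.0]), np.diag([4.0, 3.0, 2.0, 1.0])
commutes_diag = np.allclose(D1 @ D2, D2 @ D1)
# ...but block circulant ones need not, so relation order is preserved.
commutes_block = np.allclose(R1 @ R2, R2 @ R1)
```

The spectrum the paper mentions is visible here: 1x1 "blocks" recover diagonal matrices, while a single full-size circulant block approaches the fully dense case.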