October 17, 2019

Paper Group ANR 801

StackNet: Stacking Parameters for Continual learning


Title	StackNet: Stacking Parameters for Continual learning
Authors	Jangho Kim, Jeesoo Kim, Nojun Kwak
Abstract	Training a neural network for a classification task typically assumes that the data to train are given from the beginning. However, in the real world, additional data accumulate gradually and the model requires additional training without accessing the old training data. This usually leads to the catastrophic forgetting problem which is inevitable for the traditional training methodology of neural networks. In this paper, we propose a continual learning method that is able to learn additional tasks while retaining the performance of previously learned tasks by stacking parameters. Composed of two complementary components, the index module and the StackNet, our method estimates the index of the corresponding task for an input sample with the index module and utilizes a particular portion of StackNet with this index. The StackNet guarantees no degradation in the performance of the previously learned tasks and the index module shows high confidence in finding the origin of an input sample.
Tasks	Continual Learning
Published	2018-09-07
URL	http://arxiv.org/abs/1809.02441v2
PDF	http://arxiv.org/pdf/1809.02441v2.pdf
PWC	https://paperswithcode.com/paper/hc-net-memory-based-incremental-dual-network
Repo
Framework
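
The abstract above describes two cooperating parts: an index module that guesses which task an input belongs to, and a StackNet whose parameters are stacked so that each task uses only a particular portion. A minimal sketch of that idea follows. It is not the authors' implementation: the names `StackedLinear` and `NearestMeanIndex` are hypothetical, the "network" is a single linear layer, and the nearest-mean rule is only an assumed stand-in for whatever index module the paper actually uses.

```python
from math import dist


class StackedLinear:
    """Toy layer whose hidden units are claimed task by task (hypothetical)."""

    def __init__(self, in_dim):
        self.in_dim = in_dim
        self.rows = []       # one weight vector per hidden unit
        self.task_end = []   # task_end[t] = number of units visible to task t

    def add_task(self, new_rows):
        # New units are appended; rows of earlier tasks are never modified.
        for row in new_rows:
            assert len(row) == self.in_dim
        self.rows.extend(list(row) for row in new_rows)
        self.task_end.append(len(self.rows))

    def forward(self, x, task):
        # Task t only uses the units stacked up to and including its own.
        active = self.rows[: self.task_end[task]]
        return [sum(w * xi for w, xi in zip(row, x)) for row in active]


class NearestMeanIndex:
    """Assumed index module: picks the task whose mean input is closest."""

    def __init__(self):
        self.means = []

    def add_task(self, samples):
        n = len(samples)
        self.means.append([sum(col) / n for col in zip(*samples)])

    def predict(self, x):
        return min(range(len(self.means)), key=lambda t: dist(x, self.means[t]))


net = StackedLinear(in_dim=2)
net.add_task([[1.0, 0.0]])                 # task 0 owns one unit
before = net.forward([3.0, 4.0], task=0)
net.add_task([[0.0, 1.0], [1.0, 1.0]])     # task 1 stacks two more units
after = net.forward([3.0, 4.0], task=0)    # identical to `before`
```

Because the units of earlier tasks are never rewritten and are the only ones used for those tasks, the output for task 0 cannot change when task 1 is added, which is the "no degradation" property in this toy setting.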

An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams


Title	An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams
Authors	Mahardhika Pratama, Witold Pedrycz, Geoffrey I. Webb
Abstract	Existing FNNs are mostly developed under a shallow network configuration having lower generalization power than those of deep structures. This paper proposes a novel self-organizing deep FNN, namely DEVFNN. Fuzzy rules can be automatically extracted from data streams or removed if they play a limited role during their lifespan. The structure of the network can be deepened on demand by stacking additional layers using a drift detection method which not only detects the covariate drift, variations of input space, but also accurately identifies the real drift, dynamic changes of both feature space and target space. DEVFNN is developed under the stacked generalization principle via the feature augmentation concept where a recently developed algorithm, namely gClass, drives the hidden layer. It is equipped with an automatic feature selection method which controls activation and deactivation of input attributes to induce varying subsets of input features. A deep network simplification procedure is put forward using the concept of hidden layer merging to prevent uncontrollable growth of dimensionality of input space due to the nature of feature augmentation approach in building a deep network structure. DEVFNN works in a sample-wise fashion and is compatible with data stream applications. The efficacy of DEVFNN has been thoroughly evaluated using seven datasets with non-stationary properties under the prequential test-then-train protocol. It has been compared with four popular continual learning algorithms and its shallow counterpart where DEVFNN demonstrates improvement of classification accuracy. Moreover, it is also shown that the concept drift detection method is an effective tool to control the depth of network structure while the hidden layer merging scenario is capable of simplifying the network complexity of a deep network with negligible compromise of generalization performance.
Tasks	Continual Learning, Feature Selection
Published	2018-08-26
URL	https://arxiv.org/abs/1808.08517v2
PDF	https://arxiv.org/pdf/1808.08517v2.pdf
PWC	https://paperswithcode.com/paper/an-incremental-construction-of-deep-neuro
Repo
Framework
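
The evaluation mentioned above follows the prequential test-then-train protocol: every sample in the stream is first used to test the current model and only afterwards used to update it. The sketch below illustrates just that protocol. The `MajorityClass` learner is a hypothetical placeholder with nothing in common with DEVFNN beyond exposing `predict` and `learn`.

```python
from collections import Counter


class MajorityClass:
    """Placeholder stream learner: always predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream):
    """Test on each sample first, then train on it; return the running accuracy."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:   # test step uses the model before the update
            correct += 1
        total += 1
        model.learn(x, y)           # train step
    return correct / total if total else 0.0


stream = [(0, "a"), (1, "a"), (2, "b"), (3, "a")]
accuracy = prequential_accuracy(MajorityClass(), stream)  # 2 of 4 correct -> 0.5
```

Since no sample is ever seen during training before it is scored, the protocol needs no separate hold-out set, which is why it suits sample-wise stream learners.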

From biological vision to unsupervised hierarchical sparse coding


Title	From biological vision to unsupervised hierarchical sparse coding
Authors	Victor Boutin, Angelo Franciosini, Franck Ruffier, Laurent U. Perrinet
Abstract	The formation of connections between neural cells is emerging essentially from an unsupervised learning process. For instance, during the development of the primary visual cortex of mammals (V1), we observe the emergence of cells selective to localized and oriented features. This leads to the development of a rough contour-based representation of the retinal image in area V1. We propose a biological model of the formation of this representation along the thalamo-cortical pathway. To achieve this goal, we replicated the Multi-Layer Convolutional Sparse Coding (ML-CSC) algorithm developed by Michael Elad’s group.
Tasks
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01335v1
PDF	http://arxiv.org/pdf/1812.01335v1.pdf
PWC	https://paperswithcode.com/paper/from-biological-vision-to-unsupervised
Repo
Framework

On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks


Title	On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks
Authors	Hoang Thanh-Tung, Truyen Tran
Abstract	In this paper, we show that Generative Adversarial Networks (GANs) suffer from catastrophic forgetting even when they are trained to approximate a single target distribution. We show that GAN training is a continual learning problem in which the sequence of changing model distributions is the sequence of tasks to the discriminator. The level of mismatch between tasks in the sequence determines the level of forgetting. Catastrophic forgetting is interrelated to mode collapse and can make the training of GANs non-convergent. We investigate the landscape of the discriminator’s output in different variants of GANs and find that when a GAN converges to a good equilibrium, real training datapoints are wide local maxima of the discriminator. We empirically show the relationship between the sharpness of local maxima and mode collapse and generalization in GANs. We show how catastrophic forgetting prevents the discriminator from making real datapoints local maxima, and thus causes non-convergence. Finally, we study methods for preventing catastrophic forgetting in GANs.
Tasks	Continual Learning
Published	2018-07-11
URL	https://arxiv.org/abs/1807.04015v8
PDF	https://arxiv.org/pdf/1807.04015v8.pdf
PWC	https://paperswithcode.com/paper/on-catastrophic-forgetting-and-mode-collapse
Repo
Framework

Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks


Title	Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks
Authors	Sumant Sharma, Connor Beierle, Simone D’Amico
Abstract	On-board estimation of the pose of an uncooperative target spacecraft is an essential task for future on-orbit servicing and close-proximity formation flying missions. However, two issues hinder reliable on-board monocular vision based pose estimation: robustness to illumination conditions due to a lack of reliable visual features and scarcity of image datasets required for training and benchmarking. To address these two issues, this work details the design and validation of a monocular vision based pose determination architecture for spaceborne applications. The primary contribution to the state-of-the-art of this work is the introduction of a novel pose determination method based on Convolutional Neural Networks (CNN) to provide an initial guess of the pose in real-time on-board. The method involves discretizing the pose space and training the CNN with images corresponding to the resulting pose labels. Since reliable training of the CNN requires massive image datasets and computational resources, the parameters of the CNN must be determined prior to the mission with synthetic imagery. Moreover, reliable training of the CNN requires datasets that appropriately account for noise, color, and illumination characteristics expected in orbit. Therefore, the secondary contribution of this work is the introduction of an image synthesis pipeline, which is tailored to generate high fidelity images of any spacecraft 3D model. The proposed technique is scalable to spacecraft of different structural and physical properties as well as robust to the dynamic illumination conditions of space. Through metrics measuring classification and pose accuracy, it is shown that the presented architecture has desirable robustness and scalable properties.
Tasks	Image Generation, Pose Estimation
Published	2018-09-19
URL	http://arxiv.org/abs/1809.07238v1
PDF	http://arxiv.org/pdf/1809.07238v1.pdf
PWC	https://paperswithcode.com/paper/pose-estimation-for-non-cooperative
Repo
Framework
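
The method summarized above turns pose estimation into classification by discretizing the pose space and training the CNN on the resulting pose labels. The abstract does not say how the space is partitioned, so the following is only an illustrative sketch under a strong assumption: the pose is reduced to two viewing angles (azimuth and elevation, in degrees) and split into a uniform grid. The function names and bin counts are hypothetical.

```python
def pose_to_label(azimuth_deg, elevation_deg, n_az=8, n_el=4):
    """Map a viewing direction to a class index on a uniform angular grid."""
    az_width = 360.0 / n_az
    el_width = 180.0 / n_el
    # Wrap azimuth into [0, 360) and clamp the bin to guard against rounding.
    az_bin = min(int((azimuth_deg % 360.0) / az_width), n_az - 1)
    # Clamp elevation into [-90, 90]; the top edge falls in the last bin.
    el_clamped = min(max(elevation_deg, -90.0), 90.0)
    el_bin = min(int((el_clamped + 90.0) / el_width), n_el - 1)
    return el_bin * n_az + az_bin


def label_to_pose(label, n_az=8, n_el=4):
    """Return the bin-centre angles for a class index (a coarse pose guess)."""
    el_bin, az_bin = divmod(label, n_az)
    azimuth = (az_bin + 0.5) * 360.0 / n_az
    elevation = -90.0 + (el_bin + 0.5) * 180.0 / n_el
    return azimuth, elevation


label = pose_to_label(100.0, 10.0)   # azimuth bin 2, elevation bin 2 -> label 18
coarse = label_to_pose(label)        # (112.5, 22.5), the centre of that bin
```

A classifier trained on such labels can only return a bin centre, which matches the abstract's description of the CNN output as an initial guess of the pose.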

Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering


Title	Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering
Authors	Makito Oku
Abstract	This paper proposes a new dimensionality reduction algorithm named branching embedding (BE). It converts a dendrogram to a two-dimensional scatter plot, and visualizes the inherent structures of the original high-dimensional data. Since the conversion part is not computationally demanding, the BE algorithm would be beneficial for the case where hierarchical clustering is already performed. Numerical experiments revealed that the outputs of the algorithm moderately preserve the original hierarchical structures.
Tasks	Dimensionality Reduction
Published	2018-05-06
URL	http://arxiv.org/abs/1805.02161v1
PDF	http://arxiv.org/pdf/1805.02161v1.pdf
PWC	https://paperswithcode.com/paper/branching-embedding-a-heuristic
Repo
Framework

Towards Continuous Domain adaptation for Healthcare


Title	Towards Continuous Domain adaptation for Healthcare
Authors	Rahul Venkataramani, Hariharan Ravishankar, Saihareesh Anamandra
Abstract	Deep learning algorithms have demonstrated tremendous success on challenging medical imaging problems. However, post-deployment, these algorithms are susceptible to data distribution variations owing to limited data issues and diversity in medical images. In this paper, we propose ContextNets, a generic memory-augmented neural network framework for semantic segmentation to achieve continuous domain adaptation without the necessity of retraining. Unlike existing methods which require access to entire source and target domain images, our algorithm can adapt to a target domain with a few similar images. We condition the inference on any new input with features computed on its support set of images (and masks, if available) through contextual embeddings to achieve site-specific adaptation. We demonstrate state-of-the-art domain adaptation performance on the X-ray lung segmentation problem from three independent cohorts that differ in disease type, gender, contrast and intensity variations.
Tasks	Domain Adaptation, Semantic Segmentation
Published	2018-12-04
URL	http://arxiv.org/abs/1812.01281v1
PDF	http://arxiv.org/pdf/1812.01281v1.pdf
PWC	https://paperswithcode.com/paper/towards-continuous-domain-adaptation-for
Repo
Framework

Entity Commonsense Representation for Neural Abstractive Summarization


Title	Entity Commonsense Representation for Neural Abstractive Summarization
Authors	Reinald Kim Amplayo, Seonjae Lim, Seung-won Hwang
Abstract	A major proportion of a text summary includes important entities found in the original text. These entities build up the topic of the summary. Moreover, they hold commonsense information once they are linked to a knowledge base. Based on these observations, this paper investigates the usage of linked entities to guide the decoder of a neural text summarizer to generate concise and better summaries. To this end, we leverage an off-the-shelf entity linking system (ELS) to extract linked entities and propose Entity2Topic (E2T), a module easily attachable to a sequence-to-sequence model that transforms a list of entities into a vector representation of the topic of the summary. Currently available ELSs are still not sufficiently effective, possibly introducing unresolved ambiguities and irrelevant entities. We resolve the imperfections of the ELS by (a) encoding entities with selective disambiguation, and (b) pooling entity vectors using firm attention. By applying E2T to a simple sequence-to-sequence model with attention mechanism as base model, we see significant improvements of the performance in the Gigaword (sentence to title) and CNN (long document to multi-sentence highlights) summarization datasets by at least 2 ROUGE points.
Tasks	Abstractive Text Summarization, Entity Linking
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05504v1
PDF	http://arxiv.org/pdf/1806.05504v1.pdf
PWC	https://paperswithcode.com/paper/entity-commonsense-representation-for-neural
Repo
Framework

The Social Cost of Strategic Classification


Title	The Social Cost of Strategic Classification
Authors	Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt
Abstract	Consequential decision-making typically incentivizes individuals to behave strategically, tailoring their behavior to the specifics of the decision rule. A long line of work has therefore sought to counteract strategic behavior by designing more conservative decision boundaries in an effort to increase robustness to the effects of strategic covariate shift. We show that these efforts benefit the institutional decision maker at the expense of the individuals being classified. Introducing a notion of social burden, we prove that any increase in institutional utility necessarily leads to a corresponding increase in social burden. Moreover, we show that the negative externalities of strategic classification can disproportionately harm disadvantaged groups in the population. Our results highlight that strategy-robustness must be weighed against considerations of social welfare and fairness.
Tasks	Decision Making
Published	2018-08-25
URL	http://arxiv.org/abs/1808.08460v2
PDF	http://arxiv.org/pdf/1808.08460v2.pdf
PWC	https://paperswithcode.com/paper/the-social-cost-of-strategic-classification
Repo
Framework

Consistency and Variation in Kernel Neural Ranking Model


Title	Consistency and Variation in Kernel Neural Ranking Model
Authors	Mary Arpita Pyreddy, Varshini Ramaseshan, Narendra Nath Joshi, Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu
Abstract	This paper studies the consistency of the kernel-based neural ranking model K-NRM, a recent state-of-the-art neural IR model, which is important for reproducible research and deployment in the industry. We find that K-NRM has low variance on relevance-based metrics across experimental trials. In spite of this low variance in overall performance, different trials produce different document rankings for individual queries. The main source of variance in our experiments was found to be different latent matching patterns captured by K-NRM. In the IR-customized word embeddings learned by K-NRM, the query-document word pairs follow two different matching patterns that are equally effective, but align word pairs differently in the embedding space. The different latent matching patterns enable a simple yet effective approach to construct ensemble rankers, which improve K-NRM’s effectiveness and generalization abilities.
Tasks	Word Embeddings
Published	2018-09-27
URL	http://arxiv.org/abs/1809.10522v1
PDF	http://arxiv.org/pdf/1809.10522v1.pdf
PWC	https://paperswithcode.com/paper/consistency-and-variation-in-kernel-neural
Repo
Framework

Meta Continual Learning


Title	Meta Continual Learning
Authors	Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim
Abstract	Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in the current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of the forgetting problem is to constrain parameters that are important to previous tasks to stay close to the optimal parameters. Recently, multiple competitive approaches for computing the importance of the parameters with respect to the previous tasks have been presented. In this paper, we propose a learning to optimize algorithm for mitigating catastrophic forgetting. Instead of trying to formulate a new constraint function ourselves, we propose to train another neural network to predict parameter update steps that respect the importance of parameters to the previous tasks. In the proposed meta-training scheme, the update predictor is trained to minimize loss on a combination of current and past tasks. We show experimentally that the proposed approach works in the continual learning setting.
Tasks	Continual Learning
Published	2018-06-11
URL	http://arxiv.org/abs/1806.06928v1
PDF	http://arxiv.org/pdf/1806.06928v1.pdf
PWC	https://paperswithcode.com/paper/meta-continual-learning
Repo
Framework
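
The abstract above mentions a common baseline against which the proposed update predictor is motivated: constrain the parameters that matter for previous tasks to stay close to their earlier optimum. One widely used form of such a constraint is a quadratic penalty weighted by a per-parameter importance. The sketch below shows only that baseline penalty, not the learned optimizer the paper proposes. The function names, the `strength` coefficient, and the way importances are obtained are all assumptions.

```python
def penalized_loss(task_loss, theta, theta_star, importance, strength=1.0):
    """New-task loss plus a quadratic pull toward the old parameters theta_star."""
    penalty = sum(
        w * (p - q) ** 2 for w, p, q in zip(importance, theta, theta_star)
    )
    return task_loss + 0.5 * strength * penalty


def penalty_gradient(theta, theta_star, importance, strength=1.0):
    """Gradient of the penalty term with respect to each parameter."""
    return [strength * w * (p - q) for w, p, q in zip(importance, theta, theta_star)]


theta_star = [0.0, 2.0]    # parameters after the previous task
importance = [4.0, 1.0]    # first parameter matters more to the old task
theta = [1.0, 2.0]         # current parameters while learning the new task

loss = penalized_loss(1.0, theta, theta_star, importance, strength=0.5)  # 2.0
grad = penalty_gradient(theta, theta_star, importance, strength=0.5)     # [2.0, 0.0]
```

Parameters with high importance are pulled back strongly, while unimportant ones remain free to move, which is the behaviour the paper aims to obtain from a trained update predictor instead of a hand-designed constraint.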

Deep Learning with Apache SystemML


Title	Deep Learning with Apache SystemML
Authors	Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen
Abstract	Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters to (3) perform many different analytics tasks ranging from model preparation, building, evaluation, and tuning for both machine learning and deep learning. Developing machine/deep learning models on data in such shared environments is challenging. Apache SystemML provides a unified framework for implementing machine learning and deep learning algorithms in a variety of shared deployment scenarios. SystemML’s novel compilation approach automatically generates runtime execution plans for machine/deep learning algorithms that are composed of single-node and distributed runtime operations depending on data and cluster characteristics such as data size, data sparsity, cluster size, and memory configurations, while still exploiting the capabilities of the underlying big data frameworks.
Tasks
Published	2018-02-08
URL	http://arxiv.org/abs/1802.04647v1
PDF	http://arxiv.org/pdf/1802.04647v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-with-apache-systemml
Repo
Framework

Malware triage for early identification of Advanced Persistent Threat activities


Title	Malware triage for early identification of Advanced Persistent Threat activities
Authors	Giuseppe Laurenza, Riccardo Lazzeretti, Luca Mazzotti
Abstract	In the last decade, a new class of cyber-threats has emerged. This new cybersecurity adversary is known as the “Advanced Persistent Threat” (APT), a term that refers to different organizations that in recent years have been “in the center of the eye” due to multiple dangerous and effective attacks targeting financial and political targets, news headlines, embassies, critical infrastructures, TV programs, etc. To identify APT-related malware early, a semi-automatic approach to malware sample analysis is needed. In our previous work we introduced a “malware triage” step for a semi-automatic malware analysis architecture. This step must analyze new incoming samples as fast as possible and immediately dispatch, among all the malware delivered per day in cyberspace, the ones that really deserve to be further examined by analysts. Our paper focuses on malware developed by APTs, and we build our knowledge base, used in the triage, on known APTs obtained from publicly available reports. To keep the triage as fast as possible, we rely only on static malware features, which can be extracted with negligible delay, and use machine learning techniques for the identification. In this work we move from multiclass classification to a group of one-class classifiers, which simplifies the training and allows higher modularity. The results of the proposed framework show high performance, reaching a precision of 100% and an accuracy over 95%.
Tasks
Published	2018-10-16
URL	http://arxiv.org/abs/1810.07321v1
PDF	http://arxiv.org/pdf/1810.07321v1.pdf
PWC	https://paperswithcode.com/paper/malware-triage-for-early-identification-of
Repo
Framework
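
The abstract above replaces one multiclass model with a group of one-class classifiers, one per known APT, so that each can be trained on its own. A minimal sketch of that arrangement follows. The centroid-and-radius model, the `slack` factor, the two-dimensional feature vectors, and the group names are hypothetical stand-ins, since the abstract names neither the classifier nor the static features used.

```python
from math import dist


class CentroidOneClass:
    """Toy one-class model: accept a sample if it lies near the training centroid."""

    def __init__(self, name, samples, slack=1.1):
        n = len(samples)
        self.name = name
        self.centroid = [sum(col) / n for col in zip(*samples)]
        # Radius covers all training samples, widened slightly by `slack`.
        self.radius = slack * max(dist(s, self.centroid) for s in samples)

    def accepts(self, x):
        return dist(x, self.centroid) <= self.radius


def triage(classifiers, x):
    """Return the names of every group whose one-class model accepts the sample."""
    return [c.name for c in classifiers if c.accepts(x)]


group = [
    CentroidOneClass("apt_a", [[0.0, 0.0], [2.0, 0.0]]),
    CentroidOneClass("apt_b", [[10.0, 10.0], [10.0, 12.0]]),
]

flagged = triage(group, [1.0, 0.5])    # ['apt_a'] -> forward to analysts
ignored = triage(group, [5.0, 5.0])    # [] -> no known APT matches
```

Supporting a newly reported APT means appending one more model to `group`; the existing models are untouched, which is the modularity the abstract refers to.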

Stereoscopic Neural Style Transfer


Title	Stereoscopic Neural Style Transfer
Authors	Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua
Abstract	This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR. We start with a careful examination of applying existing monocular style transfer methods to left and right views of stereoscopic images separately. This reveals that the original disparity consistency cannot be well preserved in the final stylization results, which causes 3D fatigue to the viewers. To address this issue, we incorporate a new disparity loss into the widely adopted style loss function by enforcing the bidirectional disparity constraint in non-occluded regions. For a practical real-time solution, we propose the first feed-forward network by jointly training a stylization sub-network and a disparity sub-network, and integrate them in a feature level middle domain. Our disparity sub-network is also the first end-to-end network for simultaneous bidirectional disparity and occlusion mask estimation. Finally, our network is effectively extended to stereoscopic videos, by considering both temporal coherence and disparity consistency. We will show that the proposed method clearly outperforms the baseline algorithms both quantitatively and qualitatively.
Tasks	Style Transfer
Published	2018-02-28
URL	http://arxiv.org/abs/1802.10591v2
PDF	http://arxiv.org/pdf/1802.10591v2.pdf
PWC	https://paperswithcode.com/paper/stereoscopic-neural-style-transfer
Repo
Framework

Learning More Robust Features with Adversarial Training


Title	Learning More Robust Features with Adversarial Training
Authors	Shuangtao Li, Yuanke Chen, Yanlin Peng, Lin Bai
Abstract	In recent years, it has been found that neural networks can be easily fooled by adversarial examples, which is a potential safety hazard in some safety-critical applications. Many researchers have proposed various methods to make neural networks more robust to white-box adversarial attacks, but an effective method has not been found so far. In this short paper, we focus on the robustness of the features learned by neural networks. We show that the features learned by neural networks are not robust, and find that the robustness of the learned features is closely related to the resistance against adversarial examples of neural networks. We also find that adversarial training against the fast gradient sign method (FGSM) does not make the learned features very robust, even if it can make the trained networks very resistant to FGSM attack. Then we propose a method, which can be seen as an extension of adversarial training, to train neural networks to learn more robust features. We perform experiments on MNIST and CIFAR-10 to evaluate our method, and the experiment results show that this method greatly improves the robustness of the learned features and the resistance to adversarial attacks.
Tasks
Published	2018-04-20
URL	http://arxiv.org/abs/1804.07757v1
PDF	http://arxiv.org/pdf/1804.07757v1.pdf
PWC	https://paperswithcode.com/paper/learning-more-robust-features-with
Repo
Framework
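
The fast gradient sign method (FGSM) referred to above perturbs an input by a small step `eps` in the direction of the sign of the loss gradient with respect to that input. The sketch below applies it to a logistic-regression model, chosen only because its input gradient has a closed form; the weights, the input, and `eps` are made-up values, and the paper's own extension of adversarial training is not reproduced here.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: move each feature by eps along the gradient's sign."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


w, b = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1

x_adv = fgsm(w, b, x, y, eps=0.5)      # [0.5, 1.5]
clean_conf = predict(w, b, x)          # about 0.73 for the true class
adv_conf = predict(w, b, x_adv)        # about 0.38 after the perturbation
```

Adversarial training against FGSM, as discussed in the abstract, would feed examples like `x_adv` back into training with their original labels.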