Paper Group ANR 801
StackNet: Stacking Parameters for Continual learning. An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams. From biological vision to unsupervised hierarchical sparse coding. On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks. Pose Estimation for Non-Cooperative Sp …
StackNet: Stacking Parameters for Continual learning
Title | StackNet: Stacking Parameters for Continual learning |
Authors | Jangho Kim, Jeesoo Kim, Nojun Kwak |
Abstract | Training a neural network for a classification task typically assumes that the training data are given from the beginning. However, in the real world, additional data accumulate gradually and the model requires additional training without accessing the old training data. This usually leads to the catastrophic forgetting problem, which is inevitable for the traditional training methodology of neural networks. In this paper, we propose a continual learning method that is able to learn additional tasks while retaining the performance of previously learned tasks by stacking parameters. Composed of two complementary components, the index module and the StackNet, our method estimates the index of the corresponding task for an input sample with the index module and utilizes a particular portion of StackNet with this index. The StackNet guarantees no degradation in the performance of the previously learned tasks and the index module shows high confidence in finding the origin of an input sample. |
Tasks | Continual Learning |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02441v2 |
http://arxiv.org/pdf/1809.02441v2.pdf | |
PWC | https://paperswithcode.com/paper/hc-net-memory-based-incremental-dual-network |
Repo | |
Framework | |
An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams
Title | An Incremental Construction of Deep Neuro Fuzzy System for Continual Learning of Non-stationary Data Streams |
Authors | Mahardhika Pratama, Witold Pedrycz, Geoffrey I. Webb |
Abstract | Existing fuzzy neural networks (FNNs) are mostly developed under a shallow network configuration, which has lower generalization power than deep structures. This paper proposes a novel self-organizing deep FNN, namely DEVFNN. Fuzzy rules can be automatically extracted from data streams or removed if they play a limited role during their lifespan. The structure of the network can be deepened on demand by stacking additional layers using a drift detection method which not only detects the covariate drift, variations of input space, but also accurately identifies the real drift, dynamic changes of both feature space and target space. DEVFNN is developed under the stacked generalization principle via the feature augmentation concept where a recently developed algorithm, namely gClass, drives the hidden layer. It is equipped with an automatic feature selection method which controls activation and deactivation of input attributes to induce varying subsets of input features. A deep network simplification procedure is put forward using the concept of hidden layer merging to prevent uncontrollable growth of the dimensionality of the input space due to the nature of the feature augmentation approach in building a deep network structure. DEVFNN works in a sample-wise fashion and is compatible with data stream applications. The efficacy of DEVFNN has been thoroughly evaluated using seven datasets with non-stationary properties under the prequential test-then-train protocol. It has been compared with four popular continual learning algorithms and its shallow counterpart, where DEVFNN demonstrates improved classification accuracy. Moreover, it is also shown that the concept drift detection method is an effective tool to control the depth of the network structure, while the hidden layer merging scenario is capable of simplifying the complexity of a deep network with negligible compromise of generalization performance. |
Tasks | Continual Learning, Feature Selection |
Published | 2018-08-26 |
URL | https://arxiv.org/abs/1808.08517v2 |
https://arxiv.org/pdf/1808.08517v2.pdf | |
PWC | https://paperswithcode.com/paper/an-incremental-construction-of-deep-neuro |
Repo | |
Framework | |
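The prequential test-then-train protocol named in the abstract above can be sketched as follows. This is only a minimal illustration of the evaluation loop: the `MajorityClassifier` learner and the toy stream are hypothetical stand-ins, not DEVFNN or the paper's datasets.

```python
from collections import Counter


class MajorityClassifier:
    """Toy incremental learner (hypothetical): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # Nothing has been learned yet, so no prediction is possible.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(model, stream):
    """Test-then-train: every sample is first used for testing, then for training."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:  # test on the sample before the model has seen it
            correct += 1
        total += 1
        model.learn_one(x, y)      # only afterwards update the model with it
    return correct / total if total else 0.0


# Toy labelled stream of (features, label) pairs, made up for illustration.
stream = [(0, "a"), (1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "a")]
accuracy = prequential_accuracy(MajorityClassifier(), stream)
print(accuracy)
```

Because each sample is scored before it is learned, the reported accuracy reflects performance on unseen data at every point in the stream, which is why the protocol suits non-stationary settings.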
From biological vision to unsupervised hierarchical sparse coding
Title | From biological vision to unsupervised hierarchical sparse coding |
Authors | Victor Boutin, Angelo Franciosini, Franck Ruffier, Laurent U. Perrinet |
Abstract | The formation of connections between neural cells is emerging essentially from an unsupervised learning process. For instance, during the development of the primary visual cortex of mammals (V1), we observe the emergence of cells selective to localized and oriented features. This leads to the development of a rough contour-based representation of the retinal image in area V1. We propose a biological model of the formation of this representation along the thalamo-cortical pathway. To achieve this goal, we replicated the Multi-Layer Convolutional Sparse Coding (ML-CSC) algorithm developed by Michael Elad’s group. |
Tasks | |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01335v1 |
http://arxiv.org/pdf/1812.01335v1.pdf | |
PWC | https://paperswithcode.com/paper/from-biological-vision-to-unsupervised |
Repo | |
Framework | |
On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks
Title | On Catastrophic Forgetting and Mode Collapse in Generative Adversarial Networks |
Authors | Hoang Thanh-Tung, Truyen Tran |
Abstract | In this paper, we show that Generative Adversarial Networks (GANs) suffer from catastrophic forgetting even when they are trained to approximate a single target distribution. We show that GAN training is a continual learning problem in which the sequence of changing model distributions is the sequence of tasks to the discriminator. The level of mismatch between tasks in the sequence determines the level of forgetting. Catastrophic forgetting is interrelated to mode collapse and can make the training of GANs non-convergent. We investigate the landscape of the discriminator’s output in different variants of GANs and find that when a GAN converges to a good equilibrium, real training datapoints are wide local maxima of the discriminator. We empirically show the relationship between the sharpness of local maxima and mode collapse and generalization in GANs. We show how catastrophic forgetting prevents the discriminator from making real datapoints local maxima, and thus causes non-convergence. Finally, we study methods for preventing catastrophic forgetting in GANs. |
Tasks | Continual Learning |
Published | 2018-07-11 |
URL | https://arxiv.org/abs/1807.04015v8 |
https://arxiv.org/pdf/1807.04015v8.pdf | |
PWC | https://paperswithcode.com/paper/on-catastrophic-forgetting-and-mode-collapse |
Repo | |
Framework | |
Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks
Title | Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks |
Authors | Sumant Sharma, Connor Beierle, Simone D’Amico |
Abstract | On-board estimation of the pose of an uncooperative target spacecraft is an essential task for future on-orbit servicing and close-proximity formation flying missions. However, two issues hinder reliable on-board monocular vision based pose estimation: robustness to illumination conditions due to a lack of reliable visual features and scarcity of image datasets required for training and benchmarking. To address these two issues, this work details the design and validation of a monocular vision based pose determination architecture for spaceborne applications. The primary contribution to the state-of-the-art of this work is the introduction of a novel pose determination method based on Convolutional Neural Networks (CNN) to provide an initial guess of the pose in real-time on-board. The method involves discretizing the pose space and training the CNN with images corresponding to the resulting pose labels. Since reliable training of the CNN requires massive image datasets and computational resources, the parameters of the CNN must be determined prior to the mission with synthetic imagery. Moreover, reliable training of the CNN requires datasets that appropriately account for noise, color, and illumination characteristics expected in orbit. Therefore, the secondary contribution of this work is the introduction of an image synthesis pipeline, which is tailored to generate high fidelity images of any spacecraft 3D model. The proposed technique is scalable to spacecraft of different structural and physical properties as well as robust to the dynamic illumination conditions of space. Through metrics measuring classification and pose accuracy, it is shown that the presented architecture has desirable robustness and scalable properties. |
Tasks | Image Generation, Pose Estimation |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07238v1 |
http://arxiv.org/pdf/1809.07238v1.pdf | |
PWC | https://paperswithcode.com/paper/pose-estimation-for-non-cooperative |
Repo | |
Framework | |
Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering
Title | Branching embedding: A heuristic dimensionality reduction algorithm based on hierarchical clustering |
Authors | Makito Oku |
Abstract | This paper proposes a new dimensionality reduction algorithm named branching embedding (BE). It converts a dendrogram to a two-dimensional scatter plot, and visualizes the inherent structures of the original high-dimensional data. Since the conversion part is not computationally demanding, the BE algorithm would be beneficial for the case where hierarchical clustering is already performed. Numerical experiments revealed that the outputs of the algorithm moderately preserve the original hierarchical structures. |
Tasks | Dimensionality Reduction |
Published | 2018-05-06 |
URL | http://arxiv.org/abs/1805.02161v1 |
http://arxiv.org/pdf/1805.02161v1.pdf | |
PWC | https://paperswithcode.com/paper/branching-embedding-a-heuristic |
Repo | |
Framework | |
Towards Continuous Domain adaptation for Healthcare
Title | Towards Continuous Domain adaptation for Healthcare |
Authors | Rahul Venkataramani, Hariharan Ravishankar, Saihareesh Anamandra |
Abstract | Deep learning algorithms have demonstrated tremendous success on challenging medical imaging problems. However, post-deployment, these algorithms are susceptible to data distribution variations owing to limited data issues and diversity in medical images. In this paper, we propose ContextNets, a generic memory-augmented neural network framework for semantic segmentation to achieve continuous domain adaptation without the necessity of retraining. Unlike existing methods which require access to entire source and target domain images, our algorithm can adapt to a target domain with a few similar images. We condition the inference on any new input with features computed on its support set of images (and masks, if available) through contextual embeddings to achieve site-specific adaptation. We demonstrate state-of-the-art domain adaptation performance on the X-ray lung segmentation problem from three independent cohorts that differ in disease type, gender, contrast and intensity variations. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01281v1 |
http://arxiv.org/pdf/1812.01281v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-continuous-domain-adaptation-for |
Repo | |
Framework | |
Entity Commonsense Representation for Neural Abstractive Summarization
Title | Entity Commonsense Representation for Neural Abstractive Summarization |
Authors | Reinald Kim Amplayo, Seonjae Lim, Seung-won Hwang |
Abstract | A major proportion of a text summary includes important entities found in the original text. These entities build up the topic of the summary. Moreover, they hold commonsense information once they are linked to a knowledge base. Based on these observations, this paper investigates the usage of linked entities to guide the decoder of a neural text summarizer to generate concise and better summaries. To this end, we leverage an off-the-shelf entity linking system (ELS) to extract linked entities and propose Entity2Topic (E2T), a module easily attachable to a sequence-to-sequence model that transforms a list of entities into a vector representation of the topic of the summary. Currently available ELSs are still not sufficiently effective, possibly introducing unresolved ambiguities and irrelevant entities. We resolve the imperfections of the ELS by (a) encoding entities with selective disambiguation, and (b) pooling entity vectors using firm attention. By applying E2T to a simple sequence-to-sequence model with an attention mechanism as the base model, we see significant performance improvements on the Gigaword (sentence to title) and CNN (long document to multi-sentence highlights) summarization datasets, by at least 2 ROUGE points. |
Tasks | Abstractive Text Summarization, Entity Linking |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05504v1 |
http://arxiv.org/pdf/1806.05504v1.pdf | |
PWC | https://paperswithcode.com/paper/entity-commonsense-representation-for-neural |
Repo | |
Framework | |
The Social Cost of Strategic Classification
Title | The Social Cost of Strategic Classification |
Authors | Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt |
Abstract | Consequential decision-making typically incentivizes individuals to behave strategically, tailoring their behavior to the specifics of the decision rule. A long line of work has therefore sought to counteract strategic behavior by designing more conservative decision boundaries in an effort to increase robustness to the effects of strategic covariate shift. We show that these efforts benefit the institutional decision maker at the expense of the individuals being classified. Introducing a notion of social burden, we prove that any increase in institutional utility necessarily leads to a corresponding increase in social burden. Moreover, we show that the negative externalities of strategic classification can disproportionately harm disadvantaged groups in the population. Our results highlight that strategy-robustness must be weighed against considerations of social welfare and fairness. |
Tasks | Decision Making |
Published | 2018-08-25 |
URL | http://arxiv.org/abs/1808.08460v2 |
http://arxiv.org/pdf/1808.08460v2.pdf | |
PWC | https://paperswithcode.com/paper/the-social-cost-of-strategic-classification |
Repo | |
Framework | |
Consistency and Variation in Kernel Neural Ranking Model
Title | Consistency and Variation in Kernel Neural Ranking Model |
Authors | Mary Arpita Pyreddy, Varshini Ramaseshan, Narendra Nath Joshi, Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu |
Abstract | This paper studies the consistency of the kernel-based neural ranking model K-NRM, a recent state-of-the-art neural IR model, which is important for reproducible research and deployment in the industry. We find that K-NRM has low variance on relevance-based metrics across experimental trials. In spite of this low variance in overall performance, different trials produce different document rankings for individual queries. The main source of variance in our experiments was found to be different latent matching patterns captured by K-NRM. In the IR-customized word embeddings learned by K-NRM, the query-document word pairs follow two different matching patterns that are equally effective, but align word pairs differently in the embedding space. The different latent matching patterns enable a simple yet effective approach to construct ensemble rankers, which improve K-NRM’s effectiveness and generalization abilities. |
Tasks | Word Embeddings |
Published | 2018-09-27 |
URL | http://arxiv.org/abs/1809.10522v1 |
http://arxiv.org/pdf/1809.10522v1.pdf | |
PWC | https://paperswithcode.com/paper/consistency-and-variation-in-kernel-neural |
Repo | |
Framework | |
Meta Continual Learning
Title | Meta Continual Learning |
Authors | Risto Vuorio, Dong-Yeon Cho, Daejoong Kim, Jiwon Kim |
Abstract | Using neural networks in practical settings would benefit from the ability of the networks to learn new tasks throughout their lifetimes without forgetting the previous tasks. This ability is limited in current deep neural networks by a problem called catastrophic forgetting, where training on new tasks tends to severely degrade performance on previous tasks. One way to lessen the impact of the forgetting problem is to constrain parameters that are important to previous tasks to stay close to the optimal parameters. Recently, multiple competitive approaches for computing the importance of the parameters with respect to the previous tasks have been presented. In this paper, we propose a learning-to-optimize algorithm for mitigating catastrophic forgetting. Instead of trying to formulate a new constraint function ourselves, we propose to train another neural network to predict parameter update steps that respect the importance of parameters to the previous tasks. In the proposed meta-training scheme, the update predictor is trained to minimize loss on a combination of current and past tasks. We show experimentally that the proposed approach works in the continual learning setting. |
Tasks | Continual Learning |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.06928v1 |
http://arxiv.org/pdf/1806.06928v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-continual-learning |
Repo | |
Framework | |
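The constraint-based baseline that the abstract above describes (keeping parameters that matter for previous tasks close to their earlier optimum) is commonly written as an importance-weighted quadratic penalty. The sketch below assumes that form. It is not the paper's learned update predictor, and the function names, the `strength` coefficient, and all numbers are hypothetical.

```python
def importance_penalty(params, anchor_params, importance, strength=1.0):
    """Quadratic penalty: strength / 2 * sum_i importance_i * (params_i - anchor_i)^2."""
    return 0.5 * strength * sum(
        w * (p - a) ** 2 for p, a, w in zip(params, anchor_params, importance)
    )


def penalty_gradient(params, anchor_params, importance, strength=1.0):
    """Gradient of the penalty w.r.t. params; it pulls important parameters back to the anchor."""
    return [strength * w * (p - a) for p, a, w in zip(params, anchor_params, importance)]


# Hypothetical values: parameters after the old task, their importance, and the current parameters.
anchor = [1.0, -2.0, 0.5]
importance = [10.0, 0.0, 1.0]
current = [1.5, 3.0, 0.5]

penalty = importance_penalty(current, anchor, importance)
grad = penalty_gradient(current, anchor, importance)
print(penalty, grad)
```

In this toy case the second parameter has zero importance, so it can move freely for the new task, while the first one is penalized strongly for a small drift.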
Deep Learning with Apache SystemML
Title | Deep Learning with Apache SystemML |
Authors | Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen |
Abstract | Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters to (3) perform many different analytics tasks ranging from model preparation, building, evaluation, and tuning for both machine learning and deep learning. Developing machine/deep learning models on data in such shared environments is challenging. Apache SystemML provides a unified framework for implementing machine learning and deep learning algorithms in a variety of shared deployment scenarios. SystemML’s novel compilation approach automatically generates runtime execution plans for machine/deep learning algorithms that are composed of single-node and distributed runtime operations depending on data and cluster characteristics such as data size, data sparsity, cluster size, and memory configurations, while still exploiting the capabilities of the underlying big data frameworks. |
Tasks | |
Published | 2018-02-08 |
URL | http://arxiv.org/abs/1802.04647v1 |
http://arxiv.org/pdf/1802.04647v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-apache-systemml |
Repo | |
Framework | |
Malware triage for early identification of Advanced Persistent Threat activities
Title | Malware triage for early identification of Advanced Persistent Threat activities |
Authors | Giuseppe Laurenza, Riccardo Lazzeretti, Luca Mazzotti |
Abstract | In the last decade, a new class of cyber-threats has emerged. This new cybersecurity adversary is known as the “Advanced Persistent Threat” (APT); the term refers to different organizations that in recent years have been “in the center of the eye” due to multiple dangerous and effective attacks on financial and political targets, news headlines, embassies, critical infrastructures, TV programs, etc. To identify APT-related malware early, a semi-automatic approach to malware sample analysis is needed. In our previous work we introduced a “malware triage” step for a semi-automatic malware analysis architecture. This step must analyze new incoming samples as fast as possible and immediately dispatch, from all the malware delivered per day in cyberspace, those that deserve a deeper analysis and are really worth further examination by analysts. Our paper focuses on malware developed by APTs, and we build our knowledge base, used in the triage, on known APTs obtained from publicly available reports. To keep the triage as fast as possible, we rely only on static malware features, which can be extracted with negligible delay, and use machine learning techniques for the identification. In this work we move from multiclass classification to a group of one-class classifiers, which simplifies training and allows higher modularity. The results of the proposed framework show high performance, reaching a precision of 100% and an accuracy over 95%. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.07321v1 |
http://arxiv.org/pdf/1810.07321v1.pdf | |
PWC | https://paperswithcode.com/paper/malware-triage-for-early-identification-of |
Repo | |
Framework | |
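The move from one multiclass classifier to a group of one-class classifiers, mentioned in the abstract above, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the centroid-and-radius model, the two-dimensional feature vectors, and the group labels are hypothetical and are not the paper's features or classifiers.

```python
import math


class CentroidOneClass:
    """Toy one-class model (hypothetical): accepts points within a radius of the training centroid."""

    def fit(self, samples):
        dim = len(samples[0])
        self.centroid = [sum(s[i] for s in samples) / len(samples) for i in range(dim)]
        # The radius covers every training sample of this single class.
        self.radius = max(math.dist(s, self.centroid) for s in samples)
        return self

    def distance(self, x):
        return math.dist(x, self.centroid)

    def accepts(self, x):
        return self.distance(x) <= self.radius


def triage(x, models):
    """Return the label of the closest accepting model, or None if no model accepts x."""
    accepted = [(m.distance(x), label) for label, m in models.items() if m.accepts(x)]
    return min(accepted)[1] if accepted else None


# Made-up feature vectors for two known groups; each model is trained on its own class only.
training = {
    "group_a": [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]],
    "group_b": [[10.0, 10.0], [10.0, 12.0], [12.0, 10.0], [12.0, 12.0]],
}
models = {label: CentroidOneClass().fit(samples) for label, samples in training.items()}

print(triage([1.0, 1.5], models), triage([6.0, 6.0], models))
```

The modularity shows up in the last step: supporting a new group only requires fitting one more model and adding it to `models`, without retraining the existing ones, and a sample rejected by every model is simply not attributed to any known group.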
Stereoscopic Neural Style Transfer
Title | Stereoscopic Neural Style Transfer |
Authors | Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, Gang Hua |
Abstract | This paper presents the first attempt at stereoscopic neural style transfer, which responds to the emerging demand for 3D movies or AR/VR. We start with a careful examination of applying existing monocular style transfer methods to left and right views of stereoscopic images separately. This reveals that the original disparity consistency cannot be well preserved in the final stylization results, which causes 3D fatigue to the viewers. To address this issue, we incorporate a new disparity loss into the widely adopted style loss function by enforcing the bidirectional disparity constraint in non-occluded regions. For a practical real-time solution, we propose the first feed-forward network by jointly training a stylization sub-network and a disparity sub-network, and integrate them in a feature level middle domain. Our disparity sub-network is also the first end-to-end network for simultaneous bidirectional disparity and occlusion mask estimation. Finally, our network is effectively extended to stereoscopic videos, by considering both temporal coherence and disparity consistency. We will show that the proposed method clearly outperforms the baseline algorithms both quantitatively and qualitatively. |
Tasks | Style Transfer |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10591v2 |
http://arxiv.org/pdf/1802.10591v2.pdf | |
PWC | https://paperswithcode.com/paper/stereoscopic-neural-style-transfer |
Repo | |
Framework | |
Learning More Robust Features with Adversarial Training
Title | Learning More Robust Features with Adversarial Training |
Authors | Shuangtao Li, Yuanke Chen, Yanlin Peng, Lin Bai |
Abstract | In recent years, it has been found that neural networks can be easily fooled by adversarial examples, which is a potential safety hazard in some safety-critical applications. Many researchers have proposed various methods to make neural networks more robust to white-box adversarial attacks, but an effective method has not been found so far. In this short paper, we focus on the robustness of the features learned by neural networks. We show that the features learned by neural networks are not robust, and find that the robustness of the learned features is closely related to the resistance of neural networks against adversarial examples. We also find that adversarial training against the fast gradient sign method (FGSM) does not make the learned features very robust, even if it can make the trained networks very resistant to FGSM attacks. Then we propose a method, which can be seen as an extension of adversarial training, to train neural networks to learn more robust features. We perform experiments on MNIST and CIFAR-10 to evaluate our method, and the experimental results show that this method greatly improves the robustness of the learned features and the resistance to adversarial attacks. |
Tasks | |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07757v1 |
http://arxiv.org/pdf/1804.07757v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-more-robust-features-with |
Repo | |
Framework | |
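For readers unfamiliar with FGSM, which the abstract above uses as its attack, the perturbation is x_adv = x + epsilon * sign(grad_x loss). The sketch below applies it to a tiny logistic regression model so that the input gradient can be written by hand. The model, weights, and epsilon are hypothetical and unrelated to the paper's networks or to its proposed training method.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def input_gradient(x, y, weights, bias):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict_proba(x, weights, bias)
    return [(p - y) * w for w in weights]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, weights, bias, epsilon):
    """FGSM step: move each input coordinate by epsilon in the direction that increases the loss."""
    grad = input_gradient(x, y, weights, bias)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and a sample it classifies correctly as class 1.
weights, bias = [2.0, -1.0], 0.0
x, y = [1.0, 1.0], 1

x_adv = fgsm(x, y, weights, bias, epsilon=0.5)
print(predict_proba(x, weights, bias), predict_proba(x_adv, weights, bias))
```

In this toy case the clean input is scored above 0.5 for the true class and the perturbed input falls below it, which is the kind of failure that adversarial training tries to prevent.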