February 1, 2020

3317 words 16 mins read

Paper Group AWR 278



A JIT Compiler for Neural Network Inference

Title A JIT Compiler for Neural Network Inference
Authors Felix Thielke, Arne Hasselbring
Abstract This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. In general, this approach promises the best possible performance, since statically known properties of the network can be integrated directly into the generated code. In our experiments on the NAO V6 platform, it significantly outperforms existing implementations on small networks, while being inferior on large networks. The library was already part of the B-Human code release 2018, but has since been extended and is now available as a standalone version that can be integrated into any C++14 code base. An illustrative sketch of the specialization idea follows this entry.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.05737v1
PDF https://arxiv.org/pdf/1906.05737v1.pdf
PWC https://paperswithcode.com/paper/a-jit-compiler-for-neural-network-inference
Repo https://github.com/bhuman/CompiledNN
Framework none
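
The sketch below illustrates the general specialization idea in Python only: because the layer sizes and weights are known before inference, code for the layer can be generated and compiled with everything baked in (here, the activation is fused into the generated statements). This is a conceptual analogue, not how CompiledNN works internally; the library emits native machine code from C++ and its API is not shown here.

```python
# Conceptual sketch of ahead-of-inference specialization (not CompiledNN's
# actual mechanism): generate and compile a function whose statically known
# sizes and weights are baked directly into the code, with a fused ReLU.
import numpy as np

def specialize_dense(W, b):
    n_in, n_out = W.shape
    lines = ["def dense(x):", f"    out = [0.0] * {n_out}"]
    for j in range(n_out):
        terms = " + ".join(f"x[{i}] * {float(W[i, j])!r}" for i in range(n_in))
        lines.append(f"    out[{j}] = max(0.0, {terms} + {float(b[j])!r})  # fused ReLU")
    lines.append("    return out")
    namespace = {}
    exec(compile("\n".join(lines), "<specialized>", "exec"), namespace)
    return namespace["dense"]

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=3)
layer = specialize_dense(W, b)   # "compiled" once, then called many times
print(layer([1.0, 2.0, 3.0, 4.0]))
```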

Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches

Title Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Authors Shane Storks, Qiaozi Gao, Joyce Y. Chai
Abstract In the NLP community, recent years have seen a surge of research activity addressing machines’ ability to perform deep language understanding, which goes beyond what is explicitly stated in text and instead relies on reasoning and knowledge of the world. Many benchmark tasks and datasets have been created to support the development and evaluation of such natural language inference ability. As these benchmarks become instrumental and a driving force for the NLP research community, this paper aims to provide an overview of recent benchmarks, relevant knowledge resources, and state-of-the-art learning and inference approaches in order to support a better understanding of this growing field.
Tasks Natural Language Inference
Published 2019-04-02
URL https://arxiv.org/abs/1904.01172v3
PDF https://arxiv.org/pdf/1904.01172v3.pdf
PWC https://paperswithcode.com/paper/commonsense-reasoning-for-natural-language
Repo https://github.com/shengyp/Temporal-and-Evolving-KG
Framework none

Temporal Collaborative Ranking Via Personalized Transformer

Title Temporal Collaborative Ranking Via Personalized Transformer
Authors Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack
Abstract The collaborative ranking problem has been an important open research question, as most recommendation problems can be naturally formulated as ranking problems. While much of collaborative ranking methodology assumes static ranking data, the importance of temporal information for improving ranking performance is increasingly apparent. Recent advances in deep learning, especially the discovery of various attention mechanisms and newer architectures in addition to the widely used RNNs and CNNs in natural language processing, have allowed us to make better use of the temporal ordering of items that each user has engaged with. In particular, the SASRec model, inspired by the popular Transformer model in natural language processing, has achieved state-of-the-art results on the temporal collaborative ranking problem and enjoys a more than 10x speed-up compared to earlier CNN/RNN-based methods. However, SASRec is inherently an unpersonalized model and does not include personalized user embeddings. To overcome this limitation, we propose a Personalized Transformer (SSE-PT) model, outperforming SASRec by almost 5% in terms of NDCG@10 on 5 real-world datasets. Furthermore, after examining some random users’ engagement histories and the corresponding attention heat maps used during the inference stage, we find our model is not only more interpretable but also able to focus on recent engagement patterns for each user. Moreover, our SSE-PT model with a slight modification, which we call SSE-PT++, can handle extremely long sequences and outperform SASRec in ranking results with comparable training speed, striking a balance between performance and speed requirements. Code and data are open sourced at https://github.com/wuliwei9278/SSE-PT. An illustrative sketch of the personalized input construction follows this entry.
Tasks Collaborative Ranking
Published 2019-08-15
URL https://arxiv.org/abs/1908.05435v1
PDF https://arxiv.org/pdf/1908.05435v1.pdf
PWC https://paperswithcode.com/paper/temporal-collaborative-ranking-via
Repo https://github.com/wuliwei9278/SSE-PT
Framework tf
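
A hedged sketch of the personalization idea as described in the abstract: each sequence position is the concatenation of an item embedding and the user's embedding, and Stochastic Shared Embeddings (SSE) regularization occasionally replaces embedding indices with random ones during training. The module name, dimensions, and SSE probability below are illustrative assumptions, not the authors' TensorFlow implementation in the linked repo.

```python
# Sketch of SSE-PT input construction with SSE regularization (assumptions
# noted in the lead-in); the output would be fed to a Transformer encoder.
import torch
import torch.nn as nn

class SSEPTInput(nn.Module):
    def __init__(self, n_users, n_items, d_item=50, d_user=50, sse_prob=0.01):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, d_user)
        self.item_emb = nn.Embedding(n_items, d_item)
        self.sse_prob = sse_prob
        self.n_users, self.n_items = n_users, n_items

    def forward(self, user_ids, item_seq):
        if self.training:
            # SSE: with small probability, replace an index with a random one.
            swap_i = torch.rand_like(item_seq, dtype=torch.float) < self.sse_prob
            item_seq = torch.where(swap_i, torch.randint_like(item_seq, self.n_items), item_seq)
            swap_u = torch.rand_like(user_ids, dtype=torch.float) < self.sse_prob
            user_ids = torch.where(swap_u, torch.randint_like(user_ids, self.n_users), user_ids)
        items = self.item_emb(item_seq)                    # (B, L, d_item)
        users = self.user_emb(user_ids).unsqueeze(1)       # (B, 1, d_user)
        users = users.expand(-1, items.size(1), -1)        # (B, L, d_user)
        return torch.cat([items, users], dim=-1)           # personalized sequence input

x = SSEPTInput(n_users=100, n_items=1000)(torch.tensor([3, 7]), torch.randint(0, 1000, (2, 20)))
print(x.shape)  # torch.Size([2, 20, 100])
```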

QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization

Title QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization
Authors Yi-Ting Yeh, Yun-Nung Chen
Abstract Standard accuracy metrics indicate that modern reading comprehension systems have achieved strong performance on many question answering datasets. However, the extent to which these systems truly understand language remains unknown, and existing systems are not good at distinguishing distractor sentences, which look related but do not actually answer the question. To address this problem, we propose QAInfomax as a regularizer for reading comprehension systems that maximizes mutual information among passages, a question, and its answer. QAInfomax regularizes the model so that it does not simply learn superficial correlations for answering questions. The experiments show that our proposed QAInfomax achieves state-of-the-art performance on the benchmark Adversarial-SQuAD dataset. An illustrative sketch of a mutual-information regularizer follows this entry.
Tasks Accuracy Metrics, Question Answering, Reading Comprehension
Published 2019-08-31
URL https://arxiv.org/abs/1909.00215v1
PDF https://arxiv.org/pdf/1909.00215v1.pdf
PWC https://paperswithcode.com/paper/qainfomax-learning-robust-question-answering
Repo https://github.com/MiuLab/QAInfomax
Framework pytorch
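
A minimal sketch of a mutual-information-maximization regularizer in the spirit described above, using a Jensen-Shannon-style bound with a bilinear discriminator and in-batch shuffled negatives. The architecture and pairing scheme are assumptions for illustration, not the paper's exact objective.

```python
# Illustrative MI-maximization regularizer: a discriminator scores
# (summary, local) pairs and a JSD-style lower bound on MI is maximized;
# the negation is returned so it can be added to the main QA loss.
import torch.nn as nn
import torch.nn.functional as F

class MIRegularizer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)  # discriminator T(summary, local)

    def forward(self, local_feats, summary):
        # local_feats: (B, L, D) token representations; summary: (B, D) answer/span summary.
        B, L, D = local_feats.shape
        expanded = summary.unsqueeze(1).expand(-1, L, -1).reshape(-1, D)
        pos = self.score(expanded, local_feats.reshape(-1, D))
        # Negatives: pair each summary with tokens from a different example in the batch.
        neg_feats = local_feats.roll(shifts=1, dims=0)
        neg = self.score(expanded, neg_feats.reshape(-1, D))
        mi = (-F.softplus(-pos)).mean() - F.softplus(neg).mean()
        return -mi  # minimize this to maximize the MI bound
```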

Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation

Title Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation
Authors Shaokai Ye, Kailu Wu, Mu Zhou, Yunfei Yang, Sia huat Tan, Kaidi Xu, Jiebo Song, Chenglong Bao, Kaisheng Ma
Abstract Existing domain adaptation methods aim at learning features that can be generalized across domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between the source and target domains. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called a data calibrator to help the fixed source classifier recover discrimination power in the target domain, while preserving the source domain’s performance. When the difference between the two domains is small, the source classifier’s representation is sufficient to perform well in the target domain, and our method outperforms GAN-based methods on digit datasets. Otherwise, the proposed method can leverage synthetic images generated by GANs to boost performance and achieves state-of-the-art performance on digit datasets and driving-scene semantic segmentation. Our method empirically reveals that certain intriguing hints, which can be mitigated by adversarial attacks on domain discriminators, are one of the sources of performance degradation under domain shift. An illustrative sketch of the calibrator idea follows this entry.
Tasks Adversarial Attack, Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2019-11-28
URL https://arxiv.org/abs/1911.12796v2
PDF https://arxiv.org/pdf/1911.12796v2.pdf
PWC https://paperswithcode.com/paper/light-weight-calibrator-a-separable-component
Repo https://github.com/yeshaokai/Calibrator-Domain-Adaptation
Framework none
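
A hedged sketch of the data-calibrator idea: a small network produces a bounded perturbation of target-domain inputs so that a frozen source classifier regains discriminative power. The entropy-minimization objective below is an illustrative stand-in; the paper's actual training procedure (including its use of GAN-generated images and domain discriminators) is not reproduced here.

```python
# Calibrator sketch: the source classifier stays frozen and only the
# calibrator is optimized (the optimizer is assumed to hold calibrator
# parameters only). The loss here is a simple illustrative surrogate.
import torch.nn as nn
import torch.nn.functional as F

class Calibrator(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh())

    def forward(self, x, eps=0.1):
        return (x + eps * self.net(x)).clamp(0, 1)   # small bounded correction

def calibration_step(calibrator, frozen_source_clf, target_batch, optimizer):
    frozen_source_clf.eval()                          # source classifier is never updated
    logits = frozen_source_clf(calibrator(target_batch))
    probs = F.softmax(logits, dim=1)
    loss = -(probs * probs.log()).sum(dim=1).mean()   # prediction entropy on target data
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```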

Gradient based sample selection for online continual learning

Title Gradient based sample selection for online continual learning
Authors Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio
Abstract A continual learning agent learns online from a non-stationary and never-ending stream of data. The key to such a learning process is to overcome the catastrophic forgetting of previously seen data, which is a well-known problem of neural networks. To prevent forgetting, a replay buffer is usually employed to store previous data for the purpose of rehearsal. Previous works often depend on task boundaries and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximates the feasible region defined by the original constraints. We show that this is equivalent to maximizing the diversity of samples in the replay buffer with the parameter gradients as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparison with other alternatives in the continual learning setting. Further comparisons against state-of-the-art methods that rely on task boundaries show comparable or even better results for our method. A simplified sketch of the greedy selection idea follows this entry.
Tasks Continual Learning
Published 2019-03-20
URL https://arxiv.org/abs/1903.08671v5
PDF https://arxiv.org/pdf/1903.08671v5.pdf
PWC https://paperswithcode.com/paper/online-continual-learning-with-no-task
Repo https://github.com/wannabeOG/MAS-PyTorch
Framework pytorch
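
A simplified sketch of the greedy, gradient-diversity view of buffer management (not the paper's exact GSS-Greedy algorithm): a new sample enters the replay buffer only if its parameter gradient is sufficiently dissimilar from gradients already stored, keeping the buffer diverse in gradient space. The threshold-and-evict rule is an assumption for illustration.

```python
# Gradient-diversity buffer sketch: grad is the flattened parameter gradient
# computed on the candidate sample; the buffer keeps samples whose gradients
# point in diverse directions.
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def maybe_add_to_buffer(buffer, grad_buffer, sample, grad, capacity, threshold=0.9):
    if len(buffer) < capacity:
        buffer.append(sample); grad_buffer.append(grad)
        return True
    sims = [cosine(grad, g) for g in grad_buffer]
    if max(sims) < threshold:                 # diverse enough: evict the most redundant entry
        redundant = int(np.argmax(sims))
        buffer[redundant], grad_buffer[redundant] = sample, grad
        return True
    return False
```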

Evolving and Understanding Sparse Deep Neural Networks using Cosine Similarity

Title Evolving and Understanding Sparse Deep Neural Networks using Cosine Similarity
Authors Joost Pieterse, Decebal Constantin Mocanu
Abstract Training sparse neural networks with adaptive connectivity is an active research topic. Such networks require less storage and have lower computational complexity compared to their dense counterparts. The Sparse Evolutionary Training (SET) procedure uses weight magnitudes to efficiently evolve the topology of a sparse network to fit the dataset, while enabling it to have quadratically fewer parameters than its dense counterpart. In this work, we propose a novel approach that evolves a sparse network topology based on the behavior of neurons in the network. More precisely, the cosine similarities between the activations of any two neurons are used to determine which connections are added to or removed from the network. By integrating our approach within the SET procedure, we propose 5 new algorithms to train sparse neural networks. We argue that our approach has low additional computational complexity and we draw a parallel to Hebbian learning. Experiments are performed on 8 datasets taken from various domains to demonstrate the general applicability of our approach. Even without optimizing hyperparameters for specific datasets, the experiments show that our proposed training algorithms usually outperform SET and state-of-the-art dense neural network techniques. Last but not least, we show that the evolved connectivity patterns of the input neurons reflect their impact on the classification task. An illustrative sketch of the cosine-similarity-based rewiring follows this entry.
Tasks
Published 2019-03-17
URL http://arxiv.org/abs/1903.07138v1
PDF http://arxiv.org/pdf/1903.07138v1.pdf
PWC https://paperswithcode.com/paper/evolving-and-understanding-sparse-deep-neural
Repo https://github.com/joostPieterse/CosineSET
Framework tf
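
A hedged sketch of the rewiring idea: cosine similarities between neuron activations, collected over a batch, decide which sparse connections between two layers are removed and which are added. The specific add/remove rule below is one plausible variant; the paper proposes five algorithms whose exact rules differ.

```python
# Cosine-similarity-driven topology evolution sketch for one sparse layer.
import numpy as np

def evolve_topology(mask, acts_in, acts_out, n_changes):
    """mask: (n_in, n_out) binary connectivity; acts_*: (batch, n_*) activations."""
    a_in = acts_in / (np.linalg.norm(acts_in, axis=0, keepdims=True) + 1e-12)
    a_out = acts_out / (np.linalg.norm(acts_out, axis=0, keepdims=True) + 1e-12)
    sim = a_in.T @ a_out                                   # (n_in, n_out) cosine similarities
    # Remove existing connections between the least similar neuron pairs ...
    existing = np.argwhere(mask == 1)
    weakest = existing[np.argsort(sim[mask == 1])[:n_changes]]
    mask[weakest[:, 0], weakest[:, 1]] = 0
    # ... and add connections between the most similar currently unconnected pairs.
    candidates = np.argwhere(mask == 0)
    strongest = candidates[np.argsort(-sim[mask == 0])[:n_changes]]
    mask[strongest[:, 0], strongest[:, 1]] = 1
    return mask
```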

Data-Free Learning of Student Networks

Title Data-Free Learning of Student Networks
Authors Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian
Abstract Learning portable neural networks is essential for computer vision, so that heavy pre-trained deep models can be deployed on edge devices such as mobile phones and micro-sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models when the training dataset can be accessed directly. However, the training data for a given deep network are often unavailable due to practical problems (e.g., privacy, legal issues, and transmission), and the architecture of the given network is often unknown except for some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher network is regarded as a fixed discriminator, and the generator is used to derive training samples that obtain the maximum response from the discriminator. Then, an efficient network with smaller model size and computational complexity is trained simultaneously using the generated data and the teacher network. Efficient student networks learned using the proposed Data-Free Learning (DAFL) method achieve 92.22% and 74.47% accuracies using ResNet-18 without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark. An illustrative sketch of the training signals follows this entry.
Tasks Neural Network Compression
Published 2019-04-02
URL https://arxiv.org/abs/1904.01186v4
PDF https://arxiv.org/pdf/1904.01186v4.pdf
PWC https://paperswithcode.com/paper/data-free-learning-of-student-networks
Repo https://github.com/huawei-noah/DAFL
Framework pytorch
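
A rough sketch of the training signals described above, simplified from the paper: the generator is trained so that the frozen teacher responds confidently, with strong activations and balanced class usage, to generated images, and the student is then distilled on those images. The loss weights and temperature are illustrative assumptions.

```python
# Simplified DAFL-style losses: generator_loss is minimized w.r.t. the
# generator (teacher frozen); distill_loss trains the student on generated data.
import torch.nn.functional as F

def generator_loss(teacher_logits, teacher_features, alpha=0.1, beta=5.0):
    pseudo = teacher_logits.argmax(dim=1)
    l_onehot = F.cross_entropy(teacher_logits, pseudo)        # confident predictions
    l_activation = -teacher_features.abs().mean()             # strong feature response
    p_mean = F.softmax(teacher_logits, dim=1).mean(dim=0)
    l_entropy = (p_mean * p_mean.log()).sum()                 # encourages balanced class usage
    return l_onehot + alpha * l_activation + beta * l_entropy

def distill_loss(student_logits, teacher_logits, T=1.0):
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
```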

Understanding and Improving Layer Normalization

Title Understanding and Improving Layer Normalization
Authors Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin
Abstract Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where its effectiveness stems from. In this paper, our main contribution is to take a step further in understanding LayerNorm. Many previous studies believe that the success of LayerNorm comes from forward normalization. Unlike them, we find that the derivatives of the mean and variance are more important than forward normalization, as they re-center and re-scale backward gradients. Furthermore, we find that the parameters of LayerNorm, including the bias and gain, increase the risk of over-fitting and do not help in most cases. Experiments show that a simple version of LayerNorm (LayerNorm-simple) without the bias and gain outperforms LayerNorm on four datasets. It obtains state-of-the-art performance on En-Vi machine translation. To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), which replaces the bias and gain with a new transformation function. Experiments show that AdaNorm achieves better results than LayerNorm on seven out of eight datasets. An illustrative sketch of both variants follows this entry.
Tasks Machine Translation
Published 2019-11-16
URL https://arxiv.org/abs/1911.07013v1
PDF https://arxiv.org/pdf/1911.07013v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-improving-layer-1
Repo https://github.com/lancopku/AdaNorm
Framework none
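
A short sketch of the two variants discussed above: LayerNorm-simple drops the learnable bias and gain, and AdaNorm rescales the normalized input with an input-dependent factor C(1 - k*y). The constants C and k, and detaching the scaling factor from the gradient, reflect one reading of the paper and should be treated as assumptions.

```python
# LayerNorm-simple and an AdaNorm-style variant (hyperparameters illustrative).
import torch.nn as nn

class LayerNormSimple(nn.Module):
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        mu = x.mean(dim=-1, keepdim=True)
        sigma = x.std(dim=-1, keepdim=True, unbiased=False)
        return (x - mu) / (sigma + self.eps)       # no learnable bias or gain

class AdaNorm(LayerNormSimple):
    def __init__(self, C=1.0, k=0.1, eps=1e-5):
        super().__init__(eps)
        self.C, self.k = C, k

    def forward(self, x):
        y = super().forward(x)
        scale = self.C * (1.0 - self.k * y)        # input-dependent scaling
        return scale.detach() * y                  # scaling treated as a constant in backprop
```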

On Tiny Episodic Memories in Continual Learning

Title On Tiny Episodic Memories in Continual Learning
Authors Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, Marc’Aurelio Ranzato
Abstract In continual learning (CL), an agent learns from a stream of tasks, leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework for decreasing the amount of supervision required by existing learning algorithms. But for successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to endow the learner with the ability to perform tasks seen in the past is to store a small memory, dubbed episodic memory, that holds a few examples from previous tasks, and then to replay these examples when training on future tasks. In this work, we empirically analyze the effectiveness of a very small episodic memory in a CL setup where each training example is only seen once. Surprisingly, across four rather different supervised learning benchmarks adapted to CL, a very simple baseline that jointly trains on examples from the current task as well as examples stored in the episodic memory significantly outperforms specifically designed CL approaches with and without episodic memory. Interestingly, we find that repetitive training on even tiny memories of past tasks does not harm generalization; on the contrary, it improves it, with gains between 7% and 17% when the memory is populated with a single example per class. A minimal sketch of this experience-replay baseline follows this entry.
Tasks Continual Learning, Transfer Learning
Published 2019-02-27
URL https://arxiv.org/abs/1902.10486v4
PDF https://arxiv.org/pdf/1902.10486v4.pdf
PWC https://paperswithcode.com/paper/continual-learning-with-tiny-episodic
Repo https://github.com/facebookresearch/agem
Framework tf
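
A minimal sketch of the experience-replay baseline described above: each update is a joint gradient step on the current batch plus a small batch drawn from the episodic memory. Reservoir sampling is used here to populate the memory as one common choice; treat it as an assumption rather than the paper's prescription.

```python
# Experience-replay baseline sketch: memory is a list of (x, y) tensor pairs.
import random
import torch
import torch.nn.functional as F

def er_step(model, optimizer, batch, memory, mem_batch_size=10):
    x, y = batch
    if memory:
        mx, my = zip(*random.sample(memory, min(mem_batch_size, len(memory))))
        x = torch.cat([x, torch.stack(mx)]); y = torch.cat([y, torch.stack(my)])
    loss = F.cross_entropy(model(x), y)           # joint step on current + replayed examples
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def reservoir_update(memory, example, capacity, n_seen):
    if len(memory) < capacity:
        memory.append(example)
    else:
        j = random.randint(0, n_seen)
        if j < capacity:
            memory[j] = example
```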

Adapting Sequence to Sequence models for Text Normalization in Social Media

Title Adapting Sequence to Sequence models for Text Normalization in Social Media
Authors Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai
Abstract Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot explicitly handle the noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans, who might not be able to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text normalization aims to transform online user-generated text into a canonical form. Current text normalization systems rely on string or phonetic similarity and classification models that work in a local fashion. We argue that processing contextual information is crucial for this task and introduce a hybrid word-character attention-based encoder-decoder model for social media text normalization that can serve as a pre-processing step for NLP applications to adapt to noisy text in social media. Our character-based component is trained on synthetic adversarial examples that are designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves comparable performance with state-of-the-art related work. An illustrative sketch of such synthetic noise generation follows this entry.
Tasks Lexical Normalization
Published 2019-04-12
URL http://arxiv.org/abs/1904.06100v1
PDF http://arxiv.org/pdf/1904.06100v1.pdf
PWC https://paperswithcode.com/paper/adapting-sequence-to-sequence-models-for-text
Repo https://github.com/Isminoula/TextNormSeq2Seq
Framework pytorch
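
An illustrative sketch of generating synthetic noisy/canonical training pairs for the character-level component. The edit operations and rates below are assumptions meant to mimic common social-media errors, not the paper's exact recipe.

```python
# Synthetic noise injection sketch: corrupt clean words to build
# (noisy, canonical) pairs for a character-level normalizer.
import random

def corrupt(word, p=0.3):
    if random.random() > p or len(word) < 3:
        return word
    chars = list(word)
    i = random.randrange(len(chars))
    op = random.choice(["drop_vowel", "repeat", "swap", "drop_last"])
    if op == "drop_vowel":
        chars = [c for k, c in enumerate(chars) if not (k == i and c in "aeiou")]
    elif op == "repeat":
        chars.insert(i, chars[i])                 # e.g. "soo" for "so"
    elif op == "swap" and i + 1 < len(chars):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    elif op == "drop_last":
        chars = chars[:-1]                        # e.g. "goin" for "going"
    return "".join(chars)

pairs = [(corrupt(w), w) for w in ["tomorrow", "going", "really", "about"]]
print(pairs)
```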

Hidden Trigger Backdoor Attacks

Title Hidden Trigger Backdoor Attacks
Authors Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash
Abstract With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks in which the attacker provides poisoned data for the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack in which the poisoned data look natural and carry correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps it secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, although the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks. An illustrative sketch of the poison-crafting objective follows this entry.
Tasks Image Classification
Published 2019-09-30
URL https://arxiv.org/abs/1910.00033v2
PDF https://arxiv.org/pdf/1910.00033v2.pdf
PWC https://paperswithcode.com/paper/hidden-trigger-backdoor-attacks
Repo https://github.com/UMBCvision/Hidden-Trigger-Backdoor-Attacks
Framework pytorch
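
A hedged sketch of the poison-crafting objective: the poison must stay visually close to a correctly labeled target-class image while matching, in the network's feature space, a source-class image with the trigger patch applied. The optimizer, epsilon, and feature layer below are illustrative choices.

```python
# Hidden-trigger poison crafting sketch: optimize the poison in feature space
# while keeping it within an epsilon ball of the clean target image.
import torch

def craft_poison(feature_extractor, target_img, patched_source_img,
                 steps=100, lr=0.01, eps=16 / 255):
    poison = target_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([poison], lr=lr)
    with torch.no_grad():
        source_feat = feature_extractor(patched_source_img)
    for _ in range(steps):
        loss = (feature_extractor(poison) - source_feat).pow(2).sum()  # feature matching
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                       # keep the poison visually benign
            poison.clamp_(target_img - eps, target_img + eps)
            poison.clamp_(0, 1)
    return poison.detach()
```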

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

Title Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds
Authors Bo Yang, Jianan Wang, Ronald Clark, Qingyong Hu, Sen Wang, Andrew Markham, Niki Trigoni
Abstract We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds. Our method, called 3D-BoNet, follows the simple design philosophy of per-point multilayer perceptrons (MLPs). The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance. It consists of a backbone network followed by two parallel network branches for 1) bounding box regression and 2) point mask prediction. 3D-BoNet is single-stage, anchor-free and end-to-end trainable. Moreover, it is remarkably computationally efficient as, unlike existing approaches, it does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting. Extensive experiments show that our approach surpasses existing work on both ScanNet and S3DIS datasets while being approximately 10x more computationally efficient. Comprehensive ablation studies demonstrate the effectiveness of our design.
Tasks 3D Instance Segmentation, Instance Segmentation, Semantic Segmentation
Published 2019-06-04
URL https://arxiv.org/abs/1906.01140v2
PDF https://arxiv.org/pdf/1906.01140v2.pdf
PWC https://paperswithcode.com/paper/learning-object-bounding-boxes-for-3d
Repo https://github.com/QingyongHu/Benchmark_results_3D_point_cloud
Framework none

DARTS: DenseUnet-based Automatic Rapid Tool for brain Segmentation

Title DARTS: DenseUnet-based Automatic Rapid Tool for brain Segmentation
Authors Aakash Kaku, Chaitra V. Hegde, Jeffrey Huang, Sohae Chung, Xiuyuan Wang, Matthew Young, Alireza Radmanesh, Yvonne W. Lui, Narges Razavian
Abstract Quantitative, volumetric analysis of Magnetic Resonance Imaging (MRI) is a fundamental way researchers study the brain in a host of neurological conditions including normal maturation and aging. Despite the availability of open-source brain segmentation software, widespread clinical adoption of volumetric analysis has been hindered due to processing times and reliance on manual corrections. Here, we extend the use of deep learning models from proof-of-concept, as previously reported, to present a comprehensive segmentation of cortical and deep gray matter brain structures matching the standard regions of aseg+aparc included in the commonly used open-source tool, Freesurfer. The work presented here provides a real-life, rapid deep learning-based brain segmentation tool to enable clinical translation as well as research application of quantitative brain segmentation. The advantages of the presented tool include short (~1 minute) processing time and improved segmentation quality. This is the first study to perform quick and accurate segmentation of 102 brain regions based on the surface-based protocol (DMK protocol), widely used by experts in the field. This is also the first work to include an expert reader study to assess the quality of the segmentation obtained using a deep-learning-based model. We show the superior performance of our deep-learning-based models over the traditional segmentation tool, Freesurfer. We refer to the proposed deep learning-based tool as DARTS (DenseUnet-based Automatic Rapid Tool for brain Segmentation). Our tool and trained models are available at https://github.com/NYUMedML/DARTS
Tasks Brain Segmentation
Published 2019-11-13
URL https://arxiv.org/abs/1911.05567v2
PDF https://arxiv.org/pdf/1911.05567v2.pdf
PWC https://paperswithcode.com/paper/darts-denseunet-based-automatic-rapid-tool
Repo https://github.com/NYUMedML/DARTS
Framework pytorch

Learning and Interpreting Potentials for Classical Hamiltonian Systems

Title Learning and Interpreting Potentials for Classical Hamiltonian Systems
Authors Harish S. Bhat
Abstract We consider the problem of learning an interpretable potential energy function from a Hamiltonian system’s trajectories. We address this problem for classical, separable Hamiltonian systems. Our approach first constructs a neural network model of the potential and then applies an equation discovery technique to extract from the neural potential a closed-form algebraic expression. We demonstrate this approach for several systems, including oscillators, a central force problem, and a problem of two charged particles in a classical Coulomb potential. Through these test problems, we show close agreement between learned neural potentials, the interpreted potentials we obtain after training, and the ground truth. In particular, for the central force problem, we show that our approach learns the correct effective potential, a reduced-order model of the system. An illustrative sketch of the first, neural-potential stage follows this entry.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11806v1
PDF https://arxiv.org/pdf/1907.11806v1.pdf
PWC https://paperswithcode.com/paper/learning-and-interpreting-potentials-for
Repo https://github.com/hbhat4000/learningpotentials
Framework none
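
A conceptual sketch of the first stage: for a separable Hamiltonian with unit mass, dp/dt = -dV/dq, so a neural potential V_theta(q) can be fit by matching its gradient to accelerations estimated from the observed trajectory. The network size, optimizer, and the analytic accelerations in the toy example are assumptions; the second stage (symbolic equation discovery on the learned potential) is not shown.

```python
# Fit a neural potential so that its negative gradient matches observed
# accelerations (unit mass, separable Hamiltonian).
import torch
import torch.nn as nn

potential = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(potential.parameters(), lr=1e-3)

def training_step(q, accel):
    """q: (N, 1) observed positions; accel: (N, 1) estimated accelerations."""
    q = q.clone().requires_grad_(True)
    V = potential(q).sum()
    dVdq, = torch.autograd.grad(V, q, create_graph=True)
    loss = ((dVdq + accel) ** 2).mean()             # enforce accel = -dV/dq
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy example: harmonic oscillator trajectory, where V(q) = 0.5 * q^2.
t = torch.linspace(0, 10, 500)
q = torch.cos(t).unsqueeze(1)
accel = -q                                          # known analytically here for brevity
for _ in range(200):
    training_step(q, accel)
```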