February 1, 2020

3317 words 16 mins read

Paper Group AWR 278



A JIT Compiler for Neural Network Inference

Title A JIT Compiler for Neural Network Inference
Authors Felix Thielke, Arne Hasselbring
Abstract This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. In general, this approach promises the best possible performance, since statically known properties of the network can be integrated directly into the generated code. In our experiments on the NAO V6 platform, it significantly outperforms existing implementations on small networks, while being inferior on large networks. The library was already part of the B-Human code release 2018, but has since been extended and is now available as a standalone version that can be integrated into any C++14 code base. An illustrative sketch of the specialization idea follows this entry.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.05737v1
PDF https://arxiv.org/pdf/1906.05737v1.pdf
PWC https://paperswithcode.com/paper/a-jit-compiler-for-neural-network-inference
Repo https://github.com/bhuman/CompiledNN
Framework none
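
The sketch below illustrates the general specialization idea in Python only: because the layer sizes and weights are known before inference, code for the layer can be generated and compiled with everything baked in (here, the activation is fused into the generated statements). This is a conceptual analogue, not how CompiledNN works internally; the library emits native machine code from C++ and its API is not shown here.

```python
# Conceptual sketch of ahead-of-inference specialization (not CompiledNN's
# actual mechanism): generate and compile a function whose statically known
# sizes and weights are baked directly into the code, with a fused ReLU.
import numpy as np

def specialize_dense(W, b):
    n_in, n_out = W.shape
    lines = ["def dense(x):", f"    out = [0.0] * {n_out}"]
    for j in range(n_out):
        terms = " + ".join(f"x[{i}] * {float(W[i, j])!r}" for i in range(n_in))
        lines.append(f"    out[{j}] = max(0.0, {terms} + {float(b[j])!r})  # fused ReLU")
    lines.append("    return out")
    namespace = {}
    exec(compile("\n".join(lines), "<specialized>", "exec"), namespace)
    return namespace["dense"]

rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=3)
layer = specialize_dense(W, b)   # "compiled" once, then called many times
print(layer([1.0, 2.0, 3.0, 4.0]))
```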

Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches

Title Recent Advances in Natural Language Inference: A Survey of Benchmarks, Resources, and Approaches
Authors Shane Storks, Qiaozi Gao, Joyce Y. Chai
Abstract In the NLP community, recent years have seen a surge of research activity addressing machines’ ability to perform deep language understanding, which goes beyond what is explicitly stated in text and instead relies on reasoning and knowledge of the world. Many benchmark tasks and datasets have been created to support the development and evaluation of such natural language inference ability. As these benchmarks become instrumental and a driving force for the NLP research community, this paper aims to provide an overview of recent benchmarks, relevant knowledge resources, and state-of-the-art learning and inference approaches in order to support a better understanding of this growing field.
Tasks Natural Language Inference
Published 2019-04-02
URL https://arxiv.org/abs/1904.01172v3
PDF https://arxiv.org/pdf/1904.01172v3.pdf
PWC https://paperswithcode.com/paper/commonsense-reasoning-for-natural-language
Repo https://github.com/shengyp/Temporal-and-Evolving-KG
Framework none

Temporal Collaborative Ranking Via Personalized Transformer

Title Temporal Collaborative Ranking Via Personalized Transformer
Authors Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack
Abstract The collaborative ranking problem has been an important open research question, as most recommendation problems can be naturally formulated as ranking problems. While much of collaborative ranking methodology assumes static ranking data, the importance of temporal information for improving ranking performance is increasingly apparent. Recent advances in deep learning, especially the discovery of various attention mechanisms and newer architectures in addition to the widely used RNNs and CNNs in natural language processing, have allowed us to make better use of the temporal ordering of items that each user has engaged with. In particular, the SASRec model, inspired by the popular Transformer model in natural language processing, has achieved state-of-the-art results on the temporal collaborative ranking problem and enjoys a more than 10x speed-up compared to earlier CNN/RNN-based methods. However, SASRec is inherently an unpersonalized model and does not include personalized user embeddings. To overcome this limitation, we propose a Personalized Transformer (SSE-PT) model, outperforming SASRec by almost 5% in terms of NDCG@10 on 5 real-world datasets. Furthermore, after examining some random users’ engagement histories and the corresponding attention heat maps used during the inference stage, we find our model is not only more interpretable but also able to focus on recent engagement patterns for each user. Moreover, our SSE-PT model with a slight modification, which we call SSE-PT++, can handle extremely long sequences and outperform SASRec in ranking results with comparable training speed, striking a balance between performance and speed requirements. Code and data are open sourced at https://github.com/wuliwei9278/SSE-PT. An illustrative sketch of the personalized input construction follows this entry.
Tasks Collaborative Ranking
Published 2019-08-15
URL https://arxiv.org/abs/1908.05435v1
PDF https://arxiv.org/pdf/1908.05435v1.pdf
PWC https://paperswithcode.com/paper/temporal-collaborative-ranking-via
Repo https://github.com/wuliwei9278/SSE-PT
Framework tf
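
A hedged sketch of the personalization idea as described in the abstract: each sequence position is the concatenation of an item embedding and the user's embedding, and Stochastic Shared Embeddings (SSE) regularization occasionally replaces embedding indices with random ones during training. The module name, dimensions, and SSE probability below are illustrative assumptions, not the authors' TensorFlow implementation in the linked repo.

```python
# Sketch of SSE-PT input construction with SSE regularization (assumptions
# noted in the lead-in); the output would be fed to a Transformer encoder.
import torch
import torch.nn as nn

class SSEPTInput(nn.Module):
    def __init__(self, n_users, n_items, d_item=50, d_user=50, sse_prob=0.01):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, d_user)
        self.item_emb = nn.Embedding(n_items, d_item)
        self.sse_prob = sse_prob
        self.n_users, self.n_items = n_users, n_items

    def forward(self, user_ids, item_seq):
        if self.training:
            # SSE: with small probability, replace an index with a random one.
            swap_i = torch.rand_like(item_seq, dtype=torch.float) < self.sse_prob
            item_seq = torch.where(swap_i, torch.randint_like(item_seq, self.n_items), item_seq)
            swap_u = torch.rand_like(user_ids, dtype=torch.float) < self.sse_prob
            user_ids = torch.where(swap_u, torch.randint_like(user_ids, self.n_users), user_ids)
        items = self.item_emb(item_seq)                    # (B, L, d_item)
        users = self.user_emb(user_ids).unsqueeze(1)       # (B, 1, d_user)
        users = users.expand(-1, items.size(1), -1)        # (B, L, d_user)
        return torch.cat([items, users], dim=-1)           # personalized sequence input

x = SSEPTInput(n_users=100, n_items=1000)(torch.tensor([3, 7]), torch.randint(0, 1000, (2, 20)))
print(x.shape)  # torch.Size([2, 20, 100])
```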

QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization

Title QAInfomax: Learning Robust Question Answering System by Mutual Information Maximization
Authors Yi-Ting Yeh, Yun-Nung Chen
Abstract Standard accuracy metrics indicate that modern reading comprehension systems have achieved strong performance on many question answering datasets. However, the extent to which these systems truly understand language remains unknown, and existing systems are not good at distinguishing distractor sentences, which look related but do not actually answer the question. To address this problem, we propose QAInfomax as a regularizer for reading comprehension systems that maximizes mutual information among passages, a question, and its answer. QAInfomax regularizes the model so that it does not simply learn superficial correlations for answering questions. The experiments show that our proposed QAInfomax achieves state-of-the-art performance on the benchmark Adversarial-SQuAD dataset. An illustrative sketch of a mutual-information regularizer follows this entry.
Tasks Accuracy Metrics, Question Answering, Reading Comprehension
Published 2019-08-31
URL https://arxiv.org/abs/1909.00215v1
PDF https://arxiv.org/pdf/1909.00215v1.pdf
PWC https://paperswithcode.com/paper/qainfomax-learning-robust-question-answering
Repo https://github.com/MiuLab/QAInfomax
Framework pytorch
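
A minimal sketch of a mutual-information-maximization regularizer in the spirit described above, using a Jensen-Shannon-style bound with a bilinear discriminator and in-batch shuffled negatives. The architecture and pairing scheme are assumptions for illustration, not the paper's exact objective.

```python
# Illustrative MI-maximization regularizer: a discriminator scores
# (summary, local) pairs and a JSD-style lower bound on MI is maximized;
# the negation is returned so it can be added to the main QA loss.
import torch.nn as nn
import torch.nn.functional as F

class MIRegularizer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)  # discriminator T(summary, local)

    def forward(self, local_feats, summary):
        # local_feats: (B, L, D) token representations; summary: (B, D) answer/span summary.
        B, L, D = local_feats.shape
        expanded = summary.unsqueeze(1).expand(-1, L, -1).reshape(-1, D)
        pos = self.score(expanded, local_feats.reshape(-1, D))
        # Negatives: pair each summary with tokens from a different example in the batch.
        neg_feats = local_feats.roll(shifts=1, dims=0)
        neg = self.score(expanded, neg_feats.reshape(-1, D))
        mi = (-F.softplus(-pos)).mean() - F.softplus(neg).mean()
        return -mi  # minimize this to maximize the MI bound
```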

Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation

Title Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation
Authors Shaokai Ye, Kailu Wu, Mu Zhou, Yunfei Yang, Sia huat Tan, Kaidi Xu, Jiebo Song, Chenglong Bao, Kaisheng Ma
Abstract Existing domain adaptation methods aim at learning features that can be generalized across domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between the source and target domains. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called a data calibrator to help the fixed source classifier recover discrimination power in the target domain, while preserving the source domain’s performance. When the difference between the two domains is small, the source classifier’s representation is sufficient to perform well in the target domain, and our method outperforms GAN-based methods on digit datasets. Otherwise, the proposed method can leverage synthetic images generated by GANs to boost performance and achieves state-of-the-art performance on digit datasets and driving-scene semantic segmentation. Our method empirically reveals that certain intriguing hints, which can be mitigated by adversarial attacks on domain discriminators, are one of the sources of performance degradation under domain shift. An illustrative sketch of the calibrator idea follows this entry.
Tasks Adversarial Attack, Domain Adaptation, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2019-11-28
URL https://arxiv.org/abs/1911.12796v2
PDF https://arxiv.org/pdf/1911.12796v2.pdf
PWC https://paperswithcode.com/paper/light-weight-calibrator-a-separable-component
Repo https://github.com/yeshaokai/Calibrator-Domain-Adaptation
Framework none
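
A hedged sketch of the data-calibrator idea: a small network produces a bounded perturbation of target-domain inputs so that a frozen source classifier regains discriminative power. The entropy-minimization objective below is an illustrative stand-in; the paper's actual training procedure (including its use of GAN-generated images and domain discriminators) is not reproduced here.

```python
# Calibrator sketch: the source classifier stays frozen and only the
# calibrator is optimized (the optimizer is assumed to hold calibrator
# parameters only). The loss here is a simple illustrative surrogate.
import torch.nn as nn
import torch.nn.functional as F

class Calibrator(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1), nn.Tanh())

    def forward(self, x, eps=0.1):
        return (x + eps * self.net(x)).clamp(0, 1)   # small bounded correction

def calibration_step(calibrator, frozen_source_clf, target_batch, optimizer):
    frozen_source_clf.eval()                          # source classifier is never updated
    logits = frozen_source_clf(calibrator(target_batch))
    probs = F.softmax(logits, dim=1)
    loss = -(probs * probs.log()).sum(dim=1).mean()   # prediction entropy on target data
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```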

Gradient based sample selection for online continual learning

Title Gradient based sample selection for online continual learning
Authors Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio
Abstract A continual learning agent learns online from a non-stationary and never-ending stream of data. The key to such a learning process is to overcome the catastrophic forgetting of previously seen data, which is a well-known problem of neural networks. To prevent forgetting, a replay buffer is usually employed to store previous data for the purpose of rehearsal. Previous works often depend on task boundaries and i.i.d. assumptions to properly select samples for the replay buffer. In this work, we formulate sample selection as a constraint reduction problem based on the constrained optimization view of continual learning. The goal is to select a fixed subset of constraints that best approximates the feasible region defined by the original constraints. We show that this is equivalent to maximizing the diversity of samples in the replay buffer with the parameter gradients as the feature. We further develop a greedy alternative that is cheap and efficient. The advantage of the proposed method is demonstrated by comparison with other alternatives in the continual learning setting. Further comparisons against state-of-the-art methods that rely on task boundaries show comparable or even better results for our method. A simplified sketch of the greedy selection idea follows this entry.
Tasks Continual Learning
Published 2019-03-20
URL https://arxiv.org/abs/1903.08671v5
PDF https://arxiv.org/pdf/1903.08671v5.pdf
PWC https://paperswithcode.com/paper/online-continual-learning-with-no-task
Repo https://github.com/wannabeOG/MAS-PyTorch
Framework pytorch
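
A simplified sketch of the greedy, gradient-diversity view of buffer management (not the paper's exact GSS-Greedy algorithm): a new sample enters the replay buffer only if its parameter gradient is sufficiently dissimilar from gradients already stored, keeping the buffer diverse in gradient space. The threshold-and-evict rule is an assumption for illustration.

```python
# Gradient-diversity buffer sketch: grad is the flattened parameter gradient
# computed on the candidate sample; the buffer keeps samples whose gradients
# point in diverse directions.
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def maybe_add_to_buffer(buffer, grad_buffer, sample, grad, capacity, threshold=0.9):
    if len(buffer) < capacity:
        buffer.append(sample); grad_buffer.append(grad)
        return True
    sims = [cosine(grad, g) for g in grad_buffer]
    if max(sims) < threshold:                 # diverse enough: evict the most redundant entry
        redundant = int(np.argmax(sims))
        buffer[redundant], grad_buffer[redundant] = sample, grad
        return True
    return False
```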

Evolving and Understanding Sparse Deep Neural Networks using Cosine Similarity

Title Evolving and Understanding Sparse Deep Neural Networks using Cosine Similarity
Authors Joost Pieterse, Decebal Constantin Mocanu
Abstract Training sparse neural networks with adaptive connectivity is an active research topic. Such networks require less storage and have lower computational complexity compared to their dense counterparts. The Sparse Evolutionary Training (SET) procedure uses weight magnitudes to efficiently evolve the topology of a sparse network to fit the dataset, while enabling it to have quadratically fewer parameters than its dense counterpart. In this work, we propose a novel approach that evolves a sparse network topology based on the behavior of neurons in the network. More precisely, the cosine similarities between the activations of any two neurons are used to determine which connections are added to or removed from the network. By integrating our approach within the SET procedure, we propose 5 new algorithms to train sparse neural networks. We argue that our approach has low additional computational complexity and we draw a parallel to Hebbian learning. Experiments are performed on 8 datasets taken from various domains to demonstrate the general applicability of our approach. Even without optimizing hyperparameters for specific datasets, the experiments show that our proposed training algorithms usually outperform SET and state-of-the-art dense neural network techniques. Last but not least, we show that the evolved connectivity patterns of the input neurons reflect their impact on the classification task. An illustrative sketch of the cosine-similarity-based rewiring follows this entry.
Tasks
Published 2019-03-17
URL http://arxiv.org/abs/1903.07138v1
PDF http://arxiv.org/pdf/1903.07138v1.pdf
PWC https://paperswithcode.com/paper/evolving-and-understanding-sparse-deep-neural
Repo https://github.com/joostPieterse/CosineSET
Framework tf
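
A hedged sketch of the rewiring idea: cosine similarities between neuron activations, collected over a batch, decide which sparse connections between two layers are removed and which are added. The specific add/remove rule below is one plausible variant; the paper proposes five algorithms whose exact rules differ.

```python
# Cosine-similarity-driven topology evolution sketch for one sparse layer.
import numpy as np

def evolve_topology(mask, acts_in, acts_out, n_changes):
    """mask: (n_in, n_out) binary connectivity; acts_*: (batch, n_*) activations."""
    a_in = acts_in / (np.linalg.norm(acts_in, axis=0, keepdims=True) + 1e-12)
    a_out = acts_out / (np.linalg.norm(acts_out, axis=0, keepdims=True) + 1e-12)
    sim = a_in.T @ a_out                                   # (n_in, n_out) cosine similarities
    # Remove existing connections between the least similar neuron pairs ...
    existing = np.argwhere(mask == 1)
    weakest = existing[np.argsort(sim[mask == 1])[:n_changes]]
    mask[weakest[:, 0], weakest[:, 1]] = 0
    # ... and add connections between the most similar currently unconnected pairs.
    candidates = np.argwhere(mask == 0)
    strongest = candidates[np.argsort(-sim[mask == 0])[:n_changes]]
    mask[strongest[:, 0], strongest[:, 1]] = 1
    return mask
```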

Data-Free Learning of Student Networks

Title Data-Free Learning of Student Networks
Authors Hanting Chen, Yunhe Wang, Chang Xu, Zhaohui Yang, Chuanjian Liu, Boxin Shi, Chunjing Xu, Chao Xu, Qi Tian
Abstract Learning portable neural networks is essential for computer vision, so that heavy pre-trained deep models can be deployed on edge devices such as mobile phones and micro-sensors. Most existing deep neural network compression and speed-up methods are very effective for training compact deep models when the training dataset can be accessed directly. However, the training data for a given deep network are often unavailable due to practical problems (e.g., privacy, legal issues, and transmission), and the architecture of the given network is often unknown except for some interfaces. To this end, we propose a novel framework for training efficient deep neural networks by exploiting generative adversarial networks (GANs). To be specific, the pre-trained teacher network is regarded as a fixed discriminator, and the generator is used to derive training samples that obtain the maximum response from the discriminator. Then, an efficient network with smaller model size and computational complexity is trained simultaneously using the generated data and the teacher network. Efficient student networks learned using the proposed Data-Free Learning (DAFL) method achieve 92.22% and 74.47% accuracies using ResNet-18 without any training data on the CIFAR-10 and CIFAR-100 datasets, respectively. Meanwhile, our student network obtains an 80.56% accuracy on the CelebA benchmark. An illustrative sketch of the training signals follows this entry.
Tasks Neural Network Compression
Published 2019-04-02
URL https://arxiv.org/abs/1904.01186v4
PDF https://arxiv.org/pdf/1904.01186v4.pdf
PWC https://paperswithcode.com/paper/data-free-learning-of-student-networks
Repo https://github.com/huawei-noah/DAFL
Framework pytorch
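
A rough sketch of the training signals described above, simplified from the paper: the generator is trained so that the frozen teacher responds confidently, with strong activations and balanced class usage, to generated images, and the student is then distilled on those images. The loss weights and temperature are illustrative assumptions.

```python
# Simplified DAFL-style losses: generator_loss is minimized w.r.t. the
# generator (teacher frozen); distill_loss trains the student on generated data.
import torch.nn.functional as F

def generator_loss(teacher_logits, teacher_features, alpha=0.1, beta=5.0):
    pseudo = teacher_logits.argmax(dim=1)
    l_onehot = F.cross_entropy(teacher_logits, pseudo)        # confident predictions
    l_activation = -teacher_features.abs().mean()             # strong feature response
    p_mean = F.softmax(teacher_logits, dim=1).mean(dim=0)
    l_entropy = (p_mean * p_mean.log()).sum()                 # encourages balanced class usage
    return l_onehot + alpha * l_activation + beta * l_entropy

def distill_loss(student_logits, teacher_logits, T=1.0):
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
```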

Understanding and Improving Layer Normalization

Title Understanding and Improving Layer Normalization
Authors Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin
Abstract Layer normalization (LayerNorm) is a technique to normalize the distributions of intermediate layers. It enables smoother gradients, faster training, and better generalization accuracy. However, it is still unclear where its effectiveness stems from. In this paper, our main contribution is to take a step further in understanding LayerNorm. Many previous studies believe that the success of LayerNorm comes from forward normalization. Unlike them, we find that the derivatives of the mean and variance are more important than forward normalization, as they re-center and re-scale backward gradients. Furthermore, we find that the parameters of LayerNorm, including the bias and gain, increase the risk of over-fitting and do not help in most cases. Experiments show that a simple version of LayerNorm (LayerNorm-simple) without the bias and gain outperforms LayerNorm on four datasets. It obtains state-of-the-art performance on En-Vi machine translation. To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), which replaces the bias and gain with a new transformation function. Experiments show that AdaNorm achieves better results than LayerNorm on seven out of eight datasets. An illustrative sketch of both variants follows this entry.
Tasks Machine Translation
Published 2019-11-16
URL https://arxiv.org/abs/1911.07013v1
PDF https://arxiv.org/pdf/1911.07013v1.pdf
PWC https://paperswithcode.com/paper/understanding-and-improving-layer-1
Repo https://github.com/lancopku/AdaNorm
Framework none
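
A short sketch of the two variants discussed above: LayerNorm-simple drops the learnable bias and gain, and AdaNorm rescales the normalized input with an input-dependent factor C(1 - k*y). The constants C and k, and detaching the scaling factor from the gradient, reflect one reading of the paper and should be treated as assumptions.

```python
# LayerNorm-simple and an AdaNorm-style variant (hyperparameters illustrative).
import torch.nn as nn

class LayerNormSimple(nn.Module):
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        mu = x.mean(dim=-1, keepdim=True)
        sigma = x.std(dim=-1, keepdim=True, unbiased=False)
        return (x - mu) / (sigma + self.eps)       # no learnable bias or gain

class AdaNorm(LayerNormSimple):
    def __init__(self, C=1.0, k=0.1, eps=1e-5):
        super().__init__(eps)
        self.C, self.k = C, k

    def forward(self, x):
        y = super().forward(x)
        scale = self.C * (1.0 - self.k * y)        # input-dependent scaling
        return scale.detach() * y                  # scaling treated as a constant in backprop
```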

On Tiny Episodic Memories in Continual Learning

Title On Tiny Episodic Memories in Continual Learning
Authors Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, Marc’Aurelio Ranzato
Abstract In continual learning (CL), an agent learns from a stream of tasks, leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework for decreasing the amount of supervision required by existing learning algorithms. But for successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to endow the learner with the ability to perform tasks seen in the past is to store a small memory, dubbed episodic memory, that holds a few examples from previous tasks, and then to replay these examples when training on future tasks. In this work, we empirically analyze the effectiveness of a very small episodic memory in a CL setup where each training example is only seen once. Surprisingly, across four rather different supervised learning benchmarks adapted to CL, a very simple baseline that jointly trains on examples from the current task as well as examples stored in the episodic memory significantly outperforms specifically designed CL approaches with and without episodic memory. Interestingly, we find that repetitive training on even tiny memories of past tasks does not harm generalization; on the contrary, it improves it, with gains between 7% and 17% when the memory is populated with a single example per class. A minimal sketch of this experience-replay baseline follows this entry.
Tasks Continual Learning, Transfer Learning
Published 2019-02-27
URL https://arxiv.org/abs/1902.10486v4
PDF https://arxiv.org/pdf/1902.10486v4.pdf
PWC https://paperswithcode.com/paper/continual-learning-with-tiny-episodic
Repo https://github.com/facebookresearch/agem
Framework tf
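
A minimal sketch of the experience-replay baseline described above: each update is a joint gradient step on the current batch plus a small batch drawn from the episodic memory. Reservoir sampling is used here to populate the memory as one common choice; treat it as an assumption rather than the paper's prescription.

```python
# Experience-replay baseline sketch: memory is a list of (x, y) tensor pairs.
import random
import torch
import torch.nn.functional as F

def er_step(model, optimizer, batch, memory, mem_batch_size=10):
    x, y = batch
    if memory:
        mx, my = zip(*random.sample(memory, min(mem_batch_size, len(memory))))
        x = torch.cat([x, torch.stack(mx)]); y = torch.cat([y, torch.stack(my)])
    loss = F.cross_entropy(model(x), y)           # joint step on current + replayed examples
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

def reservoir_update(memory, example, capacity, n_seen):
    if len(memory) < capacity:
        memory.append(example)
    else:
        j = random.randint(0, n_seen)
        if j < capacity:
            memory[j] = example
```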

Adapting Sequence to Sequence models for Text Normalization in Social Media

Title Adapting Sequence to Sequence models for Text Normalization in Social Media
Authors Ismini Lourentzou, Kabir Manghnani, ChengXiang Zhai
Abstract Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot explicitly handle the noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans, who might not be able to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text normalization aims to transform online user-generated text into a canonical form. Current text normalization systems rely on string or phonetic similarity and classification models that work in a local fashion. We argue that processing contextual information is crucial for this task and introduce a hybrid word-character attention-based encoder-decoder model for social media text normalization that can serve as a pre-processing step for NLP applications to adapt to noisy text in social media. Our character-based component is trained on synthetic adversarial examples that are designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves comparable performance with state-of-the-art related work. An illustrative sketch of such synthetic noise generation follows this entry.
Tasks Lexical Normalization
Published 2019-04-12
URL http://arxiv.org/abs/1904.06100v1
PDF http://arxiv.org/pdf/1904.06100v1.pdf
PWC https://paperswithcode.com/paper/adapting-sequence-to-sequence-models-for-text
Repo https://github.com/Isminoula/TextNormSeq2Seq
Framework pytorch
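
An illustrative sketch of generating synthetic noisy/canonical training pairs for the character-level component. The edit operations and rates below are assumptions meant to mimic common social-media errors, not the paper's exact recipe.

```python
# Synthetic noise injection sketch: corrupt clean words to build
# (noisy, canonical) pairs for a character-level normalizer.
import random

def corrupt(word, p=0.3):
    if random.random() > p or len(word) < 3:
        return word
    chars = list(word)
    i = random.randrange(len(chars))
    op = random.choice(["drop_vowel", "repeat", "swap", "drop_last"])
    if op == "drop_vowel":
        chars = [c for k, c in enumerate(chars) if not (k == i and c in "aeiou")]
    elif op == "repeat":
        chars.insert(i, chars[i])                 # e.g. "soo" for "so"
    elif op == "swap" and i + 1 < len(chars):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    elif op == "drop_last":
        chars = chars[:-1]                        # e.g. "goin" for "going"
    return "".join(chars)

pairs = [(corrupt(w), w) for w in ["tomorrow", "going", "really", "about"]]
print(pairs)
```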

Hidden Trigger Backdoor Attacks

Title Hidden Trigger Backdoor Attacks
Authors Aniruddha Saha, Akshayvarun Subramanya, Hamed Pirsiavash
Abstract With the success of deep learning algorithms in various domains, studying adversarial attacks to secure deep models in real-world applications has become an important research topic. Backdoor attacks are a form of adversarial attack on deep networks in which the attacker provides poisoned data for the victim to train the model with, and then activates the attack by showing a specific small trigger pattern at test time. Most state-of-the-art backdoor attacks either provide mislabeled poisoning data that can be identified by visual inspection, reveal the trigger in the poisoned data, or use noise to hide the trigger. We propose a novel form of backdoor attack in which the poisoned data look natural and carry correct labels and, more importantly, the attacker hides the trigger in the poisoned data and keeps it secret until test time. We perform an extensive study on various image classification settings and show that our attack can fool the model by pasting the trigger at random locations on unseen images, although the model performs well on clean data. We also show that our proposed attack cannot be easily defended against using a state-of-the-art defense algorithm for backdoor attacks. An illustrative sketch of the poison-crafting objective follows this entry.
Tasks Image Classification
Published 2019-09-30
URL https://arxiv.org/abs/1910.00033v2
PDF https://arxiv.org/pdf/1910.00033v2.pdf
PWC https://paperswithcode.com/paper/hidden-trigger-backdoor-attacks
Repo https://github.com/UMBCvision/Hidden-Trigger-Backdoor-Attacks
Framework pytorch
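
A hedged sketch of the poison-crafting objective: the poison must stay visually close to a correctly labeled target-class image while matching, in the network's feature space, a source-class image with the trigger patch applied. The optimizer, epsilon, and feature layer below are illustrative choices.

```python
# Hidden-trigger poison crafting sketch: optimize the poison in feature space
# while keeping it within an epsilon ball of the clean target image.
import torch

def craft_poison(feature_extractor, target_img, patched_source_img,
                 steps=100, lr=0.01, eps=16 / 255):
    poison = target_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([poison], lr=lr)
    with torch.no_grad():
        source_feat = feature_extractor(patched_source_img)
    for _ in range(steps):
        loss = (feature_extractor(poison) - source_feat).pow(2).sum()  # feature matching
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                       # keep the poison visually benign
            poison.clamp_(target_img - eps, target_img + eps)
            poison.clamp_(0, 1)
    return poison.detach()
```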

Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds

Title Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds
Authors Bo Yang, Jianan Wang, Ronald Clark, Qingyong Hu, Sen Wang, Andrew Markham, Niki Trigoni
Abstract We propose a novel, conceptually simple and general framework for instance segmentation on 3D point clouds. Our method, called 3D-BoNet, follows the simple design philosophy of per-point multilayer perceptrons (MLPs). The framework directly regresses 3D bounding boxes for all instances in a point cloud, while simultaneously predicting a point-level mask for each instance. It consists of a backbone network followed by two parallel network branches for 1) bounding box regression and 2) point mask prediction. 3D-BoNet is single-stage, anchor-free and end-to-end trainable. Moreover, it is remarkably computationally efficient as, unlike existing approaches, it does not require any post-processing steps such as non-maximum suppression, feature sampling, clustering or voting. Extensive experiments show that our approach surpasses existing work on both ScanNet and S3DIS datasets while being approximately 10x more computationally efficient. Comprehensive ablation studies demonstrate the effectiveness of our design.
Tasks 3D Instance Segmentation, Instance Segmentation, Semantic Segmentation
Published 2019-06-04
URL https://arxiv.org/abs/1906.01140v2
PDF https://arxiv.org/pdf/1906.01140v2.pdf
PWC https://paperswithcode.com/paper/learning-object-bounding-boxes-for-3d
Repo https://github.com/QingyongHu/Benchmark_results_3D_point_cloud
Framework none

DARTS: DenseUnet-based Automatic Rapid Tool for brain Segmentation

Title DARTS: DenseUnet-based Automatic Rapid Tool for brain Segmentation
Authors Aakash Kaku, Chaitra V. Hegde, Jeffrey Huang, Sohae Chung, Xiuyuan Wang, Matthew Young, Alireza Radmanesh, Yvonne W. Lui, Narges Razavian
Abstract Quantitative, volumetric analysis of Magnetic Resonance Imaging (MRI) is a fundamental way researchers study the brain in a host of neurological conditions including normal maturation and aging. Despite the availability of open-source brain segmentation software, widespread clinical adoption of volumetric analysis has been hindered due to processing times and reliance on manual corrections. Here, we extend the use of deep learning models from proof-of-concept, as previously reported, to present a comprehensive segmentation of cortical and deep gray matter brain structures matching the standard regions of aseg+aparc included in the commonly used open-source tool, Freesurfer. The work presented here provides a real-life, rapid deep learning-based brain segmentation tool to enable clinical translation as well as research application of quantitative brain segmentation. The advantages of the presented tool include short (~1 minute) processing time and improved segmentation quality. This is the first study to perform quick and accurate segmentation of 102 brain regions based on the surface-based protocol (DMK protocol), widely used by experts in the field. This is also the first work to include an expert reader study to assess the quality of the segmentation obtained using a deep-learning-based model. We show the superior performance of our deep-learning-based models over the traditional segmentation tool, Freesurfer. We refer to the proposed deep learning-based tool as DARTS (DenseUnet-based Automatic Rapid Tool for brain Segmentation). Our tool and trained models are available at https://github.com/NYUMedML/DARTS
Tasks Brain Segmentation
Published 2019-11-13
URL https://arxiv.org/abs/1911.05567v2
PDF https://arxiv.org/pdf/1911.05567v2.pdf
PWC https://paperswithcode.com/paper/darts-denseunet-based-automatic-rapid-tool
Repo https://github.com/NYUMedML/DARTS
Framework pytorch

Learning and Interpreting Potentials for Classical Hamiltonian Systems

Title Learning and Interpreting Potentials for Classical Hamiltonian Systems
Authors Harish S. Bhat
Abstract We consider the problem of learning an interpretable potential energy function from a Hamiltonian system’s trajectories. We address this problem for classical, separable Hamiltonian systems. Our approach first constructs a neural network model of the potential and then applies an equation discovery technique to extract from the neural potential a closed-form algebraic expression. We demonstrate this approach for several systems, including oscillators, a central force problem, and a problem of two charged particles in a classical Coulomb potential. Through these test problems, we show close agreement between learned neural potentials, the interpreted potentials we obtain after training, and the ground truth. In particular, for the central force problem, we show that our approach learns the correct effective potential, a reduced-order model of the system. An illustrative sketch of the first, neural-potential stage follows this entry.
Tasks
Published 2019-07-26
URL https://arxiv.org/abs/1907.11806v1
PDF https://arxiv.org/pdf/1907.11806v1.pdf
PWC https://paperswithcode.com/paper/learning-and-interpreting-potentials-for
Repo https://github.com/hbhat4000/learningpotentials
Framework none
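
A conceptual sketch of the first stage: for a separable Hamiltonian with unit mass, dp/dt = -dV/dq, so a neural potential V_theta(q) can be fit by matching its gradient to accelerations estimated from the observed trajectory. The network size, optimizer, and the analytic accelerations in the toy example are assumptions; the second stage (symbolic equation discovery on the learned potential) is not shown.

```python
# Fit a neural potential so that its negative gradient matches observed
# accelerations (unit mass, separable Hamiltonian).
import torch
import torch.nn as nn

potential = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(potential.parameters(), lr=1e-3)

def training_step(q, accel):
    """q: (N, 1) observed positions; accel: (N, 1) estimated accelerations."""
    q = q.clone().requires_grad_(True)
    V = potential(q).sum()
    dVdq, = torch.autograd.grad(V, q, create_graph=True)
    loss = ((dVdq + accel) ** 2).mean()             # enforce accel = -dV/dq
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy example: harmonic oscillator trajectory, where V(q) = 0.5 * q^2.
t = torch.linspace(0, 10, 500)
q = torch.cos(t).unsqueeze(1)
accel = -q                                          # known analytically here for brevity
for _ in range(200):
    training_step(q, accel)
```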