February 1, 2020

Paper Group AWR 204

DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning


Title	DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning
Authors	Artur Souza, Leonardo B. Oliveira, Sabine Hollatz, Matt Feldman, Kunle Olukotun, James M. Holton, Aina E. Cohen, Luigi Nardi
Abstract	Serial crystallography is the field of science that studies the structure and properties of crystals via diffraction patterns. In this paper, we introduce a new serial crystallography dataset comprised of real and synthetic images; the synthetic images are generated through the use of a simulator that is both scalable and accurate. The resulting dataset is called DiffraNet, and it is composed of 25,457 512x512 grayscale labeled images. We explore several computer vision approaches for classification on DiffraNet such as standard feature extraction algorithms associated with Random Forests and Support Vector Machines but also an end-to-end CNN topology dubbed DeepFreak tailored to work on this new dataset. All implementations are publicly available and have been fine-tuned using off-the-shelf AutoML optimization tools for a fair comparison. Our best model achieves 98.5% accuracy on synthetic images and 94.51% accuracy on real images. We believe that the DiffraNet dataset and its classification methods will have in the long term a positive impact in accelerating discoveries in many disciplines, including chemistry, geology, biology, materials science, metallurgy, and physics.
Tasks	AutoML
Published	2019-04-26
URL	https://arxiv.org/abs/1904.11834v2
PDF	https://arxiv.org/pdf/1904.11834v2.pdf
PWC	https://paperswithcode.com/paper/deepfreak-learning-crystallography
Repo	https://github.com/arturluis/diffranet
Framework	pytorch

GENDIS: GENetic DIscovery of Shapelets


Title	GENDIS: GENetic DIscovery of Shapelets
Authors	Gilles Vandewiele, Femke Ongenae, Filip De Turck
Abstract	In the time series classification domain, shapelets are small time series that are discriminative for a certain class. It has been shown that classifiers are able to achieve state-of-the-art results on a plethora of datasets by taking as input distances from the input time series to different discriminative shapelets. Additionally, these shapelets can easily be visualized and thus possess an interpretable characteristic, making them very appealing in critical domains, such as the health care domain, where longitudinal data is ubiquitous. In this study, a new paradigm for shapelet discovery is proposed, which is based upon evolutionary computation. The advantages of the proposed approach are that (i) it is gradient-free, which could allow to escape from local optima more easily and to find suited candidates more easily and supports non-differentiable objectives, (ii) no brute-force search is required, which drastically reduces the computational complexity by several orders of magnitude, (iii) the total amount of shapelets and length of each of these shapelets are evolved jointly with the shapelets themselves, alleviating the need to specify this beforehand, (iv) entire sets are evaluated at once as opposed to single shapelets, which results in smaller final sets with less similar shapelets that result in similar predictive performances, and (v) discovered shapelets do not need to be a subsequence of the input time series. We present the results of experiments which validate the enumerated advantages.
Tasks	Outlier Detection, Time Series, Time Series Classification
Published	2019-09-13
URL	https://arxiv.org/abs/1910.12948v1
PDF	https://arxiv.org/pdf/1910.12948v1.pdf
PWC	https://paperswithcode.com/paper/gendis-genetic-discovery-of-shapelets
Repo	https://github.com/IBCNServices/GENDIS
Framework	none
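
The shapelet-distance features described in the abstract can be illustrated with a minimal sketch. It assumes the common definition of the distance between a shapelet and a time series as the smallest Euclidean distance over all sliding windows; the function name `shapelet_distance` and the toy data are hypothetical, and this is not the GENDIS implementation.

```python
import math

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best

# Toy example (made-up data): one distance per shapelet becomes the feature vector
# that a downstream classifier would consume.
series = [0.0, 0.1, 1.0, 2.0, 1.0, 0.1, 0.0]
shapelets = [[1.0, 2.0, 1.0], [0.0, 0.0, 0.0]]
features = [shapelet_distance(series, s) for s in shapelets]
```

The first shapelet occurs verbatim in the series, so its distance is zero; the flat shapelet matches nowhere exactly and gets a larger value.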

Effectiveness of Self Normalizing Neural Networks for Text Classification


Title	Effectiveness of Self Normalizing Neural Networks for Text Classification
Authors	Avinash Madasu, Vijjini Anvesh Rao
Abstract	Self Normalizing Neural Networks (SNN), proposed for Feed Forward Neural Networks (FNN), outperform regular FNN architectures in various machine learning tasks. Particularly in the domain of Computer Vision, the activation function Scaled Exponential Linear Units (SELU), proposed for SNNs, performs better than other non-linear activations such as ReLU. The goal of an SNN is to produce a normalized output for a normalized input. Established neural network architectures like feed forward networks and Convolutional Neural Networks (CNN) lack the intrinsic nature of normalizing outputs, and hence require additional layers such as Batch Normalization. Despite the success of SNNs, their characteristic features on other network architectures like CNNs haven't been explored, especially in the domain of Natural Language Processing. In this paper we aim to show the effectiveness of the proposed Self Normalizing Convolutional Neural Networks (SCNN) on text classification. We compare their performance with the standard CNN architecture used on several text classification datasets. Our experiments demonstrate that SCNN achieves comparable results to the standard CNN model with significantly fewer parameters. Furthermore, it also outperforms a CNN with an equal number of parameters.
Tasks	Text Classification
Published	2019-05-03
URL	https://arxiv.org/abs/1905.01338v1
PDF	https://arxiv.org/pdf/1905.01338v1.pdf
PWC	https://paperswithcode.com/paper/effectiveness-of-self-normalizing-neural
Repo	https://github.com/Chrisackerman1/Self-Normalizing-Neural-Networks-for-Text-Classification
Framework	none
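
For reference, the SELU activation named in the abstract can be written in a few lines. This is a minimal scalar sketch using the standard published SELU constants; the function name `selu` is chosen here for illustration and the snippet is not taken from the paper's repository.

```python
import math

# Standard SELU constants (alpha and scale) from the self-normalizing networks literature.
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805

def selu(x):
    """Scaled Exponential Linear Unit for a single float."""
    if x > 0:
        return SCALE * x
    return SCALE * ALPHA * (math.exp(x) - 1.0)

# Positive inputs are scaled linearly; negative inputs saturate at -SCALE * ALPHA.
values = [selu(v) for v in (-50.0, 0.0, 1.0)]
```

The negative saturation and the slope slightly above 1 are what push activations toward zero mean and unit variance without an explicit normalization layer.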

ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization


Title	ILS-SUMM: Iterated Local Search for Unsupervised Video Summarization
Authors	Yair Shemer, Daniel Rotman, Nahum Shimkin
Abstract	In recent years, there has been an increasing interest in building video summarization tools, where the goal is to automatically create a short summary of an input video that properly represents the original content. We consider shot-based video summarization where the summary consists of a subset of the video shots which can be of various lengths. A straightforward approach to maximize the representativeness of a subset of shots is by minimizing the total distance between shots and their nearest selected shots. We formulate the task of video summarization as an optimization problem with a knapsack-like constraint on the total summary duration. Previous studies have proposed greedy algorithms to solve this problem approximately, but no experiments were presented to measure the ability of these methods to obtain solutions with low total distance. Indeed, our experiments on video summarization datasets show that the success of current methods in obtaining results with low total distance still has much room for improvement. In this paper, we develop ILS-SUMM, a novel video summarization algorithm to solve the subset selection problem under the knapsack constraint. Our algorithm is based on the well-known metaheuristic optimization framework – Iterated Local Search (ILS), known for its ability to avoid weak local minima and obtain a good near-global minimum. Extensive experiments show that our method finds solutions with significantly better total distance than previous methods. Moreover, to indicate the high scalability of ILS-SUMM, we introduce a new dataset consisting of videos of various lengths.
Tasks	Unsupervised Video Summarization, Video Summarization
Published	2019-12-08
URL	https://arxiv.org/abs/1912.03650v1
PDF	https://arxiv.org/pdf/1912.03650v1.pdf
PWC	https://paperswithcode.com/paper/ils-summ-iterated-local-search-for
Repo	https://github.com/YairShemer/ILS-SUMM
Framework	none
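
The objective in the abstract (total distance from every shot to its nearest selected shot, under a budget on summary duration) can be sketched as follows. This is a toy iterated local search under stated assumptions, not the authors' ILS-SUMM code: shots are represented by hypothetical 1-D feature values, and the move set (add or swap) and the perturbation (drop one random shot) are simplifications chosen for brevity.

```python
import random

def total_distance(features, selected):
    # Sum, over every shot, of the distance to its nearest selected shot.
    return sum(min(abs(f - features[s]) for s in selected) for f in features)

def duration(durations, selected):
    return sum(durations[s] for s in selected)

def local_search(features, durations, budget, selected):
    # Repeatedly apply the best improving add or swap move that respects the budget.
    selected = set(selected)
    improved = True
    while improved:
        improved = False
        best_cost, best_set = total_distance(features, selected), None
        outside = [i for i in range(len(features)) if i not in selected]
        candidates = [selected | {i} for i in outside]
        candidates += [(selected - {s}) | {i} for s in selected for i in outside]
        for cand in candidates:
            if duration(durations, cand) <= budget:
                cost = total_distance(features, cand)
                if cost < best_cost:
                    best_cost, best_set = cost, cand
        if best_set is not None:
            selected, improved = best_set, True
    return selected

def iterated_local_search(features, durations, budget, iterations=50, seed=0):
    rng = random.Random(seed)
    # Start from the single shortest shot, assumed to fit within the budget.
    start = {min(range(len(features)), key=lambda i: durations[i])}
    best = local_search(features, durations, budget, start)
    for _ in range(iterations):
        # Perturbation: drop one random shot (if possible), then re-optimize.
        current = set(best)
        if len(current) > 1:
            current.remove(rng.choice(sorted(current)))
        current = local_search(features, durations, budget, current)
        if total_distance(features, current) < total_distance(features, best):
            best = current
    return best

# Toy data (made up): five shots of equal length, budget for two of them.
features = [0.0, 0.1, 5.0, 5.1, 10.0]
durations = [1.0] * 5
budget = 2.0
summary = iterated_local_search(features, durations, budget)
```

Every accepted move strictly lowers the total distance, so the inner loop terminates; the outer loop keeps only perturbation results that improve on the best summary found so far.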

Omni-Scale Feature Learning for Person Re-Identification


Title	Omni-Scale Feature Learning for Person Re-Identification
Authors	Kaiyang Zhou, Yongxin Yang, Andrea Cavallaro, Tao Xiang
Abstract	As an instance-level recognition problem, person re-identification (ReID) relies on discriminative features, which not only capture different spatial scales but also encapsulate an arbitrary combination of multiple scales. We call features of both homogeneous and heterogeneous scales omni-scale features. In this paper, a novel deep ReID CNN is designed, termed Omni-Scale Network (OSNet), for omni-scale feature learning. This is achieved by designing a residual block composed of multiple convolutional streams, each detecting features at a certain scale. Importantly, a novel unified aggregation gate is introduced to dynamically fuse multi-scale features with input-dependent channel-wise weights. To efficiently learn spatial-channel correlations and avoid overfitting, the building block uses pointwise and depthwise convolutions. By stacking such blocks layer-by-layer, our OSNet is extremely lightweight and can be trained from scratch on existing ReID benchmarks. Despite its small model size, OSNet achieves state-of-the-art performance on six person ReID datasets, outperforming most large-sized models, often by a clear margin. Code and models are available at: https://github.com/KaiyangZhou/deep-person-reid.
Tasks	Person Re-Identification
Published	2019-05-02
URL	https://arxiv.org/abs/1905.00953v6
PDF	https://arxiv.org/pdf/1905.00953v6.pdf
PWC	https://paperswithcode.com/paper/omni-scale-feature-learning-for-person-re
Repo	https://github.com/KaiyangZhou/deep-person-reid
Framework	pytorch

Deep Visual City Recognition Visualization


Title	Deep Visual City Recognition Visualization
Authors	Xiangwei Shi, Seyran Khademi, Jan van Gemert
Abstract	Understanding how cities visually differ from each other is interesting for planners, residents, and historians. We investigate the interpretation of deep features learned by convolutional neural networks (CNNs) for city recognition. Given a trained city recognition network, we first generate weighted masks using the known Grad-CAM technique to select the most discriminative regions in the image. Since the image classification label is the city name, it contains no information about which objects are class-discriminative; we therefore investigate the interpretability of deep representations with two methods. (i) An unsupervised method is used to cluster the objects appearing in the visual explanations. (ii) A pretrained semantic segmentation model is used to label objects at the pixel level, and then we introduce statistical measures to quantitatively evaluate the interpretability of discriminative objects. The influence of network architectures and random initializations in training on the interpretability of CNN features for city recognition is studied. The results suggest that network architectures affect the interpretability of learned visual representations more than different initializations do.
Tasks	Image Classification, Semantic Segmentation
Published	2019-05-06
URL	https://arxiv.org/abs/1905.01932v1
PDF	https://arxiv.org/pdf/1905.01932v1.pdf
PWC	https://paperswithcode.com/paper/deep-visual-city-recognition-visualization
Repo	https://github.com/seyrankhademi/CIPA2019_Workshop
Framework	none

How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing


Title	How to Write Summaries with Patterns? Learning towards Abstractive Summarization through Prototype Editing
Authors	Shen Gao, Xiuying Chen, Piji Li, Zhangming Chan, Dongyan Zhao, Rui Yan
Abstract	Under special circumstances, summaries should conform to a particular style with patterns, such as court judgments and abstracts in academic papers. To this end, the prototype document-summary pairs can be utilized to generate better summaries. There are two main challenges in this task: (1) the model needs to incorporate learned patterns from the prototype, but (2) should avoid copying contents other than the patternized words—such as irrelevant facts—into the generated summaries. To tackle these challenges, we design a model named Prototype Editing based Summary Generator (PESG). PESG first learns summary patterns and prototype facts by analyzing the correlation between a prototype document and its summary. Prototype facts are then utilized to help extract facts from the input document. Next, an editing generator generates new summary based on the summary pattern or extracted facts. Finally, to address the second challenge, a fact checker is used to estimate mutual information between the input document and generated summary, providing an additional signal for the generator. Extensive experiments conducted on a large-scale real-world text summarization dataset show that PESG achieves the state-of-the-art performance in terms of both automatic metrics and human evaluations.
Tasks	Abstractive Text Summarization, Text Summarization
Published	2019-09-19
URL	https://arxiv.org/abs/1909.08837v1
PDF	https://arxiv.org/pdf/1909.08837v1.pdf
PWC	https://paperswithcode.com/paper/how-to-write-summaries-with-patterns-learning
Repo	https://github.com/gsh199449/proto-summ
Framework	none

On Extractive and Abstractive Neural Document Summarization with Transformer Language Models


Title	On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
Authors	Sandeep Subramanian, Raymond Li, Jonathan Pilault, Christopher Pal
Abstract	We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization. We perform a simple extractive step before generating a summary, which is then used to condition the transformer language model on relevant information before being tasked with generating a summary. We show that this extractive step significantly improves summarization results. We also show that this approach produces more abstractive summaries compared to prior work that employs a copy mechanism while still achieving higher ROUGE scores. Note: The abstract above was not written by the authors, it was generated by one of the models presented in this paper.
Tasks	Abstractive Text Summarization, Document Summarization, Language Modelling
Published	2019-09-07
URL	https://arxiv.org/abs/1909.03186v1
PDF	https://arxiv.org/pdf/1909.03186v1.pdf
PWC	https://paperswithcode.com/paper/on-extractive-and-abstractive-neural-document
Repo	https://github.com/Bread-and-Code/Text-Summarization
Framework	tf

Disentangled Representation Learning in Cardiac Image Analysis


Title	Disentangled Representation Learning in Cardiac Image Analysis
Authors	Agisilaos Chartsias, Thomas Joyce, Giorgos Papanastasiou, Michelle Williams, David Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris
Abstract	Typically, a medical image offers spatial information on the anatomy (and pathology) modulated by imaging specific characteristics. Many imaging modalities including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) can be interpreted in this way. We can venture further and consider that a medical image naturally factors into some spatial factors depicting anatomy and factors that denote the imaging characteristics. Here, we explicitly learn this decomposed (disentangled) representation of imaging data, focusing in particular on cardiac images. We propose Spatial Decomposition Network (SDNet), which factorises 2D medical images into spatial anatomical factors and non-spatial modality factors. We demonstrate that this high-level representation is ideally suited for several medical image analysis tasks, such as semi-supervised segmentation, multi-task segmentation and regression, and image-to-image synthesis. Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. Critically, we show that our factorised representation also benefits from supervision obtained either when we use auxiliary tasks to train the model in a multi-task setting (e.g. regressing to known cardiac indices), or when aggregating multimodal data from different sources (e.g. pooling together MRI and CT data). To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors. We also demonstrate that the factor holding image specific information can be used to predict the input modality with high accuracy. Code will be made available at https://github.com/agis85/anatomy_modality_decomposition.
Tasks	Computed Tomography (CT), Image Generation, Representation Learning
Published	2019-03-22
URL	https://arxiv.org/abs/1903.09467v4
PDF	https://arxiv.org/pdf/1903.09467v4.pdf
PWC	https://paperswithcode.com/paper/factorised-representation-learning-in-cardiac
Repo	https://github.com/agis85/anatomy_modality_decomposition
Framework	tf

Linear Range in Gradient Descent


Title	Linear Range in Gradient Descent
Authors	Angxiu Ni, Chaitanya Talnikar
Abstract	This paper defines linear range as the range of parameter perturbations which lead to approximately linear perturbations in the states of a network. We compute linear range from the difference between actual perturbations in states and the tangent solution. Linear range is a new criterion for estimating the effectiveness of gradients and thus has many possible applications. In particular, we propose that the optimal learning rate at the initial stages of training is such that parameter changes on all minibatches are within linear range. We demonstrate our algorithm on two shallow neural networks and a ResNet.
Tasks
Published	2019-05-11
URL	https://arxiv.org/abs/1905.04561v2
PDF	https://arxiv.org/pdf/1905.04561v2.pdf
PWC	https://paperswithcode.com/paper/linear-range-in-gradient-descent
Repo	https://github.com/niangxiu/linGrad
Framework	none
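
The idea of comparing an actual perturbation with its tangent (first-order) prediction can be sketched on a scalar function. This is only an illustration of the concept as stated in the abstract, not the paper's algorithm: the relative-error criterion, the tolerance `tol`, the step-halving search, and all function names are assumptions made here, and the sketch assumes the derivative at `theta` is non-zero.

```python
import math

def relative_nonlinearity(f, df, theta, step):
    """Relative gap between the actual change in f and the tangent prediction."""
    actual = f(theta + step) - f(theta)
    tangent = df(theta) * step  # first-order (tangent) prediction
    return abs(actual - tangent) / abs(tangent)

def linear_range(f, df, theta, tol=0.1, step=1.0, shrink=0.5, max_halvings=60):
    """Largest step in the halving sequence whose nonlinearity stays within tol."""
    for _ in range(max_halvings):
        if relative_nonlinearity(f, df, theta, step) <= tol:
            return step
        step *= shrink
    return 0.0

# Toy "network state": tanh of a single parameter (hypothetical example).
f = math.tanh
df = lambda t: 1.0 - math.tanh(t) ** 2
estimated_range = linear_range(f, df, theta=0.5)
```

A step size chosen inside this range keeps the gradient's first-order prediction trustworthy, which is the intuition behind tying the initial learning rate to the linear range.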

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism


Title	Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Authors	Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
Abstract	Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. Our approach does not require a new compiler or library changes, is orthogonal and complementary to pipeline model parallelism, and can be fully implemented with the insertion of a few communication operations in native PyTorch. We illustrate this approach by converging transformer based models up to 8.3 billion parameters using 512 GPUs. We sustain 15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline that sustains 39 TeraFLOPs, which is 30% of peak FLOPs. To demonstrate that large language models can further advance the state of the art (SOTA), we train an 8.3 billion parameter transformer language model similar to GPT-2 and a 3.9 billion parameter model similar to BERT. We show that careful attention to the placement of layer normalization in BERT-like models is critical to achieving increased performance as the model size grows. Using the GPT-2 model we achieve SOTA results on the WikiText103 (10.8 compared to SOTA perplexity of 15.8) and LAMBADA (66.5% compared to SOTA accuracy of 63.2%) datasets. Our BERT model achieves SOTA results on the RACE dataset (90.9% compared to SOTA accuracy of 89.4%).
Tasks	Language Modelling
Published	2019-09-17
URL	https://arxiv.org/abs/1909.08053v4
PDF	https://arxiv.org/pdf/1909.08053v4.pdf
PWC	https://paperswithcode.com/paper/megatron-lm-training-multi-billion-parameter
Repo	https://github.com/NVIDIA/Megatron-LM
Framework	pytorch
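
The throughput figures in the abstract are mutually consistent, which a short arithmetic check makes explicit. The numbers below are taken from the abstract; the variable names are illustrative, and the implied peak figure is derived here rather than quoted from the paper.

```python
# Figures quoted in the abstract.
gpus = 512
sustained_total_pflops = 15.1   # across the whole application
single_gpu_tflops = 39.0        # single-GPU baseline
fraction_of_peak = 0.30         # the baseline is stated to be 30% of peak

# Perfect linear scaling would give gpus * baseline throughput.
ideal_total_pflops = gpus * single_gpu_tflops / 1000.0
scaling_efficiency = sustained_total_pflops / ideal_total_pflops

# Per-GPU peak implied by "39 TeraFLOPs, which is 30% of peak".
implied_peak_tflops = single_gpu_tflops / fraction_of_peak

print(f"ideal: {ideal_total_pflops:.3f} PFLOPs, efficiency: {scaling_efficiency:.1%}")
print(f"implied per-GPU peak: {implied_peak_tflops:.0f} TFLOPs")
```

Dividing 15.1 PetaFLOPs by the ideal 19.968 PetaFLOPs gives roughly 76%, matching the scaling efficiency reported in the abstract.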

Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasks


Title	Evaluating task-agnostic exploration for fixed-batch learning of arbitrary future tasks
Authors	Vibhavari Dasagi, Robert Lee, Jake Bruce, Jürgen Leitner
Abstract	Deep reinforcement learning has been shown to solve challenging tasks where large amounts of training experience are available, usually obtained online while learning the task. Robotics is a significant potential application domain for many of these algorithms, but generating robot experience in the real world is expensive, especially when each task requires a lengthy online training procedure. Off-policy algorithms can in principle learn arbitrary tasks from a diverse enough fixed dataset. In this work, we evaluate popular exploration methods by generating robotics datasets for the purpose of learning to solve tasks completely offline without any further interaction in the real world. We present results on three popular continuous control tasks in simulation, as well as continuous control of a high-dimensional real robot arm. Code documenting all algorithms, experiments, and hyper-parameters is available at https://github.com/qutrobotlearning/batchlearning.
Tasks	Continuous Control
Published	2019-11-20
URL	https://arxiv.org/abs/1911.08666v1
PDF	https://arxiv.org/pdf/1911.08666v1.pdf
PWC	https://paperswithcode.com/paper/evaluating-task-agnostic-exploration-for
Repo	https://github.com/qutrobotlearning/batchlearning
Framework	pytorch

Fast GPU-Enabled Color Normalization for Digital Pathology


Title	Fast GPU-Enabled Color Normalization for Digital Pathology
Authors	Goutham Ramakrishnan, Deepak Anand, Amit Sethi
Abstract	Normalizing unwanted color variations due to differences in staining processes and scanner responses has been shown to aid machine learning in computational pathology. Of the several popular techniques for color normalization, structure preserving color normalization (SPCN) is well-motivated, convincingly tested, and published with its code base. However, SPCN makes occasional errors in color basis estimation leading to artifacts such as swapping the color basis vectors between stains or giving a colored tinge to the background with no tissue. We made several algorithmic improvements to remove these artifacts. Additionally, the original SPCN code is not readily usable on gigapixel whole slide images (WSIs) due to long run times, use of a proprietary software platform and libraries, and its inability to automatically handle WSIs. We completely rewrote the software such that it can automatically handle images of any size in popular WSI formats. Our software utilizes GPU-acceleration and open-source libraries that are becoming ubiquitous with the advent of deep learning. We also made several other small improvements and achieved a multifold overall speedup on gigapixel images. Our algorithm and software are usable right out-of-the-box by the computational pathology community.
Tasks
Published	2019-01-10
URL	http://arxiv.org/abs/1901.03088v1
PDF	http://arxiv.org/pdf/1901.03088v1.pdf
PWC	https://paperswithcode.com/paper/fast-gpu-enabled-color-normalization-for
Repo	https://github.com/MEDAL-IITB/Fast_WSI_Color_Norm
Framework	tf

Coresets for Minimum Enclosing Balls over Sliding Windows


Title	Coresets for Minimum Enclosing Balls over Sliding Windows
Authors	Yanhao Wang, Yuchen Li, Kian-Lee Tan
Abstract	Coresets are important tools to generate concise summaries of massive datasets for approximate analysis. A coreset is a small subset of points extracted from the original point set such that certain geometric properties are preserved with provable guarantees. This paper investigates the problem of maintaining a coreset to preserve the minimum enclosing ball (MEB) for a sliding window of points that are continuously updated in a data stream. Although the problem has been extensively studied in batch and append-only streaming settings, no efficient sliding-window solution is available yet. In this work, we first introduce an algorithm, called AOMEB, to build a coreset for MEB in an append-only stream. AOMEB improves the practical performance of the state-of-the-art algorithm while having the same approximation ratio. Furthermore, using AOMEB as a building block, we propose two novel algorithms, namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window with constant approximation ratios. The proposed algorithms also support coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally, extensive experiments on real-world and synthetic datasets demonstrate that SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the state-of-the-art batch algorithm while providing coresets for MEB with rather small errors compared to the optimal ones.
Tasks
Published	2019-05-09
URL	https://arxiv.org/abs/1905.03718v2
PDF	https://arxiv.org/pdf/1905.03718v2.pdf
PWC	https://paperswithcode.com/paper/190503718
Repo	https://github.com/yhwang1990/SW-MEB
Framework	none
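
To make the notion of a coreset for the minimum enclosing ball concrete, here is a minimal batch sketch. It uses the classic Badoiu-Clarkson farthest-point iteration, which is a different and much simpler technique than the AOMEB, SWMEB, and SWMEB+ algorithms proposed in the paper; the function name `approx_meb` and the toy points are hypothetical.

```python
import math

def approx_meb(points, iterations=1000):
    """Approximate minimum enclosing ball via the Badoiu-Clarkson iteration."""
    center = list(points[0])
    core = {0}  # indices of points that were farthest from the center at some step
    for i in range(1, iterations + 1):
        far = max(range(len(points)), key=lambda j: math.dist(center, points[j]))
        core.add(far)
        # Move the center a shrinking fraction of the way toward the farthest point.
        center = [c + (p - c) / (i + 1) for c, p in zip(center, points[far])]
    radius = max(math.dist(center, p) for p in points)
    return center, radius, sorted(core)

# Toy data (made up): unit-square corners plus one interior point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
center, radius, core_indices = approx_meb(points)
```

Only the extreme points are ever selected as farthest, so the interior point never enters the core set: a small subset of the data is enough to determine the ball approximately, which is the property the sliding-window algorithms maintain incrementally.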

CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text


Title	CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text
Authors	Koustuv Sinha, Shagun Sodhani, Jin Dong, Joelle Pineau, William L. Hamilton
Abstract	The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way. In this work, we introduce a diagnostic benchmark suite, named CLUTRR, to clarify some key issues related to the robustness and systematicity of NLU systems. Motivated by classic work on inductive logic programming, CLUTRR requires that an NLU system infer kinship relations between characters in short stories. Successful performance on this task requires both extracting relationships between entities, as well as inferring the logical rules governing these relationships. CLUTRR allows us to precisely measure a model’s ability for systematic generalization by evaluating on held-out combinations of logical rules, and it allows us to evaluate a model’s robustness by adding curated noise facts. Our empirical results highlight a substantial performance gap between state-of-the-art NLU models (e.g., BERT and MAC) and a graph neural network model that works directly with symbolic inputs—with the graph-based model exhibiting both stronger generalization and greater robustness.
Tasks
Published	2019-08-16
URL	https://arxiv.org/abs/1908.06177v2
PDF	https://arxiv.org/pdf/1908.06177v2.pdf
PWC	https://paperswithcode.com/paper/clutrr-a-diagnostic-benchmark-for-inductive
Repo	https://github.com/facebookresearch/clutrr
Framework	none