February 1, 2020

Paper Group AWR 226

Multi Scale Curriculum CNN for Context-Aware Breast MRI Malignancy Classification. Co-Generation with GANs using AIS based HMC. XDeep: An Interpretation Tool for Deep Neural Networks. Noise as Domain Shift: Denoising Medical Images by Unpaired Image Translation. Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning. Le …

Multi Scale Curriculum CNN for Context-Aware Breast MRI Malignancy Classification


Title	Multi Scale Curriculum CNN for Context-Aware Breast MRI Malignancy Classification
Authors	Christoph Haarburger, Michael Baumgartner, Daniel Truhn, Mirjam Broeckmann, Hannah Schneider, Simone Schrading, Christiane Kuhl, Dorit Merhof
Abstract	Classification of malignancy for breast cancer and other cancer types is usually tackled as an object detection problem: Individual lesions are first localized and then classified with respect to malignancy. However, the drawback of this approach is that abstract features incorporating several lesions and areas that are not labelled as a lesion but contain global medically relevant information are thus disregarded: especially for dynamic contrast-enhanced breast MRI, criteria such as background parenchymal enhancement and location within the breast are important for diagnosis and cannot be captured by object detection approaches properly. In this work, we propose a 3D CNN and a multi scale curriculum learning strategy to classify malignancy globally based on an MRI of the whole breast. Thus, the global context of the whole breast rather than individual lesions is taken into account. Our proposed approach does not rely on lesion segmentations, which renders the annotation of training data much more effective than in current object detection approaches. Achieving an AUROC of 0.89, we compare the performance of our approach to Mask R-CNN and Retina U-Net as well as a radiologist. Our performance is on par with approaches that, in contrast to our method, rely on pixelwise segmentations of lesions.
Tasks	Object Detection
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06058v2
PDF	https://arxiv.org/pdf/1906.06058v2.pdf
PWC	https://paperswithcode.com/paper/multi-scale-curriculum-cnn-for-context-aware
Repo	https://github.com/haarburger/multi-scale-curriculum
Framework	pytorch
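
The abstract above reports an AUROC of 0.89. As a reminder of what that metric measures, here is a minimal sketch that computes AUROC as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one. The labels and scores are hypothetical illustration values, not data from the paper.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-breast malignancy scores (1 = malignant, 0 = benign).
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]
print(auroc(labels, scores))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable for large evaluation sets.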

Co-Generation with GANs using AIS based HMC


Title	Co-Generation with GANs using AIS based HMC
Authors	Tiantian Fang, Alexander G. Schwing
Abstract	Inferring the most likely configuration for a subset of variables of a joint distribution given the remaining ones - which we refer to as co-generation - is an important challenge that is computationally demanding for all but the simplest settings. This task has received a considerable amount of attention, particularly for classical ways of modeling distributions like structured prediction. In contrast, almost nothing is known about this task when considering recently proposed techniques for modeling high-dimensional distributions, particularly generative adversarial nets (GANs). Therefore, in this paper, we study the occurring challenges for co-generation with GANs. To address those challenges we develop an annealed importance sampling based Hamiltonian Monte Carlo co-generation algorithm. The presented approach significantly outperforms classical gradient based methods on a synthetic and on the CelebA and LSUN datasets.
Tasks	Structured Prediction
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14673v1
PDF	https://arxiv.org/pdf/1910.14673v1.pdf
PWC	https://paperswithcode.com/paper/co-generation-with-gans-using-ais-based-hmc
Repo	https://github.com/AilsaF/cogen_by_ais
Framework	none

XDeep: An Interpretation Tool for Deep Neural Networks


Title	XDeep: An Interpretation Tool for Deep Neural Networks
Authors	Fan Yang, Zijian Zhang, Haofan Wang, Yuening Li, Xia Hu
Abstract	XDeep is an open-source Python package developed to interpret deep models for both practitioners and researchers. XDeep takes a trained deep neural network (DNN) as input and generates relevant interpretations as output in a post-hoc manner. In terms of functionality, XDeep integrates a wide range of state-of-the-art interpretation algorithms, covering different types of methodologies, and is capable of providing both local and global explanations for DNNs when interpreting model behaviours. With the well-documented API designed in XDeep, end-users can easily obtain interpretations for their deep models with a few lines of code and compare the results among different algorithms. XDeep is generally compatible with Python 3 and can be installed through the Python Package Index (PyPI). The source code is available at: https://github.com/datamllab/xdeep.
Tasks
Published	2019-11-04
URL	https://arxiv.org/abs/1911.01005v1
PDF	https://arxiv.org/pdf/1911.01005v1.pdf
PWC	https://paperswithcode.com/paper/xdeep-an-interpretation-tool-for-deep-neural
Repo	https://github.com/datamllab/xdeep
Framework	none

Noise as Domain Shift: Denoising Medical Images by Unpaired Image Translation


Title	Noise as Domain Shift: Denoising Medical Images by Unpaired Image Translation
Authors	Ilja Manakov, Markus Rohm, Christoph Kern, Benedikt Schworm, Karsten Kortuem, Volker Tresp
Abstract	We cast the problem of image denoising as a domain translation problem between high and low noise domains. By modifying the cycleGAN model, we are able to learn a mapping between these domains on unpaired retinal optical coherence tomography images. In quantitative measurements and a qualitative evaluation by ophthalmologists, we show how this approach outperforms other established methods. The results indicate that the network differentiates subtle changes in the level of noise in the image. Further investigation of the model’s feature maps reveals that it has learned to distinguish retinal layers and other distinct regions of the images.
Tasks	Denoising, Image Denoising
Published	2019-10-07
URL	https://arxiv.org/abs/1910.02702v1
PDF	https://arxiv.org/pdf/1910.02702v1.pdf
PWC	https://paperswithcode.com/paper/noise-as-domain-shift-denoising-medical
Repo	https://github.com/IljaManakov/HDcycleGAN
Framework	pytorch

Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning


Title	Learning to Remember: A Synaptic Plasticity Driven Framework for Continual Learning
Authors	Oleksiy Ostapenko, Mihai Puscas, Tassilo Klein, Patrick Jähnichen, Moin Nabi
Abstract	Models trained in the context of continual learning (CL) should be able to learn from a stream of data over an undefined period of time. The main challenges herein are: 1) maintaining old knowledge while simultaneously benefiting from it when learning new tasks, and 2) guaranteeing model scalability with a growing amount of data to learn from. In order to tackle these challenges, we introduce Dynamic Generative Memory (DGM) - a synaptic plasticity driven framework for continual learning. DGM relies on conditional generative adversarial networks with learnable connection plasticity realized with neural masking. Specifically, we evaluate two variants of neural masking: applied to (i) layer activations and (ii) to connection weights directly. Furthermore, we propose a dynamic network expansion mechanism that ensures sufficient model capacity to accommodate for continually incoming tasks. The amount of added capacity is determined dynamically from the learned binary mask. We evaluate DGM in the continual class-incremental setup on visual classification tasks.
Tasks	Continual Learning
Published	2019-04-05
URL	https://arxiv.org/abs/1904.03137v4
PDF	https://arxiv.org/pdf/1904.03137v4.pdf
PWC	https://paperswithcode.com/paper/learning-to-remember-a-synaptic-plasticity
Repo	https://github.com/SAP/machine-learning-dgm
Framework	pytorch

Learning Set-equivariant Functions with SWARM Mappings


Title	Learning Set-equivariant Functions with SWARM Mappings
Authors	Roland Vollgraf
Abstract	In this work we propose a new neural network architecture that efficiently implements and learns general purpose set-equivariant functions. Such a function f maps a set of entities x = {x1, ..., xn} from one domain to a set of the same cardinality y = f(x) = {y1, ..., yn} in another domain regardless of the ordering of the entities. The architecture is based on a gated recurrent network which is iteratively applied to all entities individually and at the same time syncs with the progression of the whole population. In reference to this pattern, which can frequently be observed in nature, we call our approach SWARM mapping. Set-equivariant and, more generally, permutation-invariant functions are important building blocks for many state-of-the-art machine learning approaches, even in applications where permutation invariance is not of primary interest, as seen in the recent success of attention-based transformer models (Vaswani et al. 2017). Accordingly, we demonstrate the power and usefulness of SWARM mappings in different applications. We compare the performance of our approach with another recently proposed set-equivariant function, the Set Transformer (Lee et al. 2018), and we demonstrate that models based solely on SWARM layers give state-of-the-art results.
Tasks
Published	2019-06-22
URL	https://arxiv.org/abs/1906.09400v2
PDF	https://arxiv.org/pdf/1906.09400v2.pdf
PWC	https://paperswithcode.com/paper/learning-set-equivariant-functions-with-swarm
Repo	https://github.com/zalandoresearch/SWARM
Framework	pytorch
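
To make the set-equivariance property from the abstract concrete, the sketch below checks that permuting the inputs of a function permutes its outputs in the same way. The function used here is a simple pooled-mean map chosen for illustration; it is not the SWARM architecture, and the coefficients `a` and `b` are hypothetical.

```python
import math


def equivariant_map(xs, a=2.0, b=-1.0):
    """y_i = a * x_i + b * mean(x): own entity plus an order-free pooled summary."""
    pooled = sum(xs) / len(xs)
    return [a * x + b * pooled for x in xs]


def permute(seq, perm):
    """Reorder seq so that position k holds seq[perm[k]]."""
    return [seq[i] for i in perm]


xs = [1.0, 4.0, 2.5, -3.0]
perm = [2, 0, 3, 1]

# Equivariance: f(permute(x)) should equal permute(f(x)).
lhs = equivariant_map(permute(xs, perm))
rhs = permute(equivariant_map(xs), perm)
print(all(math.isclose(l, r) for l, r in zip(lhs, rhs)))
```

The pooled term is what lets each output depend on the whole population while remaining independent of the input ordering.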

Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos


Title	Learning Regularity in Skeleton Trajectories for Anomaly Detection in Videos
Authors	Romero Morais, Vuong Le, Truyen Tran, Budhaditya Saha, Moussa Mansour, Svetha Venkatesh
Abstract	Appearance features have been widely used in video anomaly detection even though they contain complex entangled factors. We propose a new method to model the normal patterns of human movements in surveillance video for anomaly detection using dynamic skeleton features. We decompose the skeletal movements into two sub-components: global body movement and local body posture. We model the dynamics and interaction of the coupled features in our novel Message-Passing Encoder-Decoder Recurrent Network. We observed that the decoupled features collaboratively interact in our spatio-temporal model to accurately identify human-related irregular events from surveillance video sequences. Compared to traditional appearance-based models, our method achieves superior outlier detection performance. Our model also offers “open-box” examination and decision explanation made possible by the semantically understandable features and a network architecture supporting interpretability.
Tasks	Anomaly Detection, Outlier Detection
Published	2019-03-08
URL	http://arxiv.org/abs/1903.03295v2
PDF	http://arxiv.org/pdf/1903.03295v2.pdf
PWC	https://paperswithcode.com/paper/learning-regularity-in-skeleton-trajectories
Repo	https://github.com/RomeroBarata/skeleton_based_anomaly_detection
Framework	none

FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System


Title	FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System
Authors	Bowen Yu, Claudio T. Silva
Abstract	Dataflow visualization systems enable flexible visual data exploration by allowing the user to construct a dataflow diagram that composes query and visualization modules to specify system functionality. However, learning dataflow diagram usage presents overhead that often discourages the user. In this work we design FlowSense, a natural language interface for dataflow visualization systems that utilizes state-of-the-art natural language processing techniques to assist dataflow diagram construction. FlowSense employs a semantic parser with special utterance tagging and special utterance placeholders to generalize to different datasets and dataflow diagrams. It explicitly presents recognized dataset and diagram special utterances to the user for dataflow context awareness. With FlowSense the user can expand and adjust dataflow diagrams more conveniently via plain English. We apply FlowSense to the VisFlow subset-flow visualization system to enhance its usability. We evaluate FlowSense by one case study with domain experts on a real-world data analysis problem and a formal user study.
Tasks	Machine Translation
Published	2019-08-02
URL	https://arxiv.org/abs/1908.00681v2
PDF	https://arxiv.org/pdf/1908.00681v2.pdf
PWC	https://paperswithcode.com/paper/flowsense-a-natural-language-interface-for
Repo	https://github.com/yubowenok/flowsense
Framework	none

Joey NMT: A Minimalist NMT Toolkit for Novices


Title	Joey NMT: A Minimalist NMT Toolkit for Novices
Authors	Julia Kreutzer, Joost Bastings, Stefan Riezler
Abstract	We present Joey NMT, a minimalist neural machine translation toolkit based on PyTorch that is specifically designed for novices. Joey NMT provides many popular NMT features in a small and simple code base, so that novices can easily and quickly learn to use it and adapt it to their needs. Despite its focus on simplicity, Joey NMT supports classic architectures (RNNs, transformers), fast beam search, weight tying, and more, and achieves performance comparable to more complex toolkits on standard benchmarks. We evaluate the accessibility of our toolkit in a user study where novices with general knowledge about PyTorch and NMT and experts work through a self-contained Joey NMT tutorial, showing that novices perform almost as well as experts in a subsequent code quiz. Joey NMT is available at https://github.com/joeynmt/joeynmt .
Tasks	Machine Translation
Published	2019-07-29
URL	https://arxiv.org/abs/1907.12484v2
PDF	https://arxiv.org/pdf/1907.12484v2.pdf
PWC	https://paperswithcode.com/paper/joey-nmt-a-minimalist-nmt-toolkit-for-novices
Repo	https://github.com/juliakreutzer/joeynmt
Framework	pytorch

Self-Supervised Learning of Face Representations for Video Face Clustering


Title	Self-Supervised Learning of Face Representations for Video Face Clustering
Authors	Vivek Sharma, Makarand Tapaswi, M. Saquib Sarfraz, Rainer Stiefelhagen
Abstract	Analyzing the story behind TV series and movies often requires understanding who the characters are and what they are doing. With improving deep face models, this may seem like a solved problem. However, as face detectors get better, clustering/identification needs to be revisited to address increasing diversity in facial appearance. In this paper, we address video face clustering using unsupervised methods. Our emphasis is on distilling the essential information, identity, from the representations obtained using deep pre-trained face networks. We propose a self-supervised Siamese network that can be trained without the need for video/track based supervision, and thus can also be applied to image collections. We evaluate our proposed method on three video face clustering datasets. The experiments show that our methods outperform current state-of-the-art methods on all datasets. Video face clustering is lacking a common benchmark as current works are often evaluated with different metrics and/or different sets of face tracks.
Tasks
Published	2019-03-03
URL	http://arxiv.org/abs/1903.01000v1
PDF	http://arxiv.org/pdf/1903.01000v1.pdf
PWC	https://paperswithcode.com/paper/self-supervised-learning-of-face
Repo	https://github.com/vivoutlaw/SSIAM
Framework	none

Language Modelling for Sound Event Detection with Teacher Forcing and Scheduled Sampling


Title	Language Modelling for Sound Event Detection with Teacher Forcing and Scheduled Sampling
Authors	Konstantinos Drossos, Shayan Gharib, Paul Magron, Tuomas Virtanen
Abstract	A sound event detection (SED) method typically takes as an input a sequence of audio frames and predicts the activities of sound events in each frame. In real-life recordings, the sound events exhibit some temporal structure: for instance, a “car horn” will likely be followed by a “car passing by”. While this temporal structure is widely exploited in sequence prediction tasks (e.g., in machine translation), where language models (LM) are exploited, it is not satisfactorily modeled in SED. In this work we propose a method which allows a recurrent neural network (RNN) to learn an LM for the SED task. The method conditions the input of the RNN with the activities of classes at the previous time step. We evaluate our method using F1 score and error rate (ER) over three different, publicly available datasets: the TUT-SED Synthetic 2016 and the TUT Sound Events 2016 and 2017 datasets. The obtained results show an increase of 9% and 2% in F1 (higher is better) and a decrease of 7% and 2% in ER (lower is better) for the TUT Sound Events 2016 and 2017 datasets, respectively, when using our method. In contrast, with our method there is a decrease of 4% in F1 score and an increase of 7% in ER for the TUT-SED Synthetic 2016 dataset.
Tasks	Language Modelling, Machine Translation, Sound Event Detection
Published	2019-07-19
URL	https://arxiv.org/abs/1907.08506v3
PDF	https://arxiv.org/pdf/1907.08506v3.pdf
PWC	https://paperswithcode.com/paper/language-modelling-for-sound-event-detection
Repo	https://github.com/dr-costas/SEDLM
Framework	pytorch
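
The conditioning described in the abstract above, where the RNN input includes the class activities from the previous time step, can be sketched as a choice between the ground-truth activities (teacher forcing) and the model's own previous prediction (scheduled sampling). This is a minimal sketch, not the authors' implementation: the function names and the linear schedule are assumptions made for illustration.

```python
import random


def choose_previous_activity(ground_truth_prev, predicted_prev, p_use_prediction, rng):
    """Pick the class-activity vector that conditions the RNN at the current step.

    With probability p_use_prediction the model's own previous output is used
    (scheduled sampling); otherwise the ground truth is used (teacher forcing).
    """
    if rng.random() < p_use_prediction:
        return predicted_prev
    return ground_truth_prev


def linear_schedule(step, total_steps, p_max=1.0):
    """Hypothetical schedule: the probability of using predictions grows linearly."""
    return min(p_max, p_max * step / max(1, total_steps))


rng = random.Random(0)
truth, pred = [1, 0, 0], [0, 1, 0]
for step in (0, 5, 10):
    p = linear_schedule(step, total_steps=10)
    print(step, p, choose_previous_activity(truth, pred, p, rng))
```

Starting with pure teacher forcing and gradually shifting toward the model's own outputs reduces the mismatch between training and inference, where ground-truth activities are not available.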


Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs


Title	Social Ways: Learning Multi-Modal Distributions of Pedestrian Trajectories with GANs
Authors	Javad Amirian, Jean-Bernard Hayet, Julien Pettre
Abstract	This paper proposes a novel approach for predicting the motion of pedestrians interacting with others. It uses a Generative Adversarial Network (GAN) to sample plausible predictions for any agent in the scene. As GANs are very susceptible to mode collapse and mode dropping, we show that the recently proposed Info-GAN allows dramatic improvements in multi-modal pedestrian trajectory prediction by avoiding these issues. Unlike some previous works, we also leave out the L2 loss when training the generator, because it causes serious mode collapse even though it speeds up convergence. We show through experiments on real and synthetic data that the proposed method generates more diverse samples and preserves the modes of the predictive distribution. In particular, to support this claim, we have designed a toy example dataset of trajectories that can be used to assess the performance of different methods in preserving the predictive distribution modes.
Tasks	Self-Driving Cars, Trajectory Prediction
Published	2019-04-20
URL	http://arxiv.org/abs/1904.09507v2
PDF	http://arxiv.org/pdf/1904.09507v2.pdf
PWC	https://paperswithcode.com/paper/social-ways-learning-multi-modal
Repo	https://github.com/amiryanj/socialways
Framework	pytorch

Label-aware Document Representation via Hybrid Attention for Extreme Multi-Label Text Classification


Title	Label-aware Document Representation via Hybrid Attention for Extreme Multi-Label Text Classification
Authors	Xin Huang, Boli Chen, Lin Xiao, Liping Jing
Abstract	Extreme multi-label text classification (XMTC) aims at tagging a document with the most relevant labels from an extremely large-scale label set. It is a challenging problem, especially for the tail labels, because there are only a few training documents with which to build a classifier. This paper is motivated to better explore the semantic relationship between each document and extreme labels by taking advantage of both document content and label correlation. Our objective is to establish an explicit label-aware representation for each document with a hybrid attention deep neural network model (LAHA). LAHA consists of three parts. The first part adopts a multi-label self-attention mechanism to detect the contribution of each word to labels. The second part exploits the label structure and document content to determine the semantic connection between words and labels in the same latent space. An adaptive fusion strategy is designed in the third part to obtain the final label-aware document representation so that the essence of the previous two parts can be sufficiently integrated. Extensive experiments have been conducted on six benchmark datasets by comparing with the state-of-the-art methods. The results show the superiority of our proposed LAHA method, especially on the tail labels.
Tasks	Multi-Label Text Classification, Text Classification
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10070v2
PDF	https://arxiv.org/pdf/1905.10070v2.pdf
PWC	https://paperswithcode.com/paper/label-aware-document-representation-via
Repo	https://github.com/HX-idiot/Hybrid_Attention_XML
Framework	pytorch
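
The idea of a label-aware document representation can be illustrated with a generic attention sketch: each label scores every word, the scores are normalized with a softmax, and the document vector for that label is the weighted sum of word vectors. This is a simplified illustration under assumed toy embeddings, not the LAHA architecture, which combines two attention mechanisms with an adaptive fusion step.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def label_aware_representation(word_vectors, label_vector):
    """Return (attention weights over words, attended document vector) for one label."""
    weights = softmax([dot(w, label_vector) for w in word_vectors])
    dim = len(word_vectors[0])
    doc = [sum(wt * w[d] for wt, w in zip(weights, word_vectors)) for d in range(dim)]
    return weights, doc


# Hypothetical 2-dimensional embeddings for three words and one label.
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
label = [2.0, 0.0]
weights, doc = label_aware_representation(words, label)
print(weights, doc)
```

Because the weights are computed per label, the same document yields a different representation for each label, which is what allows rare (tail) labels to focus on the few words relevant to them.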

Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop


Title	Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop
Authors	Nikos Kolotouros, Georgios Pavlakos, Michael J. Black, Kostas Daniilidis
Abstract	Model-based human pose estimation is currently approached through two different paradigms. Optimization-based methods fit a parametric body model to 2D observations in an iterative manner, leading to accurate image-model alignments, but are often slow and sensitive to the initialization. In contrast, regression-based methods, that use a deep network to directly estimate the model parameters from pixels, tend to provide reasonable, but not pixel accurate, results while requiring huge amounts of supervision. In this work, instead of investigating which approach is better, our key insight is that the two paradigms can form a strong collaboration. A reasonable, directly regressed estimate from the network can initialize the iterative optimization making the fitting faster and more accurate. Similarly, a pixel accurate fit from iterative optimization can act as strong supervision for the network. This is the core of our proposed approach SPIN (SMPL oPtimization IN the loop). The deep network initializes an iterative optimization routine that fits the body model to 2D joints within the training loop, and the fitted estimate is subsequently used to supervise the network. Our approach is self-improving by nature, since better network estimates can lead the optimization to better solutions, while more accurate optimization fits provide better supervision for the network. We demonstrate the effectiveness of our approach in different settings, where 3D ground truth is scarce, or not available, and we consistently outperform the state-of-the-art model-based pose estimation approaches by significant margins. The project website with videos, results, and code can be found at https://seas.upenn.edu/~nkolot/projects/spin.
Tasks	3D Human Pose Estimation, Pose Estimation
Published	2019-09-27
URL	https://arxiv.org/abs/1909.12828v1
PDF	https://arxiv.org/pdf/1909.12828v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-reconstruct-3d-human-pose-and
Repo	https://github.com/nkolot/SPIN
Framework	pytorch

Spectral embedding of regularized block models


Title	Spectral embedding of regularized block models
Authors	Nathan de Lara, Thomas Bonald
Abstract	Spectral embedding is a popular technique for the representation of graph data. Several regularization techniques have been proposed to improve the quality of the embedding with respect to downstream tasks like clustering. In this paper, we explain on a simple block model the impact of the complete graph regularization, whereby a constant is added to all entries of the adjacency matrix. Specifically, we show that the regularization forces the spectral embedding to focus on the largest blocks, making the representation less sensitive to noise or outliers. We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores.
Tasks
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10903v1
PDF	https://arxiv.org/pdf/1912.10903v1.pdf
PWC	https://paperswithcode.com/paper/spectral-embedding-of-regularized-block-1
Repo	https://github.com/research-submissions/iclr20
Framework	none
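
The complete graph regularization mentioned in the abstract above, adding a constant to all entries of the adjacency matrix, can be sketched in a few lines. The example graph and the constant `alpha` are hypothetical; the sketch only shows the regularization step and its effect on node degrees, not the spectral embedding itself.

```python
def regularize_adjacency(adjacency, alpha):
    """Add the constant alpha to every entry, i.e. overlay a weighted complete graph."""
    return [[a + alpha for a in row] for row in adjacency]


def degrees(adjacency):
    """Weighted degree of each node (row sums of the adjacency matrix)."""
    return [sum(row) for row in adjacency]


# Hypothetical graph: a triangle (nodes 0-2) plus one isolated node (node 3).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
A_reg = regularize_adjacency(A, alpha=0.1)

print(degrees(A))      # the isolated node has degree 0
print(degrees(A_reg))  # after regularization every node has positive degree
```

A zero degree makes degree-normalized operators such as the normalized Laplacian ill-defined for that node, so a positive floor on every degree is one practical effect of this regularization.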