February 2, 2020

3089 words 15 mins read

Paper Group AWR 68

Rigging the Lottery: Making All Tickets Winners. Class Imbalance Techniques for High Energy Physics. The Dynamic Embedded Topic Model. Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis. MirrorGAN: Learning Text-to-image Generation by Redescription. Hierarchical Probabilistic Model for Blind Source Separation via Lege …

Rigging the Lottery: Making All Tickets Winners

Title Rigging the Lottery: Making All Tickets Winners
Authors Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen
Abstract Sparse neural networks have been shown to be more parameter- and compute-efficient than dense networks and in some cases are used to decrease wall-clock inference times. There is a large body of work on training dense networks to yield sparse networks for inference. This limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the network during training by using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to achieve a given level of accuracy compared to prior techniques. Importantly, by adjusting the topology it can start from any initialization - not just “lucky” ones. We demonstrate state-of-the-art sparse training results with ResNet-50, MobileNet v1 and MobileNet v2 on the ImageNet-2012 dataset, WideResNets on the CIFAR-10 dataset and RNNs on the WikiText-103 dataset. Finally, we provide some insights into why allowing the topology to change during optimization can overcome local minima encountered when the topology remains static.
Tasks Image Classification, Language Modelling, Sparse Learning
Published 2019-11-25
URL https://arxiv.org/abs/1911.11134v1
PDF https://arxiv.org/pdf/1911.11134v1.pdf
PWC https://paperswithcode.com/paper/rigging-the-lottery-making-all-tickets-1
Repo https://github.com/google-research/rigl
Framework tf
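
The drop/grow topology update described in the abstract can be sketched in a few lines of NumPy: drop the lowest-magnitude active weights and regrow the same number of connections where the dense gradient is largest. This is a rough sketch only; the function and argument names (`rigl_update`, `drop_frac`) are illustrative and not the repo's API.

```python
import numpy as np

def rigl_update(weights, dense_grad, mask, drop_frac=0.3):
    """One sparse-topology update: drop the smallest-magnitude active weights,
    then regrow the same number of connections at the currently inactive
    positions with the largest dense-gradient magnitude."""
    n_update = int(drop_frac * mask.sum())

    # Drop: among active weights, the n_update smallest magnitudes.
    active_mag = np.where(mask, np.abs(weights), np.inf)
    drop_idx = np.argsort(active_mag, axis=None)[:n_update]

    # Grow: among inactive positions, the n_update largest gradient magnitudes.
    inactive_grad = np.where(mask, -np.inf, np.abs(dense_grad))
    grow_idx = np.argsort(inactive_grad, axis=None)[-n_update:]

    new_mask = mask.copy().reshape(-1)
    new_mask[drop_idx] = False
    new_mask[grow_idx] = True
    new_weights = weights.copy().reshape(-1)
    new_weights[grow_idx] = 0.0   # regrown connections start at zero
    return new_weights.reshape(weights.shape), new_mask.reshape(mask.shape)
```

Because drop candidates are active positions and grow candidates are inactive ones, the two index sets never overlap, so the number of nonzero parameters stays fixed throughout training.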

Class Imbalance Techniques for High Energy Physics

Title Class Imbalance Techniques for High Energy Physics
Authors Christopher W. Murphy
Abstract A common problem in a high energy physics experiment is extracting a signal from a much larger background. Posed as a classification task, there is said to be an imbalance in the number of samples belonging to the signal class versus the number of samples from the background class. In this work we provide a brief overview of class imbalance techniques in a high energy physics setting. Two case studies are presented: (1) the measurement of the longitudinal polarization fraction in same-sign $WW$ scattering, and (2) the decay of the Higgs boson to charm-quark pairs.
Tasks
Published 2019-05-01
URL https://arxiv.org/abs/1905.00339v2
PDF https://arxiv.org/pdf/1905.00339v2.pdf
PWC https://paperswithcode.com/paper/class-imbalance-techniques-for-high-energy
Repo https://github.com/christopher-w-murphy/Class-Imbalance-in-WW-Polarization
Framework tf
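
One of the simplest techniques covered under "class imbalance" is reweighting the loss by inverse class frequency. A minimal sketch (the helper name is mine; the case-study code in the repo may use a different scheme, e.g. oversampling or focal loss):

```python
import numpy as np

def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency, normalized so the
    average weight over the dataset is 1 (the common 'balanced' heuristic)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Example: a rare signal class (1) against a large background (0).
y = np.array([0] * 990 + [1] * 10)
print(balanced_class_weights(y))   # {0: ~0.51, 1: 50.0}
# These weights can be passed to e.g. Keras model.fit(..., class_weight=weights)
# or used to scale the per-sample loss directly.
```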

The Dynamic Embedded Topic Model

Title The Dynamic Embedded Topic Model
Authors Adji B. Dieng, Francisco J. R. Ruiz, David M. Blei
Abstract Topic modeling analyzes documents to learn meaningful patterns of words. For documents collected in sequence, dynamic topic models capture how these patterns vary over time. We develop the dynamic embedded topic model (D-ETM), a generative model of documents that combines dynamic latent Dirichlet allocation (D-LDA) and word embeddings. The D-ETM models each word with a categorical distribution parameterized by the inner product between the word embedding and a per-time-step embedding representation of its assigned topic. The D-ETM learns smooth topic trajectories by defining a random walk prior over the embedding representations of the topics. We fit the D-ETM using structured amortized variational inference with a recurrent neural network. On three different corpora—a collection of United Nations debates, a set of ACL abstracts, and a dataset of Science Magazine articles—we found that the D-ETM outperforms D-LDA on a document completion task. We further found that the D-ETM learns more diverse and coherent topics than D-LDA while requiring significantly less time to fit.
Tasks Topic Models, Word Embeddings
Published 2019-07-12
URL https://arxiv.org/abs/1907.05545v2
PDF https://arxiv.org/pdf/1907.05545v2.pdf
PWC https://paperswithcode.com/paper/the-dynamic-embedded-topic-model
Repo https://github.com/adjidieng/DETM
Framework pytorch
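
The core parameterization is compact: at time step t, topic k's distribution over the vocabulary is a softmax of the inner product between the word embeddings and that topic's per-time-step embedding, and the topic embeddings follow a random walk over time. A NumPy sketch of just this generative piece (all sizes are toy values, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
V, L, K, T = 1000, 50, 10, 5     # vocab size, embedding dim, topics, time steps

rho = rng.normal(size=(V, L))    # word embeddings

# Random-walk prior over per-time-step topic embeddings: alpha_t = alpha_{t-1} + noise
alpha = np.cumsum(rng.normal(scale=0.1, size=(T, K, L)), axis=0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# beta[t, k] is the categorical distribution over the vocabulary for topic k
# at time t, parameterized by word-embedding / topic-embedding inner products.
beta = softmax(alpha @ rho.T, axis=-1)   # shape (T, K, V)
assert np.allclose(beta.sum(-1), 1.0)
```

Fitting the full model additionally requires the structured amortized variational inference described in the abstract, which is not shown here.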

Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis

Title Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis
Authors Ceyuan Yang, Yujun Shen, Bolei Zhou
Abstract Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there is still limited understanding of what generative models learn inside their deep generative representations and how photo-realistic images come to be composed from the layer-wise stochasticity introduced in recent GANs. In this work, we show that a highly structured semantic hierarchy emerges as variation factors when synthesizing scenes from the generative representations of state-of-the-art GAN models such as StyleGAN and BigGAN. By probing the layer-wise representations with a broad set of semantics at different abstraction levels, we are able to quantify the causality between the activations and the semantics occurring in the output image. Such a quantification identifies the human-understandable variation factors learned by GANs to compose scenes. The qualitative and quantitative results further suggest that the generative representations learned by GANs with layer-wise latent codes are specialized to synthesize different hierarchical semantics: the early layers tend to determine the spatial layout and configuration, the middle layers control the categorical objects, and the later layers render the scene attributes and color scheme. Identifying such a set of manipulable latent variation factors facilitates semantic scene manipulation.
Tasks Image Generation
Published 2019-11-21
URL https://arxiv.org/abs/1911.09267v3
PDF https://arxiv.org/pdf/1911.09267v3.pdf
PWC https://paperswithcode.com/paper/semantic-hierarchy-emerges-in-deep-generative
Repo https://github.com/ShenYujun/HiGAN
Framework tf
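
The kind of layer-wise manipulation this analysis enables can be illustrated with a toy code-mixing helper: keep one sample's per-layer codes and swap in another sample's codes only at selected layers. This is purely illustrative (the HiGAN repo's actual interface differs); the layer split follows the paper's finding that early layers control layout, middle layers objects, and late layers attributes and color.

```python
import numpy as np

def mix_layerwise_codes(content_codes, style_codes, layers):
    """Replace the per-layer latent codes of `content_codes` with those of
    `style_codes` at the given layer indices, leaving other layers untouched."""
    mixed = [c.copy() for c in content_codes]
    for i in layers:
        mixed[i] = style_codes[i].copy()
    return mixed

# Toy example with 14 layers of 512-dim codes (a StyleGAN-like shape).
rng = np.random.default_rng(0)
codes_a = [rng.normal(size=512) for _ in range(14)]
codes_b = [rng.normal(size=512) for _ in range(14)]
edited = mix_layerwise_codes(codes_a, codes_b, layers=range(8, 14))  # late layers only
```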

MirrorGAN: Learning Text-to-image Generation by Redescription

Title MirrorGAN: Learning Text-to-image Generation by Redescription
Authors Tingting Qiao, Jing Zhang, Duanqing Xu, Dacheng Tao
Abstract Generating an image from a given text description has two goals: visual realism and semantic consistency. Although significant progress has been made in generating high-quality and visually realistic images using generative adversarial networks, guaranteeing semantic consistency between the text description and visual content remains very challenging. In this paper, we address this problem by proposing a novel global-local attentive and semantic-preserving text-to-image-to-text framework called MirrorGAN. MirrorGAN exploits the idea of learning text-to-image generation by redescription and consists of three modules: a semantic text embedding module (STEM), a global-local collaborative attentive module for cascaded image generation (GLAM), and a semantic text regeneration and alignment module (STREAM). STEM generates word- and sentence-level embeddings. GLAM has a cascaded architecture for generating target images from coarse to fine scales, leveraging both local word attention and global sentence attention to progressively enhance the diversity and semantic consistency of the generated images. STREAM seeks to regenerate the text description from the generated image, which semantically aligns with the given text description. Thorough experiments on two public benchmark datasets demonstrate the superiority of MirrorGAN over other representative state-of-the-art methods.
Tasks Image Generation, Text-to-Image Generation
Published 2019-03-14
URL http://arxiv.org/abs/1903.05854v1
PDF http://arxiv.org/pdf/1903.05854v1.pdf
PWC https://paperswithcode.com/paper/mirrorgan-learning-text-to-image-generation
Repo https://github.com/byby221b/PaperReading
Framework none
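
The "learning by redescription" idea boils down to adding a text-reconstruction term to the generator's objective: the caption regenerated from the synthesized image (by STREAM) should match the input description. A hedged PyTorch sketch of such a combined loss; all tensor names and the weighting scheme here are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def redescription_loss(d_fake_logits, caption_logits, caption_tokens, lambda_t=1.0):
    """Adversarial term for the generated image plus a token-level cross-entropy
    term asking the regenerated caption to match the original description.
    caption_logits: (B, T, V) float scores; caption_tokens: (B, T) long ids."""
    # Generator adversarial loss: fool the discriminator on generated images.
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # Redescription loss: cross entropy of the regenerated caption tokens.
    redesc = F.cross_entropy(
        caption_logits.reshape(-1, caption_logits.size(-1)),
        caption_tokens.reshape(-1))
    return adv + lambda_t * redesc
```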

Hierarchical Probabilistic Model for Blind Source Separation via Legendre Transformation

Title Hierarchical Probabilistic Model for Blind Source Separation via Legendre Transformation
Authors Simon Luo, Lamiae Azizi, Mahito Sugiyama
Abstract We present a novel blind source separation (BSS) method, called information geometric blind source separation (IGBSS). Our formulation is based on the information geometric log-linear model equipped with a hierarchically structured sample space, which has theoretical guarantees to uniquely recover a set of source signals by minimizing the KL divergence from a set of mixed signals. Source signals, received signals, and mixing matrices are realized as different layers in our hierarchical sample space. Our empirical results on images demonstrate that our approach is superior to current state-of-the-art techniques and is able to separate signals with complex interactions.
Tasks
Published 2019-09-25
URL https://arxiv.org/abs/1909.11294v1
PDF https://arxiv.org/pdf/1909.11294v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-probabilistic-model-for-blind
Repo https://github.com/sjmluo/IGLLM
Framework none
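
The paper's hierarchical log-linear formulation is beyond a short sketch, but as a familiar point of reference, classical KL-divergence NMF (Lee-Seung multiplicative updates) also recovers nonnegative sources from mixtures by minimizing a KL divergence to the mixed signals. To be clear, this is a stand-in baseline, not the information-geometric IGBSS method itself.

```python
import numpy as np

def kl_nmf(X, n_sources, n_iter=200, eps=1e-9, rng=None):
    """Factor a nonnegative mixture X ~ W @ H by minimizing the (generalized)
    KL divergence D(X || W H) with Lee-Seung multiplicative updates."""
    rng = rng or np.random.default_rng(0)
    n, m = X.shape
    W = rng.random((n, n_sources)) + eps   # mixing matrix
    H = rng.random((n_sources, m)) + eps   # source signals
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Toy usage: mix two nonnegative sources and try to recover them.
rng = np.random.default_rng(1)
S = rng.random((2, 100))          # true sources
A = rng.random((64, 2))           # true mixing matrix
W_hat, H_hat = kl_nmf(A @ S, n_sources=2)
```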

A Cross-Domain Transferable Neural Coherence Model

Title A Cross-Domain Transferable Neural Coherence Model
Authors Peng Xu, Hamidreza Saghir, Jin Sung Kang, Teng Long, Avishek Joey Bose, Yanshuai Cao, Jackie Chi Kit Cheung
Abstract Coherence is an important aspect of text quality and is crucial for ensuring its readability. One important limitation of existing coherence models is that training on one domain does not easily generalize to unseen categories of text. Previous work advocates for generative models for cross-domain generalization, because for discriminative models, the space of incoherent sentence orderings to discriminate against during training is prohibitively large. In this work, we propose a local discriminative neural model with a much smaller negative-sampling space that can efficiently learn against incorrect orderings. The proposed coherence model is simple in structure, yet it significantly outperforms previous state-of-the-art methods on a standard benchmark based on the Wall Street Journal corpus, as well as in multiple new challenging settings of transfer to unseen categories of discourse on Wikipedia articles.
Tasks Domain Generalization
Published 2019-05-28
URL https://arxiv.org/abs/1905.11912v2
PDF https://arxiv.org/pdf/1905.11912v2.pdf
PWC https://paperswithcode.com/paper/a-cross-domain-transferable-neural-coherence
Repo https://github.com/BorealisAI/cross_domain_coherence
Framework pytorch
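
The "much smaller negative-sampling space" amounts to scoring local sentence pairs rather than whole orderings: consecutive sentences are positives, and negatives pair a sentence with one drawn from elsewhere in the same document. A sketch of that pair construction (the function is illustrative; the repo's data pipeline will differ in detail):

```python
import random

def local_pairs(sentences, n_negatives=1, seed=0):
    """Build (s_i, s_j, label) pairs for a local coherence discriminator:
    (s_i, s_{i+1}) is a positive, and each positive is paired with negatives
    whose second sentence comes from elsewhere in the document."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        pairs.append((sentences[i], sentences[i + 1], 1))
        candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
        for neg in rng.sample(candidates, min(n_negatives, len(candidates))):
            pairs.append((sentences[i], neg, 0))
    return pairs

doc = ["The cat sat.", "It was warm.", "Then it rained.", "The cat went inside."]
print(local_pairs(doc))
```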

Guided Image Generation with Conditional Invertible Neural Networks

Title Guided Image Generation with Conditional Invertible Neural Networks
Authors Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, Ullrich Köthe
Abstract In this work, we address the task of natural image generation guided by a conditioning input. We introduce a new architecture called conditional invertible neural network (cINN). The cINN combines the purely generative INN model with an unconstrained feed-forward network, which efficiently preprocesses the conditioning input into useful features. All parameters of the cINN are jointly optimized with a stable, maximum likelihood-based training procedure. By construction, the cINN does not experience mode collapse and generates diverse samples, in contrast to e.g. cGANs. At the same time our model produces sharp images since no reconstruction loss is required, in contrast to e.g. VAEs. We demonstrate these properties for the tasks of MNIST digit generation and image colorization. Furthermore, we take advantage of our bi-directional cINN architecture to explore and manipulate emergent properties of the latent space, such as changing the image style in an intuitive way.
Tasks Colorization, Image Generation
Published 2019-07-04
URL https://arxiv.org/abs/1907.02392v3
PDF https://arxiv.org/pdf/1907.02392v3.pdf
PWC https://paperswithcode.com/paper/guided-image-generation-with-conditional
Repo https://github.com/VLL-HD/conditional_invertible_neural_networks
Framework pytorch
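
The "stable, maximum likelihood-based training procedure" for an invertible model reduces to the change-of-variables negative log-likelihood: push inputs to a standard-normal latent code and pay for the log-determinant of the Jacobian, with no reconstruction or adversarial term. A minimal sketch of that objective, assuming the flow already provides z and log|det J| per sample:

```python
import numpy as np

def flow_nll(z, log_det_jacobian):
    """Per-sample negative log-likelihood for a normalizing flow with a
    standard-normal prior: ||z||^2 / 2 + (D/2) log(2*pi) - log|det J|,
    where J is the Jacobian of the forward (x -> z) transform."""
    d = z.shape[-1]
    prior_nll = 0.5 * np.sum(z ** 2, axis=-1) + 0.5 * d * np.log(2 * np.pi)
    return prior_nll - log_det_jacobian

# Toy usage: 4 latent vectors of dimension 8 with made-up log-determinants.
z = np.random.default_rng(0).normal(size=(4, 8))
print(flow_nll(z, log_det_jacobian=np.zeros(4)))
```

In the cINN the conditioning features simply enter the coupling blocks as an extra input, so the same objective applies unchanged.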

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

Title AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty
Authors Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, Balaji Lakshminarayanan
Abstract Modern deep neural networks can achieve high accuracy when the training and test distributions are identical, but this assumption is frequently violated in practice. When the train and test distributions are mismatched, accuracy can plummet. Currently there are few techniques that improve robustness to unforeseen data shifts encountered during deployment. In this work, we propose a technique to improve the robustness and uncertainty estimates of image classifiers. We propose AugMix, a data processing technique that is simple to implement, adds limited computational overhead, and helps models withstand unforeseen corruptions. AugMix significantly improves robustness and uncertainty measures on challenging image classification benchmarks, closing the gap between previous methods and the best possible performance, in some cases by more than half.
Tasks Image Classification
Published 2019-12-05
URL https://arxiv.org/abs/1912.02781v2
PDF https://arxiv.org/pdf/1912.02781v2.pdf
PWC https://paperswithcode.com/paper/augmix-a-simple-data-processing-method-to
Repo https://github.com/google-research/augmix
Framework pytorch
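
The augmentation itself is easy to sketch: sample a few short augmentation chains, combine them with Dirichlet weights, then blend the mixture with the original image using a Beta-sampled weight. The sketch below uses stand-in operations; the paper uses transformations such as rotate, posterize, and shear, and additionally trains with a Jensen-Shannon consistency loss across augmented views (not shown).

```python
import numpy as np

def augmix(image, augment_fns, width=3, depth=3, alpha=1.0, rng=None):
    """AugMix-style mixing for a float image in [0, 1]: `width` chains of up to
    `depth` augmentations, mixed with Dirichlet weights, then interpolated
    with the original image via a Beta-sampled weight."""
    rng = rng or np.random.default_rng()
    chain_weights = rng.dirichlet([alpha] * width)
    m = rng.beta(alpha, alpha)

    mixture = np.zeros_like(image)
    for w in chain_weights:
        aug = image.copy()
        for _ in range(rng.integers(1, depth + 1)):
            op = augment_fns[rng.integers(len(augment_fns))]
            aug = op(aug)
        mixture += w * aug
    return (1 - m) * image + m * mixture

# Toy usage with stand-in "augmentations".
ops = [lambda x: np.clip(x * 0.9, 0, 1), lambda x: np.clip(x + 0.05, 0, 1), np.fliplr]
out = augmix(np.random.default_rng(0).random((32, 32, 3)), ops)
```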

Graph Star Net for Generalized Multi-Task Learning

Title Graph Star Net for Generalized Multi-Task Learning
Authors Lu Haonan, Seth H. Huang, Tian Ye, Guo Xiuyan
Abstract In this work, we present graph star net (GraphStar), a novel and unified graph neural net architecture which utilizes a message-passing relay and an attention mechanism for multiple prediction tasks - node classification, graph classification and link prediction. GraphStar addresses many earlier challenges facing graph neural nets and achieves non-local representation without increasing the model depth or bearing heavy computational costs. We also propose a new method to tackle topic-specific sentiment analysis based on node classification and text classification as graph classification. Our work shows that ‘star nodes’ can learn effective graph-data representations and improve on current methods for the three tasks. Specifically, for graph classification and link prediction, GraphStar outperforms the current state-of-the-art models by 2-5% on several key benchmarks.
Tasks Graph Classification, Link Prediction, Multi-Task Learning, Node Classification, Sentiment Analysis, Text Classification
Published 2019-06-21
URL https://arxiv.org/abs/1906.12330v1
PDF https://arxiv.org/pdf/1906.12330v1.pdf
PWC https://paperswithcode.com/paper/graph-star-net-for-generalized-multi-task-1
Repo https://github.com/graph-star-team/graph_star
Framework pytorch
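
The non-local "star node" idea can be illustrated with a deliberately simplified message-passing step: a virtual star node pools all node features into a global summary, which every node then combines with its local neighborhood aggregate. The real GraphStar uses attention-based relays rather than the mean pooling and tanh used here; this is only a sketch of the structural idea.

```python
import numpy as np

def star_message_pass(node_feats, adj, w_node, w_star):
    """One simplified star-relay step: neighborhood mean + global star summary."""
    star = node_feats.mean(axis=0)                                  # global summary
    neigh = adj @ node_feats / np.maximum(adj.sum(1, keepdims=True), 1)
    return np.tanh(neigh @ w_node + star @ w_star)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                                         # 5 nodes, 8 features
a = (rng.random((5, 5)) < 0.4).astype(float)                        # toy adjacency
h = star_message_pass(x, a, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
```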

Fast Deep Learning for Automatic Modulation Classification

Title Fast Deep Learning for Automatic Modulation Classification
Authors Sharan Ramjee, Shengtai Ju, Diyu Yang, Xiaoyu Liu, Aly El Gamal, Yonina C. Eldar
Abstract In this work, we investigate the feasibility and effectiveness of employing deep learning algorithms for automatic recognition of the modulation type of received wireless communication signals from subsampled data. Recent work considered a GNU radio-based data set that mimics the imperfections in a real wireless channel and uses 10 different modulation types. A Convolutional Neural Network (CNN) architecture was then developed and shown to achieve performance that exceeds that of expert-based approaches. Here, we continue this line of work and investigate deep neural network architectures that deliver high classification accuracy. We identify three architectures - namely, a Convolutional Long Short-term Deep Neural Network (CLDNN), a Long Short-Term Memory neural network (LSTM), and a deep Residual Network (ResNet) - that lead to typical classification accuracy values around 90% at high SNR. We then study algorithms to reduce the training time by minimizing the size of the training data set, while incurring a minimal loss in classification accuracy. To this end, we demonstrate the performance of Principal Component Analysis in significantly reducing the training time, while maintaining good performance at low SNR. We also investigate subsampling techniques that further reduce the training time, and pave the way for online classification at high SNR. Finally, we identify representative SNR values for training each of the candidate architectures, and consequently, realize drastic reductions of the training time, with negligible loss in classification accuracy.
Tasks
Published 2019-01-16
URL http://arxiv.org/abs/1901.05850v1
PDF http://arxiv.org/pdf/1901.05850v1.pdf
PWC https://paperswithcode.com/paper/fast-deep-learning-for-automatic-modulation
Repo https://github.com/dl4amc/source
Framework tf
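
The training-time reduction via Principal Component Analysis mentioned in the abstract is a standard projection onto the top principal components of the (centered) training samples. A plain NumPy/SVD sketch, with toy sizes rather than the paper's dataset dimensions:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project samples onto their top principal components (plain SVD PCA)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, sorted by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

# Toy usage: 1000 "signals" of length 256 reduced to 32 features each.
X = np.random.default_rng(0).normal(size=(1000, 256))
X_reduced, comps, mu = pca_reduce(X, 32)
print(X_reduced.shape)   # (1000, 32)
```

Training the classifiers on the reduced features (and on subsampled signals) is what yields the reported reductions in training time.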

A CNN-based tool for automatic tongue contour tracking in ultrasound images

Title A CNN-based tool for automatic tongue contour tracking in ultrasound images
Authors Jian Zhu, Will Styler, Ian Calloway
Abstract For speech research, ultrasound tongue imaging provides a non-invasive means for visualizing tongue position and movement during articulation. Extracting tongue contours from ultrasound images is a basic step in analyzing ultrasound data, but this task often requires non-trivial manual annotation. This study presents an open-source tool for fully automatic tracking of tongue contours in ultrasound frames using neural-network-based methods. We have implemented and systematically compared two convolutional neural networks, U-Net and DenseU-Net, under different conditions. Though both models can perform automatic contour tracking with comparable accuracy, the DenseU-Net architecture seems more generalizable across test datasets, while U-Net offers faster extraction. Our comparison also shows that the choice of loss function and data augmentation have a greater effect on tracking performance in this task. This publicly available segmentation tool shows considerable promise for the automated tongue contour annotation of ultrasound images in speech research.
Tasks Data Augmentation
Published 2019-07-24
URL https://arxiv.org/abs/1907.10210v1
PDF https://arxiv.org/pdf/1907.10210v1.pdf
PWC https://paperswithcode.com/paper/a-cnn-based-tool-for-automatic-tongue-contour
Repo https://github.com/lingjzhu/mtracker.github.io
Framework tf
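
Turning a segmentation network's output into a contour is typically a small post-processing step: take the most probable row in each image column and keep only confident columns. This is a generic sketch of that step, not necessarily the exact procedure used in the repo.

```python
import numpy as np

def contour_from_mask(prob_map, threshold=0.5):
    """Convert a per-pixel tongue probability map (H x W, values in [0, 1])
    into (x, y) contour points: the best row per column, keeping only columns
    where the maximum probability exceeds `threshold`."""
    rows = prob_map.argmax(axis=0)                     # best row per column
    confident = prob_map.max(axis=0) >= threshold
    cols = np.arange(prob_map.shape[1])[confident]
    return np.stack([cols, rows[confident]], axis=1)

# Toy usage on a fake 64x128 probability map.
pm = np.random.default_rng(0).random((64, 128))
points = contour_from_mask(pm, threshold=0.9)
```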

MaskedFusion: Mask-based 6D Object Pose Estimation

Title MaskedFusion: Mask-based 6D Object Pose Estimation
Authors Nuno Pereira, Luís A. Alexandre
Abstract MaskedFusion is a framework to estimate the 6D pose of objects using RGB-D data, with an architecture that leverages multiple sub-tasks in a pipeline to achieve accurate 6D poses. 6D pose estimation is an open challenge due to complex world objects and the many problems that can arise when capturing data from the real world, e.g., occlusions, truncations, and noise in the data. Achieving accurate 6D poses will improve results in other open problems like robot grasping or positioning objects in augmented reality. MaskedFusion improves the state-of-the-art by using object masks to eliminate non-relevant data. By including the masks in the neural network that estimates the 6D pose of an object, we also obtain features that represent the object’s shape. MaskedFusion is a modular pipeline where each sub-task can use different methods that achieve the objective. MaskedFusion achieved 97.3% on average using the ADD metric on the LineMOD dataset and 93.3% using the ADD-S AUC metric on the YCB-Video dataset, an improvement over state-of-the-art methods. The code is available on GitHub (https://github.com/kroglice/MaskedFusion).
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published 2019-11-18
URL https://arxiv.org/abs/1911.07771v2
PDF https://arxiv.org/pdf/1911.07771v2.pdf
PWC https://paperswithcode.com/paper/maskedfusion-mask-based-6d-object-pose
Repo https://github.com/kroglice/MaskedFusion
Framework pytorch
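
For context on the reported numbers, the ADD metric is the average distance between the object's 3D model points transformed by the predicted pose and by the ground-truth pose; on LineMOD a pose is commonly counted correct when ADD is below 10% of the object diameter. A minimal sketch of the metric itself:

```python
import numpy as np

def add_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """Average distance (ADD) between model points under the predicted pose
    and under the ground-truth pose. model_points: (N, 3); R: (3, 3); t: (3,)."""
    pred = model_points @ R_pred.T + t_pred
    gt = model_points @ R_gt.T + t_gt
    return np.mean(np.linalg.norm(pred - gt, axis=1))

# Toy usage: an identity "prediction" offset by 1 cm in translation.
pts = np.random.default_rng(0).normal(size=(500, 3))
print(add_metric(pts, np.eye(3), np.array([0.0, 0.0, 0.01]), np.eye(3), np.zeros(3)))
```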

NLNL: Negative Learning for Noisy Labels

Title NLNL: Negative Learning for Noisy Labels
Authors Youngdong Kim, Junho Yim, Juseung Yun, Junmo Kim
Abstract Convolutional Neural Networks (CNNs) provide excellent performance when used for image classification. The classical method of training CNNs is by labeling images in a supervised manner as in “input image belongs to this label” (Positive Learning; PL), which is a fast and accurate method if the labels are assigned correctly to all images. However, if inaccurate labels, or noisy labels, exist, training with PL will provide wrong information, thus severely degrading performance. To address this issue, we start with an indirect learning method called Negative Learning (NL), in which the CNNs are trained using a complementary label as in “input image does not belong to this complementary label.” Because the chances of selecting a true label as a complementary label are low, NL decreases the risk of providing incorrect information. Furthermore, to improve convergence, we extend our method by adopting PL selectively, termed Selective Negative Learning and Positive Learning (SelNLPL). PL is used selectively to train upon expected-to-be-clean data, whose choices become possible as NL progresses, resulting in superior performance in filtering out noisy data. With a simple semi-supervised training technique, our method achieves state-of-the-art accuracy for noisy-data classification, proving the superiority of SelNLPL’s noisy-data filtering ability.
Tasks Image Classification
Published 2019-08-19
URL https://arxiv.org/abs/1908.07387v1
PDF https://arxiv.org/pdf/1908.07387v1.pdf
PWC https://paperswithcode.com/paper/nlnl-negative-learning-for-noisy-labels
Repo https://github.com/ydkim1293/NLNL-Negative-Learning-for-Noisy-Labels
Framework pytorch
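
The negative-learning objective is a one-line change from standard cross-entropy: instead of maximizing the probability of the (possibly noisy) label, it minimizes -log(1 - p) for a randomly drawn complementary label. A hedged PyTorch sketch of that loss (the sampling of complementary labels and the SelNLPL schedule are not shown):

```python
import torch
import torch.nn.functional as F

def negative_learning_loss(logits, complementary_labels):
    """Push the network away from a class each sample most likely does NOT
    belong to: mean of -log(1 - p_{y~}) over the batch."""
    probs = F.softmax(logits, dim=1)
    p_comp = probs.gather(1, complementary_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(torch.clamp(1.0 - p_comp, min=1e-7)).mean()

# Toy usage: batch of 4 samples, 10 classes, random complementary labels.
logits = torch.randn(4, 10)
comp = torch.randint(0, 10, (4,))
print(negative_learning_loss(logits, comp))
```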

Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation

Title Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation
Authors Qiqi Hou, Feng Liu
Abstract Natural image matting is an important problem in computer vision and graphics. It is an ill-posed problem when only an input image is available without any external information. While the recent deep learning approaches have shown promising results, they only estimate the alpha matte. This paper presents a context-aware natural image matting method for simultaneous foreground and alpha matte estimation. Our method employs two encoder networks to extract essential information for matting. Particularly, we use a matting encoder to learn local features and a context encoder to obtain more global context information. We concatenate the outputs from these two encoders and feed them into decoder networks to simultaneously estimate the foreground and alpha matte. To train this whole deep neural network, we employ both the standard Laplacian loss and the feature loss: the former helps to achieve high numerical performance while the latter leads to more perceptually plausible results. We also report several data augmentation strategies that greatly improve the network’s generalization performance. Our qualitative and quantitative experiments show that our method enables high-quality matting for a single natural image. Our inference codes and models have been made publicly available at https://github.com/hqqxyy/Context-Aware-Matting.
Tasks Data Augmentation, Image Matting
Published 2019-09-20
URL https://arxiv.org/abs/1909.09725v2
PDF https://arxiv.org/pdf/1909.09725v2.pdf
PWC https://paperswithcode.com/paper/190909725
Repo https://github.com/hqqxyy/Context-Aware-Matting
Framework tf
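
Estimating both the foreground and the alpha matte is useful because they plug directly into the compositing equation I = alpha * F + (1 - alpha) * B, so the cut-out foreground can be placed onto any new background. A small sketch of that final compositing step (the network architecture and losses themselves are not reproduced here):

```python
import numpy as np

def composite(foreground, background, alpha):
    """Matting equation: I = alpha * F + (1 - alpha) * B, with the (H, W)
    alpha matte broadcast over the color channels."""
    return alpha[..., None] * foreground + (1.0 - alpha[..., None]) * background

# Toy usage: paste a random "foreground" onto a grey background.
rng = np.random.default_rng(0)
fg = rng.random((64, 64, 3))
bg = np.full((64, 64, 3), 0.5)
a = rng.random((64, 64))
img = composite(fg, bg, a)
```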