February 1, 2020

3123 words 15 mins read

Paper Group AWR 270

Scaling Distributed Machine Learning with In-Network Aggregation

Title Scaling Distributed Machine Learning with In-Network Aggregation
Authors Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik
Abstract Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and by at least 20% for a number of real-world benchmark models.
Tasks
Published 2019-02-22
URL http://arxiv.org/abs/1903.06701v1
PDF http://arxiv.org/pdf/1903.06701v1.pdf
PWC https://paperswithcode.com/paper/scaling-distributed-machine-learning-with-in
Repo https://github.com/IETF-Hackathon/p4-ipv6-switch-ml
Framework none
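
The core trick is that a programmable switch can sum integer values at line rate, so workers exchange fixed-point gradient chunks instead of full all-reduce traffic. Below is a minimal numpy simulation of that aggregation step; the scaling scheme and names are illustrative assumptions, not the paper's exact protocol.

```python
# Each worker scales its float gradient chunk to fixed-point integers
# (switch dataplanes operate on integers), the "switch" sums the chunks,
# and workers rescale the result back to a float mean.
import numpy as np

SCALE = 1 << 16  # fixed-point scaling factor (assumed)

def to_fixed(grad):
    return np.round(grad * SCALE).astype(np.int64)

def switch_aggregate(chunks):
    # The switch only adds integers: one sum per vector slot.
    return np.sum(np.stack(chunks), axis=0)

def all_reduce(worker_grads):
    summed = switch_aggregate([to_fixed(g) for g in worker_grads])
    return summed.astype(np.float64) / (SCALE * len(worker_grads))

grads = [np.random.randn(8) for _ in range(4)]
print(np.allclose(all_reduce(grads), np.mean(grads, axis=0), atol=1e-4))
```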

A Comprehensive Survey on Graph Neural Networks

Title A Comprehensive Survey on Graph Neural Networks
Authors Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, Philip S. Yu
Abstract Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field.
Tasks Image Classification, Speech Recognition
Published 2019-01-03
URL https://arxiv.org/abs/1901.00596v4
PDF https://arxiv.org/pdf/1901.00596v4.pdf
PWC https://paperswithcode.com/paper/a-comprehensive-survey-on-graph-neural
Repo https://github.com/farewell7117/knowledge-graph-base
Framework pytorch
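
As a concrete instance of the survey's convolutional GNN category, here is a minimal numpy sketch of the widely used GCN propagation rule $H' = \sigma(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2}HW)$ with $\hat{A} = A + I$; it is one representative layer, not the survey's own code.

```python
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0)  # ReLU

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # a path graph
H = np.random.randn(3, 4)   # node features
W = np.random.randn(4, 2)   # learnable weights
print(gcn_layer(A, H, W).shape)  # (3, 2)
```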

Unsupervised Part-Based Disentangling of Object Shape and Appearance

Title Unsupervised Part-Based Disentangling of Object Shape and Appearance
Authors Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer
Abstract Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and represent these different characteristics poses a great challenge, especially in the unsupervised case. Moreover, large object articulation calls for a flexible part-based model. We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. Our model for learning an object representation is trained by simultaneously exploiting invariance and equivariance constraints between synthetically transformed images. Since no part annotation or prior information on an object class is required, the approach is applicable to arbitrary classes. We evaluate our approach on a wide range of object categories and diverse tasks including pose prediction, disentangled image synthesis, and video-to-video translation. The approach outperforms the state-of-the-art on unsupervised keypoint prediction and compares favorably even against supervised approaches on the task of shape and appearance transfer.
Tasks Image Generation, Pose Prediction
Published 2019-03-16
URL https://arxiv.org/abs/1903.06946v3
PDF https://arxiv.org/pdf/1903.06946v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-part-based-disentangling-of
Repo https://github.com/NVIDIA/UnsupervisedLandmarkLearning
Framework pytorch
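
The equivariance constraint can be made concrete: keypoints predicted on a synthetically transformed image should match the transformed keypoints of the original. A hedged PyTorch sketch with a toy stand-in detector; the paper's model and transform family are far richer.

```python
import torch
import torch.nn as nn

class PartDetector(nn.Module):
    def __init__(self, n_parts=8):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, n_parts * 2))
        self.n_parts = n_parts

    def forward(self, img):                       # -> (B, n_parts, 2) in [-1, 1]
        return torch.tanh(self.net(img)).view(-1, self.n_parts, 2)

def equivariance_loss(detector, img):
    # Horizontal flip as the synthetic transform; x-coordinates negate.
    img_t = torch.flip(img, dims=[3])
    kp_t = detector(img_t)
    kp_expected = detector(img) * torch.tensor([-1.0, 1.0])  # flip x
    return ((kp_t - kp_expected) ** 2).mean()

img = torch.randn(2, 3, 32, 32)
print(equivariance_loss(PartDetector(), img).item())
```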

Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck

Title Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Authors Shuang Ma, Daniel McDuff, Yale Song
Abstract Deep generative models have led to significant advances in cross-modal generation such as text-to-image synthesis. Training these models typically requires paired data with direct correspondence between modalities. We introduce the novel problem of translating instances from one modality to another without paired data by leveraging an intermediate modality shared by the two other modalities. To demonstrate this, we take the problem of translating images to speech. In this case, one could leverage disjoint datasets with one shared modality, e.g., image-text pairs and text-speech pairs, with text as the shared modality. We call this problem “skip-modal generation” because the shared modality is skipped during the generation process. We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text). We address fundamental challenges of skip-modal generation: 1) learning multimodal representations using a single model, 2) bridging the domain gap between two unrelated datasets, and 3) learning the correspondence between modalities from unpaired data. We show qualitative results on image-to-speech synthesis; this is the first time such results have been reported in the literature. We also show that our approach improves performance on traditional cross-modal generation, suggesting that it improves data efficiency in solving individual tasks.
Tasks Image Generation, Speech Synthesis
Published 2019-08-19
URL https://arxiv.org/abs/1908.07094v1
PDF https://arxiv.org/pdf/1908.07094v1.pdf
PWC https://paperswithcode.com/paper/unpaired-image-to-speech-synthesis-with
Repo https://github.com/yunyikristy/skipNet
Framework tf
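
One ingredient that translates directly to code is the information bottleneck: each modality is encoded to a Gaussian latent whose KL to a standard normal prior limits modality-specific information, while pairs that share a modality are aligned in that latent space. A minimal PyTorch sketch, with sizes, names, and the bottleneck weight as assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class BottleneckEncoder(nn.Module):
    def __init__(self, in_dim, z_dim=32):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)
        self.logvar = nn.Linear(in_dim, z_dim)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(dim=1).mean()
        return z, kl

# Image-text and text-speech pairs share the text encoder, so both latents
# are pulled toward the same text-anchored space.
img_enc, txt_enc = BottleneckEncoder(512), BottleneckEncoder(128)
z_img, kl_img = img_enc(torch.randn(4, 512))
z_txt, kl_txt = txt_enc(torch.randn(4, 128))
align = ((z_img - z_txt) ** 2).mean()      # align shared-modality pairs
loss = align + 1e-2 * (kl_img + kl_txt)    # bottleneck weight is assumed
print(loss.item())
```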

Signet Ring Cell Detection With a Semi-supervised Learning Framework

Title Signet Ring Cell Detection With a Semi-supervised Learning Framework
Authors Jiahui Li, Shuang Yang, Xiaodi Huang, Qian Da, Xiaoqun Yang, Zhiqiang Hu, Qi Duan, Chaofu Wang, Hongsheng Li
Abstract Signet ring cell carcinoma is a rare type of adenocarcinoma with poor prognosis. Early detection leads to a large improvement in patients’ survival rate. However, pathologists can only visually detect signet ring cells under the microscope. This procedure is not only laborious but also prone to omission. An automatic and accurate signet ring cell detection solution is thus important but has not been investigated before. In this paper, we take the first step and present a semi-supervised learning framework for the signet ring cell detection problem. Self-training is proposed to deal with the challenge of incomplete annotations, and cooperative-training is adapted to explore the unlabeled regions. Combining the two techniques, our semi-supervised learning framework can make better use of both labeled and unlabeled data. Experiments on large real clinical data demonstrate the effectiveness of our design. Our framework achieves accurate signet ring cell detection and can be readily applied in clinical trials. The dataset will be released soon to facilitate the development of the area.
Tasks
Published 2019-07-09
URL https://arxiv.org/abs/1907.03954v1
PDF https://arxiv.org/pdf/1907.03954v1.pdf
PWC https://paperswithcode.com/paper/signet-ring-cell-detection-with-a-semi
Repo https://github.com/nisargshah1999/DigestPath2019
Framework none
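
Self-training on incompletely annotated data follows a standard loop: train on the labeled set, pseudo-label confident predictions on unlabeled data, retrain on the union. A generic sklearn sketch of that loop, with a stand-in classifier rather than the paper's detection network:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=3, conf=0.9):
    model = LogisticRegression().fit(X_lab, y_lab)
    for _ in range(rounds):
        proba = model.predict_proba(X_unlab)
        mask = proba.max(axis=1) >= conf            # keep confident predictions
        if not mask.any():
            break
        X_aug = np.vstack([X_lab, X_unlab[mask]])   # union of real and pseudo
        y_aug = np.concatenate([y_lab, proba[mask].argmax(axis=1)])
        model = LogisticRegression().fit(X_aug, y_aug)
    return model

X_lab, y_lab = np.random.randn(40, 5), np.random.randint(0, 2, 40)
model = self_train(X_lab, y_lab, np.random.randn(200, 5))
```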

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

Title Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
Authors Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin
Abstract Variational autoencoders (VAEs) with an auto-regressive decoder have been applied to many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter $\beta$. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for $\beta$, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing $\beta$ multiple times. This new procedure allows the progressive learning of more meaningful latent codes, by leveraging the informative representations of previous cycles as warm re-starts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks, including language modeling, dialog response generation and unsupervised language pre-training.
Tasks Language Modelling
Published 2019-03-25
URL https://arxiv.org/abs/1903.10145v3
PDF https://arxiv.org/pdf/1903.10145v3.pdf
PWC https://paperswithcode.com/paper/cyclical-annealing-schedule-a-simple-approach
Repo https://github.com/suvalaki/Deeper
Framework tf
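
The schedule itself is easy to state in code: within each cycle, $\beta$ ramps linearly from 0 to 1 over the first portion $R$ of the cycle and then stays at 1; the standard monotonic schedule is the single-cycle special case. A minimal sketch (the paper's number of cycles $M$ and proportion $R$ are the two knobs; the values below are examples):

```python
def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    period = total_steps / n_cycles
    t = (step % period) / period        # position within the current cycle
    return min(t / ratio, 1.0)          # linear ramp, then clamp at 1

betas = [cyclical_beta(s, 1000) for s in range(1000)]
print(betas[0], betas[124], betas[249], betas[250])  # 0.0, ~1.0, 1.0, 0.0 (reset)
```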

PyTorch-BigGraph: A Large-scale Graph Embedding System

Title PyTorch-BigGraph: A Large-scale Graph Embedding System
Authors Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, Alex Peysakhovich
Abstract Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges.
Tasks Graph Embedding, graph partitioning, Link Prediction
Published 2019-03-28
URL http://arxiv.org/abs/1903.12287v3
PDF http://arxiv.org/pdf/1903.12287v3.pdf
PWC https://paperswithcode.com/paper/pytorch-biggraph-a-large-scale-graph
Repo https://github.com/facebookresearch/PyTorch-BigGraph
Framework pytorch
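
The partitioning idea can be sketched in a few lines: nodes are hashed into $P$ buckets, edges are grouped by (source bucket, destination bucket), and training visits one bucket pair at a time so only two slices of the embedding table are in memory. A toy numpy version with a plain pull-together update; PBG itself uses margin/softmax losses, negative sampling, and disk swapping of partitions.

```python
import numpy as np

P, DIM, N = 4, 16, 1000
bucket = lambda node: node % P                      # assumed hash partitioner
emb = {p: np.random.randn(N // P, DIM) * 0.1 for p in range(P)}
edges = np.random.randint(0, N, size=(5000, 2))

for ps in range(P):
    for pd in range(P):                             # one bucket pair at a time
        batch = edges[(bucket(edges[:, 0]) == ps) & (bucket(edges[:, 1]) == pd)]
        for s, d in batch:
            es, ed = emb[ps][s // P], emb[pd][d // P]  # within-bucket rows
            grad = es - ed                          # pull linked nodes together
            es -= 0.01 * grad                       # (a margin/negative-sampling
            ed += 0.01 * grad                       #  loss is used in practice)
```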

Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients

Title Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients
Authors Brenden K. Petersen
Abstract Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of $\textit{symbolic regression}$. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that combines deep learning with symbolic regression via a simple idea: use a large model to search the space of small models. More specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions, and employ reinforcement learning to train the network to generate better-fitting expressions. Our algorithm significantly outperforms standard genetic programming-based symbolic regression in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate a priori constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.04871v2
PDF https://arxiv.org/pdf/1912.04871v2.pdf
PWC https://paperswithcode.com/paper/deep-symbolic-regression-recovering
Repo https://github.com/brendenpetersen/deep-symbolic-regression
Framework none
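
The risk-seeking policy gradient reduces to a small change in the batch update: keep only the samples at or above the batch's $(1-\epsilon)$ reward quantile and use the quantile itself as the baseline, so the policy optimizes best-case rather than expected performance. A PyTorch sketch of just that loss:

```python
import torch

def risk_seeking_loss(log_probs, rewards, epsilon=0.05):
    # log_probs: summed log-likelihood of each sampled expression, shape (B,)
    # rewards:   fitness of each sampled expression, shape (B,)
    q = torch.quantile(rewards, 1.0 - epsilon)     # batch reward quantile
    keep = rewards >= q                            # elite samples only
    return -((rewards[keep] - q) * log_probs[keep]).mean()

log_probs = torch.randn(64, requires_grad=True)
rewards = torch.rand(64)
risk_seeking_loss(log_probs, rewards).backward()
```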

Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection

Title Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection
Authors Duong Nguyen, Oliver S. Kirsebom, Fábio Frazão, Ronan Fablet, Stan Matwin
Abstract In this paper, we adapt Recurrent Neural Networks with Stochastic Layers, which are the state of the art for generating text, music and speech, to the problem of acoustic novelty detection. By integrating uncertainty into the hidden states, this type of network is able to learn the distribution of complex sequences. Because the learned distribution can be calculated explicitly in terms of probability, we can evaluate how likely an observation is and then detect low-probability events as novel. The model is robust, fully unsupervised, and end-to-end, and requires minimal preprocessing, feature engineering, or hyperparameter tuning. An experiment on a benchmark dataset shows that our model outperforms state-of-the-art acoustic novelty detectors.
Tasks Acoustic Novelty Detection, Feature Engineering
Published 2019-02-13
URL http://arxiv.org/abs/1902.04980v1
PDF http://arxiv.org/pdf/1902.04980v1.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-networks-with-stochastic
Repo https://github.com/dnguyengithub/AudioNovelty
Framework tf
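
Once the model assigns explicit probabilities, detection is a threshold on negative log-likelihood calibrated on normal data. A numpy sketch with a diagonal Gaussian standing in for the stochastic RNN; the thresholding logic is the same.

```python
import numpy as np

train = np.random.randn(500, 8)                 # "normal" training sequences
mu, sigma = train.mean(0), train.std(0) + 1e-6

def neg_log_likelihood(x):
    return 0.5 * (((x - mu) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2)).sum()

scores = np.array([neg_log_likelihood(x) for x in train])
threshold = np.quantile(scores, 0.99)           # tolerate ~1% false alarms

novel = neg_log_likelihood(np.random.randn(8) + 5.0) > threshold
print(novel)  # True: far from the training distribution
```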

Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control

Title Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control
Authors Jonathan Chang, Nishanth Kumar, Sean Hastings, Aaron Gokaslan, Diego Romeres, Devesh Jha, Daniel Nikovski, George Konidaris, Stefanie Tellex
Abstract Robots need to learn skills that can not only generalize across similar problems but also be directed to a specific goal. Previous methods either train a new skill for every different goal or do not infer the specific target in the presence of multiple goals from visual data. We introduce an end-to-end method that represents targetable visuomotor skills as a goal-parameterized neural network policy. By training on an informative subset of available goals with the associated target parameters, we are able to learn a policy that can zero-shot generalize to previously unseen goals. We evaluate our method in a representative 2D simulation of a button-grid and on both button-pressing and peg-insertion tasks on two different physical arms. We demonstrate that our model trained on 33% of the possible goals is able to generalize to more than 90% of the targets in the scene for both simulation and robot experiments. We also successfully learn a mapping from target pixel coordinates to a robot policy to complete a specified goal.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10628v1
PDF https://arxiv.org/pdf/1910.10628v1.pdf
PWC https://paperswithcode.com/paper/learning-deep-parameterized-skills-from
Repo https://github.com/h2r/parameterized-imitation-learning
Framework pytorch
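
The goal-parameterized policy is architecturally simple: one network consumes the observation concatenated with a goal parameter (e.g., target pixel coordinates) and emits an action, so unseen goals need no new policy. A minimal PyTorch sketch with assumed sizes and a two-layer architecture:

```python
import torch
import torch.nn as nn

class GoalPolicy(nn.Module):
    def __init__(self, obs_dim=64, goal_dim=2, act_dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim))

    def forward(self, obs, goal):
        # The goal parameter is just extra input; one policy serves all goals.
        return self.net(torch.cat([obs, goal], dim=-1))

policy = GoalPolicy()
action = policy(torch.randn(1, 64), torch.tensor([[0.3, 0.7]]))  # pixel goal
print(action.shape)  # torch.Size([1, 4])
```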

Diving Deeper into Underwater Image Enhancement: A Survey

Title Diving Deeper into Underwater Image Enhancement: A Survey
Authors Saeed Anwar, Chongyi Li
Abstract The powerful representation capacity of deep learning has made it inevitable for the underwater image enhancement community to employ its potential. The exploration of deep underwater image enhancement networks is increasing over time, and hence a comprehensive survey is the need of the hour. In this paper, our main aim is two-fold: 1) to provide a comprehensive and in-depth survey of deep learning-based underwater image enhancement, covering various perspectives ranging from algorithms to open issues, and 2) to conduct a qualitative and quantitative comparison of the deep algorithms on diverse datasets to serve as a benchmark, which has barely been explored before. To be specific, we first introduce the underwater image formation models, which are the basis of training data synthesis and the design of deep networks, and are also helpful for understanding the process of underwater image degradation. Then, we review deep underwater image enhancement algorithms and present a glimpse of some aspects of the current networks, including network architecture, network parameters, training data, loss function, and training configurations. We also summarize the evaluation metrics and underwater image datasets. Following that, a systematic experimental comparison is carried out to analyze the robustness and effectiveness of deep algorithms. Meanwhile, we point out the shortcomings of current benchmark datasets and evaluation metrics. Finally, we discuss several unsolved open issues and suggest possible research directions. We hope that the efforts made in this paper can serve as a comprehensive reference for future research and call for the development of deep learning-based underwater image enhancement.
Tasks Image Enhancement
Published 2019-07-17
URL https://arxiv.org/abs/1907.07863v1
PDF https://arxiv.org/pdf/1907.07863v1.pdf
PWC https://paperswithcode.com/paper/diving-deeper-into-underwater-image
Repo https://github.com/saeed-anwar/UWSurvey
Framework none
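
The underwater image formation model the survey treats as the basis of training-data synthesis has a compact form: per color channel $c$, the observed intensity is $I_c = J_c t_c + B_c(1 - t_c)$ with transmission $t_c = e^{-\beta_c d}$. A numpy sketch with illustrative (assumed) attenuation coefficients; red attenuates fastest, producing the familiar blue-green cast.

```python
import numpy as np

def synthesize_underwater(J, depth, beta=(0.7, 0.2, 0.05), B=(0.05, 0.3, 0.5)):
    # J: clean RGB image (H, W, 3) in [0, 1]; depth: scene depth map (H, W).
    # beta: per-channel attenuation (red largest); B: veiling background light.
    t = np.exp(-np.asarray(beta) * depth[..., None])   # transmission per channel
    return J * t + np.asarray(B) * (1 - t)

J = np.random.rand(64, 64, 3)
depth = np.full((64, 64), 5.0)
print(synthesize_underwater(J, depth).mean(axis=(0, 1)))  # blue-shifted means
```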

BP-Transformer: Modelling Long-Range Context via Binary Partitioning

Title BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Authors Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
Abstract The Transformer model is widely successful on many natural language processing tasks. However, the quadratic complexity of self-attention limits its application to long text. In this paper, adopting a fine-to-coarse attention mechanism on multi-scale spans via binary partitioning (BP), we propose BP-Transformer (BPT for short). BPT yields $O(k\cdot n\log (n/k))$ connections, where $k$ is a hyperparameter controlling the density of attention. BPT strikes a good balance between computational complexity and model capacity. A series of experiments on text classification, machine translation and language modeling shows that BPT outperforms previous self-attention models on long text. Our code, hyperparameters and CUDA kernels for sparse attention are available in PyTorch.
Tasks Language Modelling, Machine Translation, Sentiment Analysis, Text Classification
Published 2019-11-11
URL https://arxiv.org/abs/1911.04070v1
PDF https://arxiv.org/pdf/1911.04070v1.pdf
PWC https://paperswithcode.com/paper/bp-transformer-modelling-long-range-context
Repo https://github.com/yzh119/BPT
Framework pytorch
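
The fine-to-coarse pattern can be approximated as: each token attends to $k$ neighboring spans at every scale of a binary partition, so nearby context is seen at token granularity and distant context through exponentially larger spans, giving roughly $O(k\cdot n\log(n/k))$ connections overall. A rough sketch of the span set for one token; the paper's exact span bookkeeping differs in detail.

```python
def bpt_spans(i, n, k=2):
    # Spans a token i attends to in a sequence of length n, as [start, end).
    spans, size = [], 1
    while size < n:
        block = i // size                      # index of i's span at this scale
        for b in range(block - k, block + k + 1):
            if b != block and 0 <= b * size < n:
                spans.append((b * size, min((b + 1) * size, n)))
        size *= 2                              # double the span size each level
    return spans

print(bpt_spans(17, 64, k=2))   # token 17 in a 64-token sequence
print(len(bpt_spans(17, 64)))   # grows like k * log(n), not n
```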

Style Generator Inversion for Image Enhancement and Animation

Title Style Generator Inversion for Image Enhancement and Animation
Authors Aviv Gabbay, Yedid Hoshen
Abstract One of the main motivations for training high-quality image generative models is their potential use as tools for image manipulation. Recently, generative adversarial networks (GANs) have been able to generate images of remarkable quality. Unfortunately, adversarially-trained unconditional generator networks have not been successful as image priors. One of the main requirements for a network to act as a generative image prior is being able to generate every possible image from the target distribution. Adversarial learning often experiences mode collapse, which manifests in generators that cannot generate some modes of the target distribution. Another requirement often not satisfied is invertibility, i.e., having an efficient way of finding a valid input latent code given a required output image. In this work, we show that, unlike earlier GANs, the very recently proposed style generators are quite easy to invert. We use this important observation to propose style generators as general-purpose image priors. We show that style generators outperform other GANs as well as Deep Image Prior as priors for image enhancement tasks. The latent space spanned by style generators satisfies linear identity-pose relations. This latent-space linearity, combined with invertibility, allows us to animate still facial images without supervision. Extensive experiments are performed to support the main contributions of this paper.
Tasks Image Enhancement
Published 2019-06-05
URL https://arxiv.org/abs/1906.11880v1
PDF https://arxiv.org/pdf/1906.11880v1.pdf
PWC https://paperswithcode.com/paper/style-generator-inversion-for-image
Repo https://github.com/avivga/style-image-prior
Framework tf
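
Inversion here means optimizing a latent code so a frozen generator reproduces a target image; the recovered code can then be edited or interpolated. A minimal PyTorch sketch with a tiny stand-in generator; a perceptual loss and style-space tricks are used in practice.

```python
import torch

G = torch.nn.Sequential(torch.nn.Linear(32, 256), torch.nn.Tanh(),
                        torch.nn.Linear(256, 3 * 16 * 16))  # stand-in generator
for p in G.parameters():
    p.requires_grad_(False)                  # the generator stays frozen

target = torch.randn(1, 3 * 16 * 16)
z = torch.zeros(1, 32, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.05)

for step in range(200):                      # gradient descent on the latent
    opt.zero_grad()
    loss = ((G(z) - target) ** 2).mean()
    loss.backward()
    opt.step()
print(loss.item())
```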

Learning Space Partitions for Nearest Neighbor Search

Title Learning Space Partitions for Nearest Neighbor Search
Authors Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner
Abstract Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces [Andoni, Naor, Nikolov, Razenshteyn, Waingarten STOC 2018, FOCS 2018], we develop a new framework for building space partitions, reducing the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner [Sanders, Schulz SEA 2013] and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS, our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods.
Tasks graph partitioning, Quantization
Published 2019-01-24
URL https://arxiv.org/abs/1901.08544v3
PDF https://arxiv.org/pdf/1901.08544v3.pdf
PWC https://paperswithcode.com/paper/learning-sublinear-time-indexing-for-nearest
Repo https://github.com/twistedcubic/learn-to-hash
Framework pytorch
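
The recipe condenses to: partition the dataset (the paper uses balanced graph partitioning of a k-NN graph via KaHIP; plain k-means stands in below), train a classifier to predict a point's partition, and at query time search only the few bins the classifier ranks highest. A hedged sklearn sketch:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

X = np.random.randn(2000, 32).astype(np.float32)
bins = KMeans(n_clusters=16, n_init=4, random_state=0).fit_predict(X)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, bins)

def query(q, top_bins=3):
    # Search only the bins the classifier considers most likely for q.
    ranked = np.argsort(clf.predict_proba([q])[0])[::-1][:top_bins]
    cand = np.where(np.isin(bins, ranked))[0]            # candidate points
    return cand[np.argmin(np.linalg.norm(X[cand] - q, axis=1))]

print(query(np.random.randn(32).astype(np.float32)))
```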

Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs

Title Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs
Authors Dongyeop Kang, Hiroaki Hayashi, Alan W Black, Eduard Hovy
Abstract Generating a long, coherent text such as a paragraph requires a high-level control of different levels of relations between sentences (e.g., tense, coreference). We call such a logical connection between sentences a (paragraph) flow. In order to produce a coherent flow of text, we explore two forms of intersentential relations in a paragraph: one is a human-created linguistic relation that forms a structure (e.g., a discourse tree), and the other is a relation learned from the latent representations of the sentences themselves. Our two proposed models incorporate each form of relation into document-level language models: the former is a supervised model that jointly learns a language model and discourse relation prediction, and the latter is an unsupervised model hierarchically conditioned by a recurrent neural network (RNN) over the latent information. Our proposed models with both forms of relations outperform the baselines on the partially conditioned paragraph generation task. Our code and data are publicly available.
Tasks Language Modelling
Published 2019-08-30
URL https://arxiv.org/abs/1908.11790v1
PDF https://arxiv.org/pdf/1908.11790v1.pdf
PWC https://paperswithcode.com/paper/linguistic-versus-latent-relations-for
Repo https://github.com/dykang/flownet
Framework tf
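
The latent-relation variant can be sketched as a hierarchical language model: a sentence-level RNN summarizes preceding sentences into a latent context that conditions the word-level LM for the next sentence. A minimal PyTorch sketch with assumed dimensions; the paper's supervised variant adds discourse relation prediction on top.

```python
import torch
import torch.nn as nn

class HierarchicalLM(nn.Module):
    def __init__(self, vocab=1000, emb=64, hid=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_rnn = nn.GRU(emb, hid, batch_first=True)  # within a sentence
        self.sent_rnn = nn.GRUCell(hid, hid)                 # across sentences
        self.out = nn.Linear(hid, vocab)

    def forward(self, sentences):                 # list of (B, T) token tensors
        B = sentences[0].size(0)
        ctx = torch.zeros(B, self.sent_rnn.hidden_size)
        logits = []
        for sent in sentences:
            h, last = self.word_rnn(self.embed(sent), ctx.unsqueeze(0))
            logits.append(self.out(h))                 # next-word predictions
            ctx = self.sent_rnn(last.squeeze(0), ctx)  # update paragraph flow
        return logits

lm = HierarchicalLM()
out = lm([torch.randint(0, 1000, (2, 7)) for _ in range(3)])
print(out[0].shape)  # torch.Size([2, 7, 1000])
```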