Paper Group AWR 270
Papers in this group: Scaling Distributed Machine Learning with In-Network Aggregation; A Comprehensive Survey on Graph Neural Networks; Unsupervised Part-Based Disentangling of Object Shape and Appearance; Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck; Signet Ring Cell Detection With a Semi-supervised Learning Framework; Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing; PyTorch-BigGraph: A Large-scale Graph Embedding System; Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients; Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection; Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control; Diving Deeper into Underwater Image Enhancement: A Survey; BP-Transformer: Modelling Long-Range Context via Binary Partitioning; Style Generator Inversion for Image Enhancement and Animation; Learning Space Partitions for Nearest Neighbor Search; Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs.
Scaling Distributed Machine Learning with In-Network Aggregation
| Title | Scaling Distributed Machine Learning with In-Network Aggregation |
|---|---|
| Authors | Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan R. K. Ports, Peter Richtárik |
| Abstract | Training complex machine learning models in parallel is an increasingly important workload. We accelerate distributed parallel training by designing a communication primitive that uses a programmable switch dataplane to execute a key step of the training process. Our approach, SwitchML, reduces the volume of exchanged data by aggregating the model updates from multiple workers in the network. We co-design the switch processing with the end-host protocols and ML frameworks to provide a robust, efficient solution that speeds up training by up to 300%, and by at least 20% for a number of real-world benchmark models. |
| Tasks | |
| Published | 2019-02-22 |
| URL | http://arxiv.org/abs/1903.06701v1 |
| PDF | http://arxiv.org/pdf/1903.06701v1.pdf |
| PWC | https://paperswithcode.com/paper/scaling-distributed-machine-learning-with-in |
| Repo | https://github.com/IETF-Hackathon/p4-ipv6-switch-ml |
| Framework | none |
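
The aggregation primitive is easy to emulate off-switch: each worker streams fixed-size chunks of its model update to an aggregator that sums them element-wise and broadcasts the result, so every worker receives one aggregate instead of N−1 peer updates. A minimal NumPy sketch of that data flow (this simulates the primitive on a host; it is not the paper's P4 dataplane or packet protocol):

```python
import numpy as np

def aggregate_chunks(worker_grads, chunk_size=256):
    """Sum per-worker gradient vectors chunk by chunk, mimicking the
    element-wise aggregation a programmable switch would perform."""
    n = len(worker_grads[0])
    out = np.zeros(n, dtype=np.float32)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # The "switch" sees one chunk ("packet") from every worker and
        # adds them into a shared slot before broadcasting the sum back.
        out[start:end] = np.sum([g[start:end] for g in worker_grads], axis=0)
    return out

grads = [np.random.randn(1024).astype(np.float32) for _ in range(4)]
agg = aggregate_chunks(grads)
assert np.allclose(agg, np.sum(grads, axis=0), atol=1e-4)
```
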
A Comprehensive Survey on Graph Neural Networks
| Title | A Comprehensive Survey on Graph Neural Networks |
|---|---|
| Authors | Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, Philip S. Yu |
| Abstract | Deep learning has revolutionized many machine learning tasks in recent years, ranging from image classification and video processing to speech recognition and natural language understanding. The data in these tasks are typically represented in the Euclidean space. However, there is an increasing number of applications where data are generated from non-Euclidean domains and are represented as graphs with complex relationships and interdependency between objects. The complexity of graph data has imposed significant challenges on existing machine learning algorithms. Recently, many studies on extending deep learning approaches for graph data have emerged. In this survey, we provide a comprehensive overview of graph neural networks (GNNs) in data mining and machine learning fields. We propose a new taxonomy to divide the state-of-the-art graph neural networks into four categories, namely recurrent graph neural networks, convolutional graph neural networks, graph autoencoders, and spatial-temporal graph neural networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes, benchmark data sets, and model evaluation of graph neural networks. Finally, we propose potential research directions in this rapidly growing field. |
| Tasks | Image Classification, Speech Recognition |
| Published | 2019-01-03 |
| URL | https://arxiv.org/abs/1901.00596v4 |
| PDF | https://arxiv.org/pdf/1901.00596v4.pdf |
| PWC | https://paperswithcode.com/paper/a-comprehensive-survey-on-graph-neural |
| Repo | https://github.com/farewell7117/knowledge-graph-base |
| Framework | pytorch |
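
Of the survey's four categories, convolutional GNNs are the easiest to make concrete: a GCN-style layer propagates node features through a symmetrically normalized adjacency matrix. A small NumPy sketch of one such layer (a standard formulation, not code from the survey):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
H = np.random.randn(3, 8)   # node features
W = np.random.randn(8, 4)   # layer weights
print(gcn_layer(A, H, W).shape)  # (3, 4)
```
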
Unsupervised Part-Based Disentangling of Object Shape and Appearance
| Title | Unsupervised Part-Based Disentangling of Object Shape and Appearance |
|---|---|
| Authors | Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer |
| Abstract | Large intra-class variation is the result of changes in multiple object characteristics. Images, however, only show the superposition of different variable factors such as appearance or shape. Therefore, learning to disentangle and represent these different characteristics poses a great challenge, especially in the unsupervised case. Moreover, large object articulation calls for a flexible part-based model. We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category. Our model for learning an object representation is trained by simultaneously exploiting invariance and equivariance constraints between synthetically transformed images. Since no part annotation or prior information on an object class is required, the approach is applicable to arbitrary classes. We evaluate our approach on a wide range of object categories and diverse tasks including pose prediction, disentangled image synthesis, and video-to-video translation. The approach outperforms the state-of-the-art on unsupervised keypoint prediction and compares favorably even against supervised approaches on the task of shape and appearance transfer. |
| Tasks | Image Generation, Pose Prediction |
| Published | 2019-03-16 |
| URL | https://arxiv.org/abs/1903.06946v3 |
| PDF | https://arxiv.org/pdf/1903.06946v3.pdf |
| PWC | https://paperswithcode.com/paper/unsupervised-part-based-disentangling-of |
| Repo | https://github.com/NVIDIA/UnsupervisedLandmarkLearning |
| Framework | pytorch |
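
The invariance/equivariance idea can be written down directly: under a synthetic spatial transform, the appearance code should stay fixed while the part/shape code should transform along with the image. A schematic PyTorch sketch (the encoders and the warp are placeholders, not the paper's architecture or exact objective):

```python
import torch

def disentangling_losses(x, warp, enc_shape, enc_app):
    """Invariance/equivariance constraints from a synthetically warped image.

    warp      : spatial transform (acting on images or on part encodings)
    enc_shape : part/shape encoder, should be *equivariant* to warp
    enc_app   : appearance encoder, should be *invariant* to warp
    """
    x_w = warp(x)
    # Shape equivariance: encoding the warped image == warping the encoding.
    l_equiv = torch.mean((enc_shape(x_w) - warp(enc_shape(x))) ** 2)
    # Appearance invariance: the appearance code ignores the spatial warp.
    l_inv = torch.mean((enc_app(x_w) - enc_app(x)) ** 2)
    return l_equiv + l_inv

# toy check with an identity warp and flattening "encoders"
x = torch.randn(2, 3, 64, 64)
flat = lambda t: t.flatten(1)
print(disentangling_losses(x, lambda t: t, flat, flat).item())  # 0.0
```
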
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
| Title | Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck |
|---|---|
| Authors | Shuang Ma, Daniel McDuff, Yale Song |
| Abstract | Deep generative models have led to significant advances in cross-modal generation such as text-to-image synthesis. Training these models typically requires paired data with direct correspondence between modalities. We introduce the novel problem of translating instances from one modality to another without paired data by leveraging an intermediate modality shared by the two other modalities. To demonstrate this, we take the problem of translating images to speech. In this case, one could leverage disjoint datasets with one shared modality, e.g., image-text pairs and text-speech pairs, with text as the shared modality. We call this problem “skip-modal generation” because the shared modality is skipped during the generation process. We propose a multimodal information bottleneck approach that learns the correspondence between modalities from unpaired data (image and speech) by leveraging the shared modality (text). We address fundamental challenges of skip-modal generation: 1) learning multimodal representations using a single model, 2) bridging the domain gap between two unrelated datasets, and 3) learning the correspondence between modalities from unpaired data. We show qualitative results on image-to-speech synthesis; this is the first time such results have been reported in the literature. We also show that our approach improves performance on traditional cross-modal generation, suggesting that it improves data efficiency in solving individual tasks. |
| Tasks | Image Generation, Speech Synthesis |
| Published | 2019-08-19 |
| URL | https://arxiv.org/abs/1908.07094v1 |
| PDF | https://arxiv.org/pdf/1908.07094v1.pdf |
| PWC | https://paperswithcode.com/paper/unpaired-image-to-speech-synthesis-with |
| Repo | https://github.com/yunyikristy/skipNet |
| Framework | tf |
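
Mechanically, skip-modal generation chains two mappings that were never trained on paired image-speech data: image and text are aligned on one dataset, text and speech on another, and text drops out at test time. A toy PyTorch sketch of that data flow (module shapes are arbitrary; the paper's information-bottleneck objective is omitted):

```python
import torch
import torch.nn as nn

enc_image  = nn.Linear(512, 64)   # image features  -> shared latent
enc_text   = nn.Linear(300, 64)   # text embedding  -> shared latent
dec_speech = nn.Linear(64, 80)    # shared latent   -> mel frames (toy)

# Training (schematically): on image-text pairs, pull enc_image(img) toward
# enc_text(txt); on text-speech pairs, fit dec_speech on enc_text(txt).
# At test time the shared modality is skipped entirely:
img_feat = torch.randn(1, 512)
mel = dec_speech(enc_image(img_feat))   # image -> speech, no text involved
print(mel.shape)  # torch.Size([1, 80])
```
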
Signet Ring Cell Detection With a Semi-supervised Learning Framework
| Title | Signet Ring Cell Detection With a Semi-supervised Learning Framework |
|---|---|
| Authors | Jiahui Li, Shuang Yang, Xiaodi Huang, Qian Da, Xiaoqun Yang, Zhiqiang Hu, Qi Duan, Chaofu Wang, Hongsheng Li |
| Abstract | Signet ring cell carcinoma is a type of rare adenocarcinoma with poor prognosis. Early detection leads to a huge improvement in patients’ survival rate. However, pathologists can only visually detect signet ring cells under the microscope. This procedure is not only laborious but also prone to omission. An automatic and accurate signet ring cell detection solution is thus important but has not been investigated before. In this paper, we take the first step and present a semi-supervised learning framework for the signet ring cell detection problem. Self-training is proposed to deal with the challenge of incomplete annotations, and cooperative-training is adapted to explore the unlabeled regions. Combining the two techniques, our semi-supervised learning framework can make better use of both labeled and unlabeled data. Experiments on large real clinical data demonstrate the effectiveness of our design. Our framework achieves accurate signet ring cell detection and can readily be applied in clinical trials. The dataset will be released soon to facilitate the development of the area. |
| Tasks | |
| Published | 2019-07-09 |
| URL | https://arxiv.org/abs/1907.03954v1 |
| PDF | https://arxiv.org/pdf/1907.03954v1.pdf |
| PWC | https://paperswithcode.com/paper/signet-ring-cell-detection-with-a-semi |
| Repo | https://github.com/nisargshah1999/DigestPath2019 |
| Framework | none |
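
The self-training half of the framework amounts to a pseudo-labeling loop: train on the (incomplete) annotations, promote confident detections on unannotated regions to labels, and retrain. A generic sketch (the detector API and the confidence threshold are illustrative, not the paper's implementation):

```python
def self_training(detector, labeled, unlabeled, rounds=3, conf_thresh=0.9):
    """Generic self-training loop for detection with incomplete annotations."""
    for _ in range(rounds):
        detector.fit(labeled)                      # train on current labels
        pseudo = []
        for image in unlabeled:
            boxes = detector.predict(image)        # candidate detections
            keep = [b for b in boxes if b.score >= conf_thresh]
            if keep:                               # promote confident boxes
                pseudo.append((image, keep))
        labeled = labeled + pseudo                 # grow the training set
    return detector
```
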
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
| Title | Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing |
|---|---|
| Authors | Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin |
| Abstract | Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter $\beta$. One notorious training difficulty is that the KL term tends to vanish. In this paper, we study scheduling schemes for $\beta$, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing $\beta$ multiple times. This new procedure allows the progressive learning of more meaningful latent codes, by leveraging the informative representations of previous cycles as warm re-starts. The effectiveness of cyclical annealing is validated on a broad range of NLP tasks, including language modeling, dialog response generation and unsupervised language pre-training. |
| Tasks | Language Modelling |
| Published | 2019-03-25 |
| URL | https://arxiv.org/abs/1903.10145v3 |
| PDF | https://arxiv.org/pdf/1903.10145v3.pdf |
| PWC | https://paperswithcode.com/paper/cyclical-annealing-schedule-a-simple-approach |
| Repo | https://github.com/suvalaki/Deeper |
| Framework | tf |
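
The schedule itself is a few lines: divide the T training steps into M cycles and, within each cycle, ramp $\beta$ from 0 to 1 over the first fraction R of the cycle, then hold it at 1. A sketch consistent with the abstract (the linear ramp and R = 0.5 are common defaults, assumed here):

```python
def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """KL weight at a given step under a cyclical annealing schedule.

    Each cycle ramps beta linearly 0 -> 1 over the first `ratio` of the
    cycle, then holds beta = 1 for the remainder of the cycle.
    """
    period = total_steps / n_cycles
    tau = (step % period) / period          # position within current cycle
    return min(1.0, tau / ratio)

# with total_steps=1000 and 4 cycles, beta ramps over the first 125 steps
# of each 250-step cycle, then stays at 1 until the cycle restarts
betas = [cyclical_beta(t, total_steps=1000) for t in range(1000)]
```

At step t the VAE objective is then `recon + cyclical_beta(t, T) * kl`; each new cycle re-anneals $\beta$ so the decoder is retrained with the previous cycle's more informative latent codes as a warm re-start.
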
PyTorch-BigGraph: A Large-scale Graph Embedding System
| Title | PyTorch-BigGraph: A Large-scale Graph Embedding System |
|---|---|
| Authors | Adam Lerer, Ledell Wu, Jiajun Shen, Timothee Lacroix, Luca Wehrstedt, Abhijit Bose, Alex Peysakhovich |
| Abstract | Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges. |
| Tasks | Graph Embedding, graph partitioning, Link Prediction |
| Published | 2019-03-28 |
| URL | http://arxiv.org/abs/1903.12287v3 |
| PDF | http://arxiv.org/pdf/1903.12287v3.pdf |
| PWC | https://paperswithcode.com/paper/pytorch-biggraph-a-large-scale-graph |
| Repo | https://github.com/facebookresearch/PyTorch-BigGraph |
| Framework | pytorch |
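
PBG's scaling device is partition bucketing: nodes are split into P partitions, each edge lands in a (source-partition, destination-partition) bucket, and training sweeps the buckets so that only two partitions' embeddings need to be in memory at once. A minimal sketch of the bucketing logic (the toy partitioner is illustrative; PBG's actual scheduler also coordinates distributed workers and locking):

```python
from collections import defaultdict

def bucket_edges(edges, n_parts):
    """Group edges by (source partition, destination partition)."""
    part = lambda node: sum(map(ord, node)) % n_parts  # toy partitioner
    buckets = defaultdict(list)
    for src, dst in edges:
        buckets[(part(src), part(dst))].append((src, dst))
    return buckets

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
for (ps, pd), es in bucket_edges(edges, n_parts=2).items():
    # Train on bucket (ps, pd): only embeddings of partitions ps and pd
    # need to be loaded while these edges are processed.
    print((ps, pd), es)
```
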
Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients
| Title | Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients |
|---|---|
| Authors | Brenden K. Petersen |
| Abstract | Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of $\textit{symbolic regression}$. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are underexplored. We propose a framework that combines deep learning with symbolic regression via a simple idea: use a large model to search the space of small models. More specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions, and employ reinforcement learning to train the network to generate better-fitting expressions. Our algorithm significantly outperforms standard genetic programming-based symbolic regression in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate a priori constraints in situ, and a risk-seeking policy gradient formulation that optimizes for best-case performance instead of expected performance. |
| Tasks | |
| Published | 2019-12-10 |
| URL | https://arxiv.org/abs/1912.04871v2 |
| PDF | https://arxiv.org/pdf/1912.04871v2.pdf |
| PWC | https://paperswithcode.com/paper/deep-symbolic-regression-recovering |
| Repo | https://github.com/brendenpetersen/deep-symbolic-regression |
| Framework | none |
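
The risk-seeking policy gradient is the concrete novelty: each batch keeps only the sampled expressions whose reward exceeds the empirical (1 − ε) quantile and weights their log-probability gradients by the excess over that quantile, so the policy optimizes best-case rather than average performance. A NumPy sketch of the weighting (the RNN expression sampler itself is omitted):

```python
import numpy as np

def risk_seeking_weights(rewards, epsilon=0.05):
    """Per-sample weights for a risk-seeking policy gradient.

    Only samples in the top epsilon fraction of the batch contribute,
    weighted by how far they exceed the (1 - epsilon) reward quantile.
    """
    r = np.asarray(rewards, dtype=float)
    r_eps = np.quantile(r, 1.0 - epsilon)        # empirical quantile
    w = np.where(r >= r_eps, r - r_eps, 0.0)     # best-case samples only
    # gradient estimate: mean over i of w[i] * grad log p(expression_i)
    return w

print(risk_seeking_weights([0.1, 0.5, 0.9, 0.95], epsilon=0.25))
```
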
Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection
| Title | Recurrent Neural Networks with Stochastic Layers for Acoustic Novelty Detection |
|---|---|
| Authors | Duong Nguyen, Oliver S. Kirsebom, Fábio Frazão, Ronan Fablet, Stan Matwin |
| Abstract | In this paper, we adapt Recurrent Neural Networks with Stochastic Layers, which are the state-of-the-art for generating text, music and speech, to the problem of acoustic novelty detection. By integrating uncertainty into the hidden states, this type of network is able to learn the distribution of complex sequences. Because the learned distribution can be calculated explicitly in terms of probability, we can evaluate how likely an observation is and then detect low-probability events as novel. The model is robust, highly unsupervised, and end-to-end, and requires minimal preprocessing, feature engineering or hyperparameter tuning. An experiment on a benchmark dataset shows that our model outperforms the state-of-the-art acoustic novelty detectors. |
| Tasks | Acoustic Novelty Detection, Feature Engineering |
| Published | 2019-02-13 |
| URL | http://arxiv.org/abs/1902.04980v1 |
| PDF | http://arxiv.org/pdf/1902.04980v1.pdf |
| PWC | https://paperswithcode.com/paper/recurrent-neural-networks-with-stochastic |
| Repo | https://github.com/dnguyengithub/AudioNovelty |
| Framework | tf |
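
Because the stochastic-layer RNN yields an explicit sequence likelihood, detection reduces to thresholding: calibrate a log-probability cutoff on normal audio and flag windows that score below it. A model-agnostic sketch (the scoring lambda stands in for the trained network's log p(x)):

```python
import numpy as np

def detect_novelty(log_prob_fn, windows, calibration_windows, fpr=0.01):
    """Flag windows whose log-likelihood falls below a calibrated threshold."""
    cal = np.array([log_prob_fn(w) for w in calibration_windows])
    threshold = np.quantile(cal, fpr)       # e.g. ~1% false-positive rate
    scores = np.array([log_prob_fn(w) for w in windows])
    return scores < threshold               # True -> novel event

# toy stand-in for the trained model's log p(x)
log_p = lambda w: -np.sum(w ** 2)
normal = [np.random.randn(16) * 0.1 for _ in range(200)]
test = normal[:3] + [np.random.randn(16) * 3.0]
print(detect_novelty(log_p, test, normal))  # the loud final window is flagged
```
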
Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control
| Title | Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control |
|---|---|
| Authors | Jonathan Chang, Nishanth Kumar, Sean Hastings, Aaron Gokaslan, Diego Romeres, Devesh Jha, Daniel Nikovski, George Konidaris, Stefanie Tellex |
| Abstract | Robots need to learn skills that can not only generalize across similar problems but also be directed to a specific goal. Previous methods either train a new skill for every different goal or do not infer the specific target in the presence of multiple goals from visual data. We introduce an end-to-end method that represents targetable visuomotor skills as a goal-parameterized neural network policy. By training on an informative subset of available goals with the associated target parameters, we are able to learn a policy that can zero-shot generalize to previously unseen goals. We evaluate our method in a representative 2D simulation of a button-grid and on both button-pressing and peg-insertion tasks on two different physical arms. We demonstrate that our model trained on 33% of the possible goals is able to generalize to more than 90% of the targets in the scene for both simulation and robot experiments. We also successfully learn a mapping from target pixel coordinates to a robot policy to complete a specified goal. |
| Tasks | |
| Published | 2019-10-23 |
| URL | https://arxiv.org/abs/1910.10628v1 |
| PDF | https://arxiv.org/pdf/1910.10628v1.pdf |
| PWC | https://paperswithcode.com/paper/learning-deep-parameterized-skills-from |
| Repo | https://github.com/h2r/parameterized-imitation-learning |
| Framework | pytorch |
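
A goal-parameterized policy is a single network that takes the goal as an extra input, so directing the skill at a new target means changing an input vector rather than training a new skill. A minimal PyTorch sketch (layer sizes and the concatenation scheme are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class GoalParameterizedPolicy(nn.Module):
    """pi(action | observation, goal): one network serves all targets."""
    def __init__(self, obs_dim=64, goal_dim=2, act_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs, goal):
        # the goal parameter is simply an extra conditioning input
        return self.net(torch.cat([obs, goal], dim=-1))

policy = GoalParameterizedPolicy()
obs = torch.randn(1, 64)
goal = torch.tensor([[0.3, 0.7]])   # e.g. target pixel coordinates
print(policy(obs, goal).shape)      # torch.Size([1, 6])
```
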
Diving Deeper into Underwater Image Enhancement: A Survey
| Title | Diving Deeper into Underwater Image Enhancement: A Survey |
|---|---|
| Authors | Saeed Anwar, Chongyi Li |
| Abstract | The powerful representation capacity of deep learning has made it inevitable for the underwater image enhancement community to employ its potential. The exploration of deep underwater image enhancement networks is increasing over time, and hence a comprehensive survey is the need of the hour. In this paper, our main aim is two-fold: 1) to provide a comprehensive and in-depth survey of deep learning-based underwater image enhancement, covering various perspectives ranging from algorithms to open issues, and 2) to conduct a qualitative and quantitative comparison of the deep algorithms on diverse datasets to serve as a benchmark, which has barely been explored before. To be specific, we first introduce the underwater image formation models, which are the basis of training data synthesis and the design of deep networks, and are also helpful for understanding the process of underwater image degradation. Then, we review deep underwater image enhancement algorithms and present a glimpse of some aspects of the current networks, including network architecture, network parameters, training data, loss function, and training configurations. We also summarize the evaluation metrics and underwater image datasets. Following that, a systematic experimental comparison is carried out to analyze the robustness and effectiveness of deep algorithms. Meanwhile, we point out the shortcomings of current benchmark datasets and evaluation metrics. Finally, we discuss several unsolved open issues and suggest possible research directions. We hope that the efforts made in this paper will serve as a comprehensive reference for future research and a call for the further development of deep learning-based underwater image enhancement. |
| Tasks | Image Enhancement |
| Published | 2019-07-17 |
| URL | https://arxiv.org/abs/1907.07863v1 |
| PDF | https://arxiv.org/pdf/1907.07863v1.pdf |
| PWC | https://paperswithcode.com/paper/diving-deeper-into-underwater-image |
| Repo | https://github.com/saeed-anwar/UWSurvey |
| Framework | none |
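
The formation models the survey builds on are variants of I(x) = J(x)·t(x) + B·(1 − t(x)), with per-channel transmission t(x) = exp(−β·d(x)), since attenuation in water is strongly wavelength-dependent (red is absorbed fastest). A sketch of using the model to synthesize degraded training data (the coefficients here are illustrative, not values from the survey):

```python
import numpy as np

def degrade_underwater(J, depth, beta=(0.8, 0.3, 0.1), B=(0.1, 0.4, 0.5)):
    """I = J * t + B * (1 - t), with per-channel t = exp(-beta * depth).

    J     : clean RGB image, H x W x 3 in [0, 1]
    depth : scene depth map, H x W (meters)
    beta  : per-channel attenuation (red absorbed fastest underwater)
    B     : background (veiling) light, typically blue-green
    """
    t = np.exp(-np.asarray(beta) * depth[..., None])   # H x W x 3
    return J * t + np.asarray(B) * (1.0 - t)

J = np.random.rand(4, 4, 3)
depth = np.full((4, 4), 5.0)
print(degrade_underwater(J, depth).shape)  # (4, 4, 3)
```
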
BP-Transformer: Modelling Long-Range Context via Binary Partitioning
| Title | BP-Transformer: Modelling Long-Range Context via Binary Partitioning |
|---|---|
| Authors | Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang |
| Abstract | The Transformer model is widely successful on many natural language processing tasks. However, the quadratic complexity of self-attention limits its application to long text. In this paper, adopting a fine-to-coarse attention mechanism on multi-scale spans via binary partitioning (BP), we propose BP-Transformer (BPT for short). BPT yields $O(k\cdot n\log (n/k))$ connections, where $k$ is a hyperparameter to control the density of attention. BPT strikes a good balance between computational complexity and model capacity. A series of experiments on text classification, machine translation and language modeling shows that BPT outperforms previous self-attention models on long text. Our code, hyperparameters and CUDA kernels for sparse attention are available in PyTorch. |
| Tasks | Language Modelling, Machine Translation, Sentiment Analysis, Text Classification |
| Published | 2019-11-11 |
| URL | https://arxiv.org/abs/1911.04070v1 |
| PDF | https://arxiv.org/pdf/1911.04070v1.pdf |
| PWC | https://paperswithcode.com/paper/bp-transformer-modelling-long-range-context |
| Repo | https://github.com/yzh119/BPT |
| Framework | pytorch |
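
The $O(k\cdot n\log (n/k))$ figure falls out of the binary partition: every token attends to O(k) multi-scale span nodes per tree level, and there are roughly log(n/k) levels. A sketch that enumerates the span nodes and compares edge counts against full attention (the paper's exact neighbor rule differs in detail; this only shows the structure and the scaling):

```python
import math

def binary_spans(lo, hi, out=None):
    """Multi-scale span nodes of a binary partition of [lo, hi)."""
    out = [] if out is None else out
    out.append((lo, hi))
    if hi - lo > 1:
        mid = (lo + hi) // 2
        binary_spans(lo, mid, out)
        binary_spans(mid, hi, out)
    return out

print(binary_spans(0, 8))  # fine-to-coarse spans a token can attend to

# Each token attends to O(k) spans per level, so the edge count scales as
# O(k * n * log(n / k)) instead of full self-attention's O(n^2):
n, k = 4096, 4
print("full:", n * n, "approx BPT:", k * n * math.ceil(math.log2(n / k)))
```
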
Style Generator Inversion for Image Enhancement and Animation
| Title | Style Generator Inversion for Image Enhancement and Animation |
|---|---|
| Authors | Aviv Gabbay, Yedid Hoshen |
| Abstract | One of the main motivations for training high quality image generative models is their potential use as tools for image manipulation. Recently, generative adversarial networks (GANs) have been able to generate images of remarkable quality. Unfortunately, adversarially-trained unconditional generator networks have not been successful as image priors. One of the main requirements for a network to act as a generative image prior is being able to generate every possible image from the target distribution. Adversarial learning often experiences mode-collapse, which manifests in generators that cannot generate some modes of the target distribution. Another requirement, often not satisfied, is invertibility, i.e., having an efficient way of finding a valid input latent code given a required output image. In this work, we show that, unlike earlier GANs, the very recently proposed style-generators are quite easy to invert. We use this important observation to propose style generators as general purpose image priors. We show that style generators outperform other GANs as well as Deep Image Prior as priors for image enhancement tasks. The latent space spanned by style-generators satisfies linear identity-pose relations. This linearity, combined with invertibility, allows us to animate still facial images without supervision. Extensive experiments are performed to support the main contributions of this paper. |
| Tasks | Image Enhancement |
| Published | 2019-06-05 |
| URL | https://arxiv.org/abs/1906.11880v1 |
| PDF | https://arxiv.org/pdf/1906.11880v1.pdf |
| PWC | https://paperswithcode.com/paper/style-generator-inversion-for-image |
| Repo | https://github.com/avivga/style-image-prior |
| Framework | tf |
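
Inversion here means optimization in latent space: freeze the generator and fit a latent code to reconstruct the target image. A generic PyTorch sketch of that loop (the toy generator and plain pixel loss are stand-ins; the paper inverts a style-based generator and uses richer reconstruction terms):

```python
import torch

def invert(generator, target, z_dim=512, steps=500, lr=0.05):
    """Find z such that generator(z) ~ target, with the generator frozen."""
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((generator(z) - target) ** 2)  # plain pixel loss
        loss.backward()
        opt.step()
    return z.detach()

# toy "generator" and target just to show the loop runs
G = torch.nn.Linear(512, 3 * 8 * 8)
target = torch.randn(1, 3 * 8 * 8)
z_hat = invert(G, target, steps=50)
```

Once inversion works, edits made in latent space (e.g. along the linear identity-pose directions the abstract mentions) can be decoded back to images, which is what enables the unsupervised animation application.
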
Learning Space Partitions for Nearest Neighbor Search
| Title | Learning Space Partitions for Nearest Neighbor Search |
|---|---|
| Authors | Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner |
| Abstract | Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces [Andoni, Naor, Nikolov, Razenshteyn, Waingarten STOC 2018, FOCS 2018], we develop a new framework for building space partitions reducing the problem to \emph{balanced graph partitioning} followed by \emph{supervised classification}. We instantiate this general approach with the KaHIP graph partitioner [Sanders, Schulz SEA 2013] and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS, our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods. |
| Tasks | graph partitioning, Quantization |
| Published | 2019-01-24 |
| URL | https://arxiv.org/abs/1901.08544v3 |
| PDF | https://arxiv.org/pdf/1901.08544v3.pdf |
| PWC | https://paperswithcode.com/paper/learning-sublinear-time-indexing-for-nearest |
| Repo | https://github.com/twistedcubic/learn-to-hash |
| Framework | pytorch |
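
The pipeline has two learned stages: partition the dataset's kNN graph into balanced bins, then train a classifier from points to bins; a query is answered by searching only the few bins the classifier ranks highest. A sketch of the index and query path (the partition labels are a stand-in for KaHIP's output, and logistic regression stands in for the neural classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_index(points, bin_of_point, classifier):
    """bin_of_point: labels from a balanced graph partitioner (e.g. KaHIP
    on the kNN graph); classifier: any model with fit/predict_proba."""
    classifier.fit(points, bin_of_point)
    bins = {}
    for i, b in enumerate(bin_of_point):
        bins.setdefault(b, []).append(i)
    return classifier, bins

def query(q, points, classifier, bins, n_probe=2):
    """Search only the n_probe bins the classifier ranks highest for q."""
    probs = classifier.predict_proba(q[None, :])[0]
    candidates = [i for b in np.argsort(probs)[::-1][:n_probe]
                  for i in bins[b]]
    dists = np.linalg.norm(points[candidates] - q, axis=1)
    return candidates[int(np.argmin(dists))]

pts = np.random.randn(200, 16)
labels = (pts[:, 0] > 0).astype(int)   # stand-in for the partitioner output
clf, bins = build_index(pts, labels, LogisticRegression(max_iter=1000))
print(query(pts[0] + 0.01, pts, clf, bins))   # nearest neighbor's index
```
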
Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs
| Title | Linguistic Versus Latent Relations for Modeling Coherent Flow in Paragraphs |
|---|---|
| Authors | Dongyeop Kang, Hiroaki Hayashi, Alan W Black, Eduard Hovy |
| Abstract | Generating a long, coherent text such as a paragraph requires a high-level control of different levels of relations between sentences (e.g., tense, coreference). We call such a logical connection between sentences a (paragraph) flow. In order to produce a coherent flow of text, we explore two forms of intersentential relations in a paragraph: one is a human-created linguistic relation that forms a structure (e.g., a discourse tree), and the other is a relation learned from the latent representations of the sentences themselves. Our two proposed models incorporate each form of relation into document-level language models: the former is a supervised model that jointly learns a language model and discourse relation prediction, and the latter is an unsupervised model that is hierarchically conditioned by a recurrent neural network (RNN) over the latent information. Our proposed models with both forms of relations outperform the baselines on the partially conditioned paragraph generation task. Our code and data are publicly available. |
| Tasks | Language Modelling |
| Published | 2019-08-30 |
| URL | https://arxiv.org/abs/1908.11790v1 |
| PDF | https://arxiv.org/pdf/1908.11790v1.pdf |
| PWC | https://paperswithcode.com/paper/linguistic-versus-latent-relations-for |
| Repo | https://github.com/dykang/flownet |
| Framework | tf |
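
The supervised variant is a standard multi-task setup: one document encoder feeding both a language-modeling head and a discourse-relation head, trained jointly. A schematic PyTorch loss (the head shapes and the weighting λ are assumptions, not the paper's configuration):

```python
import torch
import torch.nn.functional as F

def joint_loss(lm_logits, next_tokens, rel_logits, rel_labels, lam=0.5):
    """Language modeling + discourse-relation prediction, trained jointly."""
    lm = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)),
                         next_tokens.view(-1))
    rel = F.cross_entropy(rel_logits, rel_labels)
    return lm + lam * rel

lm_logits = torch.randn(2, 10, 100)       # batch x time x vocab
next_tokens = torch.randint(0, 100, (2, 10))
rel_logits = torch.randn(2, 16)           # toy: 16 discourse relations
rel_labels = torch.randint(0, 16, (2,))
print(joint_loss(lm_logits, next_tokens, rel_logits, rel_labels).item())
```
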