February 1, 2020

Paper Group AWR 233

Deep Hyperedges: a Framework for Transductive and Inductive Learning on Hypergraphs


Title	Deep Hyperedges: a Framework for Transductive and Inductive Learning on Hypergraphs
Authors	Josh Payne
Abstract	From social networks to protein complexes to disease genomes to visual data, hypergraphs are everywhere. However, the scope of research studying deep learning on hypergraphs is still quite sparse and nascent, as there has not yet existed an effective, unified framework for using hyperedge and vertex embeddings jointly in the hypergraph context, despite a large body of prior work that has shown the utility of deep learning over graphs and sets. Building upon these recent advances, we propose Deep Hyperedges (DHE), a modular framework that jointly uses contextual and permutation-invariant vertex membership properties of hyperedges in hypergraphs to perform classification and regression in transductive and inductive learning settings. In our experiments, we use a novel random walk procedure and show that our model achieves and, in most cases, surpasses state-of-the-art performance on benchmark datasets. Additionally, we study our framework’s performance on a variety of diverse, non-standard hypergraph datasets and propose several avenues of future work to further enhance DHE.
Tasks	hyperedge classification, hypergraph embedding, Node Classification
Published	2019-10-07
URL	https://arxiv.org/abs/1910.02633v1
PDF	https://arxiv.org/pdf/1910.02633v1.pdf
PWC	https://paperswithcode.com/paper/deep-hyperedges-a-framework-for-transductive
Repo	https://github.com/Josh-Payne/deep-hyperedges
Framework	tf

Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions


Title	Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions
Authors	Osaid Rehman Nasir, Shailesh Kumar Jha, Manraj Singh Grover, Yi Yu, Ajit Kumar, Rajiv Ratn Shah
Abstract	Powerful generative adversarial networks (GAN) have been developed to automatically synthesize realistic images from text. However, most existing tasks are limited to generating simple images such as flowers from captions. In this work, we extend this problem to the less addressed domain of face generation from fine-grained textual descriptions of faces, e.g., “A person has curly hair, oval face, and mustache”. We are motivated by the potential of automated face generation to impact and assist critical tasks such as criminal face reconstruction. Since current datasets for the task are either very small or do not contain captions, we generate captions for images in the CelebA dataset by creating an algorithm to automatically convert a list of attributes to a set of captions. We then model the highly multi-modal problem of text to face generation as learning the conditional distribution of faces (conditioned on text) in the same latent space. We utilize the current state-of-the-art GAN (DC-GAN with GAN-CLS loss) for learning conditional multi-modality. The presence of more fine-grained details and variable length of the captions makes the problem easier for a user but more difficult to handle compared to the other text-to-image tasks. We flipped the labels for real and fake images and added noise in the discriminator. Generated images for diverse textual descriptions show promising results. In the end, we show how the widely used inception score is not a good metric to evaluate the performance of generative models used for synthesizing faces from text.
Tasks	Face Generation, Face Reconstruction
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11378v1
PDF	https://arxiv.org/pdf/1911.11378v1.pdf
PWC	https://paperswithcode.com/paper/text2facegan-face-generation-from-fine
Repo	https://github.com/midas-research/text2facegan
Framework	tf

Training Neural Networks for and by Interpolation


Title	Training Neural Networks for and by Interpolation
Authors	Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
Abstract	The majority of modern deep learning models are able to interpolate the data: the empirical loss can be driven near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning. Specifically, we use it to compute an adaptive learning-rate given a stochastic gradient direction. This results in the Adaptive Learning-rates for Interpolation with Gradients (ALI-G) algorithm. ALI-G retains the advantages of SGD, which are low computational cost and provable convergence in the convex setting. But unlike SGD, the learning-rate of ALI-G can be computed inexpensively in closed-form and does not require a manual schedule. We provide a detailed analysis of ALI-G in the stochastic convex setting with explicit convergence rates. In order to obtain good empirical performance in deep learning, we extend the algorithm to use a maximal learning-rate, which gives a single hyper-parameter to tune. We show that employing such a maximal learning-rate has an intuitive proximal interpretation and preserves all convergence guarantees. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. We empirically show that ALI-G outperforms adaptive gradient methods such as Adam, and provides comparable performance with SGD, although SGD benefits from manual learning rate schedules. We release PyTorch and Tensorflow implementations of ALI-G as standalone optimizers that can be used as a drop-in replacement in existing code (code available at https://github.com/oval-group/ali-g ).
Tasks
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05661v1
PDF	https://arxiv.org/pdf/1906.05661v1.pdf
PWC	https://paperswithcode.com/paper/training-neural-networks-for-and-by
Repo	https://github.com/oval-group/ali-g
Framework	pytorch
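
The closed-form learning rate described in the ALI-G abstract can be illustrated with a small sketch. It assumes the step size is the current loss divided by the squared gradient norm (plus a small constant), clipped at the maximal learning rate; that formula, the function names, and the `delta` default are assumptions made for illustration and are not taken from the released optimizers.

```python
def alig_step_size(loss, grad, max_lr, delta=1e-5):
    """Clipped step size: min(loss / (||grad||^2 + delta), max_lr).

    Assumed form of the closed-form rule; `delta` is a hypothetical
    small constant that keeps the division stable.
    """
    grad_sq_norm = sum(g * g for g in grad)
    return min(loss / (grad_sq_norm + delta), max_lr)


def alig_update(params, loss, grad, max_lr, delta=1e-5):
    """One SGD-style update that uses the adaptive step size."""
    lr = alig_step_size(loss, grad, max_lr, delta)
    new_params = [p - lr * g for p, g in zip(params, grad)]
    return new_params, lr


# Toy problem whose minimum loss is zero (the interpolation setting):
# f(w) = 0.5 * (w - 3)^2, with gradient (w - 3).
w = [0.0]
for _ in range(20):
    loss = 0.5 * (w[0] - 3.0) ** 2
    grad = [w[0] - 3.0]
    w, lr = alig_update(w, loss, grad, max_lr=1.0)
```

On this toy loss the unclipped step size is close to 0.5, so each update roughly halves the distance to the minimizer; the maximal learning rate only takes effect when the loss is large relative to the gradient norm.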

Synthetic Ground Truth Generation for Evaluating Generative Policy Models


Title	Synthetic Ground Truth Generation for Evaluating Generative Policy Models
Authors	Daniel Cunnington, Graham White, Geeth de Mel
Abstract	Generative Policy-based Models (GPMs) aim to enable a coalition of systems, be they devices or services, to adapt according to contextual changes such as environmental factors, user preferences and different tasks whilst adhering to various constraints and regulations as directed by a managing party or the collective vision of the coalition. Recent developments have proposed new architectures to realize the potential of GPMs but as the complexity of systems and their associated requirements increases, there is an emerging requirement to have scenarios and associated datasets to realistically evaluate GPMs with respect to the properties of the operating environment, be it the future battlespace or an autonomous organization. In order to address this requirement, in this paper, we present a method of applying an agile knowledge representation framework to model requirements, both individualistic and collective, that enables synthetic generation of ground truth data such that advanced GPMs can be evaluated robustly in complex environments. We also release conceptual models, annotated datasets, as well as means to extend the data generation approach so that similar datasets can be developed for varying complexities and different situations.
Tasks
Published	2019-04-26
URL	http://arxiv.org/abs/1904.13233v1
PDF	http://arxiv.org/pdf/1904.13233v1.pdf
PWC	https://paperswithcode.com/paper/synthetic-ground-truth-generation-for
Repo	https://github.com/dais-ita/coalition-data
Framework	none

Accelerating Deep Learning by Focusing on the Biggest Losers


Title	Accelerating Deep Learning by Focusing on the Biggest Losers
Authors	Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai
Abstract	This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example’s forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of computationally-expensive backpropagation steps performed, Selective-Backprop accelerates training. Evaluation on CIFAR10, CIFAR100, and SVHN, across a variety of modern image models, shows that Selective-Backprop converges to target error rates up to 3.5x faster than with standard SGD and between 1.02–1.8x faster than a state-of-the-art importance sampling approach. Further acceleration of 26% can be achieved by using stale forward pass results for selection, thus also skipping forward passes of low priority examples.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.00762v1
PDF	https://arxiv.org/pdf/1910.00762v1.pdf
PWC	https://paperswithcode.com/paper/accelerating-deep-learning-by-focusing-on-the
Repo	https://github.com/Manuscrit/SelectiveBackPropagation
Framework	pytorch
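
The selection step described in the Selective-Backprop abstract can be sketched as follows. The abstract only says that high-loss examples are prioritized using the forward-pass output; the specific rule used here (select with probability equal to the loss percentile among recent losses, raised to a power `beta`) and the class and parameter names are assumptions made for illustration, not the linked repository's API.

```python
import random
from collections import deque


class LossBasedSelector:
    """Decides, from the forward-pass loss, whether to backpropagate."""

    def __init__(self, history_size=1024, beta=2.0, seed=0):
        self.history = deque(maxlen=history_size)  # recent losses
        self.beta = beta                           # selectivity (assumed knob)
        self.rng = random.Random(seed)

    def select_probability(self, loss):
        """Loss percentile among recent losses, raised to the power beta."""
        if not self.history:
            return 1.0  # nothing to compare against yet: always train
        rank = sum(1 for h in self.history if h <= loss)
        return (rank / len(self.history)) ** self.beta

    def should_backprop(self, loss):
        """Record the loss and sample the keep/skip decision."""
        p = self.select_probability(loss)
        self.history.append(loss)
        return self.rng.random() < p


selector = LossBasedSelector(history_size=100, beta=2.0)
selector.history.extend(float(i) for i in range(100))
p_median = selector.select_probability(49.5)  # median-loss example
```

In a training loop, the forward pass would run on every example, and the backward pass only on those for which `should_backprop` returns true; a larger `beta` skips more low-loss examples.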

Portuguese Named Entity Recognition using BERT-CRF


Title	Portuguese Named Entity Recognition using BERT-CRF
Authors	Fábio Souza, Rodrigo Nogueira, Roberto Lotufo
Abstract	Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering. It has been shown that leveraging pre-trained language models improves the overall performance on many tasks and is highly beneficial when labeled data is scarce. In this work, we train Portuguese BERT models and apply a BERT-CRF architecture to the NER task on the Portuguese language, combining the transfer capabilities of BERT with the structured predictions of CRF. We explore feature-based and fine-tuning training strategies for the BERT model. Our fine-tuning approach obtains new state-of-the-art results on the HAREM I dataset, improving the F1-score by 1 point on the selective scenario (5 NE classes) and by 4 points on the total scenario (10 NE classes).
Tasks	Named Entity Recognition, Question Answering
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10649v2
PDF	https://arxiv.org/pdf/1909.10649v2.pdf
PWC	https://paperswithcode.com/paper/portuguese-named-entity-recognition-using-1
Repo	https://github.com/neuralmind-ai/portuguese-bert
Framework	pytorch

Question Answering as an Automatic Evaluation Metric for News Article Summarization


Title	Question Answering as an Automatic Evaluation Metric for News Article Summarization
Authors	Matan Eyal, Tal Baumel, Michael Elhadad
Abstract	Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic, evaluation metric for this task, Answering Performance for Evaluation of Summaries. APES utilizes recent progress in the field of reading-comprehension to quantify the ability of a summary to answer a set of manually created questions regarding central entities in the source article. We first analyze the strength of this metric by comparing it to known manual evaluation metrics. We then present an end-to-end neural abstractive model that maximizes APES, while increasing ROUGE scores to competitive results.
Tasks	Question Answering, Reading Comprehension
Published	2019-06-02
URL	https://arxiv.org/abs/1906.00318v1
PDF	https://arxiv.org/pdf/1906.00318v1.pdf
PWC	https://paperswithcode.com/paper/190600318
Repo	https://github.com/mataney/APES-optimizer
Framework	pytorch

Generative Modeling by Estimating Gradients of the Data Distribution


Title	Generative Modeling by Estimating Gradients of the Data Distribution
Authors	Yang Song, Stefano Ermon
Abstract	We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields of gradients of the perturbed data distribution for all noise levels. For sampling, we propose an annealed Langevin dynamics where we use gradients corresponding to gradually decreasing noise levels as the sampling process gets closer to the data manifold. Our framework allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons. Our models produce samples comparable to GANs on MNIST, CelebA and CIFAR-10 datasets, achieving a new state-of-the-art inception score of 8.87 on CIFAR-10. Additionally, we demonstrate that our models learn effective representations via image inpainting experiments.
Tasks	Image Generation, Image Inpainting
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05600v2
PDF	https://arxiv.org/pdf/1907.05600v2.pdf
PWC	https://paperswithcode.com/paper/generative-modeling-by-estimating-gradients
Repo	https://github.com/Lornatang/PyTorch-NCSN
Framework	pytorch
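
The annealed Langevin sampling loop described in the abstract can be sketched on a one-dimensional toy problem. Instead of a learned score network, the sketch uses the exact score of a Gaussian data distribution perturbed with Gaussian noise, and it assumes the step size at each noise level scales as `eps * (sigma / sigma_min) ** 2`; the step-size rule, the default values, and all names are assumptions made for illustration.

```python
import math
import random


def gaussian_score(x, sigma, mu=2.0, data_std=0.5):
    """Exact score of N(mu, data_std^2) data perturbed by N(0, sigma^2) noise."""
    return -(x - mu) / (data_std ** 2 + sigma ** 2)


def geometric_sigmas(start=1.0, end=0.01, levels=10):
    """Noise levels decreasing geometrically from start to end."""
    ratio = (end / start) ** (1.0 / (levels - 1))
    return [start * ratio ** i for i in range(levels)]


def annealed_langevin(score_fn, sigmas, steps_per_level=100, eps=2e-5, rng=None):
    """Run Langevin dynamics at each noise level, from largest to smallest."""
    rng = rng or random.Random()
    x = rng.uniform(-5.0, 5.0)  # arbitrary starting point
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # assumed step-size rule
        for _ in range(steps_per_level):
            noise = rng.gauss(0.0, 1.0)
            x = x + 0.5 * alpha * score_fn(x, sigma) + math.sqrt(alpha) * noise
    return x


rng = random.Random(0)
sigmas = geometric_sigmas()
samples = [annealed_langevin(gaussian_score, sigmas, rng=rng) for _ in range(200)]
sample_mean = sum(samples) / len(samples)
```

Large noise levels use large steps, so the chain moves quickly toward the bulk of the data; the later, small-noise levels only refine the sample. In the paper's setting `score_fn` would be a trained network conditioned on the noise level rather than a closed-form expression.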

TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning


Title	TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning
Authors	Sung Whan Yoon, Jun Seo, Jaekyun Moon
Abstract	Handling previously unseen tasks after given only a few training examples continues to be a tough challenge in machine learning. We propose TapNets, neural networks augmented with task-adaptive projection for improved few-shot learning. Here, employing a meta-learning strategy with episode-based training, a network and a set of per-class reference vectors are learned across widely varying tasks. At the same time, for every episode, features in the embedding space are linearly projected into a new space as a form of quick task-specific conditioning. The training loss is obtained based on a distance metric between the query and the reference vectors in the projection space. Excellent generalization results in this way. When tested on the Omniglot, miniImageNet and tieredImageNet datasets, we obtain state of the art classification accuracies under various few-shot scenarios.
Tasks	Few-Shot Learning, Meta-Learning, Omniglot
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06549v2
PDF	https://arxiv.org/pdf/1905.06549v2.pdf
PWC	https://paperswithcode.com/paper/tapnet-neural-network-augmented-with-task
Repo	https://github.com/istarjun/TapNet
Framework	none

Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering


Title	Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
Authors	Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum
Abstract	This paper introduces a new framework for open-domain question answering in which the retriever and the reader iteratively interact with each other. The framework is agnostic to the architecture of the machine reading model, only requiring access to the token-level hidden representations of the reader. The retriever uses fast nearest neighbor search to scale to corpora containing millions of paragraphs. A gated recurrent unit updates the query at each step conditioned on the state of the reader and the reformulated query is used to re-rank the paragraphs by the retriever. We conduct analysis and show that iterative interaction helps in retrieving informative paragraphs from the corpus. Finally, we show that our multi-step-reasoning framework brings consistent improvement when applied to two widely used reader architectures DrQA and BiDAF on various large open-domain datasets — TriviaQA-unfiltered, QuasarT, SearchQA, and SQuAD-Open.
Tasks	Open-Domain Question Answering, Question Answering, Reading Comprehension
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05733v1
PDF	https://arxiv.org/pdf/1905.05733v1.pdf
PWC	https://paperswithcode.com/paper/multi-step-retriever-reader-interaction-for-1
Repo	https://github.com/rajarshd/Multi-Step-Reasoning
Framework	pytorch

Neural Academic Paper Generation


Title	Neural Academic Paper Generation
Authors	Samet Demir, Uras Mutlu, Özgur Özdemir
Abstract	In this work, we tackle the problem of structured text generation, specifically academic paper generation in LaTeX, inspired by the surprisingly good results of basic character-level language models. Our motivation is using more recent and advanced methods of language modeling on a more complex dataset of LaTeX source files to generate realistic academic papers. Our first contribution is preparing a dataset with LaTeX source files on recent open-source computer vision papers. Our second contribution is experimenting with recent methods of language modeling and text generation such as Transformer and Transformer-XL to generate consistent LaTeX code. We report cross-entropy and bits-per-character (BPC) results of the trained models, and we also discuss interesting points on some examples of the generated LaTeX code.
Tasks	Language Modelling, Paper generation, Text Generation
Published	2019-12-02
URL	https://arxiv.org/abs/1912.01982v1
PDF	https://arxiv.org/pdf/1912.01982v1.pdf
PWC	https://paperswithcode.com/paper/neural-academic-paper-generation
Repo	https://github.com/inzva/fake-academic-paper-generation
Framework	pytorch
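
The two metrics named in the abstract are closely related: bits-per-character is the per-character cross-entropy expressed in base 2. A minimal sketch of the conversion follows, assuming the cross-entropy is averaged per character and measured in nats (the abstract does not state the unit); the function names are hypothetical.

```python
import math


def nats_to_bits_per_char(cross_entropy_nats):
    """Convert a per-character cross-entropy in nats to bits-per-character."""
    return cross_entropy_nats / math.log(2)


def bits_per_char(char_probs):
    """Average negative log2-probability the model gave to each observed character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


# A model that assigns probability 0.25 and 0.5 to two observed characters
# spends 2 bits and 1 bit on them respectively.
example_bpc = bits_per_char([0.25, 0.5])
```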

Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs)


Title	Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs)
Authors	Viraj Bangari, Bicky A. Marquez, Heidi B. Miller, Alexander N. Tait, Mitchell A. Nahmias, Thomas Ferreira de Lima, Hsuan-Tung Peng, Paul R. Prucnal, Bhavin J. Shastri
Abstract	Convolutional Neural Networks (CNNs) are powerful and highly ubiquitous tools for extracting features from large datasets for applications such as computer vision and natural language processing. However, a convolution is a computationally expensive operation in digital electronics. In contrast, neuromorphic photonic systems, which have experienced a surge of interest over the last few years, promise higher bandwidth and energy efficiencies for neural network training and inference. Neuromorphic photonics exploits the advantages of optical electronics, including the ease of analog processing, and busing multiple signals on a single waveguide at the speed of light. Here, we propose a Digital Electronic and Analog Photonic (DEAP) CNN hardware architecture that has the potential to be 2.8 to 14 times faster while maintaining the same power usage as current state-of-the-art GPUs.
Tasks
Published	2019-04-23
URL	https://arxiv.org/abs/1907.01525v1
PDF	https://arxiv.org/pdf/1907.01525v1.pdf
PWC	https://paperswithcode.com/paper/digital-electronics-and-analog-photonics-for
Repo	https://github.com/Shastri-Lab/DEAP
Framework	none

Interpretations are useful: penalizing explanations to align neural networks with prior knowledge


Title	Interpretations are useful: penalizing explanations to align neural networks with prior knowledge
Authors	Laura Rieger, Chandan Singh, W. James Murdoch, Bin Yu
Abstract	For an explanation of a deep learning model to be effective, it must provide both insight into a model and suggest a corresponding action in order to achieve some objective. Too often, the litany of proposed explainable deep learning methods stop at the first step, providing practitioners with insight into a model, but no way to act on it. In this paper, we propose contextual decomposition explanation penalization (CDEP), a method which enables practitioners to leverage existing explanation methods in order to increase the predictive accuracy of deep learning models. In particular, when shown that a model has incorrectly assigned importance to some features, CDEP enables practitioners to correct these errors by directly regularizing the provided explanations. Using explanations provided by contextual decomposition (CD) (Murdoch et al., 2018), we demonstrate the ability of our method to increase performance on an array of toy and real datasets.
Tasks
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13584v2
PDF	https://arxiv.org/pdf/1909.13584v2.pdf
PWC	https://paperswithcode.com/paper/interpretations-are-useful-penalizing
Repo	https://github.com/csinva/hierarchical-dnn-interpretations
Framework	pytorch

EduBERT: Pretrained Deep Language Models for Learning Analytics


Title	EduBERT: Pretrained Deep Language Models for Learning Analytics
Authors	Benjamin Clavié, Kobi Gal
Abstract	The use of large pretrained neural networks to create contextualized word embeddings has drastically improved performance on several natural language processing (NLP) tasks. These computationally expensive models have begun to be applied to domain-specific NLP tasks such as re-hospitalization prediction from clinical notes. This paper demonstrates that using large pretrained models produces excellent results on common learning analytics tasks. Pre-training deep language models using student forum data from a wide array of online courses improves performance beyond the state of the art on three text classification tasks. We also show that a smaller, distilled version of our model produces the best results on two of the three tasks while limiting computational cost. We make both models available to the research community at large.
Tasks	Text Classification, Word Embeddings
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00690v1
PDF	https://arxiv.org/pdf/1912.00690v1.pdf
PWC	https://paperswithcode.com/paper/edubert-pretrained-deep-language-models-for
Repo	https://github.com/bclavie/edubert
Framework	pytorch

Cycle-consistent Conditional Adversarial Transfer Networks


Title	Cycle-consistent Conditional Adversarial Transfer Networks
Authors	Jingjing Li, Erpeng Chen, Zhengming Ding, Lei Zhu, Ke Lu, Zi Huang
Abstract	Domain adaptation investigates the problem of cross-domain knowledge transfer where the labeled source domain and unlabeled target domain have distinctive data distributions. Recently, adversarial training has been successfully applied to domain adaptation and achieved state-of-the-art performance. However, there is still a fatal weakness in current adversarial models, which arises from the equilibrium challenge of adversarial training. Specifically, although most existing methods are able to confuse the domain discriminator, they cannot guarantee that the source domain and target domain are sufficiently similar. In this paper, we propose a novel approach named cycle-consistent conditional adversarial transfer networks (3CATN) to handle this issue. Our approach takes care of the domain alignment by leveraging adversarial training. Specifically, we condition the adversarial networks with the cross-covariance of learned features and classifier predictions to capture the multimodal structures of data distributions. However, since the classifier predictions are not certain, a strong condition with the predictions is risky when the predictions are not accurate. We, therefore, further propose that the truly domain-invariant features should be able to be translated from one domain to the other. To this end, we introduce two feature translation losses and one cycle-consistent loss into the conditional adversarial domain adaptation networks. Extensive experiments on both classical and large-scale datasets verify that our model is able to outperform previous state-of-the-art methods with significant improvements.
Tasks	Domain Adaptation, Transfer Learning
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07618v1
PDF	https://arxiv.org/pdf/1909.07618v1.pdf
PWC	https://paperswithcode.com/paper/cycle-consistent-conditional-adversarial
Repo	https://github.com/lijin118/3CATN
Framework	pytorch