Paper Group AWR 233
Deep Hyperedges: a Framework for Transductive and Inductive Learning on Hypergraphs
Title | Deep Hyperedges: a Framework for Transductive and Inductive Learning on Hypergraphs |
Authors | Josh Payne |
Abstract | From social networks to protein complexes to disease genomes to visual data, hypergraphs are everywhere. However, research on deep learning for hypergraphs is still sparse and nascent: there has not yet been an effective, unified framework for using hyperedge and vertex embeddings jointly in the hypergraph context, despite a large body of prior work showing the utility of deep learning over graphs and sets. Building upon these recent advances, we propose Deep Hyperedges (DHE), a modular framework that jointly uses contextual and permutation-invariant vertex membership properties of hyperedges in hypergraphs to perform classification and regression in transductive and inductive learning settings. In our experiments, we use a novel random walk procedure and show that our model achieves and, in most cases, surpasses state-of-the-art performance on benchmark datasets. Additionally, we study our framework’s performance on a variety of non-standard hypergraph datasets and propose several avenues of future work to further enhance DHE. |
Tasks | Hyperedge Classification, Hypergraph Embedding, Node Classification |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02633v1 |
PDF | https://arxiv.org/pdf/1910.02633v1.pdf |
PWC | https://paperswithcode.com/paper/deep-hyperedges-a-framework-for-transductive |
Repo | https://github.com/Josh-Payne/deep-hyperedges |
Framework | tf |
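The abstract mentions a random walk procedure but does not describe it. As a rough illustration only, the sketch below shows a generic walk over hyperedges: hop from the current hyperedge to one of its member vertices, then to any hyperedge containing that vertex. The function name, the dictionary layout, and the toy hypergraph are assumptions made for this example and are not taken from the linked repository.

```python
import random


def hyperedge_random_walk(hyperedges, start, length, rng):
    """Generic walk over hyperedges (illustrative, not the paper's procedure)."""
    # Invert the membership map once: vertex -> hyperedges that contain it.
    incident = {}
    for edge_id, members in hyperedges.items():
        for vertex in members:
            incident.setdefault(vertex, []).append(edge_id)

    walk = [start]
    while len(walk) < length:
        # Pick a member vertex of the current hyperedge (sorted for reproducibility).
        vertex = rng.choice(sorted(hyperedges[walk[-1]]))
        # Move to a hyperedge containing that vertex (may be the same one).
        walk.append(rng.choice(incident[vertex]))
    return walk


# Toy hypergraph: hyperedge id -> set of member vertices (hypothetical data).
hyperedges = {
    "e1": {"a", "b", "c"},
    "e2": {"c", "d"},
    "e3": {"d", "e", "a"},
}
walk = hyperedge_random_walk(hyperedges, "e1", 6, random.Random(0))
```

Walks like this could serve as context sequences for an embedding model, with consecutive hyperedges always sharing at least one vertex.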
Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions
Title | Text2FaceGAN: Face Generation from Fine Grained Textual Descriptions |
Authors | Osaid Rehman Nasir, Shailesh Kumar Jha, Manraj Singh Grover, Yi Yu, Ajit Kumar, Rajiv Ratn Shah |
Abstract | Powerful generative adversarial networks (GANs) have been developed to automatically synthesize realistic images from text. However, most existing work is limited to generating simple images such as flowers from captions. In this work, we extend this problem to the less addressed domain of face generation from fine-grained textual descriptions of faces, e.g., “A person has curly hair, oval face, and mustache”. We are motivated by the potential of automated face generation to impact and assist critical tasks such as criminal face reconstruction. Since current datasets for the task are either very small or do not contain captions, we generate captions for images in the CelebA dataset by creating an algorithm to automatically convert a list of attributes to a set of captions. We then model the highly multi-modal problem of text-to-face generation as learning the conditional distribution of faces (conditioned on text) in the same latent space. We utilize the current state-of-the-art GAN (DC-GAN with GAN-CLS loss) for learning conditional multi-modality. The presence of more fine-grained details and the variable length of the captions make the problem easier for a user but more difficult to handle compared to other text-to-image tasks. We flipped the labels for real and fake images and added noise in the discriminator. Generated images for diverse textual descriptions show promising results. Finally, we show that the widely used Inception score is not a good metric for evaluating generative models that synthesize faces from text. |
Tasks | Face Generation, Face Reconstruction |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11378v1 |
PDF | https://arxiv.org/pdf/1911.11378v1.pdf |
PWC | https://paperswithcode.com/paper/text2facegan-face-generation-from-fine |
Repo | https://github.com/midas-research/text2facegan |
Framework | tf |
Training Neural Networks for and by Interpolation
Title | Training Neural Networks for and by Interpolation |
Authors | Leonard Berrada, Andrew Zisserman, M. Pawan Kumar |
Abstract | The majority of modern deep learning models are able to interpolate the data: the empirical loss can be driven near zero on all samples simultaneously. In this work, we explicitly exploit this interpolation property for the design of a new optimization algorithm for deep learning. Specifically, we use it to compute an adaptive learning-rate given a stochastic gradient direction. This results in the Adaptive Learning-rates for Interpolation with Gradients (ALI-G) algorithm. ALI-G retains the advantages of SGD, which are low computational cost and provable convergence in the convex setting. Unlike SGD, however, the learning-rate of ALI-G can be computed inexpensively in closed-form and does not require a manual schedule. We provide a detailed analysis of ALI-G in the stochastic convex setting with explicit convergence rates. In order to obtain good empirical performance in deep learning, we extend the algorithm to use a maximal learning-rate, which gives a single hyper-parameter to tune. We show that employing such a maximal learning-rate has an intuitive proximal interpretation and preserves all convergence guarantees. We provide experiments on a variety of architectures and tasks: (i) learning a differentiable neural computer; (ii) training a wide residual network on the SVHN data set; (iii) training a Bi-LSTM on the SNLI data set; and (iv) training wide residual networks and densely connected networks on the CIFAR data sets. We empirically show that ALI-G outperforms adaptive gradient methods such as Adam, and provides comparable performance with SGD, although SGD benefits from manual learning rate schedules. We release PyTorch and TensorFlow implementations of ALI-G as standalone optimizers that can be used as a drop-in replacement in existing code (code available at https://github.com/oval-group/ali-g ). |
Tasks | |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05661v1 |
PDF | https://arxiv.org/pdf/1906.05661v1.pdf |
PWC | https://paperswithcode.com/paper/training-neural-networks-for-and-by |
Repo | https://github.com/oval-group/ali-g |
Framework | pytorch |
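To make the idea of a closed-form learning rate with a maximal value concrete, here is a minimal sketch on a one-parameter regression problem that can be fit exactly. It assumes a Polyak-style step, the current loss divided by the squared gradient norm, clipped at `max_lr`; the small constant `delta` and all function names are assumptions for this example. The linked repository holds the authors' implementation.

```python
def alig_step_size(loss, grad, max_lr, delta=1e-5):
    """Clipped closed-form step: min(loss / (||grad||^2 + delta), max_lr).

    `delta` is an assumed small constant that keeps the ratio finite.
    """
    sq_norm = sum(g * g for g in grad)
    return min(loss / (sq_norm + delta), max_lr)


def fit_slope(points, max_lr=0.1, epochs=200):
    """Fit y = w * x with per-sample updates and the step size above."""
    w = 0.0
    for _ in range(epochs):
        for x, y in points:
            residual = w * x - y
            loss = 0.5 * residual ** 2   # non-negative, zero at interpolation
            grad = [residual * x]        # d(loss)/dw
            lr = alig_step_size(loss, grad, max_lr)
            w -= lr * grad[0]
    return w


# Data generated by y = 2x, so a zero-loss (interpolating) solution exists.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = fit_slope(points)
```

The only value to tune here is `max_lr`, matching the single hyper-parameter the abstract describes; no decay schedule is supplied.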
Synthetic Ground Truth Generation for Evaluating Generative Policy Models
Title | Synthetic Ground Truth Generation for Evaluating Generative Policy Models |
Authors | Daniel Cunnington, Graham White, Geeth de Mel |
Abstract | Generative Policy-based Models (GPMs) aim to enable a coalition of systems, be they devices or services, to adapt according to contextual changes such as environmental factors, user preferences, and different tasks, whilst adhering to various constraints and regulations as directed by a managing party or the collective vision of the coalition. Recent developments have proposed new architectures to realize the potential of GPMs, but as the complexity of systems and their associated requirements increases, there is an emerging need for scenarios and associated datasets to realistically evaluate GPMs with respect to the properties of the operating environment, be it the future battlespace or an autonomous organization. To address this requirement, we present a method of applying an agile knowledge representation framework to model requirements, both individualistic and collective, that enables synthetic generation of ground truth data such that advanced GPMs can be evaluated robustly in complex environments. We also release conceptual models, annotated datasets, and the means to extend the data generation approach so that similar datasets can be developed for varying complexities and different situations. |
Tasks | |
Published | 2019-04-26 |
URL | http://arxiv.org/abs/1904.13233v1 |
PDF | http://arxiv.org/pdf/1904.13233v1.pdf |
PWC | https://paperswithcode.com/paper/synthetic-ground-truth-generation-for |
Repo | https://github.com/dais-ita/coalition-data |
Framework | none |
Accelerating Deep Learning by Focusing on the Biggest Losers
Title | Accelerating Deep Learning by Focusing on the Biggest Losers |
Authors | Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai |
Abstract | This paper introduces Selective-Backprop, a technique that accelerates the training of deep neural networks (DNNs) by prioritizing examples with high loss at each iteration. Selective-Backprop uses the output of a training example’s forward pass to decide whether to use that example to compute gradients and update parameters, or to skip immediately to the next example. By reducing the number of computationally expensive backpropagation steps performed, Selective-Backprop accelerates training. Evaluation on CIFAR10, CIFAR100, and SVHN, across a variety of modern image models, shows that Selective-Backprop converges to target error rates up to 3.5x faster than standard SGD and 1.02x to 1.8x faster than a state-of-the-art importance sampling approach. Further acceleration of 26% can be achieved by using stale forward pass results for selection, thus also skipping forward passes of low-priority examples. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00762v1 |
PDF | https://arxiv.org/pdf/1910.00762v1.pdf |
PWC | https://paperswithcode.com/paper/accelerating-deep-learning-by-focusing-on-the |
Repo | https://github.com/Manuscrit/SelectiveBackPropagation |
Framework | pytorch |
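The selection step described in the abstract, using the forward-pass loss to decide whether an example gets a backward pass, can be sketched as follows. The specific rule used here (select with probability equal to the loss percentile among recent losses, raised to a power `beta`) is an assumption for illustration, as are all function and parameter names; the abstract only states that high-loss examples are prioritized.

```python
import random
from bisect import bisect_right


def selection_probability(loss, history, beta=2.0):
    """Percentile of `loss` among recent losses, raised to the power `beta`."""
    if not history:
        return 1.0  # nothing to compare against yet: always train on it
    ranked = sorted(history)
    percentile = bisect_right(ranked, loss) / len(ranked)
    return percentile ** beta


def select_for_backprop(losses, history, rng, beta=2.0):
    """Return the indices of examples that should receive a backward pass."""
    chosen = []
    for index, loss in enumerate(losses):
        if rng.random() < selection_probability(loss, history, beta):
            chosen.append(index)
    return chosen


# Hypothetical recent losses and a new batch of forward-pass losses.
history = [0.1, 0.2, 0.3, 0.4]
batch_losses = [0.05, 0.5]
chosen = select_for_backprop(batch_losses, history, random.Random(0))
```

In a training loop, only the examples in `chosen` would be used to compute gradients, which is where the savings in backward passes would come from.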
Portuguese Named Entity Recognition using BERT-CRF
Title | Portuguese Named Entity Recognition using BERT-CRF |
Authors | Fábio Souza, Rodrigo Nogueira, Roberto Lotufo |
Abstract | Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering. It has been shown that leveraging pre-trained language models improves overall performance on many tasks and is highly beneficial when labeled data is scarce. In this work, we train Portuguese BERT models and apply a BERT-CRF architecture to the NER task in Portuguese, combining the transfer capabilities of BERT with the structured predictions of CRF. We explore feature-based and fine-tuning training strategies for the BERT model. Our fine-tuning approach obtains new state-of-the-art results on the HAREM I dataset, improving the F1-score by 1 point on the selective scenario (5 NE classes) and by 4 points on the total scenario (10 NE classes). |
Tasks | Named Entity Recognition, Question Answering |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10649v2 |
PDF | https://arxiv.org/pdf/1909.10649v2.pdf |
PWC | https://paperswithcode.com/paper/portuguese-named-entity-recognition-using-1 |
Repo | https://github.com/neuralmind-ai/portuguese-bert |
Framework | pytorch |
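The "structured predictions of CRF" mentioned in the abstract typically come from decoding the best tag sequence jointly rather than token by token. The sketch below is a plain Viterbi decoder over per-token emission scores (which a BERT encoder would supply) and tag-to-tag transition scores. The three-tag scheme and all numbers are invented for illustration and are not from the paper.

```python
def viterbi_decode(emissions, transitions):
    """Highest-scoring tag path.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for step in emissions[1:]:
        next_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            next_scores.append(scores[best_i] + transitions[best_i][j] + step[j])
            pointers.append(best_i)
        scores = next_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    path = [max(range(num_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Tags: 0 = O, 1 = B, 2 = I. The transition O -> I is heavily penalized.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.5, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
path = viterbi_decode(emissions, transitions)
```

Picking the best tag at each position independently would give O, I, O, which violates the tagging scheme; the joint decode returns B, I, O instead.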
Question Answering as an Automatic Evaluation Metric for News Article Summarization
Title | Question Answering as an Automatic Evaluation Metric for News Article Summarization |
Authors | Matan Eyal, Tal Baumel, Michael Elhadad |
Abstract | Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic evaluation metric for this task, Answering Performance for Evaluation of Summaries (APES). APES utilizes recent progress in the field of reading comprehension to quantify the ability of a summary to answer a set of manually created questions regarding central entities in the source article. We first analyze the strength of this metric by comparing it to known manual evaluation metrics. We then present an end-to-end neural abstractive model that maximizes APES, while increasing ROUGE scores to competitive results. |
Tasks | Question Answering, Reading Comprehension |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00318v1 |
PDF | https://arxiv.org/pdf/1906.00318v1.pdf |
PWC | https://paperswithcode.com/paper/190600318 |
Repo | https://github.com/mataney/APES-optimizer |
Framework | pytorch |
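A question-answering-based summary metric of this kind can be pictured as the share of questions a reader answers correctly when given only the summary. The sketch below assumes that reading; `apes_score`, the cloze-style question format, and `toy_reader` (a string-matching stand-in for a real reading-comprehension model) are all hypothetical and not taken from the linked repository.

```python
def apes_score(summary, questions, answer_fn):
    """Fraction of (question, gold_answer) pairs answered correctly from the summary."""
    if not questions:
        return 0.0
    correct = sum(
        1
        for question, gold in questions
        if answer_fn(summary, question).strip().lower() == gold.strip().lower()
    )
    return correct / len(questions)


def toy_reader(summary, question):
    """Toy stand-in for a QA model: fill '___' with the word preceding the matched text."""
    rest = question.replace("___", "").strip()
    idx = summary.find(rest)
    if idx <= 0:
        return ""  # the summary does not support an answer
    return summary[:idx].split()[-1]


summary = "Yesterday Alice won the race in Boston"
questions = [
    ("___ won the race", "Alice"),      # answerable from the summary
    ("___ finished second", "Bob"),     # not covered by the summary
]
score = apes_score(summary, questions, toy_reader)
```

A summary that drops a central entity loses the corresponding question, so the score rewards coverage of the entities the questions ask about.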
Generative Modeling by Estimating Gradients of the Data Distribution
Title | Generative Modeling by Estimating Gradients of the Data Distribution |
Authors | Yang Song, Stefano Ermon |
Abstract | We introduce a new generative model where samples are produced via Langevin dynamics using gradients of the data distribution estimated with score matching. Because gradients can be ill-defined and hard to estimate when the data resides on low-dimensional manifolds, we perturb the data with different levels of Gaussian noise, and jointly estimate the corresponding scores, i.e., the vector fields of gradients of the perturbed data distribution for all noise levels. For sampling, we propose an annealed Langevin dynamics where we use gradients corresponding to gradually decreasing noise levels as the sampling process gets closer to the data manifold. Our framework allows flexible model architectures, requires no sampling during training or the use of adversarial methods, and provides a learning objective that can be used for principled model comparisons. Our models produce samples comparable to GANs on MNIST, CelebA and CIFAR-10 datasets, achieving a new state-of-the-art inception score of 8.87 on CIFAR-10. Additionally, we demonstrate that our models learn effective representations via image inpainting experiments. |
Tasks | Image Generation, Image Inpainting |
Published | 2019-07-12 |
URL | https://arxiv.org/abs/1907.05600v2 |
PDF | https://arxiv.org/pdf/1907.05600v2.pdf |
PWC | https://paperswithcode.com/paper/generative-modeling-by-estimating-gradients |
Repo | https://github.com/Lornatang/PyTorch-NCSN |
Framework | pytorch |
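The sampling procedure in the abstract, Langevin dynamics run over gradually decreasing noise levels, can be illustrated in one dimension. In this sketch the learned score network is replaced by the exact score of a Gaussian data distribution perturbed with Gaussian noise, and the step size is scaled with the square of the noise level; that scaling, the constants, and the function names are assumptions for this example.

```python
import math
import random


def gaussian_score(x, sigma, mean=3.0, data_std=0.5):
    """Exact score of N(mean, data_std^2) data after adding N(0, sigma^2) noise.

    Stands in for a learned, noise-conditional score network.
    """
    return -(x - mean) / (data_std ** 2 + sigma ** 2)


def annealed_langevin(score_fn, sigmas, rng, steps_per_level=100, eps=0.002):
    """Run Langevin updates at each noise level, from largest to smallest."""
    x = rng.gauss(0.0, sigmas[0])  # broad initial guess
    for sigma in sigmas:
        alpha = eps * sigma ** 2 / sigmas[-1] ** 2  # assumed step-size schedule
        for _ in range(steps_per_level):
            noise = rng.gauss(0.0, 1.0)
            x = x + 0.5 * alpha * score_fn(x, sigma) + math.sqrt(alpha) * noise
    return x


rng = random.Random(0)
sigmas = [10.0, 3.0, 1.0, 0.3, 0.1]  # decreasing noise levels
samples = [annealed_langevin(gaussian_score, sigmas, rng) for _ in range(500)]
sample_mean = sum(samples) / len(samples)
```

The large noise levels let the chain move quickly toward the region where the data lives, and the small ones refine the sample; the sample mean should land close to the data mean of 3.0.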
TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning
Title | TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning |
Authors | Sung Whan Yoon, Jun Seo, Jaekyun Moon |
Abstract | Handling previously unseen tasks after being given only a few training examples continues to be a tough challenge in machine learning. We propose TapNets, neural networks augmented with task-adaptive projection for improved few-shot learning. Here, employing a meta-learning strategy with episode-based training, a network and a set of per-class reference vectors are learned across widely varying tasks. At the same time, for every episode, features in the embedding space are linearly projected into a new space as a form of quick task-specific conditioning. The training loss is obtained based on a distance metric between the query and the reference vectors in the projection space. Excellent generalization results in this way. When tested on the Omniglot, miniImageNet and tieredImageNet datasets, we obtain state-of-the-art classification accuracies under various few-shot scenarios. |
Tasks | Few-Shot Learning, Meta-Learning, Omniglot |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06549v2 |
PDF | https://arxiv.org/pdf/1905.06549v2.pdf |
PWC | https://paperswithcode.com/paper/tapnet-neural-network-augmented-with-task |
Repo | https://github.com/istarjun/TapNet |
Framework | none |
Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering
Title | Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering |
Authors | Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum |
Abstract | This paper introduces a new framework for open-domain question answering in which the retriever and the reader iteratively interact with each other. The framework is agnostic to the architecture of the machine reading model, only requiring access to the token-level hidden representations of the reader. The retriever uses fast nearest neighbor search to scale to corpora containing millions of paragraphs. A gated recurrent unit updates the query at each step conditioned on the state of the reader, and the reformulated query is used by the retriever to re-rank the paragraphs. We conduct analysis and show that iterative interaction helps in retrieving informative paragraphs from the corpus. Finally, we show that our multi-step-reasoning framework brings consistent improvement when applied to two widely used reader architectures, DrQA and BiDAF, on various large open-domain datasets: TriviaQA-unfiltered, QuasarT, SearchQA, and SQuAD-Open. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05733v1 |
PDF | https://arxiv.org/pdf/1905.05733v1.pdf |
PWC | https://paperswithcode.com/paper/multi-step-retriever-reader-interaction-for-1 |
Repo | https://github.com/rajarshd/Multi-Step-Reasoning |
Framework | pytorch |
Neural Academic Paper Generation
Title | Neural Academic Paper Generation |
Authors | Samet Demir, Uras Mutlu, Özgur Özdemir |
Abstract | In this work, we tackle the problem of structured text generation, specifically academic paper generation in LaTeX, inspired by the surprisingly good results of basic character-level language models. Our motivation is using more recent and advanced methods of language modeling on a more complex dataset of LaTeX source files to generate realistic academic papers. Our first contribution is preparing a dataset with LaTeX source files on recent open-source computer vision papers. Our second contribution is experimenting with recent methods of language modeling and text generation such as Transformer and Transformer-XL to generate consistent LaTeX code. We report cross-entropy and bits-per-character (BPC) results of the trained models, and we also discuss interesting points on some examples of the generated LaTeX code. |
Tasks | Language Modelling, Paper Generation, Text Generation |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.01982v1 |
PDF | https://arxiv.org/pdf/1912.01982v1.pdf |
PWC | https://paperswithcode.com/paper/neural-academic-paper-generation |
Repo | https://github.com/inzva/fake-academic-paper-generation |
Framework | pytorch |
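The two reported metrics are directly related: bits-per-character is the per-character cross-entropy expressed in base 2. The short sketch below shows the conversion, assuming the cross-entropy is measured in nats (natural logarithm), as most deep learning frameworks do by default. The probabilities are made-up values, not results from the paper.

```python
import math


def mean_cross_entropy(char_probs):
    """Average negative log-probability (in nats) assigned to each true character."""
    return -sum(math.log(p) for p in char_probs) / len(char_probs)


def bits_per_character(cross_entropy_nats):
    """Convert a per-character cross-entropy from nats to bits."""
    return cross_entropy_nats / math.log(2)


# Hypothetical probabilities a model assigned to four ground-truth characters.
probs = [0.5, 0.25, 0.5, 0.25]
bpc = bits_per_character(mean_cross_entropy(probs))  # 1.5, up to floating-point rounding
```

A model that assigns probability 0.5 to every correct character would score exactly 1 BPC, which gives a quick sanity check when reading reported numbers.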
Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs)
Title | Digital Electronics and Analog Photonics for Convolutional Neural Networks (DEAP-CNNs) |
Authors | Viraj Bangari, Bicky A. Marquez, Heidi B. Miller, Alexander N. Tait, Mitchell A. Nahmias, Thomas Ferreira de Lima, Hsuan-Tung Peng, Paul R. Prucnal, Bhavin J. Shastri |
Abstract | Convolutional Neural Networks (CNNs) are powerful and ubiquitous tools for extracting features from large datasets for applications such as computer vision and natural language processing. However, convolution is a computationally expensive operation in digital electronics. In contrast, neuromorphic photonic systems, which have seen a surge of interest over the last few years, offer higher bandwidth and energy efficiency for neural network training and inference. Neuromorphic photonics exploits the advantages of optical electronics, including the ease of analog processing and the ability to bus multiple signals on a single waveguide at the speed of light. Here, we propose a Digital Electronic and Analog Photonic (DEAP) CNN hardware architecture that has the potential to be 2.8 to 14 times faster than current state-of-the-art GPUs while maintaining the same power usage. |
Tasks | |
Published | 2019-04-23 |
URL | https://arxiv.org/abs/1907.01525v1 |
PDF | https://arxiv.org/pdf/1907.01525v1.pdf |
PWC | https://paperswithcode.com/paper/digital-electronics-and-analog-photonics-for |
Repo | https://github.com/Shastri-Lab/DEAP |
Framework | none |
Interpretations are useful: penalizing explanations to align neural networks with prior knowledge
Title | Interpretations are useful: penalizing explanations to align neural networks with prior knowledge |
Authors | Laura Rieger, Chandan Singh, W. James Murdoch, Bin Yu |
Abstract | For an explanation of a deep learning model to be effective, it must both provide insight into a model and suggest a corresponding action in order to achieve some objective. Too often, the litany of proposed explainable deep learning methods stops at the first step, providing practitioners with insight into a model, but no way to act on it. In this paper, we propose contextual decomposition explanation penalization (CDEP), a method which enables practitioners to leverage existing explanation methods in order to increase the predictive accuracy of deep learning models. In particular, when shown that a model has incorrectly assigned importance to some features, CDEP enables practitioners to correct these errors by directly regularizing the provided explanations. Using explanations provided by contextual decomposition (CD) (Murdoch et al., 2018), we demonstrate the ability of our method to increase performance on an array of toy and real datasets. |
Tasks | |
Published | 2019-09-30 |
URL | https://arxiv.org/abs/1909.13584v2 |
PDF | https://arxiv.org/pdf/1909.13584v2.pdf |
PWC | https://paperswithcode.com/paper/interpretations-are-useful-penalizing |
Repo | https://github.com/csinva/hierarchical-dnn-interpretations |
Framework | pytorch |
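"Directly regularizing the provided explanations" suggests a training objective with two terms: the usual prediction loss and a penalty on the gap between the model's explanations and what the practitioner believes they should be. One way to write such an objective is given below. The notation is an assumption for illustration: $f_\theta$ is the model, $\operatorname{expl}_\theta(X)$ its explanations (for example from CD), $\operatorname{expl}_X$ the prior knowledge, and $\lambda$ a weight chosen by the practitioner.

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\;
  \underbrace{\mathcal{L}\bigl(f_{\theta}(X),\, y\bigr)}_{\text{prediction error}}
  \;+\; \lambda\,
  \underbrace{\mathcal{L}_{\text{expl}}\bigl(\operatorname{expl}_{\theta}(X),\, \operatorname{expl}_{X}\bigr)}_{\text{explanation error}}
```

Setting $\lambda = 0$ recovers ordinary training, while larger values push the model to stop relying on features that have been flagged as unimportant.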
EduBERT: Pretrained Deep Language Models for Learning Analytics
Title | EduBERT: Pretrained Deep Language Models for Learning Analytics |
Authors | Benjamin Clavié, Kobi Gal |
Abstract | The use of large pretrained neural networks to create contextualized word embeddings has drastically improved performance on several natural language processing (NLP) tasks. These computationally expensive models have begun to be applied to domain-specific NLP tasks such as re-hospitalization prediction from clinical notes. This paper demonstrates that using large pretrained models produces excellent results on common learning analytics tasks. Pre-training deep language models using student forum data from a wide array of online courses improves performance beyond the state of the art on three text classification tasks. We also show that a smaller, distilled version of our model produces the best results on two of the three tasks while limiting computational cost. We make both models available to the research community at large. |
Tasks | Text Classification, Word Embeddings |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00690v1 |
PDF | https://arxiv.org/pdf/1912.00690v1.pdf |
PWC | https://paperswithcode.com/paper/edubert-pretrained-deep-language-models-for |
Repo | https://github.com/bclavie/edubert |
Framework | pytorch |
Cycle-consistent Conditional Adversarial Transfer Networks
Title | Cycle-consistent Conditional Adversarial Transfer Networks |
Authors | Jingjing Li, Erpeng Chen, Zhengming Ding, Lei Zhu, Ke Lu, Zi Huang |
Abstract | Domain adaptation investigates the problem of cross-domain knowledge transfer where the labeled source domain and unlabeled target domain have distinctive data distributions. Recently, adversarial training has been successfully applied to domain adaptation and has achieved state-of-the-art performance. However, current adversarial models still have a fatal weakness that arises from the equilibrium challenge of adversarial training. Specifically, although most existing methods are able to confuse the domain discriminator, they cannot guarantee that the source domain and target domain are sufficiently similar. In this paper, we propose a novel approach named cycle-consistent conditional adversarial transfer networks (3CATN) to handle this issue. Our approach takes care of the domain alignment by leveraging adversarial training. Specifically, we condition the adversarial networks with the cross-covariance of learned features and classifier predictions to capture the multimodal structures of data distributions. However, since the classifier predictions are not certain, conditioning strongly on them is risky when they are inaccurate. We therefore further propose that truly domain-invariant features should be translatable from one domain to the other. To this end, we introduce two feature translation losses and one cycle-consistent loss into the conditional adversarial domain adaptation networks. Extensive experiments on both classical and large-scale datasets verify that our model outperforms previous state-of-the-art methods with significant improvements. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07618v1 |
PDF | https://arxiv.org/pdf/1909.07618v1.pdf |
PWC | https://paperswithcode.com/paper/cycle-consistent-conditional-adversarial |
Repo | https://github.com/lijin118/3CATN |
Framework | pytorch |
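Conditioning a domain discriminator on both features and classifier predictions can be done by feeding it their per-example outer product, whose average over examples is an (uncentered) cross-covariance. The sketch below shows only that combination step, under the assumption that this is how the conditioning is realized. The function name and the numbers are illustrative and not taken from the linked repository.

```python
def outer_product_condition(features, predictions):
    """Flattened outer product of a feature vector and a class-probability vector."""
    return [f * p for f in features for p in predictions]


# Hypothetical 3-dimensional feature vector and 2-class prediction.
features = [0.5, -1.0, 2.0]
predictions = [0.9, 0.1]
joint = outer_product_condition(features, predictions)
# `joint` has len(features) * len(predictions) entries and would be the
# discriminator's input in place of the raw features.
```

Because every feature is multiplied by every class probability, the discriminator sees which class the classifier currently favours, which is also why inaccurate predictions make strong conditioning risky.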