Paper Group NANR 102
The Synthesis of XNOR Recurrent Neural Networks with Stochastic Logic. Generating Abstractive Summaries with Finetuned Language Models. Identifying Fluently Inadequate Output in Neural and Statistical Machine Translation. A Generative Adversarial Density Estimator. Generalization Error Analysis of Quantized Compressive Learning. Wikipedia as a Resource for Text Analysis and Retrieval. Enhancing Variational Autoencoders with Mutual Information Neural Estimation for Text Generation. Stick to the Facts: Learning towards a Fidelity-oriented E-Commerce Product Description Generation. On the Trajectory of Stochastic Gradient Descent in the Information Plane. Utilizing Monolingual Data in NMT for Similar Languages: Submission to Similar Language Translation Task. Sentim at SemEval-2019 Task 3: Convolutional Neural Networks For Sentiment in Conversations. Towards Text Processing Pipelines to Identify Adverse Drug Events-related Tweets: University of Michigan @ SMM4H 2019 Task 1. Pretrained Ensemble Learning for Fine-Grained Propaganda Detection. RegularFace: Deep Face Recognition via Exclusive Regularization. Efficient Multi-Domain Learning by Covariance Normalization.
The Synthesis of XNOR Recurrent Neural Networks with Stochastic Logic
Title | The Synthesis of XNOR Recurrent Neural Networks with Stochastic Logic |
Authors | Arash Ardakani, Zhengyun Ji, Amir Ardakani, Warren Gross |
Abstract | XNOR networks have emerged to reduce the model size and computational cost of neural networks for deployment on specialized hardware requiring real-time processing with limited hardware resources. In XNOR networks, both weights and activations are binary, bringing great benefits to specialized hardware by replacing expensive multiplications with simple XNOR operations. Although XNOR convolutional and fully-connected neural networks have been successfully developed during the past few years, there is no XNOR network implementing commonly-used variants of recurrent neural networks such as long short-term memories (LSTMs). The main computational core of LSTMs involves vector-matrix multiplications followed by a set of non-linear functions and element-wise multiplications to obtain the gate activations and state vectors, respectively. Several previous attempts at quantizing LSTMs focused only on the vector-matrix multiplications while retaining the element-wise multiplications in full precision. In this paper, we propose a method that converts all the multiplications in LSTMs to XNOR operations using stochastic computing. To this end, we introduce a weighted finite-state machine and its synthesis method to approximate the non-linear functions used in LSTMs on stochastic bit streams. Experimental results show that the proposed XNOR LSTMs reduce the computational complexity of their quantized counterparts by a factor of 86x without any sacrifice in latency while achieving better accuracy across various temporal tasks. |
Tasks | Quantization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9052-the-synthesis-of-xnor-recurrent-neural-networks-with-stochastic-logic |
http://papers.nips.cc/paper/9052-the-synthesis-of-xnor-recurrent-neural-networks-with-stochastic-logic.pdf | |
PWC | https://paperswithcode.com/paper/the-synthesis-of-xnor-recurrent-neural |
Repo | |
Framework | |
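The XNOR trick above rests on a standard stochastic-computing identity: under bipolar encoding, a value x in [-1, 1] becomes a random bit stream with P(bit = 1) = (x + 1) / 2, and multiplying two independently encoded values reduces to a bitwise XNOR of their streams. A minimal NumPy sketch of that identity (illustrative only; the paper's weighted finite-state machines for the LSTM non-linearities are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, n=4096):
    """Bipolar stochastic encoding: P(bit=1) = (x + 1) / 2 for x in [-1, 1]."""
    return rng.random(n) < (x + 1) / 2

def decode(bits):
    """Map the ones-frequency of a stream back to [-1, 1]."""
    return 2 * bits.mean() - 1

x, y = 0.6, -0.5
# For independent streams, P(xnor=1) = p_x*p_y + (1-p_x)*(1-p_y),
# which decodes to exactly x*y in the bipolar domain.
product_stream = ~(encode(x) ^ encode(y))
print(decode(product_stream))  # ~ -0.30, up to sampling noise
```

Longer streams trade latency for accuracy; the paper's contribution is a synthesis method that keeps the entire LSTM cell, non-linearities included, in this bit-stream domain.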
Generating Abstractive Summaries with Finetuned Language Models
Title | Generating Abstractive Summaries with Finetuned Language Models |
Authors | Sebastian Gehrmann, Zachary Ziegler, Alexander Rush |
Abstract | Neural abstractive document summarization is commonly approached by models that exhibit a mostly extractive behavior. This behavior is facilitated by a copy-attention which allows models to copy words from a source document. While models in the mostly extractive news summarization domain benefit from this inductive bias, they commonly fail to paraphrase or compress information from the source document. Recent advances in transfer-learning from large pretrained language models give rise to alternative approaches that do not rely on copy-attention and instead learn to generate concise and abstractive summaries. In this paper, as part of the TL;DR challenge, we compare the abstractiveness of summaries from different summarization approaches and show that transfer-learning can be efficiently utilized without any changes to the model architecture. We demonstrate that the approach leads to a higher level of abstraction for a similar performance on the TL;DR challenge tasks, enabling true natural language compression. |
Tasks | Document Summarization, Transfer Learning |
Published | 2019-10-01 |
URL | https://www.aclweb.org/anthology/W19-8665/ |
https://www.aclweb.org/anthology/W19-8665 | |
PWC | https://paperswithcode.com/paper/generating-abstractive-summaries-with |
Repo | |
Framework | |
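The architecture-unchanged recipe the abstract alludes to can be illustrated with an off-the-shelf causal language model: frame summarization as text continuation after a "TL;DR:" separator, and finetune on (document, summary) pairs written in that same format. A hedged sketch with the Hugging Face transformers API (the paper's actual model, training data, and decoding settings are not specified here):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

document = ("The committee met for six hours and, after a long debate, "
            "approved the new budget with minor amendments.")
prompt = document + "\nTL;DR:"  # the separator frames summarization as continuation

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                     pad_token_id=tok.eos_token_id)
summary = tok.decode(out[0][inputs["input_ids"].shape[1]:],
                     skip_special_tokens=True)
print(summary.strip())
```

Finetuning keeps the same interface: the loss is plain next-token prediction over document, separator, and reference summary concatenated.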
Identifying Fluently Inadequate Output in Neural and Statistical Machine Translation
Title | Identifying Fluently Inadequate Output in Neural and Statistical Machine Translation |
Authors | Marianna Martindale, Marine Carpuat, Kevin Duh, Paul McNamee |
Abstract | |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-6623/ |
https://www.aclweb.org/anthology/W19-6623 | |
PWC | https://paperswithcode.com/paper/identifying-fluently-inadequate-output-in |
Repo | |
Framework | |
A Generative Adversarial Density Estimator
Title | A Generative Adversarial Density Estimator |
Authors | M. Ehsan Abbasnejad, Qinfeng Shi, Anton van den Hengel, Lingqiao Liu |
Abstract | Density estimation is a challenging unsupervised learning problem. Current maximum likelihood approaches for density estimation are either restrictive or incapable of producing high-quality samples. On the other hand, likelihood-free models such as generative adversarial networks produce sharp samples without a density model. The lack of a density estimate limits the applications to which the sampled data can be put, however. We propose a Generative Adversarial Density Estimator, a density estimation approach that bridges the gap between the two. Allowing for a prior on the parameters of the model, we extend our density estimator to a Bayesian model where we can leverage the predictive variance to measure our confidence in the likelihood. Our experiments on challenging applications such as visual dialog, where the density and the confidence in predictions are crucial, show the effectiveness of our approach. |
Tasks | Density Estimation, Visual Dialog |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Abbasnejad_A_Generative_Adversarial_Density_Estimator_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Abbasnejad_A_Generative_Adversarial_Density_Estimator_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/a-generative-adversarial-density-estimator |
Repo | |
Framework | |
Generalization Error Analysis of Quantized Compressive Learning
Title | Generalization Error Analysis of Quantized Compressive Learning |
Authors | Xiaoyun Li, Ping Li |
Abstract | Compressive learning is an effective method to deal with very high dimensional datasets by applying learning algorithms in a randomly projected lower dimensional space. In this paper, we consider the learning problem where the projected data is further compressed by scalar quantization, which is called quantized compressive learning. Generalization error bounds are derived for three models: the nearest neighbor (NN) classifier, the linear classifier, and least squares regression. Besides studying the finite sample setting, our asymptotic analysis shows that the inner product estimators have a deep connection with the NN and linear classification problems through the variance of their debiased counterparts. By analyzing the extra error term brought by quantization, our results provide useful implications for the choice of quantizers in applications involving different learning tasks. An empirical study is also conducted to validate our theoretical findings. |
Tasks | Quantization |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9651-generalization-error-analysis-of-quantized-compressive-learning |
http://papers.nips.cc/paper/9651-generalization-error-analysis-of-quantized-compressive-learning.pdf | |
PWC | https://paperswithcode.com/paper/generalization-error-analysis-of-quantized |
Repo | |
Framework | |
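The pipeline under analysis is short enough to sketch end to end: random projection to k dimensions, scalar quantization of the projected coordinates, then an ordinary learner in the compressed space. The toy data, 2-bit uniform quantizer, and 1-NN learner below are illustrative choices, not the paper's experimental setup:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def quantize(z, n_bits=2, clip=3.0):
    """Uniform midpoint scalar quantizer with 2**n_bits levels on [-clip, clip]."""
    step = 2 * clip / (2 ** n_bits)
    return np.clip(np.floor(z / step) * step + step / 2, -clip, clip)

# toy high-dimensional data whose label depends on the first 10 coordinates
X = rng.standard_normal((500, 1000))
y = (X[:, :10].sum(axis=1) > 0).astype(int)

k = 50                                            # projected dimension
R = rng.standard_normal((1000, k)) / np.sqrt(k)   # random Gaussian projection
Z = quantize(X @ R)                               # project, then scalar-quantize

clf = KNeighborsClassifier(n_neighbors=1).fit(Z[:400], y[:400])
print(clf.score(Z[400:], y[400:]))                # accuracy in the compressed space
```

The paper's bounds make precise how the extra error introduced by the quantizer depends on its design for each of the three learners.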
Wikipedia as a Resource for Text Analysis and Retrieval
Title | Wikipedia as a Resource for Text Analysis and Retrieval |
Authors | Marius Pasca |
Abstract | This tutorial examines the role of Wikipedia in tasks related to text analysis and retrieval. Text analysis tasks, which take advantage of Wikipedia, include coreference resolution, word sense and entity disambiguation and information extraction. In information retrieval, a better understanding of the structure and meaning of queries helps in matching queries against documents, clustering search results, answer and entity retrieval and retrieving knowledge panels for queries asking about popular entities. |
Tasks | Coreference Resolution, Entity Disambiguation, Information Retrieval |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-4005/ |
https://www.aclweb.org/anthology/P19-4005 | |
PWC | https://paperswithcode.com/paper/wikipedia-as-a-resource-for-text-analysis-and |
Repo | |
Framework | |
Enhancing Variational Autoencoders with Mutual Information Neural Estimation for Text Generation
Title | Enhancing Variational Autoencoders with Mutual Information Neural Estimation for Text Generation |
Authors | Dong Qian, William K. Cheung |
Abstract | While broadly applicable to many natural language processing (NLP) tasks, variational autoencoders (VAEs) are hard to train due to the posterior collapse issue where the latent variable fails to encode the input data effectively. Various approaches have been proposed to alleviate this problem to improve the capability of the VAE. In this paper, we propose to introduce a mutual information (MI) term between the input and its latent variable to regularize the objective of the VAE. Since estimating the MI in the high-dimensional space is intractable, we employ neural networks for the estimation of the MI and provide a training algorithm based on the convex duality approach. Our experimental results on three benchmark datasets demonstrate that the proposed model, compared to the state-of-the-art baselines, exhibits less posterior collapse and has comparable or better performance in language modeling and text generation. We also qualitatively evaluate the inferred latent space and show that the proposed model can generate more reasonable and diverse sentences via linear interpolation in the latent space. |
Tasks | Language Modelling, Text Generation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1416/ |
https://www.aclweb.org/anthology/D19-1416 | |
PWC | https://paperswithcode.com/paper/enhancing-variational-autoencoders-with |
Repo | |
Framework | |
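The MI term above has no closed form, so the paper estimates it with a neural network trained through convex duality. A minimal PyTorch sketch of the Donsker-Varadhan lower bound used by MINE-style estimators (the critic size and the way the term is weighted into the VAE objective are assumptions):

```python
import math
import torch
import torch.nn as nn

class MIEstimator(nn.Module):
    """Donsker-Varadhan lower bound on I(X; Z), parameterized by a critic T."""
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, x, z):
        joint = self.T(torch.cat([x, z], dim=1)).squeeze(1)   # samples of p(x, z)
        z_perm = z[torch.randperm(z.size(0))]                 # break pairing: p(x)p(z)
        marginal = self.T(torch.cat([x, z_perm], dim=1)).squeeze(1)
        # I(X; Z) >= E_p(x,z)[T] - log E_p(x)p(z)[exp(T)]
        return joint.mean() - (marginal.logsumexp(dim=0) - math.log(marginal.size(0)))

# hypothetical use inside VAE training:
#   loss = reconstruction + kl - beta * mi_estimator(x, z)  # encourage informative z
```

Maximizing this bound with respect to the critic tightens the estimate; adding it (negated) to the VAE loss pushes the encoder away from posterior collapse.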
Stick to the Facts: Learning towards a Fidelity-oriented E-Commerce Product Description Generation
Title | Stick to the Facts: Learning towards a Fidelity-oriented E-Commerce Product Description Generation |
Authors | Zhangming Chan, Xiuying Chen, Yongliang Wang, Juntao Li, Zhiqiang Zhang, Kun Gai, Dongyan Zhao, Rui Yan |
Abstract | Different from other text generation tasks, in product description generation it is of vital importance to generate faithful descriptions that stick to the product attribute information. However, little attention has been paid to this problem. To bridge this gap, we propose a model named Fidelity-oriented Product Description Generator (FPDG). FPDG takes the entity label of each word into account, since the product attribute information is always conveyed by entity words. Specifically, we first propose a Recurrent Neural Network (RNN) decoder based on the Entity-label-guided Long Short-Term Memory (ELSTM) cell, taking both the embedding and the entity label of each word as input. Second, we establish a keyword memory that stores the entity labels as keys and keywords as values, and FPDG attends to keywords through attending to their entity labels. Experiments conducted on a large-scale real-world product description dataset show that our model achieves state-of-the-art performance in terms of both traditional generation metrics and human evaluations. Specifically, FPDG increases the fidelity of the generated descriptions by 25%. |
Tasks | Text Generation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1501/ |
https://www.aclweb.org/anthology/D19-1501 | |
PWC | https://paperswithcode.com/paper/stick-to-the-facts-learning-towards-a |
Repo | |
Framework | |
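The keyword memory is a key-value store: entity labels act as keys, keywords as values, and the decoder reaches keywords indirectly through their labels. A hedged sketch of that lookup (dimensions and the dot-product scoring are assumptions; the ELSTM cell itself is not reproduced):

```python
import torch
import torch.nn.functional as F

def keyword_memory_attend(query, label_keys, keyword_values):
    """Attend to keywords through their entity-label keys.

    query:          (d,)    current decoder state
    label_keys:     (m, d)  embeddings of the entity labels (memory keys)
    keyword_values: (m, d)  embeddings of the stored keywords (memory values)
    """
    scores = label_keys @ query          # score each entity label against the state
    weights = F.softmax(scores, dim=0)   # attention over labels
    return weights @ keyword_values      # weighted sum of the keyword embeddings
```

Routing attention through labels rather than raw keywords is what lets the decoder stay anchored to attribute-bearing entity words.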
On the Trajectory of Stochastic Gradient Descent in the Information Plane
Title | On the Trajectory of Stochastic Gradient Descent in the Information Plane |
Authors | Emilio Rafael Balda, Arash Behboodi, Rudolf Mathar |
Abstract | Studying the evolution of information theoretic quantities during Stochastic Gradient Descent (SGD) learning of Artificial Neural Networks (ANNs) has gained popularity in recent years. Nevertheless, these types of experiments require estimating mutual information and entropy, which becomes intractable for moderately large problems. In this work we propose a framework for understanding SGD learning in the information plane which consists of observing the entropy and conditional entropy of the output labels of the ANN. Through experimental results and theoretical justifications it is shown that, under some assumptions, the SGD learning trajectories appear to be similar for different ANN architectures. First, SGD learning is modeled as a Hidden Markov Process (HMP) whose entropy tends to increase to the maximum. Then, it is shown that the SGD learning trajectory appears to move close to the shortest path between the initial and final joint distributions in the space of probability measures equipped with the total variation metric. Furthermore, it is shown that the trajectory of learning in the information plane can provide an alternative for observing the learning process, with potentially richer information about the learning than the trajectories in training and test error. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SkMON20ctX |
https://openreview.net/pdf?id=SkMON20ctX | |
PWC | https://paperswithcode.com/paper/on-the-trajectory-of-stochastic-gradient |
Repo | |
Framework | |
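The information plane here consists of the output-label entropy H(Ŷ) and the conditional entropy H(Ŷ|Y), tracked as SGD progresses. One plausible plug-in estimate from hard predictions, producing one plane point per epoch (the paper's exact estimator may differ):

```python
import numpy as np

def label_entropies(y_true, y_pred, n_classes):
    """Plug-in estimates of H(Yhat) and H(Yhat | Y) from hard label predictions."""
    joint = np.zeros((n_classes, n_classes))
    np.add.at(joint, (y_true, y_pred), 1)           # empirical joint counts
    joint /= joint.sum()
    p_pred = joint.sum(axis=0)                      # marginal over predicted labels
    p_true = joint.sum(axis=1, keepdims=True)       # marginal over true labels

    def H(p):
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    cond = joint / np.where(p_true > 0, p_true, 1)  # p(yhat | y)
    mask = joint > 0
    h_cond = -(joint[mask] * np.log2(cond[mask])).sum()
    return H(p_pred), h_cond                        # one point in the plane

# per epoch: collect (y_true, y_pred) over the dataset, then
#   h_pred, h_cond = label_entropies(y_true, y_pred, n_classes=10)
```

Plotting the sequence of (H(Ŷ|Y), H(Ŷ)) points gives the trajectory the paper compares across architectures.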
Utilizing Monolingual Data in NMT for Similar Languages: Submission to Similar Language Translation Task
Title | Utilizing Monolingual Data in NMT for Similar Languages: Submission to Similar Language Translation Task |
Authors | Jyotsana Khatri, Pushpak Bhattacharyya |
Abstract | This paper describes our submission to the Shared Task on Similar Language Translation at the Fourth Conference on Machine Translation (WMT 2019). We submitted three systems for the Hindi→Nepali direction, in which we examined the performance of an RNN-based NMT system, a semi-supervised NMT system where monolingual data of both languages is utilized following an existing architecture, and a system trained with extra synthetic sentences generated by copying source and target sentences, without using any additional monolingual data. |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5426/ |
https://www.aclweb.org/anthology/W19-5426 | |
PWC | https://paperswithcode.com/paper/utilizing-monolingual-data-in-nmt-for-similar |
Repo | |
Framework | |
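As we read the abstract, the third system's augmentation simply duplicates each side of the existing parallel corpus as trivial copy pairs, a cheap trick that can help when the two languages share script and vocabulary, as Hindi and Nepali do. A sketch under that assumed reading:

```python
def add_copied_sentences(parallel_pairs):
    """Augment a parallel corpus with identity pairs: every source sentence and
    every target sentence also appears as a trivial (x, x) 'translation'.
    Assumed reading of the abstract, not the authors' exact procedure."""
    src_copies = [(src, src) for src, _ in parallel_pairs]
    tgt_copies = [(tgt, tgt) for _, tgt in parallel_pairs]
    return parallel_pairs + src_copies + tgt_copies
```

No additional monolingual data is needed, which matches the abstract's description of this system.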
Sentim at SemEval-2019 Task 3: Convolutional Neural Networks For Sentiment in Conversations
Title | Sentim at SemEval-2019 Task 3: Convolutional Neural Networks For Sentiment in Conversations |
Authors | Jacob Anderson |
Abstract | In this work, convolutional neural networks were used to determine the sentiment in a conversational setting. This paper's contributions include a method for handling input of any size and a method for breaking down the conversation into separate parts for easier processing. Finally, clustering was shown to improve results, and the resulting model for handling sentiment in conversations was shown to be both fast and accurate. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2052/ |
https://www.aclweb.org/anthology/S19-2052 | |
PWC | https://paperswithcode.com/paper/sentim-at-semeval-2019-task-3-convolutional |
Repo | |
Framework | |
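The abstract does not spell out how any-sized input is handled; one standard mechanism consistent with the description is global max pooling over time, which turns a convolution over an arbitrary-length token sequence into a fixed-size feature vector. A sketch under that assumption (all hyperparameters below are illustrative):

```python
import torch
import torch.nn as nn

class ConvSentiment(nn.Module):
    """Sentence CNN that accepts any input length via global max pooling."""
    def __init__(self, vocab_size, emb_dim=100, n_filters=64, n_classes=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in (3, 4, 5)])
        self.out = nn.Linear(3 * n_filters, n_classes)

    def forward(self, token_ids):                   # (batch, any_length)
        x = self.emb(token_ids).transpose(1, 2)     # (batch, emb_dim, length)
        pooled = [c(x).max(dim=2).values for c in self.convs]  # length-independent
        return self.out(torch.cat(pooled, dim=1))   # (batch, n_classes)
```

Breaking the conversation into turns, as the paper describes, would mean running such an encoder per turn and combining the per-turn vectors.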
Towards Text Processing Pipelines to Identify Adverse Drug Events-related Tweets: University of Michigan @ SMM4H 2019 Task 1
Title | Towards Text Processing Pipelines to Identify Adverse Drug Events-related Tweets: University of Michigan @ SMM4H 2019 Task 1 |
Authors | V.G.Vinod Vydiswaran, Grace Ganzel, Bryan Romas, Deahan Yu, Amy Austin, Neha Bhomia, Socheatha Chan, Stephanie Hall, Van Le, Aaron Miller, Olawunmi Oduyebo, Aulia Song, Radhika Sondhi, Danny Teng, Hao Tseng, Kim Vuong, Stephanie Zimmerman |
Abstract | We participated in Task 1 of the Social Media Mining for Health Applications (SMM4H) 2019 Shared Tasks on detecting mentions of adverse drug events (ADEs) in tweets. Our approach relied on a text processing pipeline for tweets and on training traditional machine learning and deep learning models. Our submitted runs performed above average for the task. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3217/ |
https://www.aclweb.org/anthology/W19-3217 | |
PWC | https://paperswithcode.com/paper/towards-text-processing-pipelines-to-identify |
Repo | |
Framework | |
Pretrained Ensemble Learning for Fine-Grained Propaganda Detection
Title | Pretrained Ensemble Learning for Fine-Grained Propaganda Detection |
Authors | Ali Fadel, Ibraheem Tuffaha, Mahmoud Al-Ayyoub |
Abstract | In this paper, we describe our team's effort on the sentence-level classification (SLC) task for fine-grained propaganda detection at the NLP4IF 2019 workshop, co-located with the EMNLP-IJCNLP 2019 conference. Our top-performing system applies an ensemble average over the predictions of three pretrained models. The first two models use the uncased and cased versions of Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), while the third model uses the Universal Sentence Encoder (USE) (Cer et al., 2018). Out of 26 participating teams, our system ranked first with a 68.8312 F1-score on the development dataset and sixth with a 61.3870 F1-score on the testing dataset. |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5020/ |
https://www.aclweb.org/anthology/D19-5020 | |
PWC | https://paperswithcode.com/paper/pretrained-ensemble-learning-for-fine-grained |
Repo | |
Framework | |
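The ensemble itself is a single averaging step over the three models' class probabilities. A minimal sketch (the probability values are made up; in the paper the three inputs come from cased BERT, uncased BERT, and a USE-based classifier):

```python
import numpy as np

def ensemble_average(prob_sets):
    """Average per-model class probabilities, then pick the argmax class."""
    return np.mean(prob_sets, axis=0).argmax(axis=1)

# softmax outputs of three models for four sentences, two classes (illustrative)
p_bert_cased   = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
p_bert_uncased = np.array([[0.8, 0.2], [0.5, 0.5], [0.6, 0.4], [0.3, 0.7]])
p_use          = np.array([[0.7, 0.3], [0.3, 0.7], [0.8, 0.2], [0.1, 0.9]])
print(ensemble_average([p_bert_cased, p_bert_uncased, p_use]))  # [0 1 0 1]
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones.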
RegularFace: Deep Face Recognition via Exclusive Regularization
Title | RegularFace: Deep Face Recognition via Exclusive Regularization |
Authors | Kai Zhao, Jingyi Xu, Ming-Ming Cheng |
Abstract | We consider the face recognition task, where facial images of the same identity (person) are expected to be closer in the representation space while different identities are far apart. Several recent studies encourage intra-class compactness by developing loss functions that penalize the variance of representations of the same identity. In this paper, we propose 'exclusive regularization', which focuses on the other aspect of discriminability: the inter-class separability, which is neglected in many recent approaches. The proposed method, named RegularFace, explicitly distances identities by penalizing the angle between an identity and its nearest neighbor, resulting in discriminative face representations. Our method has an intuitive geometric interpretation and presents unique benefits that are absent in previous works. Quantitative comparisons against prior methods on several open benchmarks demonstrate the superiority of our method. In addition, our method is easy to implement and requires only a few lines of Python code on modern deep learning frameworks. |
Tasks | Face Recognition |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Zhao_RegularFace_Deep_Face_Recognition_via_Exclusive_Regularization_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_RegularFace_Deep_Face_Recognition_via_Exclusive_Regularization_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/regularface-deep-face-recognition-via |
Repo | |
Framework | |
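The claim that the method needs only a few lines of code is plausible: exclusive regularization can be written as a penalty on the largest cosine similarity between each identity's classifier weight vector and any other's. A hedged PyTorch sketch (the loss weight and its combination with the base softmax loss are assumptions):

```python
import torch
import torch.nn.functional as F

def exclusive_regularization(W):
    """W: (n_identities, feat_dim) classifier weights. Penalizes each identity's
    cosine similarity to its nearest neighbor, pushing identities apart."""
    Wn = F.normalize(W, dim=1)
    cos = Wn @ Wn.t() - 2.0 * torch.eye(W.size(0), device=W.device)  # mask self
    return cos.max(dim=1).values.mean()

# hypothetical use:
#   loss = softmax_loss + lam * exclusive_regularization(classifier.weight)
```

Minimizing the nearest-neighbor cosine similarity is equivalent to maximizing the smallest inter-identity angle, which is the separability the abstract targets.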
Efficient Multi-Domain Learning by Covariance Normalization
Title | Efficient Multi-Domain Learning by Covariance Normalization |
Authors | Yunsheng Li, Nuno Vasconcelos |
Abstract | The problem of multi-domain learning of deep networks is considered. An adaptive layer is induced per target domain, and a novel procedure, denoted covariance normalization (CovNorm), is proposed to reduce its parameters. CovNorm is a data-driven method of fairly simple implementation, requiring two principal component analyses (PCA) and fine-tuning of a mini-adaptation layer. Nevertheless, it is shown, both theoretically and experimentally, to have several advantages over previous approaches, such as batch normalization or geometric matrix approximations. Furthermore, CovNorm can be deployed whether target datasets are available sequentially or simultaneously. Experiments show that, in both cases, it has performance comparable to a fully fine-tuned network, using as few as 0.13% of the corresponding parameters per target domain. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Li_Efficient_Multi-Domain_Learning_by_Covariance_Normalization_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Efficient_Multi-Domain_Learning_by_Covariance_Normalization_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/efficient-multi-domain-learning-by-covariance |
Repo | |
Framework | |
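A hedged sketch of the idea as the abstract describes it: two PCAs, one on the adaptation layer's inputs and one on its outputs, compress a d x d adaptation matrix into a small k x k mini-adaptation layer that is then fine-tuned. The procedure below is our reading, not the paper's exact algorithm:

```python
import numpy as np

def covnorm_compress(A, X, k):
    """Compress a (d, d) adaptation matrix A to a (k, k) mini-layer using PCA
    bases of the layer's inputs X and outputs X @ A.T (illustrative sketch)."""
    def pca_basis(M, k):
        M = M - M.mean(axis=0)
        _, _, Vt = np.linalg.svd(M, full_matrices=False)
        return Vt[:k]                      # (k, d) top principal directions

    P_in = pca_basis(X, k)                 # PCA of input activations
    P_out = pca_basis(X @ A.T, k)          # PCA of output activations
    A_mini = P_out @ A @ P_in.T            # (k, k) mini-adaptation layer
    # at inference: A is approximated by P_out.T @ A_mini @ P_in,
    # using about 2*d*k + k*k parameters instead of d*d
    return P_in, A_mini, P_out
```

With k well below d, this is in the spirit of the reported 0.13% parameter footprint per target domain.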