February 1, 2020

3109 words 15 mins read

Paper Group AWR 210

Additive function approximation in the brain

Title Additive function approximation in the brain
Authors Kameron Decker Harris
Abstract Many biological learning systems such as the mushroom body, hippocampus, and cerebellum are built from sparsely connected networks of neurons. For a new understanding of such networks, we study the function spaces induced by sparse random features and characterize what functions may and may not be learned. A network with $d$ inputs per neuron is found to be equivalent to an additive model of order $d$, whereas with a degree distribution the network combines additive terms of different orders. We identify three specific advantages of sparsity: additive function approximation is a powerful inductive bias that limits the curse of dimensionality, sparse networks are stable to outlier noise in the inputs, and sparse random features are scalable. Thus, even simple brain architectures can be powerful function approximators. Finally, we hope that this work helps popularize kernel theories of networks among computational neuroscientists.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02603v2
PDF https://arxiv.org/pdf/1909.02603v2.pdf
PWC https://paperswithcode.com/paper/additive-function-approximation-in-the-brain
Repo https://github.com/kharris/sparse-random-features
Framework none
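
The sparse-connectivity idea above is easy to prototype: each random feature sees only $d$ of the inputs, and a linear readout is fit on top. The sketch below is one minimal NumPy reading of that setup under my own naming, not the paper's implementation; the authors' code is in the repo linked above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_random_features(X, n_features=512, d=2, scale=1.0):
    """Project inputs through random units that each see only d input dimensions."""
    n, p = X.shape
    W = np.zeros((p, n_features))
    for j in range(n_features):
        idx = rng.choice(p, size=d, replace=False)   # sparse connectivity: d inputs per unit
        W[idx, j] = rng.normal(0.0, scale, size=d)
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.cos(X @ W + b)                          # random-feature nonlinearity

def fit_ridge(Phi, y, lam=1e-3):
    """Linear readout on top of the fixed random features."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

# Toy usage: an additive target that depends on pairs of coordinates.
X = rng.normal(size=(200, 10))
y = np.sin(X[:, 0] * X[:, 1]) + 0.5 * X[:, 2]
Phi = sparse_random_features(X, d=2)
w = fit_ridge(Phi, y)
print("train MSE:", np.mean((Phi @ w - y) ** 2))
```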

Non-parametric Uni-modality Constraints for Deep Ordinal Classification

Title Non-parametric Uni-modality Constraints for Deep Ordinal Classification
Authors Soufiane Belharbi, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
Abstract We propose a new constrained-optimization formulation for deep ordinal classification, in which uni-modality of the label distribution is enforced implicitly via a set of inequality constraints over all the pairs of adjacent labels. Based on (c-1) constraints for c labels, our model is non-parametric and, therefore, more flexible than the existing deep ordinal classification techniques. Unlike these, it does not restrict the learned representation to a single and specific parametric model (or penalty) imposed on all the labels. Therefore, it enables the training to explore larger spaces of solutions, while removing the need for ad hoc choices and scaling up to large numbers of labels. It can be used in conjunction with any standard classification loss and any deep architecture. To tackle the ensuing challenging optimization problem, we solve a sequence of unconstrained losses based on a powerful extension of the log-barrier method. This effectively handles competing constraints and accommodates standard SGD for deep networks, while avoiding computationally expensive Lagrangian dual steps and substantially outperforming penalty methods. Furthermore, we propose a new performance metric for ordinal classification, as a proxy to measure distribution uni-modality, referred to as the Sides Order Index (SOI). We report comprehensive evaluations and comparisons to state-of-the-art methods on benchmark public datasets for several ordinal classification tasks, showing the merits of our approach in terms of label consistency, classification accuracy and scalability. Importantly, enforcing label consistency with our model does not incur higher classification errors, unlike many existing ordinal classification methods. A public reproducible PyTorch implementation is provided. (https://github.com/sbelharbi/unimodal-prob-deep-oc-free-distribution)
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.10720v2
PDF https://arxiv.org/pdf/1911.10720v2.pdf
PWC https://paperswithcode.com/paper/deep-ordinal-classification-with-inequality
Repo https://github.com/sbelharbi/Deep-Ordinal-Classification-with-Inequality-Constraints
Framework pytorch
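
As a rough illustration of the pairwise inequality constraints, the sketch below penalizes violations of uni-modality around the true label with a simplified extended log-barrier. Everything here (function names, the fixed barrier parameter t, the penalty weight) is an assumption for illustration only; the authors' PyTorch implementation is in the repo linked above.

```python
import math
import torch
import torch.nn.functional as F

def log_barrier(z, t=5.0):
    # Extended log-barrier for constraints z <= 0: log-barrier where feasible,
    # linear extension where violated (t would be annealed over training in practice).
    barrier = -(1.0 / t) * torch.log(-z.clamp(max=-1e-12))
    linear = t * z - (1.0 / t) * math.log(1.0 / t ** 2) + 1.0 / t
    return torch.where(z <= -1.0 / t ** 2, barrier, linear)

def unimodality_penalty(logits, targets, t=5.0):
    # (c-1) pairwise constraints per sample: probabilities must rise up to the
    # true label and fall after it.
    p = F.softmax(logits, dim=1)                         # (batch, c)
    idx = torch.arange(p.shape[1] - 1, device=p.device)
    rising = (idx.unsqueeze(0) < targets.unsqueeze(1)).float() * 2.0 - 1.0
    diff = rising * (p[:, :-1] - p[:, 1:])               # constraint: diff <= 0
    return log_barrier(diff, t).mean()

# Usage with any standard classification loss.
logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
loss = F.cross_entropy(logits, targets) + 0.1 * unimodality_penalty(logits, targets)
loss.backward()
```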

Be Concise and Precise: Synthesizing Open-Domain Entity Descriptions from Facts

Title Be Concise and Precise: Synthesizing Open-Domain Entity Descriptions from Facts
Authors Rajarshi Bhowmik, Gerard de Melo
Abstract Despite being vast repositories of factual information, cross-domain knowledge graphs, such as Wikidata and the Google Knowledge Graph, only sparsely provide short synoptic descriptions for entities. Such descriptions that briefly identify the most discernible features of an entity provide readers with a near-instantaneous understanding of what kind of entity they are being presented with. They can also aid in tasks such as named entity disambiguation, ontological type determination, and answering entity queries. Given the rapidly increasing numbers of entities in knowledge graphs, a fully automated synthesis of succinct textual descriptions from underlying factual information is essential. To this end, we propose a novel fact-to-sequence encoder-decoder model with a suitable copy mechanism to generate concise and precise textual descriptions of entities. In an in-depth evaluation, we demonstrate that our method significantly outperforms state-of-the-art alternatives.
Tasks Entity Disambiguation, Knowledge Graphs
Published 2019-04-16
URL http://arxiv.org/abs/1904.07391v1
PDF http://arxiv.org/pdf/1904.07391v1.pdf
PWC https://paperswithcode.com/paper/be-concise-and-precise-synthesizing-open
Repo https://github.com/kingsaint/Wikidata-Descriptions
Framework none
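
The copy mechanism mentioned in the abstract can be illustrated with standard pointer-generator-style mixing of a vocabulary distribution and a copy distribution over source fact tokens. The sketch below is a generic version of that idea, not the authors' architecture; tensor names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def mix_generate_and_copy(vocab_logits, copy_attn, src_token_ids, p_gen):
    """Combine the decoder's vocabulary distribution with a copy distribution over
    source fact tokens (pointer-generator-style mixing; a sketch only)."""
    vocab_dist = F.softmax(vocab_logits, dim=-1) * p_gen           # (batch, vocab)
    copy_dist = copy_attn * (1.0 - p_gen)                          # (batch, src_len)
    # Scatter copy probability mass onto the vocabulary ids of the source tokens.
    return vocab_dist.scatter_add(1, src_token_ids, copy_dist)

batch, vocab, src_len = 2, 100, 7
vocab_logits = torch.randn(batch, vocab)
copy_attn = F.softmax(torch.randn(batch, src_len), dim=-1)
src_token_ids = torch.randint(0, vocab, (batch, src_len))
p_gen = torch.sigmoid(torch.randn(batch, 1))
final_dist = mix_generate_and_copy(vocab_logits, copy_attn, src_token_ids, p_gen)
print(final_dist.sum(dim=-1))   # each row sums to 1
```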

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Title Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions
Authors Simon Luo, Mahito Sugiyama
Abstract Hierarchical probabilistic models are able to use a large number of parameters to create a model with high representation power. However, it is well known that increasing the number of parameters also increases the complexity of the model, which leads to a bias-variance trade-off. Although it is a classical problem, the bias-variance trade-off between hidden layers and higher-order interactions has not been well studied. In our study, we propose an efficient inference algorithm for the log-linear formulation of the higher-order Boltzmann machine using a combination of Gibbs sampling and annealed importance sampling. We then perform a bias-variance decomposition to study the differences between hidden layers and higher-order interactions. Our results show that hidden layers and higher-order interactions yield errors of a comparable order of magnitude, and that higher-order interactions produce less variance for smaller sample sizes.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1906.12063v1
PDF https://arxiv.org/pdf/1906.12063v1.pdf
PWC https://paperswithcode.com/paper/bias-variance-trade-off-in-hierarchical
Repo https://github.com/sjmluo/HBM
Framework none
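
To make the log-linear formulation concrete, the sketch below defines an energy with second- and third-order interactions over binary units and runs plain Gibbs sweeps; tempering with beta < 1 gives the intermediate distributions that annealed importance sampling would anneal over. This is a toy illustration (first-order terms omitted, parameter names invented), not the authors' inference algorithm; see the repo linked above for their code.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def energy(x, theta2, theta3):
    """Log-linear energy with second- and third-order interactions over binary units x."""
    e = 0.0
    n = len(x)
    for i, j in itertools.combinations(range(n), 2):
        e -= theta2[(i, j)] * x[i] * x[j]
    for i, j, k in itertools.combinations(range(n), 3):
        e -= theta3[(i, j, k)] * x[i] * x[j] * x[k]
    return e

def gibbs_sweep(x, theta2, theta3, beta=1.0):
    """One Gibbs sweep; beta < 1 gives the tempered distributions used by AIS."""
    x = x.copy()
    for i in range(len(x)):
        x0, x1 = x.copy(), x.copy()
        x0[i], x1[i] = 0, 1
        dE = energy(x1, theta2, theta3) - energy(x0, theta2, theta3)
        p_on = 1.0 / (1.0 + np.exp(beta * dE))         # p(x_i = 1 | rest)
        x[i] = int(rng.random() < p_on)
    return x

n = 6
theta2 = {p: rng.normal(0, 0.3) for p in itertools.combinations(range(n), 2)}
theta3 = {t: rng.normal(0, 0.1) for t in itertools.combinations(range(n), 3)}
x = rng.integers(0, 2, size=n)
for _ in range(100):
    x = gibbs_sweep(x, theta2, theta3)
print("sample:", x)
```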

Hierarchical Pointer Net Parsing

Title Hierarchical Pointer Net Parsing
Authors Linlin Liu, Xiang Lin, Shafiq Joty, Simeng Han, Lidong Bing
Abstract Transition-based top-down parsing with pointer networks has achieved state-of-the-art results in multiple parsing tasks, while having a linear time complexity. However, the decoder of these parsers has a sequential structure, which does not yield the most appropriate inductive bias for deriving tree structures. In this paper, we propose hierarchical pointer network parsers, and apply them to dependency and sentence-level discourse parsing tasks. Our results on standard benchmark datasets demonstrate the effectiveness of our approach, outperforming existing methods and setting a new state-of-the-art.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11571v1
PDF https://arxiv.org/pdf/1908.11571v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-pointer-net-parsing
Repo https://github.com/ntunlp/ptrnet-depparser
Framework pytorch
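
A single pointer-network decoding step, where the attention distribution over encoder states is itself the output distribution over candidate positions (e.g. head words in dependency parsing), can be sketched as follows. This is the generic pointer-net building block, not the hierarchical decoder proposed in the paper; dimensions and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerStep(nn.Module):
    """One pointer-network decoding step: attend over encoder states and treat the
    attention distribution itself as a pointer over input positions."""
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (batch, seq, enc_dim), dec_state: (batch, dec_dim)
        scores = self.v(torch.tanh(self.W_enc(enc_states) +
                                   self.W_dec(dec_state).unsqueeze(1))).squeeze(-1)
        return F.log_softmax(scores, dim=-1)          # pointer distribution over positions

enc = torch.randn(2, 9, 128)      # nine encoded tokens
dec = torch.randn(2, 64)          # current decoder state
pointer = PointerStep(128, 64, 64)
print(pointer(enc, dec).shape)    # (2, 9): one log-probability per candidate position
```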

TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection

Title TANDA: Transfer and Adapt Pre-Trained Transformer Models for Answer Sentence Selection
Authors Siddhant Garg, Thuy Vu, Alessandro Moschitti
Abstract We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it with a large and high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer sentence selection, which is a well-known inference task in Question Answering. We built a large scale dataset to enable the transfer step, exploiting the Natural Questions dataset. Our approach establishes the state of the art on two well-known benchmarks, WikiQA and TREC-QA, achieving MAP scores of 92% and 94.3%, respectively, which largely outperform the previous highest scores of 83.4% and 87.5%, obtained in very recent work. We empirically show that TANDA generates more stable and robust models reducing the effort required for selecting optimal hyper-parameters. Additionally, we show that the transfer step of TANDA makes the adaptation step more robust to noise. This enables a more effective use of noisy datasets for fine-tuning. Finally, we also confirm the positive impact of TANDA in an industrial setting, using domain specific datasets subject to different types of noise.
Tasks Question Answering
Published 2019-11-11
URL https://arxiv.org/abs/1911.04118v2
PDF https://arxiv.org/pdf/1911.04118v2.pdf
PWC https://paperswithcode.com/paper/tanda-transfer-and-adapt-pre-trained
Repo https://github.com/alexa/wqa_tanda
Framework none
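
The transfer-then-adapt recipe itself is just two rounds of ordinary fine-tuning with different data. The sketch below shows that structure generically: the model, the DataLoaders, the learning rates, and the use of the paper's ASNQ-style transfer dataset are placeholders and assumptions, not the authors' code (which is in the repo linked above).

```python
import torch

def fine_tune(model, loader, epochs, lr, device="cpu"):
    """Plain fine-tuning loop used for both TANDA stages (sketch; the paper
    fine-tunes Transformer encoders with a sentence-pair classification head)."""
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
    return model

# Stage 1 ("transfer"): fine-tune on a large, general answer-selection dataset.
# Stage 2 ("adapt"): continue fine-tuning the same weights on the target-domain data.
# `pretrained_model`, `transfer_loader`, and `target_loader` are placeholders for the
# user's own model and DataLoaders:
# model = fine_tune(pretrained_model, transfer_loader, epochs=1, lr=2e-5)
# model = fine_tune(model, target_loader, epochs=3, lr=1e-5)
```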

Analyzing and Improving Representations with the Soft Nearest Neighbor Loss

Title Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
Authors Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton
Abstract We explore and expand the $\textit{Soft Nearest Neighbor Loss}$ to measure the $\textit{entanglement}$ of class manifolds in representation space: i.e., how close pairs of points from the same class are relative to pairs of points from different classes. We demonstrate several use cases of the loss. As an analytical tool, it provides insights into the evolution of class similarity structures during learning. Surprisingly, we find that $\textit{maximizing}$ the entanglement of representations of different classes in the hidden layers is beneficial for discrimination in the final layer, possibly because it encourages representations to identify class-independent similarity structures. Maximizing the soft nearest neighbor loss in the hidden layers leads not only to improved generalization but also to better-calibrated estimates of uncertainty on outlier data. Data that is not from the training distribution can be recognized by observing that in the hidden layers, it has fewer than the normal number of neighbors from the predicted class.
Tasks
Published 2019-02-05
URL http://arxiv.org/abs/1902.01889v1
PDF http://arxiv.org/pdf/1902.01889v1.pdf
PWC https://paperswithcode.com/paper/analyzing-and-improving-representations-with
Repo https://github.com/vimarshc/fastai_experiments
Framework tf
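
A compact version of the soft nearest neighbor loss can be written directly from its definition: for each point, the (log) fraction of similarity mass that falls on same-class neighbors at a given temperature. The sketch below is one such reading (PyTorch rather than the paper's TensorFlow, with an epsilon stabilizer of my own), not the authors' exact code.

```python
import torch

def soft_nearest_neighbor_loss(x, y, temperature=1.0, eps=1e-8):
    """For each point, the log ratio of same-class similarity mass to all-other-point
    similarity mass at the given temperature (sketch of the Frosst et al. loss)."""
    dist = torch.cdist(x, x).pow(2)                        # squared Euclidean distances
    sim = torch.exp(-dist / temperature)
    sim = sim - torch.diag_embed(torch.diagonal(sim))      # exclude self-similarity
    same = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    num = (sim * same).sum(dim=1)                          # mass on same-class neighbors
    den = sim.sum(dim=1)                                   # mass on all other points
    return -torch.log(num / den + eps).mean()

features = torch.randn(32, 16)
labels = torch.randint(0, 4, (32,))
print(soft_nearest_neighbor_loss(features, labels))
```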

Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation

Title Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation
Authors Kiru Park, Timothy Patten, Markus Vincze
Abstract Estimating the 6D pose of objects using only RGB images remains challenging because of problems such as occlusion and symmetries. It is also difficult to construct 3D models with precise texture without expert knowledge or specialized scanning devices. To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture is designed to estimate the 3D coordinates and expected errors per pixel. These pixel-wise predictions are then used in multiple stages to form 2D-3D correspondences to directly compute poses with the PnP algorithm with RANSAC iterations. Our method is robust to occlusion by leveraging recent achievements in generative adversarial training to precisely recover occluded parts. Furthermore, a novel loss function, the transformer loss, is proposed to handle symmetric objects by guiding predictions to the closest symmetric pose. Evaluations on three different benchmark datasets containing symmetric and occluded objects show our method outperforms the state of the art using only RGB images.
Tasks 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published 2019-08-20
URL https://arxiv.org/abs/1908.07433v1
PDF https://arxiv.org/pdf/1908.07433v1.pdf
PWC https://paperswithcode.com/paper/pix2pose-pixel-wise-coordinate-regression-of
Repo https://github.com/kirumang/Pix2Pose
Framework tf
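
Once per-pixel 3D coordinates have been predicted, pose recovery reduces to PnP with RANSAC over the resulting 2D-3D correspondences. The sketch below shows that final step with OpenCV; the correspondences and camera intrinsics are random/illustrative placeholders, not outputs of the Pix2Pose network.

```python
import numpy as np
import cv2

# Suppose a network has predicted a 3D model coordinate for every object pixel.
# Pose recovery is then PnP with RANSAC over the 2D-3D correspondences.
pixels_2d = np.random.rand(500, 2).astype(np.float32) * np.array([640, 480], dtype=np.float32)
coords_3d = (np.random.rand(500, 3).astype(np.float32) - 0.5) * 0.2   # metres, object frame
camera_K = np.array([[572.4, 0.0, 325.3],        # illustrative intrinsics, not from the paper
                     [0.0, 573.6, 242.0],
                     [0.0, 0.0, 1.0]], dtype=np.float32)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    coords_3d, pixels_2d, camera_K, distCoeffs=None,
    reprojectionError=3.0, iterationsCount=100)
if ok:
    R, _ = cv2.Rodrigues(rvec)        # rotation matrix and translation give the 6D pose
    print("inliers:", len(inliers), "\nR =\n", R, "\nt =", tvec.ravel())
```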

A Multiscale Visualization of Attention in the Transformer Model

Title A Multiscale Visualization of Attention in the Transformer Model
Authors Jesse Vig
Abstract The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach. Besides improving performance, an advantage of using attention is that it can also help to interpret a model by showing how the model assigns weight to different input elements. However, the multi-layer, multi-head attention mechanism in the Transformer model can be difficult to decipher. To make the model more accessible, we introduce an open-source tool that visualizes attention at multiple scales, each of which provides a unique perspective on the attention mechanism. We demonstrate the tool on BERT and OpenAI GPT-2 and present three example use cases: detecting model bias, locating relevant attention heads, and linking neurons to model behavior.
Tasks
Published 2019-06-12
URL https://arxiv.org/abs/1906.05714v1
PDF https://arxiv.org/pdf/1906.05714v1.pdf
PWC https://paperswithcode.com/paper/a-multiscale-visualization-of-attention-in
Repo https://github.com/jessevig/bertviz
Framework pytorch
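
Typical usage of the released tool (bertviz) follows the pattern below, adapted from the repository's README. It is meant to be run in a Jupyter notebook, and the exact API may have changed since this was written, so treat it as a sketch rather than canonical usage.

```python
# requires: pip install bertviz transformers
from transformers import BertTokenizer, BertModel
from bertviz import head_view

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer.encode_plus("The cat sat on the mat",
                               "The cat lay on the rug", return_tensors="pt")
outputs = model(**inputs)
attention = outputs.attentions                          # one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(attention, tokens)                            # renders the interactive head view
```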

Attentive History Selection for Conversational Question Answering

Title Attentive History Selection for Conversational Question Answering
Authors Chen Qu, Liu Yang, Minghui Qiu, Yongfeng Zhang, Cen Chen, W. Bruce Croft, Mohit Iyyer
Abstract Conversational question answering (ConvQA) is a simplified but concrete setting of conversational search. One of its major challenges is to leverage the conversation history to understand and answer the current question. In this work, we propose a novel solution for ConvQA that involves three aspects. First, we propose a positional history answer embedding method to encode conversation history with position information using BERT in a natural way. BERT is a powerful technique for text representation. Second, we design a history attention mechanism (HAM) to conduct a “soft selection” for conversation histories. This method attends to history turns with different weights based on how helpful they are for answering the current question. Third, in addition to handling conversation history, we take advantage of multi-task learning (MTL) to do answer prediction along with another essential conversation task (dialog act prediction) using a uniform model architecture. MTL is able to learn more expressive and generic representations to improve the performance of ConvQA. We demonstrate the effectiveness of our model with extensive experimental evaluations on QuAC, a large-scale ConvQA dataset. We show that position information plays an important role in conversation history modeling. We also visualize the history attention and provide new insights into conversation history understanding.
Tasks Multi-Task Learning, Question Answering
Published 2019-08-26
URL https://arxiv.org/abs/1908.09456v1
PDF https://arxiv.org/pdf/1908.09456v1.pdf
PWC https://paperswithcode.com/paper/attentive-history-selection-for
Repo https://github.com/prdwb/attentive_history_selection
Framework tf
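
The “soft selection” over history turns is, at its core, an attention layer that scores each history-turn representation against the current question and mixes them by weight. The sketch below shows only that core idea; it is not the authors' BERT-based positional history answer embedding model, and all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistoryAttention(nn.Module):
    """Soft selection over conversation-history turns: score each turn representation
    against the current question and combine them with attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, question, history):
        # question: (batch, dim), history: (batch, turns, dim)
        q = question.unsqueeze(1).expand(-1, history.size(1), -1)
        weights = F.softmax(self.score(torch.cat([q, history], dim=-1)).squeeze(-1), dim=-1)
        context = torch.bmm(weights.unsqueeze(1), history).squeeze(1)
        return context, weights                      # weighted history summary + attention

ham = HistoryAttention(dim=256)
question = torch.randn(4, 256)
history = torch.randn(4, 6, 256)                     # six previous turns
context, weights = ham(question, history)
print(context.shape, weights.shape)                  # (4, 256) (4, 6)
```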

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index

Title Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index
Authors Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi
Abstract Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query. In this paper, we introduce the query-agnostic indexable representation of document phrases that can drastically speed up open-domain QA and also allows us to reach long-tail targets. In particular, our dense-sparse phrase encoding effectively captures syntactic, semantic, and lexical information of the phrases and eliminates the pipeline filtering of context documents. Leveraging optimization strategies, our model can be trained on a single 4-GPU server and serve the entire Wikipedia (up to 60 billion phrases) in under 2TB with CPUs only. Our experiments on SQuAD-Open show that our model is more accurate than DrQA (Chen et al., 2017) with 6000x reduced computational cost, which translates into at least 58x faster end-to-end inference benchmark on CPUs.
Tasks Open-Domain Question Answering, Question Answering
Published 2019-06-13
URL https://arxiv.org/abs/1906.05807v2
PDF https://arxiv.org/pdf/1906.05807v2.pdf
PWC https://paperswithcode.com/paper/real-time-open-domain-question-answering-with
Repo https://github.com/uwnlp/denspi
Framework pytorch
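
The dense-sparse scoring idea can be illustrated as follows: every indexed phrase carries a dense vector plus a sparse lexical vector, and a query is ranked by the sum of the two inner products. The sketch below uses brute-force scoring over random placeholder vectors; the real system replaces this with approximate nearest-neighbor search and an inverted index, and none of the names below come from the authors' code.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
n_phrases, dense_dim, vocab = 10_000, 128, 50_000

# Placeholder index: a dense embedding and a sparse lexical vector per phrase.
phrase_dense = rng.normal(size=(n_phrases, dense_dim)).astype(np.float32)
phrase_sparse = csr_matrix(
    (rng.random(3 * n_phrases),
     (np.repeat(np.arange(n_phrases), 3), rng.integers(0, vocab, 3 * n_phrases))),
    shape=(n_phrases, vocab), dtype=np.float32)

# Placeholder query encoded the same way.
query_dense = rng.normal(size=dense_dim).astype(np.float32)
query_sparse = csr_matrix(
    (rng.random(5), (np.zeros(5, dtype=int), rng.integers(0, vocab, 5))),
    shape=(1, vocab), dtype=np.float32)

# Score = dense inner product + sparse inner product.
scores = phrase_dense @ query_dense + (phrase_sparse @ query_sparse.T).toarray().ravel()
top = np.argsort(-scores)[:5]
print("top phrase ids:", top)
# At scale, the dense part would be searched with an ANN index and the sparse part
# with an inverted index, rather than by exhaustive scoring as here.
```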

Semi-Supervised Graph Classification: A Hierarchical Graph Perspective

Title Semi-Supervised Graph Classification: A Hierarchical Graph Perspective
Authors Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wenbing Huang, Junzhou Huang
Abstract Node classification and graph classification are two graph learning problems that predict the class label of a node and the class label of a graph respectively. A node of a graph usually represents a real-world entity, e.g., a user in a social network, or a protein in a protein-protein interaction network. In this work, we consider a more challenging but practically useful setting, in which a node itself is a graph instance. This leads to a hierarchical graph perspective which arises in many domains such as social network, biological network and document collection. For example, in a social network, a group of people with shared interests forms a user group, whereas a number of user groups are interconnected via interactions or common members. We study the node classification problem in the hierarchical graph where a ‘node’ is a graph instance, e.g., a user group in the above example. As labels are usually limited in real-world data, we design two novel semi-supervised solutions named SEmi-supervised grAph cLassification via Cautious/Active Iteration (or SEAL-C/AI in short). SEAL-C/AI adopt an iterative framework that takes turns to build or update two classifiers, one working at the graph instance level and the other at the hierarchical graph level. To simplify the representation of the hierarchical graph, we propose a novel supervised, self-attentive graph embedding method called SAGE, which embeds graph instances of arbitrary size into fixed-length vectors. Through experiments on synthetic data and Tencent QQ group data, we demonstrate that SEAL-C/AI not only outperform competing methods by a significant margin in terms of accuracy/Macro-F1, but also generate meaningful interpretations of the learned representations.
Tasks Graph Classification, Graph Embedding, Node Classification
Published 2019-04-10
URL http://arxiv.org/abs/1904.05003v1
PDF http://arxiv.org/pdf/1904.05003v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-graph-classification-a
Repo https://github.com/benedekrozemberczki/SEAL-CI
Framework pytorch
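
The cautious-iteration idea can be illustrated with a much-simplified self-training loop: train on the currently labeled instances, promote only high-confidence predictions to labels, and repeat. The sketch below collapses the paper's two-classifier SEAL-C scheme into a single classifier over stand-in embeddings, purely for illustration; all names are mine, and the authors' code is in the repo linked above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cautious_iteration(X, y, labeled_mask, rounds=5, threshold=0.9):
    """Cautious self-training in the spirit of SEAL-C: repeatedly train on the
    currently labeled instances and promote only high-confidence predictions."""
    y, labeled = y.copy(), labeled_mask.copy()
    clf = None
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        if labeled.all():
            break
        proba = clf.predict_proba(X[~labeled])
        confident = proba.max(axis=1) >= threshold
        idx = np.where(~labeled)[0][confident]
        y[idx] = clf.classes_[proba.argmax(axis=1)[confident]]   # pseudo-label cautiously
        labeled[idx] = True
    return clf, y, labeled

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                     # stand-in for SAGE graph embeddings
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)
labeled_mask = np.zeros(300, dtype=bool)
labeled_mask[:30] = True
y = y_true.copy()
y[~labeled_mask] = -1                              # unknown labels
clf, y_final, labeled_final = cautious_iteration(X, y, labeled_mask)
print("labeled after iteration:", labeled_final.sum())
```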

Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation

Title Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation
Authors Aizhan Imankulova, Raj Dabre, Atsushi Fujita, Kenji Imamura
Abstract This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese–Russian pair for benchmarking. Although there are many solutions for low-resource scenarios, such as multilingual NMT and back-translation, we have empirically confirmed their limited success when restricted to in-domain data. We therefore propose to exploit out-of-domain data through transfer learning, by using it to first train a multilingual NMT model followed by multistage fine-tuning on in-domain parallel and back-translated pseudo-parallel data. Our approach, which combines domain adaptation, multilingualism, and back-translation, helps improve the translation quality by more than 3.7 BLEU points, over a strong baseline, for this extremely low-resource scenario.
Tasks Domain Adaptation, Low-Resource Neural Machine Translation, Machine Translation, Transfer Learning
Published 2019-07-06
URL https://arxiv.org/abs/1907.03060v1
PDF https://arxiv.org/pdf/1907.03060v1.pdf
PWC https://paperswithcode.com/paper/exploiting-out-of-domain-parallel-data
Repo https://github.com/aizhanti/JaRuNC
Framework none

Quality Estimation for Image Captions Based on Large-scale Human Evaluations

Title Quality Estimation for Image Captions Based on Large-scale Human Evaluations
Authors Tomer Levinboim, Ashish Thapliyal, Piyush Sharma, Radu Soricut
Abstract Automatic image captioning has improved significantly in the last few years, but the problem is far from being solved. Furthermore, while the standard automatic metrics, such as CIDEr and SPICE, can be used for model selection, they cannot be used at inference-time given a previously unseen image since they require ground-truth references. In this paper, we focus on the related problem called Quality Estimation (QE) of image-captions. In contrast to automatic metrics, QE attempts to model caption quality without relying on ground-truth references. It can thus be applied as a second-pass model (after caption generation) to estimate the quality of captions even for previously unseen images. We conduct a large-scale human evaluation experiment, in which we collect a new dataset of more than 600k ratings of image-caption pairs. Using this dataset, we design and experiment with several QE modeling approaches and provide an analysis of their performance. Our results show that QE is feasible for image captioning.
Tasks Image Captioning, Model Selection
Published 2019-09-08
URL https://arxiv.org/abs/1909.03396v1
PDF https://arxiv.org/pdf/1909.03396v1.pdf
PWC https://paperswithcode.com/paper/quality-estimation-for-image-captions-based
Repo https://github.com/google-research-datasets/Image-Caption-Quality-Dataset
Framework none

Revisiting Low-Resource Neural Machine Translation: A Case Study

Title Revisiting Low-Resource Neural Machine Translation: A Case Study
Authors Rico Sennrich, Biao Zhang
Abstract It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, underperforming phrase-based statistical machine translation (PBSMT) and requiring large amounts of auxiliary data to achieve competitive results. In this paper, we re-assess the validity of these results, arguing that they are the result of a lack of system adaptation to low-resource settings. We discuss some pitfalls to be aware of when training low-resource NMT systems, and recent techniques that have been shown to be especially helpful in low-resource settings, resulting in a set of best practices for low-resource NMT. In our experiments on German–English with different amounts of IWSLT14 training data, we show that, without the use of any auxiliary monolingual or multilingual data, an optimized NMT system can outperform PBSMT with far less data than previously claimed. We also apply these techniques to a low-resource Korean–English dataset, surpassing previously reported results by 4 BLEU.
Tasks Low-Resource Neural Machine Translation, Machine Translation
Published 2019-05-28
URL https://arxiv.org/abs/1905.11901v1
PDF https://arxiv.org/pdf/1905.11901v1.pdf
PWC https://paperswithcode.com/paper/revisiting-low-resource-neural-machine
Repo https://github.com/yuekai146/NMT
Framework pytorch