Paper Group AWR 52
Attend to the beginning: A study on using bidirectional attention for extractive summarization
Title | Attend to the beginning: A study on using bidirectional attention for extractive summarization |
Authors | Ahmed Magooda, Cezary Marcjan |
Abstract | Forum discussion data differ in both structure and properties from generic forms of textual data such as news. Hence, summarization techniques should make use of these differences and craft models that can benefit from the structural nature of discussion data. In this work, we propose attending to the beginning of a document to improve the performance of extractive summarization models when applied to forum discussion data. Evaluations demonstrated that, with the help of a bidirectional attention mechanism, attending to the beginning of a document (the initial comment/post) in a discussion thread can introduce a consistent boost in ROUGE scores, as well as new State Of The Art (SOTA) ROUGE scores on the forum discussions dataset. Additionally, we explored whether this hypothesis extends to other generic forms of textual data. We make use of the tendency to introduce important information early in the text by attending to the first few sentences in generic textual data. Evaluations demonstrated that attending to introductory sentences using bidirectional attention improves the performance of extractive summarization models even when applied to more generic forms of textual data. |
Tasks | |
Published | 2020-02-09 |
URL | https://arxiv.org/abs/2002.03405v2 |
https://arxiv.org/pdf/2002.03405v2.pdf | |
PWC | https://paperswithcode.com/paper/attend-to-the-beginning-a-study-on-using |
Repo | https://github.com/amagooda/SummaRuNNer_coattention |
Framework | pytorch |
Calibration of Pre-trained Transformers
Title | Calibration of Pre-trained Transformers |
Authors | Shrey Desai, Greg Durrett |
Abstract | Pre-trained Transformers are now ubiquitous in natural language processing, but despite their high end-task performance, little is known empirically about whether they are calibrated. Specifically, do these models’ posterior probabilities provide an accurate empirical measure of how likely the model is to be correct on a given example? We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning. For each task, we consider in-domain as well as challenging out-of-domain settings, where models face more examples they should be uncertain about. We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain. |
Tasks | Calibration, Natural Language Inference |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07892v2 |
https://arxiv.org/pdf/2003.07892v2.pdf | |
PWC | https://paperswithcode.com/paper/calibration-of-pre-trained-transformers |
Repo | https://github.com/shreydesai/calibration |
Framework | pytorch |
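
The two quantities the abstract above relies on, calibration error and temperature scaling, can be illustrated with a small sketch. It assumes the common equal-width-bin definition of expected calibration error (ECE); the abstract does not state the binning the authors use, and the function names `softmax` and `expected_calibration_error` are illustrative, not taken from the linked repository.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins (an assumed, standard variant).

    confidences: predicted probability of the chosen class, one per example
    correct:     booleans, True where the prediction was right
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # weight each bin's |confidence - accuracy| gap by its share of examples
        ece += len(b) / n * abs(avg_conf - accuracy)
    return ece
```

In practice the temperature would be a single scalar fitted on held-out data; here it is passed in by hand only to show that raising it lowers the top-class confidence.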
K-Core based Temporal Graph Convolutional Network for Dynamic Graphs
Title | K-Core based Temporal Graph Convolutional Network for Dynamic Graphs |
Authors | Jingxin Liu, Chang Xu, Chang Yin, Weiqiang Wu, You Song |
Abstract | Graph representation learning is a fundamental task of various applications, aiming to learn low-dimensional embeddings for nodes which can preserve graph topology information. However, many existing methods focus on static graphs while ignoring graph evolving patterns. Inspired by the success of graph convolutional networks (GCNs) in static graph embedding, we propose a novel k-core based temporal graph convolutional network, namely CTGCN, to learn node representations for dynamic graphs. In contrast to previous dynamic graph embedding methods, CTGCN can preserve both local connective proximity and global structural similarity in a unified framework while simultaneously capturing graph dynamics. In the proposed framework, the traditional graph convolution operation is generalized into two parts: feature transformation and feature aggregation, which gives CTGCN more flexibility and enables CTGCN to learn connective and structural information under the same framework. Experimental results on 7 real-world graphs demonstrate CTGCN outperforms existing state-of-the-art graph embedding methods in several tasks, such as link prediction and structural role classification. The source code of this work can be obtained from https://github.com/jhljx/CTGCN. |
Tasks | Graph Embedding, Graph Representation Learning, Link Prediction, Representation Learning |
Published | 2020-03-22 |
URL | https://arxiv.org/abs/2003.09902v1 |
https://arxiv.org/pdf/2003.09902v1.pdf | |
PWC | https://paperswithcode.com/paper/k-core-based-temporal-graph-convolutional |
Repo | https://github.com/jhljx/CTGCN |
Framework | pytorch |
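
For readers unfamiliar with the k-core notion in the title above, the following is a minimal sketch of core-number computation by repeatedly peeling the lowest-degree node. It only illustrates the graph concept; the abstract does not describe how CTGCN consumes k-cores, and `core_numbers` is a hypothetical helper, not a function from the linked repository.

```python
def core_numbers(adjacency):
    """Return the core number of every node in an undirected graph.

    adjacency: dict mapping node -> set of neighbouring nodes.
    A node has core number k if it belongs to the k-core (the maximal
    subgraph in which every node has degree >= k) but not the (k+1)-core.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # peel the node with the smallest remaining degree
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # core numbers never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for u in adjacency[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

This simple version scans for the minimum on every step, so it is quadratic in the number of nodes; a bucket-based implementation would be preferable on large graphs.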
Comment Ranking Diversification in Forum Discussions
Title | Comment Ranking Diversification in Forum Discussions |
Authors | Curtis G. Northcutt, Kimberly A. Leon, Naichun Chen |
Abstract | Viewing consumption of discussion forums with hundreds or more comments depends on ranking because most users only view top-ranked comments. When comments are ranked by an ordered score (e.g. number of replies or up-votes) without adjusting for semantic similarity of near-ranked comments, top-ranked comments are more likely to emphasize the majority opinion and incur redundancy. In this paper, we propose a top K comment diversification re-ranking model using Maximal Marginal Relevance (MMR) and evaluate its impact in three categories: (1) semantic diversity, (2) inclusion of the semantics of lower-ranked comments, and (3) redundancy, within the context of a HarvardX course discussion forum. We conducted a double-blind, small-scale evaluation experiment requiring subjects to select between the top 5 comments of a diversified ranking and a baseline ranking ordered by score. For three subjects, across 100 trials, subjects selected the diversified (75% score, 25% diversification) ranking as significantly (1) more diverse, (2) more inclusive, and (3) less redundant. Within each category, inter-rater reliability showed moderate consistency, with typical Cohen-Kappa scores near 0.2. Our findings suggest that our model improves (1) diversification, (2) inclusion, and (3) redundancy, among top K ranked comments in online discussion forums. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.12457v1 |
https://arxiv.org/pdf/2002.12457v1.pdf | |
PWC | https://paperswithcode.com/paper/comment-ranking-diversification-in-forum |
Repo | https://github.com/cgnorthcutt/forum-diversification |
Framework | tf |
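
The Maximal Marginal Relevance re-ranking named in the abstract above can be sketched as a greedy selection loop. This is an illustration under assumptions: the "75% score, 25% diversification" setting is read here as a trade-off weight `lam=0.75`, scores and similarities are assumed to lie on comparable scales, and `mmr_rerank` is a hypothetical name rather than the authors' code.

```python
def mmr_rerank(scores, similarity, k, lam=0.75):
    """Greedy Maximal Marginal Relevance selection of the top k items.

    scores:     relevance score per comment (e.g. normalised up-votes)
    similarity: square matrix, similarity[i][j] between comments i and j
    lam:        weight on the score; (1 - lam) weights the redundancy penalty
    Returns the indices of the selected comments in ranked order.
    """
    candidates = list(range(len(scores)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_value(i):
            # redundancy = similarity to the closest already-selected comment
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_value)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Setting `lam=1.0` recovers the plain score-ordered baseline, which makes it easy to compare the two rankings on the same input.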
Going beyond accuracy: estimating homophily in social networks using predictions
Title | Going beyond accuracy: estimating homophily in social networks using predictions |
Authors | George Berry, Antonio Sirianni, Ingmar Weber, Jisun An, Michael Macy |
Abstract | In online social networks, it is common to use predictions of node categories to estimate measures of homophily and other relational properties. However, online social network data often lacks basic demographic information about the nodes. Researchers must rely on predicted node attributes to estimate measures of homophily, but little is known about the validity of these measures. We show that estimating homophily in a network can be viewed as a dyadic prediction problem, and that homophily estimates are unbiased when dyad-level residuals sum to zero in the network. Node-level prediction models, such as the use of names to classify ethnicity or gender, do not generally have this property and can introduce large biases into homophily estimates. Bias occurs due to error autocorrelation along dyads. Importantly, node-level classification performance is not a reliable indicator of estimation accuracy for homophily. We compare estimation strategies that make predictions at the node and dyad levels, evaluating performance in different settings. We propose a novel “ego-alter” modeling approach that outperforms standard node and dyad classification strategies. While this paper focuses on homophily, results generalize to other relational measures which aggregate predictions along the dyads in a network. We conclude with suggestions for research designs to study homophily in online networks. Code for this paper is available at https://github.com/georgeberry/autocorr. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11171v1 |
https://arxiv.org/pdf/2001.11171v1.pdf | |
PWC | https://paperswithcode.com/paper/going-beyond-accuracy-estimating-homophily-in |
Repo | https://github.com/georgeberry/autocorr |
Framework | none |
State-only Imitation with Transition Dynamics Mismatch
Title | State-only Imitation with Transition Dynamics Mismatch |
Authors | Tanmay Gangwani, Jian Peng |
Abstract | Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior, rather than dealing with the hardships of designing a correct reward function. With the environment modeled as a Markov Decision Process (MDP), most of the existing IL algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitator policy is to be learned. This is uncharacteristic of many real-life scenarios where discrepancies between the expert and the imitator MDPs are common, especially in the transition dynamics function. Furthermore, obtaining expert actions may be costly or infeasible, making the recent trend towards state-only IL (where expert demonstrations constitute only states or observations) ever so promising. Building on recent adversarial imitation approaches that are motivated by the idea of divergence minimization, we present a new state-only IL algorithm in this paper. It divides the overall optimization objective into two subproblems by introducing an indirection step and solves the subproblems iteratively. We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs, while the baseline IL methods suffer from performance degradation. To analyze this, we construct several interesting MDPs by modifying the configuration parameters for the MuJoCo locomotion tasks from OpenAI Gym. |
Tasks | Imitation Learning |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2002.11879v1 |
https://arxiv.org/pdf/2002.11879v1.pdf | |
PWC | https://paperswithcode.com/paper/state-only-imitation-with-transition-dynamics-1 |
Repo | https://github.com/tgangwani/RL-Indirect-imitation |
Framework | pytorch |
BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders
Title | BasisVAE: Translation-invariant feature-level clustering with Variational Autoencoders |
Authors | Kaspar Märtens, Christopher Yau |
Abstract | Variational Autoencoders (VAEs) provide a flexible and scalable framework for non-linear dimensionality reduction. However, in application domains such as genomics where data sets are typically tabular and high-dimensional, a black-box approach to dimensionality reduction does not provide sufficient insights. Common data analysis workflows additionally use clustering techniques to identify groups of similar features. This usually leads to a two-stage process, however, it would be desirable to construct a joint modelling framework for simultaneous dimensionality reduction and clustering of features. In this paper, we propose to achieve this through the BasisVAE: a combination of the VAE and a probabilistic clustering prior, which lets us learn a one-hot basis function representation as part of the decoder network. Furthermore, for scenarios where not all features are aligned, we develop an extension to handle translation-invariant basis functions. We show how a collapsed variational inference scheme leads to scalable and efficient inference for BasisVAE, demonstrated on various toy examples as well as on single-cell gene expression data. |
Tasks | Dimensionality Reduction |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.03462v1 |
https://arxiv.org/pdf/2003.03462v1.pdf | |
PWC | https://paperswithcode.com/paper/basisvae-translation-invariant-feature-level |
Repo | https://github.com/kasparmartens/BasisVAE |
Framework | pytorch |
Probabilistic Pixel-Adaptive Refinement Networks
Title | Probabilistic Pixel-Adaptive Refinement Networks |
Authors | Anne S. Wannenwetsch, Stefan Roth |
Abstract | Encoder-decoder networks have found widespread use in various dense prediction tasks. However, the strong reduction of spatial resolution in the encoder leads to a loss of location information as well as boundary artifacts. To address this, image-adaptive post-processing methods have proven beneficial by leveraging the high-resolution input image(s) as guidance data. We extend such approaches by considering an important orthogonal source of information: the network’s confidence in its own predictions. We introduce probabilistic pixel-adaptive convolutions (PPACs), which not only depend on image guidance data for filtering, but also respect the reliability of per-pixel predictions. As such, PPACs allow for image-adaptive smoothing and simultaneously propagating pixels of high confidence into less reliable regions, while respecting object boundaries. We demonstrate their utility in refinement networks for optical flow and semantic segmentation, where PPACs lead to a clear reduction in boundary artifacts. Moreover, our proposed refinement step is able to substantially improve the accuracy on various widely used benchmarks. |
Tasks | Optical Flow Estimation, Semantic Segmentation |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.14407v1 |
https://arxiv.org/pdf/2003.14407v1.pdf | |
PWC | https://paperswithcode.com/paper/probabilistic-pixel-adaptive-refinement |
Repo | https://github.com/visinf/ppac_refinement |
Framework | pytorch |
Combating noisy labels by agreement: A joint training method with co-regularization
Title | Combating noisy labels by agreement: A joint training method with co-regularization |
Authors | Hongxin Wei, Lei Feng, Xiangyu Chen, Bo An |
Abstract | Deep Learning with noisy labels is a practically challenging problem in weakly-supervised learning. The state-of-the-art approaches “Decoupling” and “Co-teaching+” claim that the “disagreement” strategy is crucial for alleviating the problem of learning with noisy labels. In this paper, we start from a different perspective and propose a robust learning paradigm called JoCoR, which aims to reduce the diversity of two networks during training. Specifically, we first use two networks to make predictions on the same mini-batch data and calculate a joint loss with Co-Regularization for each training example. Then we select small-loss examples to update the parameters of both networks simultaneously. Trained by the joint loss, these two networks become more and more similar due to the effect of Co-Regularization. Extensive experimental results on corrupted data from benchmark datasets including MNIST, CIFAR-10, CIFAR-100 and Clothing1M demonstrate that JoCoR is superior to many state-of-the-art approaches for learning with noisy labels. |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02752v2 |
https://arxiv.org/pdf/2003.02752v2.pdf | |
PWC | https://paperswithcode.com/paper/combating-noisy-labels-by-agreement-a-joint |
Repo | https://github.com/hongxin001/JoCoR |
Framework | pytorch |
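
The two steps described in the abstract above (a per-example joint loss with a co-regularization term, then small-loss selection) can be sketched as follows. The abstract does not give the loss formula, so this sketch assumes a convex combination of the two networks' cross-entropy losses and a symmetric KL divergence between their predictions; the weight `lam`, the `keep_ratio` parameter, and all function names are assumptions, not the authors' implementation.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the given label (probs must be positive)."""
    return -math.log(probs[label])

def kl(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(p1, p2, label, lam=0.5):
    """Assumed joint loss: supervised term plus an agreement (co-regularization) term."""
    supervised = cross_entropy(p1, label) + cross_entropy(p2, label)
    agreement = kl(p1, p2) + kl(p2, p1)  # symmetric KL, zero when the networks agree
    return (1 - lam) * supervised + lam * agreement

def select_small_loss(batch, keep_ratio, lam=0.5):
    """Return indices of the examples with the smallest joint loss.

    batch: list of (p1, p2, label), where p1 and p2 are the two networks'
    predicted class distributions for the same example.
    """
    losses = [joint_loss(p1, p2, y, lam) for p1, p2, y in batch]
    n_keep = max(1, int(keep_ratio * len(batch)))
    order = sorted(range(len(batch)), key=losses.__getitem__)
    return order[:n_keep]
```

Only the selected examples would contribute to the gradient step for both networks; the intuition is that examples with likely-corrupted labels tend to have large loss and are filtered out.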
Learning Object Permanence from Video
Title | Learning Object Permanence from Video |
Authors | Aviv Shamsian, Ofri Kleinfeld, Amir Globerson, Gal Chechik |
Abstract | Object Permanence allows people to reason about the location of non-visible objects, by understanding that they continue to exist even when not perceived directly. Object Permanence is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each-other. Intensive studies in developmental psychology suggest that object permanence is a challenging task that is learned through extensive experience. Here we introduce the setup of learning Object Permanence from data. We explain why this learning problem should be dissected into four components, where objects are (1) visible, (2) occluded, (3) contained by another object and (4) carried by a containing object. The fourth subtask, where a target object is carried by a containing object, is particularly challenging because it requires a system to reason about a moving location of an invisible object. We then present a unified deep architecture that learns to predict object location under these four scenarios. We evaluate the architecture and system on a new dataset based on CATER, and find that it outperforms previous localization methods and various baselines. |
Tasks | |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10469v2 |
https://arxiv.org/pdf/2003.10469v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-object-permanence-from-video |
Repo | https://github.com/ofrikleinfeld/ObjectPermanence |
Framework | pytorch |
CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records
Title | CorGAN: Correlation-Capturing Convolutional Generative Adversarial Networks for Generating Synthetic Healthcare Records |
Authors | Amirsina Torfi, Edward A. Fox |
Abstract | Deep learning models have demonstrated high-quality performance in areas such as image classification and speech processing. However, creating a deep learning model using electronic health record (EHR) data, requires addressing particular privacy challenges that are unique to researchers in this domain. This matter focuses attention on generating realistic synthetic data while ensuring privacy. In this paper, we propose a novel framework called correlation-capturing Generative Adversarial Network (CorGAN), to generate synthetic healthcare records. In CorGAN we utilize Convolutional Neural Networks to capture the correlations between adjacent medical features in the data representation space by combining Convolutional Generative Adversarial Networks and Convolutional Autoencoders. To demonstrate the model fidelity, we show that CorGAN generates synthetic data with performance similar to that of real data in various Machine Learning settings such as classification and prediction. We also give a privacy assessment and report on statistical analysis regarding realistic characteristics of the synthetic data. The software of this work is open-source and is available at: https://github.com/astorfi/cor-gan. |
Tasks | Disease Prediction, Image Classification, Synthetic Data Generation |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09346v2 |
https://arxiv.org/pdf/2001.09346v2.pdf | |
PWC | https://paperswithcode.com/paper/cor-gan-correlation-capturing-convolutional |
Repo | https://github.com/astorfi/cor-gan |
Framework | pytorch |
Exploring Unknown States with Action Balance
Title | Exploring Unknown States with Action Balance |
Authors | Yan Song, Yingfeng Chen, Yujing Hu, Changjie Fan |
Abstract | Exploration is a key problem in reinforcement learning. Recently, bonus-based methods, which assign an additional bonus (e.g., intrinsic reward) to guide the agent to rarely visited states, have achieved considerable success in environments where exploration is difficult, such as Montezuma’s Revenge. Since the bonus is calculated according to the novelty of the next state after performing an action, we call such methods next-state bonus methods. However, next-state bonus methods bring extra issues. They may lead the agent to become trapped in states that have been visited less often and to neglect exploring unknown states. Moreover, the behavior policy of the agent is also influenced indirectly by the bonus added to the state (or state-action) values. In contrast to bonus-based methods, which explore in known states, in this paper we focus on the other part of exploration: finding unknown states. We propose the action balance exploration method to overcome the defects of next-state bonus methods; it balances the number of times each action is chosen in each state and can be treated as an extension of upper confidence bound (UCB) to deep reinforcement learning. To combine the advantages of the next-state bonus method and our action balance exploration method, we propose the action balance RND method, which takes both parts of exploration into consideration. Experiments on grid world and Atari games demonstrate that action balance exploration is better at finding unknown states and can improve the performance of RND in some hard exploration environments, respectively. |
Tasks | Atari Games, Montezuma’s Revenge |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04518v1 |
https://arxiv.org/pdf/2003.04518v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-unknown-states-with-action-balance |
Repo | https://github.com/gomiss/action-balance-exploration |
Framework | tf |
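
The abstract above describes action balance as an extension of upper confidence bound (UCB) selection. As background, here is a minimal sketch of classic tabular UCB1 action selection for a single state. It is not the paper's deep-RL method, which the abstract does not specify in detail; `ucb_action` and the exploration constant `c` are illustrative.

```python
import math

def ucb_action(values, counts, c=1.0):
    """Pick an action by the UCB1 rule in one state.

    values: estimated value of each action
    counts: number of times each action has been chosen so far
    c:      exploration constant (an assumed tunable parameter)
    """
    # any action never tried yet is chosen first
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)

    def ucb(action):
        # rarely chosen actions receive a larger exploration bonus
        bonus = c * math.sqrt(math.log(total) / counts[action])
        return values[action] + bonus

    return max(range(len(values)), key=ucb)
```

Because the bonus depends on how often each action was chosen rather than on the novelty of the next state, the rule pushes selection counts toward balance across actions, which is the property the abstract emphasizes.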
Spine intervertebral disc labeling using a fully convolutional redundant counting model
Title | Spine intervertebral disc labeling using a fully convolutional redundant counting model |
Authors | Lucas Rouhier, Francisco Perdigon Romero, Joseph Paul Cohen, Julien Cohen-Adad |
Abstract | Labeling intervertebral discs is relevant as it notably enables clinicians to understand the relationship between a patient’s symptoms (pain, paralysis) and the exact level of spinal cord injury. However manually labeling those discs is a tedious and user-biased task which would benefit from automated methods. While some automated methods already exist for MRI and CT-scan, they are either not publicly available, or fail to generalize across various imaging contrasts. In this paper we combine a Fully Convolutional Network (FCN) with inception modules to localize and label intervertebral discs. We demonstrate a proof-of-concept application in a publicly-available multi-center and multi-contrast MRI database (n=235 subjects). The code is publicly available at https://github.com/neuropoly/vertebral-labeling-deep-learning. |
Tasks | |
Published | 2020-03-09 |
URL | https://arxiv.org/abs/2003.04387v2 |
https://arxiv.org/pdf/2003.04387v2.pdf | |
PWC | https://paperswithcode.com/paper/spine-intervertebral-disc-labeling-using-a |
Repo | https://github.com/neuropoly/vertebral-labeling-deep-learning |
Framework | pytorch |
Exploring Inherent Properties of the Monophonic Melody of Songs
Title | Exploring Inherent Properties of the Monophonic Melody of Songs |
Authors | Zehao Wang, Shicheng Zhang, Xiaoou Chen |
Abstract | Melody is one of the most important components in music. Unlike other components in music theory, such as harmony and counterpoint, computable features for melody are urgently needed. These features are in high demand as data-driven methods come to dominate fields such as musical information retrieval and automatic music composition. To boost the performance of deep-learning-related musical tasks, we propose a set of interpretable features on monophonic melody for computational purposes. These features are defined not only in mathematical form, but also with some consideration of composers’ intuition. For example, the Melodic Center of Gravity can reflect the sentence-wise contour of the melody, and the local / global melody dynamics quantify the dynamics of a melody that couples pitch and time in a sentence. We found that these features are considered by people universally in many genres of songs, even for atonal composition practices. Hopefully, these melodic features can provide novel inspiration for future researchers as a tool in the field of MIR and automatic composition. |
Tasks | Information Retrieval |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09287v1 |
https://arxiv.org/pdf/2003.09287v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-inherent-properties-of-the |
Repo | https://github.com/water45wzh/exploring_melody |
Framework | none |
Semantically-Enriched Search Engine for Geoportals: A Case Study with ArcGIS Online
Title | Semantically-Enriched Search Engine for Geoportals: A Case Study with ArcGIS Online |
Authors | Gengchen Mai, Krzysztof Janowicz, Sathya Prasad, Meilin Shi, Ling Cai, Rui Zhu, Blake Regalia, Ni Lao |
Abstract | Many geoportals such as ArcGIS Online are established with the goal of improving geospatial data reusability and achieving intelligent knowledge discovery. However, according to previous research, most of the existing geoportals adopt Lucene-based techniques to achieve their core search functionality, which has a limited ability to capture the user’s search intentions. To better understand a user’s search intention, query expansion can be used to enrich the user’s query by adding semantically similar terms. In the context of geoportals and geographic information retrieval, we advocate the idea of semantically enriching a user’s query from both geospatial and thematic perspectives. In the geospatial aspect, we propose to enrich a query by using both place partonomy and distance decay. In terms of the thematic aspect, concept expansion and embedding-based document similarity are used to infer the implicit information hidden in a user’s query. This semantic query expansion framework is implemented as a semantically-enriched search engine using ArcGIS Online as a case study. A benchmark dataset is constructed to evaluate the proposed framework. Our evaluation results show that the proposed semantic query expansion framework is very effective in capturing a user’s search intention and significantly outperforms a well-established baseline, Lucene’s practical scoring function, with increases of more than 3.0 in DCG@K (K=3,5,10). |
Tasks | Information Retrieval |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06561v1 |
https://arxiv.org/pdf/2003.06561v1.pdf | |
PWC | https://paperswithcode.com/paper/semantically-enriched-search-engine-for |
Repo | https://github.com/gengchenmai/arcgis-online-search-engine |
Framework | none |
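
The DCG@K metric reported in the abstract above can be computed with a few lines. This sketch assumes the linear-gain form of discounted cumulative gain with a log2 rank discount; the abstract does not say which DCG variant the authors use, and `dcg_at_k` is an illustrative name.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k results.

    relevances: graded relevance of each returned result, in ranked order.
    Uses gain = rel and discount = log2(rank + 1), with ranks starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )
```

Because DCG is not normalised, its absolute values depend on the relevance scale, so a difference such as "3.0" is only meaningful relative to the grading scheme of the benchmark.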