Paper Group AWR 31
tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification
Title | tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification |
Authors | Blaž Škrlj, Matej Martinc, Jan Kralj, Nada Lavrač, Senja Pollak |
Abstract | The use of background knowledge remains largely unexploited in many text classification tasks. In this work, we explore word taxonomies as a means of constructing new semantic features, which may improve the performance and robustness of the learned classifiers. We propose tax2vec, a parallel algorithm for constructing taxonomy-based features, and demonstrate its use on six short-text classification problems, including gender, age and personality type prediction, drug effectiveness and side effect prediction, and news topic prediction. The experimental results indicate that the interpretable features constructed using tax2vec can notably improve the performance of classifiers; the constructed features, in combination with fast, linear classifiers tested against strong baselines, such as hierarchical attention neural networks, achieved comparable or better classification results on short documents. Further, tax2vec can also serve for extraction of corpus-specific keywords. Finally, we investigated the semantic space of potential features, where we observe a similarity with the well-known Zipf’s law. |
Tasks | Text Classification |
Published | 2019-02-01 |
URL | http://arxiv.org/abs/1902.00438v2, http://arxiv.org/pdf/1902.00438v2.pdf |
PWC | https://paperswithcode.com/paper/tax2vec-constructing-interpretable-features |
Repo | https://github.com/SkBlaz/tax2vec |
Framework | tf |
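The idea of turning a word taxonomy into document features can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `TOY_TAXONOMY` is a hypothetical child-to-parent mapping (a real setup might draw hypernyms from a resource such as WordNet), and ranking hypernyms by corpus frequency is a stand-in heuristic, not necessarily the selection strategy tax2vec uses.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), used only for illustration.
TOY_TAXONOMY = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "drug": "substance",
    "headache": "symptom",
    "nausea": "symptom",
    "symptom": "condition",
}


def hypernyms(word, taxonomy):
    """Return all ancestors of `word` in the taxonomy, nearest first."""
    chain = []
    seen = set()
    while word in taxonomy and word not in seen:  # `seen` guards against cycles
        seen.add(word)
        word = taxonomy[word]
        chain.append(word)
    return chain


def taxonomy_features(documents, taxonomy, top_k=3):
    """Count hypernyms per document and keep the top_k most frequent ones."""
    per_doc = []
    corpus_counts = Counter()
    for doc in documents:
        counts = Counter()
        for token in doc.lower().split():
            counts.update(hypernyms(token, taxonomy))
        per_doc.append(counts)
        corpus_counts.update(counts)
    # Assumed heuristic: rank by corpus frequency, break ties alphabetically.
    ranked = sorted(corpus_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    selected = [name for name, _ in ranked[:top_k]]
    matrix = [[counts[name] for name in selected] for counts in per_doc]
    return selected, matrix


docs = ["aspirin relieved my headache", "ibuprofen and aspirin"]
features, rows = taxonomy_features(docs, TOY_TAXONOMY)
print(features)  # ['drug', 'substance', 'condition']
print(rows)      # [[1, 1, 1], [2, 2, 0]]
```

Each column is a named taxonomy concept, so the resulting features remain readable and could be appended to a bag-of-words representation before training a linear classifier.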
CobWeb: A Research Prototype for Exploring User Bias in Political Fact-Checking
Title | CobWeb: A Research Prototype for Exploring User Bias in Political Fact-Checking |
Authors | Anubrata Das, Kunjan Mehta, Matthew Lease |
Abstract | The effect of user bias in fact-checking has not been explored extensively from a user-experience perspective. We estimate the user bias as a function of the user’s perceived reputation of the news sources (e.g., a user with liberal beliefs may tend to trust liberal sources). We build an interface to communicate the role of estimated user bias in the context of a fact-checking task. We also explore the utility of helping users visualize their detected level of bias. 80% of the users of our system find that the presence of an indicator for user bias is useful in judging the veracity of a political claim. |
Tasks | |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03718v1, https://arxiv.org/pdf/1907.03718v1.pdf |
PWC | https://paperswithcode.com/paper/cobweb-a-research-prototype-for-exploring |
Repo | https://github.com/anubrata/anubrata.github.io |
Framework | none |
Mixture Content Selection for Diverse Sequence Generation
Title | Mixture Content Selection for Diverse Sequence Generation |
Authors | Jaemin Cho, Minjoon Seo, Hannaneh Hajishirzi |
Abstract | Generating diverse sequences is important in many NLP applications such as question generation or summarization that exhibit semantically one-to-many relationships between the source and target sequences. We present a method to explicitly separate diversification from generation using a general plug-and-play module (called SELECTOR) that wraps around and guides an existing encoder-decoder model. The diversification stage uses a mixture of experts to sample different binary masks on the source sequence for diverse content selection. The generation stage uses a standard encoder-decoder model given each selected content from the source sequence. Due to the non-differentiable nature of discrete sampling and the lack of ground-truth labels for the binary mask, we leverage a proxy for the ground-truth mask and adopt stochastic hard-EM for training. In question generation (SQuAD) and abstractive summarization (CNN-DM), our method demonstrates significant improvements in accuracy, diversity and training efficiency, including state-of-the-art top-1 accuracy on both datasets, a 6% gain in top-5 accuracy, and 3.7 times faster training than a state-of-the-art model. Our code is publicly available at https://github.com/clovaai/FocusSeq2Seq. |
Tasks | Abstractive Text Summarization, Document Summarization, Question Generation |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.01953v1, https://arxiv.org/pdf/1909.01953v1.pdf |
PWC | https://paperswithcode.com/paper/mixture-content-selection-for-diverse |
Repo | https://github.com/clovaai/FocusSeq2Seq |
Framework | pytorch |
Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots
Title | Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots |
Authors | Jia-Chen Gu, Zhen-Hua Ling, Xiaodan Zhu, Quan Liu |
Abstract | This paper proposes a dually interactive matching network (DIM) for presenting the personalities of dialogue agents in retrieval-based chatbots. This model develops from the interactive matching network (IMN), which models the matching degree between a context composed of multiple utterances and a response candidate. Compared with previous persona fusion approaches, which enhance the representation of a context by calculating its similarity with a given persona, the DIM model adopts a dual matching architecture, which performs interactive matching between responses and contexts and between responses and personas respectively for ranking response candidates. Experimental results on the PERSONA-CHAT dataset show that the DIM model outperforms its baseline model, i.e., IMN with persona fusion, by a margin of 14.5% and outperforms the current state-of-the-art model by a margin of 27.7% in terms of top-1 accuracy hits@1. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05859v3, https://arxiv.org/pdf/1908.05859v3.pdf |
PWC | https://paperswithcode.com/paper/dually-interactive-matching-network-for |
Repo | https://github.com/JasonForJoy/DIM |
Framework | tf |
A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
Title | A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains |
Authors | Dominik Schlechtweg, Anna Hätty, Marco del Tredici, Sabine Schulte im Walde |
Abstract | We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficialness and lack of comparison in assessing models of diachronic lexical change, by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can successfully be utilised for the synchronic detection of domain-specific sense divergences in the field of term extraction. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02979v1, https://arxiv.org/pdf/1906.02979v1.pdf |
PWC | https://paperswithcode.com/paper/a-wind-of-change-detecting-and-evaluating |
Repo | https://github.com/Garrafao/LSCDetection |
Framework | none |
Large-Scale Long-Tailed Recognition in an Open World
Title | Large-Scale Long-Tailed Recognition in an Open World |
Authors | Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, Stella X. Yu |
Abstract | Real-world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never-seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set that includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus only on one aspect and perform poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state of the art. Our code, datasets, and models enable future OLTR research and are publicly available at https://liuziwei7.github.io/projects/LongTail.html. |
Tasks | Few-Shot Learning, Open Set Learning |
Published | 2019-04-10 |
URL | http://arxiv.org/abs/1904.05160v2, http://arxiv.org/pdf/1904.05160v2.pdf |
PWC | https://paperswithcode.com/paper/large-scale-long-tailed-recognition-in-an |
Repo | https://github.com/zhmiao/OpenLongTailRecognition-OLTR |
Framework | pytorch |
Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network
Title | Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network |
Authors | Wenxiang Jiao, Michael R. Lyu, Irwin King |
Abstract | Real-time emotion recognition (RTER) in conversations is significant for developing emotionally intelligent chatting machines. Without the future context in RTER, it becomes critical to build the memory bank carefully for capturing historical context and summarize the memories appropriately to retrieve relevant information. We propose an Attention Gated Hierarchical Memory Network (AGHMN) to address the problems of prior work: (1) Commonly used convolutional neural networks (CNNs) for utterance feature extraction are less compatible in the memory modules; (2) Unidirectional gated recurrent units (GRUs) only allow each historical utterance to have context before it, preventing information propagation in the opposite direction; (3) The Soft Attention for summarizing loses the positional and ordering information of memories, regardless of how the memory bank is built. Particularly, we propose a Hierarchical Memory Network (HMN) with a bidirectional GRU (BiGRU) as the utterance reader and a BiGRU fusion layer for the interaction between historical utterances. For memory summarizing, we propose an Attention GRU (AGRU) where we utilize the attention weights to update the internal state of GRU. We further promote the AGRU to a bidirectional variant (BiAGRU) to balance the contextual information from recent memories and that from distant memories. We conduct experiments on two emotion conversation datasets with extensive analysis, demonstrating the efficacy of our AGHMN models. |
Tasks | Emotion Recognition |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.09075v1, https://arxiv.org/pdf/1911.09075v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-emotion-recognition-via-attention |
Repo | https://github.com/wxjiao/AGHMN |
Framework | pytorch |
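The abstract says the Attention GRU uses attention weights to update the GRU's internal state but does not spell out the equations. One common formulation of an attention-gated GRU, given here only as a hedged sketch, replaces the update gate with the attention weight. The symbols are assumptions introduced for illustration: $m_t$ is the $t$-th memory, $a_t \in [0, 1]$ its attention weight, $r_t$ the reset gate, and $W$, $U$, $b$ learned parameters.

```latex
\begin{aligned}
\tilde{h}_t &= \tanh\bigl(W m_t + r_t \odot (U h_{t-1}) + b\bigr) \\
h_t &= a_t \, \tilde{h}_t + (1 - a_t) \, h_{t-1}
\end{aligned}
```

Under this form, a memory with $a_t$ near 0 leaves the state almost unchanged, while a memory with $a_t$ near 1 overwrites it, so the summary keeps the order in which memories are read, unlike a plain weighted sum.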
A Hardware Friendly Unsupervised Memristive Neural Network with Weight Sharing Mechanism
Title | A Hardware Friendly Unsupervised Memristive Neural Network with Weight Sharing Mechanism |
Authors | Zhiri Tang, Ruohua Zhu, Peng Lin, Jin He, Hao Wang, Qijun Huang, Sheng Chang, Qiming Ma |
Abstract | Memristive neural networks (MNNs), which use memristors as neurons or synapses, have become a hot research topic recently. However, most memristors are not compatible with mainstream integrated circuit technology, and their stability at large scale has so far been poor. In this paper, a hardware-friendly MNN circuit is introduced, in which the memristive characteristics are implemented by a digital integrated circuit. Through this method, spike-timing-dependent plasticity (STDP) and unsupervised learning are realized. A weight sharing mechanism is proposed to bridge the gap between network scale and hardware resources. Experimental results show that it significantly reduces hardware resource usage while maintaining good recognition accuracy and high speed. Moreover, resource usage grows more slowly than the network scale, which suggests our method’s potential for realizing large-scale neuromorphic networks. |
Tasks | |
Published | 2019-01-01 |
URL | http://arxiv.org/abs/1901.00100v1, http://arxiv.org/pdf/1901.00100v1.pdf |
PWC | https://paperswithcode.com/paper/a-hardware-friendly-unsupervised-memristive |
Repo | https://github.com/GerinTang/InnovateFPGA2018_PR039 |
Framework | none |
Lifelong Sequential Modeling with Personalized Memorization for User Response Prediction
Title | Lifelong Sequential Modeling with Personalized Memorization for User Response Prediction |
Authors | Kan Ren, Jiarui Qin, Yuchen Fang, Weinan Zhang, Lei Zheng, Weijie Bian, Guorui Zhou, Jian Xu, Yong Yu, Xiaoqiang Zhu, Kun Gai |
Abstract | User response prediction, which models the user preference w.r.t. the presented items, plays a key role in online services. After two decades of rapid development, the accumulated user behavior sequences on mature Internet service platforms have become extremely long, dating back to each user’s first registration. Each user not only has intrinsic tastes, but also keeps changing her personal interests over her lifetime. Hence, it is challenging to handle such lifelong sequential modeling for each individual user. Existing methodologies for sequential modeling are only capable of dealing with relatively recent user behaviors, which leaves considerable room for modeling long-term, and especially lifelong, sequential patterns to facilitate user modeling. Moreover, one user’s behavior may be accounted for by various previous behaviors within her whole online activity history, i.e., long-term dependency with multi-scale sequential patterns. To tackle these challenges, we propose a Hierarchical Periodic Memory Network for lifelong sequential modeling with personalized memorization of sequential patterns for each user. The model also adopts a hierarchical and periodical updating mechanism to capture multi-scale sequential patterns of user interests while supporting the evolving user behavior logs. The experimental results over three large-scale real-world datasets demonstrate the advantages of our proposed model, with significant improvement in user response prediction performance over the state of the art. |
Tasks | |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00758v2, https://arxiv.org/pdf/1905.00758v2.pdf |
PWC | https://paperswithcode.com/paper/lifelong-sequential-modeling-with |
Repo | https://github.com/alimamarankgroup/HPMN |
Framework | tf |
DeepSwarm: Optimising Convolutional Neural Networks using Swarm Intelligence
Title | DeepSwarm: Optimising Convolutional Neural Networks using Swarm Intelligence |
Authors | Edvinas Byla, Wei Pang |
Abstract | In this paper we propose DeepSwarm, a novel neural architecture search (NAS) method based on Swarm Intelligence principles. At its core, DeepSwarm uses Ant Colony Optimization (ACO) to generate an ant population that uses pheromone information to collectively search for the best neural architecture. Furthermore, by using local and global pheromone update rules, our method ensures the balance between exploitation and exploration. On top of this, to make our method more efficient, we combine progressive neural architecture search with weight reusability. Furthermore, due to the nature of ACO, our method can incorporate heuristic information which can further speed up the search process. After systematic and extensive evaluation, we find that on three different datasets (MNIST, Fashion-MNIST, and CIFAR-10) our proposed method demonstrates competitive performance when compared to existing systems. Finally, we open source DeepSwarm as a NAS library and hope it can be used by more deep learning researchers and practitioners. |
Tasks | Neural Architecture Search |
Published | 2019-05-17 |
URL | https://arxiv.org/abs/1905.07350v1, https://arxiv.org/pdf/1905.07350v1.pdf |
PWC | https://paperswithcode.com/paper/deepswarm-optimising-convolutional-neural |
Repo | https://github.com/Pattio/DeepSwarm |
Framework | tf |
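The interplay of local and global pheromone updates can be sketched on a toy search space. This is not DeepSwarm's code or API: the update rules are the textbook Ant Colony System rules, and the layer graph, the `NODE_SCORES` values, and the `toy_score` function (a stand-in for validation accuracy) are hypothetical.

```python
import random

# Hypothetical search space: each node lists the layers that may follow it.
GRAPH = {
    "input": ["conv3", "conv5"],
    "conv3": ["pool", "dense"],
    "conv5": ["pool", "dense"],
    "pool": ["output"],
    "dense": ["output"],
}
# Hypothetical per-layer scores standing in for a real validation accuracy.
NODE_SCORES = {"conv3": 0.3, "conv5": 0.2, "pool": 0.1, "dense": 0.4}


def toy_score(path):
    return sum(NODE_SCORES.get(node, 0.0) for node in path)


def choose_next(current, pheromone, rng, greediness=0.5):
    """Exploit the strongest edge, or explore in proportion to pheromone."""
    options = GRAPH[current]
    weights = [pheromone[(current, nxt)] for nxt in options]
    if rng.random() < greediness:
        return options[weights.index(max(weights))]
    return rng.choices(options, weights=weights, k=1)[0]


def local_update(pheromone, edge, decay=0.1, tau0=0.1):
    """Pull a just-used edge back toward tau0 so later ants explore more."""
    pheromone[edge] = (1 - decay) * pheromone[edge] + decay * tau0


def global_update(pheromone, best_path, best_score, evaporation=0.1):
    """Reinforce only the edges of the best path found so far."""
    for edge in zip(best_path, best_path[1:]):
        pheromone[edge] = (1 - evaporation) * pheromone[edge] + evaporation * best_score


def search(rng, ants=5, iterations=10, tau0=0.1):
    pheromone = {(a, b): tau0 for a, nexts in GRAPH.items() for b in nexts}
    best_path, best_score = None, float("-inf")
    for _ in range(iterations):
        for _ in range(ants):
            path = ["input"]
            while path[-1] != "output":
                nxt = choose_next(path[-1], pheromone, rng)
                local_update(pheromone, (path[-1], nxt), tau0=tau0)
                path.append(nxt)
            score = toy_score(path)
            if score > best_score:
                best_path, best_score = path, score
        global_update(pheromone, best_path, best_score)
    return best_path, best_score, pheromone


best_path, best_score, pheromone = search(random.Random(0))
print(best_path, best_score)
```

The local rule discourages ants within one iteration from all following the same edges (exploration), while the global rule concentrates pheromone on the best architecture found so far (exploitation).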
On the Evaluation of Conditional GANs
Title | On the Evaluation of Conditional GANs |
Authors | Terrance DeVries, Adriana Romero, Luis Pineda, Graham W. Taylor, Michal Drozdzal |
Abstract | Conditional Generative Adversarial Networks (cGANs) are finding increasingly widespread use in many application domains. Despite outstanding progress, quantitative evaluation of such models often involves multiple distinct metrics to assess different desirable properties, such as image quality, conditional consistency, and intra-conditioning diversity. In this setting, model benchmarking becomes a challenge, as each metric may indicate a different “best” model. In this paper, we propose the Fréchet Joint Distance (FJD), which is defined as the Fréchet distance between joint distributions of images and conditioning, allowing it to implicitly capture the aforementioned properties in a single metric. We conduct proof-of-concept experiments on a controllable synthetic dataset, which consistently highlight the benefits of FJD when compared to currently established metrics. Moreover, we use the newly introduced metric to compare existing cGAN-based models for a variety of conditioning modalities (e.g. class labels, object masks, bounding boxes, images, and text captions). We show that FJD can be used as a promising single metric for cGAN benchmarking and model selection. Code can be found at https://github.com/facebookresearch/fjd. |
Tasks | Model Selection |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.08175v3, https://arxiv.org/pdf/1907.08175v3.pdf |
PWC | https://paperswithcode.com/paper/on-the-evaluation-of-conditional-gans |
Repo | https://github.com/facebookresearch/fjd |
Framework | pytorch |
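For reference, the Fréchet distance has a closed form when both distributions are Gaussian, which is the form typically used for embedding-based metrics. The following is a sketch under that assumption; the abstract does not state how the joint distributions are modeled or embedded. Here $(\mu, \Sigma)$ and $(\hat{\mu}, \hat{\Sigma})$ denote the mean and covariance of the real and generated joint samples, for instance a concatenation of an image embedding and a conditioning embedding (an illustrative choice, not confirmed by the source).

```latex
d^2\bigl((\mu, \Sigma), (\hat{\mu}, \hat{\Sigma})\bigr)
  = \lVert \mu - \hat{\mu} \rVert_2^2
  + \operatorname{Tr}\Bigl(\Sigma + \hat{\Sigma} - 2\bigl(\Sigma \hat{\Sigma}\bigr)^{1/2}\Bigr)
```

Because the statistics are computed over the joint of image and conditioning, a generator that produces realistic images that ignore their conditioning still incurs a distance through the cross-covariance terms.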
Learning Priors in High-frequency Domain for Inverse Imaging Reconstruction
Title | Learning Priors in High-frequency Domain for Inverse Imaging Reconstruction |
Authors | Zhuonan He, Jinjie Zhou, Dong Liang, Yuhao Wang, Qiegen Liu |
Abstract | Ill-posed inverse problems in imaging have remained an active research topic for several decades, with new approaches constantly emerging. Recognizing that the popular dictionary learning and convolutional sparse coding are both essentially modeling the high-frequency components of an image, which convey most of the semantic information such as texture details, in this work we propose a novel multi-profile high-frequency transform-guided denoising autoencoder as prior (HF-DAEP). To achieve this goal, we first extract a set of multi-profile high-frequency components via a specific transformation and add artificial Gaussian noise to these high-frequency components to form training samples. Then, once the high-frequency prior information is learned, we incorporate it into the classical iterative reconstruction process via proximal gradient descent. Preliminary results on highly under-sampled magnetic resonance imaging and sparse-view computed tomography reconstruction demonstrate that the proposed method can efficiently reconstruct feature details and offers advantages over state-of-the-art methods. |
Tasks | Denoising, Dictionary Learning |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.11148v1, https://arxiv.org/pdf/1910.11148v1.pdf |
PWC | https://paperswithcode.com/paper/learning-priors-in-high-frequency-domain-for |
Repo | https://github.com/yqx7150/HFDAEP |
Framework | none |
Neural Machine Translating from Natural Language to SPARQL
Title | Neural Machine Translating from Natural Language to SPARQL |
Authors | Xiaoyu Yin, Dagmar Gromann, Sebastian Rudolph |
Abstract | SPARQL is a highly powerful query language for an ever-growing number of Linked Data resources and Knowledge Graphs. Using it requires a certain familiarity with the entities in the domain to be queried as well as expertise in the language’s syntax and semantics, neither of which the average web user can be assumed to possess. To overcome this limitation, automatically translating natural language questions to SPARQL queries has been a vibrant field of research. However, to date, the vast success of deep learning methods has not yet fully carried over to this research problem. This paper contributes to filling this gap by evaluating eight different Neural Machine Translation (NMT) models for the task of translating from natural language to the structured query language SPARQL. While highlighting the importance of high-quantity and high-quality datasets, the results show a dominance of a CNN-based architecture with a BLEU score of up to 98 and accuracy of up to 94%. |
Tasks | Knowledge Graphs, Machine Translation |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.09302v1, https://arxiv.org/pdf/1906.09302v1.pdf |
PWC | https://paperswithcode.com/paper/neural-machine-translating-from-natural |
Repo | https://github.com/AKSW/DBNQA |
Framework | none |
Integrals over Gaussians under Linear Domain Constraints
Title | Integrals over Gaussians under Linear Domain Constraints |
Authors | Alexandra Gessner, Oindrila Kanjilal, Philipp Hennig |
Abstract | Integrals of linearly constrained multivariate Gaussian densities are a frequent problem in machine learning and statistics, arising in tasks like generalized linear models and Bayesian optimization. Yet they are notoriously hard to compute, and to further complicate matters, the numerical values of such integrals may be very small. We present an efficient black-box algorithm that exploits geometry to estimate integrals over a small, truncated Gaussian volume and to sample from it. Our algorithm uses the Holmes-Diaconis-Ross (HDR) method combined with an analytic version of elliptical slice sampling (ESS). Adapted to the linear setting, ESS allows for rejection-free sampling, because intersections of ellipses and domain boundaries have closed-form solutions. The key idea of HDR is to decompose the integral into easier-to-compute conditional probabilities by using a sequence of nested domains. Remarkably, it allows for direct computation of the logarithm of the integral value and thus enables the computation of extremely small probability masses. We demonstrate the effectiveness of our tailored combination of HDR and ESS on high-dimensional integrals and on entropy search for Bayesian optimization. |
Tasks | |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09328v2, https://arxiv.org/pdf/1910.09328v2.pdf |
PWC | https://paperswithcode.com/paper/integrals-over-gaussians-under-linear-domain |
Repo | https://github.com/alpiges/LinConGauss |
Framework | none |
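The nested-domain decomposition behind HDR can be illustrated in one dimension, where the tail probability of a standard normal below -5 is written as a product of conditional probabilities over the nested domains x < -1, x < -2, and so on, and their logarithms are summed. This sketch makes two simplifying assumptions that differ from the paper: the thresholds are fixed by hand, and sampling within each domain uses the exact inverse CDF rather than elliptical slice sampling. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def sample_below(threshold, rng):
    """Draw from N(0, 1) restricted to x <= threshold via the inverse CDF."""
    if math.isinf(threshold):
        return rng.gauss(0.0, 1.0)
    upper = STD.cdf(threshold)
    u = upper * (1.0 - rng.random())  # uniform on (0, upper], never exactly 0
    return STD.inv_cdf(u)


def hdr_log_probability(thresholds, rng, n=20000):
    """Estimate log P(x < thresholds[-1]) as a sum of log conditional probabilities."""
    log_p = 0.0
    previous = math.inf  # the outermost domain is the whole real line
    for t in thresholds:
        # Fraction of samples from the previous domain that also lie in the next one.
        hits = sum(sample_below(previous, rng) < t for _ in range(n))
        log_p += math.log(hits / n)
        previous = t
    return log_p


rng = random.Random(0)
estimate = hdr_log_probability([-1.0, -2.0, -3.0, -4.0, -5.0], rng)
exact = math.log(STD.cdf(-5.0))
print(estimate, exact)
```

The exact log-probability is approximately -15.07. A naive Monte Carlo estimate with the same 100,000 total samples would most likely see no hits at all, whereas every conditional probability in the nested scheme is large enough to estimate reliably.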
Learning Conditional Deformable Templates with Convolutional Networks
Title | Learning Conditional Deformable Templates with Convolutional Networks |
Authors | Adrian V. Dalca, Marianne Rakic, John Guttag, Mert R. Sabuncu |
Abstract | We develop a learning framework for building deformable templates, which play a fundamental role in many image analysis and computational anatomy tasks. Conventional methods for template creation and image alignment to the template have undergone decades of rich technical development. In these frameworks, templates are constructed using an iterative process of template estimation and alignment, which is often computationally very expensive. Due in part to this shortcoming, most methods compute a single template for the entire population of images, or a few templates for specific sub-groups of the data. In this work, we present a probabilistic model and efficient learning strategy that yields either universal or conditional templates, jointly with a neural network that provides efficient alignment of the images to these templates. We demonstrate the usefulness of this method on a variety of domains, with a special focus on neuroimaging. This is particularly useful for clinical applications where no template exists, or where creating a new one with traditional methods can be prohibitively expensive. Our code and atlases are available online as part of the VoxelMorph library at http://voxelmorph.csail.mit.edu. |
Tasks | Deformable Medical Image Registration, Medical Image Registration |
Published | 2019-08-07 |
URL | https://arxiv.org/abs/1908.02738v2, https://arxiv.org/pdf/1908.02738v2.pdf |
PWC | https://paperswithcode.com/paper/learning-conditional-deformable-templates |
Repo | https://github.com/voxelmorph/voxelmorph |
Framework | tf |