Paper Group AWR 195
Robust Lexical Features for Improved Neural Network Named-Entity Recognition
Title | Robust Lexical Features for Improved Neural Network Named-Entity Recognition |
Authors | Abbas Ghaddar, Philippe Langlais |
Abstract | Neural network approaches to Named-Entity Recognition reduce the need for carefully hand-crafted features. While some features do remain in state-of-the-art systems, lexical features have been mostly discarded, with the exception of gazetteers. In this work, we show that this is unfair: lexical features are actually quite useful. We propose to embed words and entity types into a low-dimensional vector space we train from annotated data produced by distant supervision thanks to Wikipedia. From this, we compute - offline - a feature vector representing each word. When used with a vanilla recurrent neural network model, this representation yields substantial improvements. We establish a new state-of-the-art F1 score of 87.95 on ONTONOTES 5.0, while matching state-of-the-art performance with a F1 score of 91.73 on the over-studied CONLL-2003 dataset. |
Tasks | Named Entity Recognition |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.03489v1 |
http://arxiv.org/pdf/1806.03489v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-lexical-features-for-improved-neural |
Repo | https://github.com/ghaddarAbs/NER-with-LS |
Framework | tf |
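A minimal sketch of the offline lexical-feature idea, assuming word and entity-type embeddings live in a shared space and using cosine similarity as the feature; the arrays below are synthetic and this is not the authors' exact LS-feature pipeline.

```python
import numpy as np

def lexical_features(word_vecs, type_vecs):
    """word_vecs: (V, d) word embeddings; type_vecs: (T, d) entity-type embeddings.
    Returns a (V, T) matrix of cosine similarities, computed once, offline."""
    w = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    t = type_vecs / np.linalg.norm(type_vecs, axis=1, keepdims=True)
    return w @ t.T  # row i is the fixed lexical feature vector for word i

rng = np.random.default_rng(0)
feats = lexical_features(rng.normal(size=(1000, 100)), rng.normal(size=(120, 100)))
print(feats.shape)  # (1000, 120): one precomputed feature vector per word, fed to the tagger
```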
Multi-view Banded Spectral Clustering with Application to ICD9 Clustering
Title | Multi-view Banded Spectral Clustering with Application to ICD9 Clustering |
Authors | Luwan Zhang, Katherine Liao, Issac Kohane, Tianxi Cai |
Abstract | Despite recent development in methodology, community detection remains a challenging problem. Existing literature largely focuses on the standard setting where a network is learned using an observed adjacency matrix from a single data source. Constructing a shared network from multiple data sources is more challenging due to the heterogeneity across populations. Additionally, no existing method leverages the prior distance knowledge available in many domains to help the discovery of the network structure. To bridge this gap, in this paper we propose a novel spectral clustering method that optimally combines multiple data sources while leveraging the prior distance knowledge. The proposed method combines a banding step guided by the distance knowledge with a subsequent weighting step to maximize consensus across multiple sources. Its statistical performance is thoroughly studied under a multi-view stochastic block model. We also provide a simple yet optimal rule for choosing weights in practice. The efficacy and robustness of the method are fully demonstrated through extensive simulations. Finally, we apply the method to cluster International Classification of Diseases, Ninth Revision (ICD9) codes, yielding a very insightful clustering structure by integrating information from a large claims database and two healthcare systems. |
Tasks | Community Detection |
Published | 2018-04-06 |
URL | http://arxiv.org/abs/1804.02097v2 |
http://arxiv.org/pdf/1804.02097v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-view-banded-spectral-clustering-with |
Repo | https://github.com/celehs/mvBSC |
Framework | none |
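A minimal sketch of the banding-then-weighting idea, assuming a pairwise prior-distance matrix and symmetric similarity matrices from two sources; it relies on scikit-learn's spectral clustering and is not the paper's mvBSC estimator or its optimal weighting rule.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def banded_combine(similarities, dist, bandwidth, weights):
    """Zero out pairs whose prior distance exceeds the bandwidth, then mix the sources."""
    mask = (dist <= bandwidth).astype(float)
    return sum(w * (S * mask) for w, S in zip(weights, similarities))

rng = np.random.default_rng(0)
n = 60
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))      # toy ordinal distance
sources = [(A + A.T) / 2 for A in (rng.random((n, n)) for _ in range(2))]
W = banded_combine(sources, dist, bandwidth=5, weights=[0.6, 0.4])
labels = SpectralClustering(n_clusters=4, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(np.bincount(labels))  # cluster sizes on this toy example
```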
Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia
Title | Wikipedia2Vec: An Efficient Toolkit for Learning and Visualizing the Embeddings of Words and Entities from Wikipedia |
Authors | Ikuya Yamada, Akari Asai, Jin Sakuma, Hiroyuki Shindo, Hideaki Takeda, Yoshiyasu Takefuji, Yuji Matsumoto |
Abstract | The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We publicize the source code, demonstration, and the pretrained embeddings for 12 languages at https://wikipedia2vec.github.io/. |
Tasks | |
Published | 2018-12-15 |
URL | https://arxiv.org/abs/1812.06280v3 |
https://arxiv.org/pdf/1812.06280v3.pdf | |
PWC | https://paperswithcode.com/paper/wikipedia2vec-an-optimized-tool-for-learning |
Repo | https://github.com/wikipedia2vec/wikipedia2vec |
Framework | none |
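A usage sketch based on the toolkit's documented command-line and Python interfaces; the file name is a placeholder and exact signatures may differ across versions.

```python
# Training is a single command on a Wikipedia dump, e.g.:
#   $ wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 enwiki_model.pkl
from wikipedia2vec import Wikipedia2Vec

model = Wikipedia2Vec.load("enwiki_model.pkl")   # a trained or pretrained model file
word_vec = model.get_word_vector("python")       # embedding of a word
entity_vec = model.get_entity_vector("Python (programming language)")  # embedding of an entity
for item, score in model.most_similar(model.get_entity("Python (programming language)"), 5):
    print(item, score)
```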
SESR: Single Image Super Resolution with Recursive Squeeze and Excitation Networks
Title | SESR: Single Image Super Resolution with Recursive Squeeze and Excitation Networks |
Authors | Xi Cheng, Xiang Li, Ying Tai, Jian Yang |
Abstract | Single image super resolution is a very important computer vision task with a wide range of applications. In recent years, the depth of super-resolution models has kept increasing, but the small gains in performance have come at a huge cost in computation and memory consumption. In this work, to make super resolution models more effective, we propose a novel single image super resolution method based on recursive squeeze and excitation networks (SESR). By introducing the squeeze and excitation module, SESR can model the interdependencies and relationships between channels, which makes the model more efficient. In addition, the recursive structure and progressive reconstruction method minimize the layers and parameters and enable SESR to train multi-scale super resolution simultaneously in a single model. Evaluations on four benchmark test sets show that our model surpasses state-of-the-art methods in terms of speed and accuracy. |
Tasks | Image Super-Resolution, Super-Resolution |
Published | 2018-01-31 |
URL | http://arxiv.org/abs/1801.10319v1 |
http://arxiv.org/pdf/1801.10319v1.pdf | |
PWC | https://paperswithcode.com/paper/sesr-single-image-super-resolution-with |
Repo | https://github.com/opteroncx/SESR |
Framework | pytorch |
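A generic squeeze-and-excitation block in PyTorch, illustrating the channel re-weighting mechanism the abstract refers to; this is the textbook SE module with an assumed reduction ratio, not the authors' exact recursive SESR architecture.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global spatial average
        self.fc = nn.Sequential(                       # excitation: channel bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)   # per-channel gates in [0, 1]
        return x * w                                            # rescale feature maps

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```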
Improving Text-to-SQL Evaluation Methodology
Title | Improving Text-to-SQL Evaluation Methodology |
Authors | Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev |
Abstract | To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development. |
Tasks | Text-To-Sql |
Published | 2018-06-23 |
URL | http://arxiv.org/abs/1806.09029v1 |
http://arxiv.org/pdf/1806.09029v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-text-to-sql-evaluation-methodology |
Repo | https://github.com/jkkummerfeld/text2sql-data |
Framework | tf |
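A simplified, self-contained illustration of the query-based split the paper argues for: every example whose SQL reduces to the same template must land on the same side of the split, so the test set contains genuinely unseen queries rather than paraphrases. The literal-stripping normalization and the deterministic split below are assumptions for the demo, not the released code.

```python
import re
from collections import defaultdict

def template(sql):
    sql = re.sub(r"'[^']*'", "value", sql)      # strip string literals
    sql = re.sub(r"\b\d+\b", "value", sql)      # strip numeric literals
    return sql.lower().strip()

def query_split(examples, test_fraction=0.2):
    groups = defaultdict(list)
    for ex in examples:
        groups[template(ex["sql"])].append(ex)
    templates = sorted(groups)                  # deterministic ordering for the demo
    cut = int(len(templates) * (1 - test_fraction))
    train = [ex for t in templates[:cut] for ex in groups[t]]
    test = [ex for t in templates[cut:] for ex in groups[t]]
    return train, test

data = [{"q": "flights to Boston", "sql": "SELECT * FROM flight WHERE dest = 'BOS'"},
        {"q": "flights to Denver", "sql": "SELECT * FROM flight WHERE dest = 'DEN'"},
        {"q": "list all airlines", "sql": "SELECT name FROM airline"}]
train, test = query_split(data, test_fraction=0.5)
print(len(train), len(test))  # the two flight queries share a template and stay together
```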
TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation
Title | TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation |
Authors | Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, Dragomir Radev |
Abstract | Interacting with relational databases through natural language helps users of any background easily query and analyze a vast amount of data. This requires a system that understands users’ questions and converts them to SQL queries automatically. In this paper we present a novel approach, TypeSQL, which views this problem as a slot filling task. Additionally, TypeSQL utilizes type information to better understand rare entities and numbers in natural language questions. We test this idea on the WikiSQL dataset and outperform the prior state-of-the-art by 5.5% in much less time. We also show that accessing the content of databases can significantly improve the performance when users’ queries are not well-formed. TypeSQL gets 82.6% accuracy, a 17.5% absolute improvement compared to the previous content-sensitive model. |
Tasks | Slot Filling, Text-To-Sql |
Published | 2018-04-25 |
URL | http://arxiv.org/abs/1804.09769v1 |
http://arxiv.org/pdf/1804.09769v1.pdf | |
PWC | https://paperswithcode.com/paper/typesql-knowledge-based-type-aware-neural |
Repo | https://github.com/taoyds/typesql |
Framework | pytorch |
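A toy illustration of the type-tagging idea: label question tokens with coarse types (column name, number, year) before the slot-filling model sees them. The rules and the tiny schema vocabulary below are illustrative assumptions, not TypeSQL's actual type recognizer or database-content lookup.

```python
import re

COLUMNS = {"name", "year", "country", "population"}   # assumed schema vocabulary

def type_tags(question):
    tags = []
    for tok in question.lower().split():
        if tok in COLUMNS:
            tags.append("COLUMN")
        elif re.fullmatch(r"\d{4}", tok):
            tags.append("YEAR")
        elif re.fullmatch(r"\d+(\.\d+)?", tok):
            tags.append("NUMBER")
        else:
            tags.append("NONE")
    return list(zip(question.split(), tags))

print(type_tags("what is the population of france in 2010"))
```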
You May Not Need Attention
Title | You May Not Need Attention |
Authors | Ofir Press, Noah A. Smith |
Abstract | In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences. |
Tasks | |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13409v1 |
http://arxiv.org/pdf/1810.13409v1.pdf | |
PWC | https://paperswithcode.com/paper/you-may-not-need-attention |
Repo | https://github.com/ofirpress/YouMayNotNeedAttention |
Framework | pytorch |
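A toy sketch of the eager, attention-free recurrence: a single recurrent cell consumes one source token and immediately emits one target token per step, so decoding needs only constant memory. The alignment-based preprocessing and padding scheme from the paper are omitted; the model below is an illustrative stand-in, not the authors' implementation.

```python
import torch
import torch.nn as nn

class EagerTranslator(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.dim = dim
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.cell = nn.LSTMCell(2 * dim, dim)        # reads current source token + previous output
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src_tokens, bos_id=0):
        batch = src_tokens.size(0)
        h = torch.zeros(batch, self.dim)
        c = torch.zeros(batch, self.dim)
        prev = torch.full((batch,), bos_id, dtype=torch.long)
        steps = []
        for t in range(src_tokens.size(1)):          # one target token written per source token read
            step_in = torch.cat([self.src_emb(src_tokens[:, t]), self.tgt_emb(prev)], dim=-1)
            h, c = self.cell(step_in, (h, c))
            logits = self.out(h)
            prev = logits.argmax(dim=-1)             # greedy emission (training would use gold tokens)
            steps.append(logits)
        return torch.stack(steps, dim=1)             # (batch, src_len, tgt_vocab)

model = EagerTranslator(src_vocab=100, tgt_vocab=100)
print(model(torch.randint(0, 100, (2, 7))).shape)    # torch.Size([2, 7, 100])
```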
Multi-Adversarial Domain Adaptation
Title | Multi-Adversarial Domain Adaptation |
Authors | Zhongyi Pei, Zhangjie Cao, Mingsheng Long, Jianmin Wang |
Abstract | Recent advances in deep domain adaptation reveal that adversarial learning can be embedded into deep networks to learn transferable features that reduce distribution discrepancy between the source and target domains. Existing domain adversarial adaptation methods based on a single domain discriminator only align the source and target data distributions without exploiting the complex multimode structures. In this paper, we present a multi-adversarial domain adaptation (MADA) approach, which captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators. The adaptation can be achieved by stochastic gradient descent with the gradients computed by back-propagation in linear time. Empirical evidence demonstrates that the proposed model outperforms state-of-the-art methods on standard domain adaptation datasets. |
Tasks | Domain Adaptation |
Published | 2018-09-04 |
URL | http://arxiv.org/abs/1809.02176v1 |
http://arxiv.org/pdf/1809.02176v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-adversarial-domain-adaptation |
Repo | https://github.com/arthurdouillard/mada.pytorch |
Framework | pytorch |
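A condensed sketch of the multi-discriminator idea: one domain discriminator per class, each fed gradient-reversed features weighted by the label classifier's softmax probability for that class. The gradient-reversal layer, the linear discriminators, and the detached weights are simplifications, not the authors' full MADA training recipe.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None                 # flip gradients toward the feature extractor

def mada_domain_loss(features, class_probs, discriminators, domain_labels, lam=1.0):
    """features: (B, d); class_probs: (B, K) softmax outputs; domain_labels: (B,) with 0/1."""
    bce = nn.BCEWithLogitsLoss()
    rev = GradReverse.apply(features, lam)
    losses = []
    for k, disc in enumerate(discriminators):
        weighted = class_probs[:, k:k + 1].detach() * rev       # class-k weighted features
        losses.append(bce(disc(weighted).squeeze(1), domain_labels.float()))
    return sum(losses) / len(discriminators)

feats = torch.randn(8, 256, requires_grad=True)
probs = torch.softmax(torch.randn(8, 3), dim=1)
discs = [nn.Linear(256, 1) for _ in range(3)]                   # one discriminator per class
dom = torch.cat([torch.zeros(4), torch.ones(4)])                # 0 = source, 1 = target
loss = mada_domain_loss(feats, probs, discs, dom)
loss.backward()
print(float(loss))
```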
Algebraic Equivalence of Linear Structural Equation Models
Title | Algebraic Equivalence of Linear Structural Equation Models |
Authors | Thijs van Ommen, Joris M. Mooij |
Abstract | Despite their popularity, many questions about the algebraic constraints imposed by linear structural equation models remain open problems. For causal discovery, two of these problems are especially important: the enumeration of the constraints imposed by a model, and deciding whether two graphs define the same statistical model. We show how the half-trek criterion can be used to make progress in both of these problems. We apply our theoretical results to a small-scale model selection problem, and find that taking the additional algebraic constraints into account may lead to significant improvements in model selection accuracy. |
Tasks | Causal Discovery, Model Selection |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03527v1 |
http://arxiv.org/pdf/1807.03527v1.pdf | |
PWC | https://paperswithcode.com/paper/algebraic-equivalence-of-linear-structural |
Repo | https://github.com/caus-am/aelsem |
Framework | none |
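For reference, the algebraic constraints in question are polynomial relations among the entries of the model-implied covariance matrix; a standard parameterization (notation may differ from the paper's) is:

```latex
% Linear SEM with directed-edge coefficients \Lambda and error covariance \Omega
% (nonzero off-diagonal entries of \Omega correspond to bidirected edges):
X = \Lambda^{\top} X + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \Omega,
\qquad \Sigma = (I - \Lambda)^{-\top}\, \Omega\, (I - \Lambda)^{-1}.
```

Two graphs are algebraically equivalent when the sets of covariance matrices obtainable this way satisfy the same polynomial constraints; deciding this is the model-equivalence question the abstract refers to.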
A Miniaturized Semantic Segmentation Method for Remote Sensing Image
Title | A Miniaturized Semantic Segmentation Method for Remote Sensing Image |
Authors | Shou-Yu Chen, Guang-Sheng Chen, Wei-Peng Jing |
Abstract | To save memory, we propose a miniaturization method for neural networks that reduces the number of parameters in remote sensing (RS) image semantic segmentation models. A compact convolution optimization method is first applied to a standard U-Net to reduce the number of weights. To limit the performance loss caused by miniaturization, and drawing on the characteristics of remote sensing images, fewer down-samplings and an improved cascade atrous convolution are then used to improve the performance of the miniaturized U-Net. Compared with U-Net, our proposed Micro-Net not only achieves 29.26 times model compression but also essentially maintains performance on the public dataset. We provide a hybrid Keras and TensorFlow implementation of our model: https://github.com/Isnot2bad/Micro-Net |
Tasks | Model Compression, Semantic Segmentation |
Published | 2018-10-27 |
URL | http://arxiv.org/abs/1810.11603v1 |
http://arxiv.org/pdf/1810.11603v1.pdf | |
PWC | https://paperswithcode.com/paper/a-miniaturized-semantic-segmentation-method |
Repo | https://github.com/Isnot2bad/Micro-Net |
Framework | tf |
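As an illustration of the kind of parameter reduction a miniaturized segmentation model relies on (not necessarily the paper's exact compact-convolution method), a depthwise-separable Keras layer can replace a standard convolution:

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(128, 128, 64))
standard = tf.keras.layers.Conv2D(64, 3, padding="same")(inp)          # dense 3x3 convolution
compact = tf.keras.layers.SeparableConv2D(64, 3, padding="same")(inp)  # depthwise 3x3 + pointwise 1x1

print(tf.keras.Model(inp, standard).count_params())  # 36,928 weights
print(tf.keras.Model(inp, compact).count_params())   # 4,736 weights (~7.8x fewer)
```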
Frame-Recurrent Video Super-Resolution
Title | Frame-Recurrent Video Super-Resolution |
Authors | Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew Brown |
Abstract | Recent advances in video super-resolution have shown that convolutional neural networks combined with motion compensation are able to merge information from multiple low-resolution (LR) frames to generate high-quality images. Current state-of-the-art methods process a batch of LR frames to generate a single high-resolution (HR) frame and run this scheme in a sliding window fashion over the entire video, effectively treating the problem as a large number of separate multi-frame super-resolution tasks. This approach has two main weaknesses: 1) Each input frame is processed and warped multiple times, increasing the computational cost, and 2) each output frame is estimated independently conditioned on the input frames, limiting the system’s ability to produce temporally consistent results. In this work, we propose an end-to-end trainable frame-recurrent video super-resolution framework that uses the previously inferred HR estimate to super-resolve the subsequent frame. This naturally encourages temporally consistent results and reduces the computational cost by warping only one image in each step. Furthermore, due to its recurrent nature, the proposed method has the ability to assimilate a large number of previous frames without increased computational demands. Extensive evaluations and comparisons with previous methods validate the strengths of our approach and demonstrate that the proposed framework is able to significantly outperform the current state of the art. |
Tasks | Motion Compensation, Multi-Frame Super-Resolution, Super-Resolution, Video Super-Resolution |
Published | 2018-01-14 |
URL | http://arxiv.org/abs/1801.04590v4 |
http://arxiv.org/pdf/1801.04590v4.pdf | |
PWC | https://paperswithcode.com/paper/frame-recurrent-video-super-resolution |
Repo | https://github.com/msmsajjadi/frvsr |
Framework | none |
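One step of the frame-recurrent loop can be sketched as: warp the previous HR estimate with up-scaled optical flow, space-to-depth it back to LR resolution, and concatenate it with the current LR frame as input to the super-resolution network. The zero flow, the scale factor, and the absence of the actual flow and SR networks are placeholder assumptions, not the paper's trained components.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B,C,H,W) with an optical flow (B,2,H,W) given in pixels."""
    _, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    coords = torch.stack((xs, ys)).unsqueeze(0) + flow            # absolute sampling positions
    grid = torch.stack((2 * coords[:, 0] / (w - 1) - 1,           # normalize to [-1, 1] for grid_sample
                        2 * coords[:, 1] / (h - 1) - 1), dim=-1)
    return F.grid_sample(img, grid, align_corners=True)

scale = 4
lr_t = torch.randn(1, 3, 16, 16)                                  # current LR frame
hr_prev = torch.randn(1, 3, 16 * scale, 16 * scale)               # previous HR estimate
flow_lr = torch.zeros(1, 2, 16, 16)                               # placeholder flow-network output
flow_hr = F.interpolate(flow_lr, scale_factor=scale, mode="bilinear", align_corners=False) * scale
hr_warped = warp(hr_prev, flow_hr)                                # align the previous estimate
sr_input = torch.cat([lr_t, F.pixel_unshuffle(hr_warped, scale)], dim=1)
print(sr_input.shape)  # torch.Size([1, 51, 16, 16]) -> fed to the SR network
```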
Revisiting Image-Language Networks for Open-ended Phrase Detection
Title | Revisiting Image-Language Networks for Open-ended Phrase Detection |
Authors | Bryan A. Plummer, Kevin J. Shih, Yichen Li, Ke Xu, Svetlana Lazebnik, Stan Sclaroff, Kate Saenko |
Abstract | Most existing work that grounds natural language phrases in images starts with the assumption that the phrase in question is relevant to the image. In this paper we address a more realistic version of the natural language grounding task where we must both identify whether the phrase is relevant to an image and localize the phrase. This can also be viewed as a generalization of object detection to an open-ended vocabulary, introducing elements of few- and zero-shot detection. We propose an approach for this task that extends Faster R-CNN to relate image regions and phrases. By carefully initializing the classification layers of our network using canonical correlation analysis (CCA), we encourage a solution that is more discerning when reasoning between similar phrases, resulting in over double the performance compared to a naive adaptation on two popular phrase grounding datasets, Flickr30K Entities and ReferIt Game, with test-time phrase vocabulary sizes of 5K and 32K, respectively. |
Tasks | Object Detection, Phrase Grounding |
Published | 2018-11-17 |
URL | https://arxiv.org/abs/1811.07212v2 |
https://arxiv.org/pdf/1811.07212v2.pdf | |
PWC | https://paperswithcode.com/paper/open-vocabulary-phrase-detection |
Repo | https://github.com/BryanPlummer/cite |
Framework | tf |
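A sketch of CCA-based initialization in the spirit of the paper: fit canonical correlation analysis between region features and phrase embeddings, then use the learned projections to seed the joint-embedding layers. The synthetic features, dimensions, and the use of scikit-learn's CCA are illustrative assumptions, not the released code.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
region_feats = rng.normal(size=(1000, 128))   # toy stand-ins for pooled region features
phrase_embs = rng.normal(size=(1000, 64))     # toy stand-ins for averaged phrase embeddings

cca = CCA(n_components=32, max_iter=200).fit(region_feats, phrase_embs)
W_img, W_txt = cca.x_rotations_, cca.y_rotations_   # (128, 32) and (64, 32) projections
# W_img and W_txt would initialize the image-side and text-side linear layers that map
# both modalities into the shared space where region-phrase scores are computed.
print(W_img.shape, W_txt.shape)
```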
Fast Video Shot Transition Localization with Deep Structured Models
Title | Fast Video Shot Transition Localization with Deep Structured Models |
Authors | Shitao Tang, Litong Feng, Zhangkui Kuang, Yimin Chen, Wei Zhang |
Abstract | Detection of video shot transitions is a crucial pre-processing step in video analysis. Previous studies are restricted to detecting sudden content changes between frames through similarity measurement, and multi-scale operations are widely utilized to deal with transitions of various lengths. However, localization of gradual transitions is still under-explored due to the high visual similarity between adjacent frames. Cut transitions are abrupt semantic breaks, while gradual transitions contain low-level spatial-temporal patterns caused by video effects (e.g., dissolves) in addition to the gradual semantic breaks. To address the problem, we propose a structured network that detects these two kinds of shot transitions with separate, targeted models. Considering speed-performance trade-offs, we design a smart framework: with one TITAN GPU, the proposed method can run at 30× real-time speed. Experiments on the public TRECVID07 and RAI databases show that our method outperforms state-of-the-art methods. To train a high-performance shot transition detector, we contribute a new database, ClipShots, which contains 128636 cut transitions and 38120 gradual transitions from 4039 online videos. ClipShots intentionally collects short videos to include more hard cases caused by hand-held camera vibration, large object motion, and occlusion. |
Tasks | |
Published | 2018-08-13 |
URL | http://arxiv.org/abs/1808.04234v1 |
http://arxiv.org/pdf/1808.04234v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-video-shot-transition-localization-with |
Repo | https://github.com/Tangshitao/ClipShots |
Framework | none |
Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows
Title | Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows |
Authors | George Papamakarios, David C. Sterratt, Iain Murray |
Abstract | We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. SNL trains an autoregressive flow on simulated data in order to learn a model of the likelihood in the region of high posterior density. A sequential training procedure guides simulations and reduces simulation cost by orders of magnitude. We show that SNL is more robust, more accurate and requires less tuning than related neural-based methods, and we discuss diagnostics for assessing calibration, convergence and goodness-of-fit. |
Tasks | Bayesian Inference, Calibration |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07226v2 |
http://arxiv.org/pdf/1805.07226v2.pdf | |
PWC | https://paperswithcode.com/paper/sequential-neural-likelihood-fast-likelihood |
Repo | https://github.com/mackelab/delfi |
Framework | none |
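A toy, runnable sketch of the sequential loop: simulate, fit a conditional model of x given theta, run MCMC on prior times learned likelihood at the observed data, then simulate the next round at the resulting parameters. A conditional Gaussian fitted by least squares stands in for the autoregressive flow, and the Metropolis sampler is deliberately minimal; neither belongs to the paper or to the linked delfi package.

```python
import numpy as np

rng = np.random.default_rng(0)
simulate = lambda th: th + 0.5 * rng.normal()                  # toy simulator: x | th ~ N(th, 0.5^2)
x_obs = 1.3
log_prior = lambda th: 0.0 if -5.0 <= th <= 5.0 else -np.inf   # uniform prior on [-5, 5]

def fit_likelihood(thetas, xs):
    """Stand-in for the flow: model x | th as N(a*th + b, s^2) via least squares."""
    A = np.column_stack([thetas, np.ones_like(thetas)])
    (a, b), *_ = np.linalg.lstsq(A, xs, rcond=None)
    s = np.std(xs - A @ np.array([a, b])) + 1e-6
    return lambda x, th: -0.5 * ((x - a * th - b) / s) ** 2 - np.log(s)

def metropolis(log_post, start, n=500, step=0.3):
    th, samples = start, []
    for _ in range(n):
        prop = th + step * rng.normal()
        if np.log(rng.random()) < log_post(prop) - log_post(th):
            th = prop
        samples.append(th)
    return samples

thetas = list(rng.uniform(-5, 5, 100))                          # round 0: parameters from the prior
xs = [simulate(t) for t in thetas]
for _ in range(3):                                              # a few SNL rounds
    log_q = fit_likelihood(np.array(thetas), np.array(xs))
    posterior = metropolis(lambda th: log_prior(th) + log_q(x_obs, th), start=0.0)
    new_thetas = posterior[::5]                                 # simulate at approximate-posterior draws
    thetas += new_thetas
    xs += [simulate(t) for t in new_thetas]
print("approximate posterior mean:", round(float(np.mean(posterior[100:])), 2))  # close to x_obs here
```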
A Flexible Procedure for Mixture Proportion Estimation in Positive-Unlabeled Learning
Title | A Flexible Procedure for Mixture Proportion Estimation in Positive-Unlabeled Learning |
Authors | Zhenfeng Lin, James P. Long |
Abstract | Positive-unlabeled (PU) learning considers two samples, a positive set P with observations from only one class and an unlabeled set U with observations from two classes. The goal is to classify observations in U. Class mixture proportion estimation (MPE) in U is a key step in PU learning. Blanchard et al. [2010] showed that MPE in PU learning is a generalization of the problem of estimating the proportion of true null hypotheses in multiple testing problems. Motivated by this idea, we propose reducing the problem to one dimension via construction of a probabilistic classifier trained on the P and U data sets, followed by application of a one-dimensional mixture proportion method from the multiple testing literature to the observation class probabilities. The flexibility of this framework lies in the freedom to choose the classifier and the one-dimensional MPE method. We prove consistency of two mixture proportion estimators using bounds from empirical process theory, develop tuning-parameter-free implementations, and demonstrate that they have competitive performance on simulated waveform data and a protein signaling problem. |
Tasks | |
Published | 2018-01-30 |
URL | https://arxiv.org/abs/1801.09834v4 |
https://arxiv.org/pdf/1801.09834v4.pdf | |
PWC | https://paperswithcode.com/paper/a-flexible-procedure-for-mixture-proportion |
Repo | https://github.com/zflin/PU_learning |
Framework | none |
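A runnable sketch of the framework's two steps: (1) fit a probabilistic classifier separating P from U, (2) turn its one-dimensional class probabilities into a mixture proportion estimate. For step (2) a simple Elkan-Noto-style correction is used purely as a stand-in (it over-estimates slightly when P and U overlap); the paper's own one-dimensional MPE procedures and their consistency guarantees are different. The Gaussian toy data and the random-forest scorer are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
alpha_true = 0.3                                        # fraction of positives hidden in U
P = rng.normal(2.5, 1.0, size=(1000, 1))                # labeled positives
U = np.vstack([rng.normal(2.5, 1.0, size=(600, 1)),     # unlabeled positives
               rng.normal(-2.5, 1.0, size=(1400, 1))])  # unlabeled negatives

X = np.vstack([P, U])
s = np.concatenate([np.ones(len(P)), np.zeros(len(U))]) # labeled-vs-unlabeled target
perm = rng.permutation(len(X))
X, s, is_P = X[perm], s[perm], perm < len(P)

clf = RandomForestClassifier(min_samples_leaf=50, random_state=0)
g = cross_val_predict(clf, X, s, cv=5, method="predict_proba")[:, 1]   # estimated P(labeled | x)

c = g[is_P].mean()                                      # estimate of P(labeled | truly positive)
alpha_hat = (g.sum() / c - len(P)) / len(U)             # implied fraction of positives in U
print(round(alpha_hat, 2), "vs true", alpha_true)
```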