Paper Group AWR 138
Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features
Title | Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features |
Authors | Rodrigo Agerri, German Rigau |
Abstract | We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state of the art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate its use and guarantee the reproducibility of results. |
Tasks | Named Entity Recognition |
Published | 2017-01-31 |
URL | http://arxiv.org/abs/1701.09123v1 |
http://arxiv.org/pdf/1701.09123v1.pdf | |
PWC | https://paperswithcode.com/paper/robust-multilingual-named-entity-recognition |
Repo | https://github.com/ixa-ehu/ixa-pipe-nerc |
Framework | none |
Poisson–Gamma Dynamical Systems
Title | Poisson–Gamma Dynamical Systems |
Authors | Aaron Schein, Mingyuan Zhou, Hanna Wallach |
Abstract | We introduce a new dynamical system for sequentially observed multivariate count data. This model is based on the gamma–Poisson construction—a natural choice for count data—and relies on a novel Bayesian nonparametric prior that ties and shrinks the model parameters, thus avoiding overfitting. We present an efficient MCMC inference algorithm that advances recent work on augmentation schemes for inference in negative binomial models. Finally, we demonstrate the model’s inductive bias using a variety of real-world data sets, showing that it exhibits superior predictive performance over other models and infers highly interpretable latent structure. |
Tasks | |
Published | 2017-01-19 |
URL | http://arxiv.org/abs/1701.05573v1 |
http://arxiv.org/pdf/1701.05573v1.pdf | |
PWC | https://paperswithcode.com/paper/poisson-gamma-dynamical-systems |
Repo | https://github.com/aschein/pgds |
Framework | none |
An Implementation of Faster RCNN with Study for Region Sampling
Title | An Implementation of Faster RCNN with Study for Region Sampling |
Authors | Xinlei Chen, Abhinav Gupta |
Abstract | We adapted the joint-training scheme of the Faster RCNN framework from Caffe to TensorFlow as a baseline implementation for object detection. Our code is made publicly available. This report documents the simplifications made to the original pipeline, with justifications from ablation analysis on both PASCAL VOC 2007 and COCO 2014. We further investigated the role of non-maximal suppression (NMS) in selecting regions-of-interest (RoIs) for region classification, and found that biased sampling toward small regions helps performance and can achieve mAP on par with NMS-based sampling when converged sufficiently. |
Tasks | Object Detection |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.02138v2 |
http://arxiv.org/pdf/1702.02138v2.pdf | |
PWC | https://paperswithcode.com/paper/an-implementation-of-faster-rcnn-with-study |
Repo | https://github.com/PengchengAi/tf-faster-rcnn-pcai |
Framework | tf |
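The abstract above refers to non-maximal suppression (NMS) for selecting RoIs. As a rough illustration of what greedy NMS does, here is a minimal pure-Python sketch. It is not taken from the authors' TensorFlow code: the function names `iou` and `nms`, the `[x1, y1, x2, y2]` box layout, and the default threshold of 0.5 are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap any already-kept box by more than iou_threshold.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: two heavily overlapping boxes and one distant box.
example_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
example_scores = [0.9, 0.8, 0.7]
kept = nms(example_boxes, example_scores, iou_threshold=0.5)
```

With these inputs the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed at a threshold of 0.5 and would survive at a threshold of 0.7.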
Learning Sparse Neural Networks through $L_0$ Regularization
Title | Learning Sparse Neural Networks through $L_0$ Regularization |
Authors | Christos Louizos, Max Welling, Diederik P. Kingma |
Abstract | We propose a practical method for $L_0$ norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. AIC and BIC, well-known model selection criteria, are special cases of $L_0$ regularization. However, since the $L_0$ norm of weights is non-differentiable, we cannot incorporate it directly as a regularization term in the objective function. We propose a solution through the inclusion of a collection of non-negative stochastic gates, which collectively determine which weights to set to zero. We show that, somewhat surprisingly, for certain distributions over the gates, the expected $L_0$ norm of the resulting gated weights is differentiable with respect to the distribution parameters. We further propose the hard concrete distribution for the gates, which is obtained by “stretching” a binary concrete distribution and then transforming its samples with a hard-sigmoid. The parameters of the distribution over the gates can then be jointly optimized with the original network parameters. As a result our method allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way. We perform various experiments to demonstrate the effectiveness of the resulting approach and regularizer. |
Tasks | Model Selection |
Published | 2017-12-04 |
URL | http://arxiv.org/abs/1712.01312v2 |
http://arxiv.org/pdf/1712.01312v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-sparse-neural-networks-through-l_0 |
Repo | https://github.com/bryankim96/stux-DNN |
Framework | tf |
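The abstract above describes the hard concrete gate as a binary concrete sample that is "stretched" and then passed through a hard-sigmoid. The following is a minimal sketch of that construction in plain Python, not the authors' implementation. The stretch limits `GAMMA` and `ZETA`, the temperature `BETA`, and the function names are assumptions chosen for this example. `prob_nonzero` gives the closed-form probability that a gate is non-zero under these assumptions; summing it over all gates gives the expected $L_0$ penalty.

```python
import math
import random

# Assumed hyperparameters for illustration: stretch interval (GAMMA, ZETA)
# with GAMMA < 0 < 1 < ZETA, and temperature BETA.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hard_concrete(log_alpha, rng=random):
    """Draw one gate value in [0, 1] for a gate with location log_alpha."""
    u = rng.uniform(1e-6, 1.0 - 1e-6)           # avoid log(0)
    # Binary concrete sample in (0, 1).
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    # Stretch to (GAMMA, ZETA), then clamp with a hard-sigmoid.
    s_bar = s * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, s_bar))


def prob_nonzero(log_alpha):
    """Probability that the gate is non-zero; differentiable in log_alpha."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))


# Hypothetical usage: compare the empirical non-zero rate to the closed form.
rng = random.Random(0)
draws = [sample_hard_concrete(0.0, rng) for _ in range(20000)]
empirical_rate = sum(1 for z in draws if z > 0.0) / len(draws)
```

Because the stretched interval extends below 0 and above 1, the clamp puts real probability mass on exactly 0 and exactly 1, which is what allows weights to be switched off completely while the gate parameters stay trainable by gradient descent.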
Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
Title | Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks |
Authors | Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, Mark P. Waller |
Abstract | In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active towards a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery. |
Tasks | Drug Discovery |
Published | 2017-01-05 |
URL | http://arxiv.org/abs/1701.01329v1 |
http://arxiv.org/pdf/1701.01329v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-focussed-molecule-libraries-for |
Repo | https://github.com/benevolentAI/guacamol_baselines |
Framework | pytorch |
Spatial Memory for Context Reasoning in Object Detection
Title | Spatial Memory for Context Reasoning in Object Detection |
Authors | Xinlei Chen, Abhinav Gupta |
Abstract | Modeling instance-level context and object-object relationships is extremely challenging. It requires reasoning about bounding boxes of different classes, locations, etc. Above all, instance-level spatial reasoning inherently requires modeling conditional distributions on previous detections. Unfortunately, our current object detection systems do not have any memory to remember what to condition on! The state-of-the-art object detectors still detect all objects in parallel followed by non-maximal suppression (NMS). While memory has been used for tasks such as captioning, those approaches mostly use image-level memory cells without capturing the spatial layout. On the other hand, modeling object-object relationships requires spatial reasoning – not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns. This paper presents a conceptually simple yet powerful solution – Spatial Memory Network (SMN), to model the instance-level context efficiently and effectively. Our spatial memory essentially assembles object instances back into a pseudo “image” representation that is easy to feed into another ConvNet for object-object context reasoning. This leads to a new sequential reasoning architecture where image and memory are processed in parallel to obtain detections which update the memory again. We show our SMN direction is promising as it provides 2.2% improvement over baseline Faster RCNN on the COCO dataset so far. |
Tasks | Object Detection |
Published | 2017-04-13 |
URL | http://arxiv.org/abs/1704.04224v1 |
http://arxiv.org/pdf/1704.04224v1.pdf | |
PWC | https://paperswithcode.com/paper/spatial-memory-for-context-reasoning-in |
Repo | https://github.com/daxiapazi/faster-rcnn |
Framework | tf |
Stochastic Conjugate Gradient Algorithm with Variance Reduction
Title | Stochastic Conjugate Gradient Algorithm with Variance Reduction |
Authors | Xiao-Bo Jin, Xu-Yao Zhang, Kaizhu Huang, Guang-Gang Geng |
Abstract | Conjugate gradient (CG) methods are a class of important methods for solving linear equations and nonlinear optimization problems. In this paper, we propose a new stochastic CG algorithm with variance reduction and we prove its linear convergence with the Fletcher and Reeves method for strongly convex and smooth functions. We experimentally demonstrate that the CG with variance reduction algorithm converges faster than its counterparts for four learning models, which may be convex, nonconvex or nonsmooth. In addition, its area under the curve performance on six large-scale data sets is comparable to that of the LIBLINEAR solver for the L2-regularized L2-loss but with a significant improvement in computational efficiency. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.09979v2 |
http://arxiv.org/pdf/1710.09979v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-conjugate-gradient-algorithm-with |
Repo | https://github.com/xbjin/cgvr |
Framework | none |
Sobolev GAN
Title | Sobolev GAN |
Authors | Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, Yu Cheng |
Abstract | We propose a new Integral Probability Metric (IPM) between distributions: the Sobolev IPM. The Sobolev IPM compares the mean discrepancy of two distributions for functions (critic) restricted to a Sobolev ball defined with respect to a dominant measure $\mu$. We show that the Sobolev IPM compares two distributions in high dimensions based on weighted conditional Cumulative Distribution Functions (CDF) of each coordinate on a leave one out basis. The dominant measure $\mu$ plays a crucial role as it defines the support on which conditional CDFs are compared. Sobolev IPM can be seen as an extension of the one dimensional Von-Mises Cramér statistics to high dimensional distributions. We show how Sobolev IPM can be used to train Generative Adversarial Networks (GANs). We then exploit the intrinsic conditioning implied by Sobolev IPM in text generation. Finally we show that a variant of Sobolev GAN achieves competitive results in semi-supervised learning on CIFAR-10, thanks to the smoothness enforced on the critic by Sobolev GAN which relates to Laplacian regularization. |
Tasks | Text Generation |
Published | 2017-11-14 |
URL | http://arxiv.org/abs/1711.04894v1 |
http://arxiv.org/pdf/1711.04894v1.pdf | |
PWC | https://paperswithcode.com/paper/sobolev-gan |
Repo | https://github.com/chanshing/sobolev_gan |
Framework | pytorch |
Neural End-to-End Learning for Computational Argumentation Mining
Title | Neural End-to-End Learning for Computational Argumentation Mining |
Authors | Steffen Eger, Johannes Daxenberger, Iryna Gurevych |
Abstract | We investigate neural techniques for end-to-end computational argumentation mining (AM). We frame AM both as a token-based dependency parsing and as a token-based sequence tagging problem, including a multi-task learning setup. Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance results. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios, being able to catch long-range dependencies inherent to the AM problem. Moreover, we find that jointly learning ‘natural’ subtasks, in a multi-task learning setup, improves performance. |
Tasks | Dependency Parsing, Multi-Task Learning |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06104v2 |
http://arxiv.org/pdf/1704.06104v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-end-to-end-learning-for-computational |
Repo | https://github.com/UKPLab/acl2017-neural_end2end_AM |
Framework | none |
Online algorithms for POMDPs with continuous state, action, and observation spaces
Title | Online algorithms for POMDPs with continuous state, action, and observation spaces |
Authors | Zachary Sunberg, Mykel Kochenderfer |
Abstract | Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail. |
Tasks | |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06196v6 |
http://arxiv.org/pdf/1709.06196v6.pdf | |
PWC | https://paperswithcode.com/paper/online-algorithms-for-pomdps-with-continuous |
Repo | https://github.com/JuliaPOMDP/POMCPOW.jl |
Framework | none |
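The abstract above attributes the success of POMCPOW and PFT-DPW to weighted particle filtering. As a rough illustration of a single weighted particle belief update (not the authors' tree-search algorithms, and not the API of the Julia package), here is a minimal Python sketch. The function `update_belief`, the 1-D toy dynamics, and the Gaussian observation model are all assumptions made for this example.

```python
import math
import random


def update_belief(particles, weights, action, observation,
                  transition, obs_likelihood, rng):
    """One weighted particle-filter step.

    Each particle is propagated through the (possibly stochastic) transition
    model, then its weight is multiplied by the likelihood of the received
    observation. Weights are renormalized to sum to 1."""
    propagated = [transition(s, action, rng) for s in particles]
    new_weights = [w * obs_likelihood(observation, sp)
                   for w, sp in zip(weights, propagated)]
    total = sum(new_weights)
    if total == 0.0:
        # Every particle is inconsistent with the observation; fall back to
        # uniform weights rather than dividing by zero.
        new_weights = [1.0 / len(propagated)] * len(propagated)
    else:
        new_weights = [w / total for w in new_weights]
    return propagated, new_weights


# Hypothetical 1-D problem: the state shifts by the action, and the
# observation is the state plus unit-variance Gaussian noise.
def toy_transition(state, action, rng):
    return state + action


def toy_obs_likelihood(observation, state):
    return math.exp(-0.5 * (observation - state) ** 2)


rng = random.Random(0)
prior_particles = [0.0, 1.0, 2.0]
prior_weights = [1.0 / 3.0] * 3
post_particles, post_weights = update_belief(
    prior_particles, prior_weights, 0.0, 2.0,
    toy_transition, toy_obs_likelihood, rng)
posterior_mean = sum(p * w for p, w in zip(post_particles, post_weights))
```

Keeping a weight on every particle means a belief node still represents a distribution even when a continuous observation is only ever sampled once, which is the collapse to a single particle that the abstract describes for unweighted beliefs.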
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
Title | IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models |
Authors | Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang |
Abstract | This paper provides a unified account of two schools of thinking in information retrieval modelling: the generative retrieval focusing on predicting relevant documents given a query, and the discriminative retrieval focusing on predicting relevancy given a query-document pair. We propose a game theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering. |
Tasks | Ad-Hoc Information Retrieval, Document Ranking, Information Retrieval, Question Answering |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10513v2 |
http://arxiv.org/pdf/1705.10513v2.pdf | |
PWC | https://paperswithcode.com/paper/irgan-a-minimax-game-for-unifying-generative |
Repo | https://github.com/geek-ai/irgan |
Framework | tf |
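The gains quoted in the abstract above are reported in Precision@5 and MAP. For readers unfamiliar with these ranking metrics, here is a minimal Python sketch of how they are commonly computed from a ranked list of binary relevance labels. The function names are assumptions for this example, and the average-precision variant shown assumes every relevant document appears somewhere in the ranked list. The paper's own evaluation scripts may differ.

```python
def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked items that are relevant (labels 0/1)."""
    return sum(ranked_relevance[:k]) / k


def average_precision(ranked_relevance):
    """Mean of precision@rank taken at each rank holding a relevant item."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average of average_precision over a list of per-query rankings."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical ranking for one query: relevant items at ranks 1 and 3.
example_ranking = [1, 0, 1, 0, 0]
p5 = precision_at_k(example_ranking, 5)    # 2 relevant in top 5
ap = average_precision(example_ranking)    # (1/1 + 2/3) / 2
```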
OBTAIN: Real-Time Beat Tracking in Audio Signals
Title | OBTAIN: Real-Time Beat Tracking in Audio Signals |
Authors | Ali Mottaghi, Kayhan Behdin, Ashkan Esmaeili, Mohammadreza Heydari, Farokh Marvasti |
Abstract | In this paper, we design a system to perform real-time beat tracking for an audio signal. We use the Onset Strength Signal (OSS) to detect the onsets and estimate the tempos. Then, we form the Cumulative Beat Strength Signal (CBSS) by taking advantage of the OSS and the estimated tempos. Next, we perform peak detection by extracting the periodic sequence of beats among all CBSS peaks. In simulations, we can see that our proposed algorithm, Online Beat TrAckINg (OBTAIN), outperforms state-of-the-art results in terms of prediction accuracy while maintaining comparable and practical computational complexity. The real-time performance can be tracked visually, as illustrated in the simulations. |
Tasks | |
Published | 2017-04-07 |
URL | http://arxiv.org/abs/1704.02216v2 |
http://arxiv.org/pdf/1704.02216v2.pdf | |
PWC | https://paperswithcode.com/paper/obtain-real-time-beat-tracking-in-audio |
Repo | https://github.com/michaelkrzyzaniak/Beat-and-Tempo-Tracking |
Framework | none |
Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods
Title | Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods |
Authors | Sumeet Singh, Jonathan Lacotte, Anirudha Majumdar, Marco Pavone |
Abstract | The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human’s risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human’s underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur. |
Tasks | Decision Making |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10055v2 |
http://arxiv.org/pdf/1711.10055v2.pdf | |
PWC | https://paperswithcode.com/paper/risk-sensitive-inverse-reinforcement-learning |
Repo | https://github.com/StanfordASL/RSIRL |
Framework | none |
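The abstract above relies on coherent risk measures that span a spectrum from risk-neutral to worst-case. Conditional value-at-risk (CVaR) is one well-known coherent risk measure with exactly that property, so it is used here as an illustrative stand-in. It is not claimed to be the model class used in the paper. The function name `cvar` and the equal-probability sample assumption are choices made for this sketch.

```python
def cvar(costs, alpha):
    """CVaR of equally likely cost samples: the mean of the worst
    alpha-fraction of outcomes, with 0 < alpha <= 1.

    alpha = 1 gives the plain expectation (risk-neutral); as alpha shrinks
    toward 0 the value approaches the maximum cost (worst case)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(costs, reverse=True)       # worst (largest) costs first
    remaining = alpha * len(ordered)            # tail mass, in sample units
    total = 0.0
    for c in ordered:
        take = min(1.0, remaining)              # last sample may count partly
        total += take * c
        remaining -= take
        if remaining <= 0.0:
            break
    return total / (alpha * len(ordered))


# Hypothetical cost samples, e.g. from simulated driving trajectories.
sample_costs = [1.0, 2.0, 3.0, 10.0]
risk_neutral = cvar(sample_costs, 1.0)    # mean of all samples
risk_averse = cvar(sample_costs, 0.5)     # mean of the worst half
worst_case = cvar(sample_costs, 0.25)     # the single worst sample
```

A single parameter `alpha` moves an agent along the risk spectrum here, which gives a feel for how a risk-sensitive model can separate cautious behavior from risk-neutral behavior when rare, costly outcomes such as collisions are possible.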
Tags2Parts: Discovering Semantic Regions from Shape Tags
Title | Tags2Parts: Discovering Semantic Regions from Shape Tags |
Authors | Sanjeev Muralikrishnan, Vladimir G. Kim, Siddhartha Chaudhuri |
Abstract | We propose a novel method for discovering shape regions that strongly correlate with user-prescribed tags. For example, given a collection of chairs tagged as either “has armrest” or “lacks armrest”, our system correctly highlights the armrest regions as the main distinctive parts between the two chair types. To obtain point-wise predictions from shape-wise tags we develop a novel neural network architecture that is trained with tag classification loss, but is designed to rely on segmentation to predict the tag. Our network is inspired by U-Net, but we replicate shallow U structures several times with new skip connections and pooling layers, and call the resulting architecture “WU-Net”. We test our method on segmentation benchmarks and show that even with weak supervision of whole shape tags, our method can infer meaningful semantic regions, without ever observing shape segmentations. Further, once trained, the model can process shapes for which the tag is entirely unknown. As a bonus, our architecture is directly operational under full supervision and performs strongly on standard benchmarks. We validate our method through experiments with many variant architectures and prior baselines, and demonstrate several applications. |
Tasks | |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06673v3 |
http://arxiv.org/pdf/1708.06673v3.pdf | |
PWC | https://paperswithcode.com/paper/tags2parts-discovering-semantic-regions-from |
Repo | https://github.com/sanjeevmk/Tags2Parts |
Framework | tf |
Discovering Political Topics in Facebook Discussion threads with Graph Contextualization
Title | Discovering Political Topics in Facebook Discussion threads with Graph Contextualization |
Authors | Yilin Zhang, Marie Poux-Berthe, Chris Wells, Karolina Koc-Michalska, Karl Rohe |
Abstract | We propose a graph contextualization method, pairGraphText, to study political engagement on Facebook during the 2012 French presidential election. It is a spectral algorithm that contextualizes graph data with text data for online discussion threads. In particular, we examine the Facebook posts of the eight leading candidates and the comments beneath these posts. We find evidence of both (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate and (ii) issue-centered structure (i.e. on political topics), where citizens’ attention and expression are primarily directed towards a specific set of issues (e.g. economics, immigration, etc.). To identify issue-centered structure, we develop pairGraphText to analyze a network with high-dimensional features on the interactions (i.e. text). This technique scales to hundreds of thousands of nodes and thousands of unique words. In the Facebook data, spectral clustering without the contextualizing text information finds a mixture of (i) candidate and (ii) issue clusters. The contextualized information with text data helps to separate these two structures. We conclude by showing that the novel methodology is consistent under a statistical model. |
Tasks | |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.06872v3 |
http://arxiv.org/pdf/1708.06872v3.pdf | |
PWC | https://paperswithcode.com/paper/discovering-political-topics-in-facebook |
Repo | https://github.com/yzhang672/Spectral-Contextualization |
Framework | none |