July 29, 2019

Paper Group AWR 138

Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features


Title	Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features
Authors	Rodrigo Agerri, German Rigau
Abstract	We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state of the art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate its use and guarantee the reproducibility of results.
Tasks	Named Entity Recognition
Published	2017-01-31
URL	http://arxiv.org/abs/1701.09123v1
PDF	http://arxiv.org/pdf/1701.09123v1.pdf
PWC	https://paperswithcode.com/paper/robust-multilingual-named-entity-recognition
Repo	https://github.com/ixa-ehu/ixa-pipe-nerc
Framework	none

Poisson–Gamma Dynamical Systems


Title	Poisson–Gamma Dynamical Systems
Authors	Aaron Schein, Mingyuan Zhou, Hanna Wallach
Abstract	We introduce a new dynamical system for sequentially observed multivariate count data. This model is based on the gamma–Poisson construction—a natural choice for count data—and relies on a novel Bayesian nonparametric prior that ties and shrinks the model parameters, thus avoiding overfitting. We present an efficient MCMC inference algorithm that advances recent work on augmentation schemes for inference in negative binomial models. Finally, we demonstrate the model’s inductive bias using a variety of real-world data sets, showing that it exhibits superior predictive performance over other models and infers highly interpretable latent structure.
Tasks
Published	2017-01-19
URL	http://arxiv.org/abs/1701.05573v1
PDF	http://arxiv.org/pdf/1701.05573v1.pdf
PWC	https://paperswithcode.com/paper/poisson-gamma-dynamical-systems
Repo	https://github.com/aschein/pgds
Framework	none

An Implementation of Faster RCNN with Study for Region Sampling


Title	An Implementation of Faster RCNN with Study for Region Sampling
Authors	Xinlei Chen, Abhinav Gupta
Abstract	We adapted the joint-training scheme of the Faster RCNN framework from Caffe to TensorFlow as a baseline implementation for object detection. Our code is made publicly available. This report documents the simplifications made to the original pipeline, with justifications from ablation analysis on both PASCAL VOC 2007 and COCO 2014. We further investigated the role of non-maximal suppression (NMS) in selecting regions-of-interest (RoIs) for region classification, and found that a biased sampling toward small regions helps performance and can achieve mAP on par with NMS-based sampling when converged sufficiently.
Tasks	Object Detection
Published	2017-02-07
URL	http://arxiv.org/abs/1702.02138v2
PDF	http://arxiv.org/pdf/1702.02138v2.pdf
PWC	https://paperswithcode.com/paper/an-implementation-of-faster-rcnn-with-study
Repo	https://github.com/PengchengAi/tf-faster-rcnn-pcai
Framework	tf
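
The abstract above refers to non-maximal suppression (NMS) as the usual way of selecting RoIs. As a rough illustration of what that step does, here is a minimal sketch of standard greedy NMS in plain Python. It is not taken from the paper's code; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.7 threshold are assumptions made for this example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold.
    Returns the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Called on a list of candidate boxes and their scores, `nms` returns the indices of the surviving boxes; the sampling study in the abstract compares this kind of selection against sampling biased toward small regions.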

Learning Sparse Neural Networks through $L_0$ Regularization


Title	Learning Sparse Neural Networks through $L_0$ Regularization
Authors	Christos Louizos, Max Welling, Diederik P. Kingma
Abstract	We propose a practical method for $L_0$ norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. AIC and BIC, well-known model selection criteria, are special cases of $L_0$ regularization. However, since the $L_0$ norm of weights is non-differentiable, we cannot incorporate it directly as a regularization term in the objective function. We propose a solution through the inclusion of a collection of non-negative stochastic gates, which collectively determine which weights to set to zero. We show that, somewhat surprisingly, for certain distributions over the gates, the expected $L_0$ norm of the resulting gated weights is differentiable with respect to the distribution parameters. We further propose the hard concrete distribution for the gates, which is obtained by “stretching” a binary concrete distribution and then transforming its samples with a hard-sigmoid. The parameters of the distribution over the gates can then be jointly optimized with the original network parameters. As a result our method allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way. We perform various experiments to demonstrate the effectiveness of the resulting approach and regularizer.
Tasks	Model Selection
Published	2017-12-04
URL	http://arxiv.org/abs/1712.01312v2
PDF	http://arxiv.org/pdf/1712.01312v2.pdf
PWC	https://paperswithcode.com/paper/learning-sparse-neural-networks-through-l_0
Repo	https://github.com/bryankim96/stux-DNN
Framework	tf
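
The hard concrete gate described in the abstract above (a binary concrete sample, stretched and then passed through a hard-sigmoid) can be sketched in a few lines. This is an illustrative sketch in plain Python, not the authors' implementation; the stretch limits `GAMMA` and `ZETA`, the temperature `BETA`, and the function names are assumptions chosen for the example.

```python
import math
import random

# Assumed hyperparameters for this sketch: stretch interval (GAMMA, ZETA)
# with GAMMA < 0 and ZETA > 1, and temperature BETA.
GAMMA, ZETA = -0.1, 1.1
BETA = 2.0 / 3.0


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hard_concrete_gate(log_alpha, rng=random):
    """Draw one gate value in [0, 1] for location parameter log_alpha."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)   # avoid log(0)
    # Binary concrete sample in (0, 1).
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    # "Stretch" to (GAMMA, ZETA), then clamp with a hard-sigmoid.
    s_bar = s * (ZETA - GAMMA) + GAMMA
    return min(1.0, max(0.0, s_bar))


def prob_gate_nonzero(log_alpha):
    """Probability that the gate is non-zero; summing this over all gates
    gives a differentiable surrogate for the expected L0 norm."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))
```

Because stretching puts probability mass below 0 and above 1 before clamping, a gate can be exactly 0 (weight pruned) or exactly 1, while `prob_gate_nonzero` stays smooth in `log_alpha`.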

Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks


Title	Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks
Authors	Marwin H. S. Segler, Thierry Kogej, Christian Tyrchan, Mark P. Waller
Abstract	In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active towards a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria) it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
Tasks	Drug Discovery
Published	2017-01-05
URL	http://arxiv.org/abs/1701.01329v1
PDF	http://arxiv.org/pdf/1701.01329v1.pdf
PWC	https://paperswithcode.com/paper/generating-focussed-molecule-libraries-for
Repo	https://github.com/benevolentAI/guacamol_baselines
Framework	pytorch

Spatial Memory for Context Reasoning in Object Detection


Title	Spatial Memory for Context Reasoning in Object Detection
Authors	Xinlei Chen, Abhinav Gupta
Abstract	Modeling instance-level context and object-object relationships is extremely challenging. It requires reasoning about bounding boxes of different classes, locations, etc. Above all, instance-level spatial reasoning inherently requires modeling conditional distributions on previous detections. Unfortunately, our current object detection systems do not have any memory to remember what to condition on! The state-of-the-art object detectors still detect all objects in parallel followed by non-maximal suppression (NMS). While memory has been used for tasks such as captioning, they mostly use image-level memory cells without capturing the spatial layout. On the other hand, modeling object-object relationships requires spatial reasoning – not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns. This paper presents a conceptually simple yet powerful solution – Spatial Memory Network (SMN), to model the instance-level context efficiently and effectively. Our spatial memory essentially assembles object instances back into a pseudo “image” representation that is easy to feed into another ConvNet for object-object context reasoning. This leads to a new sequential reasoning architecture where image and memory are processed in parallel to obtain detections which update the memory again. We show our SMN direction is promising as it provides 2.2% improvement over baseline Faster RCNN on the COCO dataset so far.
Tasks	Object Detection
Published	2017-04-13
URL	http://arxiv.org/abs/1704.04224v1
PDF	http://arxiv.org/pdf/1704.04224v1.pdf
PWC	https://paperswithcode.com/paper/spatial-memory-for-context-reasoning-in
Repo	https://github.com/daxiapazi/faster-rcnn
Framework	tf

Stochastic Conjugate Gradient Algorithm with Variance Reduction


Title	Stochastic Conjugate Gradient Algorithm with Variance Reduction
Authors	Xiao-Bo Jin, Xu-Yao Zhang, Kaizhu Huang, Guang-Gang Geng
Abstract	Conjugate gradient (CG) methods are a class of important methods for solving linear equations and nonlinear optimization problems. In this paper, we propose a new stochastic CG algorithm with variance reduction and we prove its linear convergence with the Fletcher and Reeves method for strongly convex and smooth functions. We experimentally demonstrate that the CG with variance reduction algorithm converges faster than its counterparts for four learning models, which may be convex, nonconvex or nonsmooth. In addition, its area-under-the-curve performance on six large-scale data sets is comparable to that of the LIBLINEAR solver for the L2-regularized L2-loss but with a significant improvement in computational efficiency.
Tasks
Published	2017-10-27
URL	http://arxiv.org/abs/1710.09979v2
PDF	http://arxiv.org/pdf/1710.09979v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-conjugate-gradient-algorithm-with
Repo	https://github.com/xbjin/cgvr
Framework	none
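
The abstract above combines two ingredients: a variance-reduced stochastic gradient and the Fletcher and Reeves update for the search direction. The sketch below shows only those two building blocks on a least-squares loss, in plain Python. It is not the authors' algorithm (which also needs a step-size rule and an outer loop), the SVRG-style form of the variance-reduced gradient is an assumption about how the reduction is done, and all function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_gradient(w, x, y):
    """Gradient of the single-sample loss 0.5 * (x.w - y)^2."""
    residual = dot(x, w) - y
    return [residual * xi for xi in x]


def full_gradient(w, xs, ys):
    """Average of the per-sample gradients over the whole data set."""
    total = [0.0] * len(w)
    for x, y in zip(xs, ys):
        total = [t + g for t, g in zip(total, sample_gradient(w, x, y))]
    return [t / len(xs) for t in total]


def variance_reduced_gradient(w, w_snapshot, mu_snapshot, x, y):
    """SVRG-style estimate (assumed form): g_i(w) - g_i(w_snapshot) + mu,
    where mu is the full gradient evaluated at the snapshot."""
    g_w = sample_gradient(w, x, y)
    g_s = sample_gradient(w_snapshot, x, y)
    return [a - b + m for a, b, m in zip(g_w, g_s, mu_snapshot)]


def fletcher_reeves_direction(g_new, g_old, d_old):
    """New direction d = -g_new + beta * d_old with the Fletcher-Reeves
    coefficient beta = |g_new|^2 / |g_old|^2 (g_old must be non-zero)."""
    beta = dot(g_new, g_new) / dot(g_old, g_old)
    return [-g + beta * d for g, d in zip(g_new, d_old)]
```

At the snapshot point itself the variance-reduced estimate equals the full gradient exactly, which is what removes the sampling noise near a snapshot.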

Sobolev GAN


Title	Sobolev GAN
Authors	Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, Yu Cheng
Abstract	We propose a new Integral Probability Metric (IPM) between distributions: the Sobolev IPM. The Sobolev IPM compares the mean discrepancy of two distributions for functions (critic) restricted to a Sobolev ball defined with respect to a dominant measure $\mu$. We show that the Sobolev IPM compares two distributions in high dimensions based on weighted conditional Cumulative Distribution Functions (CDF) of each coordinate on a leave one out basis. The Dominant measure $\mu$ plays a crucial role as it defines the support on which conditional CDFs are compared. Sobolev IPM can be seen as an extension of the one dimensional Von-Mises Cramér statistics to high dimensional distributions. We show how Sobolev IPM can be used to train Generative Adversarial Networks (GANs). We then exploit the intrinsic conditioning implied by Sobolev IPM in text generation. Finally we show that a variant of Sobolev GAN achieves competitive results in semi-supervised learning on CIFAR-10, thanks to the smoothness enforced on the critic by Sobolev GAN which relates to Laplacian regularization.
Tasks	Text Generation
Published	2017-11-14
URL	http://arxiv.org/abs/1711.04894v1
PDF	http://arxiv.org/pdf/1711.04894v1.pdf
PWC	https://paperswithcode.com/paper/sobolev-gan
Repo	https://github.com/chanshing/sobolev_gan
Framework	pytorch

Neural End-to-End Learning for Computational Argumentation Mining


Title	Neural End-to-End Learning for Computational Argumentation Mining
Authors	Steffen Eger, Johannes Daxenberger, Iryna Gurevych
Abstract	We investigate neural techniques for end-to-end computational argumentation mining (AM). We frame AM both as a token-based dependency parsing and as a token-based sequence tagging problem, including a multi-task learning setup. Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance results. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios, being able to catch long-range dependencies inherent to the AM problem. Moreover, we find that jointly learning ‘natural’ subtasks, in a multi-task learning setup, improves performance.
Tasks	Dependency Parsing, Multi-Task Learning
Published	2017-04-20
URL	http://arxiv.org/abs/1704.06104v2
PDF	http://arxiv.org/pdf/1704.06104v2.pdf
PWC	https://paperswithcode.com/paper/neural-end-to-end-learning-for-computational
Repo	https://github.com/UKPLab/acl2017-neural_end2end_AM
Framework	none

Online algorithms for POMDPs with continuous state, action, and observation spaces


Title	Online algorithms for POMDPs with continuous state, action, and observation spaces
Authors	Zachary Sunberg, Mykel Kochenderfer
Abstract	Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.
Tasks
Published	2017-09-18
URL	http://arxiv.org/abs/1709.06196v6
PDF	http://arxiv.org/pdf/1709.06196v6.pdf
PWC	https://paperswithcode.com/paper/online-algorithms-for-pomdps-with-continuous
Repo	https://github.com/JuliaPOMDP/POMCPOW.jl
Framework	none
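
The weighted particle filtering that the abstract above relies on can be illustrated with a generic belief update: propagate each particle through the dynamics, weight it by the likelihood of the observation, and resample. The sketch below is plain Python on a hypothetical 1-D problem; it is not POMCPOW or PFT-DPW, and the Gaussian dynamics, the noise levels, and the function names are assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(particles, action, observation, rng,
                  process_std=0.1, obs_std=0.5):
    """One weighted particle filter step for a toy 1-D model (assumed):
    next_state = state + action + noise, observation = next_state + noise."""
    # 1. Propagate every particle through the (assumed) transition model.
    propagated = [s + action + rng.gauss(0.0, process_std) for s in particles]
    # 2. Weight each particle by the likelihood of the observation.
    weights = [gaussian_pdf(observation, s, obs_std) for s in propagated]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(propagated)
    # 3. Resample in proportion to the weights.
    return rng.choices(propagated, weights=weights, k=len(propagated))
```

Because the weights come from the observation density rather than from exact observation matches, the belief stays spread over many particles even when observations are continuous, which is the failure mode the abstract attributes to unweighted beliefs.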

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models


Title	IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
Authors	Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang, Peng Zhang, Dell Zhang
Abstract	This paper provides a unified account of two schools of thinking in information retrieval modelling: the generative retrieval focusing on predicting relevant documents given a query, and the discriminative retrieval focusing on predicting relevancy given a query-document pair. We propose a game theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.
Tasks	Ad-Hoc Information Retrieval, Document Ranking, Information Retrieval, Question Answering
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10513v2
PDF	http://arxiv.org/pdf/1705.10513v2.pdf
PWC	https://paperswithcode.com/paper/irgan-a-minimax-game-for-unifying-generative
Repo	https://github.com/geek-ai/irgan
Framework	tf

OBTAIN: Real-Time Beat Tracking in Audio Signals


Title	OBTAIN: Real-Time Beat Tracking in Audio Signals
Authors	Ali Mottaghi, Kayhan Behdin, Ashkan Esmaeili, Mohammadreza Heydari, Farokh Marvasti
Abstract	In this paper, we design a system to perform real-time beat tracking on an audio signal. We use Onset Strength Signal (OSS) to detect the onsets and estimate the tempos. Then, we form Cumulative Beat Strength Signal (CBSS) by taking advantage of OSS and estimated tempos. Next, we perform peak detection by extracting the periodic sequence of beats among all CBSS peaks. In simulations, we can see that our proposed algorithm, Online Beat TrAckINg (OBTAIN), outperforms state-of-the-art results in terms of prediction accuracy while maintaining comparable and practical computational complexity. The real-time performance is tractable visually as illustrated in the simulations.
Tasks
Published	2017-04-07
URL	http://arxiv.org/abs/1704.02216v2
PDF	http://arxiv.org/pdf/1704.02216v2.pdf
PWC	https://paperswithcode.com/paper/obtain-real-time-beat-tracking-in-audio
Repo	https://github.com/michaelkrzyzaniak/Beat-and-Tempo-Tracking
Framework	none

Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods


Title	Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods
Authors	Sumeet Singh, Jonathan Lacotte, Anirudha Majumdar, Marco Pavone
Abstract	The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human’s risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human’s underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
Tasks	Decision Making
Published	2017-11-28
URL	http://arxiv.org/abs/1711.10055v2
PDF	http://arxiv.org/pdf/1711.10055v2.pdf
PWC	https://paperswithcode.com/paper/risk-sensitive-inverse-reinforcement-learning
Repo	https://github.com/StanfordASL/RSIRL
Framework	none
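
The abstract above models risk preferences with coherent risk measures that range from risk-neutral to worst-case. One well-known member of that family is conditional value at risk (CVaR), which makes the spectrum easy to see. The sketch below computes CVaR of a discrete cost distribution in plain Python; it is only an illustration of a single coherent risk measure, not the paper's inference algorithm, and the function name `cvar` and its argument layout are assumptions.

```python
def cvar(costs, probs, alpha):
    """CVaR at level alpha (0 < alpha <= 1) of a discrete cost distribution.

    Uses the dual form of CVaR: maximise sum(q_i * c_i) over reweightings q
    with 0 <= q_i <= p_i / alpha and sum(q) == 1. The maximum is reached by
    greedily moving as much mass as allowed onto the highest costs.
    alpha = 1 gives the expected cost (risk-neutral); as alpha shrinks,
    the value approaches the worst-case cost.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    remaining = 1.0
    value = 0.0
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        q = min(prob / alpha, remaining)
        value += q * cost
        remaining -= q
        if remaining <= 0.0:
            break
    return value
```

For example, with costs `[1, 2, 10]` and probabilities `[0.5, 0.4, 0.1]`, `alpha=1.0` recovers the mean cost, while `alpha=0.1` puts all weight on the rare cost of 10.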

Tags2Parts: Discovering Semantic Regions from Shape Tags


Title	Tags2Parts: Discovering Semantic Regions from Shape Tags
Authors	Sanjeev Muralikrishnan, Vladimir G. Kim, Siddhartha Chaudhuri
Abstract	We propose a novel method for discovering shape regions that strongly correlate with user-prescribed tags. For example, given a collection of chairs tagged as either “has armrest” or “lacks armrest”, our system correctly highlights the armrest regions as the main distinctive parts between the two chair types. To obtain point-wise predictions from shape-wise tags we develop a novel neural network architecture that is trained with tag classification loss, but is designed to rely on segmentation to predict the tag. Our network is inspired by U-Net, but we replicate shallow U structures several times with new skip connections and pooling layers, and call the resulting architecture “WU-Net”. We test our method on segmentation benchmarks and show that even with weak supervision of whole shape tags, our method can infer meaningful semantic regions, without ever observing shape segmentations. Further, once trained, the model can process shapes for which the tag is entirely unknown. As a bonus, our architecture is directly operational under full supervision and performs strongly on standard benchmarks. We validate our method through experiments with many variant architectures and prior baselines, and demonstrate several applications.
Tasks
Published	2017-08-22
URL	http://arxiv.org/abs/1708.06673v3
PDF	http://arxiv.org/pdf/1708.06673v3.pdf
PWC	https://paperswithcode.com/paper/tags2parts-discovering-semantic-regions-from
Repo	https://github.com/sanjeevmk/Tags2Parts
Framework	tf

Discovering Political Topics in Facebook Discussion threads with Graph Contextualization


Title	Discovering Political Topics in Facebook Discussion threads with Graph Contextualization
Authors	Yilin Zhang, Marie Poux-Berthe, Chris Wells, Karolina Koc-Michalska, Karl Rohe
Abstract	We propose a graph contextualization method, pairGraphText, to study political engagement on Facebook during the 2012 French presidential election. It is a spectral algorithm that contextualizes graph data with text data for online discussion threads. In particular, we examine the Facebook posts of the eight leading candidates and the comments beneath these posts. We find evidence of both (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate and (ii) issue-centered structure (i.e. on political topics), where citizens’ attention and expression are primarily directed towards a specific set of issues (e.g. economics, immigration, etc.). To identify issue-centered structure, we develop pairGraphText to analyze a network with high-dimensional features on the interactions (i.e. text). This technique scales to hundreds of thousands of nodes and thousands of unique words. In the Facebook data, spectral clustering without the contextualizing text information finds a mixture of (i) candidate and (ii) issue clusters. The contextualized information with text data helps to separate these two structures. We conclude by showing that the novel methodology is consistent under a statistical model.
Tasks
Published	2017-08-23
URL	http://arxiv.org/abs/1708.06872v3
PDF	http://arxiv.org/pdf/1708.06872v3.pdf
PWC	https://paperswithcode.com/paper/discovering-political-topics-in-facebook
Repo	https://github.com/yzhang672/Spectral-Contextualization
Framework	none