January 25, 2020

3179 words 15 mins read

Paper Group NAWR 11

k-Means Clustering of Lines for Big Data. Revisiting NMT for Normalization of Early English Letters. Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration. MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation. Frame-Consistent Recurrent Video Deraining With Dual-Level Flow …

k-Means Clustering of Lines for Big Data

Title k-Means Clustering of Lines for Big Data
Authors Yair Marom, Dan Feldman
Abstract The input to the \emph{$k$-mean for lines} problem is a set $L$ of $n$ lines in $\mathbb{R}^d$, and the goal is to compute a set of $k$ centers (points) in $\mathbb{R}^d$ that minimizes the sum of squared distances over every line in $L$ and its nearest center. This is a straightforward generalization of the $k$-mean problem, where the input is a set of $n$ points instead of lines. We suggest the first PTAS that computes a $(1+\epsilon)$-approximation to this problem in time $O(n \log n)$ for any constant approximation error $\epsilon \in (0, 1)$ and constant integers $k, d \geq 1$. This is achieved by proving that there is always a weighted subset (called a coreset) of $dk^{O(k)}\log (n)/\epsilon^2$ lines in $L$ that approximates the sum of squared distances from $L$ to \emph{any} given set of $k$ points. Via the traditional merge-and-reduce technique, this coreset yields results for a streaming (possibly infinite) set of lines distributed over $M$ machines in one pass (e.g. a cloud), using memory, update time, and communication that are near-logarithmic in $n$; deletion of any line is also supported, but using linear space. These results generalize to other distance functions, such as $k$-median (sum of distances), or to ignoring the farthest $m$ lines from the given centers so as to handle outliers. Experimental results on 10 machines on the Amazon EC2 cloud show that the algorithm performs well in practice. Open source code for all the algorithms and experiments is also provided.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9442-k-means-clustering-of-lines-for-big-data
PDF http://papers.nips.cc/paper/9442-k-means-clustering-of-lines-for-big-data.pdf
PWC https://paperswithcode.com/paper/k-means-clustering-of-lines-for-big-data
Repo https://github.com/YairMarom/k_lines_means
Framework none
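
The coreset construction is the paper's contribution; as a point of reference, a plain Lloyd-style heuristic for the same objective is easy to state. The sketch below is an illustrative baseline under my own assumptions, not the paper's algorithm: lines are given as (base point, unit direction) pairs, assignment uses the point-to-line squared distance, and the center update solves the least-squares system that minimizes the summed squared distances to the assigned lines.

```python
import numpy as np

def line_sqdist(c, p, d):
    """Squared distance from point c to the line {p + t * d}, with ||d|| = 1."""
    v = c - p
    return v @ v - (v @ d) ** 2

def kmeans_lines(P, D, k, iters=50, seed=0):
    """P: (n, dim) base points, D: (n, dim) unit directions."""
    rng = np.random.default_rng(seed)
    n, dim = P.shape
    centers = P[rng.choice(n, size=k, replace=False)].copy()  # crude init
    for _ in range(iters):
        # Assignment: each line goes to its nearest center.
        dists = np.array([[line_sqdist(c, P[i], D[i]) for c in centers]
                          for i in range(n)])
        labels = dists.argmin(axis=1)
        # Update: minimizing sum_i ||(I - d_i d_i^T)(c - p_i)||^2 over c gives
        # the linear system A c = b, A = sum(I - d d^T), b = sum((I - d d^T) p).
        for j in range(k):
            idx = np.where(labels == j)[0]
            if len(idx) == 0:
                continue
            A, b = np.zeros((dim, dim)), np.zeros(dim)
            for i in idx:
                M = np.eye(dim) - np.outer(D[i], D[i])
                A += M
                b += M @ P[i]
            centers[j] = np.linalg.lstsq(A, b, rcond=None)[0]  # robust to parallel lines
    return centers, labels
```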

Revisiting NMT for Normalization of Early English Letters

Title Revisiting NMT for Normalization of Early English Letters
Authors Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä
Abstract This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus. The corpus has previously been normalized so that only the less frequent deviant forms remain unnormalized. This paper evaluates several approaches to improving the normalization of these deviant forms. Adding features to the training data is found to be unhelpful, but using a lexicographical resource to filter the top candidates produced by the NMT model, together with lemmatization, improves results.
Tasks Lemmatization, Machine Translation
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2509/
PDF https://www.aclweb.org/anthology/W19-2509
PWC https://paperswithcode.com/paper/revisiting-nmt-for-normalization-of-early
Repo https://github.com/mikahama/natas
Framework none
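
A minimal sketch of the filtering idea described above: keep the first NMT candidate whose lemma is attested in a modern lexicon. The helpers `translate_top_n` and `lemmatize` are placeholders for whatever decoder and lemmatizer are actually used (the authors' natas repository has its own API); everything here is illustrative.

```python
def normalize(word, translate_top_n, lemmatize, lexicon, n=10):
    """Return the first lexicon-attested candidate among the NMT top-n."""
    candidates = translate_top_n(word, n)  # e.g. beam-search hypotheses
    for cand in candidates:
        if lemmatize(cand) in lexicon:     # lexicographical filter
            return cand
    return candidates[0] if candidates else word  # fall back to the best guess
```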

Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration

Title Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration
Authors Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach
Abstract Class probabilities predicted by most multiclass classifiers are uncalibrated, often tending towards over-confidence. With neural networks, calibration can be improved by temperature scaling, a method to learn a single corrective multiplicative factor for inputs to the last softmax layer. For non-neural models, existing methods apply binary calibration in a pairwise or one-vs-rest fashion. We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification. It is easily implemented with neural nets since it is equivalent to log-transforming the uncalibrated probabilities, followed by one linear layer and softmax. Experiments demonstrate improved probabilistic predictions according to multiple measures (confidence-ECE, classwise-ECE, log-loss, Brier score) across a wide range of datasets and classifiers. Parameters of the learned Dirichlet calibration map provide insight into the biases of the uncalibrated model.
Tasks Calibration
Published 2019-12-01
URL http://papers.nips.cc/paper/9397-beyond-temperature-scaling-obtaining-well-calibrated-multi-class-probabilities-with-dirichlet-calibration
PDF http://papers.nips.cc/paper/9397-beyond-temperature-scaling-obtaining-well-calibrated-multi-class-probabilities-with-dirichlet-calibration.pdf
PWC https://paperswithcode.com/paper/beyond-temperature-scaling-obtaining-well-1
Repo https://github.com/dirichletcal/dirichletcal.github.io
Framework none
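
Because the method is equivalent to a log transform followed by one linear layer and a softmax, it is compact to express in a neural framework. A minimal PyTorch sketch, with fitting details (e.g. the regularisation used in the paper) omitted:

```python
import torch
import torch.nn as nn

class DirichletCalibrator(nn.Module):
    """Log-transform uncalibrated probabilities, then one linear layer + softmax."""
    def __init__(self, num_classes):
        super().__init__()
        self.linear = nn.Linear(num_classes, num_classes)

    def forward(self, probs, eps=1e-12):
        return torch.softmax(self.linear(torch.log(probs + eps)), dim=-1)

# Fit on a held-out calibration set with NLL, e.g.:
# calib = DirichletCalibrator(k)
# opt = torch.optim.Adam(calib.parameters(), lr=1e-2)
# loss = nn.NLLLoss()(torch.log(calib(val_probs)), val_labels)
```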

MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation

Title MedNorm: A Corpus and Embeddings for Cross-terminology Medical Concept Normalisation
Authors Maksim Belousov, William G. Dixon, Goran Nenadic
Abstract The medical concept normalisation task aims to map textual descriptions to standard terminologies such as SNOMED-CT or MedDRA. Existing publicly available datasets annotated using different terminologies cannot simply be merged and utilised, and therefore become less valuable when developing machine learning-based concept normalisation systems. To address this, we designed a data harmonisation pipeline and engineered a corpus of 27,979 textual descriptions simultaneously mapped to both MedDRA and SNOMED-CT, sourced from five publicly available datasets across biomedical and social media domains. The pipeline can be used in the future to integrate new datasets into the corpus and could also be applied in related data curation tasks. We also describe a method to merge different terminologies into a single concept graph while preserving their relations, and demonstrate that a representation learning approach based on random walks on the graph can efficiently encode both hierarchical and equivalence relations and capture semantic similarities not only between concepts inside a given terminology but also between concepts from different terminologies. We believe that making a corpus and embeddings for cross-terminology medical concept normalisation available to the research community will contribute to a better understanding of the task.
Tasks Representation Learning
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-3204/
PDF https://www.aclweb.org/anthology/W19-3204
PWC https://paperswithcode.com/paper/mednorm-a-corpus-and-embeddings-for-cross
Repo https://github.com/mbelousov/MedNorm-corpus
Framework none
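
A hedged sketch of the random-walk embedding idea on a merged concept graph, in the DeepWalk style: generate walks, then train word2vec on them. The graph construction and any walk biases from the paper are not reproduced; `cross_terminology_edges` is a hypothetical input.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(graph, walks_per_node=10, walk_len=20, seed=0):
    """Uniform random walks over the concept graph, one list of node ids each."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph.nodes:
            walk = [node]
            while len(walk) < walk_len:
                nbrs = list(graph.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

# g = nx.Graph(); g.add_edges_from(cross_terminology_edges)  # hypothetical input
# model = Word2Vec(random_walks(g), vector_size=128, window=5, min_count=0, sg=1)
```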

Frame-Consistent Recurrent Video Deraining With Dual-Level Flow

Title Frame-Consistent Recurrent Video Deraining With Dual-Level Flow
Authors Wenhan Yang, Jiaying Liu, Jiashi Feng
Abstract In this paper, we address the problem of rain removal from videos by proposing a more comprehensive framework that considers additional degradation factors in real scenes neglected in previous works. The proposed framework is built upon a two-stage recurrent network with dual-level flow regularizations to perform the inverse recovery process of the rain synthesis model for video deraining. The rain-free frame is estimated from the single rain frame at the first stage. It is then taken as guidance, along with previously recovered clean frames, to help obtain a more accurate clean frame at the second stage. This two-step architecture is capable of extracting more reliable motion information from the initially estimated rain-free frame at the first stage for better frame alignment and motion modeling at the second stage. Furthermore, to maintain motion consistency between frames, which facilitates a frame-consistent deraining model at the second stage, a dual-level flow-based regularization is proposed at both the coarse flow and fine pixel levels. To better train and evaluate the proposed video deraining network, a novel rain synthesis model is developed to produce more visually authentic paired training and evaluation videos. Extensive experiments on a series of synthetic and real videos verify not only the superiority of the proposed method over the state-of-the-art but also the effectiveness of the network design and each of its components.
Tasks Rain Removal
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Yang_Frame-Consistent_Recurrent_Video_Deraining_With_Dual-Level_Flow_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Yang_Frame-Consistent_Recurrent_Video_Deraining_With_Dual-Level_Flow_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/frame-consistent-recurrent-video-deraining
Repo https://github.com/flyywh/Dual-FLow-Video-Deraining-CVPR-2019
Framework none
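
As a rough illustration of the two-stage shape described in the abstract, here is a minimal PyTorch skeleton: stage one maps a single rainy frame to an initial rain-free estimate, and stage two refines it using previously recovered clean frames. The flow estimation, dual-level regularization, and actual network bodies from the paper are not reproduced; the convolutional blocks are placeholders.

```python
import torch
import torch.nn as nn

class TwoStageDerainer(nn.Module):
    def __init__(self, ch=3, hidden=32):
        super().__init__()
        # Stage 1: single-frame coarse deraining (placeholder block).
        self.stage1 = nn.Sequential(nn.Conv2d(ch, hidden, 3, padding=1),
                                    nn.ReLU(), nn.Conv2d(hidden, ch, 3, padding=1))
        # Stage 2: refine using the coarse estimate plus previous clean frames.
        self.stage2 = nn.Sequential(nn.Conv2d(2 * ch, hidden, 3, padding=1),
                                    nn.ReLU(), nn.Conv2d(hidden, ch, 3, padding=1))

    def forward(self, frame, prev_clean):
        coarse = self.stage1(frame)                 # initial rain-free guess
        refined = self.stage2(torch.cat([coarse, prev_clean], dim=1))
        return coarse, refined
```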

Invertible Convolutional Flow

Title Invertible Convolutional Flow
Authors Mahdi Karami, Dale Schuurmans, Jascha Sohl-Dickstein, Laurent Dinh, Daniel Duckworth
Abstract Normalizing flows can be used to construct high quality generative probabilistic models, but training and sample generation require repeated evaluation of Jacobian determinants and function inverses. To make such computations feasible, current approaches employ highly constrained architectures that produce diagonal, triangular, or low rank Jacobian matrices. As an alternative, we investigate a set of novel normalizing flows based on circular and symmetric convolutions. We show that these transforms admit efficient Jacobian determinant computation and inverse mapping (deconvolution) in O(N log N) time. Additionally, element-wise multiplication, widely used in normalizing flow architectures, can be combined with these transforms to increase modeling flexibility. We further propose an analytic approach to designing nonlinear element-wise bijectors that induce special properties in the intermediate layers, by implicitly introducing specific regularizers in the loss. We show that these transforms allow more effective normalizing flow models to be developed for generative image models.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8801-invertible-convolutional-flow
PDF http://papers.nips.cc/paper/8801-invertible-convolutional-flow.pdf
PWC https://paperswithcode.com/paper/invertible-convolutional-flow
Repo https://github.com/Karami-m/Invertible-Convolutional-Flow
Framework none
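
The tractability claim is concrete for circular convolution: it is diagonalised by the DFT, so the forward map, its inverse, and the Jacobian log-determinant all cost O(N log N). A small numpy sketch of this fact (the symmetric convolutions and nonlinear bijectors from the paper are not shown):

```python
import numpy as np

def circ_conv_forward(x, w):
    X, W = np.fft.fft(x), np.fft.fft(w, n=len(x))
    y = np.real(np.fft.ifft(X * W))        # y = w circularly convolved with x
    logdet = np.sum(np.log(np.abs(W)))     # log|det| of the circulant matrix
    return y, logdet

def circ_conv_inverse(y, w):
    W = np.fft.fft(w, n=len(y))
    return np.real(np.fft.ifft(np.fft.fft(y) / W))  # deconvolution by division

x = np.random.randn(8)
w = np.array([1.0, 0.5, 0.25, 0, 0, 0, 0, 0])       # kernel, zero-padded
y, ld = circ_conv_forward(x, w)
assert np.allclose(circ_conv_inverse(y, w), x)       # exact invertibility
```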

Identification, Interpretability, and Bayesian Word Embeddings

Title Identification, Interpretability, and Bayesian Word Embeddings
Authors Adam Lauretig
Abstract Social scientists have recently turned to analyzing text using tools from natural language processing, like word embeddings, to measure concepts like ideology, bias, and affinity. However, word embeddings are difficult to use in the regression framework familiar to social scientists: embeddings are neither identified nor directly interpretable. I offer two advances on standard embedding models to remedy these problems. First, I develop Bayesian Word Embeddings with Automatic Relevance Determination priors, relaxing the assumption that all embedding dimensions have equal weight. Second, I apply work on identifying latent variable models to anchor embeddings, identifying them and making them interpretable and usable in a regression. I then apply this model and anchoring approach to two cases: the shift in internationalist rhetoric in American presidents' inaugural addresses, and the relationship between bellicosity in American foreign policy decision-makers' deliberations and hostile actions by the United States. I find that inaugural addresses became less internationalist after 1945, which goes against the conventional wisdom, and that an increase in bellicosity is associated with an increase in hostile actions by the United States, showing that elite deliberations are not cheap talk and helping confirm the validity of the model.
Tasks Latent Variable Models, Word Embeddings
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2102/
PDF https://www.aclweb.org/anthology/W19-2102
PWC https://paperswithcode.com/paper/identification-interpretability-and-bayesian-1
Repo https://github.com/adamlauretig/bwe
Framework none
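
As a toy illustration of the Automatic Relevance Determination prior mentioned above: each embedding dimension gets its own precision drawn from a Gamma prior, so dimensions with little evidence can be shrunk toward zero. This simulates the prior only; the paper's full Bayesian embedding model and its anchoring-based identification are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5000, 100
alpha = rng.gamma(shape=1.0, scale=1.0, size=dim)        # per-dimension precision
E = rng.normal(0.0, 1.0 / np.sqrt(alpha), size=(vocab_size, dim))
# Dimensions with large alpha_d get near-zero embeddings: "automatic relevance".
print(E.std(axis=0)[:5], 1.0 / np.sqrt(alpha[:5]))       # empirical vs prior scale
```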

Imitation Learning for Sentence Generation with Dilated Convolutions Using Adversarial Training

Title Imitation Learning for Sentence Generation with Dilated Convolutions Using Adversarial Training
Authors Jian-Wei Peng, Min-Chun Hu, Chuan-Wang Chang
Abstract In this work, we consider the sentence generation problem as an imitation learning problem, which aims to learn a policy that mimics the expert. Recent works have shown that adversarial learning can be applied to imitation learning problems. However, it has been indicated that the reward signal from the discriminator is not robust in reinforcement learning (RL) based generative adversarial networks (GANs), and that estimating the state-action value is usually computationally intractable. To deal with this problem, we propose using two discriminators to provide two different reward signals, yielding a more general imitation learning framework that can be used for sequence generation. Monte Carlo (MC) rollouts are therefore not necessary, keeping our algorithm computationally tractable when generating long sequences. Furthermore, our policy and discriminator networks are integrated by sharing a state encoder network built on dilated convolutions instead of recurrent neural networks (RNNs). In our experiments, we show that the two reward signals control the trade-off between the quality and the diversity of the output sequences.
Tasks Imitation Learning
Published 2019-08-15
URL https://ieeexplore.ieee.org/document/8795047
PDF https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8795047
PWC https://paperswithcode.com/paper/imitation-learning-for-sentence-generation
Repo https://github.com/AndersonPeng/imitation-learning-seq-conv
Framework tf
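
A compact sketch of the two-reward idea: one discriminator scores prefixes for a dense per-step signal, a second scores the whole sequence for a global one, and the policy's reward mixes the two. The discriminators and the mixing weight here are assumptions, not the paper's networks.

```python
def combined_rewards(seq, d_step, d_seq, alpha=0.5):
    """Per-step rewards mixing a prefix discriminator with a sequence one."""
    step_scores = [d_step(seq[:t + 1]) for t in range(len(seq))]  # dense signal
    seq_score = d_seq(seq)                                        # global signal
    return [alpha * s + (1 - alpha) * seq_score for s in step_scores]
```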

Text Categorization by Learning Predominant Sense of Words as Auxiliary Task

Title Text Categorization by Learning Predominant Sense of Words as Auxiliary Task
Authors Kazuya Shimura, Jiyi Li, Fumiyo Fukumoto
Abstract Distributions of the senses of words are often highly skewed and depend strongly on the domain of a document. This paper follows that assumption and presents a method for text categorization that leverages the predominant sense of words in a given domain, i.e., domain-specific senses. The key idea is that features learned from predominant senses can discriminate the domain of a document and thus improve the overall performance of text categorization. We propose a multi-task learning framework based on the transformer neural network model, which trains a model to simultaneously categorize documents and predict a predominant sense for each word. Experimental results on four benchmark datasets show that our method is comparable to the state-of-the-art categorization approach, and our model works especially well for the categorization of multi-label documents.
Tasks Multi-Task Learning, Text Categorization
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1105/
PDF https://www.aclweb.org/anthology/P19-1105
PWC https://paperswithcode.com/paper/text-categorization-by-learning-predominant
Repo https://github.com/ShimShim46/TRF_Multitask
Framework none
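
A minimal PyTorch sketch of the multi-task shape described above: a shared encoder feeds a document-category head and a per-token predominant-sense head, and training sums the two losses. Sizes and the encoder configuration are illustrative; the paper's full pipeline is not reproduced.

```python
import torch
import torch.nn as nn

class MultiTaskCategorizer(nn.Module):
    def __init__(self, vocab, dim, n_cats, n_senses):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)  # dim must be divisible by nhead
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.cat_head = nn.Linear(dim, n_cats)      # document categories
        self.sense_head = nn.Linear(dim, n_senses)  # per-token senses

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))        # (batch, seq, dim)
        return self.cat_head(h.mean(dim=1)), self.sense_head(h)

# Joint loss, with alpha weighting the auxiliary sense task:
# loss = bce(cat_logits, cat_labels) \
#        + alpha * ce(sense_logits.transpose(1, 2), sense_ids)
```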

Heartbeat classification fusing temporal and morphological information of ECGs via ensemble of classifiers

Title Heartbeat classification fusing temporal and morphological information of ECGs via ensemble of classifiers
Authors V. Mondéjar-Guerra, J. Novo, J. Rouco, M. G. Penedo, M. Ortega
Abstract A method for the automatic classification of electrocardiograms (ECG), based on the combination of multiple Support Vector Machines (SVMs), is presented in this work. The method relies on the time intervals between consecutive beats and on beat morphology for ECG characterisation. Different descriptors based on wavelets, local binary patterns (LBP), higher order statistics (HOS), and several amplitude values were employed. Instead of concatenating all these features to feed a single SVM model, we propose training a specific SVM model for each type of feature. To obtain the final prediction, the decisions of the different models are combined with the product, sum, and majority rules. The proposed approaches are tested on the public MIT-BIH arrhythmia database, classifying four classes of normal and abnormal beats. Our ensemble of SVMs offered satisfactory performance, improving on the results of a single SVM model using the same features. It also showed better results in comparison with previous state-of-the-art machine learning approaches.
Tasks Electrocardiography (ECG), Heartbeat Classification
Published 2019-01-01
URL https://www.researchgate.net/publication/327263145_Heartbeat_classification_fusing_temporal_and_morphological_information_of_ECGs_via_ensemble_of_classifiers
PDF https://www.researchgate.net/publication/327263145_Heartbeat_classification_fusing_temporal_and_morphological_information_of_ECGs_via_ensemble_of_classifiers
PWC https://paperswithcode.com/paper/heartbeat-classification-fusing-temporal-and
Repo https://github.com/mondejar/ecg-classification
Framework tf
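
The ensemble recipe maps directly onto scikit-learn: train one SVM per feature family and combine the per-class probabilities with the product, sum, or majority rule. A hedged sketch (feature extraction not shown; `X_by_feature` is a hypothetical dict of per-family feature matrices):

```python
import numpy as np
from sklearn.svm import SVC

def train_ensemble(X_by_feature, y):
    """One probabilistic SVM per feature family (wavelets, LBP, HOS, ...)."""
    return {name: SVC(probability=True).fit(X, y)
            for name, X in X_by_feature.items()}

def predict(models, X_by_feature, rule="product"):
    probs = np.stack([m.predict_proba(X_by_feature[name])
                      for name, m in models.items()])  # (models, samples, classes)
    if rule == "product":
        return probs.prod(axis=0).argmax(axis=1)
    if rule == "sum":
        return probs.sum(axis=0).argmax(axis=1)
    # Majority rule: count each model's hard vote per class, per sample.
    votes = probs.argmax(axis=2)                       # (models, samples)
    counts = np.apply_along_axis(np.bincount, 0, votes,
                                 minlength=probs.shape[2])
    return counts.argmax(axis=0)
```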

Differentiable Ranking and Sorting using Optimal Transport

Title Differentiable Ranking and Sorting using Optimal Transport
Authors Marco Cuturi, Olivier Teboul, Jean-Philippe Vert
Abstract Sorting is used pervasively in machine learning, either to define elementary algorithms, such as $k$-nearest neighbors ($k$-NN) rules, or to define test-time metrics, such as top-$k$ classification accuracy or ranking losses. Sorting is however a poor match for the end-to-end, automatically differentiable pipelines of deep learning. Indeed, sorting procedures output two vectors, neither of which is differentiable: the vector of sorted values is piecewise linear, while the sorting permutation itself (or its inverse, the vector of ranks) has no differentiable properties to speak of, since it is integer-valued. We propose in this paper to replace the usual \texttt{sort} procedure with a differentiable proxy. Our proxy builds upon the fact that sorting can be seen as an optimal assignment problem, one in which the $n$ values to be sorted are matched to an \emph{auxiliary} probability measure supported on any \emph{increasing} family of $n$ target values. From this observation, we propose extended rank and sort operators by considering optimal transport (OT) problems (the natural relaxation for assignments) in which the auxiliary measure can be any weighted measure supported on $m$ increasing values, where possibly $m \ne n$. We recover differentiable operators by regularizing these OT problems with an entropic penalty, and solve them by applying Sinkhorn iterations. Using these smoothed rank and sort operators, we propose differentiable proxies for the classification 0/1 loss as well as for the quantile regression loss.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/8910-differentiable-ranking-and-sorting-using-optimal-transport
PDF http://papers.nips.cc/paper/8910-differentiable-ranking-and-sorting-using-optimal-transport.pdf
PWC https://paperswithcode.com/paper/differentiable-ranking-and-sorting-using
Repo https://github.com/google-research/google-research
Framework tf
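
A toy numpy version of the construction: build a cost matrix between the n values and m increasing targets, entropically regularize the OT problem, run Sinkhorn iterations, and read soft ranks off the transport plan. This illustrates the mechanics only and is not the authors' implementation.

```python
import numpy as np

def soft_ranks(x, m=None, eps=0.1, iters=200):
    n = len(x)
    m = m or n
    targets = np.linspace(0.0, 1.0, m)                 # increasing target values
    xs = (x - x.min()) / (x.max() - x.min() + 1e-12)   # squash into [0, 1]
    C = (xs[:, None] - targets[None, :]) ** 2          # cost matrix
    K = np.exp(-C / eps)                               # Gibbs kernel
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(iters):                             # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                    # transport plan
    # Soft rank = transport-weighted average target index, per input value.
    return (P / P.sum(axis=1, keepdims=True)) @ np.arange(m)

print(soft_ranks(np.array([0.3, -1.0, 2.5, 0.7])))     # approx rank order 1,0,3,2
```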

Symbolic Inductive Bias for Visually Grounded Learning of Spoken Language

Title Symbolic Inductive Bias for Visually Grounded Learning of Spoken Language
Authors Grzegorz Chrupała
Abstract A widespread approach to processing spoken language is to first automatically transcribe it into text. An alternative is to use an end-to-end approach: recent works have proposed to learn semantic embeddings of spoken language from images with spoken captions, without an intermediate transcription step. We propose to use multitask learning to exploit existing transcribed speech within the end-to-end setting. We describe a three-task architecture which combines the objectives of matching spoken captions with corresponding images, speech with text, and text with images. We show that the addition of the speech/text task leads to substantial performance improvements on image retrieval when compared to training the speech/image task in isolation. We conjecture that this is due to the strong inductive bias that transcribed speech provides to the model, and offer supporting evidence for this.
Tasks Image Retrieval
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1647/
PDF https://www.aclweb.org/anthology/P19-1647
PWC https://paperswithcode.com/paper/symbolic-inductive-bias-for-visually-grounded-1
Repo https://github.com/gchrupala/symbolic-bias
Framework none
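
A compact sketch of the three-task objective: shared encoders with pairwise matching losses for speech/image, speech/text, and text/image. Here `match` stands in for whatever contrastive matching loss is used; the encoders and data handling are assumptions, not the paper's code.

```python
def three_task_loss(speech, text, image, enc_s, enc_t, enc_i, match):
    """Sum of the three pairwise matching objectives over shared encoders."""
    s, t, i = enc_s(speech), enc_t(text), enc_i(image)
    return match(s, i) + match(s, t) + match(t, i)
```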

LT Expertfinder: An Evaluation Framework for Expert Finding Methods

Title LT Expertfinder: An Evaluation Framework for Expert Finding Methods
Authors Tim Fischer, Steffen Remus, Chris Biemann
Abstract Expert finding is the task of ranking persons for a predefined topic or search query. Finding experts for a specified area is an important task and has attracted much attention in the information retrieval community. Most approaches for this task are evaluated in a supervised fashion, which depends on predefined topics of interest as well as gold standard expert rankings. Famous representatives of such datasets are enriched versions of DBLP provided by the ArnetMiner project, or the W3C Corpus of TREC. However, manually ranking experts is highly subjective, and detailed rankings are hardly distinguishable. Evaluation on these datasets thus does not reliably indicate how well or poorly a system performs. Particularly for dynamic systems, where topics are not predefined but formulated as a search query, we believe a more informative approach is to perform user studies that directly compare different methods in the same view. In order to accomplish this in a user-friendly way, we present the LT Expert Finder web application, which is equipped with various query-based expert finding methods that can be easily extended, a detailed expert profile view, detailed evidence in the form of relevant documents and statistics, and an evaluation component that allows qualitative comparison between different rankings.
Tasks Information Retrieval
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-4017/
PDF https://www.aclweb.org/anthology/N19-4017
PWC https://paperswithcode.com/paper/lt-expertfinder-an-evaluation-framework-for
Repo https://github.com/uhh-lt/lt-expertfinder
Framework none

Privacy-Preserving Q-Learning with Functional Noise in Continuous Spaces

Title Privacy-Preserving Q-Learning with Functional Noise in Continuous Spaces
Authors Baoxiang Wang, Nidhi Hegde
Abstract We consider differentially private algorithms for reinforcement learning in continuous spaces, such that neighboring reward functions are indistinguishable. This protects the reward information from being exploited by methods such as inverse reinforcement learning. Existing studies that guarantee differential privacy do not extend to infinite state spaces, as the noise level required to ensure privacy scales accordingly to infinity. Our aim is to protect the value function approximator, regardless of the number of states at which the function is queried. This is achieved by adding functional noise to the value function iteratively during training. We show rigorous privacy guarantees by a series of analyses on the kernel of the noise space, the probabilistic bound on such noise samples, and the composition over the iterations. We gain insight into the utility analysis by proving the algorithm's approximate optimality when the state space is discrete. Experiments corroborate our theoretical findings and show improvement over existing approaches.
Tasks Q-Learning
Published 2019-12-01
URL http://papers.nips.cc/paper/9310-privacy-preserving-q-learning-with-functional-noise-in-continuous-spaces
PDF http://papers.nips.cc/paper/9310-privacy-preserving-q-learning-with-functional-noise-in-continuous-spaces.pdf
PWC https://paperswithcode.com/paper/privacy-preserving-q-learning-with-functional
Repo https://github.com/wangbx66/differentially-private-q-learning
Framework pytorch
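
As a toy illustration of functional noise, the sketch below lazily samples a single Gaussian-process draw at the (one-dimensional) states actually queried, so the perturbation is consistent across queries rather than drawn independently per query. The paper's incremental sampling scheme and privacy accounting are not reproduced.

```python
import numpy as np

class GPNoise:
    """Lazily sample one draw g ~ GP(0, k) at the scalar states queried."""
    def __init__(self, sigma=0.1, lengthscale=1.0):
        self.sigma, self.ls = sigma, lengthscale
        self.xs, self.gs = [], []

    def _k(self, a, b):
        return self.sigma**2 * np.exp(-((a - b) ** 2) / (2 * self.ls**2))

    def __call__(self, x):
        if not self.xs:
            mean, var = 0.0, self.sigma**2
        else:
            # Condition the new point on all previously sampled values.
            K = np.array([[self._k(a, b) for b in self.xs] for a in self.xs])
            k = np.array([self._k(a, x) for a in self.xs])
            w = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k)
            mean, var = w @ np.array(self.gs), self._k(x, x) - w @ k
        g = np.random.normal(mean, np.sqrt(max(var, 0.0)))
        self.xs.append(x)
        self.gs.append(g)
        return g

# noisy_value = lambda s: value_net(s) + gp_noise(s)   # hypothetical usage
```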

Space and Time Efficient Kernel Density Estimation in High Dimensions

Title Space and Time Efficient Kernel Density Estimation in High Dimensions
Authors Arturs Backurs, Piotr Indyk, Tal Wagner
Abstract Recently, Charikar and Siminelakis (2017) presented a framework for kernel density estimation with provably sublinear query time, for kernels that possess a certain hashing-based property. However, their data structure requires super-linear storage space as well as super-linear preprocessing time. These limitations inhibit the practical applicability of their approach on large datasets. In this work, we present an improvement to their framework that retains the same query time while requiring only linear space and linear preprocessing time. We instantiate our framework with the Laplacian and Exponential kernels, two popular kernels which possess the aforementioned property. Our experiments on various datasets verify that our approach attains accuracy and query time similar to Charikar and Siminelakis (2017), with significantly improved space and preprocessing time.
Tasks Density Estimation
Published 2019-12-01
URL http://papers.nips.cc/paper/9709-space-and-time-efficient-kernel-density-estimation-in-high-dimensions
PDF http://papers.nips.cc/paper/9709-space-and-time-efficient-kernel-density-estimation-in-high-dimensions.pdf
PWC https://paperswithcode.com/paper/space-and-time-efficient-kernel-density
Repo https://github.com/talwagner/efficient_kde
Framework none
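
For orientation, a hedged sketch contrasting exact kernel density evaluation with the simplest unbiased estimator, uniform random sampling. The paper's hashing-based estimator achieves sublinear query time with linear space, which this toy version does not attempt.

```python
import numpy as np

def kde_exact(data, q, bw=1.0):
    # Laplacian kernel: k(x, y) = exp(-||x - y||_1 / bw), averaged over all points.
    return np.mean(np.exp(-np.abs(data - q).sum(axis=1) / bw))

def kde_sampled(data, q, bw=1.0, m=256, seed=0):
    # Unbiased estimate from a uniform random subsample of m points.
    rng = np.random.default_rng(seed)
    sample = data[rng.choice(len(data), size=m)]
    return np.mean(np.exp(-np.abs(sample - q).sum(axis=1) / bw))

data = np.random.randn(100_000, 8)
q = np.zeros(8)
print(kde_exact(data, q), kde_sampled(data, q))  # sampled ≈ exact, far less work
```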