Paper Group ANR 253
Multinomial Logit Bandit with Linear Utility Functions. Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems. Model-based Approximate Query Processing. Saliency for Fine-grained Object Recognition in Domains with Scarce Training Data. Inferring Routing Preferences of Bicyclists from Sparse Sets of Trajectories. airpred: A Flexible R Package …
Multinomial Logit Bandit with Linear Utility Functions
Title | Multinomial Logit Bandit with Linear Utility Functions |
Authors | Mingdong Ou, Nan Li, Shenghuo Zhu, Rong Jin |
Abstract | Multinomial logit bandit is a sequential subset selection problem which arises in many applications. In each round, the player selects a $K$-cardinality subset from $N$ candidate items, and receives a reward which is governed by a {\it multinomial logit} (MNL) choice model that accounts for both item utilities and the substitution property among items. The player’s objective is to dynamically learn the parameters of the MNL model and maximize cumulative reward over a finite horizon $T$. This problem faces the exploration-exploitation dilemma, and its combinatorial nature makes it non-trivial. In recent years, several algorithms have been developed that exploit specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret no better than $\tilde{O}\big(\sqrt{NT}\big)$, which is undesirable when the candidate set size $N$ is large. In this paper, we consider the {\it linear utility} MNL choice model, whose item utilities are represented as linear functions of $d$-dimensional item features, and propose an algorithm, titled {\bf LUMB}, to exploit the underlying structure. It is proven that the proposed algorithm achieves $\tilde{O}\big(dK\sqrt{T}\big)$ regret, which is free of the candidate set size. Experiments show the superiority of the proposed algorithm. |
Tasks | |
Published | 2018-05-08 |
URL | http://arxiv.org/abs/1805.02971v2 |
http://arxiv.org/pdf/1805.02971v2.pdf | |
PWC | https://paperswithcode.com/paper/multinomial-logit-bandit-with-linear-utility |
Repo | |
Framework | |
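To make the choice model concrete, here is a minimal sketch of MNL choice probabilities under linear utilities. The parameterization (utility $v_i = \exp(x_i^\top \theta)$ with a unit-weight no-purchase option) is a common convention and an assumption here, not necessarily the paper's exact formulation.

```python
import numpy as np

def mnl_choice_probs(X, theta):
    """Choice probabilities under an MNL model with linear utilities.

    X     : (K, d) feature matrix of the K offered items.
    theta : (d,) utility parameter, so item utility v_i = exp(x_i @ theta).
    Returns probabilities over the K items plus the no-purchase option.
    """
    v = np.exp(X @ theta)               # item utilities
    denom = 1.0 + v.sum()               # "+1" is the no-purchase alternative
    return np.append(v, 1.0) / denom    # last entry = P(no purchase)

# Example: 3 offered items with 2-dimensional features
X = np.array([[0.2, 0.5], [1.0, -0.3], [0.4, 0.1]])
theta = np.array([0.8, 0.4])
print(mnl_choice_probs(X, theta))
```

A bandit algorithm such as LUMB maintains an estimate of $\theta$ and picks, each round, the $K$-subset that maximizes expected reward under these probabilities.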
Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems
Title | Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems |
Authors | Arash Mehrjou, Bernhard Schölkopf |
Abstract | Filtering is a general name for inferring the states of a dynamical system given observations. The most common filtering approach is Gaussian Filtering (GF), where the distribution of the inferred states is a Gaussian whose mean is an affine function of the observations. There are two restrictions in this model: Gaussianity and affinity. We propose a model that relaxes both of these assumptions, based on recent advances in implicit generative models. Empirical results show that the proposed method gives a significant advantage over GF and over nonlinear methods based on fixed nonlinear kernels. |
Tasks | |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.05933v1 |
http://arxiv.org/pdf/1811.05933v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-nonlinear-non-gaussian-filtering-for |
Repo | |
Framework | |
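For context, the GF baseline that the paper relaxes corresponds to the classical Kalman-style predict/update step below, where the posterior mean is affine in the observation. This is a standard textbook sketch, not the paper's method.

```python
import numpy as np

def kalman_step(mu, P, y, A, C, Q, R):
    """One predict/update step of a linear-Gaussian filter (the GF baseline).

    State model:  x_t = A x_{t-1} + w,  w ~ N(0, Q)
    Observation:  y_t = C x_t + v,      v ~ N(0, R)
    The posterior mean is an affine function of the observation y --
    exactly the Gaussianity/affinity restriction the paper relaxes.
    """
    # Predict
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update (affine in y)
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    mu_new = mu_pred + K @ (y - C @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ C) @ P_pred
    return mu_new, P_new
```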
Model-based Approximate Query Processing
Title | Model-based Approximate Query Processing |
Authors | Moritz Kulessa, Alejandro Molina, Carsten Binnig, Benjamin Hilprecht, Kristian Kersting |
Abstract | Interactive visualizations are arguably the most important tool to explore, understand and convey facts about data. In recent years, the database community has been working on different techniques for Approximate Query Processing (AQP) that aim to deliver an approximate query result within a fixed time bound in order to better support interactive visualizations. However, classical AQP approaches suffer from various problems that limit their applicability for the ad-hoc exploration of a new data set: (1) Classical AQP approaches that perform online sampling can support ad-hoc exploration queries but yield low-quality results if executed over rare subpopulations. (2) Classical AQP approaches that rely on offline sampling can use some form of biased sampling to mitigate these problems but require a priori knowledge of the workload, which is often unrealistic when users want to explore a new database. In this paper, we present a new approach to AQP, called Model-based AQP, that leverages generative models learned over the complete database to answer SQL queries at interactive speeds. Unlike classical AQP approaches, generative models allow us to compute responses to ad-hoc queries while also delivering high-quality estimates over rare subpopulations. In our experiments with real and synthetic data sets, we show that Model-based AQP can, in many scenarios, return more accurate results in a shorter runtime. Furthermore, we believe that the techniques for using generative models presented in this paper can be used not only for AQP in databases but also for other database problems, including query optimization and data cleaning. |
Tasks | |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06224v1 |
http://arxiv.org/pdf/1811.06224v1.pdf | |
PWC | https://paperswithcode.com/paper/model-based-approximate-query-processing |
Repo | |
Framework | |
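The core idea can be illustrated with a toy stand-in: fit a generative model over the table offline, then answer an aggregate query from model samples alone. The Gaussian mixture and the query below are illustrative assumptions; the paper uses richer generative models.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in generative model: a Gaussian mixture fitted offline over the
# full table (this only illustrates the idea of model-based AQP).
rng = np.random.default_rng(0)
table = rng.normal(loc=[50, 30], scale=[10, 5], size=(100_000, 2))  # (age, amount)
model = GaussianMixture(n_components=4, random_state=0).fit(table)

# Answer "SELECT AVG(amount) FROM t WHERE age > 60" from model samples only.
samples, _ = model.sample(50_000)
mask = samples[:, 0] > 60
estimate = samples[mask, 1].mean()
truth = table[table[:, 0] > 60, 1].mean()
print(f"model-based estimate: {estimate:.2f}  exact: {truth:.2f}")
```

Because the model covers the full joint distribution, the same machinery answers ad-hoc predicates over rare subpopulations without a workload-specific sample.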
Saliency for Fine-grained Object Recognition in Domains with Scarce Training Data
Title | Saliency for Fine-grained Object Recognition in Domains with Scarce Training Data |
Authors | Carola Figueroa Flores, Abel Gonzalez-García, Joost van de Weijer, Bogdan Raducanu |
Abstract | This paper investigates the role of saliency in improving the classification accuracy of a Convolutional Neural Network (CNN) when scarce training data is available. Our approach consists of adding a saliency branch to an existing CNN architecture, which is used to modulate the standard bottom-up visual features from the original image input, acting as an attentional mechanism that guides the feature extraction process. The main aim of the proposed approach is to enable the effective training of a fine-grained recognition model with limited training samples and to improve performance on the task, thereby alleviating the need to annotate large datasets. The vast majority of saliency methods are evaluated on their ability to generate saliency maps, and not on their functionality in a complete vision pipeline. Our proposed pipeline allows us to evaluate saliency methods for the high-level task of object recognition. We perform extensive experiments on various fine-grained datasets (Flowers, Birds, Cars, and Dogs) under different conditions and show that saliency can considerably improve the network’s performance, especially in the case of scarce training data. Furthermore, our experiments show that saliency methods that produce better saliency maps (as measured by traditional saliency benchmarks) also yield larger performance gains when applied in an object recognition pipeline. |
Tasks | Object Recognition |
Published | 2018-08-01 |
URL | https://arxiv.org/abs/1808.00262v3 |
https://arxiv.org/pdf/1808.00262v3.pdf | |
PWC | https://paperswithcode.com/paper/saliency-for-fine-grained-object-recognition |
Repo | |
Framework | |
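A minimal sketch of the feature-modulation idea: a saliency map gates the bottom-up feature maps of an existing CNN. The fusion rule (sigmoid gating plus a residual connection) and the function names are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def modulate_features(features, saliency):
    """Modulate bottom-up CNN features with a saliency map.

    features : (B, C, H, W) activations from some convolutional layer.
    saliency : (B, 1, h, w) map produced by a separate saliency branch.
    """
    # Resize the saliency map to the spatial size of the feature map
    s = F.interpolate(saliency, size=features.shape[-2:], mode="bilinear",
                      align_corners=False)
    # Element-wise gating: salient locations keep their features, non-salient
    # ones are attenuated; the residual term keeps gradients flowing even
    # where saliency is near zero.
    return features * torch.sigmoid(s) + features
```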
Inferring Routing Preferences of Bicyclists from Sparse Sets of Trajectories
Title | Inferring Routing Preferences of Bicyclists from Sparse Sets of Trajectories |
Authors | J. Oehrlein, A. Förster, D. Schunck, Y. Dehbi, R. Roscher, J.-H. Haunert |
Abstract | Understanding the criteria that bicyclists apply when they choose their routes is crucial for planning new bicycle paths or recommending routes to bicyclists. This is becoming more and more important as city councils are becoming increasingly aware of limitations of the transport infrastructure and problems related to automobile traffic. Since different groups of cyclists have different preferences, however, searching for a single set of criteria is prone to failure. Therefore, in this paper, we present a new approach to classify trajectories recorded and shared by bicyclists into different groups and, for each group, to identify favored and unfavored road types. Based on these results we show how to assign weights to the edges of a graph representing the road network such that minimum-weight paths in the graph, which can be computed with standard shortest-path algorithms, correspond to adequate routes. Our method combines known algorithms for machine learning and the analysis of trajectories in an innovative way and, thereby, constitutes a new comprehensive solution for the problem of deriving routing preferences from initially unclassified trajectories. An important property of our method is that it yields reasonable results even if the given set of trajectories is sparse in the sense that it does not cover all segments of the cycle network. |
Tasks | |
Published | 2018-06-24 |
URL | http://arxiv.org/abs/1806.09158v1 |
http://arxiv.org/pdf/1806.09158v1.pdf | |
PWC | https://paperswithcode.com/paper/inferring-routing-preferences-of-bicyclists |
Repo | |
Framework | |
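Once road-type weights have been inferred for a group, route recommendation reduces to standard shortest-path search over a weighted graph, roughly as in this sketch (the weights and road types are made-up values for illustration).

```python
import networkx as nx

# Hypothetical per-group penalties for road types, as might be derived
# from the trajectory classification step.
road_type_weight = {"cycleway": 1.0, "residential": 1.5, "primary": 4.0}

G = nx.Graph()
edges = [  # (u, v, length_m, road_type)
    ("a", "b", 200, "primary"),
    ("a", "c", 250, "cycleway"),
    ("c", "b", 120, "residential"),
]
for u, v, length, rtype in edges:
    # Edge weight = physical length scaled by the group's road-type penalty,
    # so a standard shortest-path search yields routes the group prefers.
    G.add_edge(u, v, weight=length * road_type_weight[rtype])

print(nx.shortest_path(G, "a", "b", weight="weight"))  # ['a', 'c', 'b']
```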
airpred: A Flexible R Package Implementing Methods for Predicting Air Pollution
Title | airpred: A Flexible R Package Implementing Methods for Predicting Air Pollution |
Authors | M. Benjamin Sabath, Qian Di, Danielle Braun, Joel Schwartz, Francesca Dominici, Christine Choirat |
Abstract | Fine particulate matter (PM$_{2.5}$) is one of the criteria air pollutants regulated by the Environmental Protection Agency in the United States. There is strong evidence that ambient exposure to PM$_{2.5}$ increases the risk of mortality and hospitalization. Large-scale epidemiological studies on the health effects of PM$_{2.5}$ provide the necessary evidence base for lowering safety standards and informing regulatory policy. However, ambient monitors of PM$_{2.5}$ (as well as monitors for other pollutants) are sparsely located across the U.S., and therefore studies based only on the levels of PM$_{2.5}$ measured at the monitors would inevitably exclude large portions of the population. One approach to resolving this issue has been to develop models that predict local PM$_{2.5}$, NO$_2$, and ozone based on satellite, meteorological, and land use data. This typically involves a prediction model that relies on large amounts of input data and is highly computationally intensive when predicting levels of air pollution in unmonitored areas. We have developed a flexible R package that allows environmental health researchers to design and train spatio-temporal models capable of predicting multiple pollutants, including PM$_{2.5}$. We utilize H2O, an open source big data platform, to achieve both performance and scalability when used in conjunction with cloud or cluster computing systems. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11534v2 |
http://arxiv.org/pdf/1805.11534v2.pdf | |
PWC | https://paperswithcode.com/paper/airpred-a-flexible-r-package-implementing |
Repo | |
Framework | |
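The general workflow the package supports - learn a mapping from satellite/meteorological/land-use covariates at monitor sites to measured PM$_{2.5}$, then predict at unmonitored locations - can be sketched as follows. This toy Python sketch only illustrates the idea; airpred itself is an R package built on H2O, and none of the names below come from its API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X_monitors = rng.normal(size=(500, 6))   # covariates at monitor sites (toy data)
pm25 = 10 + 2 * X_monitors[:, 0] + rng.normal(scale=1.0, size=500)  # measured PM2.5

# Train a spatio-temporal prediction model on monitored locations
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_monitors, pm25)

# Predict PM2.5 at unmonitored grid cells from their covariates
X_grid = rng.normal(size=(4, 6))
print(model.predict(X_grid))
```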
Small-Variance Asymptotics for Nonparametric Bayesian Overlapping Stochastic Blockmodels
Title | Small-Variance Asymptotics for Nonparametric Bayesian Overlapping Stochastic Blockmodels |
Authors | Gundeep Arora, Anupreet Porwal, Kanupriya Agarwal, Avani Samdariya, Piyush Rai |
Abstract | The latent feature relational model (LFRM) is a generative model for graph-structured data that learns a binary vector representation for each node in the graph. The binary vector denotes the node’s membership in one or more communities. At its core, the LFRM (Miller et al., 2009) is an overlapping stochastic blockmodel, which defines the link probability between any pair of nodes as a bilinear function of their community membership vectors. Moreover, using a nonparametric Bayesian prior (Indian Buffet Process) enables learning the number of communities automatically from the data. However, despite its appealing properties, inference in the LFRM remains a challenge and is typically done via MCMC methods, which can be slow and may take a long time to converge. In this work, we develop a small-variance asymptotics based framework for the nonparametric Bayesian LFRM. This leads to an objective function that retains the nonparametric Bayesian flavor of the LFRM, while enabling us to design deterministic inference algorithms for this model that are easy to implement (using generic or specialized optimization routines) and fast in practice. Our results on several benchmark datasets demonstrate that our algorithm is competitive with methods such as MCMC, while being much faster. |
Tasks | |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03570v1 |
http://arxiv.org/pdf/1807.03570v1.pdf | |
PWC | https://paperswithcode.com/paper/small-variance-asymptotics-for-nonparametric |
Repo | |
Framework | |
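The bilinear link probability at the heart of the LFRM can be written down directly; the sketch below assumes a logistic link, with all values made up for illustration.

```python
import numpy as np

def link_probability(z_i, z_j, W):
    """Bilinear link probability of an overlapping stochastic blockmodel.

    z_i, z_j : binary community-membership vectors of the two nodes.
    W        : (K, K) matrix of within/between-community affinities.
    P(link) = sigmoid(z_i^T W z_j), as in the LFRM.
    """
    score = z_i @ W @ z_j
    return 1.0 / (1.0 + np.exp(-score))

z_i = np.array([1, 0, 1])          # node i belongs to communities 0 and 2
z_j = np.array([0, 1, 1])          # node j belongs to communities 1 and 2
W = np.array([[2.0, -1.0, 0.5],
              [-1.0, 2.0, 0.3],
              [0.5, 0.3, 1.5]])
print(link_probability(z_i, z_j, W))
```

Small-variance asymptotics turns posterior inference over the binary memberships into a deterministic optimization of an objective derived from this likelihood.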
Machine Decisions and Human Consequences
Title | Machine Decisions and Human Consequences |
Authors | Teresa Scantamburlo, Andrew Charlesworth, Nello Cristianini |
Abstract | As we increasingly delegate decision-making to algorithms, whether directly or indirectly, important questions emerge in circumstances where those decisions have direct consequences for individual rights and personal opportunities, as well as for the collective good. A key problem for policymakers is that the social implications of these new methods can only be grasped if there is an adequate comprehension of their general technical underpinnings. The discussion here focuses primarily on the case of enforcement decisions in the criminal justice system, but draws on similar situations emerging from other algorithms utilised in controlling access to opportunities, to explain how machine learning works and, as a result, how decisions are made by modern intelligent algorithms or ‘classifiers’. It examines the key aspects of the performance of classifiers, including how classifiers learn, the fact that they operate on the basis of correlation rather than causation, and that the term ‘bias’ in machine learning has a different meaning from its common usage. An example of a real-world ‘classifier’, the Harm Assessment Risk Tool (HART), is examined through identification of its technical features: the classification method, the training data and the test data, the features and the labels, validation and performance measures. Four normative benchmarks are then considered by reference to HART: (a) prediction accuracy; (b) fairness and equality before the law; (c) transparency and accountability; and (d) informational privacy and freedom of expression, in order to demonstrate how its technical features have important normative dimensions that bear directly on the extent to which the system can be regarded as a viable and legitimate support for, or even alternative to, existing human decision-makers. |
Tasks | Decision Making |
Published | 2018-11-16 |
URL | http://arxiv.org/abs/1811.06747v2 |
http://arxiv.org/pdf/1811.06747v2.pdf | |
PWC | https://paperswithcode.com/paper/machine-decisions-and-human-consequences |
Repo | |
Framework | |
Code-Mixed Sentiment Analysis Using Machine Learning and Neural Network Approaches
Title | Code-Mixed Sentiment Analysis Using Machine Learning and Neural Network Approaches |
Authors | Pruthwik Mishra, Prathyusha Danda, Pranav Dhakras |
Abstract | The Sentiment Analysis for Indian Languages (SAIL)-Code Mixed tools contest aimed at identifying the sentence-level sentiment polarity of code-mixed datasets of Indian language pairs (Hi-En, Ben-Hi-En). The Hi-En dataset is henceforth referred to as HI-EN and the Ben-Hi-En dataset as BN-EN. For this contest, we submitted four models for sentiment analysis of the code-mixed HI-EN and BN-EN datasets. The first model was an ensemble voting classifier consisting of three classifiers - linear SVM, logistic regression and random forests - while the second was a linear SVM. Both models used TF-IDF feature vectors of character n-grams, where n ranged from 2 to 6. We used the scikit-learn (sklearn) machine learning library to implement both approaches. Run1 was obtained from the voting classifier and Run2 used the linear SVM model to produce the results. Of the four submitted outputs, Run2 outperformed Run1 on both datasets. We finished first in the contest for both HI-EN, with an F-score of 0.569, and BN-EN, with an F-score of 0.526. |
Tasks | Sentiment Analysis |
Published | 2018-08-09 |
URL | http://arxiv.org/abs/1808.03299v1 |
http://arxiv.org/pdf/1808.03299v1.pdf | |
PWC | https://paperswithcode.com/paper/code-mixed-sentiment-analysis-using-machine |
Repo | |
Framework | |
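The described pipeline maps closely onto scikit-learn. A sketch of the Run1-style voting ensemble over character n-gram TF-IDF features follows; the hyperparameters and the toy corpus are assumptions, not the authors' exact settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.pipeline import make_pipeline

# Character n-gram TF-IDF (n = 2..6), as described in the abstract.
tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 6))

# Run1-style ensemble of the three classifiers named in the abstract.
voter = VotingClassifier(
    estimators=[
        ("svm", LinearSVC()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="hard",  # LinearSVC exposes no predict_proba, so soft voting won't work
)
model = make_pipeline(tfidf, voter)

# Tiny illustrative code-mixed corpus (labels: 1 = positive, 0 = negative)
X = ["yeh movie bahut achhi thi", "worst khana ever",
     "kya mast song hai", "bilkul bakwas service"]
y = [1, 0, 1, 0]
model.fit(X, y)
print(model.predict(["achhi service thi"]))
```

Replacing `voter` with a bare `LinearSVC()` gives the Run2-style model.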
Predicting the Usefulness of Amazon Reviews Using Off-The-Shelf Argumentation Mining
Title | Predicting the Usefulness of Amazon Reviews Using Off-The-Shelf Argumentation Mining |
Authors | Marco Passon, Marco Lippi, Giuseppe Serra, Carlo Tasso |
Abstract | Internet users generate content at unprecedented rates. Building intelligent systems capable of discriminating useful content within this ocean of information is thus becoming an urgent need. In this paper, we aim to predict the usefulness of Amazon reviews, and to do so we exploit features coming from an off-the-shelf argumentation mining system. We argue that the usefulness of a review is, in fact, strictly related to its argumentative content, whereas using an already trained system avoids the costly need of relabeling a novel dataset. Results obtained on a large publicly available corpus support this hypothesis. |
Tasks | |
Published | 2018-09-21 |
URL | http://arxiv.org/abs/1809.08145v1 |
http://arxiv.org/pdf/1809.08145v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-the-usefulness-of-amazon-reviews |
Repo | |
Framework | |
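A minimal sketch of the overall recipe: represent each review by features derived from an argumentation mining system and train a standard classifier on usefulness labels. The feature set and all values below are hypothetical and purely illustrative, not the paper's exact features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-review features from an off-the-shelf argumentation
# mining system: counts/scores of detected claims and evidence.
# columns: [n_claims, n_evidence, max_claim_score, review_length]
X = np.array([
    [3, 2, 0.91, 120],
    [0, 0, 0.05, 40],
    [1, 1, 0.62, 80],
    [0, 1, 0.30, 25],
])
y = np.array([1, 0, 1, 0])  # 1 = review voted useful

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2, 1, 0.7, 90]]))
```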
Note: Variational Encoding of Protein Dynamics Benefits from Maximizing Latent Autocorrelation
Title | Note: Variational Encoding of Protein Dynamics Benefits from Maximizing Latent Autocorrelation |
Authors | Hannah K. Wayment-Steele, Vijay S. Pande |
Abstract | As deep Variational Auto-Encoder (VAE) frameworks become more widely used for modeling biomolecular simulation data, we emphasize the capability of the VAE architecture to concurrently maximize the timescale of the latent space while inferring a reduced coordinate, which assists in finding slow processes according to the variational approach to conformational dynamics. We additionally provide evidence that the VDE framework (Hernández et al., 2017), which uses this autocorrelation loss along with a time-lagged reconstruction loss, obtains a variationally optimized latent coordinate in comparison with related loss functions. We thus recommend leveraging the autocorrelation of the latent space while training neural network models of biomolecular simulation data to better represent slow processes. |
Tasks | |
Published | 2018-03-17 |
URL | http://arxiv.org/abs/1803.06449v1 |
http://arxiv.org/pdf/1803.06449v1.pdf | |
PWC | https://paperswithcode.com/paper/note-variational-encoding-of-protein-dynamics |
Repo | |
Framework | |
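The autocorrelation loss being advocated can be sketched as the negative Pearson correlation between latent codes at times $t$ and $t+\tau$; the VDE combines this with a time-lagged reconstruction loss. A minimal sketch, assuming 1-D latent codes:

```python
import torch

def autocorrelation_loss(z_t, z_tau):
    """Negative Pearson correlation between latent codes at time t and t+tau.

    Minimizing this loss (maximizing latent autocorrelation) pushes the
    learned coordinate toward slow processes.
    """
    z_t = z_t - z_t.mean()
    z_tau = z_tau - z_tau.mean()
    corr = (z_t * z_tau).sum() / (z_t.norm() * z_tau.norm() + 1e-8)
    return -corr

# Usage: z_t, z_tau are latent codes of frames separated by lag tau
z_t = torch.randn(256)
z_tau = 0.9 * z_t + 0.1 * torch.randn(256)   # strongly autocorrelated toy data
print(autocorrelation_loss(z_t, z_tau))       # close to -1
```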
The Structure Transfer Machine Theory and Applications
Title | The Structure Transfer Machine Theory and Applications |
Authors | Baochang Zhang, Lian Zhuo, Ze Wang, Jungong Han, Xiantong Zhen |
Abstract | Representation learning is a fundamental but challenging problem, especially when the distribution of data is unknown. We propose a new representation learning method, termed the Structure Transfer Machine (STM), which enables the feature learning process to converge at the representation expectation in a probabilistic way. We theoretically show that such an expected value of the representation (mean) is achievable if the manifold structure can be transferred from the data space to the feature space. The resulting structure regularization term, named the manifold loss, is incorporated into the loss function of the typical deep learning pipeline. The STM architecture is constructed to enforce that the learned deep representation satisfies the intrinsic manifold structure of the data, which results in robust features suited to various application scenarios, such as digit recognition, image classification and object tracking. Compared to state-of-the-art CNN architectures, we achieve better results on several commonly used benchmarks (source code: https://github.com/stmstmstm/stm ). |
Tasks | Image Classification, Object Tracking, Representation Learning |
Published | 2018-04-01 |
URL | https://arxiv.org/abs/1804.00243v2 |
https://arxiv.org/pdf/1804.00243v2.pdf | |
PWC | https://paperswithcode.com/paper/the-structure-transfer-machine-theory-and |
Repo | |
Framework | |
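One way to read the manifold loss is as a regularizer that penalizes feature-space distances between points that are neighbors in the data space, so the data manifold structure carries over. The sketch below is a generic neighborhood-preserving regularizer under that reading, not the paper's exact loss.

```python
import torch

def manifold_loss(x, h, k=5):
    """Penalize feature-space distances between data-space neighbors.

    x : (B, Dx) flattened input batch;  h : (B, Dh) learned features.
    """
    with torch.no_grad():
        d_x = torch.cdist(x, x)                       # data-space distances
        # adjacency of the k nearest neighbors (index 0 is the point itself)
        knn = d_x.topk(k + 1, largest=False).indices[:, 1:]
        A = torch.zeros_like(d_x)
        A.scatter_(1, knn, 1.0)
    d_h = torch.cdist(h, h)                           # feature-space distances
    return (A * d_h.pow(2)).sum() / A.sum()

# Added to the usual classification loss with a small weight, e.g.
# loss = cross_entropy + 0.1 * manifold_loss(x_flat, features)
```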
Boosting algorithms for uplift modeling
Title | Boosting algorithms for uplift modeling |
Authors | Michał Sołtys, Szymon Jaroszewicz |
Abstract | Uplift modeling is an area of machine learning which aims at predicting the causal effect of some action on a given individual. The action may be a medical procedure, marketing campaign, or any other circumstance controlled by the experimenter. Building an uplift model requires two training sets: the treatment group, where individuals have been subject to the action, and the control group, where no action has been performed. An uplift model then allows one to assess the gain resulting from taking the action on a given individual, such as the increase in the probability of patient recovery or of a product being purchased. This paper describes an adaptation of the well-known boosting techniques to the uplift modeling case. We formulate three desirable properties which an uplift boosting algorithm should have. Since all three properties cannot be satisfied simultaneously, we propose three uplift boosting algorithms, each satisfying two of them. Experiments demonstrate the usefulness of the proposed methods, which often dramatically improve the performance of the base models and are thus new and powerful tools for uplift modeling. |
Tasks | |
Published | 2018-07-20 |
URL | http://arxiv.org/abs/1807.07909v1 |
http://arxiv.org/pdf/1807.07909v1.pdf | |
PWC | https://paperswithcode.com/paper/boosting-algorithms-for-uplift-modeling |
Repo | |
Framework | |
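For intuition, the quantity an uplift model estimates can be obtained from a simple two-model baseline, sketched below; the paper's contribution is boosting schemes designed specifically for uplift, which this sketch does not implement.

```python
from sklearn.ensemble import GradientBoostingClassifier

def fit_two_model_uplift(X_treat, y_treat, X_ctrl, y_ctrl):
    """Two-model uplift baseline:

    uplift(x) = P(y=1 | x, treated) - P(y=1 | x, control)
    """
    m_t = GradientBoostingClassifier().fit(X_treat, y_treat)   # treatment group
    m_c = GradientBoostingClassifier().fit(X_ctrl, y_ctrl)     # control group
    return lambda X: m_t.predict_proba(X)[:, 1] - m_c.predict_proba(X)[:, 1]

# Usage: uplift = fit_two_model_uplift(Xt, yt, Xc, yc); uplift(X_new)
# estimates the gain in outcome probability from acting on each individual.
```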
Zero-Shot Adaptive Transfer for Conversational Language Understanding
Title | Zero-Shot Adaptive Transfer for Conversational Language Understanding |
Authors | Sungjin Lee, Rahul Jha |
Abstract | Conversational agents such as Alexa and Google Assistant constantly need to increase their language understanding capabilities by adding new domains. A massive amount of labeled data is required for training each new domain. While domain adaptation approaches alleviate the annotation cost, prior approaches suffer from increased training time and suboptimal concept alignments. To tackle this, we introduce a novel Zero-Shot Adaptive Transfer method for slot tagging that utilizes the slot description for transferring reusable concepts across domains, and enjoys efficient training without any explicit concept alignments. Extensive experimentation over a dataset of 10 domains relevant to our commercial personal digital assistant shows that our model outperforms previous state-of-the-art systems by a large margin, and achieves an even higher improvement in the low data regime. |
Tasks | Domain Adaptation |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.10059v1 |
http://arxiv.org/pdf/1808.10059v1.pdf | |
PWC | https://paperswithcode.com/paper/zero-shot-adaptive-transfer-for |
Repo | |
Framework | |
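The underlying intuition - that a slot's natural-language description can stand in for labeled data in a new domain - can be caricatured with a similarity-based tagger. This is a toy illustration only; the paper's model is a trained neural tagger, and all names below are assumptions.

```python
import numpy as np

def tag_tokens(token_vecs, slot_desc_vecs, threshold=0.5):
    """Label each token with the slot whose description embedding it matches.

    token_vecs     : (T, d) embeddings of the utterance tokens.
    slot_desc_vecs : dict slot_name -> (d,) embedding of its description.
    """
    tags = []
    for t in token_vecs:
        t = t / (np.linalg.norm(t) + 1e-8)
        scores = {s: float(t @ (v / (np.linalg.norm(v) + 1e-8)))
                  for s, v in slot_desc_vecs.items()}
        best = max(scores, key=scores.get)
        tags.append(best if scores[best] > threshold else "O")
    return tags
```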
Response monitoring of breast cancer on DCE-MRI using convolutional neural network-generated seed points and constrained volume growing
Title | Response monitoring of breast cancer on DCE-MRI using convolutional neural network-generated seed points and constrained volume growing |
Authors | Bas H. M. van der Velden, Bob D. de Vos, Claudette E. Loo, Hugo J. Kuijf, Ivana Isgum, Kenneth G. A. Gilhuijs |
Abstract | Response of breast cancer to neoadjuvant chemotherapy (NAC) can be monitored using the change in visible tumor on magnetic resonance imaging (MRI). In our current workflow, seed points are manually placed in areas of enhancement likely to contain cancer. A constrained volume growing method uses these manually placed seed points as input and generates a tumor segmentation. This method is rigorously validated using complete pathological embedding. In this study, we propose to exploit deep learning for fast and automatic seed point detection, replacing manual seed point placement in our existing and well-validated workflow. The seed point generator was developed in early breast cancer patients with pathology-proven segmentations (N=100), operated shortly after MRI. It consisted of an ensemble of three independently trained fully convolutional dilated neural networks that classified breast voxels as tumor or non-tumor. Subsequently, local maxima were used as seed points for volume growing in patients receiving NAC (N=10). The percentage of tumor volume change was evaluated against semi-automatic segmentations. The primary cancer was localized in 95% of the tumors at the cost of 0.9 false positives per patient. False positives included focally enhancing regions of unknown origin and parts of the intramammary blood vessels. Volume growing from the seed points showed a median tumor volume decrease of 70% (interquartile range: 50%-77%), comparable to the semi-automatic segmentations (median: 70%, interquartile range: 23%-76%). To conclude, a fast and automatic seed point generator was developed, fully automating a well-validated semi-automatic workflow for monitoring the response of breast cancer to neoadjuvant chemotherapy. |
Tasks | |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09063v1 |
http://arxiv.org/pdf/1811.09063v1.pdf | |
PWC | https://paperswithcode.com/paper/response-monitoring-of-breast-cancer-on-dce |
Repo | |
Framework | |
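The seed-point step can be sketched directly: take the ensemble's voxelwise tumor-probability map and keep thresholded local maxima as seeds for volume growing. The window size and threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def seed_points(prob_map, min_prob=0.5, window=9):
    """Extract seed points as local maxima of a voxelwise tumor-probability
    map (e.g., the mean output of a CNN ensemble), for use as input to
    constrained volume growing.
    """
    local_max = prob_map == maximum_filter(prob_map, size=window)
    seeds = np.argwhere(local_max & (prob_map > min_prob))
    return seeds  # (N, 3) voxel coordinates for a 3-D input volume

# Usage: seeds = seed_points(ensemble_mean_probability)  # then volume growing
```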