Paper Group ANR 240
Dataiku’s Solution to SPHERE’s Activity Recognition Challenge
Title | Dataiku’s Solution to SPHERE’s Activity Recognition Challenge |
Authors | Maxime Voisin, Leo Dreyfus-Schmidt, Pierre Gutierrez, Samuel Ronsin, Marc Beillevaire |
Abstract | Our team won the second prize of the Safe Aging with SPHERE Challenge organized by SPHERE, in conjunction with ECML-PKDD and Driven Data. The goal of the competition was to recognize activities performed by humans, using sensor data. This paper presents our solution. It is based on a rich pre-processing and state of the art machine learning methods. From the raw train data, we generate a synthetic train set with the same statistical characteristics as the test set. We then perform feature engineering. The machine learning modeling part is based on stacking weak learners through a grid searched XGBoost algorithm. Finally, we use post-processing to smooth our predictions over time. |
Tasks | Activity Recognition, Feature Engineering |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02757v1 |
http://arxiv.org/pdf/1610.02757v1.pdf | |
PWC | https://paperswithcode.com/paper/dataikus-solution-to-spheres-activity |
Repo | |
Framework | |
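The abstract's final step, smoothing predictions over time, can be illustrated with a minimal sketch. It assumes per-time-step class probabilities and a centered moving average; the function name `smooth_predictions` and the `window` parameter are hypothetical, and the paper may well use a different smoothing scheme.

```python
from typing import List


def smooth_predictions(probs: List[List[float]], window: int = 3) -> List[List[float]]:
    """Centered moving average of per-class probabilities over time.

    `probs[t][c]` is the predicted probability of class c at time step t.
    The window is truncated at the sequence boundaries (an assumption).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    n = len(probs)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        span = probs[lo:hi]
        n_classes = len(probs[t])
        # Average each class column over the time steps inside the window.
        smoothed.append(
            [sum(row[c] for row in span) / len(span) for c in range(n_classes)]
        )
    return smoothed
```

Because each output row is an average of probability vectors, it still sums to one, so the smoothed rows can be scored like the raw predictions.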
A Stackelberg Game Perspective on the Conflict Between Machine Learning and Data Obfuscation
Title | A Stackelberg Game Perspective on the Conflict Between Machine Learning and Data Obfuscation |
Authors | Jeffrey Pawlick, Quanyan Zhu |
Abstract | Data is the new oil; this refrain is repeated extensively in the age of internet tracking, machine learning, and data analytics. As data collection becomes more personal and pervasive, however, public pressure is mounting for privacy protection. In this atmosphere, developers have created applications to add noise to user attributes visible to tracking algorithms. This creates a strategic interaction between trackers and users when incentives to maintain privacy and improve accuracy are misaligned. In this paper, we conceptualize this conflict through an N+1-player, augmented Stackelberg game. First a machine learner declares a privacy protection level, and then users respond by choosing their own perturbation amounts. We use the general frameworks of differential privacy and empirical risk minimization to quantify the utility components due to privacy and accuracy, respectively. In equilibrium, each user perturbs her data independently, which leads to a high net loss in accuracy. To remedy this scenario, we show that the learner improves his utility by proactively perturbing the data himself. While other work in this area has studied privacy markets and mechanism design for truthful reporting of user information, we take a different viewpoint by considering both user and learner perturbation. |
Tasks | |
Published | 2016-08-08 |
URL | http://arxiv.org/abs/1608.02546v2 |
http://arxiv.org/pdf/1608.02546v2.pdf | |
PWC | https://paperswithcode.com/paper/a-stackelberg-game-perspective-on-the-1 |
Repo | |
Framework | |
How scientific literature has been evolving over the time? A novel statistical approach using tracking verbal-based methods
Title | How scientific literature has been evolving over the time? A novel statistical approach using tracking verbal-based methods |
Authors | Daria Micaela Hernandez, Monica Becue-Bertaut, Igor Barahona |
Abstract | This paper provides a global view of the scientific publications related to Systemic Lupus Erythematosus (SLE), taking abstracts of articles as the starting point. Over time, abstracts have evolved towards more complex terminology, which calls for sophisticated statistical methods to answer questions such as: how is the vocabulary evolving over time? Which are the most influential articles? And which articles introduced new terms and vocabulary? To answer these, we analyze a dataset of 506 abstracts drawn from 115 different journals and covering an 18-year period. |
Tasks | |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1607.07788v1 |
http://arxiv.org/pdf/1607.07788v1.pdf | |
PWC | https://paperswithcode.com/paper/how-scientific-literature-has-been-evolving |
Repo | |
Framework | |
Characterization of Lung Nodule Malignancy using Hybrid Shape and Appearance Features
Title | Characterization of Lung Nodule Malignancy using Hybrid Shape and Appearance Features |
Authors | Mario Buty, Ziyue Xu, Mingchen Gao, Ulas Bagci, Aaron Wu, Daniel J. Mollura |
Abstract | Computed tomography imaging is a standard modality for detecting and assessing lung cancer. In order to evaluate the malignancy of lung nodules, clinical practice often involves expert qualitative ratings on several criteria describing a nodule’s appearance and shape. Translating these features for computer-aided diagnostics is challenging due to their subjective nature and the difficulties in gaining a complete description. In this paper, we propose a computerized approach to quantitatively evaluate both appearance distinctions and 3D surface variations. Nodule shape was modeled and parameterized using spherical harmonics, and appearance features were extracted using deep convolutional neural networks. Both sets of features were combined to estimate the nodule malignancy using a random forest classifier. The proposed algorithm was tested on the publicly available Lung Image Database Consortium dataset, achieving high accuracy. By providing lung nodule characterization, this method can provide a robust alternative reference opinion for lung cancer diagnosis. |
Tasks | Lung Cancer Diagnosis |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06668v1 |
http://arxiv.org/pdf/1609.06668v1.pdf | |
PWC | https://paperswithcode.com/paper/characterization-of-lung-nodule-malignancy |
Repo | |
Framework | |
Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint
Title | Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint |
Authors | Nikolay Savinov, Christian Haene, Lubor Ladicky, Marc Pollefeys |
Abstract | We propose an approach for dense semantic 3D reconstruction which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization. Our formulation is a convex relaxation which we augment with a crucial non-convex constraint that ensures exact handling of visibility. To tackle the non-convex minimization problem, we propose a majorize-minimize type strategy which converges to a critical point. We demonstrate the benefits of using the non-convex constraint experimentally. For the geometry-only case, we set a new state of the art on two datasets of the commonly used Middlebury multi-view stereo benchmark. Moreover, our general-purpose formulation directly reconstructs thin objects, which are usually treated with specialized algorithms. A qualitative evaluation on the dense semantic 3D reconstruction task shows that we improve significantly over previous methods. |
Tasks | 3D Reconstruction |
Published | 2016-04-11 |
URL | https://arxiv.org/abs/1604.02885v3 |
https://arxiv.org/pdf/1604.02885v3.pdf | |
PWC | https://paperswithcode.com/paper/semantic-3d-reconstruction-with-continuous |
Repo | |
Framework | |
Yelp Dataset Challenge: Review Rating Prediction
Title | Yelp Dataset Challenge: Review Rating Prediction |
Authors | Nabiha Asghar |
Abstract | Review websites, such as TripAdvisor and Yelp, allow users to post online reviews for various businesses, products and services, and have been recently shown to have a significant influence on consumer shopping behaviour. An online review typically consists of free-form text and a star rating out of 5. The problem of predicting a user’s star rating for a product, given the user’s text review for that product, is called Review Rating Prediction and has lately become a popular, albeit hard, problem in machine learning. In this paper, we treat Review Rating Prediction as a multi-class classification problem, and build sixteen different prediction models by combining four feature extraction methods, (i) unigrams, (ii) bigrams, (iii) trigrams and (iv) Latent Semantic Indexing, with four machine learning algorithms, (i) logistic regression, (ii) Naive Bayes classification, (iii) perceptrons, and (iv) linear Support Vector Classification. We analyse the performance of each of these sixteen models to come up with the best model for predicting the ratings from reviews. We use the dataset provided by Yelp for training and testing the models. |
Tasks | |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05362v1 |
http://arxiv.org/pdf/1605.05362v1.pdf | |
PWC | https://paperswithcode.com/paper/yelp-dataset-challenge-review-rating |
Repo | |
Framework | |
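The sixteen models in the abstract are the cross product of four feature extraction methods and four classifiers. A minimal sketch of that grid, plus a word n-gram extractor, is given below; the names `model_grid` and `word_ngrams` and the whitespace tokenization are illustrative assumptions, not the author's code.

```python
from itertools import product

# The four feature extractors and four classifiers named in the abstract.
FEATURES = ["unigrams", "bigrams", "trigrams", "latent semantic indexing"]
CLASSIFIERS = ["logistic regression", "naive Bayes", "perceptron", "linear SVC"]


def model_grid():
    """Return every (feature method, classifier) pairing: 4 x 4 = 16 models."""
    return list(product(FEATURES, CLASSIFIERS))


def word_ngrams(text, n):
    """Extract word n-grams using lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `word_ngrams("Great food here", 2)` yields the two bigrams of the review text, and a review shorter than `n` words yields an empty list.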
Max-Margin Nonparametric Latent Feature Models for Link Prediction
Title | Max-Margin Nonparametric Latent Feature Models for Link Prediction |
Authors | Jun Zhu, Jiaming Song, Bei Chen |
Abstract | Link prediction is a fundamental task in statistical network analysis. Recent advances have been made on learning flexible nonparametric Bayesian latent feature models for link prediction. In this paper, we present a max-margin learning method for such nonparametric latent feature relational models. Our approach attempts to unite the ideas of max-margin learning and Bayesian nonparametrics to discover discriminative latent features for link prediction. It inherits the advances of nonparametric Bayesian methods to infer the unknown latent social dimension, while for discriminative link prediction, it adopts the max-margin learning principle by minimizing a hinge-loss using the linear expectation operator, without dealing with a highly nonlinear link likelihood function. For posterior inference, we develop an efficient stochastic variational inference algorithm under a truncated mean-field assumption. Our methods can scale up to large-scale real networks with millions of entities and tens of millions of positive links. We also provide a full Bayesian formulation, which can avoid tuning regularization hyper-parameters. Experimental results on a diverse range of real datasets demonstrate the benefits inherited from max-margin learning and Bayesian nonparametric inference. |
Tasks | Link Prediction |
Published | 2016-02-24 |
URL | http://arxiv.org/abs/1602.07428v1 |
http://arxiv.org/pdf/1602.07428v1.pdf | |
PWC | https://paperswithcode.com/paper/max-margin-nonparametric-latent-feature |
Repo | |
Framework | |
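The hinge loss mentioned in the abstract can be sketched for a single candidate link. The bilinear score over binary latent features is an assumption made here for illustration (the abstract does not state the scoring function), and `bilinear_score` and `hinge_loss` are hypothetical names.

```python
def bilinear_score(z_i, weights, z_j):
    """Score a candidate link as z_i^T W z_j (assumed form, for illustration)."""
    return sum(
        z_i[a] * weights[a][b] * z_j[b]
        for a in range(len(z_i))
        for b in range(len(z_j))
    )


def hinge_loss(score, label, margin=1.0):
    """Hinge loss max(0, margin - label * score) for a label in {-1, +1}."""
    if label not in (-1, 1):
        raise ValueError("label must be -1 or +1")
    return max(0.0, margin - label * score)
```

A link scored well beyond the margin on the correct side contributes no loss, which is what makes the objective discriminative rather than likelihood-based.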
Spatial Scaling of Satellite Soil Moisture using Temporal Correlations and Ensemble Learning
Title | Spatial Scaling of Satellite Soil Moisture using Temporal Correlations and Ensemble Learning |
Authors | Subit Chakrabarti, Jasmeet Judge, Tara Bongiovanni, Anand Rangarajan, Sanjay Ranka |
Abstract | A novel algorithm is developed to downscale soil moisture (SM), obtained at satellite scales of 10-40 km, by utilizing its temporal correlations to historical auxiliary data at finer scales. Including such correlations drastically reduces the size of the training set needed, accounts for time-lagged relationships, and enables downscaling even in the presence of short gaps in the auxiliary data. The algorithm is based upon bagged regression trees (BRT) and uses correlations between high-resolution remote sensing products and SM observations. The algorithm trains multiple regression trees and automatically chooses the trees that generate the best downscaled estimates. The algorithm was evaluated using a multi-scale synthetic dataset in north central Florida for two years, including two growing seasons of corn and one growing season of cotton per year. The time-averaged error across the region was found to be 0.01 $\mathrm{m}^3/\mathrm{m}^3$, with a standard deviation of 0.012 $\mathrm{m}^3/\mathrm{m}^3$, when 0.02% of the data were used for training in addition to temporal correlations from the past seven days and all available data from the past year. The maximum spatially averaged errors obtained using this algorithm in downscaled SM were 0.005 $\mathrm{m}^3/\mathrm{m}^3$, for pixels with cotton land-cover. When land surface temperature (LST) on the day of downscaling was not included in the algorithm, to simulate “data gaps”, the spatially averaged error increased minimally, by 0.015 $\mathrm{m}^3/\mathrm{m}^3$. The results indicate that the BRT-based algorithm provides high accuracy for downscaling SM using complex non-linear spatio-temporal correlations, under heterogeneous micro-meteorological conditions. |
Tasks | |
Published | 2016-01-21 |
URL | http://arxiv.org/abs/1601.05767v1 |
http://arxiv.org/pdf/1601.05767v1.pdf | |
PWC | https://paperswithcode.com/paper/spatial-scaling-of-satellite-soil-moisture |
Repo | |
Framework | |
Harmonization of conflicting medical opinions using argumentation protocols and textual entailment - a case study on Parkinson disease
Title | Harmonization of conflicting medical opinions using argumentation protocols and textual entailment - a case study on Parkinson disease |
Authors | Adrian Groza, Madalina Mand Nagy |
Abstract | Parkinson’s disease is the second most common neurodegenerative disease, affecting more than 1.2 million people in Europe. Medications are available for the management of its symptoms, but the exact cause of the disease is unknown and there is currently no cure on the market. To better understand the relations between new findings and current medical knowledge, we need tools able to analyse published medical papers using natural language processing and to identify the various relationships between new findings and current medical knowledge. Our work aims to fill this technological gap. To identify conflicting information in medical documents, we employ textual entailment technology. To encapsulate existing medical knowledge, we rely on ontologies. To connect the formal axioms in ontologies with natural text in medical articles, we exploit ontology verbalisation techniques. To assess the level of disagreement between human agents with respect to a medical issue, we rely on fuzzy aggregation. To harmonize this disagreement, we design mediation protocols within a multi-agent framework. |
Tasks | Natural Language Inference |
Published | 2016-07-27 |
URL | http://arxiv.org/abs/1607.08075v1 |
http://arxiv.org/pdf/1607.08075v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonization-of-conflicting-medical-opinions |
Repo | |
Framework | |
Document Image Coding and Clustering for Script Discrimination
Title | Document Image Coding and Clustering for Script Discrimination |
Authors | Darko Brodic, Alessia Amelio, Zoran N. Milivojevic, Milena Jevtic |
Abstract | The paper introduces a new method for discrimination of documents given in different scripts. The document is mapped into a uniformly coded text of numerical values. It is derived from the position of the letters in the text line, based on their typographical characteristics. Each code is considered as a gray level. Accordingly, the coded text determines a 1-D image, on which texture analysis by run-length statistics and local binary pattern is performed. It defines feature vectors representing the script content of the document. A modified clustering approach employed on document feature vector groups documents written in the same script. Experimentation performed on two custom oriented databases of historical documents in old Cyrillic, angular and round Glagolitic as well as Antiqua and Fraktur scripts demonstrates the superiority of the proposed method with respect to well-known methods in the state-of-the-art. |
Tasks | Texture Classification |
Published | 2016-09-21 |
URL | http://arxiv.org/abs/1609.06492v1 |
http://arxiv.org/pdf/1609.06492v1.pdf | |
PWC | https://paperswithcode.com/paper/document-image-coding-and-clustering-for |
Repo | |
Framework | |
Lasso Guarantees for Time Series Estimation Under Subgaussian Tails and $\beta$-Mixing
Title | Lasso Guarantees for Time Series Estimation Under Subgaussian Tails and $\beta$-Mixing |
Authors | Kam Chung Wong, Zifan Li, Ambuj Tewari |
Abstract | Many theoretical results on estimation of high dimensional time series require specifying an underlying data generating model (DGM). Instead, following in the footsteps of \cite{wong2017lasso}, this paper relies only on (strict) stationarity and a $\beta$-mixing condition to establish consistency of the lasso when data comes from a $\beta$-mixing process with marginals having subgaussian tails. Because of the general assumptions, the data can come from DGMs different than standard time series models such as VAR or ARCH. When the true DGM is not VAR, the lasso estimates correspond to those of the best linear predictors using the past observations. We establish non-asymptotic inequalities for estimation and prediction errors of the lasso estimates. Together with \cite{wong2017lasso}, we provide lasso guarantees that cover the full spectrum of the parameters in specifications of $\beta$-mixing subgaussian time series. Applications of these results potentially extend to non-Gaussian, non-Markovian and non-linear time series models, as the examples we provide demonstrate. In order to prove our results, we derive a novel Hanson-Wright type concentration inequality for $\beta$-mixing subgaussian random vectors that may be of independent interest. |
Tasks | Time Series |
Published | 2016-02-12 |
URL | http://arxiv.org/abs/1602.04265v4 |
http://arxiv.org/pdf/1602.04265v4.pdf | |
PWC | https://paperswithcode.com/paper/lasso-guarantees-for-time-series-estimation |
Repo | |
Framework | |
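For orientation, the lasso estimate of a linear predictor from past observations can be written in its standard form. The notation below is an assumption introduced here, not taken from the paper: $Y \in \mathbb{R}^{T}$ stacks the responses, $X \in \mathbb{R}^{T \times p}$ stacks the lagged observations, and $\lambda_T > 0$ is the regularization parameter; the $1/T$ normalization is one common convention.

```latex
\hat{\theta}
  \in \operatorname*{arg\,min}_{\theta \in \mathbb{R}^{p}}
  \; \frac{1}{T} \lVert Y - X\theta \rVert_2^2
  + \lambda_T \lVert \theta \rVert_1
```

Under this reading, the estimation and prediction errors discussed in the abstract concern the gap between $\hat{\theta}$ and the coefficients of the best linear predictor.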
Technical Report: Graph-Structured Sparse Optimization for Connected Subgraph Detection
Title | Technical Report: Graph-Structured Sparse Optimization for Connected Subgraph Detection |
Authors | Baojian Zhou, Feng Chen |
Abstract | Structured sparse optimization is an important and challenging problem for analyzing high-dimensional data in a variety of applications such as bioinformatics, medical imaging, social networks, and astronomy. Although a number of structured sparsity models have been explored, such as trees, groups, clusters, and paths, connected subgraphs have been rarely explored in the current literature. One of the main technical challenges is that there is no structured sparsity-inducing norm that can directly model the space of connected subgraphs, and there is no exact implementation of a projection oracle for connected subgraphs due to its NP-hardness. In this paper, we explore efficient approximate projection oracles for connected subgraphs, and propose two new efficient algorithms, namely, Graph-IHT and Graph-GHTP, to optimize a generic nonlinear objective function subject to connectivity constraint on the support of the variables. Our proposed algorithms enjoy strong guarantees analogous to several current methods for sparsity-constrained optimization, such as Projected Gradient Descent (PGD), Approximate Model Iterative Hard Thresholding (AM-IHT), and Gradient Hard Thresholding Pursuit (GHTP) with respect to convergence rate and approximation accuracy. We apply our proposed algorithms to optimize several well-known graph scan statistics in several applications of connected subgraph detection as a case study, and the experimental results demonstrate that our proposed algorithms outperform state-of-the-art methods. |
Tasks | |
Published | 2016-09-30 |
URL | http://arxiv.org/abs/1609.09864v1 |
http://arxiv.org/pdf/1609.09864v1.pdf | |
PWC | https://paperswithcode.com/paper/technical-report-graph-structured-sparse |
Repo | |
Framework | |
Vehicle Detection from 3D Lidar Using Fully Convolutional Network
Title | Vehicle Detection from 3D Lidar Using Fully Convolutional Network |
Authors | Bo Li, Tianlei Zhang, Tian Xia |
Abstract | Convolutional network techniques have recently achieved great success in vision based detection tasks. This paper introduces the recent development of our research on transplanting the fully convolutional network technique to detection tasks on 3D range scan data. Specifically, the scenario is set as the vehicle detection task from the range data of a Velodyne 64E lidar. We propose to present the data in a 2D point map and use a single 2D end-to-end fully convolutional network to predict the objectness confidence and the bounding boxes simultaneously. By carefully designing the bounding box encoding, the network is able to predict full 3D bounding boxes even though it is a 2D convolutional network. Experiments on the KITTI dataset show the state-of-the-art performance of the proposed method. |
Tasks | |
Published | 2016-08-29 |
URL | http://arxiv.org/abs/1608.07916v1 |
http://arxiv.org/pdf/1608.07916v1.pdf | |
PWC | https://paperswithcode.com/paper/vehicle-detection-from-3d-lidar-using-fully |
Repo | |
Framework | |
End-to-End Training Approaches for Discriminative Segmental Models
Title | End-to-End Training Approaches for Discriminative Segmental Models |
Authors | Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu |
Abstract | Recent work on discriminative segmental models has shown that they can achieve competitive speech recognition performance, using features based on deep neural frame classifiers. However, segmental models can be more challenging to train than standard frame-based approaches. While some segmental models have been successfully trained end to end, there is a lack of understanding of their training under different settings and with different losses. We investigate a model class based on recent successful approaches, consisting of a linear model that combines segmental features based on an LSTM frame classifier. Similarly to hybrid HMM-neural network models, segmental models of this class can be trained in two stages (frame classifier training followed by linear segmental model weight training), end to end (joint training of both frame classifier and linear weights), or with end-to-end fine-tuning after two-stage training. We study segmental models trained end to end with hinge loss, log loss, latent hinge loss, and marginal log loss. We consider several losses for the case where training alignments are available as well as where they are not. We find that in general, marginal log loss provides the most consistent strong performance without requiring ground-truth alignments. We also find that training with dropout is very important in obtaining good performance with end-to-end training. Finally, the best results are typically obtained by a combination of two-stage training and fine-tuning. |
Tasks | Speech Recognition |
Published | 2016-10-21 |
URL | http://arxiv.org/abs/1610.06700v1 |
http://arxiv.org/pdf/1610.06700v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-training-approaches-for |
Repo | |
Framework | |
Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness
Title | Back to Basics: Unsupervised Learning of Optical Flow via Brightness Constancy and Motion Smoothness |
Authors | Jason J. Yu, Adam W. Harley, Konstantinos G. Derpanis |
Abstract | Recently, convolutional networks (convnets) have proven useful for predicting optical flow. Much of this success is predicated on the availability of large datasets that require expensive and involved data acquisition and laborious labeling. To bypass these challenges, we propose an unsupervised approach (i.e., without leveraging groundtruth flow) to train a convnet end-to-end for predicting optical flow between two images. We use a loss function that combines a data term that measures photometric constancy over time with a spatial term that models the expected variation of flow across the image. Together these losses form a proxy measure for losses based on the groundtruth flow. Empirically, we show that a strong convnet baseline trained with the proposed unsupervised approach outperforms the same network trained with supervision on the KITTI dataset. |
Tasks | Optical Flow Estimation |
Published | 2016-08-20 |
URL | http://arxiv.org/abs/1608.05842v1 |
http://arxiv.org/pdf/1608.05842v1.pdf | |
PWC | https://paperswithcode.com/paper/back-to-basics-unsupervised-learning-of |
Repo | |
Framework | |
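The two-term loss described in the abstract can be written in a generic form. This is a sketch under assumptions: the penalty functions $\rho_D$ and $\rho_S$, the weight $\lambda$, and the exact discretization of the flow gradients are placeholders introduced here and may differ from the paper's choices. $I_1, I_2$ are the two input images and $(u, v)$ is the predicted flow field.

```latex
\mathcal{L}(u, v)
  = \sum_{x, y} \rho_D\!\bigl( I_1(x, y) - I_2(x + u_{x,y},\, y + v_{x,y}) \bigr)
  + \lambda \sum_{x, y} \Bigl[ \rho_S\!\bigl( \nabla u_{x,y} \bigr)
  + \rho_S\!\bigl( \nabla v_{x,y} \bigr) \Bigr]
```

The first sum is the photometric constancy (data) term and the second is the motion smoothness (spatial) term; neither requires groundtruth flow, which is what makes the training unsupervised.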