Paper Group ANR 1094
Untangling the GDPR Using ConRelMiner
Title | Untangling the GDPR Using ConRelMiner |
Authors | Karolin Winter, Stefanie Rinderle-Ma |
Abstract | The General Data Protection Regulation (GDPR) poses enormous challenges for companies and organizations with respect to understanding, implementing, and maintaining the constraints it contains. We report on how the ConRelMiner method can be used for untangling the GDPR. For this, the GDPR is filtered and grouped along the roles it mentions, and the resulting reduction in the number of sentences analysts must read is shown. Moreover, the output of the ConRelMiner - a cluster graph with relations between the sentences - is displayed and interpreted. Overall, the goal is to illustrate how the effort for implementing the GDPR can be reduced and how a structured and meaningful representation of the relevant GDPR sentences can be found. |
Tasks | |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03399v1 |
PDF | http://arxiv.org/pdf/1811.03399v1.pdf |
PWC | https://paperswithcode.com/paper/untangling-the-gdpr-using-conrelminer |
Repo | |
Framework | |
Bayesian Semantic Instance Segmentation in Open Set World
Title | Bayesian Semantic Instance Segmentation in Open Set World |
Authors | Trung Pham, Vijay Kumar B G, Thanh-Toan Do, Gustavo Carneiro, Ian Reid |
Abstract | This paper addresses the semantic instance segmentation task under open-set conditions, where input images can contain known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which are expensive to acquire or even infeasible in some realistic scenarios where the number of categories may increase boundlessly. In this paper, we present a novel open-set semantic instance segmentation approach capable of segmenting all known and unknown object classes in images, based on the output of an object detector trained on known object classes. We formulate the problem using a Bayesian framework, where the posterior distribution is approximated with a simulated annealing optimization equipped with an efficient image partition sampler. We show empirically that our method is competitive with state-of-the-art supervised methods on known classes, and also performs well on unknown classes when compared with unsupervised methods. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.00911v2 |
PDF | http://arxiv.org/pdf/1806.00911v2.pdf |
PWC | https://paperswithcode.com/paper/bayesian-semantic-instance-segmentation-in |
Repo | |
Framework | |
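The core computational step here is approximating the posterior over image partitions with simulated annealing. The paper-specific energy and image-partition sampler are not given in the abstract, so the sketch below only illustrates the generic annealing loop, with `energy` and `propose` as hypothetical placeholders.

```python
import numpy as np

def simulated_annealing_partition(energy, initial_labels, propose, n_iters=10000,
                                  t_start=1.0, t_end=0.01, rng=None):
    """Generic simulated-annealing search over label maps (image partitions).

    energy(labels) -> float, lower is better (e.g. a negative log-posterior).
    propose(labels, rng) -> candidate label map (e.g. relabel one segment).
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = initial_labels.copy()
    e = energy(labels)
    best, best_e = labels.copy(), e
    for i in range(n_iters):
        t = t_start * (t_end / t_start) ** (i / max(n_iters - 1, 1))  # geometric cooling
        cand = propose(labels, rng)
        cand_e = energy(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_e < e or rng.random() < np.exp((e - cand_e) / t):
            labels, e = cand, cand_e
            if e < best_e:
                best, best_e = labels.copy(), e
    return best, best_e
```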
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
Title | Diffusion Approximations for Online Principal Component Estimation and Global Convergence |
Authors | Chris Junchi Li, Mengdi Wang, Han Liu, Tong Zhang |
Abstract | In this paper, we propose to adopt diffusion approximation tools to study the dynamics of Oja’s iteration, an online stochastic gradient descent method for principal component analysis. Oja’s iteration maintains a running estimate of the true principal component from streaming data and enjoys low time and space complexity. We show that Oja’s iteration for the top eigenvector generates a continuous-state discrete-time Markov chain over the unit sphere. We characterize Oja’s iteration in three phases using diffusion approximation and weak convergence tools. Our three-phase analysis further provides a finite-sample error bound for the running estimate, which matches the minimax information lower bound for principal component analysis under the additional assumption of bounded samples. |
Tasks | |
Published | 2018-08-29 |
URL | http://arxiv.org/abs/1808.09645v1 |
PDF | http://arxiv.org/pdf/1808.09645v1.pdf |
PWC | https://paperswithcode.com/paper/diffusion-approximations-for-online-principal |
Repo | |
Framework | |
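Oja’s iteration itself is a standard update, so a minimal sketch can make the object of the analysis concrete. The constant step size and the toy Gaussian stream below are illustrative choices, not the paper’s experimental setup.

```python
import numpy as np

def oja_top_eigenvector(stream, dim, step=0.005, rng=None):
    """Oja's iteration: online estimate of the top principal component.

    `stream` yields (assumed zero-mean) data vectors x_t; the running estimate w
    is renormalized each step, so it evolves on the unit sphere as in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for x in stream:
        w += step * x * (x @ w)   # stochastic gradient step: w += eta * x x^T w
        w /= np.linalg.norm(w)    # project back onto the unit sphere
    return w

# Toy usage: samples whose covariance is dominated by the first coordinate.
rng = np.random.default_rng(0)
data = rng.multivariate_normal(np.zeros(3), np.diag([5.0, 1.0, 0.5]), size=20000)
w = oja_top_eigenvector(iter(data), dim=3, rng=rng)
print(np.abs(w))  # close to (1, 0, 0)
```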
Mixed Integer Linear Programming for Feature Selection in Support Vector Machine
Title | Mixed Integer Linear Programming for Feature Selection in Support Vector Machine |
Authors | Martine Labbé, Luisa I. Martínez-Merino, Antonio M. Rodríguez-Chía |
Abstract | This work focuses on support vector machine (SVM) with feature selection. A MILP formulation is proposed for the problem. The choice of suitable features to construct the separating hyperplanes has been modelled in this formulation by including a budget constraint that sets in advance a limit on the number of features to be used in the classification process. We propose both an exact and a heuristic procedure to solve this formulation in an efficient way. Finally, the validation of the model is done by checking it with some well-known data sets and comparing it with classical classification methods. |
Tasks | Feature Selection |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02435v1 |
PDF | http://arxiv.org/pdf/1808.02435v1.pdf |
PWC | https://paperswithcode.com/paper/mixed-integer-linear-programming-for-feature |
Repo | |
Framework | |
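The abstract describes a budget constraint that caps the number of features used by the separating hyperplane. The paper’s exact model is not reproduced here; a generic big-M MILP of this kind, with binary indicators $z_j$ and feature budget $B$, can be written as:

```latex
\begin{align*}
\min_{w,\,b,\,\xi,\,z}\quad & \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i \left( w^\top x_i + b \right) \ge 1 - \xi_i, && i = 1,\dots,n,\\
& -M z_j \le w_j \le M z_j, && j = 1,\dots,d,\\
& \sum_{j=1}^{d} z_j \le B, \\
& \xi_i \ge 0, \qquad z_j \in \{0,1\}.
\end{align*}
```

Here $z_j = 0$ forces $w_j = 0$, so at most $B$ features can enter the classifier.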
Extracting Actionable Knowledge from Domestic Violence Discourses on Social Media
Title | Extracting Actionable Knowledge from Domestic Violence Discourses on Social Media |
Authors | Sudha Subramani, Manjula O’Connor |
Abstract | Domestic Violence (DV) is a major social issue with a strong relationship to public health. Existing research has used social media to track and analyse real-world events such as emerging trends, natural disasters, user sentiment, political opinions, and health care, but less attention has been given to social welfare issues like DV and their impact on public health. Recently, victims of DV have turned to social media platforms to express their feelings in posts and to seek social and emotional support, sympathetic encouragement, compassion, and empathy from the public. However, it is difficult to mine actionable knowledge from large conversational social media datasets, which are high-dimensional, short, noisy, voluminous, and fast-arriving. Hence, this paper proposes a novel framework to model and discover the various themes related to DV from the public domain. The proposed framework can provide valuable information to public health researchers, national family health organizations, government, and the public, with data enrichment and consolidation to improve the social welfare of the community. It thus provides actionable knowledge by monitoring and analysing continuous and rich user-generated content. |
Tasks | Sentiment Analysis |
Published | 2018-07-05 |
URL | http://arxiv.org/abs/1807.02391v1 |
PDF | http://arxiv.org/pdf/1807.02391v1.pdf |
PWC | https://paperswithcode.com/paper/extracting-actionable-knowledge-from-domestic |
Repo | |
Framework | |
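The framework above discovers themes from user-generated posts; the abstract does not name a specific model, so the following is only a minimal topic-modeling baseline (scikit-learn LDA over placeholder posts) to show what theme discovery over such text might look like.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder posts; a real pipeline would ingest social media data instead.
posts = [
    "seeking help after years of abuse at home",
    "how to find a shelter and legal support",
    "thank you all for the emotional support and kind words",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[-5:][::-1]          # five highest-weight terms per theme
    print(f"theme {k}:", ", ".join(terms[i] for i in top))
```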
PEYMA: A Tagged Corpus for Persian Named Entities
Title | PEYMA: A Tagged Corpus for Persian Named Entities |
Authors | Mahsa Sadat Shahshahani, Mahdi Mohseni, Azadeh Shakery, Heshaam Faili |
Abstract | The goal of the NER task is to classify proper nouns in a text into classes such as person, location, and organization. This is an important preprocessing step in many NLP tasks such as question answering and summarization. Although much research has been conducted in this area for English, and state-of-the-art English NER systems reach performances higher than 90 percent in terms of F1 measure, there are very few studies of this task for Persian. One of the main causes may be the lack of a standard Persian NER dataset for training and testing NER systems. In this research we create a standard, sufficiently large tagged Persian NER dataset that will be distributed freely for research purposes. In order to construct such a standard dataset, we studied standard NER datasets constructed for English and found that almost all of them are built from news texts, so we collected documents from ten news websites. Then, to provide annotators with guidelines for tagging these documents, we studied the guidelines used to construct the CoNLL and MUC standard English datasets and set our own guidelines, taking Persian linguistic rules into account. |
Tasks | Question Answering |
Published | 2018-01-30 |
URL | http://arxiv.org/abs/1801.09936v1 |
PDF | http://arxiv.org/pdf/1801.09936v1.pdf |
PWC | https://paperswithcode.com/paper/peyma-a-tagged-corpus-for-persian-named |
Repo | |
Framework | |
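The NER task described above assigns entity classes such as person, location, and organization to tokens. The example below shows a typical CoNLL-style BIO encoding; it is an English stand-in for illustration only and is not taken from PEYMA, whose annotation guidelines are the paper’s own.

```python
# Illustrative BIO-tagged sentence (not actual PEYMA data).
tagged = [
    ("John", "B-PER"), ("lives", "O"), ("in", "O"), ("Tehran", "B-LOC"),
    ("and", "O"), ("works", "O"), ("for", "O"), ("the", "O"),
    ("United", "B-ORG"), ("Nations", "I-ORG"), (".", "O"),
]

# CoNLL-style column format: one token per line, blank line between sentences.
print("\n".join(f"{token}\t{tag}" for token, tag in tagged))
```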
The persistence landscape and some of its properties
Title | The persistence landscape and some of its properties |
Authors | Peter Bubenik |
Abstract | Persistence landscapes map persistence diagrams into a function space, which may often be taken to be a Banach space or even a Hilbert space. In the latter case, it is a feature map and there is an associated kernel. The main advantage of this summary is that it allows one to apply tools from statistics and machine learning. Furthermore, the mapping from persistence diagrams to persistence landscapes is stable and invertible. We introduce a weighted version of the persistence landscape and define a one-parameter family of Poisson-weighted persistence landscape kernels that may be useful for learning. We also demonstrate some additional properties of the persistence landscape. First, the persistence landscape may be viewed as a tropical rational function. Second, in many cases it is possible to exactly reconstruct all of the component persistence diagrams from an average persistence landscape. It follows that the persistence landscape kernel is characteristic for certain generic empirical measures. Finally, the persistence landscape distance may be arbitrarily small compared to the interleaving distance. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.04963v2 |
PDF | http://arxiv.org/pdf/1810.04963v2.pdf |
PWC | https://paperswithcode.com/paper/the-persistence-landscape-and-some-of-its |
Repo | |
Framework | |
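For reference, the standard construction maps each point $(b, d)$ of a persistence diagram $D$ to a tent function and takes pointwise $k$-th maxima; this is the usual definition from Bubenik’s earlier work, restated here rather than anything new in this paper:

```latex
\Lambda_{(b,d)}(t) = \max\bigl(0,\ \min(t - b,\ d - t)\bigr), \qquad
\lambda_k(t) = \operatorname*{kmax}_{(b,d)\in D} \Lambda_{(b,d)}(t),
```

where $\operatorname{kmax}$ returns the $k$-th largest value (and $0$ if fewer than $k$ points contribute). The sequence $(\lambda_k)_{k \ge 1}$ is the persistence landscape.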
Semantic Parsing: Syntactic assurance to target sentence using LSTM Encoder CFG-Decoder
Title | Semantic Parsing: Syntactic assurance to target sentence using LSTM Encoder CFG-Decoder |
Authors | Fabiano Ferreira Luz, Marcelo Finger |
Abstract | Semantic parsing can be defined as the process of mapping natural language sentences into a machine-interpretable, formal representation of their meaning. Semantic parsing using LSTM encoder-decoder neural networks has become a promising approach. However, automated translation of natural language does not guarantee the grammaticality of the generated sentences; such a guarantee is particularly important for practical cases where a database query can cause critical errors if the sentence is ungrammatical. In this work, we propose a neural architecture called Encoder CFG-Decoder, whose output conforms to a given context-free grammar. Results for an implementation of this architecture demonstrate its correctness and provide accuracy levels better than the literature benchmarks. |
Tasks | Semantic Parsing |
Published | 2018-07-18 |
URL | http://arxiv.org/abs/1807.07108v1 |
PDF | http://arxiv.org/pdf/1807.07108v1.pdf |
PWC | https://paperswithcode.com/paper/semantic-parsing-syntactic-assurance-to |
Repo | |
Framework | |
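The key property claimed above is that the decoder’s output always conforms to a given context-free grammar. The sketch below shows one generic way to obtain that guarantee, by restricting each expansion step to the grammar’s productions; the toy grammar and the random scoring function stand in for the paper’s LSTM decoder and are assumptions, not the authors’ architecture.

```python
import random

# Toy CFG for a tiny query language (illustrative, not the paper's grammar).
GRAMMAR = {
    "Q":     [["SELECT", "FIELD", "FROM", "TABLE", "COND"]],
    "FIELD": [["name"], ["age"]],
    "TABLE": [["people"], ["cities"]],
    "COND":  [["WHERE", "FIELD", "=", "VALUE"], []],  # empty list = no condition
    "VALUE": [["'x'"], ["'y'"]],
}

def score(production, prefix):
    """Stand-in for decoder scores; a real system would use LSTM logits
    conditioned on the input sentence and the generated prefix."""
    return random.random()

def generate(start="Q"):
    stack, output = [start], []
    while stack:
        sym = stack.pop(0)
        if sym not in GRAMMAR:      # terminal symbol: emit it
            output.append(sym)
            continue
        # Non-terminal: only grammar productions are eligible, so every
        # generated sentence is guaranteed to be derivable from the CFG.
        best = max(GRAMMAR[sym], key=lambda p: score(p, output))
        stack = list(best) + stack
    return " ".join(output)

print(generate())
```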
Generic CP-Supported CMSA for Binary Integer Linear Programs
Title | Generic CP-Supported CMSA for Binary Integer Linear Programs |
Authors | Christian Blum, Haroldo Gambini Santos |
Abstract | Construct, Merge, Solve and Adapt (CMSA) is a general hybrid metaheuristic for solving combinatorial optimization problems. At each iteration, CMSA (1) constructs feasible solutions to the tackled problem instance in a probabilistic way and (2) solves a reduced problem instance (if possible) to optimality. The construction of feasible solutions is hereby problem-specific, usually involving a fast greedy heuristic. The goal of this paper is to design a problem-agnostic CMSA variant whose exclusive input is an integer linear program (ILP). In order to reduce the complexity of this task, the current study is restricted to binary ILPs. In addition to a basic problem-agnostic CMSA variant, we also present an extended version that makes use of a constraint propagation engine for constructing solutions. The results show that our technique is able to match the upper bounds of the standalone application of CPLEX in the context of rather easy-to-solve instances, while it generally outperforms the standalone application of CPLEX in the context of hard instances. Moreover, the results indicate that the support of the constraint propagation engine is useful in the context of problems for which finding feasible solutions is rather difficult. |
Tasks | Combinatorial Optimization |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.11820v1 |
PDF | http://arxiv.org/pdf/1805.11820v1.pdf |
PWC | https://paperswithcode.com/paper/generic-cp-supported-cmsa-for-binary-integer |
Repo | |
Framework | |
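The CMSA control flow summarized above alternates probabilistic construction with exact solving of a reduced sub-instance. The sketch below shows that generic loop only; the probabilistic constructor, the restricted-ILP solver (e.g. CPLEX), and the constraint-propagation-based construction of the extended variant are passed in as hypothetical callables.

```python
def cmsa(construct, solve_subinstance, n_construct=10, age_max=3, n_iters=50):
    """Generic Construct-Merge-Solve-Adapt loop (control flow only).

    construct() -> iterable of solution components (hashable).
    solve_subinstance(components) -> (best_component_set, objective), e.g. by
        handing the restricted binary ILP over `components` to an exact solver.
    """
    ages = {}                                   # component -> age
    incumbent, incumbent_obj = None, float("inf")
    for _ in range(n_iters):
        # Construct & merge: probabilistically built solutions contribute
        # their components to the sub-instance.
        for _ in range(n_construct):
            for comp in construct():
                ages.setdefault(comp, 0)
        # Solve: exact optimization restricted to the merged components.
        solution, obj = solve_subinstance(set(ages))
        if obj < incumbent_obj:
            incumbent, incumbent_obj = solution, obj
        # Adapt: reset ages of used components, age and eventually drop the rest.
        for comp in list(ages):
            ages[comp] = 0 if comp in solution else ages[comp] + 1
            if ages[comp] > age_max:
                del ages[comp]
    return incumbent, incumbent_obj
```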
Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia
Title | Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia |
Authors | João Caldeira, Alex Fout, Aniket Kesari, Raesetje Sefala, Joseph Walsh, Katy Dupre, Muhammad Rizal Khaefi, Setiaji, George Hodge, Zakiya Aryana Pramestri, Muhammad Adib Imtiyazi |
Abstract | This project presents the results of a partnership between the Data Science for Social Good fellowship, Jakarta Smart City and Pulse Lab Jakarta to create a video analysis pipeline for the purpose of improving traffic safety in Jakarta. The pipeline transforms raw traffic video footage into databases that are ready to be used for traffic analysis. By analyzing these patterns, the city of Jakarta will better understand how human behavior and built infrastructure contribute to traffic challenges and safety risks. The results of this work should also be broadly applicable to smart city initiatives around the globe as they improve urban planning and sustainability through data science approaches. |
Tasks | |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.01106v1 |
PDF | http://arxiv.org/pdf/1812.01106v1.pdf |
PWC | https://paperswithcode.com/paper/improving-traffic-safety-through-video |
Repo | |
Framework | |
A Simple Method to improve Initialization Robustness for Active Contours driven by Local Region Fitting Energy
Title | A Simple Method to improve Initialization Robustness for Active Contours driven by Local Region Fitting Energy |
Authors | Keyan Ding, Linfang Xiao |
Abstract | Active contour models based on local region fitting energy can segment images with intensity inhomogeneity effectively, but their segmentation results are prone to error if the initial contour is inappropriate. In this paper, we present a simple and universal method for improving the robustness to the initial contour in these local fitting-based models. The core idea of the proposed method is to exchange the fitting values on the two sides of the contour, so that the fitting values inside the contour are always larger (or smaller) than the values outside the contour during curve evolution. In this way, the whole curve evolves along the inner (or outer) boundaries of the object and is less likely to become stuck in the object or the background. Experimental results show that the proposed method enhances robustness to the initial contour while keeping the original advantages of the local fitting-based models. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10437v2 |
PDF | http://arxiv.org/pdf/1802.10437v2.pdf |
PWC | https://paperswithcode.com/paper/a-simple-method-to-improve-initialization |
Repo | |
Framework | |
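The core idea above is to reorder the two local fitting values at every pixel so that the "inside" value is consistently the larger (or smaller) one. A minimal numpy sketch of that reordering is given below; the local fitting values themselves (e.g. from an LBF/RSF-style model) and the surrounding curve-evolution PDE are assumed to exist elsewhere and are not shown.

```python
import numpy as np

def order_fitting_values(f_in, f_out, inside_larger=True):
    """Reorder local fitting values so that, per pixel, the inside value is
    always the larger (or smaller) one, making curve evolution less sensitive
    to where the initial contour was placed.

    f_in, f_out: arrays of locally fitted intensities inside/outside the contour.
    """
    hi = np.maximum(f_in, f_out)
    lo = np.minimum(f_in, f_out)
    return (hi, lo) if inside_larger else (lo, hi)
```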
On the inherent competition between valid and spurious inductive inferences in Boolean data
Title | On the inherent competition between valid and spurious inductive inferences in Boolean data |
Authors | M. Andrecut |
Abstract | Inductive inference is the process of extracting general rules from specific observations. This problem also arises in the analysis of biological networks, such as genetic regulatory networks, where the interactions are complex and the observations are incomplete. A typical task in these problems is to extract general interaction rules as combinations of Boolean covariates that explain a measured response variable. The inductive inference process can be considered as an incompletely specified Boolean function synthesis problem. This incompleteness of the problem will also generate spurious inferences, which are a serious threat to valid inductive inference rules. Using random Boolean data as a null model, here we attempt to measure the competition between valid and spurious inductive inference rules for a given data set. We formulate two greedy search algorithms, which synthesize a given Boolean response variable in a sparse disjunctive normal form and a sparse generalized algebraic normal form of the variables from the observation data, respectively, and we evaluate their performance numerically. |
Tasks | |
Published | 2018-01-06 |
URL | http://arxiv.org/abs/1801.02068v1 |
PDF | http://arxiv.org/pdf/1801.02068v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-inherent-competition-between-valid-and |
Repo | |
Framework | |
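The abstract mentions a greedy algorithm that synthesizes the response variable as a sparse disjunctive normal form. The sketch below is one plausible greedy scheme of that kind (set-cover style, rejecting terms that cover any negative row); it illustrates the idea rather than reproducing the paper’s exact algorithms.

```python
from itertools import combinations, product

def greedy_sparse_dnf(X, y, max_literals=2):
    """Greedy synthesis of a sparse DNF explaining a Boolean response.

    X: list of Boolean tuples (covariates); y: list of Booleans (response).
    Repeatedly adds the term (conjunction of up to `max_literals` literals)
    covering the most still-uncovered positive rows without covering any
    negative row.
    """
    n_vars = len(X[0])
    positives = {i for i, t in enumerate(y) if t}
    negatives = {i for i, t in enumerate(y) if not t}

    def satisfies(row, term):            # term: tuple of (index, required_value)
        return all(row[j] == v for j, v in term)

    terms = [tuple(zip(idx, vals))
             for r in range(1, max_literals + 1)
             for idx in combinations(range(n_vars), r)
             for vals in product([False, True], repeat=r)]

    dnf, uncovered = [], set(positives)
    while uncovered:
        best, best_cover = None, set()
        for term in terms:
            if any(satisfies(X[i], term) for i in negatives):
                continue                  # term would create a spurious rule
            cover = {i for i in uncovered if satisfies(X[i], term)}
            if len(cover) > len(best_cover):
                best, best_cover = term, cover
        if best is None:                  # no consistent term left
            break
        dnf.append(best)
        uncovered -= best_cover
    return dnf
```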
End-To-End Alzheimer’s Disease Diagnosis and Biomarker Identification
Title | End-To-End Alzheimer’s Disease Diagnosis and Biomarker Identification |
Authors | Soheil Esmaeilzadeh, Dimitrios Ioannis Belivanis, Kilian M. Pohl, Ehsan Adeli |
Abstract | As shown in computer vision, the power of deep learning lies in automatically learning relevant and powerful features for any prediction task, which is made possible through end-to-end architectures. However, deep learning approaches applied to classifying medical images do not adhere to this architecture, as they rely on several pre- and post-processing steps. This shortcoming can be explained by the relatively small number of available labeled subjects, the high dimensionality of neuroimaging data, and difficulties in interpreting the results of deep learning methods. In this paper, we propose a simple 3D Convolutional Neural Network and exploit its model parameters to tailor the end-to-end architecture for the diagnosis of Alzheimer’s disease (AD). Our model can diagnose AD with an accuracy of 94.1% on the popular ADNI dataset using only MRI data, which outperforms the previous state-of-the-art. Based on the learned model, we identify the disease biomarkers, which are in accordance with the literature. We further transfer the learned model to diagnose mild cognitive impairment (MCI), the prodromal stage of AD, which yields better results than other methods. |
Tasks | |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00523v1 |
PDF | http://arxiv.org/pdf/1810.00523v1.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-alzheimers-disease-diagnosis-and |
Repo | |
Framework | |
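The model above is described only as a simple 3D CNN over MRI volumes; the exact layer configuration is not given in the abstract, so the PyTorch sketch below is a generic stand-in with assumed channel sizes and a binary AD/control output.

```python
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    """Generic 3D CNN for volumetric classification (illustrative sizes only)."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                      # x: (batch, 1, D, H, W) MRI volume
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Shape check with a dummy volume.
model = Simple3DCNN()
logits = model(torch.zeros(1, 1, 64, 64, 64))
print(logits.shape)   # torch.Size([1, 2])
```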
Video Event Recognition and Anomaly Detection by Combining Gaussian Process and Hierarchical Dirichlet Process Models
Title | Video Event Recognition and Anomaly Detection by Combining Gaussian Process and Hierarchical Dirichlet Process Models |
Authors | Michael Ying Yang, Wentong Liao, Yanpeng Cao, Bodo Rosenhahn |
Abstract | In this paper, we present an unsupervised learning framework for analyzing activities and interactions in surveillance videos. In our framework, three levels of video events are connected by a Hierarchical Dirichlet Process (HDP) model: low-level visual features, simple atomic activities, and multi-agent interactions. Atomic activities are represented as distributions of low-level features, while complicated interactions are represented as distributions of atomic activities. This learning process is unsupervised. Given a training video sequence, low-level visual features are extracted based on optical flow and then clustered into different atomic activities, and video clips are clustered into different interactions. The HDP model automatically decides the number of clusters, i.e., the categories of atomic activities and interactions. Based on the learned atomic activities and interactions, a training dataset is generated to train the Gaussian Process (GP) classifier. The trained GP models then operate on newly captured video to classify interactions and detect abnormal events in real time. Furthermore, the temporal dependencies between video events learned by HDP-Hidden Markov Models (HMM) are effectively integrated into the GP classifier to enhance classification accuracy on newly captured videos. Our framework couples the benefits of the generative model (HDP) with the discriminative model (GP). We provide detailed experiments showing that our framework performs favorably at real-time video event classification in a crowded traffic scene. |
Tasks | Anomaly Detection |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03257v1 |
PDF | http://arxiv.org/pdf/1802.03257v1.pdf |
PWC | https://paperswithcode.com/paper/video-event-recognition-and-anomaly-detection |
Repo | |
Framework | |
Representation based and Attention augmented Meta learning
Title | Representation based and Attention augmented Meta learning |
Authors | Yunxiao Qin, Chenxu Zhao, Zezheng Wang, Junliang Xing, Jun Wan, Zhen Lei |
Abstract | Deep learning based computer vision fails to work when labeled images are scarce. Recently, meta-learning has been confirmed as a promising way to improve the ability to learn from few images in computer vision. However, previous meta-learning approaches have two problems: 1) they ignore the importance of the attention mechanism for the meta learner; 2) they do not give the meta learner the ability to use past knowledge, which can help express images as high-level representations, so the meta learner has to solve few-shot learning tasks directly from the original high-dimensional RGB images. In this paper, we argue that the attention mechanism and past knowledge are crucial for the meta learner, and that the meta learner should be trained on high-level representations of the RGB images instead of directly on the original ones. Based on these arguments, we propose two methods: Attention augmented Meta Learning (AML) and Representation based and Attention augmented Meta Learning (RAML). AML aims to improve the meta learner’s attention ability by explicitly embedding an attention model into its network. RAML aims to give the meta learner the ability to leverage previously learned knowledge to reduce the dimension of the original input data by expressing it as high-level representations, helping the meta learner perform well. Extensive experiments demonstrate the effectiveness of the proposed models, with state-of-the-art few-shot learning performance on several few-shot learning benchmarks. The source code of our proposed methods will be released soon to facilitate further studies on the aforementioned problems. |
Tasks | Few-Shot Learning, Meta-Learning |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07545v3 |
PDF | http://arxiv.org/pdf/1811.07545v3.pdf |
PWC | https://paperswithcode.com/paper/representation-based-and-attention-augmented |
Repo | |
Framework | |