July 28, 2019


Paper Group ANR 387


A Comparative Quantitative Analysis of Contemporary Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry


Title	A Comparative Quantitative Analysis of Contemporary Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry
Authors	Avishek Bose, Arslan Munir, Neda Shabani
Abstract	The hospitality industry is one of the data-rich industries that receives huge Volumes of data streaming at high Velocity with considerable Variety, Veracity, and Variability. These properties make data analysis in the hospitality industry a big data problem. Meeting customers' expectations is a key factor in earning customer loyalty in the hospitality industry. To achieve this goal, marketing professionals in this industry actively look for ways to make the best possible use of their data and to advance their data analytic solutions, such as identifying unique market segments through clustering and developing recommendation systems. In this paper, we present a comprehensive literature review of existing big data clustering algorithms and their advantages and disadvantages for various use cases. We implement the existing big data clustering algorithms and provide a quantitative comparison of the performance of different clustering algorithms for different scenarios. We also present our insights and recommendations regarding the suitability of different big data clustering algorithms for different use cases. These recommendations will be helpful for hoteliers in selecting the appropriate market segmentation clustering algorithm for different clustering datasets to improve the customer experience and maximize hotel revenue.
Tasks
Published	2017-09-18
URL	http://arxiv.org/abs/1709.06202v1
PDF	http://arxiv.org/pdf/1709.06202v1.pdf
PWC	https://paperswithcode.com/paper/a-comparative-quantitative-analysis-of
Repo
Framework
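
The abstract does not name the algorithms compared. As a point of reference only, here is a minimal sketch of plain k-means (Lloyd's algorithm), one widely used clustering baseline for segmentation, on small in-memory data. It is not the authors' implementation and not a distributed big data variant. The function `kmeans` and its parameters are hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct input points
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


guests = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy 2-D feature vectors
labels, centroids = kmeans(guests, k=2)
```

Lloyd's algorithm only finds a local optimum that depends on the initial centroids, which is one reason comparative studies like this one matter when choosing a segmentation method.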

Generative Bridging Network in Neural Sequence Prediction


Title	Generative Bridging Network in Neural Sequence Prediction
Authors	Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou
Abstract	In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network). Unlike MLE directly maximizing the conditional likelihood, the bridge extends the point-wise ground truth to a bridge distribution conditioned on it, and the generator is optimized to minimize their KL-divergence. Three different GBNs, namely uniform GBN, language-model GBN and coaching GBN, are proposed to penalize confidence, enhance language smoothness and relieve learning burden. Experiments conducted on two recognized sequence prediction tasks (machine translation and abstractive text summarization) show that our proposed GBNs can yield significant improvements over strong baselines. Furthermore, by analyzing samples drawn from different bridges, expected influences on the generator are verified.
Tasks	Abstractive Text Summarization, Language Modelling, Machine Translation, Text Summarization
Published	2017-06-28
URL	http://arxiv.org/abs/1706.09152v6
PDF	http://arxiv.org/pdf/1706.09152v6.pdf
PWC	https://paperswithcode.com/paper/generative-bridging-network-in-neural
Repo
Framework
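
To make "penalize confidence" concrete, here is a per-token sketch. It assumes the uniform bridge can be approximated by mixing the one-hot ground truth with a uniform distribution over the vocabulary. This is a label-smoothing-style simplification and not necessarily the paper's exact sampling procedure. The generator is then trained to minimize KL(bridge ‖ generator), which is zero only when the generator matches the smoothed target rather than the one-hot one. `uniform_bridge`, `kl_divergence`, and `eps` are illustrative names.

```python
import math


def uniform_bridge(target, vocab_size, eps=0.1):
    """Hypothetical per-token bridge: (1 - eps) * one-hot(target) + eps * uniform."""
    return [(1 - eps) * (i == target) + eps / vocab_size for i in range(vocab_size)]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


bridge = uniform_bridge(target=0, vocab_size=4, eps=0.1)  # about 0.925 on the target, 0.025 elsewhere
overconfident = [0.97, 0.01, 0.01, 0.01]  # a generator more peaked than the bridge
loss = kl_divergence(bridge, overconfident)  # positive: overconfidence is penalized
```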

Estimation of Graphlet Statistics


Title	Estimation of Graphlet Statistics
Authors	Ryan A. Rossi, Rong Zhou, Nesreen K. Ahmed
Abstract	Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms; however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and an approximate count is usually sufficient for many applications. In this work, we propose an unbiased graphlet estimation framework that is (a) fast with significant speedups compared to the state-of-the-art, (b) parallel with nearly linear speedups, (c) accurate with <1% relative error, (d) scalable and space-efficient for massive networks with billions of edges, and (e) flexible for a variety of real-world settings, as well as estimating macro and micro-level graphlet statistics (e.g., counts) of both connected and disconnected graphlets. In addition, an adaptive approach is introduced that finds the smallest sample size required to obtain estimates within a given user-defined error bound. On 300 networks from 20 domains, we obtain <1% relative error for all graphlets. This is significantly more accurate than existing methods while using less data. Moreover, it takes a few seconds on billion-edge graphs (as opposed to days/weeks). These are by far the largest graphlet computations to date.
Tasks
Published	2017-01-06
URL	http://arxiv.org/abs/1701.01772v2
PDF	http://arxiv.org/pdf/1701.01772v2.pdf
PWC	https://paperswithcode.com/paper/estimation-of-graphlet-statistics
Repo
Framework
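
Triangles are the simplest non-trivial graphlet, so they make a compact illustration of what "unbiased estimation" means here. This sketch is not the authors' framework, which covers many graphlet types and adaptive sample sizes. It samples edges uniformly with replacement and counts the common neighbours of each sampled edge. Every triangle is seen from exactly three edges, so the sample mean of per-edge counts equals 3T/m in expectation, and rescaling by m/3 gives an unbiased estimate of the triangle count T. The adjacency format (dict of node to neighbour set) and the function names are assumptions.

```python
import random


def _edges(adj):
    """List each undirected edge once, as (u, v) with u < v."""
    return [(u, v) for u in adj for v in adj[u] if u < v]


def exact_triangles(adj):
    """Exact count: sum common neighbours over all edges, then divide by 3."""
    return sum(len(adj[u] & adj[v]) for u, v in _edges(adj)) // 3


def estimate_triangles(adj, sample_size, seed=0):
    """Unbiased triangle estimate from edges sampled uniformly with replacement."""
    rng = random.Random(seed)
    edges = _edges(adj)
    total = 0
    for _ in range(sample_size):
        u, v = rng.choice(edges)
        total += len(adj[u] & adj[v])  # triangles containing this edge
    # E[total / sample_size] = 3T / m, so multiply by m / 3.
    return len(edges) * total / (3 * sample_size)


# Complete graph K4: every edge lies in exactly 2 triangles, so any sample gives 4.0.
k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
```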

Cross-domain Semantic Parsing via Paraphrasing


Title	Cross-domain Semantic Parsing via Paraphrasing
Authors	Yu Su, Xifeng Yan
Abstract	Existing studies on semantic parsing mainly focus on the in-domain setting. We formulate cross-domain semantic parsing as a domain adaptation problem: train a semantic parser on some source domains and then adapt it to the target domain. Due to the diversity of logical forms in different domains, this problem presents unique and intriguing challenges. By converting logical forms into canonical utterances in natural language, we reduce semantic parsing to paraphrasing, and develop an attentive sequence-to-sequence paraphrase model that is general and flexible to adapt to different domains. We discover two problems, small micro variance and large macro variance, of pre-trained word embeddings that hinder their direct use in neural networks, and propose standardization techniques as a remedy. On the popular Overnight dataset, which contains eight domains, we show that both cross-domain training and standardized pre-trained word embeddings can bring significant improvement.
Tasks	Domain Adaptation, Semantic Parsing, Word Embeddings
Published	2017-04-20
URL	http://arxiv.org/abs/1704.05974v2
PDF	http://arxiv.org/pdf/1704.05974v2.pdf
PWC	https://paperswithcode.com/paper/cross-domain-semantic-parsing-via
Repo
Framework
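
The abstract does not spell out the standardization techniques. One plausible reading is per-dimension z-scoring of the embedding matrix across the vocabulary. This is an assumption of the sketch, not something taken from the paper. Under that reading, each dimension is re-centred to zero mean and rescaled to unit variance before the embeddings are fed to the network. `standardize_embeddings` is a hypothetical helper.

```python
import numpy as np


def standardize_embeddings(E, eps=1e-8):
    """Z-score each embedding dimension across the vocabulary (rows are words, columns are dimensions)."""
    mean = E.mean(axis=0, keepdims=True)
    std = E.std(axis=0, keepdims=True)
    return (E - mean) / (std + eps)  # eps guards against constant dimensions


rng = np.random.default_rng(0)
E = rng.normal(loc=3.0, scale=0.01, size=(1000, 50))  # made-up embeddings with a tiny spread
Z = standardize_embeddings(E)
```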

Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification


Title	Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification
Authors	Jack Lanchantin, Arshdeep Sekhon, Ritambhara Singh, Yanjun Qi
Abstract	One of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes known as "motifs" and (2) interactions among TFs known as co-binding effects. In this paper, we propose a novel deep architecture, the Prototype Matching Network (PMN), to mimic the TF binding mechanisms. Our PMN model automatically extracts prototypes ("motif"-like features) for each TF through a novel prototype-matching loss. Borrowing ideas from few-shot matching models, we use the notion of a support set of prototypes and an LSTM to learn how TFs interact and bind to genomic sequences. On a reference TFBS dataset with 2.1 million genomic sequences, PMN significantly outperforms baselines and validates our design choices empirically. To our knowledge, this is the first deep learning architecture that introduces prototype learning and considers TF-TF interactions for large-scale TFBS prediction. Not only is the proposed architecture accurate, but it also models the underlying biology.
Tasks	Multi-Label Classification
Published	2017-10-30
URL	http://arxiv.org/abs/1710.11238v2
PDF	http://arxiv.org/pdf/1710.11238v2.pdf
PWC	https://paperswithcode.com/paper/prototype-matching-networks-for-large-scale
Repo
Framework

Controllable Top-down Feature Transformer


Title	Controllable Top-down Feature Transformer
Authors	Zhiwei Jia, Haoshen Hong, Siyang Wang, Kwonjoon Lee, Zhuowen Tu
Abstract	We study the intrinsic transformation of feature maps across convolutional network layers with explicit top-down control. To this end, we develop top-down feature transformers (TFT) under controllable parameters that are able to account for the hidden layer transformation while maintaining the overall consistency across layers. The learned generators capture the underlying feature transformation processes that are independent of particular training images. Our proposed TFT framework brings insights to, and helps the understanding of, an important problem: studying the CNN internal feature representation and transformation under top-down processes. In the case of spatial transformations, we demonstrate the significant advantage of TFT over existing data-driven approaches in building data-independent transformations. We also show that it can be adopted in other applications such as data augmentation and image style transfer.
Tasks	Data Augmentation, Style Transfer
Published	2017-12-06
URL	http://arxiv.org/abs/1712.02400v4
PDF	http://arxiv.org/pdf/1712.02400v4.pdf
PWC	https://paperswithcode.com/paper/controllable-top-down-feature-transformer
Repo
Framework

Cross-validation failure: small sample sizes lead to large error bars


Title	Cross-validation failure: small sample sizes lead to large error bars
Authors	Gaël Varoquaux
Abstract	Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness of error bars of cross-validation, which are often underestimated. Simple experiments show that sample sizes of many neuroimaging studies inherently lead to large error bars, e.g., ±10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in heterogeneity of the data.
Tasks
Published	2017-06-23
URL	http://arxiv.org/abs/1706.07581v1
PDF	http://arxiv.org/pdf/1706.07581v1.pdf
PWC	https://paperswithcode.com/paper/cross-validation-failure-small-sample-sizes
Repo
Framework
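
A back-of-envelope check on the ±10% figure can be done under two assumptions of this sketch, not necessarily how the paper derives its numbers. First, treat test accuracy as a binomial proportion. Second, use the normal approximation. With 100 test samples and accuracy near 50%, the 95% interval half-width is about 1.96 × √(0.5 × 0.5 / 100) ≈ 0.098. `accuracy_ci_halfwidth` is a hypothetical helper.

```python
import math


def accuracy_ci_halfwidth(accuracy, n_test, z=1.96):
    """Half-width of a normal-approximation confidence interval for a binomial accuracy."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n_test)


for n in (100, 1000):
    print(n, round(accuracy_ci_halfwidth(0.5, n), 3))  # prints "100 0.098" then "1000 0.031"
```

The half-width shrinks only as 1/√n, so going from 100 to 1000 samples narrows it by roughly a factor of 3, not 10.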

Improved Human Emotion Recognition Using Symmetry of Facial Key Points with Dihedral Group


Title	Improved Human Emotion Recognition Using Symmetry of Facial Key Points with Dihedral Group
Authors	Mehdi Ghayoumi, Arvind Bansal
Abstract	This article describes how to use dihedral group theory to detect Facial Key Point (FKP) symmetry in order to recognize emotions. The method can be applied in many other areas that have the same data texture.
Tasks	Emotion Recognition
Published	2017-04-14
URL	http://arxiv.org/abs/1706.07757v1
PDF	http://arxiv.org/pdf/1706.07757v1.pdf
PWC	https://paperswithcode.com/paper/improved-human-emotion-recognition-using
Repo
Framework

Advances in Artificial Intelligence Require Progress Across all of Computer Science


Title	Advances in Artificial Intelligence Require Progress Across all of Computer Science
Authors	Gregory D. Hager, Randal Bryant, Eric Horvitz, Maja Mataric, Vasant Honavar
Abstract	Advances in Artificial Intelligence require progress across all of computer science.
Tasks
Published	2017-07-13
URL	http://arxiv.org/abs/1707.04352v1
PDF	http://arxiv.org/pdf/1707.04352v1.pdf
PWC	https://paperswithcode.com/paper/advances-in-artificial-intelligence-require
Repo
Framework

Reflection Invariant and Symmetry Detection


Title	Reflection Invariant and Symmetry Detection
Authors	Erbo Li, Hua Li
Abstract	Symmetry detection and discrimination are of fundamental importance in science, technology, and engineering. This paper introduces reflection invariants and defines the directional moment to detect symmetry for shape analysis and object recognition. It also demonstrates that reflection symmetry can be detected in a simple way by solving a trigonometric system derived from the directional moment, and that reflection symmetry can be discriminated by applying the reflection invariants in 2D and 3D. Rotation symmetry can also be determined on this basis. The experiments in 2D and 3D, including the regular triangle, the square, and the five Platonic objects, show that all the reflection lines or planes can be deterministically found using directional moments up to order six. This result can be used to simplify symmetry detection in research areas such as protein structure, model retrieval, inverse engineering, and machine vision.
Tasks	Object Recognition
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10768v2
PDF	http://arxiv.org/pdf/1705.10768v2.pdf
PWC	https://paperswithcode.com/paper/reflection-invariant-and-symmetry-detection
Repo
Framework
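
The paper locates reflection axes analytically from directional moments. For intuition only, the brute-force sketch below (not the authors' method) tries candidate axes through the centroid at a fixed angular step. It keeps each axis whose reflection maps the point set onto itself. On the 2D shapes mentioned in the abstract, it recovers the 4 axes of a square and the 3 axes of an equilateral triangle, but it can only find axes that lie on its angle grid. `reflection_axes`, `n_angles`, and `tol` are hypothetical.

```python
import math


def reflect(point, theta):
    """Reflect a 2-D point across the line through the origin at angle theta."""
    x, y = point
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return (c * x + s * y, s * x - c * y)


def reflection_axes(points, n_angles=360, tol=1e-6):
    """Angles in [0, pi) whose reflection maps the centred point set onto itself."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    centred = [(x - cx, y - cy) for x, y in points]
    axes = []
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        mirrored = [reflect(p, theta) for p in centred]
        # Symmetric if every mirrored point lands (within tol) on some original point.
        if all(min(math.dist(m, q) for q in centred) < tol for m in mirrored):
            axes.append(theta)
    return axes


square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
triangle = [(0.0, 1.0), (-math.sqrt(3) / 2, -0.5), (math.sqrt(3) / 2, -0.5)]
```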

Paving the Roadway for Safety of Automated Vehicles: An Empirical Study on Testing Challenges


Title	Paving the Roadway for Safety of Automated Vehicles: An Empirical Study on Testing Challenges
Authors	Alessia Knauss, Jan Schröder, Christian Berger, Henrik Eriksson
Abstract	The technology in the area of automated vehicles is gaining speed and promises many advantages. However, with the recent introduction of conditionally automated driving, we have also seen accidents. Test protocols for both conditionally automated (e.g., on highways) and automated vehicles do not exist yet and leave researchers and practitioners with different challenges. For instance, current test procedures do not suffice for fully automated vehicles, which are supposed to be completely in charge of the driving task and have no driver as a backup. This paper presents current challenges of testing the functionality and safety of automated vehicles, derived from focus groups and interviews with 26 participants from five countries who have a background in testing automotive safety-related topics. We provide an overview of the state of practice in testing active safety features, as well as challenges that need to be addressed in the future to ensure safety for automated vehicles. The major challenges identified through the interviews and focus groups, enriched by literature on this topic, are related to 1) virtual testing and simulation, 2) safety, reliability, and quality, 3) sensors and sensor models, 4) required scenario complexity and amount of test cases, and 5) handover of responsibility between the driver and the vehicle.
Tasks
Published	2017-05-09
URL	http://arxiv.org/abs/1708.06988v1
PDF	http://arxiv.org/pdf/1708.06988v1.pdf
PWC	https://paperswithcode.com/paper/paving-the-roadway-for-safety-of-automated
Repo
Framework

A Quasi-isometric Embedding Algorithm


Title	A Quasi-isometric Embedding Algorithm
Authors	David W. Dreisigmeyer
Abstract	The Whitney embedding theorem gives an upper bound on the smallest embedding dimension of a manifold. If a data set lies on a manifold, a random projection into this reduced dimension will retain the manifold structure. Here we present an algorithm to find a projection that distorts the data as little as possible.
Tasks
Published	2017-09-06
URL	http://arxiv.org/abs/1709.01972v3
PDF	http://arxiv.org/pdf/1709.01972v3.pdf
PWC	https://paperswithcode.com/paper/a-quasi-isometric-embedding-algorithm
Repo
Framework
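
The abstract contrasts a random projection with a projection chosen to distort the data as little as possible. This sketch only measures distortion. It defines distortion, as an assumption, as the ratio of the largest to the smallest pairwise distance ratio ‖yᵢ − yⱼ‖ / ‖xᵢ − xⱼ‖, so 1.0 means the map is an isometry up to a global scale. It evaluates a random Gaussian projection as a baseline and does not implement the paper's search for a better projection. The data, dimensions, and the `distortion` helper are made up for illustration.

```python
from itertools import combinations

import numpy as np


def distortion(X, Y):
    """Max/min of pairwise distance ratios ||Y_i - Y_j|| / ||X_i - X_j||; 1.0 is the ideal."""
    ratios = [
        np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        for i, j in combinations(range(len(X)), 2)
    ]
    return max(ratios) / min(ratios)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))  # made-up data: 50 points in 20 dimensions
P = rng.normal(size=(20, 5)) / np.sqrt(5)  # random Gaussian projection down to 5 dimensions
print(distortion(X, X @ P))  # >= 1.0; an algorithm like the paper's would aim to lower this
```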

autoBagging: Learning to Rank Bagging Workflows with Metalearning


Title	autoBagging: Learning to Rank Bagging Workflows with Metalearning
Authors	Fábio Pinto, Vítor Cerqueira, Carlos Soares, João Mendes-Moreira
Abstract	Machine Learning (ML) has been successfully applied to a wide range of domains and applications. One of the techniques behind most of these successful applications is Ensemble Learning (EL), the field of ML that gave birth to methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the market scarcity of ML experts, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to answer these needs. Typically, these systems rely on optimization techniques such as Bayesian optimization to guide the search for the best model. Our approach differs from these systems by making use of the most recent advances in metalearning and a learning to rank approach to learn from metadata. We propose autoBagging, an autoML system that automatically ranks 63 bagging workflows by exploiting past performance and dataset characterization. Results on 140 classification datasets from the OpenML platform show that autoBagging can yield better performance than the Average Rank method and achieve results that are not statistically different from an ideal model that systematically selects the best workflow for each dataset. For the purpose of reproducibility and generalizability, autoBagging is publicly available as an R package on CRAN.
Tasks	AutoML, Learning-To-Rank
Published	2017-06-28
URL	http://arxiv.org/abs/1706.09367v1
PDF	http://arxiv.org/pdf/1706.09367v1.pdf
PWC	https://paperswithcode.com/paper/autobagging-learning-to-rank-bagging
Repo
Framework
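
The Average Rank baseline named in the abstract is commonly defined as follows: rank every workflow on each past dataset, average those ranks, and recommend the workflow with the best mean rank. Here is a minimal sketch assuming that definition. Higher scores are better, and tied scores share their mean rank. `ranks_desc` and `average_rank` are hypothetical names, not the API of the autoBagging R package.

```python
def ranks_desc(row):
    """Rank scores so the largest gets rank 1; tied scores share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1  # extend the block of tied scores
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def average_rank(scores):
    """scores[d][w] is the performance of workflow w on dataset d.

    Returns (workflow indices from best to worst mean rank, list of mean ranks).
    """
    all_ranks = [ranks_desc(row) for row in scores]
    n_workflows = len(scores[0])
    mean = [sum(r[w] for r in all_ranks) / len(all_ranks) for w in range(n_workflows)]
    return sorted(range(n_workflows), key=lambda w: mean[w]), mean
```

Average Rank ignores the characteristics of the new dataset. autoBagging's metalearning approach, as described in the abstract, conditions the ranking on dataset characterization instead.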

Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications


Title	Ranking to Learn and Learning to Rank: On the Role of Ranking in Pattern Recognition Applications
Authors	Giorgio Roffo
Abstract	The last decade has seen a revolution in the theory and application of machine learning and pattern recognition. Through these advancements, variable ranking has emerged as an active and growing research area, and it is now beginning to be applied to many new problems. The rationale behind this is that many pattern recognition problems are by nature ranking problems. The main objective of a ranking algorithm is to sort objects according to some criteria so that the most relevant items appear early in the produced result list. Ranking methods can be analyzed from two different methodological perspectives: ranking to learn and learning to rank. The former aims at studying methods and techniques to sort objects for improving the accuracy of a machine learning model. Enhancing a model's performance can be challenging at times. For example, in pattern classification tasks, different data representations can complicate and hide the different explanatory factors of variation behind the data. In particular, hand-crafted features contain many cues that are either redundant or irrelevant, which reduce the overall accuracy of the classifier. In such cases, feature selection is used: by producing ranked lists of features, it helps filter out the unwanted information. Moreover, in real-time systems (e.g., visual trackers), ranking approaches are used as optimization procedures that improve the robustness of a system dealing with the high variability of image streams that change over time. Conversely, learning to rank is necessary in the construction of ranking models for information retrieval, biometric authentication, re-identification, and recommender systems. In this context, the ranking model's purpose is to sort objects according to their degrees of relevance, importance, or preference as defined in the specific application.
Tasks	Feature Selection, Information Retrieval, Learning-To-Rank, Recommendation Systems
Published	2017-06-01
URL	http://arxiv.org/abs/1706.05933v1
PDF	http://arxiv.org/pdf/1706.05933v1.pdf
PWC	https://paperswithcode.com/paper/ranking-to-learn-and-learning-to-rank-on-the
Repo
Framework
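
As a concrete instance of "ranking to learn", the abstract describes feature selection that produces a ranked list of features. Here is a minimal filter-style sketch that ranks features by the absolute Pearson correlation of each feature with the target. The choice of scoring criterion is an assumption for illustration; the thesis covers other ranking methods. `pearson` and `rank_features` are hypothetical helpers.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(columns, target):
    """Feature indices sorted from most to least |correlation| with the target."""
    scores = [abs(pearson(col, target)) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


target = [1, 2, 3, 4, 5]
columns = [
    [5, 3, 4, 1, 2],       # partially informative (r = -0.8)
    [2, 4, 6, 8, 10],      # perfectly correlated
    [1, 1, 1, 1, 1],       # constant, so uninformative
    [-1, -2, -3, -4, -5],  # perfectly anti-correlated, equally useful
]
ranking = rank_features(columns, target)
```

A cutoff on this ranked list, such as keeping the top-k features, is what filters out the redundant or irrelevant cues before training the classifier.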

Transfer Learning by Ranking for Weakly Supervised Object Annotation


Title	Transfer Learning by Ranking for Weakly Supervised Object Annotation
Authors	Zhiyuan Shi, Parthipan Siva, Tao Xiang
Abstract	Most existing approaches to training object detectors rely on fully supervised learning, which requires the tedious manual annotation of object location in a training set. Recently there has been increasing interest in developing weakly supervised approaches to detector training, where the object location is not manually annotated but automatically determined based on binary (weak) labels indicating whether a training image contains the object. This is a challenging problem because each image can contain many candidate object locations that partially overlap the object of interest. Existing approaches focus on how best to utilise the binary labels for object location annotation. In this paper we propose to solve this problem from a very different perspective by casting it as a transfer learning problem. Specifically, we formulate a novel transfer learning approach based on learning to rank, which effectively transfers a model for automatic annotation of object location from an auxiliary dataset to a target dataset with completely unrelated object categories. We show that our approach outperforms existing state-of-the-art weakly supervised approaches to annotating objects in the challenging VOC dataset.
Tasks	Learning-To-Rank, Transfer Learning
Published	2017-05-02
URL	http://arxiv.org/abs/1705.00873v1
PDF	http://arxiv.org/pdf/1705.00873v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-by-ranking-for-weakly
Repo
Framework