April 2, 2020

Paper Group ANR 244

Complexity Measures and Features for Times Series classification

Title Complexity Measures and Features for Times Series classification
Authors Francisco J. Baldán, José M. Benítez
Abstract Classification of time series is a growing problem in different disciplines due to the progressive digitalization of the world. Currently, the state of the art in time series classification is dominated by Collective of Transformation-Based Ensembles. This algorithm is composed of several classifiers of diverse nature that are combined according to their results in an internal cross validation procedure. Its high complexity prevents it from being applied to large datasets. One Nearest Neighbours with Dynamic Time Warping remains the base classifier in any time series classification problem, for its simplicity and good results. Despite their good performance, they share a weakness, which is that they are not interpretable. In the field of time series classification, there is a tradeoff between accuracy and interpretability. In this work, we propose a set of characteristics capable of extracting information of the structure of the time series in order to face time series classification problems. The use of these characteristics allows the use of traditional classification algorithms in time series problems. The experimental results demonstrate a statistically significant improvement in the accuracy of the results obtained by our proposal with respect to the original time series. Apart from the improvement in accuracy, our proposal is able to offer interpretable results based on the set of characteristics proposed.
Tasks Time Series, Time Series Classification
Published 2020-02-27
URL https://arxiv.org/abs/2002.12036v2
PDF https://arxiv.org/pdf/2002.12036v2.pdf
PWC https://paperswithcode.com/paper/complexity-measures-and-features-for-times
Repo
Framework
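
The abstract above names One Nearest Neighbour with Dynamic Time Warping as the usual baseline classifier. The following is a minimal sketch of that baseline, not the authors' code: the function names `dtw_distance` and `predict_1nn`, the squared-difference local cost, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D series (squared local cost)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def predict_1nn(train_series, train_labels, query):
    """Return the label of the training series closest to the query under DTW."""
    distances = [dtw_distance(series, query) for series in train_series]
    return train_labels[int(np.argmin(distances))]


# Hypothetical toy data: one class with a spike, one flat class.
train_series = [[0, 0, 1, 0, 0], [1, 1, 1, 1, 1]]
train_labels = ["spike", "flat"]
print(predict_1nn(train_series, train_labels, [0, 1, 0, 0, 0]))
```

The quadratic cost of filling the alignment table for every training series is one reason such baselines become expensive on large datasets, and the prediction itself offers no feature-level explanation, which is the interpretability gap the abstract describes.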

Clustering based on Point-Set Kernel

Title Clustering based on Point-Set Kernel
Authors Kai Ming Ting, Jonathan R. Wells, Ye Zhu
Abstract Measuring similarity between two objects is the core operation that existing cluster analyses use to group similar objects into clusters. Cluster analyses have been applied to a number of applications, including image segmentation, social network analysis, and computational biology. This paper introduces a new similarity measure called point-set kernel which computes the similarity between an object and a sample of objects generated from an unknown distribution. The proposed clustering procedure utilizes this new measure to characterize both the typical point of every cluster and the cluster grown from the typical point. We show that the new clustering procedure is both effective and efficient such that it can deal with large-scale datasets. In contrast, existing clustering algorithms are either efficient or effective; and even efficient ones have difficulty dealing with large-scale datasets without special hardware. We show that the proposed algorithm is more effective and runs orders of magnitude faster than the state-of-the-art density-peak clustering and scalable kernel k-means clustering when applied to datasets of millions of data points, on commonly used computing machines.
Tasks Semantic Segmentation
Published 2020-02-14
URL https://arxiv.org/abs/2002.05815v1
PDF https://arxiv.org/pdf/2002.05815v1.pdf
PWC https://paperswithcode.com/paper/clustering-based-on-point-set-kernel
Repo
Framework
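
The idea of a similarity between one object and a sample of objects can be sketched as the average of a point-to-point kernel over the sample. The abstract does not say which underlying kernel the paper uses, so the Gaussian kernel, the `bandwidth` parameter, and the function name `point_set_similarity` below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def point_set_similarity(x, sample, bandwidth=1.0):
    """Average Gaussian-kernel similarity between a point x and a set of points.

    The Gaussian kernel is an illustrative assumption, not the paper's kernel.
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # Squared Euclidean distance from x to every point in the sample.
    sq_dists = np.sum((sample - x) ** 2, axis=1)
    return float(np.mean(np.exp(-sq_dists / (2.0 * bandwidth ** 2))))


# Hypothetical sample drawn around the origin.
rng = np.random.default_rng(0)
sample = rng.normal(0.0, 0.5, size=(200, 2))
print(point_set_similarity([0.0, 0.0], sample))  # point inside the sample
print(point_set_similarity([5.0, 5.0], sample))  # point far from the sample
```

A point with high similarity to the whole sample is a natural candidate for a "typical point" in the sense of the abstract, while points far from the sample score close to zero.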

StochasticRank: Global Optimization of Scale-Free Discrete Functions

Title StochasticRank: Global Optimization of Scale-Free Discrete Functions
Authors Aleksei Ustimenko, Liudmila Prokhorenkova
Abstract In this paper, we introduce a powerful and efficient framework for the direct optimization of ranking metrics. The problem is ill-posed due to the discrete structure of the loss, and to deal with that, we introduce two important techniques: a stochastic smoothing and a novel gradient estimate based on partial integration. We also address the problem of smoothing bias and present a universal solution for a proper debiasing. To guarantee the global convergence of our method, we adopt a recently proposed Stochastic Gradient Langevin Boosting algorithm. Our algorithm is implemented as a part of the CatBoost gradient boosting library and outperforms the existing approaches on several learning to rank datasets. In addition to ranking metrics, our framework applies to any scale-free discrete loss function.
Tasks Learning-To-Rank
Published 2020-03-04
URL https://arxiv.org/abs/2003.02122v1
PDF https://arxiv.org/pdf/2003.02122v1.pdf
PWC https://paperswithcode.com/paper/stochasticrank-global-optimization-of-scale
Repo
Framework
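
To make the terms "discrete", "scale-free", and "stochastic smoothing" concrete, here is a minimal sketch using DCG as the ranking metric. It only shows a Monte Carlo average of the metric under Gaussian score noise. It does not implement the paper's gradient estimate, debiasing, or the CatBoost implementation, and the function names, the choice of DCG, and the noise model are assumptions.

```python
import numpy as np


def dcg(scores, relevance):
    """Discounted cumulative gain of the ranking induced by the scores."""
    scores = np.asarray(scores, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    order = np.argsort(-scores)  # highest score first
    gains = 2.0 ** relevance[order] - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(scores) + 2))
    return float(np.sum(gains * discounts))


def smoothed_dcg(scores, relevance, sigma=0.1, n_samples=1000, seed=0):
    """Monte Carlo estimate of the expected DCG under Gaussian score noise."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_samples, len(scores)))
    return float(np.mean([dcg(scores + eps, relevance) for eps in noise]))


scores = np.array([3.0, 2.0, 1.0])
relevance = np.array([2.0, 1.0, 0.0])
print(dcg(scores, relevance), dcg(10.0 * scores, relevance))  # same value: scale-free
print(smoothed_dcg(scores, relevance, sigma=1.0))
```

The metric depends only on the ordering of the scores, so it is piecewise constant and unchanged by positive rescaling. Averaging over noise replaces it with a function that varies smoothly with the scores, at the cost of the smoothing bias the abstract mentions.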

ConBO: Conditional Bayesian Optimization

Title ConBO: Conditional Bayesian Optimization
Authors Michael Pearce, Janis Klaise, Matthew Groves
Abstract Bayesian optimization is a class of data-efficient, model-based algorithms typically focused on global optimization. We consider the more general case where a user is faced with multiple problems that each need to be optimized conditional on a state variable. For example, we optimize the location of ambulances conditioned on patient distribution, given a range of cities with different patient distributions. Similarity across objectives boosts optimization of each objective in two ways: in modelling by data sharing across objectives, and also in acquisition by quantifying how all objectives benefit from a single point on one objective. For this we propose ConBO, a novel, efficient algorithm based on a new hybrid Knowledge Gradient method that outperforms recently published works on synthetic and real-world problems and is easily parallelized to collect a batch of points.
Tasks
Published 2020-02-23
URL https://arxiv.org/abs/2002.09996v1
PDF https://arxiv.org/pdf/2002.09996v1.pdf
PWC https://paperswithcode.com/paper/conbo-conditional-bayesian-optimization
Repo
Framework

Applications of deep learning in stock market prediction: recent progress

Title Applications of deep learning in stock market prediction: recent progress
Authors Weiwei Jiang
Abstract Stock market prediction has been a classical yet challenging problem that attracts attention from both economists and computer scientists. With the purpose of building an effective prediction model, both linear and machine learning tools have been explored for the past couple of decades. Lately, deep learning models have been introduced as new frontiers for this topic, and development is too rapid to keep up with. Hence, our motivation for this survey is to give an up-to-date review of recent works on deep learning models for stock market prediction. We categorize not only the different data sources, various neural network structures, and commonly used evaluation metrics, but also the implementation and reproducibility. Our goal is to help interested researchers keep up with the latest progress and easily reproduce previous studies as baselines. Based on the summary, we also highlight some future research directions in this topic.
Tasks Stock Market Prediction
Published 2020-02-29
URL https://arxiv.org/abs/2003.01859v1
PDF https://arxiv.org/pdf/2003.01859v1.pdf
PWC https://paperswithcode.com/paper/applications-of-deep-learning-in-stock-market
Repo
Framework

Segmenting Unseen Industrial Components in a Heavy Clutter Using RGB-D Fusion and Synthetic Data

Title Segmenting Unseen Industrial Components in a Heavy Clutter Using RGB-D Fusion and Synthetic Data
Authors Seunghyeok Back, Jongwon Kim, Raeyong Kang, Seungjun Choi, Kyoobin Lee
Abstract Segmentation of unseen industrial parts is essential for autonomous industrial systems. However, industrial components are texture-less, reflective, and often found in cluttered and unstructured environments with heavy occlusion, which makes it more challenging to deal with unseen objects. To tackle this problem, we propose a synthetic data generation pipeline that randomizes textures via domain randomization to focus on the shape information. In addition, we propose an RGB-D Fusion Mask R-CNN with a confidence map estimator, which exploits reliable depth information in multiple feature levels. We transferred the trained model to real-world scenarios and evaluated its performance by making comparisons with baselines and ablation studies. We demonstrate that our methods, which use only synthetic data, could be effective solutions for unseen industrial components segmentation.
Tasks Synthetic Data Generation
Published 2020-02-10
URL https://arxiv.org/abs/2002.03501v2
PDF https://arxiv.org/pdf/2002.03501v2.pdf
PWC https://paperswithcode.com/paper/segmenting-unseen-industrial-components-in-a
Repo
Framework

Data Pre-Processing and Evaluating the Performance of Several Data Mining Methods for Predicting Irrigation Water Requirement

Title Data Pre-Processing and Evaluating the Performance of Several Data Mining Methods for Predicting Irrigation Water Requirement
Authors Mahmood A. Khan, Md Zahidul Islam, Mohsin Hafeez
Abstract Recent drought and population growth are placing unprecedented demand on the limited water resources available. Irrigated agriculture is one of the major consumers of freshwater. A large amount of water in irrigated agriculture is wasted due to poor water management practices. To improve water management in irrigated areas, models for estimation of future water requirements are needed. Developing a model for forecasting irrigation water demand can improve water management practices and maximise water productivity. Data mining can be used effectively to build such models. In this study, we prepare a dataset containing information on suitable attributes for forecasting irrigation water demand. The data is obtained from three different sources, namely meteorological data, remote sensing images and water delivery statements. In order to make the prepared dataset useful for demand forecasting and pattern extraction, we pre-process the dataset using a novel approach based on a combination of irrigation and data mining knowledge. We then apply and compare the effectiveness of different data mining methods, namely decision tree (DT), artificial neural networks (ANNs), systematically developed forest (SysFor) for multiple trees, support vector machine (SVM), logistic regression, and the traditional Evapotranspiration (ETc) methods, and evaluate the performance of these models in predicting irrigation water demand. Our experimental results indicate the usefulness of data pre-processing and the effectiveness of different classifiers. Among the six methods we used, SysFor produces the best prediction with 97.5% accuracy, followed by a decision tree with 96% and ANN with 95%, closely matching the predictions with actual water usage. Therefore, we recommend using SysFor and DT models for irrigation water demand forecasting.
Tasks
Published 2020-03-01
URL https://arxiv.org/abs/2003.00411v1
PDF https://arxiv.org/pdf/2003.00411v1.pdf
PWC https://paperswithcode.com/paper/data-pre-processing-and-evaluating-the
Repo
Framework

Stochastic Coordinate Minimization with Progressive Precision for Stochastic Convex Optimization

Title Stochastic Coordinate Minimization with Progressive Precision for Stochastic Convex Optimization
Authors Sudeep Salgia, Qing Zhao, Sattar Vakili
Abstract A framework based on iterative coordinate minimization (CM) is developed for stochastic convex optimization. Given that exact coordinate minimization is impossible due to the unknown stochastic nature of the objective function, the crux of the proposed optimization algorithm is an optimal control of the minimization precision in each iteration. We establish the optimal precision control and the resulting order-optimal regret performance for strongly convex and separably nonsmooth functions. An interesting finding is that the optimal progression of precision across iterations is independent of the low-dimensional CM routine employed, suggesting a general framework for extending low-dimensional optimization routines to high-dimensional problems. The proposed algorithm is amenable to online implementation and inherits the scalability and parallelizability properties of CM for large-scale optimization. Requiring only a sublinear order of message exchanges, it also lends itself well to distributed computing as compared with the alternative approach of coordinate gradient descent.
Tasks
Published 2020-03-11
URL https://arxiv.org/abs/2003.05482v1
PDF https://arxiv.org/pdf/2003.05482v1.pdf
PWC https://paperswithcode.com/paper/stochastic-coordinate-minimization-with
Repo
Framework
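
For orientation, the sketch below shows plain cyclic coordinate minimization on a noise-free quadratic, where each one-dimensional minimization can be solved exactly. This is the deterministic baseline that the abstract says is unavailable in the stochastic setting. It does not implement the paper's precision control, and the objective, function name, and sweep count are assumptions chosen for illustration.

```python
import numpy as np


def coordinate_minimization(A, b, n_sweeps=50):
    """Minimize f(x) = 0.5 * x^T A x - b^T x one coordinate at a time.

    A is assumed symmetric positive definite, so each 1-D problem has a
    closed-form minimizer and the sweeps converge to the solution of A x = b.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # Contribution of all other coordinates to the i-th gradient entry.
            off_diag = A[i] @ x - A[i, i] * x[i]
            # Exact minimizer of f along coordinate i with the others fixed.
            x[i] = (b[i] - off_diag) / A[i, i]
    return x


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(coordinate_minimization(A, b))
```

In the stochastic setting described in the abstract, the exact update on each coordinate would have to be replaced by an approximate one-dimensional routine, and how precisely that routine is run at each iteration is what the paper's analysis controls.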

Speaker Identification using EEG

Title Speaker Identification using EEG
Authors Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
Abstract In this paper we explore speaker identification using electroencephalography (EEG) signals. The performance of speaker identification systems degrades in the presence of background noise. This paper demonstrates that EEG features can be used to enhance the performance of speaker identification systems operating in the presence and absence of background noise. The paper further demonstrates that, in the presence of high background noise, a speaker identification system using only EEG features as input performs better than a system using only acoustic features as input.
Tasks EEG, Speaker Identification
Published 2020-03-07
URL https://arxiv.org/abs/2003.04733v1
PDF https://arxiv.org/pdf/2003.04733v1.pdf
PWC https://paperswithcode.com/paper/speaker-identification-using-eeg
Repo
Framework

A Multi-Source Entity-Level Sentiment Corpus for the Financial Domain: The FinLin Corpus

Title A Multi-Source Entity-Level Sentiment Corpus for the Financial Domain: The FinLin Corpus
Authors Tobias Daudert
Abstract We introduce FinLin, a novel corpus containing investor reports, company reports, news articles, and microblogs from StockTwits, targeting multiple entities stemming from the automobile industry and covering a 3-month period. FinLin was annotated with a sentiment score and a relevance score in the range [-1.0, 1.0] and [0.0, 1.0], respectively. The annotations also include the text spans selected for the sentiment, thus providing additional insight into the annotators’ reasoning. Overall, FinLin aims to complement the current knowledge by providing a novel and publicly available financial sentiment corpus and to foster research on the topic of financial sentiment analysis and potential applications in behavioural science.
Tasks Sentiment Analysis
Published 2020-03-09
URL https://arxiv.org/abs/2003.04073v1
PDF https://arxiv.org/pdf/2003.04073v1.pdf
PWC https://paperswithcode.com/paper/a-multi-source-entity-level-sentiment-corpus
Repo
Framework

Quality of Word Embeddings on Sentiment Analysis Tasks

Title Quality of Word Embeddings on Sentiment Analysis Tasks
Authors Erion Çano, Maurizio Morisio
Abstract Word embeddings, or distributed representations of words, are being used in various applications like machine translation, sentiment analysis, and topic identification. The quality of word embeddings and the performance of their applications depend on several factors like training method, corpus size, and relevance. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, Twitter Tweets is the best on lyrics sentiment analysis, whereas Google News and Common Crawl are the top performers on movie polarity analysis. GloVe-trained models slightly outperform those trained with Skipgram. Also, factors like topic relevance and size of corpus significantly impact the quality of the models. When medium or large-sized text sets are available, obtaining word embeddings from the same training dataset is usually the best choice.
Tasks Machine Translation, Sentiment Analysis, Word Embeddings
Published 2020-03-06
URL https://arxiv.org/abs/2003.03264v1
PDF https://arxiv.org/pdf/2003.03264v1.pdf
PWC https://paperswithcode.com/paper/quality-of-word-embeddings-on-sentiment
Repo
Framework

Supervised Speaker Embedding De-Mixing in Two-Speaker Environment

Title Supervised Speaker Embedding De-Mixing in Two-Speaker Environment
Authors Yanpei Shi, Thomas Hain
Abstract In this work, a speaker embedding de-mixing approach is proposed. Instead of separating a two-speaker signal in signal space, as in speech source separation, the proposed approach separates different speaker properties from a two-speaker signal in embedding space. The proposed approach contains two steps. In step one, the clean speaker embeddings are learned and collected by a residual TDNN based network. In step two, the two-speaker signal and the embedding of one of the speakers are input to a speaker embedding de-mixing network. The de-mixing network is trained with a reconstruction loss to generate the embedding of the other speaker. Speaker identification accuracy on the de-mixed speaker embeddings is used to evaluate the quality of the obtained embeddings. Experiments are done on two kinds of data: artificially augmented two-speaker data (TIMIT) and real-world recordings of two-speaker data (MC-WSJ). Six different speaker embedding de-mixing architectures are investigated. Compared with the speaker identification accuracy on the clean speaker embeddings (98.5%), the obtained results show that one of the speaker embedding de-mixing architectures obtains close performance, reaching 96.9% test accuracy on TIMIT when the SNR between the target speaker and interfering speaker is 5 dB. More surprisingly, we found that choosing a simple subtraction as the embedding de-mixing function could obtain the second best performance, reaching 95.2% test accuracy.
Tasks Speaker Identification
Published 2020-01-14
URL https://arxiv.org/abs/2001.06397v1
PDF https://arxiv.org/pdf/2001.06397v1.pdf
PWC https://paperswithcode.com/paper/supervised-speaker-embedding-de-mixing-in-two
Repo
Framework

DDU-Nets: Distributed Dense Model for 3D MRI Brain Tumor Segmentation

Title DDU-Nets: Distributed Dense Model for 3D MRI Brain Tumor Segmentation
Authors Hanxiao Zhang, Jingxiong Li, Mali Shen, Yaqi Wang, Guang-Zhong Yang
Abstract Segmentation of brain tumors and their subregions remains a challenging task due to their weak features and deformable shapes. In this paper, three patterns (cross-skip, skip-1 and skip-2) of distributed dense connections (DDCs) are proposed to enhance feature reuse and propagation of CNNs by constructing tunnels between key layers of the network. For better detecting and segmenting brain tumors from multi-modal 3D MR images, CNN-based models embedded with DDCs (DDU-Nets) are trained efficiently from pixel to pixel with a limited number of parameters. Postprocessing is then applied to refine the segmentation results by reducing the false-positive samples. The proposed method is evaluated on the BraTS 2019 dataset with results demonstrating the effectiveness of the DDU-Nets while requiring less computational cost.
Tasks Brain Tumor Segmentation
Published 2020-03-03
URL https://arxiv.org/abs/2003.01337v1
PDF https://arxiv.org/pdf/2003.01337v1.pdf
PWC https://paperswithcode.com/paper/ddu-nets-distributed-dense-model-for-3d-mri
Repo
Framework

Brain Tumor Segmentation by Cascaded Deep Neural Networks Using Multiple Image Scales

Title Brain Tumor Segmentation by Cascaded Deep Neural Networks Using Multiple Image Scales
Authors Zahra Sobhaninia, Safiyeh Rezaei, Nader Karimi, Ali Emami, Shadrokh Samavi
Abstract Intracranial tumors are groups of cells that usually grow uncontrollably. One out of four cancer deaths is due to brain tumors. Early detection and evaluation of brain tumors is an essential preventive medical step that is performed by magnetic resonance imaging (MRI). Many segmentation techniques exist for this purpose. Low segmentation accuracy is the main drawback of existing methods. In this paper, we use a deep learning method to boost the accuracy of tumor segmentation in MR images. A cascade approach is used with multiple scales of images to induce both local and global views and help the network reach higher accuracies. Our experimental results show that using multiple scales and two cascade networks is advantageous.
Tasks Brain Tumor Segmentation
Published 2020-02-05
URL https://arxiv.org/abs/2002.01975v1
PDF https://arxiv.org/pdf/2002.01975v1.pdf
PWC https://paperswithcode.com/paper/brain-tumor-segmentation-by-cascaded-deep
Repo
Framework

Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results

Title Multi-site fMRI Analysis Using Privacy-preserving Federated Learning and Domain Adaptation: ABIDE Results
Authors Xiaoxiao Li, Yufeng Gu, Nicha Dvornek, Lawrence Staib, Pamela Ventola, James S. Duncan
Abstract Deep learning models have shown their advantage in many different tasks, including neuroimage analysis. However, to effectively train a high-quality deep learning model, the aggregation of a significant amount of patient information is required. The time and cost for acquisition and annotation in assembling, for example, large fMRI datasets make it difficult to acquire large numbers at a single site. However, due to the need to protect the privacy of patient data, it is hard to assemble a central database from multiple institutions. Federated learning allows for population-level models to be trained without centralizing entities’ data by transmitting the global model to local entities, training the model locally, and then averaging the gradients or weights in the global model. However, some studies suggest that private information can be recovered from the model gradients or weights. In this work, we address the problem of multi-site fMRI classification with a privacy-preserving strategy. To solve the problem, we propose a federated learning approach, where a decentralized iterative optimization algorithm is implemented and shared local model weights are altered by a randomization mechanism. Considering the systemic differences of fMRI distributions from different sites, we further propose two domain adaptation methods in this federated learning formulation. We investigate various practical aspects of federated model optimization and compare federated learning with alternative training strategies. Overall, our results demonstrate that it is promising to utilize multi-site data without data sharing to boost neuroimage analysis performance and find reliable disease-related biomarkers. Our proposed pipeline can be generalized to other privacy-sensitive medical data analysis problems.
Tasks Domain Adaptation
Published 2020-01-16
URL https://arxiv.org/abs/2001.05647v1
PDF https://arxiv.org/pdf/2001.05647v1.pdf
PWC https://paperswithcode.com/paper/multi-site-fmri-analysis-using-privacy
Repo
Framework
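
The federated step described in the abstract (train locally, alter the shared weights with a randomization mechanism, then average them into the global model) can be sketched as follows. The abstract does not specify the mechanism, so the additive Gaussian noise, the `noise_std` parameter, and the function name `federated_round` are assumptions. Local training is omitted and the weights are hypothetical.

```python
import numpy as np


def federated_round(local_weights, noise_std=0.0, seed=0):
    """Average per-site weight vectors after perturbing each with Gaussian noise.

    The Gaussian perturbation is an illustrative stand-in for the paper's
    randomization mechanism, which the abstract does not describe in detail.
    """
    rng = np.random.default_rng(seed)
    local_weights = [np.asarray(w, dtype=float) for w in local_weights]
    # Each site perturbs its own weights before sharing them.
    shared = [w + rng.normal(0.0, noise_std, size=w.shape) for w in local_weights]
    # The global model is the element-wise mean of the shared weights.
    return np.mean(shared, axis=0)


# Hypothetical weights from two sites after local training.
site_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(federated_round(site_weights))                 # plain averaging
print(federated_round(site_weights, noise_std=0.1))  # averaging with randomization
```

Only weight vectors leave each site in this scheme, never the underlying fMRI data. Larger noise makes it harder to recover private information from the shared weights but also perturbs the averaged model more.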