April 2, 2020

Paper Group ANR 165

Adaptive exponential power distribution with moving estimator for nonstationary time series. Deep Technology Tracing for High-tech Companies. Scattering Features for Multimodal Gait Recognition. Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning. Correlated Feature Selection with Ex …

Adaptive exponential power distribution with moving estimator for nonstationary time series


Title	Adaptive exponential power distribution with moving estimator for nonstationary time series
Authors	Jarek Duda
Abstract	While standard estimation assumes that all datapoints are from probability distribution of the same fixed parameters $\theta$, we will focus on maximum likelihood (ML) adaptive estimation for nonstationary time series: separately estimating parameters $\theta_T$ for each time $T$ based on the earlier values $(x_t)_{t<T}$ using (exponential) moving ML estimator $\theta_T=\arg\max_\theta l_T$ for $l_T=\sum_{t<T} \eta^{T-t} \ln(\rho_\theta (x_t))$ and some $\eta\in(0,1]$. Computational cost of such moving estimator is generally much higher as we need to optimize log-likelihood multiple times, however, in many cases it can be made inexpensive thanks to dependencies. We focus on such example: $\rho(x)\propto \exp(-|(x-\mu)/\sigma|^\kappa/\kappa)$ exponential power distribution (EPD) family, which covers wide range of tail behavior like Gaussian ($\kappa=2$) or Laplace ($\kappa=1$) distribution. It is also convenient for such adaptive estimation of scale parameter $\sigma$ as its standard ML estimation is $\sigma^\kappa$ being average $|x-\mu|^\kappa$. By just replacing average with exponential moving average: $(\sigma_{T+1})^\kappa=\eta(\sigma_T)^\kappa +(1-\eta)|x_T-\mu|^\kappa$ we can inexpensively make it adaptive. It is tested on daily log-return series for DJIA companies, leading to essentially better log-likelihoods than standard (static) estimation, with optimal $\kappa$ tails types varying between companies. Presented general alternative estimation philosophy provides tools which might be useful for building better models for analysis of nonstationary time-series.
Tasks	Time Series
Published	2020-03-04
URL	https://arxiv.org/abs/2003.02149v2
PDF	https://arxiv.org/pdf/2003.02149v2.pdf
PWC	https://paperswithcode.com/paper/adaptive-exponential-power-distribution-with
Repo
Framework
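
The moving estimator described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the author's code: the function names, the default values of `eta`, `kappa` and `sigma0`, the fixed known `mu`, and the toy data are all assumptions made for the example. The normalisation constant $2\sigma\kappa^{1/\kappa}\Gamma(1+1/\kappa)$ follows from integrating the density given in the abstract.

```python
import math

def adaptive_epd_scale(xs, kappa=1.0, eta=0.95, mu=0.0, sigma0=1.0):
    """Return sigma_T for each x_T, estimated only from earlier values.

    Implements (sigma_{T+1})^kappa = eta*(sigma_T)^kappa + (1-eta)*|x_T - mu|^kappa.
    sigma0 is a hypothetical starting value; mu is assumed known and fixed.
    """
    s = sigma0 ** kappa              # running estimate of sigma^kappa
    sigmas = []
    for x in xs:
        sigmas.append(s ** (1.0 / kappa))   # prediction made before seeing x
        s = eta * s + (1.0 - eta) * abs(x - mu) ** kappa
    return sigmas

def epd_log_density(x, mu, sigma, kappa):
    """Log of rho(x) = exp(-|(x-mu)/sigma|^kappa / kappa) / Z,
    with Z = 2 * sigma * kappa^(1/kappa) * Gamma(1 + 1/kappa)."""
    z = abs((x - mu) / sigma) ** kappa / kappa
    log_norm = math.log(2.0 * sigma * kappa ** (1.0 / kappa) * math.gamma(1.0 + 1.0 / kappa))
    return -z - log_norm

# Toy usage: mean log-likelihood of a short series under the adaptive scale.
series = [0.1, -0.3, 0.2, 1.5, -1.2, 0.9, -0.1, 0.05]
scales = adaptive_epd_scale(series, kappa=1.0, eta=0.9)
mean_ll = sum(epd_log_density(x, 0.0, s, 1.0) for x, s in zip(series, scales)) / len(series)
```

Each prediction uses only past observations, so the resulting log-likelihood can be compared with a static fit on the same series.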

Deep Technology Tracing for High-tech Companies


Title	Deep Technology Tracing for High-tech Companies
Authors	Han Wu, Kun Zhang, Guangyi Lv, Qi Liu, Runlong Yu, Weihao Zhao, Enhong Chen, Jianhui Ma
Abstract	Technological change and innovation are vitally important, especially for high-tech companies. However, the factors influencing their future research and development (R&D) trends are both complicated and varied, making technology tracing for high-tech companies quite a difficult task. To this end, in this paper, we develop a novel data-driven solution, i.e., the Deep Technology Forecasting (DTF) framework, to automatically find the most likely technology directions customized to each high-tech company. Specifically, DTF consists of three components: Potential Competitor Recognition (PCR), Collaborative Technology Recognition (CTR), and the Deep Technology Tracing (DTT) neural network. For one thing, PCR and CTR aim to capture competitive relations among enterprises and collaborative relations among technologies, respectively. For another, DTT is designed for modeling dynamic interactions between companies and technologies with the above relations involved. Finally, we evaluate our DTF framework on real-world patent data, and the experimental results clearly prove that DTF can precisely help to prospect future technology emphasis of companies by exploiting hybrid factors.
Tasks
Published	2020-01-02
URL	https://arxiv.org/abs/2001.08606v1
PDF	https://arxiv.org/pdf/2001.08606v1.pdf
PWC	https://paperswithcode.com/paper/deep-technology-tracing-for-high-tech
Repo
Framework

Scattering Features for Multimodal Gait Recognition


Title	Scattering Features for Multimodal Gait Recognition
Authors	Srđan Kitić, Gilles Puy, Patrick Pérez, Philippe Gilberton
Abstract	We consider the problem of identifying people on the basis of their walk (gait) pattern. Classical approaches to tackle this problem are based on, e.g., video recordings or piezoelectric sensors embedded in the floor. In this work, we rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively. The contribution of this work is twofold. First, we propose a feature extraction method based on an (untrained) shallow scattering network, specially tailored for the gait signals. Second, we demonstrate that fusing the two modalities improves identification in the practically relevant open set scenario.
Tasks	Gait Recognition
Published	2020-01-23
URL	https://arxiv.org/abs/2001.08830v1
PDF	https://arxiv.org/pdf/2001.08830v1.pdf
PWC	https://paperswithcode.com/paper/scattering-features-for-multimodal-gait
Repo
Framework

Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning


Title	Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning
Authors	Sripad Krishna Devalla, Tan Hung Pham, Satish Kumar Panda, Liang Zhang, Giridhar Subramanian, Anirudh Swaminathan, Chin Zhi Yun, Mohan Rajan, Sujatha Mohan, Ramaswami Krishnadas, Vijayalakshmi Senthil, John Mark S. de Leon, Tin A. Tun, Ching-Yu Cheng, Leopold Schmetterer, Shamira Perera, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard
Abstract	Since the introduction of optical coherence tomography (OCT), it has been possible to study the complex 3D morphological changes of the optic nerve head (ONH) tissues that occur along with the progression of glaucoma. Although several deep learning (DL) techniques have been recently proposed for the automated extraction (segmentation) and quantification of these morphological changes, the device specific nature and the difficulty in preparing manual segmentations (training data) limit their clinical adoption. With several new manufacturers and next-generation OCT devices entering the market, the complexity in deploying DL algorithms clinically is only increasing. To address this, we propose a DL based 3D segmentation framework that is easily translatable across OCT devices in a label-free manner (i.e. without the need to manually re-segment data for each device). Specifically, we developed 2 sets of DL networks. The first (referred to as the enhancer) was able to enhance OCT image quality from 3 OCT devices, and harmonized image-characteristics across these devices. The second performed 3D segmentation of 6 important ONH tissue layers. We found that the use of the enhancer was critical for our segmentation network to achieve device independency. In other words, our 3D segmentation network trained on any of 3 devices successfully segmented ONH tissue layers from the other two devices with high performance (Dice coefficients > 0.92). With such an approach, we could automatically segment images from new OCT devices without ever needing manual segmentation data from such devices.
Tasks
Published	2020-02-22
URL	https://arxiv.org/abs/2002.09635v1
PDF	https://arxiv.org/pdf/2002.09635v1.pdf
PWC	https://paperswithcode.com/paper/towards-label-free-3d-segmentation-of-optical
Repo
Framework

Correlated Feature Selection with Extended Exclusive Group Lasso


Title	Correlated Feature Selection with Extended Exclusive Group Lasso
Authors	Yuxin Sun, Benny Chain, Samuel Kaski, John Shawe-Taylor
Abstract	In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic insight and conceptual understanding. Lasso and related algorithms have been widely used since their sparse solutions naturally identify a set of informative features. However, Lasso performs erratically when features are correlated. This limits the use of such algorithms in biological problems, where features such as genes often work together in pathways, leading to sets of highly correlated features. In this paper, we examine the performance of a Lasso derivative, the exclusive group Lasso, in this setting. We propose fast algorithms to solve the exclusive group Lasso, and introduce a solution to the case when the underlying group structure is unknown. The solution combines stability selection with random group allocation and introduction of artificial features. Experiments with both synthetic and real-world data highlight the advantages of this proposed methodology over Lasso in comprehensive selection of informative features.
Tasks	Feature Selection
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12460v1
PDF	https://arxiv.org/pdf/2002.12460v1.pdf
PWC	https://paperswithcode.com/paper/correlated-feature-selection-with-extended
Repo
Framework

An Information Diffusion Approach to Rumor Propagation and Identification on Twitter


Title	An Information Diffusion Approach to Rumor Propagation and Identification on Twitter
Authors	Abiola Osho, Caden Waters, George Amariucai
Abstract	With the increasing use of online social networks as a source of news and information, the propensity for a rumor to disseminate widely and quickly poses a great concern, especially in disaster situations where users do not have enough time to fact-check posts before making the informed decision to react to a post that appears to be credible. In this study, we explore the propagation pattern of rumors on Twitter by exploring the dynamics of microscopic-level misinformation spread, based on the latent message and user interaction attributes. We perform supervised learning for feature selection and prediction. Experimental results with real-world data sets give the models’ prediction accuracy at about 90% for the diffusion of both True and False topics. Our findings confirm that rumor cascades run deeper and that rumor masked as news, and messages that incite fear, will diffuse faster than other messages. We show that the models for True and False message propagation differ significantly, both in the prediction parameters and in the message features that govern the diffusion. Finally, we show that the diffusion pattern is an important metric in identifying the credibility of a tweet.
Tasks	Feature Selection
Published	2020-02-24
URL	https://arxiv.org/abs/2002.11104v1
PDF	https://arxiv.org/pdf/2002.11104v1.pdf
PWC	https://paperswithcode.com/paper/an-information-diffusion-approach-to-rumor
Repo
Framework

High-speed Autonomous Drifting with Deep Reinforcement Learning


Title	High-speed Autonomous Drifting with Deep Reinforcement Learning
Authors	Peide Cai, Xiaodong Mei, Lei Tai, Yuxiang Sun, Ming Liu
Abstract	Drifting is a complicated task for autonomous vehicle control. Most traditional methods in this area are based on motion equations derived by the understanding of vehicle dynamics, which is difficult to be modeled precisely. We propose a robust drift controller without explicit motion equations, which is based on the latest model-free deep reinforcement learning algorithm soft actor-critic. The drift control problem is formulated as a trajectory following task, where the error-based state and reward are designed. After being trained on tracks with different levels of difficulty, our controller is capable of making the vehicle drift through various sharp corners quickly and stably in the unseen map. The proposed controller is further shown to have excellent generalization ability, which can directly handle unseen vehicle types with different physical properties, such as mass, tire friction, etc.
Tasks
Published	2020-01-06
URL	https://arxiv.org/abs/2001.01377v1
PDF	https://arxiv.org/pdf/2001.01377v1.pdf
PWC	https://paperswithcode.com/paper/high-speed-autonomous-drifting-with-deep
Repo
Framework

The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics


Title	The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics
Authors	María Óskarsdóttir, Cristián Bravo, Carlos Sarraute, Jan Vanthienen, Bart Baesens
Abstract	Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. The study demonstrates how including call networks, in the context of positive credit information, as a new Big Data source has added value in terms of profit by applying a profit measure and profit-based feature selection. A unique combination of datasets, including call-detail records, credit and debit account information of customers is used to create scorecards for credit card applicants. Call-detail records are used to build call networks and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance when measured in AUC. In terms of profit, the best model is the one built with only calling behavior features. In addition, the calling behavior features are the most predictive in other models, both in terms of statistical and economic performance. The results have an impact in terms of ethical use of call-detail records, regulatory implications, financial inclusion, as well as data sharing and privacy.
Tasks	Feature Selection
Published	2020-02-23
URL	https://arxiv.org/abs/2002.09931v1
PDF	https://arxiv.org/pdf/2002.09931v1.pdf
PWC	https://paperswithcode.com/paper/the-value-of-big-data-for-credit-scoring
Repo
Framework

ZoomCount: A Zooming Mechanism for Crowd Counting in Static Images


Title	ZoomCount: A Zooming Mechanism for Crowd Counting in Static Images
Authors	Usman Sajid, Hasan Sajid, Hongcheng Wang, Guanghui Wang
Abstract	This paper proposes a novel approach for crowd counting in low to high density scenarios in static images. Current approaches cannot handle huge crowd diversity well and thus perform poorly in extreme cases, where the crowd density in different regions of an image is either too low or too high, leading to crowd underestimation or overestimation. The proposed solution is based on the observation that detecting and handling such extreme cases in a specialized way leads to better crowd estimation. Additionally, existing methods find it hard to differentiate between the actual crowd and the cluttered background regions, resulting in further count overestimation. To address these issues, we propose a simple yet effective modular approach, where an input image is first subdivided into fixed-size patches and then fed to a four-way classification module labeling each image patch as low, medium, high-dense or no-crowd. This module also provides a count for each label, which is then analyzed via a specifically devised novel decision module to decide whether the image belongs to any of the two extreme cases (very low or very high density) or a normal case. Images, specified as high- or low-density extreme or a normal case, pass through dedicated zooming or normal patch-making blocks respectively before routing to the regressor in the form of fixed-size patches for crowd estimate. Extensive experimental evaluations demonstrate that the proposed approach outperforms the state-of-the-art methods on four benchmarks under most of the evaluation criteria.
Tasks	Crowd Counting
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12256v1
PDF	https://arxiv.org/pdf/2002.12256v1.pdf
PWC	https://paperswithcode.com/paper/zoomcount-a-zooming-mechanism-for-crowd
Repo
Framework

Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language


Title	Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language
Authors	Kohei Matsuura, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Abstract	Ainu is an unwritten language that has been spoken by the Ainu people, who are one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documentation of its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to save their culture, only a quite limited part of them has been transcribed so far. Thus, we started a project of automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report speech corpus development and the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were about 60% and over 85% respectively in the speaker-open condition. Furthermore, word and phone accuracies of 80% and 90% have been achieved in a speaker-closed setting. We also found that multilingual ASR training with additional speech corpora of English and Japanese further improves the speaker-open test accuracy.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2020-02-16
URL	https://arxiv.org/abs/2002.06675v2
PDF	https://arxiv.org/pdf/2002.06675v2.pdf
PWC	https://paperswithcode.com/paper/speech-corpus-of-ainu-folklore-and-end-to-end
Repo
Framework

Small energy masking for improved neural network training for end-to-end speech recognition


Title	Small energy masking for improved neural network training for end-to-end speech recognition
Authors	Chanwoo Kim, Kwangyoun Kim, Sathish Reddy Indurthi
Abstract	In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs having values below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold. A uniform distribution is employed to randomly generate the ratio of this energy threshold to the peak filterbank energy of each utterance in decibels. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure. This very simple algorithm shows relative Word Error Rate (WER) improvements of 11.2% and 13.5% on the standard LibriSpeech test-clean and test-other sets over the baseline end-to-end speech recognition system. Additionally, compared to the input dropout algorithm, the SEM algorithm shows relative improvements of 7.7% and 11.6% on the same LibriSpeech test-clean and test-other sets. With a modified shallow-fusion technique with a Transformer LM, we obtained a 2.62% WER on the LibriSpeech test-clean set and a 7.87% WER on the LibriSpeech test-other set.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2020-02-15
URL	https://arxiv.org/abs/2002.06312v1
PDF	https://arxiv.org/pdf/2002.06312v1.pdf
PWC	https://paperswithcode.com/paper/small-energy-masking-for-improved-neural
Repo
Framework
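
The masking procedure described in the Small Energy Masking abstract above can be illustrated roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function name, the dB range of the uniform distribution (`min_ratio_db`, `max_ratio_db`), and the use of linear, non-negative filterbank energies stored as a list of frames are all hypothetical choices made for the example.

```python
import random

def small_energy_mask(energies, min_ratio_db=-80.0, max_ratio_db=0.0, rng=None):
    """Mask time-frequency bins below a randomly drawn energy threshold.

    energies: list of frames, each a list of non-negative filterbank energies
              (linear scale is an assumption of this sketch).
    The threshold-to-peak ratio is drawn uniformly in dB from a hypothetical range.
    """
    rng = rng or random.Random()
    ratio_db = rng.uniform(min_ratio_db, max_ratio_db)

    # Threshold relative to the peak energy of the whole utterance.
    peak = max(max(frame) for frame in energies)
    threshold = peak * 10.0 ** (ratio_db / 10.0)

    # Rescale the surviving bins so the total sum of feature values is unchanged.
    total = sum(sum(frame) for frame in energies)
    kept = sum(e for frame in energies for e in frame if e >= threshold)
    scale = total / kept if kept > 0 else 1.0

    return [[e * scale if e >= threshold else 0.0 for e in frame]
            for frame in energies]

# Toy usage on a two-frame "utterance" with three filterbank channels.
utterance = [[1.0, 0.001, 0.5], [0.2, 100.0, 0.00001]]
masked = small_energy_mask(utterance, rng=random.Random(0))
```

Because a new ratio is drawn per call, the same utterance is masked differently on each pass, which is what makes the procedure usable as a training-time augmentation.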

Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR


Title	Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR
Authors	Leda Sarı, Niko Moritz, Takaaki Hori, Jonathan Le Roux
Abstract	We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR). The proposed model contains a memory block that holds speaker i-vectors extracted from the training data and reads relevant i-vectors from the memory through an attention mechanism. The resulting memory vector (M-vector) is concatenated to the acoustic features or to the hidden layer activations of an E2E neural network model. The E2E ASR system is based on the joint connectionist temporal classification and attention-based encoder-decoder architecture. M-vector and i-vector results are compared for inserting them at different layers of the encoder neural network using the WSJ and TED-LIUM2 ASR benchmarks. We show that M-vectors, which do not require an auxiliary speaker embedding extraction system at test time, achieve similar word error rates (WERs) compared to i-vectors for single speaker utterances and significantly lower WERs for utterances in which there are speaker changes.
Tasks	End-To-End Speech Recognition, Speech Recognition
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06165v1
PDF	https://arxiv.org/pdf/2002.06165v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-speaker-adaptation-using
Repo
Framework

Pulsars Detection by Machine Learning with Very Few Features


Title	Pulsars Detection by Machine Learning with Very Few Features
Authors	Haitao Lin, Xiangru Li, Ziying Luo
Abstract	Investigating machine learning (ML) schemes for detecting pulsars is an active topic as the data volume grows exponentially in modern surveys. To improve the detection performance, the input features of an ML model should be investigated specifically. In the existing ML-based pulsar detection research, there are mainly two kinds of feature designs: empirical features and statistical features. Due to the combinational effects of multiple features, however, the available features contain some redundant and even irrelevant components, which can reduce the accuracy of a pulsar detection model. Therefore, it is essential to select a subset of relevant features from the set of available candidate features, a process known as feature selection. In this work, two feature selection algorithms, Grid Search (GS) and Recursive Feature Elimination (RFE), are proposed to improve the detection performance by removing the redundant and irrelevant features. The algorithms were evaluated on the Southern High Time Resolution University survey (HTRU-S) with five pulsar detection models. The experimental results verify the effectiveness and efficiency of our proposed feature selection algorithms. With GS, a model with only two features reaches a recall rate as high as 99% and a false positive rate (FPR) as low as 0.65%; with RFE, another model with only three features achieves a recall rate of 99% and an FPR of 0.16% in pulsar candidate classification. Furthermore, this work investigated the number of features required as well as the pulsars misclassified by our models.
Tasks	Feature Selection
Published	2020-02-20
URL	https://arxiv.org/abs/2002.08519v1
PDF	https://arxiv.org/pdf/2002.08519v1.pdf
PWC	https://paperswithcode.com/paper/pulsars-detection-by-machine-learning-with
Repo
Framework

Inverse Problems, Deep Learning, and Symmetry Breaking


Title	Inverse Problems, Deep Learning, and Symmetry Breaking
Authors	Kshitij Tayal, Chieh-Hsin Lai, Vipin Kumar, Ju Sun
Abstract	In many physical systems, inputs related by intrinsic system symmetries are mapped to the same output. When inverting such systems, i.e., solving the associated inverse problems, there is no unique solution. This causes fundamental difficulties for deploying the emerging end-to-end deep learning approach. Using the generalized phase retrieval problem as an illustrative example, we show that careful symmetry breaking on the training data can help get rid of the difficulties and significantly improve the learning performance. We also extract and highlight the underlying mathematical principle of the proposed solution, which is directly applicable to other inverse problems.
Tasks
Published	2020-03-20
URL	https://arxiv.org/abs/2003.09077v1
PDF	https://arxiv.org/pdf/2003.09077v1.pdf
PWC	https://paperswithcode.com/paper/inverse-problems-deep-learning-and-symmetry
Repo
Framework

Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform


Title	Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform
Authors	Tomohiko Nakamura, Hiroshi Saruwatari
Abstract	We propose a time-domain audio source separation method using down-sampling (DS) and up-sampling (US) layers based on a discrete wavelet transform (DWT). The proposed method is based on one of the state-of-the-art deep neural networks, Wave-U-Net, which successively down-samples and up-samples feature maps. We find that this architecture resembles that of multiresolution analysis, and reveal that the DS layers of Wave-U-Net cause aliasing and may discard information useful for the separation. Although the effects of these problems may be reduced by training, to achieve a more reliable source separation method, we should design DS layers capable of overcoming the problems. With this belief, focusing on the fact that the DWT has an anti-aliasing filter and the perfect reconstruction property, we design the proposed layers. Experiments on music source separation show the efficacy of the proposed method and the importance of simultaneously considering the anti-aliasing filters and the perfect reconstruction property.
Tasks	Music Source Separation
Published	2020-01-28
URL	https://arxiv.org/abs/2001.10190v1
PDF	https://arxiv.org/pdf/2001.10190v1.pdf
PWC	https://paperswithcode.com/paper/time-domain-audio-source-separation-based-on
Repo
Framework
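
The perfect reconstruction property that the Wave-U-Net abstract above relies on can be seen in a one-level discrete wavelet transform. The sketch below uses the Haar wavelet purely as an assumption for illustration (the abstract does not say which wavelet the authors use), and the function names are hypothetical. Down-sampling produces a low-pass and a high-pass band at half the length, and the inverse step recovers the input from both bands.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor

def haar_dwt(signal):
    """One-level Haar DWT: split an even-length signal into two half-length bands."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return low, high

def haar_idwt(low, high):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    out = []
    for a, d in zip(low, high):
        out.append((a + d) * _S)
        out.append((a - d) * _S)
    return out

# Toy usage: down-sample, then up-sample back to the original length.
x = [0.5, -1.0, 2.0, 3.5, 0.0, 0.25]
low_band, high_band = haar_dwt(x)
x_rec = haar_idwt(low_band, high_band)
```

Keeping both bands is what distinguishes this from plain decimation: discarding every other sample loses information, whereas the two bands together carry everything needed to rebuild the signal.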