January 25, 2020

3177 words 15 mins read

Paper Group ANR 1719



Learning a Generic Adaptive Wavelet Shrinkage Function for Denoising

Title Learning a Generic Adaptive Wavelet Shrinkage Function for Denoising
Authors Tobias Alt, Joachim Weickert
Abstract The rise of machine learning in image processing has created a gap between trainable data-driven and classical model-driven approaches: while learning-based models often show superior performance, classical ones are often more transparent. To reduce this gap, we introduce a generic wavelet shrinkage function for denoising which is adaptive to both the wavelet scale and the noise standard deviation. It is inferred from trained results of a tightly parametrised function inherited from nonlinear diffusion. Our proposed shrinkage function is smooth and compact while using only two parameters. In contrast to many existing shrinkage functions, it is able to enhance image structures by amplifying wavelet coefficients. Experiments show that it outperforms classical shrinkage functions by a significant margin.
Tasks Denoising
Published 2019-10-21
URL https://arxiv.org/abs/1910.09234v2
PDF https://arxiv.org/pdf/1910.09234v2.pdf
PWC https://paperswithcode.com/paper/learning-a-generic-adaptive-wavelet-shrinkage
Repo
Framework
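The abstract does not give the paper's learned two-parameter function, but the general idea of a shrinkage map whose threshold adapts to both the wavelet scale and the noise standard deviation can be illustrated with classical soft shrinkage. The threshold rule below (constant `k`, scale factor `2**scale`) is a hypothetical stand-in, not the paper's diffusion-inspired function, which can additionally amplify coefficients.

```python
import math

def soft_shrink(coeff, sigma, scale, k=3.0):
    """Classical soft shrinkage with a scale- and noise-adaptive
    threshold t = k * sigma / sqrt(2**scale). Illustrative only:
    the paper's function is inferred from trained, tightly
    parametrised diffusion models and uses just two parameters."""
    t = k * sigma / math.sqrt(2.0 ** scale)
    if abs(coeff) <= t:
        return 0.0
    # Shrink the magnitude by t, keeping the sign.
    return math.copysign(abs(coeff) - t, coeff)
```

Coefficients below the noise-dependent threshold are zeroed; larger ones are pulled toward zero, which is exactly the behaviour the learned function generalises (and, unlike this sketch, can reverse into amplification).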

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

Title CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Authors Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Abstract Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to the disadvantages of the training conditions. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge. To reduce this gap, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (PatchGAN). We evaluated our method on a non-parallel VC task and analyzed the effect of each technique in detail. An objective evaluation showed that these techniques help bring the converted feature sequence closer to the target in terms of both global and local structures, which we assess by using Mel-cepstral distortion and modulation spectra distance, respectively. A subjective evaluation showed that CycleGAN-VC2 outperforms CycleGAN-VC in terms of naturalness and similarity for every speaker pair, including intra-gender and inter-gender pairs.
Tasks Voice Conversion
Published 2019-04-09
URL http://arxiv.org/abs/1904.04631v1
PDF http://arxiv.org/pdf/1904.04631v1.pdf
PWC https://paperswithcode.com/paper/cyclegan-vc2-improved-cyclegan-based-non
Repo
Framework
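The cycle-consistency constraint at the core of CycleGAN-VC can be sketched on toy sequences. The helper below computes the L1 cycle loss for arbitrary forward/backward mappings; in CycleGAN-VC2 the new "two-step adversarial losses" additionally feed the cycle-reconstructed output through a second discriminator, which this scalar sketch does not model.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, G_xy, G_yx):
    """Cycle loss ||G_yx(G_xy(x)) - x||_1: converting source x to the
    target domain and back should recover x. G_xy and G_yx are any
    callables here; in the paper they are 2-1-2D CNN generators."""
    return l1(G_yx(G_xy(x)), x)
```

With mappings that invert each other the loss is zero; any mismatch in the round trip is penalised.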

Hybrid Kronecker Product Decomposition and Approximation

Title Hybrid Kronecker Product Decomposition and Approximation
Authors Chencheng Cai, Rong Chen, Han Xiao
Abstract Discovering the underlying low-dimensional structure of high-dimensional data has attracted a significant amount of research recently and has been shown to have a wide range of applications. As an effective dimension reduction tool, singular value decomposition is often used to analyze high-dimensional matrices, which are traditionally assumed to have a low-rank matrix approximation. In this paper, we propose a new approach. We assume a high-dimensional matrix can be approximated by a sum of a small number of Kronecker products of matrices with potentially different configurations, termed a hybrid Kronecker outer Product Approximation (hKoPA). It provides an extremely flexible way of dimension reduction compared to the low-rank matrix approximation. Challenges arise in estimating an hKoPA when the configurations of the component Kronecker products are different or unknown. We propose an estimation procedure when the set of configurations is given, and a joint configuration determination and component estimation procedure when the configurations are unknown. Specifically, a least squares backfitting algorithm is used when the configuration is given; when the configuration is unknown, an iterative greedy algorithm is used. Both simulation and real image examples show that the proposed algorithms have promising performance. The hybrid Kronecker product approximation may have potentially wider applications in low-dimensional representation of high-dimensional data.
Tasks Dimensionality Reduction
Published 2019-12-06
URL https://arxiv.org/abs/1912.02955v1
PDF https://arxiv.org/pdf/1912.02955v1.pdf
PWC https://paperswithcode.com/paper/hybrid-kronecker-product-decomposition-and
Repo
Framework
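A single-term building block of such approximations, the best M ≈ A ⊗ B in Frobenius norm for a fixed configuration, has a classical solution: rearrange M so the problem becomes a rank-1 approximation and take the leading singular pair. The sketch below implements that standard rearrangement trick; it is one plausible inner step, not the paper's backfitting or greedy procedure, which sums several such terms with possibly different configurations.

```python
import numpy as np

def nearest_kron(M, m1, n1, m2, n2):
    """Best single-term Kronecker approximation M ~ A (x) B with
    A of shape (m1, n1) and B of shape (m2, n2), via rearrangement:
    each (i, j) block of M becomes a row, turning the problem into
    a rank-1 matrix approximation solved by the leading SVD pair."""
    R = np.empty((m1 * n1, m2 * n2))
    for i in range(m1):
        for j in range(n1):
            block = M[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2]
            R[i * n1 + j] = block.reshape(-1)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Split the leading singular value evenly between the factors.
    A = (np.sqrt(s[0]) * U[:, 0]).reshape(m1, n1)
    B = (np.sqrt(s[0]) * Vt[0]).reshape(m2, n2)
    return A, B
```

When M is exactly a Kronecker product, the rearranged matrix has rank 1 and the factors are recovered up to a shared sign flip, which leaves A ⊗ B unchanged.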

A superpixel-driven deep learning approach for the analysis of dermatological wounds

Title A superpixel-driven deep learning approach for the analysis of dermatological wounds
Authors Gustavo Blanco, Agma J. M. Traina, Caetano Traina Jr., Paulo M. Azevedo-Marques, Ana E. S. Jorge, Daniel de Oliveira, Marcos V. N. Bedo
Abstract Background. The image-based identification of distinct tissues within dermatological wounds enhances patients’ care since it requires no intrusive evaluations. This manuscript presents an approach, named QTDU, that combines deep learning models with superpixel-driven segmentation methods for assessing the quality of tissues in dermatological ulcers. Method. QTDU consists of a three-stage pipeline for obtaining ulcer segmentation, tissue labeling, and wounded area quantification. We set up our approach by using a real, annotated set of dermatological ulcers to train several deep learning models to identify ulcerated superpixels. Results. Empirical evaluations on 179,572 superpixels divided into four classes showed QTDU accurately spots wounded tissues (AUC = 0.986, sensitivity = 0.97, and specificity = 0.974) and outperformed machine-learning approaches by up to 8.2% in F1-Score through fine-tuning of a ResNet-based model. Last, but not least, experimental evaluations also showed QTDU correctly quantifies wounded tissue areas within a 0.089 Mean Absolute Error ratio. Conclusions. Results indicate QTDU’s effectiveness for both tissue segmentation and wounded area quantification tasks. Compared to existing machine-learning approaches, the combination of superpixels and deep learning models outperformed the competitors at strong significance levels.
Tasks
Published 2019-09-13
URL https://arxiv.org/abs/1909.06264v2
PDF https://arxiv.org/pdf/1909.06264v2.pdf
PWC https://paperswithcode.com/paper/a-superpixel-driven-deep-learning-approach
Repo
Framework

A note on the quasiconvex Jensen divergences and the quasiconvex Bregman divergences derived thereof

Title A note on the quasiconvex Jensen divergences and the quasiconvex Bregman divergences derived thereof
Authors Frank Nielsen, Gaëtan Hadjeres
Abstract We first introduce the class of strictly quasiconvex and strictly quasiconcave Jensen divergences, which are oriented (asymmetric) distances, and study some of their properties. We then define the strictly quasiconvex Bregman divergences as the limit case of scaled and skewed quasiconvex Jensen divergences, and report a simple closed-form formula which shows that these divergences are only pseudo-divergences at countably many inflection points of the generators. To remedy this problem, we propose the $\delta$-averaged quasiconvex Bregman divergences, which integrate the pseudo-divergences over a small neighborhood in order to obtain a proper divergence. The formula for the $\delta$-averaged quasiconvex Bregman divergences extends even to non-differentiable strictly quasiconvex generators. These quasiconvex Bregman divergences between distinct elements have the property that one orientation is always finite while the other is infinite. We show that these quasiconvex Bregman divergences can also be interpreted as limit cases of generalized skewed Jensen divergences with respect to comparative convexity by using power means. Finally, we illustrate how these quasiconvex Bregman divergences naturally appear as equivalent divergences for the Kullback-Leibler divergences between probability densities belonging to the same parametric family of distributions with nested supports.
Tasks
Published 2019-09-19
URL https://arxiv.org/abs/1909.08857v2
PDF https://arxiv.org/pdf/1909.08857v2.pdf
PWC https://paperswithcode.com/paper/a-note-on-the-quasiconvex-jensen-divergences
Repo
Framework
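A quasiconvex generator $F$ satisfies $F((1-\alpha)p + \alpha q) \le \max(F(p), F(q))$, so replacing the weighted average of generator values in the classical Jensen divergence by a max yields a nonnegative quantity. The sketch below follows that definition on scalars; it is an illustration of the construction, not the paper's closed-form Bregman limit or the $\delta$-averaged remedy.

```python
def quasiconvex_jensen(F, p, q, alpha=0.5):
    """Skewed quasiconvex Jensen divergence
    J_F(p : q) = max(F(p), F(q)) - F((1 - alpha) * p + alpha * q).
    Nonnegative whenever F is quasiconvex, since quasiconvexity
    bounds F at any mixture by the max of the endpoint values."""
    return max(F(p), F(q)) - F((1 - alpha) * p + alpha * q)
```

With the quasiconvex generator `abs`, for instance, the divergence between -1 and 3 at alpha = 0.5 is max(1, 3) - |1| = 2.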

GAN-based Generation and Automatic Selection of Explanations for Neural Networks

Title GAN-based Generation and Automatic Selection of Explanations for Neural Networks
Authors Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon
Abstract One way to interpret trained deep neural networks (DNNs) is by inspecting characteristics that neurons in the model respond to, such as by iteratively optimising the model input (e.g., an image) to maximally activate specific neurons. However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow. We introduce a new metric that uses Fréchet Inception Distance (FID) to encourage similarity between model activations for real and generated data. This provides an efficient way to evaluate a set of generated examples for each setting of hyper-parameters. We also propose a novel GAN-based method for generating explanations that enables an efficient search through the input space and imposes a strong prior favouring realistic outputs. We apply our approach to a classification model trained to predict whether a music audio recording contains singing voice. Our results suggest that this proposed metric successfully selects hyper-parameters leading to interpretable examples, avoiding the need for manual evaluation. Moreover, we see that examples synthesised to maximise or minimise the predicted probability of singing voice presence exhibit vocal or non-vocal characteristics, respectively, suggesting that our approach is able to generate suitable explanations for understanding concepts learned by a neural network.
Tasks
Published 2019-04-21
URL http://arxiv.org/abs/1904.09533v2
PDF http://arxiv.org/pdf/1904.09533v2.pdf
PWC https://paperswithcode.com/paper/gan-based-generation-and-automatic-selection
Repo
Framework
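The FID-style comparison of activation statistics reduces, in one dimension, to the Fréchet distance between two Gaussians. The scalar formula below shows the shape of the computation; the actual FID uses mean vectors and covariance matrices of Inception (here, model) activations, with a matrix square root in place of the scalar one.

```python
import math

def frechet_distance_1d(mu1, var1, mu2, var2):
    """Squared Fréchet distance between two 1-D Gaussians
    N(mu1, var1) and N(mu2, var2), the scalar analogue of FID:
    d^2 = (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2)."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)
```

Identical distributions score zero, and the score grows with any mismatch in mean or spread, which is what makes it usable as an automatic ranking criterion over hyper-parameter settings.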

Learning the Arrow of Time

Title Learning the Arrow of Time
Authors Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio
Abstract We humans seem to have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment. Drawing inspiration from that, we address the problem of learning an arrow of time in a Markov (Decision) Process. We illustrate how a learned arrow of time can capture meaningful information about the environment, which in turn can be used to measure reachability, detect side-effects and to obtain an intrinsic reward signal. We show empirical results on a selection of discrete and continuous environments, and demonstrate for a class of stochastic processes that the learned arrow of time agrees reasonably well with a known notion of an arrow of time given by the celebrated Jordan-Kinderlehrer-Otto result.
Tasks
Published 2019-07-02
URL https://arxiv.org/abs/1907.01285v1
PDF https://arxiv.org/pdf/1907.01285v1.pdf
PWC https://paperswithcode.com/paper/learning-the-arrow-of-time
Repo
Framework
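Once an arrow-of-time function is learned, using it as an intrinsic signal is simple: score a transition by how much the function increases. The helper below takes any state-to-float map `h` as a stand-in for the paper's learned function; the learning procedure itself is not sketched here.

```python
def arrow_reward(h, s, s_next):
    """Intrinsic signal from an arrow-of-time function h: if h tends
    to increase along forward trajectories, a large h(s_next) - h(s)
    flags hard-to-reverse transitions (potential side-effects), and
    its negation can serve as a conservative intrinsic reward."""
    return h(s_next) - h(s)
```

For example, with a toy potential `h(s) = 2 * s`, moving from state 1 to state 4 yields a signal of 6.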

AI Pipeline - bringing AI to you. End-to-end integration of data, algorithms and deployment tools

Title AI Pipeline - bringing AI to you. End-to-end integration of data, algorithms and deployment tools
Authors Miguel de Prado, Jing Su, Rozenn Dahyot, Rabia Saeed, Lorenzo Keller, Noelia Vallez
Abstract The next generation of embedded Information and Communication Technology (ICT) systems consists of interconnected, collaborative intelligent systems able to perform autonomous tasks. Training and deployment of such systems on edge devices, however, require a fine-grained integration of data and tools to achieve high accuracy and meet functional and non-functional requirements. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms and deployment tools together. By these means, we are able to interconnect the different entities or stages of particular systems and provide an end-to-end development of AI products. We demonstrate the effectiveness of the AI pipeline by solving an Automatic Speech Recognition challenge, and we show all the steps of an end-to-end development for keyword spotting tasks: importing, partitioning and pre-processing speech data, training different neural network architectures, and deploying them on heterogeneous embedded platforms.
Tasks Speech Recognition
Published 2019-01-15
URL http://arxiv.org/abs/1901.05049v1
PDF http://arxiv.org/pdf/1901.05049v1.pdf
PWC https://paperswithcode.com/paper/ai-pipeline-bringing-ai-to-you-end-to-end
Repo
Framework

Probing Contextualized Sentence Representations with Visual Awareness

Title Probing Contextualized Sentence Representations with Visual Awareness
Authors Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao
Abstract We present a universal framework to model contextualized sentence representations with visual awareness, designed to overcome the shortcomings of multimodal parallel data that require manual annotations. For each sentence, we first retrieve a diverse set of images from a shared cross-modal embedding space, which is pre-trained on a large-scale collection of text-image pairs. Then, the texts and images are encoded by a transformer encoder and a convolutional neural network, respectively. The two sequences of representations are further fused by a simple and effective attention layer. The architecture can easily be applied to text-only natural language processing tasks without manually annotating multimodal parallel corpora. We apply the proposed method to three tasks, including neural machine translation, natural language inference and sequence labeling, and the experimental results verify its effectiveness.
Tasks Machine Translation, Natural Language Inference
Published 2019-11-07
URL https://arxiv.org/abs/1911.02971v1
PDF https://arxiv.org/pdf/1911.02971v1.pdf
PWC https://paperswithcode.com/paper/probing-contextualized-sentence
Repo
Framework
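The fusion step described above, text states attending over retrieved image features, can be sketched with plain scaled dot-product attention. This toy version assumes both modalities already share the same embedding dimension and adds the attended image context back to the text states; the paper's exact layer may differ in normalisation and projection details.

```python
import numpy as np

def attend_fuse(text, images):
    """Fuse text representations (n_tokens x d) with retrieved image
    features (n_images x d): each text vector attends over the image
    vectors via scaled dot-product softmax, and the resulting context
    is added residually to the text states."""
    d = text.shape[1]
    scores = text @ images.T / np.sqrt(d)
    # Row-wise softmax over the image axis (numerically stabilised).
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return text + w @ images
```

Because the images are retrieved rather than paired by annotation, this layer is what lets a text-only task pick up visual signal without a manually aligned multimodal corpus.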

Multi-Scale Self-Attention for Text Classification

Title Multi-Scale Self-Attention for Text Classification
Authors Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang
Abstract In this paper, we incorporate a form of prior knowledge, multi-scale structure, into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features at different scales. Based on a linguistic perspective and an analysis of a pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution of each layer. Results on three different kinds of tasks (21 datasets) show that our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate-sized datasets.
Tasks Text Classification
Published 2019-12-02
URL https://arxiv.org/abs/1912.00544v1
PDF https://arxiv.org/pdf/1912.00544v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-self-attention-for-text
Repo
Framework
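The "scale" of a self-attention head can be realised as a window limiting how far each token may attend. The numpy sketch below implements one such windowed head (queries, keys and values all equal to the input, with no learned projections, for brevity); a multi-scale layer in the paper's spirit would run several heads with different window sizes, plus global ones, and concatenate their outputs.

```python
import numpy as np

def local_self_attention(X, window):
    """Single-head self-attention over X (n_tokens x d) where each
    position attends only to neighbours within `window` tokens.
    Positions outside the window get a large negative score, so the
    softmax effectively zeroes them out."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ X
```

Small windows bias a head toward local n-gram-like features; large windows recover standard global attention, which is the knob the scale-distribution strategy controls per layer.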

Malware Classification using Deep Learning based Feature Extraction and Wrapper based Feature Selection Technique

Title Malware Classification using Deep Learning based Feature Extraction and Wrapper based Feature Selection Technique
Authors Muhammad Furqan Rafique, Muhammad Ali, Aqsa Saeed Qureshi, Asifullah Khan, Anwar Majid Mirza
Abstract In behavior analysis of malware, categorizing malicious files is an essential step after detection. Numerous static and dynamic techniques have been reported so far for categorizing malware. This research work presents a deep learning based malware detection (DLMD) technique based on static methods for classifying different malware families. The proposed DLMD technique uses both byte and ASM files for feature engineering and thus for classifying malware families. First, features are extracted from byte files using two different types of Deep Convolutional Neural Networks (CNNs). After that, important and discriminative opcode features are selected using a wrapper-based mechanism, where a Support Vector Machine (SVM) is used as the classifier. The idea is to construct a hybrid feature space by combining different feature spaces, so that the shortcomings of one feature space may be compensated by another, reducing the chances of missing a malware. Finally, the hybrid feature space is used to train a Multilayer Perceptron, which classifies the nine malware families. Experimental results show that the proposed DLMD technique achieves a log-loss of 0.09 over ten independent runs. Moreover, the performance of the proposed DLMD technique is compared against different classifiers, showing its effectiveness in categorizing malware. The relevant code and database can be found at https://github.com/cyberhunters/Malware-Detection-Using-Machine-Learning.
Tasks Feature Engineering, Feature Selection, Malware Classification, Malware Detection
Published 2019-10-24
URL https://arxiv.org/abs/1910.10958v2
PDF https://arxiv.org/pdf/1910.10958v2.pdf
PWC https://paperswithcode.com/paper/malware-classification-using-deep-learning
Repo
Framework
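The wrapper-based selection step can be sketched as greedy forward selection driven by an arbitrary subset scorer. In the paper that scorer is an SVM's performance on the candidate opcode features; here `score` is any subset-to-float callable, so the sketch shows only the search strategy, not the classifier.

```python
def forward_select(features, score, k):
    """Greedy wrapper-style forward selection: repeatedly add the
    feature whose inclusion maximises the scorer, stopping at k
    features or when no addition improves the score. `score` maps
    a list of features to a float (e.g. cross-validated accuracy
    of a classifier trained on that subset)."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no candidate improves the current subset
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because each candidate is evaluated by refitting the scorer, wrappers are costlier than filter methods but select features tuned to the actual classifier, which is the trade-off the paper exploits before training the final Multilayer Perceptron.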

Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values

Title Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values
Authors Xianfeng Tang, Huaxiu Yao, Yiwei Sun, Charu Aggarwal, Prasenjit Mitra, Suhang Wang
Abstract Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contains missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them solely rely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies/patterns would become less useful when the missing ratio is high, or the data have consecutive missing values; while exploring global patterns can alleviate such problems. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework LGnet, which leverages memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of global temporal distribution. Experimental results on real-world datasets show the effectiveness of LGnet for MTS forecasting with missing values and its robustness under various missing ratios.
Tasks Multivariate Time Series Forecasting, Time Series, Time Series Forecasting
Published 2019-11-22
URL https://arxiv.org/abs/1911.10273v1
PDF https://arxiv.org/pdf/1911.10273v1.pdf
PWC https://paperswithcode.com/paper/joint-modeling-of-local-and-global-temporal
Repo
Framework
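The core intuition, blending a local estimate with a global one when a value is missing, can be shown with a deliberately tiny imputer. The version below uses the last observed value as the local estimate and a fixed global statistic with a fixed blend weight; LGnet instead produces the global component with a memory network and learns the blending, so this is only a conceptual stand-in.

```python
def impute(series, global_mean, w=0.5):
    """Toy local+global imputation for a sequence with missing
    values (None): each gap is filled with a blend of the last
    available value (local pattern) and a global statistic.
    The global term keeps long runs of misses from drifting on
    stale local information."""
    out, last = [], global_mean
    for v in series:
        if v is None:
            v = w * last + (1 - w) * global_mean
        out.append(v)
        last = v
    return out
```

Note how consecutive misses pull the estimate toward the global mean, mirroring the abstract's point that local patterns degrade at high missing ratios while global patterns remain informative.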

Generative Machine Learning for Robust Free-Space Communication

Title Generative Machine Learning for Robust Free-Space Communication
Authors Sanjaya Lohani, Ryan T. Glasser
Abstract Realistic free-space optical communications systems suffer from turbulent propagation of light through the atmosphere and detector noise at the receiver, which can significantly degrade the optical mode quality of the received state, increase cross-talk between modes, and correspondingly increase the symbol error ratio (SER) of the system. In order to overcome these obstacles, we develop a state-of-the-art generative machine learning (GML) and convolutional neural network (CNN) system in combination, and demonstrate its efficacy in a free-space optical (FSO) communications setting. The system corrects for the distortion effects due to turbulence and reduces detector noise, resulting in significantly lowered SERs and cross-talk at the output of the receiver, while requiring no feedback. This scheme is straightforward to scale and may provide a concrete and cost-effective technique for establishing long-range classical and quantum communication links in the near future.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.02249v1
PDF https://arxiv.org/pdf/1909.02249v1.pdf
PWC https://paperswithcode.com/paper/generative-machine-learning-for-robust-free
Repo
Framework

Document Rectification and Illumination Correction using a Patch-based CNN

Title Document Rectification and Illumination Correction using a Patch-based CNN
Authors Xiaoyu Li, Bo Zhang, Jing Liao, Pedro V. Sander
Abstract We propose a novel learning method to rectify document images with various distortion types from a single input image. As opposed to previous learning-based methods, our approach seeks to first learn the distortion flow on input image patches rather than the entire image. We then present a robust technique to stitch the patch results into the rectified document by processing in the gradient domain. Furthermore, we propose a second network to correct the uneven illumination, further improving the readability and OCR accuracy. Due to the less complex distortion present on the smaller image patches, our patch-based approach followed by stitching and illumination correction can significantly improve the overall accuracy in both the synthetic and real datasets.
Tasks Optical Character Recognition
Published 2019-09-20
URL https://arxiv.org/abs/1909.09470v1
PDF https://arxiv.org/pdf/1909.09470v1.pdf
PWC https://paperswithcode.com/paper/document-rectification-and-illumination
Repo
Framework

Predicting engagement in online social networks: Challenges and opportunities

Title Predicting engagement in online social networks: Challenges and opportunities
Authors Farig Sadeque, Steven Bethard
Abstract Since the introduction of social media, user participation or engagement has received little research attention. In this survey article, we establish the notion of participation in social media and the main challenges that researchers may face while exploring this phenomenon. We surveyed a handful of research articles in this area and tried to extract, analyze and summarize the techniques used by the researchers. We classified these works based on our task definitions, and explored the machine learning models that have been used for any kind of participation prediction. We also explored the vast number of features that have proven useful, and classified them into categories for better understanding and ease of re-implementation. We have found that the success of a technique mostly depends on the type of network being studied, and there is no universal machine learning algorithm or feature set that works reasonably well across all types of social media. There have been few attempts to apply state-of-the-art machine learning techniques such as neural networks, and the possibility of transfer learning and domain adaptation has not been explored.
Tasks Domain Adaptation, Transfer Learning
Published 2019-07-11
URL https://arxiv.org/abs/1907.05442v1
PDF https://arxiv.org/pdf/1907.05442v1.pdf
PWC https://paperswithcode.com/paper/predicting-engagement-in-online-social
Repo
Framework