Paper Group ANR 1719
Learning a Generic Adaptive Wavelet Shrinkage Function for Denoising
Title | Learning a Generic Adaptive Wavelet Shrinkage Function for Denoising |
Authors | Tobias Alt, Joachim Weickert |
Abstract | The rise of machine learning in image processing has created a gap between trainable data-driven and classical model-driven approaches: While learning-based models often show superior performance, classical ones are often more transparent. To reduce this gap, we introduce a generic wavelet shrinkage function for denoising which is adaptive to both the wavelet scales and the noise standard deviation. It is inferred from trained results of a tightly parametrised function which is inherited from nonlinear diffusion. Our proposed shrinkage function is smooth and compact while using only two parameters. In contrast to many existing shrinkage functions, it is able to enhance image structures by amplifying wavelet coefficients. Experiments show that it outperforms classical shrinkage functions by a significant margin. |
Tasks | Denoising |
Published | 2019-10-21 |
URL | https://arxiv.org/abs/1910.09234v2 |
https://arxiv.org/pdf/1910.09234v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-generic-adaptive-wavelet-shrinkage |
Repo | |
Framework | |
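To make the setting concrete, the sketch below shows where a shrinkage function sits in a wavelet denoising pipeline, using PyWavelets. Classical soft thresholding with a single hypothetical threshold `lam * sigma` stands in for the paper's learned two-parameter function, which is scale- and noise-adaptive and is not reproduced here.

```python
# Minimal wavelet-shrinkage denoising sketch (not the paper's learned function):
# classical soft thresholding stands in for the two-parameter adaptive shrinkage.
import numpy as np
import pywt

def wavelet_denoise(image, sigma, wavelet="haar", levels=3, lam=3.0):
    """Decompose, shrink the detail coefficients, reconstruct.

    `lam * sigma` is a single hypothetical threshold; the paper instead uses a
    smooth two-parameter shrinkage adaptive to both scale and `sigma`, and can
    amplify (not only shrink) coefficients.
    """
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    shrunk = [coeffs[0]]                      # keep the approximation band
    for details in coeffs[1:]:                # coarsest to finest detail scales
        shrunk.append(tuple(pywt.threshold(d, lam * sigma, mode="soft")
                            for d in details))
    return pywt.waverec2(shrunk, wavelet)

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
noisy = clean + 0.1 * rng.normal(size=clean.shape)
denoised = wavelet_denoise(noisy, sigma=0.1)
```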
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Title | CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion |
Authors | Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo |
Abstract | Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to the disadvantages of the training conditions. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge. To reduce this gap, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (PatchGAN). We evaluated our method on a non-parallel VC task and analyzed the effect of each technique in detail. An objective evaluation showed that these techniques help bring the converted feature sequence closer to the target in terms of both global and local structures, which we assess by using Mel-cepstral distortion and modulation spectra distance, respectively. A subjective evaluation showed that CycleGAN-VC2 outperforms CycleGAN-VC in terms of naturalness and similarity for every speaker pair, including intra-gender and inter-gender pairs. |
Tasks | Voice Conversion |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04631v1 |
http://arxiv.org/pdf/1904.04631v1.pdf | |
PWC | https://paperswithcode.com/paper/cyclegan-vc2-improved-cyclegan-based-non |
Repo | |
Framework | |
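For orientation, the losses mentioned in the abstract can be written schematically as below, following the usual CycleGAN-VC convention for mappings $G_{X\to Y}$ and $G_{Y\to X}$; the second-step adversarial loss adds an extra discriminator on the cycle-reconstructed features. Weights and details are illustrative and may differ from the paper.

```latex
% Adversarial loss for the forward mapping (and symmetrically for Y -> X):
\mathcal{L}_{adv}\big(G_{X\to Y}, D_Y\big) =
  \mathbb{E}_{y\sim P_Y}\!\left[\log D_Y(y)\right] +
  \mathbb{E}_{x\sim P_X}\!\left[\log\big(1 - D_Y(G_{X\to Y}(x))\big)\right]

% Cycle-consistency and identity-mapping losses:
\mathcal{L}_{cyc} = \mathbb{E}_{x}\big[\|G_{Y\to X}(G_{X\to Y}(x)) - x\|_1\big]
                  + \mathbb{E}_{y}\big[\|G_{X\to Y}(G_{Y\to X}(y)) - y\|_1\big]
\mathcal{L}_{id}  = \mathbb{E}_{y}\big[\|G_{X\to Y}(y) - y\|_1\big]
                  + \mathbb{E}_{x}\big[\|G_{Y\to X}(x) - x\|_1\big]

% Second-step adversarial loss on the cycle-reconstructed features, with an
% extra discriminator D'_X (and symmetrically D'_Y):
\mathcal{L}_{adv2} =
  \mathbb{E}_{x}\!\left[\log D'_X(x)\right] +
  \mathbb{E}_{x}\!\left[\log\big(1 - D'_X(G_{Y\to X}(G_{X\to Y}(x)))\big)\right]

% Full objective (schematic; weights as in CycleGAN-VC):
\mathcal{L}_{full} = \mathcal{L}_{adv} + \mathcal{L}_{adv2}
  + \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{id}\,\mathcal{L}_{id}
```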
Hybrid Kronecker Product Decomposition and Approximation
Title | Hybrid Kronecker Product Decomposition and Approximation |
Authors | Chencheng Cai, Rong Chen, Han Xiao |
Abstract | Discovering the underlying low dimensional structure of high dimensional data has attracted a significant amount of research recently and has been shown to have a wide range of applications. As an effective dimension reduction tool, singular value decomposition is often used to analyze high dimensional matrices, which are traditionally assumed to have a low rank matrix approximation. In this paper, we propose a new approach. We assume a high dimensional matrix can be approximated by a sum of a small number of Kronecker products of matrices with potentially different configurations, named the hybrid Kronecker outer Product Approximation (hKoPA). It provides an extremely flexible way of dimension reduction compared to the low-rank matrix approximation. Challenges arise in estimating a hKoPA when the configurations of the component Kronecker products are different or unknown. We propose an estimation procedure when the set of configurations is given and a joint configuration determination and component estimation procedure when the configurations are unknown. Specifically, a least squares backfitting algorithm is used when the configuration is given. When the configuration is unknown, an iterative greedy algorithm is used. Both simulation and real image examples show that the proposed algorithms have promising performance. The hybrid Kronecker product approximation may have potentially wider applications in low dimensional representation of high dimensional data. |
Tasks | Dimensionality Reduction |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.02955v1 |
https://arxiv.org/pdf/1912.02955v1.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-kronecker-product-decomposition-and |
Repo | |
Framework | |
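A minimal NumPy sketch of the least-squares backfitting idea for a sum of Kronecker products with known configurations is given below. It relies on the classical rearrangement trick (the best single Kronecker pair in least squares comes from a rank-1 SVD of a rearranged matrix); the paper's actual estimation and configuration-selection procedures are more elaborate.

```python
import numpy as np

def rearrange(M, m1, m2, n1, n2):
    """Stack the vectorised (m2 x n2) blocks of M as rows, so that
    ||M - A kron B||_F = ||rearrange(M) - vec(A) vec(B)^T||_F."""
    R = np.empty((m1 * n1, m2 * n2))
    for i in range(m1):
        for j in range(n1):
            R[i * n1 + j] = M[i*m2:(i+1)*m2, j*n2:(j+1)*n2].reshape(-1)
    return R

def best_kron(M, m1, m2, n1, n2):
    """Best single term A (m1 x n1) kron B (m2 x n2) in least squares,
    obtained from the leading singular pair of the rearranged matrix."""
    U, s, Vt = np.linalg.svd(rearrange(M, m1, m2, n1, n2), full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return A, B

def hkopa_backfit(M, configs, n_iter=25):
    """Backfitting for M ~ sum_k A_k kron B_k with given configurations
    (a simplified sketch; the paper's algorithm may differ in details)."""
    terms = [(np.zeros((m1, n1)), np.zeros((m2, n2))) for m1, m2, n1, n2 in configs]
    for _ in range(n_iter):
        for k, cfg in enumerate(configs):
            residual = M - sum(np.kron(A, B)
                               for j, (A, B) in enumerate(terms) if j != k)
            terms[k] = best_kron(residual, *cfg)
    return terms

# Toy example: a 12 x 12 matrix built from two different configurations.
rng = np.random.default_rng(0)
M = np.kron(rng.normal(size=(3, 3)), rng.normal(size=(4, 4))) \
  + np.kron(rng.normal(size=(4, 4)), rng.normal(size=(3, 3)))
fit = hkopa_backfit(M, configs=[(3, 4, 3, 4), (4, 3, 4, 3)])
approx = sum(np.kron(A, B) for A, B in fit)
print(np.linalg.norm(M - approx) / np.linalg.norm(M))
```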
A superpixel-driven deep learning approach for the analysis of dermatological wounds
Title | A superpixel-driven deep learning approach for the analysis of dermatological wounds |
Authors | Gustavo Blanco, Agma J. M. Traina, Caetano Traina Jr., Paulo M. Azevedo-Marques, Ana E. S. Jorge, Daniel de Oliveira, Marcos V. N. Bedo |
Abstract | Background. The image-based identification of distinct tissues within dermatological wounds enhances patients’ care since it requires no intrusive evaluations. This manuscript presents an approach, named QTDU, that combines deep learning models with superpixel-driven segmentation methods for assessing the quality of tissues from dermatological ulcers. Method. QTDU consists of a three-stage pipeline for ulcer segmentation, tissue labeling, and wounded-area quantification. We set up our approach by using a real, annotated set of dermatological ulcers to train several deep learning models for the identification of ulcered superpixels. Results. Empirical evaluations on 179,572 superpixels divided into four classes showed QTDU accurately spots wounded tissues (AUC = 0.986, sensitivity = 0.97, and specificity = 0.974) and outperformed machine-learning approaches by up to 8.2% in F1-score through fine-tuning of a ResNet-based model. Last, but not least, experimental evaluations also showed QTDU correctly quantified wounded tissue areas within a 0.089 Mean Absolute Error ratio. Conclusions. Results indicate QTDU's effectiveness for both tissue segmentation and wounded-area quantification tasks. When compared to existing machine-learning approaches, the combination of superpixels and deep learning models outperformed the competitors with strong significance levels. |
Tasks | |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06264v2 |
https://arxiv.org/pdf/1909.06264v2.pdf | |
PWC | https://paperswithcode.com/paper/a-superpixel-driven-deep-learning-approach |
Repo | |
Framework | |
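The pipeline described in the abstract can be sketched as below with SLIC superpixels from scikit-image; `tissue_classifier` is a hypothetical placeholder for the fine-tuned ResNet-based model, and all parameters are illustrative.

```python
# Sketch of a superpixel-driven wound-analysis pipeline (illustrative only;
# QTDU's trained ResNet model is replaced by a hypothetical classifier).
import numpy as np
from skimage.segmentation import slic
from skimage.measure import regionprops

def tissue_classifier(patch):
    """Hypothetical stand-in for the fine-tuned CNN: returns a class id
    (0 = background, 1..3 = wound tissue types)."""
    return int(patch.mean() > 0.5)  # placeholder rule

def analyse_wound(image, n_segments=400):
    segments = slic(image, n_segments=n_segments, compactness=10, start_label=1)
    labels = np.zeros(segments.max() + 1, dtype=int)
    for region in regionprops(segments):
        minr, minc, maxr, maxc = region.bbox
        labels[region.label] = tissue_classifier(image[minr:maxr, minc:maxc])
    tissue_map = labels[segments]                # per-pixel tissue labels
    wounded_area = int((tissue_map > 0).sum())   # pixel count of wounded tissue
    return tissue_map, wounded_area

image = np.random.rand(256, 256, 3)
tissue_map, area = analyse_wound(image)
```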
A note on the quasiconvex Jensen divergences and the quasiconvex Bregman divergences derived thereof
Title | A note on the quasiconvex Jensen divergences and the quasiconvex Bregman divergences derived thereof |
Authors | Frank Nielsen, Gaëtan Hadjeres |
Abstract | We first introduce the class of strictly quasiconvex and strictly quasiconcave Jensen divergences, which are oriented (asymmetric) distances, and study some of their properties. We then define the strictly quasiconvex Bregman divergences as the limit case of scaled and skewed quasiconvex Jensen divergences, and report a simple closed-form formula which shows that these divergences are only pseudo-divergences at countably many inflection points of the generators. To remedy this problem, we propose the $\delta$-averaged quasiconvex Bregman divergences which integrate the pseudo-divergences over a small neighborhood in order to obtain a proper divergence. The formula of the $\delta$-averaged quasiconvex Bregman divergences extends even to non-differentiable strictly quasiconvex generators. These quasiconvex Bregman divergences between distinct elements have the property that one orientation is always finite while the other is infinite. We show that these quasiconvex Bregman divergences can also be interpreted as limit cases of generalized skewed Jensen divergences with respect to comparative convexity by using power means. Finally, we illustrate how these quasiconvex Bregman divergences naturally appear as equivalent divergences for the Kullback-Leibler divergences between probability densities belonging to the same parametric family of distributions with nested supports. |
Tasks | |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.08857v2 |
https://arxiv.org/pdf/1909.08857v2.pdf | |
PWC | https://paperswithcode.com/paper/a-note-on-the-quasiconvex-jensen-divergences |
Repo | |
Framework | |
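For orientation, the basic quantity behind these divergences is the gap in the quasiconvexity inequality, sketched below; the exact skewing, scaling limit, and $\delta$-averaging are as defined in the paper and are not reproduced here.

```latex
% Strict quasiconvexity of a generator F on a convex domain:
F\big((1-\alpha)p + \alpha q\big) \;<\; \max\{F(p), F(q)\},
\qquad \alpha \in (0,1),\ p \neq q .

% The induced skewed quasiconvex Jensen divergence is the gap in this
% inequality, which is non-negative and vanishes iff p = q:
J_F^{\alpha}(p:q) \;=\; \max\{F(p), F(q)\} \;-\; F\big((1-\alpha)p + \alpha q\big).

% The quasiconvex Bregman (pseudo-)divergence of the abstract is then the
% limit of scaled, skewed Jensen divergences, schematically
% \lim_{\alpha \to 0^{+}} \tfrac{1}{\alpha}\, J_F^{\alpha},
% with the precise orientation and the \delta-averaging as in the paper.
```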
GAN-based Generation and Automatic Selection of Explanations for Neural Networks
Title | GAN-based Generation and Automatic Selection of Explanations for Neural Networks |
Authors | Saumitra Mishra, Daniel Stoller, Emmanouil Benetos, Bob L. Sturm, Simon Dixon |
Abstract | One way to interpret trained deep neural networks (DNNs) is by inspecting characteristics that neurons in the model respond to, such as by iteratively optimising the model input (e.g., an image) to maximally activate specific neurons. However, this requires a careful selection of hyper-parameters to generate interpretable examples for each neuron of interest, and current methods rely on a manual, qualitative evaluation of each setting, which is prohibitively slow. We introduce a new metric that uses Fréchet Inception Distance (FID) to encourage similarity between model activations for real and generated data. This provides an efficient way to evaluate a set of generated examples for each setting of hyper-parameters. We also propose a novel GAN-based method for generating explanations that enables an efficient search through the input space and imposes a strong prior favouring realistic outputs. We apply our approach to a classification model trained to predict whether a music audio recording contains singing voice. Our results suggest that this proposed metric successfully selects hyper-parameters leading to interpretable examples, avoiding the need for manual evaluation. Moreover, we see that examples synthesised to maximise or minimise the predicted probability of singing voice presence exhibit vocal or non-vocal characteristics, respectively, suggesting that our approach is able to generate suitable explanations for understanding concepts learned by a neural network. |
Tasks | |
Published | 2019-04-21 |
URL | http://arxiv.org/abs/1904.09533v2 |
http://arxiv.org/pdf/1904.09533v2.pdf | |
PWC | https://paperswithcode.com/paper/gan-based-generation-and-automatic-selection |
Repo | |
Framework | |
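The snippet below is a generic Fréchet distance computation between two sets of activation vectors, the quantity the proposed metric is built on (the paper applies it to activations of the inspected classifier); it is a standard FID sketch, not the authors' code.

```python
# Generic Fréchet Inception Distance between two sets of activation vectors.
import numpy as np
from scipy.linalg import sqrtm

def fid(acts_real, acts_gen):
    """acts_real, acts_gen: (n_samples, n_features) activation matrices."""
    mu_r, mu_g = acts_real.mean(axis=0), acts_gen.mean(axis=0)
    cov_r = np.cov(acts_real, rowvar=False)
    cov_g = np.cov(acts_gen, rowvar=False)
    cov_mean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(cov_mean):          # numerical noise can add tiny
        cov_mean = cov_mean.real           # imaginary parts; drop them
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * cov_mean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 64))          # e.g. activations on real audio
gen = rng.normal(loc=0.3, size=(500, 64))  # activations on generated examples
print(fid(real, gen))
```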
Learning the Arrow of Time
Title | Learning the Arrow of Time |
Authors | Nasim Rahaman, Steffen Wolf, Anirudh Goyal, Roman Remme, Yoshua Bengio |
Abstract | We humans seem to have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment. Drawing inspiration from that, we address the problem of learning an arrow of time in a Markov (Decision) Process. We illustrate how a learned arrow of time can capture meaningful information about the environment, which in turn can be used to measure reachability, detect side-effects and to obtain an intrinsic reward signal. We show empirical results on a selection of discrete and continuous environments, and demonstrate for a class of stochastic processes that the learned arrow of time agrees reasonably well with a known notion of an arrow of time given by the celebrated Jordan-Kinderlehrer-Otto result. |
Tasks | |
Published | 2019-07-02 |
URL | https://arxiv.org/abs/1907.01285v1 |
https://arxiv.org/pdf/1907.01285v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-arrow-of-time |
Repo | |
Framework | |
AI Pipeline - bringing AI to you. End-to-end integration of data, algorithms and deployment tools
Title | AI Pipeline - bringing AI to you. End-to-end integration of data, algorithms and deployment tools |
Authors | Miguel de Prado, Jing Su, Rozenn Dahyot, Rabia Saeed, Lorenzo Keller, Noelia Vallez |
Abstract | The next generation of embedded Information and Communication Technology (ICT) systems are interconnected, collaborative intelligent systems able to perform autonomous tasks. Training and deployment of such systems on Edge devices, however, require a fine-grained integration of data and tools to achieve high accuracy and overcome functional and non-functional requirements. In this work, we present a modular AI pipeline as an integrating framework to bring data, algorithms and deployment tools together. By these means, we are able to interconnect the different entities or stages of particular systems and provide an end-to-end development of AI products. We demonstrate the effectiveness of the AI pipeline by solving an Automatic Speech Recognition challenge, and we show all the steps leading to an end-to-end development for Key-word Spotting tasks: importing, partitioning and pre-processing of speech data, training of different neural network architectures, and their deployment on heterogeneous embedded platforms. |
Tasks | Speech Recognition |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.05049v1 |
http://arxiv.org/pdf/1901.05049v1.pdf | |
PWC | https://paperswithcode.com/paper/ai-pipeline-bringing-ai-to-you-end-to-end |
Repo | |
Framework | |
Probing Contextualized Sentence Representations with Visual Awareness
Title | Probing Contextualized Sentence Representations with Visual Awareness |
Authors | Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao |
Abstract | We present a universal framework to model contextualized sentence representations with visual awareness, motivated by the shortcomings of manually annotated multimodal parallel data. For each sentence, we first retrieve a diversity of images from a shared cross-modal embedding space, which is pre-trained on a large-scale collection of text-image pairs. Then, the texts and images are encoded by a Transformer encoder and a convolutional neural network, respectively. The two sequences of representations are further fused by a simple and effective attention layer. The architecture can be easily applied to text-only natural language processing tasks without manually annotating multimodal parallel corpora. We apply the proposed method to three tasks, including neural machine translation, natural language inference and sequence labeling, and experimental results verify its effectiveness. |
Tasks | Machine Translation, Natural Language Inference |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02971v1 |
https://arxiv.org/pdf/1911.02971v1.pdf | |
PWC | https://paperswithcode.com/paper/probing-contextualized-sentence |
Repo | |
Framework | |
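A minimal PyTorch sketch of the fusion step described in the abstract: text token states attend over the representations of the retrieved images and the attended context is added back. The single-head formulation, dimensions, and residual combination are illustrative assumptions, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class VisualFusion(nn.Module):
    """Fuse text token states H (B, T, d) with image features V (B, K, d)
    via a single attention layer (a simplified sketch of the idea)."""
    def __init__(self, d_model):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, text, images):
        scores = self.q(text) @ self.k(images).transpose(1, 2)   # (B, T, K)
        attn = torch.softmax(scores / text.size(-1) ** 0.5, dim=-1)
        visual_context = attn @ self.v(images)                   # (B, T, d)
        return text + visual_context     # residual combination (assumption)

fusion = VisualFusion(d_model=512)
text = torch.randn(2, 20, 512)      # encoded sentence tokens
images = torch.randn(2, 5, 512)     # features of 5 retrieved images
fused = fusion(text, images)        # (2, 20, 512)
```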
Multi-Scale Self-Attention for Text Classification
Title | Multi-Scale Self-Attention for Text Classification |
Authors | Qipeng Guo, Xipeng Qiu, Pengfei Liu, Xiangyang Xue, Zheng Zhang |
Abstract | In this paper, we introduce multi-scale structure as prior knowledge into self-attention modules. We propose a Multi-Scale Transformer which uses multi-scale multi-head self-attention to capture features from different scales. Based on the linguistic perspective and an analysis of a pre-trained Transformer (BERT) on a huge corpus, we further design a strategy to control the scale distribution for each layer. Results on three different kinds of tasks (21 datasets) show that our Multi-Scale Transformer outperforms the standard Transformer consistently and significantly on small and moderate-size datasets. |
Tasks | Text Classification |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.00544v1 |
https://arxiv.org/pdf/1912.00544v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-self-attention-for-text |
Repo | |
Framework | |
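One way to read "multi-scale multi-head self-attention" is that different heads are restricted to windows of different sizes around each token; the PyTorch sketch below implements that reading with attention masks. The specific scales and the masking mechanism are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

def local_mask(seq_len, window):
    """Boolean mask allowing each position to attend only within +/- window."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window          # (T, T)

class MultiScaleSelfAttention(nn.Module):
    """Self-attention whose heads use different window sizes (scales)."""
    def __init__(self, d_model, scales=(1, 3, 8, 10**9)):  # last head ~ global
        super().__init__()
        self.h = len(scales)
        self.d_head = d_model // self.h
        self.scales = scales
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                                         # x: (B, T, d)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.reshape(B, T, self.h, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)                    # (B, h, T, d_head)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        for head, w in enumerate(self.scales):                    # per-head scale mask
            scores[:, head].masked_fill_(~local_mask(T, w), float("-inf"))
        ctx = torch.softmax(scores, dim=-1) @ v                   # (B, h, T, d_head)
        return self.out(ctx.transpose(1, 2).reshape(B, T, -1))

layer = MultiScaleSelfAttention(d_model=256)
out = layer(torch.randn(4, 32, 256))                              # (4, 32, 256)
```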
Malware Classification using Deep Learning based Feature Extraction and Wrapper based Feature Selection Technique
Title | Malware Classification using Deep Learning based Feature Extraction and Wrapper based Feature Selection Technique |
Authors | Muhammad Furqan Rafique, Muhammad Ali, Aqsa Saeed Qureshi, Asifullah Khan, Anwar Majid Mirza |
Abstract | In behavior analysis of malware, categorization of malicious files is an essential step after malware detection. Numerous static and dynamic techniques have been reported so far for categorizing malware. This research work presents a deep learning based malware detection (DLMD) technique based on static methods for classifying different malware families. The proposed DLMD technique uses both the byte and ASM files for feature engineering and thus for classifying malware families. First, features are extracted from byte files using two different types of Deep Convolutional Neural Networks (CNN). After that, important and discriminative opcode features are selected using a wrapper-based mechanism, where a Support Vector Machine (SVM) is used as a classifier. The idea is to construct a hybrid feature space by combining the different feature spaces, so that the shortcomings of a particular feature space may be overcome by another and, consequently, the chances of missing a malware are reduced. Finally, the hybrid feature space is used to train a Multilayer Perceptron, which classifies all nine malware families. Experimental results show that the proposed DLMD technique achieves a log-loss of 0.09 for ten independent runs. Moreover, the performance of the proposed DLMD technique is compared against different classifiers, showing its effectiveness in categorizing malware. The relevant code and database can be found at https://github.com/cyberhunters/Malware-Detection-Using-Machine-Learning. |
Tasks | Feature Engineering, Feature Selection, Malware Classification, Malware Detection |
Published | 2019-10-24 |
URL | https://arxiv.org/abs/1910.10958v2 |
https://arxiv.org/pdf/1910.10958v2.pdf | |
PWC | https://paperswithcode.com/paper/malware-classification-using-deep-learning |
Repo | |
Framework | |
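A schematic scikit-learn sketch of the classification stages described in the abstract: two pre-computed deep feature matrices (placeholders here) are concatenated into a hybrid space, a wrapper-style selection with a linear SVM picks discriminative features, and an MLP performs the final nine-class prediction. RFE is used as a stand-in for the paper's wrapper mechanism.

```python
# Schematic of the hybrid-feature malware-classification stages (sketch only;
# the deep CNN features are replaced by random placeholders here).
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples = 1000
byte_cnn_features = rng.normal(size=(n_samples, 256))   # placeholder for CNN-1
asm_cnn_features = rng.normal(size=(n_samples, 256))    # placeholder for CNN-2
y = rng.integers(0, 9, size=n_samples)                  # nine malware families

# Hybrid feature space: concatenate the two deep feature spaces.
X = np.hstack([byte_cnn_features, asm_cnn_features])

# Wrapper-style selection with an SVM (RFE as a stand-in for the paper's
# wrapper mechanism), followed by an MLP classifier.
model = make_pipeline(
    RFE(estimator=LinearSVC(dual=False), n_features_to_select=128, step=32),
    MLPClassifier(hidden_layer_sizes=(128,), max_iter=300),
)
model.fit(X, y)
print(model.score(X, y))
```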
Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values
Title | Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values |
Authors | Xianfeng Tang, Huaxiu Yao, Yiwei Sun, Charu Aggarwal, Prasenjit Mitra, Suhang Wang |
Abstract | Multivariate time series (MTS) forecasting is widely used in various domains, such as meteorology and traffic. Due to limitations on data collection, transmission, and storage, real-world MTS data usually contain missing values, making it infeasible to apply existing MTS forecasting models such as linear regression and recurrent neural networks. Though many efforts have been devoted to this problem, most of them rely solely on local dependencies for imputing missing values, which ignores global temporal dynamics. Local dependencies/patterns become less useful when the missing ratio is high or the data have consecutive missing values, while exploring global patterns can alleviate such problems. Thus, jointly modeling local and global temporal dynamics is very promising for MTS forecasting with missing values. However, work in this direction is rather limited. Therefore, we study a novel problem of MTS forecasting with missing values by jointly exploring local and global temporal dynamics. We propose a new framework, LGnet, which leverages a memory network to explore global patterns given estimations from local perspectives. We further introduce adversarial training to enhance the modeling of the global temporal distribution. Experimental results on real-world datasets show the effectiveness of LGnet for MTS forecasting with missing values and its robustness under various missing ratios. |
Tasks | Multivariate Time Series Forecasting, Time Series, Time Series Forecasting |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10273v1 |
https://arxiv.org/pdf/1911.10273v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-modeling-of-local-and-global-temporal |
Repo | |
Framework | |
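As a rough illustration of combining local estimates with global patterns, the sketch below refines a local estimate of a missing value by attending over a learned memory and gating the two estimates; this is a hypothetical module inspired by the abstract, not LGnet's actual architecture.

```python
import torch
import torch.nn as nn

class MemoryImputation(nn.Module):
    """Illustrative sketch: refine local estimates of missing values by
    attending over a learned memory of global temporal patterns.
    (Hypothetical module inspired by the abstract, not LGnet itself.)"""
    def __init__(self, n_vars, n_slots=32):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_slots, n_vars))  # global patterns
        self.gate = nn.Linear(2 * n_vars, n_vars)

    def forward(self, local_estimate, mask):
        # local_estimate: (B, n_vars) = observations where available,
        #                 local-model estimates (e.g. an RNN step) elsewhere;
        # mask: (B, n_vars), 1 where observed, 0 where missing.
        attn = torch.softmax(local_estimate @ self.memory.t(), dim=-1)   # (B, S)
        global_estimate = attn @ self.memory                             # (B, n_vars)
        g = torch.sigmoid(self.gate(torch.cat([local_estimate,
                                               global_estimate], dim=-1)))
        fused = g * local_estimate + (1 - g) * global_estimate
        return mask * local_estimate + (1 - mask) * fused  # only fill missing slots

module = MemoryImputation(n_vars=8)
x_local = torch.randn(16, 8)                  # local estimates for one time step
mask = (torch.rand(16, 8) > 0.3).float()      # ~30% missing at random
filled = module(x_local, mask)
```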
Generative Machine Learning for Robust Free-Space Communication
Title | Generative Machine Learning for Robust Free-Space Communication |
Authors | Sanjaya Lohani, Ryan T. Glasser |
Abstract | Realistic free-space optical communications systems suffer from turbulent propagation of light through the atmosphere and detector noise at the receiver, which can significantly degrade the optical mode quality of the received state, increase cross-talk between modes, and correspondingly increase the symbol error ratio (SER) of the system. In order to overcome these obstacles, we develop a state-of-the-art system combining generative machine learning (GML) and a convolutional neural network (CNN), and demonstrate its efficacy in a free-space optical (FSO) communications setting. The system corrects for the distortion effects due to turbulence and reduces detector noise, resulting in significantly lowered SERs and cross-talk at the output of the receiver, while requiring no feedback. This scheme is straightforward to scale, and may provide a concrete and cost-effective technique for establishing long-range classical and quantum communication links in the near future. |
Tasks | |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02249v1 |
https://arxiv.org/pdf/1909.02249v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-machine-learning-for-robust-free |
Repo | |
Framework | |
Document Rectification and Illumination Correction using a Patch-based CNN
Title | Document Rectification and Illumination Correction using a Patch-based CNN |
Authors | Xiaoyu Li, Bo Zhang, Jing Liao, Pedro V. Sander |
Abstract | We propose a novel learning method to rectify document images with various distortion types from a single input image. As opposed to previous learning-based methods, our approach seeks to first learn the distortion flow on input image patches rather than the entire image. We then present a robust technique to stitch the patch results into the rectified document by processing in the gradient domain. Furthermore, we propose a second network to correct the uneven illumination, further improving the readability and OCR accuracy. Due to the less complex distortion present on the smaller image patches, our patch-based approach followed by stitching and illumination correction can significantly improve the overall accuracy in both the synthetic and real datasets. |
Tasks | Optical Character Recognition |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09470v1 |
https://arxiv.org/pdf/1909.09470v1.pdf | |
PWC | https://paperswithcode.com/paper/document-rectification-and-illumination |
Repo | |
Framework | |
Predicting engagement in online social networks: Challenges and opportunities
Title | Predicting engagement in online social networks: Challenges and opportunities |
Authors | Farig Sadeque, Steven Bethard |
Abstract | Since the introduction of social media, user participation or engagement has received little research attention. In this survey article, we establish the notion of participation in social media and the main challenges that researchers may face while exploring this phenomenon. We surveyed a handful of research articles in this area and tried to extract, analyze, and summarize the techniques used by the researchers. We classified these works based on our task definitions and explored the machine learning models that have been used for any kind of participation prediction. We also explored the vast number of features that have been proven useful, and classified them into categories for better understanding and ease of re-implementation. We found that the success of a technique mostly depends on the type of network being studied, and that there is no universal machine learning algorithm or feature set that works reasonably well in all types of social media. There is a lack of attempts at implementing state-of-the-art machine learning techniques like neural networks, and the possibility of transfer learning and domain adaptation has not been explored. |
Tasks | Domain Adaptation, Transfer Learning |
Published | 2019-07-11 |
URL | https://arxiv.org/abs/1907.05442v1 |
https://arxiv.org/pdf/1907.05442v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-engagement-in-online-social |
Repo | |
Framework | |