January 28, 2020

Paper Group ANR 1007

The LIG system for the English-Czech Text Translation Task of IWSLT 2019. WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning. ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation. A Hierarchical Two-tier Approach to Hyper-parameter Optimization in Reinfor …

The LIG system for the English-Czech Text Translation Task of IWSLT 2019


Title	The LIG system for the English-Czech Text Translation Task of IWSLT 2019
Authors	Loïc Vial, Benjamin Lecouteux, Didier Schwab, Hang Le, Laurent Besacier
Abstract	In this paper, we present our submission for the English to Czech Text Translation Task of IWSLT 2019. Our system aims to study how pre-trained language models, used as input embeddings, can improve a specialized machine translation system trained on little data. Therefore, we implemented a Transformer-based encoder-decoder neural system which is able to use the output of a pre-trained language model as input embeddings, and we compared its performance under three configurations: 1) without any pre-trained language model (constrained), 2) using a language model trained on the monolingual parts of the allowed English-Czech data (constrained), and 3) using a language model trained on a large quantity of external monolingual data (unconstrained). We used BERT as the external pre-trained language model (configuration 3), and the BERT architecture for training our own language model (configuration 2). Regarding the training data, we trained our MT system on a small quantity of parallel text: one set consists only of the provided MuST-C corpus, and the other set consists of the MuST-C corpus and the News Commentary corpus from WMT. We observed that using the external pre-trained BERT improves the scores of our system by +0.8 to +1.5 BLEU on our development set, and +0.97 to +1.94 BLEU on the test set. However, using our own language model trained only on the allowed parallel data seems to improve machine translation performance only when the system is trained on the smallest dataset.
Tasks	Language Modelling, Machine Translation
Published	2019-11-07
URL	https://arxiv.org/abs/1911.02898v1
PDF	https://arxiv.org/pdf/1911.02898v1.pdf
PWC	https://paperswithcode.com/paper/the-lig-system-for-the-english-czech-text
Repo
Framework

WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning


Title	WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning
Authors	Wenhao Zhang, Ramin Ramezani, Arash Naeim
Abstract	Machine learning classifiers often stumble over imbalanced datasets where classes are not equally represented. This inherent bias towards the majority class may result in low accuracy in labeling the minority class. Imbalanced learning is prevalent in many real-world applications, such as medical research, network intrusion detection, and fraud detection in credit card transactions. A good number of research works have been reported to tackle this challenging problem. For example, Synthetic Minority Over-sampling TEchnique (SMOTE) and ADAptive SYNthetic sampling approach (ADASYN) use oversampling techniques to balance the skewed datasets. In this paper, we propose a novel method that combines a Weighted Oversampling Technique and ensemble Boosting method (WOTBoost) to improve the classification accuracy of minority data without sacrificing the accuracy of the majority class. WOTBoost adjusts its oversampling strategy at each round of boosting to synthesize more targeted minority data samples. The adjustment is enforced using a weighted distribution. We compare WOTBoost extensively with four other classification models (i.e., decision tree, SMOTE + decision tree, ADASYN + decision tree, SMOTEBoost) on 18 publicly accessible imbalanced datasets. WOTBoost achieves the best G mean on 6 datasets and the highest AUC score on 7 datasets.
Tasks	Fraud Detection, Intrusion Detection, Network Intrusion Detection
Published	2019-10-17
URL	https://arxiv.org/abs/1910.07892v3
PDF	https://arxiv.org/pdf/1910.07892v3.pdf
PWC	https://paperswithcode.com/paper/wotboost-weighted-oversampling-technique-in
Repo
Framework
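
The WOTBoost abstract reports results as G mean, which is commonly defined as the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). The sketch below is a minimal illustration of that common definition, not code from the paper; the function name `g_mean` and the `positive` label argument are assumptions made for this example.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against classes that do not occur in y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# A classifier that always predicts the majority class scores 80% accuracy
# on this toy data, but its G mean is zero because minority recall is zero.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * len(y_true)
print(g_mean(y_true, always_majority))
```

This is why G mean is a more informative score than plain accuracy on skewed data: it collapses to zero whenever either class is ignored entirely.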

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation


Title	ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation
Authors	J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme
Abstract	We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank’s paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks.
Tasks	Machine Translation, Semantic Similarity, Semantic Textual Similarity
Published	2019-01-11
URL	http://arxiv.org/abs/1901.03644v1
PDF	http://arxiv.org/pdf/1901.03644v1.pdf
PWC	https://paperswithcode.com/paper/parabank-monolingual-bitext-generation-and
Repo
Framework

A Hierarchical Two-tier Approach to Hyper-parameter Optimization in Reinforcement Learning


Title	A Hierarchical Two-tier Approach to Hyper-parameter Optimization in Reinforcement Learning
Authors	Juan Cruz Barsce, Jorge A. Palombarini, Ernesto Martínez
Abstract	Optimization of hyper-parameters in reinforcement learning (RL) algorithms is a key task, because they determine how the agent will learn its policy by interacting with its environment, and thus what data is gathered. In this work, an approach that uses Bayesian optimization to perform a two-step optimization is proposed: first, categorical RL structure hyper-parameters are taken as binary variables and optimized with an acquisition function tailored for such variables. Then, at a lower level of abstraction, solution-level hyper-parameters are optimized by resorting to the expected improvement acquisition function, while using the best categorical hyper-parameters found in the optimization at the upper-level of abstraction. This two-tier approach is validated in a simulated control task. Results obtained are promising and open the way for more user-independent applications of reinforcement learning.
Tasks
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08332v1
PDF	https://arxiv.org/pdf/1909.08332v1.pdf
PWC	https://paperswithcode.com/paper/a-hierarchical-two-tier-approach-to-hyper
Repo
Framework
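
The lower tier of the approach above relies on the expected improvement (EI) acquisition function. As a rough illustration, the sketch below evaluates the standard closed-form EI for a Gaussian posterior with mean `mu` and standard deviation `sigma`, given the best value observed so far. It assumes a maximization convention, and the function and parameter names are chosen for this example; the paper's own formulation may differ.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    return (mu - best) * cdf + sigma * pdf

# Two candidates with the same predicted mean: the more uncertain one has
# the larger expected improvement, which is what drives exploration.
print(expected_improvement(mu=1.0, sigma=0.1, best=1.0))
print(expected_improvement(mu=1.0, sigma=1.0, best=1.0))
```

In a Bayesian optimization loop, the next hyper-parameter setting to evaluate would be the candidate that maximizes this quantity under the current surrogate model.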

Are Transformers universal approximators of sequence-to-sequence functions?


Title	Are Transformers universal approximators of sequence-to-sequence functions?
Authors	Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Abstract	Despite the widespread adoption of Transformer models for NLP tasks, the expressive power of these models is not well-understood. In this paper, we establish that Transformer models are universal approximators of continuous permutation equivariant sequence-to-sequence functions with compact support, which is quite surprising given the amount of shared parameters in these models. Furthermore, using positional encodings, we circumvent the restriction of permutation equivariance, and show that Transformer models can universally approximate arbitrary continuous sequence-to-sequence functions on a compact domain. Interestingly, our proof techniques clearly highlight the different roles of the self-attention and the feed-forward layers in Transformers. In particular, we prove that fixed width self-attention layers can compute contextual mappings of the input sequences, playing a key role in the universal approximation property of Transformers. Based on this insight from our analysis, we consider other simpler alternatives to self-attention layers and empirically evaluate them.
Tasks
Published	2019-12-20
URL	https://arxiv.org/abs/1912.10077v2
PDF	https://arxiv.org/pdf/1912.10077v2.pdf
PWC	https://paperswithcode.com/paper/are-transformers-universal-approximators-of-1
Repo
Framework
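
The permutation equivariance mentioned in the abstract above can be checked numerically: without positional encodings, reordering the input tokens of a self-attention layer reorders its outputs in the same way. The sketch below is a minimal pure-Python, single-head dot-product attention written for illustration only; the weight matrices and inputs are arbitrary values, not taken from the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention; one row of x per token."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)

# Arbitrary illustrative values (assumptions for this example).
X = [[1.0, 0.5], [-0.3, 2.0], [0.7, -1.2]]
WQ = [[0.2, -0.1], [0.4, 0.3]]
WK = [[-0.5, 0.6], [0.1, 0.2]]
WV = [[1.0, 0.0], [0.5, -1.0]]
PERM = [2, 0, 1]

out = self_attention(X, WQ, WK, WV)
out_perm = self_attention([X[i] for i in PERM], WQ, WK, WV)

# Row j of the permuted output should match row PERM[j] of the original output.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for j, i in enumerate(PERM)
    for a, b in zip(out_perm[j], out[i])
)
print(equivariant)
```

Adding a position-dependent vector to each row of `X` before the attention call would break this symmetry, which is the role the abstract assigns to positional encodings.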

Understanding the Disharmony between Weight Normalization Family and Weight Decay: $ε-$shifted $L_2$ Regularizer


Title	Understanding the Disharmony between Weight Normalization Family and Weight Decay: $ε-$shifted $L_2$ Regularizer
Authors	Li Xiang, Chen Shuo, Xia Yan, Yang Jian
Abstract	The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight $\boldsymbol{W}$ to $\boldsymbol{W}'$, which makes $\boldsymbol{W}'$ independent of the magnitude of $\boldsymbol{W}$. Surprisingly, $\boldsymbol{W}$ must be decayed during gradient descent, otherwise we will observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks from over-fitting. In this paper, we theoretically prove that the weight decay term $\frac{1}{2}\lambda\|\boldsymbol{W}\|^2$ merely modulates the effective learning rate for improving objective optimization, and has no influence on generalization when the weight normalization family is compositely employed. Furthermore, we also expose several critical problems that arise when introducing a weight decay term to the weight normalization family, including the absence of a global minimum and training instability. To address these problems, we propose an $\epsilon-$shifted $L_2$ regularizer, which shifts the $L_2$ objective by a positive constant $\epsilon$. Such a simple operation can theoretically guarantee the existence of a global minimum, while preventing the network weights from being too small and thus avoiding gradient float overflow. It significantly improves the training stability and can achieve slightly better performance in our practice. The effectiveness of the $\epsilon-$shifted $L_2$ regularizer is comprehensively validated on the ImageNet, CIFAR-100, and COCO datasets. Our codes and pretrained models will be released at https://github.com/implus/PytorchInsight.
Tasks
Published	2019-11-14
URL	https://arxiv.org/abs/1911.05920v1
PDF	https://arxiv.org/pdf/1911.05920v1.pdf
PWC	https://paperswithcode.com/paper/understanding-the-disharmony-between-weight
Repo
Framework
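
The claim that weight decay only modulates the effective learning rate under weight normalization rests on scale invariance: if the network uses $\boldsymbol{W}/\|\boldsymbol{W}\|$, then rescaling $\boldsymbol{W}$ leaves the loss unchanged while the gradient shrinks in proportion. The sketch below checks this numerically on a toy one-neuron squared-error loss. The loss, the data, and all function names are assumptions made for illustration; this is not the paper's $\epsilon$-shifted regularizer.

```python
import math

def normalize(w):
    """Weight normalization without a learned gain: w / ||w||."""
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def loss(w, x, target):
    """Toy squared error of a single linear unit that uses the normalized weight."""
    pred = sum(a * b for a, b in zip(normalize(w), x))
    return (pred - target) ** 2

def numerical_grad(w, x, target, eps=1e-6):
    """Central finite-difference gradient of the loss with respect to w."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grad

w = [3.0, 4.0]
x = [1.0, 2.0]
scaled = [10.0 * v for v in w]

# Same loss at both scales, but the gradient is 10 times smaller for 10 * w.
print(loss(w, x, 0.5), loss(scaled, x, 0.5))
print(numerical_grad(w, x, 0.5), numerical_grad(scaled, x, 0.5))
```

Because the gradient scales as $1/\|\boldsymbol{W}\|$, shrinking the weights through decay makes each step larger relative to the weight, which is one way to read the abstract's statement about the effective learning rate.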

AFP-CKSAAP: Prediction of Antifreeze Proteins Using Composition of k-Spaced Amino Acid Pairs with Deep Neural Network


Title	AFP-CKSAAP: Prediction of Antifreeze Proteins Using Composition of k-Spaced Amino Acid Pairs with Deep Neural Network
Authors	Muhammad Usman, Jeong A Lee
Abstract	Antifreeze proteins (AFPs) are the subset of ice-binding proteins indispensable for species living in extremely cold weather. These proteins bind to ice crystals, hindering their growth into a large ice lattice that could cause physical damage. A variety of AFPs are found in numerous organisms, and due to their heterogeneous sequence characteristics, AFPs demonstrate a high degree of diversity, which makes their prediction a challenging task. Herein, we propose a machine learning framework to deal with this vigorous and diverse prediction problem using manifold learning through the composition of k-spaced amino acid pairs. We propose to use a deep neural network with skip connections and ReLU non-linearity to learn the non-linear mapping between the protein sequence descriptor and the class label. The proposed antifreeze protein prediction method, called AFP-CKSAAP, has been shown to outperform contemporary methods, achieving excellent prediction scores on a standard dataset. The main evaluator of the performance of the proposed method in this study is Youden's index, whose high value depends on both sensitivity and specificity. In particular, AFP-CKSAAP yields a Youden's index value of 0.82 on the independent dataset, which is better than previous methods.
Tasks
Published	2019-09-11
URL	https://arxiv.org/abs/1910.06392v1
PDF	https://arxiv.org/pdf/1910.06392v1.pdf
PWC	https://paperswithcode.com/paper/afp-cksaap-prediction-of-antifreeze-proteins
Repo
Framework
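
The sequence descriptor named in the abstract above, composition of k-spaced amino acid pairs (CKSAAP), is usually computed by counting, for each gap size k, how often each ordered pair of residues occurs k positions apart and dividing by the number of such windows. The sketch below follows that common definition; the function name `cksaap`, the default `k_max=2`, and the handling of non-standard residues are assumptions for this example rather than the paper's settings.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def cksaap(sequence, k_max=2):
    """Return 400 * (k_max + 1) pair frequencies, one block of 400 per gap k."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        windows = len(sequence) - k - 1  # number of residue pairs separated by k
        for i in range(max(windows, 0)):
            pair = sequence[i] + sequence[i + k + 1]
            if pair in counts:  # silently skip non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / windows if windows > 0 else 0.0 for p in PAIRS)
    return features

# Toy sequence: with k = 0 the adjacent pairs are AC, CA, AC.
vec = cksaap("ACAC", k_max=1)
print(len(vec), vec[PAIRS.index("AC")], vec[PAIRS.index("CA")])
```

The resulting fixed-length vector is the kind of descriptor that could be fed to a classifier such as the deep network described in the abstract.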

On Inversely Proportional Hypermutations with Mutation Potential


Title	On Inversely Proportional Hypermutations with Mutation Potential
Authors	Dogan Corus, Pietro S. Oliveto, Donya Yazdani
Abstract	Artificial Immune Systems (AIS) employing hypermutations with linear static mutation potential have recently been shown to be very effective at escaping local optima of combinatorial optimisation problems at the expense of being slower during the exploitation phase compared to standard evolutionary algorithms. In this paper we prove that considerable speed-ups in the exploitation phase may be achieved with dynamic inversely proportional mutation potentials (IPM) and argue that the potential should decrease inversely to the distance to the optimum rather than to the difference in fitness. Afterwards we define a simple (1+1) Opt-IA, that uses IPM hypermutations and ageing, for realistic applications where optimal solutions are unknown. The aim of the AIS is to approximate the ideal behaviour of the inversely proportional hypermutations better and better as the search space is explored. We prove that such desired behaviour, and related speed-ups, occur for a well-studied bimodal benchmark function called TwoMax. Furthermore, we prove that the (1+1) Opt-IA with IPM efficiently optimises a second bimodal function, Cliff, by escaping its local optima, while Opt-IA with static potential cannot and thus requires exponential expected runtime in the distance between the cliff and the optimum.
Tasks
Published	2019-03-27
URL	http://arxiv.org/abs/1903.11674v1
PDF	http://arxiv.org/pdf/1903.11674v1.pdf
PWC	https://paperswithcode.com/paper/on-inversely-proportional-hypermutations-with
Repo
Framework

Stochastic Linear Bandits with Hidden Low Rank Structure


Title	Stochastic Linear Bandits with Hidden Low Rank Structure
Authors	Sahin Lale, Kamyar Azizzadenesheli, Anima Anandkumar, Babak Hassibi
Abstract	High-dimensional representations often have a lower dimensional underlying structure. This is particularly the case in many decision making settings. For example, when the representation of actions is generated from a deep neural network, it is reasonable to expect a low-rank structure whereas conventional structures like sparsity are not valid anymore. Subspace recovery methods, such as Principal Component Analysis (PCA), can find the underlying low-rank structures in the feature space and reduce the complexity of the learning tasks. In this work, we propose Projected Stochastic Linear Bandit (PSLB), an algorithm for high dimensional stochastic linear bandits (SLB) when the representation of actions has an underlying low-dimensional subspace structure. PSLB deploys PCA based projection to iteratively find the low rank structure in SLBs. We show that deploying projection methods assures dimensionality reduction and results in a tighter regret upper bound that is in terms of the dimensionality of the subspace and its properties, rather than the dimensionality of the ambient space. We modify the image classification task into the SLB setting and empirically show that, when a pre-trained DNN provides the high dimensional feature representations, deploying PSLB results in a significant reduction of regret and faster convergence to an accurate model compared to state-of-the-art algorithms.
Tasks	Decision Making, Dimensionality Reduction, Image Classification
Published	2019-01-28
URL	http://arxiv.org/abs/1901.09490v1
PDF	http://arxiv.org/pdf/1901.09490v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-linear-bandits-with-hidden-low
Repo
Framework

Optimized Tracking of Topic Evolution


Title	Optimized Tracking of Topic Evolution
Authors	Patrick Kiss, Elaheh Momeni
Abstract	Topic evolution modeling has been researched for a long time and has gained considerable interest. A recent state-of-the-art method uses word modeling algorithms in combination with community detection mechanisms to achieve better results in a more effective way. We analyse results of this approach and discuss the two major challenges that this approach still faces. Although the topics that have resulted from the recent algorithm are good in general, they are very noisy due to many topics that are unimportant because of their size, words, or ambiguity. Additionally, the number of words defining each topic is too large, making it difficult to analyse them in their unsorted state. In this paper, we propose approaches to tackle these challenges by adding topic filtering and network analysis metrics to define the importance of a topic. We test different combinations of these metrics to see which combination yields the best results. Furthermore, we add word filtering and ranking to each topic to identify the words with the highest novelty automatically. We evaluate our enhancement methods in two ways: human qualitative evaluation and automatic quantitative evaluation. Moreover, we created two case studies to test the quality of the clusters and words. In the quantitative evaluation, we use the pairwise mutual information score to test the coherency of topics. The quantitative evaluation also includes an analysis of execution times for each part of the program. The results of the experimental evaluations show that the two evaluation methods agree on the positive feasibility of the algorithm. We then show possible extensions in the form of usability and future improvements to the algorithm.
Tasks	Community Detection
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07419v1
PDF	https://arxiv.org/pdf/1912.07419v1.pdf
PWC	https://paperswithcode.com/paper/optimized-tracking-of-topic-evolution
Repo
Framework

Vision-Infused Deep Audio Inpainting


Title	Vision-Infused Deep Audio Inpainting
Authors	Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang
Abstract	Multi-modality perception is essential to develop interactive intelligence. In this work, we consider a new task of visual information-infused audio inpainting, i.e., synthesizing missing audio segments that correspond to their accompanying videos. We identify two key aspects for a successful inpainter: (1) It is desirable to operate on spectrograms instead of raw audio. Recent advances in deep semantic image inpainting could be leveraged to go beyond the limitations of traditional audio inpainting. (2) To synthesize visually indicated audio, a visual-audio joint feature space needs to be learned with synchronization of audio and video. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching the MUSIC dataset. Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts. More importantly, our synthesized audio segments are coherent with their video counterparts, showing the effectiveness of our proposed Vision-Infused Audio Inpainter (VIAI). Code, models, dataset and video results are available at https://hangz-nju-cuhk.github.io/projects/AudioInpainting
Tasks	Image Inpainting
Published	2019-10-24
URL	https://arxiv.org/abs/1910.10997v1
PDF	https://arxiv.org/pdf/1910.10997v1.pdf
PWC	https://paperswithcode.com/paper/vision-infused-deep-audio-inpainting-1
Repo
Framework

PPINN: Parareal Physics-Informed Neural Network for time-dependent PDEs


Title	PPINN: Parareal Physics-Informed Neural Network for time-dependent PDEs
Authors	Xuhui Meng, Zhen Li, Dongkun Zhang, George Em Karniadakis
Abstract	Physics-informed neural networks (PINNs) encode physical conservation laws and prior physical knowledge into the neural networks, ensuring the correct physics is represented accurately while alleviating the need for supervised learning to a great degree. While effective for relatively short-term time integration, when long time integration of the time-dependent PDEs is sought, the time-space domain may become arbitrarily large and hence training of the neural network may become prohibitively expensive. To this end, we develop a parareal physics-informed neural network (PPINN), which decomposes a long-time problem into many independent short-time problems supervised by an inexpensive/fast coarse-grained (CG) solver. In particular, the serial CG solver is designed to provide approximate predictions of the solution at discrete times, while many fine PINNs are initiated simultaneously to correct the solution iteratively. There is a two-fold benefit from training PINNs with small data sets rather than working on a large data set directly, i.e., training of individual PINNs with small data is much faster, while training the fine PINNs can be readily parallelized. Consequently, compared to the original PINN approach, the proposed PPINN approach may achieve a significant speedup for long-time integration of PDEs, assuming that the CG solver is fast and can provide reasonable predictions of the solution, hence aiding the PPINN solution to converge in just a few iterations. To investigate the PPINN performance on solving time-dependent PDEs, we first apply the PPINN to solve the Burgers equation, and subsequently we apply the PPINN to solve a two-dimensional nonlinear diffusion-reaction equation. Our results demonstrate that PPINNs converge in a couple of iterations with significant speed-ups proportional to the number of time-subdomains employed.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10145v1
PDF	https://arxiv.org/pdf/1909.10145v1.pdf
PWC	https://paperswithcode.com/paper/190910145
Repo
Framework
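
The coarse-then-correct scheme described in the PPINN abstract follows the classical parareal iteration, in which each subdomain value is updated as coarse(new) + fine(old) - coarse(old). The sketch below illustrates that iteration on a scalar ODE, with a many-step explicit Euler integrator standing in for the fine PINNs; this substitution, the test equation, and all names are assumptions for illustration and not the paper's implementation.

```python
def euler(f, y, t0, t1, steps):
    """Explicit Euler integration of dy/dt = f(t, y) from t0 to t1."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y

def parareal(f, y0, t_end, n_sub, iterations, coarse_steps=1, fine_steps=100):
    """Return the solution at the n_sub + 1 subdomain boundaries."""
    ts = [t_end * n / n_sub for n in range(n_sub + 1)]

    def coarse(y, n):
        return euler(f, y, ts[n], ts[n + 1], coarse_steps)

    def fine(y, n):
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial guess: one cheap serial sweep with the coarse solver.
    u = [y0]
    for n in range(n_sub):
        u.append(coarse(u[n], n))

    for _ in range(iterations):
        # The fine solves depend only on the previous iterate, so they are
        # independent of each other and could run in parallel.
        fine_old = [fine(u[n], n) for n in range(n_sub)]
        coarse_old = [coarse(u[n], n) for n in range(n_sub)]
        new_u = [y0]
        for n in range(n_sub):
            # Serial coarse prediction plus the fine-minus-coarse correction.
            new_u.append(coarse(new_u[n], n) + fine_old[n] - coarse_old[n])
        u = new_u
    return u

def decay(t, y):
    return -y

# dy/dt = -y on [0, 2], split into four time subdomains.
print(parareal(decay, 1.0, 2.0, n_sub=4, iterations=0)[-1])  # coarse sweep only
print(parareal(decay, 1.0, 2.0, n_sub=4, iterations=4)[-1])  # after corrections
```

After as many iterations as there are subdomains, the parareal result matches what the fine solver would produce if run serially; the practical benefit comes from needing far fewer iterations than that when the coarse solver is reasonably accurate.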

Prediction of overall survival and molecular markers in gliomas via analysis of digital pathology images using deep learning


Title	Prediction of overall survival and molecular markers in gliomas via analysis of digital pathology images using deep learning
Authors	Saima Rathore, Muhammad Aksam Iftikhar, Zissimos Mourelatos
Abstract	Cancer histology reveals disease progression and associated molecular processes, and contains rich phenotypic information that is predictive of outcome. In this paper, we developed a computational approach based on deep learning to predict the overall survival and molecular subtypes of glioma patients from microscopic images of tissue biopsies, reflecting measures of microvascular proliferation, mitotic activity, nuclear atypia, and the presence of necrosis. Whole-slide images from 663 unique patients [IDH: 333 IDH-wildtype, 330 IDH-mutants, 1p/19q: 201 1p/19q non-codeleted, 129 1p/19q codeleted] were obtained from TCGA. Sub-images that were free of artifacts and that contained viable tumor with descriptive histologic characteristics were extracted, which were further used for training and testing a deep neural network. The output layer of the network was configured in two different ways: (i) a final Cox model layer to output a prediction of patient risk, and (ii) a final layer with sigmoid activation function, and stochastic gradient descent based optimization with binary cross-entropy loss. Both survival prediction and molecular subtype classification produced promising results using our model. The c-statistic was estimated to be 0.82 (p-value=4.8x10^-5) between the risk scores of the proposed deep learning model and overall survival, while accuracies of 88% (area under the curve [AUC]=0.86) were achieved in the detection of IDH mutational status and 1p/19q codeletion. These findings suggest that deep learning techniques can be applied to microscopic images for objective, accurate, and integrated prediction of outcome for glioma patients. The proposed marker may contribute to (i) stratification of patients into clinical trials, (ii) patient selection for targeted therapy, and (iii) personalized treatment planning.
Tasks
Published	2019-09-19
URL	https://arxiv.org/abs/1909.09124v1
PDF	https://arxiv.org/pdf/1909.09124v1.pdf
PWC	https://paperswithcode.com/paper/prediction-of-overall-survival-and-molecular
Repo
Framework

Deep Self-representative Concept Factorization Network for Representation Learning


Title	Deep Self-representative Concept Factorization Network for Representation Learning
Authors	Yan Zhang, Zhao Zhang, Zheng Zhang, Mingbo Zhao, Li Zhang, Zhengjun Zha, Meng Wang
Abstract	In this paper, we investigate the unsupervised deep representation learning issue and technically propose a novel framework called Deep Self-representative Concept Factorization Network (DSCF-Net), for clustering deep features. To improve the representation and clustering abilities, DSCF-Net explicitly considers discovering hidden deep semantic features, enhancing the robustness properties of the deep factorization to noise and preserving the local manifold structures of deep features. Specifically, DSCF-Net seamlessly integrates the robust deep concept factorization, deep self-expressive representation and adaptive locality preserving feature learning into a unified framework. To discover hidden deep representations, DSCF-Net designs a hierarchical factorization architecture using multiple layers of linear transformations, where the hierarchical representation is performed by formulating the problem as optimizing the basis concepts in each layer to improve the representation indirectly. DSCF-Net also improves the robustness by subspace recovery for sparse error correction firstly and then performs the deep factorization in the recovered visual subspace. To obtain locality-preserving representations, we also present an adaptive deep self-representative weighting strategy by using the coefficient matrix as the adaptive reconstruction weights to keep the locality of representations. Extensive comparison results with several other related models show that DSCF-Net delivers state-of-the-art performance on several public databases.
Tasks	Representation Learning
Published	2019-12-13
URL	https://arxiv.org/abs/1912.06444v4
PDF	https://arxiv.org/pdf/1912.06444v4.pdf
PWC	https://paperswithcode.com/paper/deep-self-representative-concept
Repo
Framework

SentiCite: An Approach for Publication Sentiment Analysis


Title	SentiCite: An Approach for Publication Sentiment Analysis
Authors	Dominique Mercier, Akansha Bhardwaj, Andreas Dengel, Sheraz Ahmed
Abstract	With the rapid growth in the number of scientific publications, year after year, it is becoming increasingly difficult to identify quality authoritative work on a single topic. Though scientometric measures are available that promise to offer a solution to this problem, these measures are mostly quantitative and rely, for instance, only on the number of times an article is cited. With this approach, it becomes irrelevant if an article is cited 10 times in a positive, negative or neutral way. In this context, it is quite important to study the qualitative aspect of a citation to understand its significance. This paper presents a novel system for sentiment analysis of citations in scientific documents (SentiCite) that is also capable of detecting the nature of citations by targeting the motivation behind a citation, e.g., reference to a dataset, reading reference. Furthermore, the paper also presents two datasets (SentiCiteDB and IntentCiteDB) containing about 2,600 citations with their ground truth for sentiment and nature of citation. SentiCite along with other state-of-the-art methods for sentiment analysis are evaluated on the presented datasets. Evaluation results reveal that SentiCite outperforms state-of-the-art methods for sentiment analysis in scientific publications by achieving an F1-measure of 0.71.
Tasks	Sentiment Analysis
Published	2019-10-07
URL	https://arxiv.org/abs/1910.03498v1
PDF	https://arxiv.org/pdf/1910.03498v1.pdf
PWC	https://paperswithcode.com/paper/senticite-an-approach-for-publication
Repo
Framework