Paper Group ANR 1737
Deep convolutional autoencoder for cryptocurrency market analysis
Title | Deep convolutional autoencoder for cryptocurrency market analysis |
Authors | Vladimir Puzyrev |
Abstract | This study attempts to analyze patterns in cryptocurrency markets using a special type of deep neural network, namely a convolutional autoencoder. The method extracts the dominant features of market behavior and classifies the 40 studied cryptocurrencies into several classes for twelve 6-month periods starting from 15th May 2013. Transitions from one class to another over time are related to the maturation of cryptocurrencies. In speculative cryptocurrency markets, these findings have potential implications for investment and trading strategies. |
Tasks | |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12281v1 |
https://arxiv.org/pdf/1910.12281v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolutional-autoencoder-for |
Repo | |
Framework | |
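A minimal sketch of the autoencoder idea behind the entry above, assuming hypothetical data shapes (40 coins, 8 market features) and substituting a tiny dense autoencoder trained by plain gradient descent for the paper's convolutional one; the latent codes would then be clustered into classes per period.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 40 coins x 8 market features for one period
X = rng.normal(size=(40, 8))
X -= X.mean(axis=0)

d_in, d_hid = 8, 3
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))

def forward(X, W_enc, W_dec):
    H = X @ W_enc    # encoder: compress 8 features to 3 latent ones
    R = H @ W_dec    # decoder: reconstruct the 8 inputs
    return H, R

lr, losses = 0.01, []
for _ in range(500):
    H, R = forward(X, W_enc, W_dec)
    err = R - X
    losses.append(float((err ** 2).mean()))
    g_dec = H.T @ err / len(X)             # gradient of MSE w.r.t. decoder
    g_enc = X.T @ (err @ W_dec.T) / len(X) # gradient w.r.t. encoder
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

codes, _ = forward(X, W_enc, W_dec)  # latent features, ready for clustering
print(losses[0], losses[-1])         # reconstruction loss should drop
```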
Clinical Text Generation through Leveraging Medical Concept and Relations
Title | Clinical Text Generation through Leveraging Medical Concept and Relations |
Authors | Wangjin Lee, Hyeryun Park, Jooyoung Yoon, Kyeongmo Kim, Jinwook Choi |
Abstract | With a neural sequence generation model, this study aims to develop a method of writing patient clinical texts given a brief medical history. As a proof of concept, we have demonstrated that it is feasible to use medical concept embeddings in clinical text generation. Our model was based on the Sequence-to-Sequence architecture and trained with a large set of de-identified clinical text data. The quantitative results show that our concept embedding method decreased the perplexity of the baseline architecture. We also discuss the results of a human evaluation performed by medical doctors. |
Tasks | Text Generation |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.00861v1 |
https://arxiv.org/pdf/1910.00861v1.pdf | |
PWC | https://paperswithcode.com/paper/clinical-text-generation-through-leveraging |
Repo | |
Framework | |
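The entry above reports perplexity as its quantitative metric. A minimal sketch of how perplexity is computed from a model's per-token probabilities (standard definition, not code from the paper):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood of the
    probabilities the model assigned to the reference tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model uniformly unsure over 4 candidate tokens has perplexity ~4;
# more confident (correct) predictions lower the perplexity.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ~4.0
print(perplexity([0.5, 0.5]))                # ~2.0
```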
Privacy-Enhancing Context Authentication from Location-Sensitive Data
Title | Privacy-Enhancing Context Authentication from Location-Sensitive Data |
Authors | Pradip Mainali, Carlton Shepherd, Fabien A. P. Petitcolas |
Abstract | This paper proposes a new privacy-enhancing, context-aware user authentication system, ConSec, which transforms general location-sensitive data, such as GPS location, barometric altitude and noise levels, collected from the user’s device, into a representation based on locality-sensitive hashing (LSH). The resulting hashes provide a dimensionality reduction of the underlying data, which we leverage to model users’ behaviour for authentication using machine learning. We describe how ConSec supports learning from categorical and numerical data while addressing a number of on-device and network-based threats. ConSec is subsequently implemented for the Android platform and evaluated using data collected from 35 users, followed by a security and privacy analysis. We demonstrate that LSH is a useful approach for context authentication from location-sensitive data without directly using plain measurements. |
Tasks | Dimensionality Reduction |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08800v2 |
https://arxiv.org/pdf/1904.08800v2.pdf | |
PWC | https://paperswithcode.com/paper/privacy-enhancing-context-authentication-from |
Repo | |
Framework | |
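A toy sketch of the LSH idea in the entry above, using sign-random-projection hashing over a hypothetical 3-D context vector (latitude, longitude, noise level); the real ConSec feature set and LSH family are more elaborate. Nearby contexts collide on most hash bits, so Hamming distance between hashes can stand in for context similarity without exposing plain measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
planes = rng.normal(size=(16, 3))  # 16 random hyperplanes -> 16-bit hash

def lsh_bits(v):
    """Sign-random-projection LSH: one bit per hyperplane."""
    return (planes @ v > 0).astype(int)

home = np.array([52.1, 4.3, 40.0])                 # hypothetical context
near_home = home + rng.normal(scale=0.01, size=3)  # same place, sensor noise
elsewhere = np.array([-33.9, 151.2, 70.0])         # very different context

def hamming(a, b):
    return int(np.sum(a != b))

d_near = hamming(lsh_bits(home), lsh_bits(near_home))
d_far = hamming(lsh_bits(home), lsh_bits(elsewhere))
print(d_near, d_far)  # the nearby context differs on far fewer bits
```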
Convolutional Neural Network for Intrusion Detection System In Cyber Physical Systems
Title | Convolutional Neural Network for Intrusion Detection System In Cyber Physical Systems |
Authors | Gael Kamdem De Teyou, Junior Ziazet |
Abstract | The extensive use of Information and Communication Technology in critical infrastructures such as Industrial Control Systems makes them vulnerable to cyber-attacks. One particular class of cyber-attacks is advanced persistent threats, where highly skilled attackers can steal user authentication information and move through the network from host to host until a valuable target is reached. The attacker should be detected as soon as possible so that an appropriate response can be taken; otherwise the attacker will have enough time to reach sensitive assets. When facing intelligent threats, intelligent solutions have to be designed. Therefore, in this paper, we take advantage of recent progress in deep learning to build a convolutional neural network that can detect intrusions in cyber-physical systems. The Intrusion Detection System is applied to the NSL-KDD dataset, and the performance of the proposed approach is presented and compared with the state of the art. Results show the effectiveness of the technique. |
Tasks | Intrusion Detection |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03168v2 |
https://arxiv.org/pdf/1905.03168v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-network-for-intrusion |
Repo | |
Framework | |
Learning Single Camera Depth Estimation using Dual-Pixels
Title | Learning Single Camera Depth Estimation using Dual-Pixels |
Authors | Rahul Garg, Neal Wadhwa, Sameer Ansari, Jonathan T. Barron |
Abstract | Deep learning techniques have enabled rapid progress in monocular depth estimation, but their quality is limited by the ill-posed nature of the problem and the scarcity of high quality datasets. We estimate depth from a single camera by leveraging the dual-pixel auto-focus hardware that is increasingly common on modern camera sensors. Classic stereo algorithms and prior learning-based depth estimation techniques under-perform when applied to this dual-pixel data, the former due to too-strong assumptions about RGB image matching, and the latter due to not leveraging the optics of dual-pixel image formation. To allow learning-based methods to work well on dual-pixel imagery, we identify an inherent ambiguity in the depth estimated from dual-pixel cues, and develop an approach to estimate depth up to this ambiguity. Using our approach, existing monocular depth estimation techniques can be effectively applied to dual-pixel data, and much smaller models can be constructed that still infer high quality depth. To demonstrate this, we capture a large dataset of in-the-wild 5-viewpoint RGB images paired with corresponding dual-pixel data, and show how view supervision with this data can be used to learn depth up to the unknown ambiguities. On our new task, our model is 30% more accurate than any prior work on learning-based monocular or stereoscopic depth estimation. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-04-11 |
URL | https://arxiv.org/abs/1904.05822v3 |
https://arxiv.org/pdf/1904.05822v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-single-camera-depth-estimation-using |
Repo | |
Framework | |
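The entry above estimates depth only "up to an ambiguity". Assuming, as in related dual-pixel work, that the ambiguity is affine in inverse depth, an evaluation metric must not penalize predictions that differ from ground truth by an unknown affine transform. A minimal sketch of such an affine-invariant error:

```python
import numpy as np

def affine_invariant_error(pred, gt):
    """Fit pred ~ a*gt + b by least squares, then measure the residual:
    predictions that are any affine transform of the ground truth
    score (near) zero error."""
    A = np.stack([gt, np.ones_like(gt)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, pred, rcond=None)
    return float(np.abs(a * gt + b - pred).mean())

gt = np.linspace(0.1, 1.0, 50)           # toy inverse-depth values
pred = 2.5 * gt - 0.3                    # correct up to the affine ambiguity
print(affine_invariant_error(pred, gt))  # ~0.0
```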
Deep Radiomics for Brain Tumor Detection and Classification from Multi-Sequence MRI
Title | Deep Radiomics for Brain Tumor Detection and Classification from Multi-Sequence MRI |
Authors | Subhashis Banerjee, Sushmita Mitra, Francesco Masulli, Stefano Rovetta |
Abstract | Glioma constitutes 80% of malignant primary brain tumors and is usually classified as HGG or LGG. LGG tumors are less aggressive, with a slower growth rate than HGG, and are responsive to therapy. Since tumor biopsy is challenging for brain tumor patients, noninvasive imaging techniques like Magnetic Resonance Imaging (MRI) have been extensively employed in diagnosing brain tumors. Automated systems for the detection and grading of tumors from MRI data therefore become necessary for assisting doctors in the framework of augmented intelligence. In this paper, we thoroughly investigate the power of deep ConvNets for the classification of brain tumors using multi-sequence MR images. We propose novel ConvNet models, which are trained from scratch on MRI patches, slices, and multi-planar volumetric slices. The suitability of transfer learning for the task is next studied by applying two existing ConvNet models (VGGNet and ResNet) trained on the ImageNet dataset, through fine-tuning of the last few layers. LOPO testing and testing on the holdout dataset are used to evaluate the performance of the ConvNets. Results demonstrate that the proposed ConvNets achieve better accuracy in all cases where the model is trained on the multi-planar volumetric dataset. Unlike conventional models, it obtains a testing accuracy of 95% for the low/high grade glioma classification problem. A score of 97% is achieved for the classification of LGG with/without 1p/19q codeletion, without any additional effort towards the extraction and selection of features. We study the properties of self-learned kernels/filters in different layers through visualization of the intermediate layer outputs. We also compare our results with those of state-of-the-art methods, demonstrating a maximum improvement of 7% on the grading performance of ConvNets and 9% on the prediction of 1p/19q codeletion status. |
Tasks | Transfer Learning |
Published | 2019-03-21 |
URL | http://arxiv.org/abs/1903.09240v1 |
http://arxiv.org/pdf/1903.09240v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-radiomics-for-brain-tumor-detection-and |
Repo | |
Framework | |
Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning
Title | Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning |
Authors | Ramtine Tofighi-Shirazi, Irina Mariuca Asavoae, Philippe Elbaz-Vincent |
Abstract | The ability to efficiently detect the software protections in use is at a premium when selecting and applying adequate deobfuscation techniques. We present a novel approach that combines semantic reasoning techniques with ensemble-learning classification to provide a static detection framework for obfuscation transformations. In contrast to existing work, we provide a methodology that can detect multiple layers of obfuscation without depending on knowledge of the underlying functionality of the training set used. We also extend our work to detect constructions of obfuscation transformations, thus providing a fine-grained methodology. To that end, we provide several studies on best practices for using machine learning techniques in a scalable and efficient model. According to our experimental results and evaluations on obfuscators such as Tigress and OLLVM, our models achieve up to 91% accuracy on state-of-the-art obfuscation transformations, and up to 100% overall accuracy on their constructions. |
Tasks | |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07523v1 |
https://arxiv.org/pdf/1911.07523v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-static-detection-of-obfuscation |
Repo | |
Framework | |
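The combining step in the ensemble-learning framework above can be illustrated with a simple majority vote over base classifiers; the labels and classifiers here are hypothetical, and the paper's actual combination scheme and semantic features are richer than this.

```python
from collections import Counter

def majority_vote(per_classifier_labels):
    """Combine labels predicted by several base classifiers for one
    sample; the most common label wins (insertion order breaks ties)."""
    return Counter(per_classifier_labels).most_common(1)[0][0]

# Hypothetical per-sample predictions from three base classifiers:
# two see control-flow flattening ("cff"), one sees opaque predicates.
votes = ["cff", "cff", "opaque-predicates"]
print(majority_vote(votes))  # -> cff
```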
CLaRO: a Data-driven CNL for Specifying Competency Questions
Title | CLaRO: a Data-driven CNL for Specifying Competency Questions |
Authors | C. Maria Keet, Zola Mahlaza, Mary-Jane Antia |
Abstract | Competency Questions (CQs) for an ontology and similar artefacts aim to provide insights into the contents of an ontology and to demarcate its scope. The absence of a controlled natural language, tooling and automation to support the authoring of CQs has hampered their effective use in ontology development and evaluation. The few question templates that exist are based on informal analyses of a small number of CQs and have limited coverage of question types and sentence constructions. We aim to fill this gap by proposing a template-based CNL for authoring CQs, called CLaRO. For its design, we exploited a new dataset of 234 CQs that had been processed automatically into 106 patterns, which we analysed and used to design a template-based CNL, with an additional CNL model and XML serialisation. The CNL was evaluated with a subset of questions from the original dataset and with two sets of newly sourced CQs. The coverage of CLaRO, with its 93 main templates and 41 linguistic variants, is about 90% for unseen questions. CLaRO has the potential to streamline the formalisation of ontology content requirements and, given that about one third of the competency questions in the test sets turned out to be invalid questions, to assist in writing good questions. |
Tasks | |
Published | 2019-07-17 |
URL | https://arxiv.org/abs/1907.07378v1 |
https://arxiv.org/pdf/1907.07378v1.pdf | |
PWC | https://paperswithcode.com/paper/claro-a-data-driven-cnl-for-specifying |
Repo | |
Framework | |
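A template-based CNL like the one above can be checked mechanically: a question is covered if it instantiates some template. A toy sketch with two invented templates (the real CLaRO has 93 main templates plus 41 variants, and its slot syntax may differ; `[CE]` here stands for a content-element slot):

```python
import re

# Hypothetical CLaRO-style templates, not the actual template set
TEMPLATES = [
    "Which [CE] has [CE]?",
    "Is [CE] a [CE]?",
]

def template_to_regex(t):
    """Turn a template into a regex where each [CE] slot matches text."""
    return "^" + re.escape(t).replace(re.escape("[CE]"), r"(.+?)") + "$"

def matches_cnl(cq):
    return any(re.match(template_to_regex(t), cq) for t in TEMPLATES)

print(matches_cnl("Which algorithm has quadratic complexity?"))  # True
print(matches_cnl("Tell me everything about pizza"))             # False
```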
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning
Title | End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning |
Authors | Tao Tu, Yuan-Jui Chen, Cheng-chieh Yeh, Hung-yi Lee |
Abstract | End-to-end text-to-speech (TTS) has shown great success given large quantities of paired text and speech data. However, laborious data collection remains difficult for at least 95% of the world’s languages, which hinders the development of TTS in those languages. In this paper, we aim to build TTS systems for such low-resource (target) languages where only very limited paired data are available. We show that such TTS systems can be effectively constructed by transferring knowledge from a high-resource (source) language. Since a model trained on the source language cannot be directly applied to the target language due to input space mismatch, we propose a method to learn a mapping between source and target linguistic symbols. Benefiting from this learned mapping, pronunciation information can be preserved throughout the transfer procedure. Preliminary experiments show that we only need around 15 minutes of paired data to obtain a relatively good TTS system. Furthermore, analytic studies demonstrate that the automatically discovered mapping correlates well with phonetic expertise. |
Tasks | Cross-Lingual Transfer, Transfer Learning |
Published | 2019-04-13 |
URL | https://arxiv.org/abs/1904.06508v2 |
https://arxiv.org/pdf/1904.06508v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-text-to-speech-for-low-resource |
Repo | |
Framework | |
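The symbol-mapping idea in the entry above can be sketched as nearest-neighbor matching between symbol embeddings; the paper learns its mapping jointly with the model, so this cosine-similarity lookup over hypothetical embeddings is a simplified stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical embeddings: 5 source-language symbols, 4 target symbols.
# The first three target symbols are copies of source symbols 1, 3, 0,
# simulating phonetically identical sounds across the two languages.
src_emb = rng.normal(size=(5, 8))
tgt_emb = np.vstack([src_emb[[1, 3, 0]], rng.normal(size=(1, 8))])

def map_symbols(tgt, src):
    """Map each target symbol to its nearest source symbol by cosine
    similarity, so target text can reuse the pre-trained source inputs."""
    tn = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sn = src / np.linalg.norm(src, axis=1, keepdims=True)
    return (tn @ sn.T).argmax(axis=1)

print(map_symbols(tgt_emb, src_emb))  # first three entries: 1, 3, 0
```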
Cross-Domain Ambiguity Detection using Linear Transformation of Word Embedding Spaces
Title | Cross-Domain Ambiguity Detection using Linear Transformation of Word Embedding Spaces |
Authors | Vaibhav Jain, Ruchika Malhotra, Sanskar Jain, Nishant Tanwar |
Abstract | The requirements engineering process is a crucial stage of the software development life cycle. It involves various stakeholders from different professional backgrounds, particularly in the requirements elicitation phase. Each stakeholder carries distinct domain knowledge, causing them to interpret certain words differently and leading to cross-domain ambiguity. This can result in misunderstandings among them and jeopardize the entire project. This paper proposes a natural language processing approach to find potentially ambiguous words for a given set of domains. The idea is to apply linear transformations to word embedding models trained on different domain corpora, bringing them into a unified embedding space. The approach then finds words with divergent embeddings, as they signify a variation in meaning across the domains. It can help a requirements analyst prevent misunderstandings during elicitation interviews and meetings by defining a set of potentially ambiguous terms in advance. The paper also discusses certain problems with existing approaches and how the proposed approach resolves them. |
Tasks | |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1910.12956v3 |
https://arxiv.org/pdf/1910.12956v3.pdf | |
PWC | https://paperswithcode.com/paper/cross-domain-ambiguity-detection-using-linear |
Repo | |
Framework | |
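The "linear transformation into a unified embedding space" step above is commonly solved via orthogonal Procrustes; a toy sketch under assumed data (a hypothetical shared vocabulary embedded in two domains, with one word whose meaning drifts):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Hypothetical shared vocabulary: 10 words embedded in two domain corpora
A = rng.normal(size=(10, d))                      # domain-1 embeddings
R_true = np.linalg.qr(rng.normal(size=(d, d)))[0]
B = A @ R_true                                    # domain 2: a rotated copy...
B[0] += 2.0                                       # ...except word 0, which drifts

# Orthogonal Procrustes: W = argmin ||A W - B||_F over orthogonal W
U, _, Vt = np.linalg.svd(A.T @ B)
W = U @ Vt

drift = np.linalg.norm(A @ W - B, axis=1)  # per-word cross-domain divergence
ambiguous = int(drift.argmax())
print(ambiguous)  # word 0 is flagged as potentially ambiguous
```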
Weight Normalization based Quantization for Deep Neural Network Compression
Title | Weight Normalization based Quantization for Deep Neural Network Compression |
Authors | Wen-Pu Cai, Wu-Jun Li |
Abstract | With the development of deep neural networks, network models are becoming larger and larger. Model compression has become an urgent need for deploying these models on mobile or embedded devices. Model quantization is a representative model compression technique. Although many quantization methods have been proposed, many of them suffer from high quantization error caused by the long-tail distribution of network weights. In this paper, we propose a novel quantization method, called weight normalization based quantization (WNQ), for model compression. WNQ adopts weight normalization to avoid the long-tail distribution of network weights and thereby reduces the quantization error. Experiments on CIFAR-100 and ImageNet show that WNQ outperforms other baselines and achieves state-of-the-art performance. |
Tasks | Model Compression, Neural Network Compression, Quantization |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00593v1 |
https://arxiv.org/pdf/1907.00593v1.pdf | |
PWC | https://paperswithcode.com/paper/weight-normalization-based-quantization-for |
Repo | |
Framework | |
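A sketch of the normalize-quantize-rescale pipeline suggested by the entry above. This is one interpretation, not the paper's exact scheme: standardise the weights, apply uniform quantization in the normalised space, then rescale back.

```python
import numpy as np

def quantize_normalized(w, bits=3):
    """Normalise weights, uniformly quantize to 2**bits levels in the
    normalised space, then de-normalise (an illustrative WNQ-like
    pipeline, not the paper's exact method)."""
    mu, sigma = w.mean(), w.std()
    z = (w - mu) / sigma                      # normalised weights
    levels = 2 ** bits
    lo, hi = z.min(), z.max()
    step = (hi - lo) / (levels - 1)
    zq = np.round((z - lo) / step) * step + lo
    return zq * sigma + mu                    # quantized weights

rng = np.random.default_rng(0)
w = rng.normal(size=256)
w_hat = quantize_normalized(w)
print(len(np.unique(w_hat)))  # at most 8 distinct values for 3 bits
```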
Simple, Scalable Adaptation for Neural Machine Translation
Title | Simple, Scalable Adaptation for Neural Machine Translation |
Authors | Ankur Bapna, Naveen Arivazhagan, Orhan Firat |
Abstract | Fine-tuning pre-trained Neural Machine Translation (NMT) models is the dominant approach for adapting to new languages and domains. However, fine-tuning requires adapting and maintaining a separate model for each target task. We propose a simple yet efficient approach for adaptation in NMT. Our approach consists of injecting tiny task-specific adapter layers into a pre-trained model. These lightweight adapters, with just a small fraction of the original model size, adapt the model to multiple individual tasks simultaneously. We evaluate our approach on two tasks: (i) Domain Adaptation and (ii) Massively Multilingual NMT. Experiments on domain adaptation demonstrate that our proposed approach is on par with full fine-tuning on various domains, dataset sizes and model capacities. On a massively multilingual dataset of 103 languages, our adaptation approach bridges the gap between individual bilingual models and one massively multilingual model for most language pairs, paving the way towards universal machine translation. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08478v1 |
https://arxiv.org/pdf/1909.08478v1.pdf | |
PWC | https://paperswithcode.com/paper/simple-scalable-adaptation-for-neural-machine |
Repo | |
Framework | |
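A minimal sketch of the residual bottleneck adapter described above, with hypothetical layer sizes. Zero-initialising the up-projection is a common trick that makes the untrained adapter an exact no-op, so injecting it does not perturb the pre-trained model; the paper's adapters also normalise the input, which is omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 16, 4  # hypothetical sizes; real adapters are larger

W_down = rng.normal(scale=0.1, size=(d_model, d_bottleneck))
W_up = np.zeros((d_bottleneck, d_model))  # zero init -> identity at start

def adapter(h):
    """Residual bottleneck adapter: project down, ReLU, project up,
    add back the input. Only W_down/W_up are trained per task."""
    return h + np.maximum(h @ W_down, 0.0) @ W_up

h = rng.normal(size=(3, d_model))          # a batch of hidden states
print(np.allclose(adapter(h), h))          # True: a no-op until trained
```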
Crowd Sourced Data Analysis: Mapping of Programming Concepts to Syntactical Patterns
Title | Crowd Sourced Data Analysis: Mapping of Programming Concepts to Syntactical Patterns |
Authors | Deepak Thukral, Darvesh Punia |
Abstract | Because programming concepts do not match their syntactic representations, code search is a very tedious task. For instance, in Java or C, “array” does not match [], so using “array” as a query one cannot find what one is looking for. Developers often have to search code to understand it, to reuse some part of it, or just to read it; without natural language search, they often have to scroll back and forth or use variable names as their queries. In our work, we have used Stack Overflow (SO) questions and answers to build a mapping of programming concepts to their respective natural language keywords, and then tag these natural language terms to every line of code, which can further be used for searching with natural language keywords. |
Tasks | Code Search |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.12495v1 |
http://arxiv.org/pdf/1903.12495v1.pdf | |
PWC | https://paperswithcode.com/paper/crowd-sourced-data-analysis-mapping-of |
Repo | |
Framework | |
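The tagging step described above can be sketched with a small concept-to-syntax dictionary; the pairs here are invented for illustration, whereas the paper mines them automatically from Stack Overflow Q&A data.

```python
# Hypothetical concept-to-syntax pairs (the paper derives these from SO)
CONCEPT_SYNTAX = {
    "array": "[]",
    "ternary operator": "?",
    "lambda": "->",
}

def tag_lines(code_lines):
    """Attach natural-language concept tags to each line of code so the
    code becomes searchable by concept keywords instead of syntax."""
    tagged = []
    for line in code_lines:
        tags = [c for c, syn in CONCEPT_SYNTAX.items() if syn in line]
        tagged.append((line, tags))
    return tagged

code = ["int[] xs = new int[3];", "int y = cond ? a : b;"]
for line, tags in tag_lines(code):
    print(line, "->", tags)
# searching for "array" would now surface the first line
```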
Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment
Title | Annotated Guidelines and Building Reference Corpus for Myanmar-English Word Alignment |
Authors | Nway Nway Han, Aye Thida |
Abstract | A reference corpus for word alignment is an important resource for developing and evaluating word alignment methods. For the Myanmar-English language pair, there is no reference corpus for evaluating word alignment tasks. Therefore, we created guidelines for Myanmar-English word alignment annotation, based on a contrastive analysis of the two languages, and built a Myanmar-English reference corpus consisting of verified alignments from the Myanmar portion of the Asian Language Treebank (ALT). This reference corpus contains the confidence labels sure (S) and possible (P) for word alignments, which are used for evaluating word alignment tasks. We discuss the most common linking ambiguities in order to define consistent and systematic instructions for manual word alignment. We evaluated annotator agreement on our reference corpus in terms of alignment error rate (AER) in word alignment tasks and discuss word relationships in terms of BLEU scores. |
Tasks | Word Alignment |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11288v1 |
https://arxiv.org/pdf/1909.11288v1.pdf | |
PWC | https://paperswithcode.com/paper/annotated-guidelines-and-building-reference |
Repo | |
Framework | |
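The sure/possible labels above feed directly into the standard AER metric. A minimal sketch of its usual definition (Och & Ney's formulation, with alignments as sets of source-target index pairs):

```python
def aer(predicted, sure, possible):
    """Alignment Error Rate over sets of (src_idx, tgt_idx) pairs:
    AER = 1 - (|A & S| + |A & P|) / (|A| + |S|),
    where P conventionally includes S."""
    a, s, p = set(predicted), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

sure = {(0, 0), (1, 2)}
possible = sure | {(2, 1)}        # possible links include the sure ones
pred = {(0, 0), (1, 2), (2, 1)}
print(aer(pred, sure, possible))  # 0.0: every link is sure or possible
```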
Jointly Learning to Align and Translate with Transformer Models
Title | Jointly Learning to Align and Translate with Transformer Models |
Authors | Sarthak Garg, Stephan Peitz, Udhyakumar Nallasamy, Matthias Paulik |
Abstract | The state of the art in machine translation (MT) is governed by neural approaches, which typically provide superior translation accuracy over statistical approaches. However, on the closely related task of word alignment, traditional statistical word alignment models often remain the go-to solution. In this paper, we present an approach to train a Transformer model to produce both accurate translations and alignments. We extract discrete alignments from the attention probabilities learnt during regular neural machine translation model training and leverage them in a multi-task framework to optimize towards translation and alignment objectives. We demonstrate that our approach produces competitive results compared to GIZA++ trained IBM alignment models without sacrificing translation accuracy and outperforms previous attempts on Transformer model based word alignment. Finally, by incorporating IBM model alignments into our multi-task training, we report significantly better alignment accuracies compared to GIZA++ on three publicly available data sets. |
Tasks | Machine Translation, Word Alignment |
Published | 2019-09-04 |
URL | https://arxiv.org/abs/1909.02074v1 |
https://arxiv.org/pdf/1909.02074v1.pdf | |
PWC | https://paperswithcode.com/paper/jointly-learning-to-align-and-translate-with |
Repo | |
Framework | |
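The "extract discrete alignments from attention probabilities" step above can be sketched as an argmax over a target-by-source attention matrix; the matrix here is invented, whereas the paper obtains it from a supervised Transformer attention head.

```python
import numpy as np

# Toy attention matrix: rows = 3 target tokens, cols = 4 source tokens
attn = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])

def extract_alignments(attn):
    """Discrete alignment: link each target token to the source token
    receiving the highest attention mass, as (src, tgt) index pairs."""
    return [(int(j), t) for t, j in enumerate(attn.argmax(axis=1))]

print(extract_alignments(attn))  # [(0, 0), (2, 1), (3, 2)]
```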