January 31, 2020


Paper Group ANR 184


Differentiable Sparsification for Deep Neural Networks


Title	Differentiable Sparsification for Deep Neural Networks
Authors	Yongjin Lee
Abstract	Deep neural networks have relieved human experts of the burden of feature engineering, but comparable effort is now required to determine an effective architecture. On the other hand, as networks have grown ever larger, considerable resources are also invested in reducing their size. Both problems can be addressed by sparsifying an over-complete model, which removes redundant parameters or connections either by pruning them away after training or by encouraging them to become zero during training. In general, however, these approaches are not fully differentiable and interrupt end-to-end training with stochastic gradient descent, because they require either a parameter selection step or a soft-thresholding step. In this paper, we propose a fully differentiable sparsification method for deep neural networks, which allows parameters to be exactly zero during training and can thus learn the sparsified structure and the weights of a network simultaneously using stochastic gradient descent. We apply the proposed method to various popular models in order to show its effectiveness.
Tasks	Feature Engineering
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03201v1
PDF	https://arxiv.org/pdf/1910.03201v1.pdf
PWC	https://paperswithcode.com/paper/differentiable-sparsification-for-deep-neural
Repo
Framework
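For context on the soft-thresholding step that the abstract says existing approaches rely on, here is a minimal sketch of the standard soft-thresholding operator. It is not the paper's proposed method; the function name `soft_threshold` and the threshold `lam` are illustrative assumptions.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights with |w| <= lam become exactly 0."""
    out = []
    for w in weights:
        magnitude = abs(w) - lam
        if magnitude <= 0.0:
            # Small weights are set to exactly zero (this is what creates sparsity).
            out.append(0.0)
        else:
            # Larger weights keep their sign but lose lam in magnitude.
            out.append(magnitude if w > 0 else -magnitude)
    return out


# Hypothetical weight vector: the two small entries are zeroed out.
pruned = soft_threshold([0.8, -0.05, 0.02, -1.5], lam=0.1)
```

Applied as a separate step between gradient updates, an operator like this sits outside the gradient computation, which is the kind of interruption to end-to-end training that the abstract refers to.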

Gated neural networks for implied volatility surfaces


Title	Gated neural networks for implied volatility surfaces
Authors	Yu Zheng, Yongxin Yang, Bowei Chen
Abstract	This paper presents a framework for developing neural networks to predict implied volatility surfaces. It can incorporate related properties from existing mathematical models and empirical findings, including no static arbitrage, limiting boundaries, asymptotic slope and volatility smile. These properties are also satisfied empirically in our experiments with option data on the S&P 500 index over 20 years. The developed neural network model outperforms the widely used surface stochastic volatility inspired (SSVI) model and other benchmarked neural network models on the mean average percentage error in both in-sample and out-of-sample datasets. This study makes two major contributions. First, it contributes to the recent use of machine learning in finance by obtaining an accurate deep learning model for implied volatility surface prediction. Second, it provides methodological guidance on how to seamlessly combine data-driven models with domain knowledge in the development of machine learning applications.
Tasks
Published	2019-04-29
URL	https://arxiv.org/abs/1904.12834v4
PDF	https://arxiv.org/pdf/1904.12834v4.pdf
PWC	https://paperswithcode.com/paper/gated-deep-neural-networks-for-implied
Repo
Framework

Effective Network Compression Using Simulation-Guided Iterative Pruning


Title	Effective Network Compression Using Simulation-Guided Iterative Pruning
Authors	Dae-Woong Jeong, Jaehun Kim, Youngseok Kim, Tae-Ho Kim, Myungsu Chae
Abstract	Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose a novel network compression method to address this limitation. Its principle is to make iterative pruning more effective and sophisticated by simulating the reduced network. A simple experiment was conducted to evaluate the method; the results showed that the proposed method achieved higher performance than existing methods at the same pruning level.
Tasks
Published	2019-02-12
URL	http://arxiv.org/abs/1902.04224v1
PDF	http://arxiv.org/pdf/1902.04224v1.pdf
PWC	https://paperswithcode.com/paper/effective-network-compression-using
Repo
Framework
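To make the term "iterative pruning" concrete, the following is a minimal sketch of plain magnitude-based iterative pruning. It is a generic baseline, not the simulation-guided variant the paper proposes; the function names, the `retrain` hook, and the example weights are all hypothetical.

```python
def prune_smallest(weights, fraction):
    """Zero out the given fraction of the still-nonzero weights with the smallest magnitude."""
    alive = [i for i, w in enumerate(weights) if w != 0.0]
    n_prune = int(len(alive) * fraction)
    # Order the surviving indices by weight magnitude, smallest first.
    alive.sort(key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in alive[:n_prune]:
        pruned[i] = 0.0
    return pruned


def iterative_prune(weights, fraction, rounds, retrain=lambda w: w):
    """Alternate a pruning step and a retraining step for a fixed number of rounds."""
    for _ in range(rounds):
        weights = prune_smallest(weights, fraction)
        weights = retrain(weights)  # hypothetical hook; the default does nothing
    return weights


# Two rounds at 25%: round one removes 2 of 8 weights, round two removes 1 of the remaining 6.
result = iterative_prune(
    [0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.3, -0.6], fraction=0.25, rounds=2
)
```

In a real setting the `retrain` hook would fine-tune the surviving weights between rounds; it is left as the identity here so the sketch stays self-contained.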

AutoEncoders for Training Compact Deep Learning RF Classifiers for Wireless Protocols


Title	AutoEncoders for Training Compact Deep Learning RF Classifiers for Wireless Protocols
Authors	Silvija Kokalj-Filipovic, Rob Miller, Joshua Morman
Abstract	We show that compact fully connected (FC) deep learning networks trained to classify wireless protocols using a hierarchy of multiple denoising autoencoders (AEs) outperform reference FC networks trained in a typical way, i.e., with a stochastic gradient based optimization of a given FC architecture. Not only is the complexity of such an FC network, measured in the number of trainable parameters and scalar multiplications, much lower than that of the reference FC and residual models, but its accuracy also exceeds that of both models for nearly all tested SNR values (0 dB to 50 dB). Such AE-trained networks are suited for in-situ protocol inference performed by simple mobile devices based on noisy signal measurements. Training is based on data transmitted by real devices, collected in a controlled environment, and systematically augmented by a policy-based data synthesis process that adds to the signal any subset of impairments commonly seen in a wireless receiver.
Tasks	Denoising
Published	2019-04-13
URL	http://arxiv.org/abs/1904.11874v1
PDF	http://arxiv.org/pdf/1904.11874v1.pdf
PWC	https://paperswithcode.com/paper/190411874
Repo
Framework

Transformer Based Reinforcement Learning For Games


Title	Transformer Based Reinforcement Learning For Games
Authors	Uddeshya Upadhyay, Nikunj Shah, Sucheta Ravikanti, Mayanka Medhe
Abstract	Recent times have witnessed sharp improvements in reinforcement learning tasks using deep reinforcement learning techniques such as Deep Q Networks, Policy Gradients, and Actor-Critic methods, which rely on deep learning models trained by back-propagation of gradients. An active area of research in reinforcement learning is training agents to play complex video games, which so far has been accomplished only by human intelligence. Some state-of-the-art performances in video game playing using deep reinforcement learning are obtained by processing the sequence of frames from video games, passing them through a convolutional network to obtain features, and then using recurrent neural networks to figure out the action leading to optimal rewards. The recurrent neural network learns to extract the meaningful signal from the sequence of such features. In this work, we propose a method utilizing transformer networks, which have recently replaced RNNs in Natural Language Processing (NLP), and perform experiments to compare it with existing methods.
Tasks
Published	2019-12-09
URL	https://arxiv.org/abs/1912.03918v1
PDF	https://arxiv.org/pdf/1912.03918v1.pdf
PWC	https://paperswithcode.com/paper/transformer-based-reinforcement-learning-for
Repo
Framework

Turing-Completeness of Dynamics in Abstract Persuasion Argumentation


Title	Turing-Completeness of Dynamics in Abstract Persuasion Argumentation
Authors	Ryuta Arisaka
Abstract	Abstract Persuasion Argumentation (APA) is a dynamic argumentation formalism that extends Dung argumentation with persuasion relations. In this work, we show through two-counter Minsky machine encoding that APA dynamics is Turing-complete.
Tasks
Published	2019-03-19
URL	http://arxiv.org/abs/1903.07837v1
PDF	http://arxiv.org/pdf/1903.07837v1.pdf
PWC	https://paperswithcode.com/paper/turing-completeness-of-dynamics-in-abstract
Repo
Framework
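As background for the encoding target named in the abstract, here is a minimal sketch of a two-counter Minsky machine interpreter. It illustrates only the machine model, not the paper's APA encoding; the instruction names (`inc`, `jzdec`, `halt`) and the tuple layout are assumptions chosen for this example.

```python
def run_minsky(program, counters, max_steps=10_000):
    """Run a two-counter machine and return the final counter values.

    Instructions (tuples):
      ("inc", c, nxt)              increment counter c, then go to instruction nxt
      ("jzdec", c, nxt, if_zero)   if counter c is 0 go to if_zero,
                                   otherwise decrement it and go to nxt
      ("halt",)                    stop
    """
    counters = list(counters)
    pc = 0
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return counters
        if instr[0] == "inc":
            _, c, nxt = instr
            counters[c] += 1
            pc = nxt
        elif instr[0] == "jzdec":
            _, c, nxt, if_zero = instr
            if counters[c] == 0:
                pc = if_zero
            else:
                counters[c] -= 1
                pc = nxt
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    raise RuntimeError("step limit reached without halting")


# Example program: move counter 1 into counter 0, leaving a + b in counter 0.
ADD = [
    ("jzdec", 1, 1, 2),  # 0: if b == 0 jump to halt, else b -= 1 and continue
    ("inc", 0, 0),       # 1: a += 1, then loop back to instruction 0
    ("halt",),           # 2: stop
]
total = run_minsky(ADD, [3, 4])
```

Two counters with increment and test-and-decrement are known to suffice for universal computation, which is why simulating such a machine is a common route to a Turing-completeness result.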

Preventing Information Leakage with Neural Architecture Search


Title	Preventing Information Leakage with Neural Architecture Search
Authors	Shuang Zhang, Liyao Xiang, Congcong Li, Yixuan Wang, Zeyu Liu, Quanshi Zhang, Bo Li
Abstract	Powered by machine learning services in the cloud, numerous learning-driven mobile applications are gaining popularity in the market. As deep learning tasks are mostly computation-intensive, it has become a trend to process raw data on devices and send the neural network features to the cloud, whereas the part of the neural network residing in the cloud completes the task to return final results. However, there is always the potential for unexpected leakage with the release of features, with which an adversary could infer a significant amount of information about the original data. To address this problem, we propose a privacy-preserving deep learning framework on top of the mobile cloud infrastructure: the trained deep neural network is tailored to prevent information leakage through features while maintaining highly accurate results. In essence, we learn the strategy to prevent leakage by modifying the trained deep neural network against a generic opponent, who infers unintended information from released features and auxiliary data, while preserving the accuracy of the model as much as possible.
Tasks	Neural Architecture Search, Privacy Preserving Deep Learning
Published	2019-12-18
URL	https://arxiv.org/abs/1912.08421v1
PDF	https://arxiv.org/pdf/1912.08421v1.pdf
PWC	https://paperswithcode.com/paper/preventing-information-leakage-with-neural
Repo
Framework

An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search


Title	An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search
Authors	Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick
Abstract	Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of inexact search, globally normalized neural models are still more effective than their locally normalized counterparts. Further, since our training approach is sensitive to warm-starting with pre-trained models, we also propose a novel initialization strategy based on self-normalization for pre-training globally normalized models. We perform analysis of our approach on two tasks: CCG supertagging and Machine Translation, and demonstrate the importance of global normalization under different conditions while using search-aware training.
Tasks	CCG Supertagging, Machine Translation
Published	2019-04-15
URL	http://arxiv.org/abs/1904.06834v1
PDF	http://arxiv.org/pdf/1904.06834v1.pdf
PWC	https://paperswithcode.com/paper/an-empirical-investigation-of-global-and
Repo
Framework

A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects


Title	A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
Authors	Aly Magassouba, Komei Sugiura, Hisashi Kawai
Abstract	In this study, we focus on multimodal language understanding for fetching instructions in the context of domestic service robots. This task consists of predicting a target object, as instructed by the user, given an image and an unstructured sentence, such as “Bring me the yellow box (from the wooden cabinet).” This is challenging because of the ambiguity of natural language, i.e., the relevant information may be missing or there might be several candidates. To solve such a task, we propose the multimodal target-source classifier model with attention branches (MTCM-AB), which is an extension of the MTCM. Our methodology uses the attention branch network (ABN) to develop a multimodal attention mechanism based on linguistic and visual inputs. Experimental validation using a standard dataset showed that the MTCM-AB outperformed both state-of-the-art methods and the MTCM. In particular, the average accuracy of the MTCM-AB on the PFN-PIC dataset was 90.1%, while human performance was 90.3%.
Tasks
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10675v2
PDF	https://arxiv.org/pdf/1912.10675v2.pdf
PWC	https://paperswithcode.com/paper/a-multimodal-target-source-classifier-with
Repo
Framework

Measuring the Completeness of Theories


Title	Measuring the Completeness of Theories
Authors	Drew Fudenberg, Jon Kleinberg, Annie Liang, Sendhil Mullainathan
Abstract	We use machine learning to provide a tractable measure of the amount of predictable variation in the data that a theory captures, which we call its “completeness.” We apply this measure to three problems: assigning certain equivalents to lotteries, initial play in games, and human generation of random sequences. We discover considerable variation in the completeness of existing models, which sheds light on whether to focus on developing better models with the same features or instead to look for new features that will improve predictions. We also illustrate how and why completeness varies with the experiments considered, which highlights the role played in choosing which experiments to run.
Tasks
Published	2019-10-15
URL	https://arxiv.org/abs/1910.07022v1
PDF	https://arxiv.org/pdf/1910.07022v1.pdf
PWC	https://paperswithcode.com/paper/measuring-the-completeness-of-theories
Repo
Framework
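The abstract describes completeness only in words. One way to read it, assumed here rather than taken from the abstract, is as the share of the achievable reduction in prediction error that a theory attains relative to a naive baseline. The function name, argument names, and example numbers below are hypothetical.

```python
def completeness(naive_error, model_error, best_achievable_error):
    """Fraction of the achievable error reduction that the model attains.

    Assumed reading: 0 means the model does no better than the naive baseline,
    and 1 means it matches the best achievable prediction error.
    """
    achievable = naive_error - best_achievable_error
    if achievable <= 0:
        raise ValueError("naive baseline must be worse than the best achievable error")
    return (naive_error - model_error) / achievable


# Hypothetical errors: the model closes 0.6 of an achievable 0.8 gap, so about 0.75.
score = completeness(naive_error=1.0, model_error=0.4, best_achievable_error=0.2)
```

Under this reading, the best achievable error would be estimated with a flexible machine learning predictor on the same features, which matches the abstract's mention of using machine learning to make the measure tractable.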

Transfer Learning for Relation Extraction via Relation-Gated Adversarial Learning


Title	Transfer Learning for Relation Extraction via Relation-Gated Adversarial Learning
Authors	Ningyu Zhang, Shumin Deng, Zhanlin Sun, Jiaoyan Chen, Wei Zhang, Huajun Chen
Abstract	Relation extraction aims to extract relational facts from sentences. Previous models mainly rely on manually labeled datasets, seed instances or human-crafted patterns, and distant supervision. However, human annotation is expensive, human-crafted patterns suffer from semantic drift, and distant supervision samples are usually noisy. Domain adaptation methods enable leveraging labeled data from a different but related domain. However, different domains usually have various textual relation descriptions and different label spaces (the source label space is usually a superset of the target label space). To solve these problems, we propose a novel model of relation-gated adversarial learning for relation extraction, which extends adversarial-based domain adaptation. Experimental results show that the proposed approach outperforms previous domain adaptation methods on partial domain adaptation and can improve the accuracy of distantly supervised relation extraction through fine-tuning.
Tasks	Domain Adaptation, Partial Domain Adaptation, Relation Extraction, Transfer Learning
Published	2019-08-22
URL	https://arxiv.org/abs/1908.08507v1
PDF	https://arxiv.org/pdf/1908.08507v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-for-relation-extraction-via
Repo
Framework

Data-Centric Mixed-Variable Bayesian Optimization For Materials Design


Title	Data-Centric Mixed-Variable Bayesian Optimization For Materials Design
Authors	Akshay Iyer, Yichi Zhang, Aditya Prasad, Siyu Tao, Yixing Wang, Linda Schadler, L Catherine Brinson, Wei Chen
Abstract	Materials design can be cast as an optimization problem with the goal of achieving desired properties, by varying material composition, microstructure morphology, and processing conditions. Existence of both qualitative and quantitative material design variables leads to disjointed regions in property space, making the search for optimal design challenging. Limited availability of experimental data and the high cost of simulations magnify the challenge. This situation calls for design methodologies that can extract useful information from existing data and guide the search for optimal designs efficiently. To this end, we present a data-centric, mixed-variable Bayesian Optimization framework that integrates data from literature, experiments, and simulations for knowledge discovery and computational materials design. Our framework pivots around the Latent Variable Gaussian Process (LVGP), a novel Gaussian Process technique which projects qualitative variables on a continuous latent space for covariance formulation, as the surrogate model to quantify “lack of data” uncertainty. Expected improvement, an acquisition criterion that balances exploration and exploitation, helps navigate a complex, nonlinear design space to locate the optimum design. The proposed framework is tested through a case study which seeks to concurrently identify the optimal composition and morphology for insulating polymer nanocomposites. We also present an extension of mixed-variable Bayesian Optimization for multiple objectives to identify the Pareto Frontier within tens of iterations. These findings project Bayesian Optimization as a powerful tool for design of engineered material systems.
Tasks
Published	2019-07-04
URL	https://arxiv.org/abs/1907.02577v1
PDF	https://arxiv.org/pdf/1907.02577v1.pdf
PWC	https://paperswithcode.com/paper/data-centric-mixed-variable-bayesian
Repo
Framework
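The expected improvement criterion mentioned in the abstract has a well-known closed form when the surrogate's prediction at a candidate point is Gaussian. A minimal sketch for a maximization problem follows; it is independent of the paper's LVGP surrogate, and the candidate names and numbers are made up for illustration.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Closed-form expected improvement for maximization under N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best_so_far) * cdf + sigma * pdf


# Hypothetical candidates as (predicted mean, predicted standard deviation).
candidates = {"A": (1.0, 0.1), "B": (0.9, 0.5)}
best = 1.0
scores = {name: expected_improvement(m, s, best) for name, (m, s) in candidates.items()}
```

Here candidate B has the lower predicted mean but the higher score because its larger uncertainty leaves more room for improvement, which is the exploration and exploitation balance the abstract describes.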

Word Sense Disambiguation using Knowledge-based Word Similarity


Title	Word Sense Disambiguation using Knowledge-based Word Similarity
Authors	Sunjae Kwon, Dongsuk Oh, Youngjoong Ko
Abstract	In natural language processing, word-sense disambiguation (WSD) is an open problem concerned with identifying the correct sense of words in a particular context. To address this problem, we introduce a novel knowledge-based WSD system. We suggest the adoption of two methods in our system. First, we suggest a novel method to encode the word vector representation by considering the graphical semantic relationships from the lexical knowledge-base. Second, we propose a method for extracting the contextual words from the text for analyzing an ambiguous word based on the similarity of word vector representations. To validate the effectiveness of our WSD system, we conducted experiments on the five benchmark English WSD corpora (Senseval-02, Senseval-03, SemEval-07, SemEval-13, and SemEval-15). The obtained results demonstrated that the suggested methods significantly enhanced the WSD performance. Furthermore, our system outperformed the existing knowledge-based WSD systems and showed a performance comparable to that of the state-of-the-art supervised WSD systems.
Tasks	Word Sense Disambiguation
Published	2019-11-11
URL	https://arxiv.org/abs/1911.04015v1
PDF	https://arxiv.org/pdf/1911.04015v1.pdf
PWC	https://paperswithcode.com/paper/word-sense-disambiguation-using-knowledge
Repo
Framework

Automated detection of oral pre-cancerous tongue lesions using deep learning for early diagnosis of oral cavity cancer


Title	Automated detection of oral pre-cancerous tongue lesions using deep learning for early diagnosis of oral cavity cancer
Authors	Mohammed Zubair M. Shamim, Sadatullah Syed, Mohammad Shiblee, Mohammed Usman, Syed Ali
Abstract	Discovering oral cavity cancer (OCC) at an early stage is an effective way to increase patient survival rate. However, the current initial screening process is done manually and is expensive for the average individual, especially in developing countries. This problem is further compounded by the lack of specialists in such areas. Automating the initial screening process using artificial intelligence (AI) to detect pre-cancerous lesions can prove to be an effective and inexpensive technique that would allow patients to be triaged accordingly to receive appropriate clinical management. In this study, we applied and evaluated the efficacy of six deep convolutional neural network (DCNN) models using transfer learning to identify pre-cancerous tongue lesions directly from a small data set of clinically annotated photographic images, in order to diagnose early signs of OCC. A DCNN model based on the VGG19 architecture was able to differentiate between benign and pre-cancerous tongue lesions with a mean classification accuracy of 0.98, sensitivity of 0.89 and specificity of 0.97. Additionally, the ResNet50 DCNN model was able to distinguish between five types of tongue lesions, i.e., hairy tongue, fissured tongue, geographic tongue, strawberry tongue and oral hairy leukoplakia, with a mean classification accuracy of 0.97. Preliminary results using an (AI+Physician) ensemble model demonstrate that an automated initial screening process of tongue lesions using DCNNs can achieve near-human level classification performance for diagnosing early signs of OCC in patients.
Tasks	Transfer Learning
Published	2019-09-18
URL	https://arxiv.org/abs/1909.08987v1
PDF	https://arxiv.org/pdf/1909.08987v1.pdf
PWC	https://paperswithcode.com/paper/automated-detection-of-oral-pre-cancerous
Repo
Framework

MobileFAN: Transferring Deep Hidden Representation for Face Alignment


Title	MobileFAN: Transferring Deep Hidden Representation for Face Alignment
Authors	Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong
Abstract	Facial landmark detection is a crucial prerequisite for many face analysis applications. Deep learning-based methods currently dominate facial landmark detection. However, such works generally introduce a large number of parameters, resulting in high memory cost. In this paper, we aim for a lightweight yet effective solution to facial landmark detection. To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone, MobileNetV2, as the encoder and three deconvolutional layers as the decoder. The proposed MobileFAN, with only 8% of the model size and lower computational cost, achieves superior or equivalent performance compared with state-of-the-art models. Moreover, by transferring the geometric structural information of a face graph from a large complex model to our proposed MobileFAN through feature-aligned distillation and feature-similarity distillation, the performance of MobileFAN is further improved in effectiveness and efficiency for face alignment. Extensive experimental results on three challenging facial landmark estimation benchmarks, including COFW, 300W and WFLW, show the superiority of our proposed MobileFAN over state-of-the-art methods.
Tasks	Face Alignment, Facial Landmark Detection
Published	2019-08-11
URL	https://arxiv.org/abs/1908.03839v3
PDF	https://arxiv.org/pdf/1908.03839v3.pdf
PWC	https://paperswithcode.com/paper/mobilefan-transferring-deep-hidden
Repo
Framework