Paper Group ANR 184
Differentiable Sparsification for Deep Neural Networks
Title | Differentiable Sparsification for Deep Neural Networks |
Authors | Yongjin Lee |
Abstract | A deep neural network has relieved the burden of feature engineering by human experts, but comparable efforts are instead required to determine an effective architecture. On the other hand, as the size of a network has over-grown, a lot of resources are also invested to reduce its size. These problems can be addressed by sparsification of an over-complete model, which removes redundant parameters or connections by pruning them away after training or encouraging them to become zero during training. In general, however, these approaches are not fully differentiable and interrupt an end-to-end training process with the stochastic gradient descent in that they require either a parameter selection or a soft-thresholding step. In this paper, we propose a fully differentiable sparsification method for deep neural networks, which allows parameters to be exactly zero during training, and thus can learn the sparsified structure and the weights of networks simultaneously using the stochastic gradient descent. We apply the proposed method to various popular models in order to show its effectiveness. |
Tasks | Feature Engineering |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03201v1 |
https://arxiv.org/pdf/1910.03201v1.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-sparsification-for-deep-neural |
Repo | |
Framework | |
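The abstract above contrasts the proposed method with approaches that need a soft-thresholding step. As a minimal sketch of what such a step usually looks like (this is the standard L1 proximal operator, not the paper's method; the function name, the sample weights, and the threshold `lam` are illustrative assumptions):

```python
def soft_threshold(w: float, lam: float) -> float:
    """Shrink w toward zero by lam; any w with |w| <= lam becomes exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


# Hypothetical weights and threshold, chosen only for illustration.
weights = [0.8, -0.05, 0.02, -1.3]
sparse = [soft_threshold(w, 0.1) for w in weights]
# The two small weights are zeroed; the large ones shrink by 0.1 in magnitude.
```

Because this operation is applied as a separate step after each gradient update, it sits outside the differentiable training graph, which is the interruption the abstract refers to.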
Gated neural networks for implied volatility surfaces
Title | Gated neural networks for implied volatility surfaces |
Authors | Yu Zheng, Yongxin Yang, Bowei Chen |
Abstract | This paper presents a framework of developing neural networks to predict implied volatility surfaces. It can incorporate the related properties from existing mathematical models and empirical findings, including no static arbitrage, limiting boundaries, asymptotic slope and volatility smile. These properties are also satisfied empirically in our experiments with the option data on the S&P 500 index over 20 years. The developed neural network model outperforms the widely used surface stochastic volatility inspired (SSVI) model and other benchmarked neural network models on the mean average percentage error in both in-sample and out-of-sample datasets. This study has two major contributions. First, it contributes to the recent use of machine learning in finance, and an accurate deep learning implied volatility surface prediction model is obtained. Second, it provides the methodological guidance on how to seamlessly combine data-driven models with domain knowledge in the development of machine learning applications. |
Tasks | |
Published | 2019-04-29 |
URL | https://arxiv.org/abs/1904.12834v4 |
https://arxiv.org/pdf/1904.12834v4.pdf | |
PWC | https://paperswithcode.com/paper/gated-deep-neural-networks-for-implied |
Repo | |
Framework | |
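The abstract above reports results on a "mean average percentage error". Assuming this refers to the usual mean absolute percentage error (MAPE), a minimal sketch of the metric follows; the function name and the sample volatilities are illustrative, not values from the paper:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical implied volatilities (actual vs. model prediction).
actual_vol = [0.20, 0.25]
predicted_vol = [0.22, 0.24]
error_pct = mape(actual_vol, predicted_vol)  # roughly 7 percent
```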
Effective Network Compression Using Simulation-Guided Iterative Pruning
Title | Effective Network Compression Using Simulation-Guided Iterative Pruning |
Authors | Dae-Woong Jeong, Jaehun Kim, Youngseok Kim, Tae-Ho Kim, Myungsu Chae |
Abstract | Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose the novel idea of the network compression as a method to solve this limitation. The principle of this idea is to make iterative pruning more effective and sophisticated by simulating the reduced network. A simple experiment was conducted to evaluate the method; the results showed that the proposed method achieved higher performance than existing methods at the same pruning level. |
Tasks | |
Published | 2019-02-12 |
URL | http://arxiv.org/abs/1902.04224v1 |
http://arxiv.org/pdf/1902.04224v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-network-compression-using |
Repo | |
Framework | |
AutoEncoders for Training Compact Deep Learning RF Classifiers for Wireless Protocols
Title | AutoEncoders for Training Compact Deep Learning RF Classifiers for Wireless Protocols |
Authors | Silvija Kokalj-Filipovic, Rob Miller, Joshua Morman |
Abstract | We show that compact fully connected (FC) deep learning networks trained to classify wireless protocols using a hierarchy of multiple denoising autoencoders (AEs) outperform reference FC networks trained in a typical way, i.e., with a stochastic gradient based optimization of a given FC architecture. Not only is the complexity of such an FC network, measured in number of trainable parameters and scalar multiplications, much lower than the reference FC and residual models, its accuracy also outperforms both models for nearly all tested SNR values (0 dB to 50 dB). Such AE-trained networks are suited for in-situ protocol inference performed by simple mobile devices based on noisy signal measurements. Training is based on the data transmitted by real devices, and collected in a controlled environment, and systematically augmented by a policy-based data synthesis process by adding to the signal any subset of impairments commonly seen in a wireless receiver. |
Tasks | Denoising |
Published | 2019-04-13 |
URL | http://arxiv.org/abs/1904.11874v1 |
http://arxiv.org/pdf/1904.11874v1.pdf | |
PWC | https://paperswithcode.com/paper/190411874 |
Repo | |
Framework | |
Transformer Based Reinforcement Learning For Games
Title | Transformer Based Reinforcement Learning For Games |
Authors | Uddeshya Upadhyay, Nikunj Shah, Sucheta Ravikanti, Mayanka Medhe |
Abstract | Recent times have witnessed sharp improvements in reinforcement learning tasks using deep reinforcement learning techniques like Deep Q Networks, Policy Gradients, and Actor Critic methods, which are based on deep learning models and back-propagation of gradients to train such models. An active area of research in reinforcement learning is about training agents to play complex video games, which so far has been something accomplished only by human intelligence. Some state of the art performances in video game playing using deep reinforcement learning are obtained by processing the sequence of frames from video games, passing them through a convolutional network to obtain features and then using recurrent neural networks to figure out the action leading to optimal rewards. The recurrent neural network will learn to extract the meaningful signal out of the sequence of such features. In this work, we propose a method utilizing transformer networks, which have recently replaced RNNs in Natural Language Processing (NLP), and perform experiments to compare with existing methods. |
Tasks | |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03918v1 |
https://arxiv.org/pdf/1912.03918v1.pdf | |
PWC | https://paperswithcode.com/paper/transformer-based-reinforcement-learning-for |
Repo | |
Framework | |
Turing-Completeness of Dynamics in Abstract Persuasion Argumentation
Title | Turing-Completeness of Dynamics in Abstract Persuasion Argumentation |
Authors | Ryuta Arisaka |
Abstract | Abstract Persuasion Argumentation (APA) is a dynamic argumentation formalism that extends Dung argumentation with persuasion relations. In this work, we show through two-counter Minsky machine encoding that APA dynamics is Turing-complete. |
Tasks | |
Published | 2019-03-19 |
URL | http://arxiv.org/abs/1903.07837v1 |
http://arxiv.org/pdf/1903.07837v1.pdf | |
PWC | https://paperswithcode.com/paper/turing-completeness-of-dynamics-in-abstract |
Repo | |
Framework | |
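The abstract above relies on an encoding of two-counter Minsky machines. For readers unfamiliar with that model, here is a minimal interpreter sketch; the tuple-based instruction encoding, the labels, and the sample program are assumptions made for illustration and are not taken from the paper:

```python
# Instruction forms (hypothetical encoding for this sketch):
#   ("inc", counter, next_label)
#   ("dec", counter, label_if_positive, label_if_zero)
#   ("halt",)

def run(program, a=0, b=0, max_steps=10_000):
    """Execute a two-counter program from label 0 and return the final (a, b)."""
    counters = {"a": a, "b": b}
    label = 0
    for _ in range(max_steps):
        instr = program[label]
        op = instr[0]
        if op == "halt":
            return counters["a"], counters["b"]
        if op == "inc":
            _, name, label = instr
            counters[name] += 1
        elif op == "dec":
            _, name, if_positive, if_zero = instr
            if counters[name] > 0:
                counters[name] -= 1
                label = if_positive
            else:
                # Counter is already zero: branch without decrementing.
                label = if_zero
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    raise RuntimeError("step budget exhausted (the machine may not halt)")


# Sample program: drain counter b into counter a, i.e. a := a + b, b := 0.
ADD = {
    0: ("dec", "b", 1, 2),
    1: ("inc", "a", 0),
    2: ("halt",),
}
result = run(ADD, a=3, b=4)
```

Increment plus decrement-or-branch-on-zero over two unbounded counters is already enough to simulate any Turing machine, which is why encoding such a machine is a common route to a Turing-completeness result. The `max_steps` guard is a practical addition here, since halting cannot be decided in general.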
Preventing Information Leakage with Neural Architecture Search
Title | Preventing Information Leakage with Neural Architecture Search |
Authors | Shuang Zhang, Liyao Xiang, Congcong Li, Yixuan Wang, Zeyu Liu, Quanshi Zhang, Bo Li |
Abstract | Powered by machine learning services in the cloud, numerous learning-driven mobile applications are gaining popularity in the market. As deep learning tasks are mostly computation-intensive, it has become a trend to process raw data on devices and send the neural network features to the cloud, whereas the part of the neural network residing in the cloud completes the task to return final results. However, there is always the potential for unexpected leakage with the release of features, with which an adversary could infer a significant amount of information about the original data. To address this problem, we propose a privacy-preserving deep learning framework on top of the mobile cloud infrastructure: the trained deep neural network is tailored to prevent information leakage through features while maintaining highly accurate results. In essence, we learn the strategy to prevent leakage by modifying the trained deep neural network against a generic opponent, who infers unintended information from released features and auxiliary data, while preserving the accuracy of the model as much as possible. |
Tasks | Neural Architecture Search, Privacy Preserving Deep Learning |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08421v1 |
https://arxiv.org/pdf/1912.08421v1.pdf | |
PWC | https://paperswithcode.com/paper/preventing-information-leakage-with-neural |
Repo | |
Framework | |
An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search
Title | An Empirical Investigation of Global and Local Normalization for Recurrent Neural Sequence Models Using a Continuous Relaxation to Beam Search |
Authors | Kartik Goyal, Chris Dyer, Taylor Berg-Kirkpatrick |
Abstract | Globally normalized neural sequence models are considered superior to their locally normalized equivalents because they may ameliorate the effects of label bias. However, when considering high-capacity neural parametrizations that condition on the whole input sequence, both model classes are theoretically equivalent in terms of the distributions they are capable of representing. Thus, the practical advantage of global normalization in the context of modern neural methods remains unclear. In this paper, we attempt to shed light on this problem through an empirical study. We extend an approach for search-aware training via a continuous relaxation of beam search (Goyal et al., 2017b) in order to enable training of globally normalized recurrent sequence models through simple backpropagation. We then use this technique to conduct an empirical study of the interaction between global normalization, high-capacity encoders, and search-aware optimization. We observe that in the context of inexact search, globally normalized neural models are still more effective than their locally normalized counterparts. Further, since our training approach is sensitive to warm-starting with pre-trained models, we also propose a novel initialization strategy based on self-normalization for pre-training globally normalized models. We perform analysis of our approach on two tasks: CCG supertagging and Machine Translation, and demonstrate the importance of global normalization under different conditions while using search-aware training. |
Tasks | CCG Supertagging, Machine Translation |
Published | 2019-04-15 |
URL | http://arxiv.org/abs/1904.06834v1 |
http://arxiv.org/pdf/1904.06834v1.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-investigation-of-global-and |
Repo | |
Framework | |
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
Title | A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects |
Authors | Aly Magassouba, Komei Sugiura, Hisashi Kawai |
Abstract | In this study, we focus on multimodal language understanding for fetching instructions in the domestic service robots context. This task consists of predicting a target object, as instructed by the user, given an image and an unstructured sentence, such as “Bring me the yellow box (from the wooden cabinet).” This is challenging because of the ambiguity of natural language, i.e., the relevant information may be missing or there might be several candidates. To solve such a task, we propose the multimodal target-source classifier model with attention branches (MTCM-AB), which is an extension of the MTCM. Our methodology uses the attention branch network (ABN) to develop a multimodal attention mechanism based on linguistic and visual inputs. Experimental validation using a standard dataset showed that the MTCM-AB outperformed both state-of-the-art methods and the MTCM. In particular the MTCM-AB accuracy on average was 90.1% while human performance was 90.3% on the PFN-PIC dataset. |
Tasks | |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10675v2 |
https://arxiv.org/pdf/1912.10675v2.pdf | |
PWC | https://paperswithcode.com/paper/a-multimodal-target-source-classifier-with |
Repo | |
Framework | |
Measuring the Completeness of Theories
Title | Measuring the Completeness of Theories |
Authors | Drew Fudenberg, Jon Kleinberg, Annie Liang, Sendhil Mullainathan |
Abstract | We use machine learning to provide a tractable measure of the amount of predictable variation in the data that a theory captures, which we call its “completeness.” We apply this measure to three problems: assigning certain equivalents to lotteries, initial play in games, and human generation of random sequences. We discover considerable variation in the completeness of existing models, which sheds light on whether to focus on developing better models with the same features or instead to look for new features that will improve predictions. We also illustrate how and why completeness varies with the experiments considered, which highlights the role played in choosing which experiments to run. |
Tasks | |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.07022v1 |
https://arxiv.org/pdf/1910.07022v1.pdf | |
PWC | https://paperswithcode.com/paper/measuring-the-completeness-of-theories |
Repo | |
Framework | |
Transfer Learning for Relation Extraction via Relation-Gated Adversarial Learning
Title | Transfer Learning for Relation Extraction via Relation-Gated Adversarial Learning |
Authors | Ningyu Zhang, Shumin Deng, Zhanlin Sun, Jiaoyan Chen, Wei Zhang, Huajun Chen |
Abstract | Relation extraction aims to extract relational facts from sentences. Previous models mainly rely on manually labeled datasets, seed instances or human-crafted patterns, and distant supervision. However, the human annotation is expensive, while human-crafted patterns suffer from semantic drift and distant supervision samples are usually noisy. Domain adaptation methods enable leveraging labeled data from a different but related domain. However, different domains usually have various textual relation descriptions and different label spaces (the source label space is usually a superset of the target label space). To solve these problems, we propose a novel model of relation-gated adversarial learning for relation extraction, which extends the adversarial based domain adaptation. Experimental results have shown that the proposed approach outperforms previous domain adaptation methods regarding partial domain adaptation and can improve the accuracy of distantly supervised relation extraction through fine-tuning. |
Tasks | Domain Adaptation, Partial Domain Adaptation, Relation Extraction, Transfer Learning |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08507v1 |
https://arxiv.org/pdf/1908.08507v1.pdf | |
PWC | https://paperswithcode.com/paper/transfer-learning-for-relation-extraction-via |
Repo | |
Framework | |
Data-Centric Mixed-Variable Bayesian Optimization For Materials Design
Title | Data-Centric Mixed-Variable Bayesian Optimization For Materials Design |
Authors | Akshay Iyer, Yichi Zhang, Aditya Prasad, Siyu Tao, Yixing Wang, Linda Schadler, L Catherine Brinson, Wei Chen |
Abstract | Materials design can be cast as an optimization problem with the goal of achieving desired properties, by varying material composition, microstructure morphology, and processing conditions. Existence of both qualitative and quantitative material design variables leads to disjointed regions in property space, making the search for optimal design challenging. Limited availability of experimental data and the high cost of simulations magnify the challenge. This situation calls for design methodologies that can extract useful information from existing data and guide the search for optimal designs efficiently. To this end, we present a data-centric, mixed-variable Bayesian Optimization framework that integrates data from literature, experiments, and simulations for knowledge discovery and computational materials design. Our framework pivots around the Latent Variable Gaussian Process (LVGP), a novel Gaussian Process technique which projects qualitative variables on a continuous latent space for covariance formulation, as the surrogate model to quantify “lack of data” uncertainty. Expected improvement, an acquisition criterion that balances exploration and exploitation, helps navigate a complex, nonlinear design space to locate the optimum design. The proposed framework is tested through a case study which seeks to concurrently identify the optimal composition and morphology for insulating polymer nanocomposites. We also present an extension of mixed-variable Bayesian Optimization for multiple objectives to identify the Pareto Frontier within tens of iterations. These findings project Bayesian Optimization as a powerful tool for design of engineered material systems. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02577v1 |
https://arxiv.org/pdf/1907.02577v1.pdf | |
PWC | https://paperswithcode.com/paper/data-centric-mixed-variable-bayesian |
Repo | |
Framework | |
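The abstract above names expected improvement as its acquisition criterion. A minimal sketch of the standard closed form under a Gaussian posterior follows, assuming a maximization problem; the function and parameter names are illustrative, and the surrogate that would supply `mu` and `sigma` (the LVGP in the paper) is not modeled here:

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for maximization, given posterior mean mu and std sigma.

    best is the highest objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a high predicted mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mu - best) * cdf + sigma * pdf


# Two hypothetical candidates with the same mean but different uncertainty.
ei_certain = expected_improvement(mu=1.0, sigma=0.1, best=1.0)
ei_uncertain = expected_improvement(mu=1.0, sigma=1.0, best=1.0)
```

With equal predicted means, the more uncertain candidate scores higher, which is how the criterion balances exploration against exploitation.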
Word Sense Disambiguation using Knowledge-based Word Similarity
Title | Word Sense Disambiguation using Knowledge-based Word Similarity |
Authors | Sunjae Kwon, Dongsuk Oh, Youngjoong Ko |
Abstract | In natural language processing, word-sense disambiguation (WSD) is an open problem concerned with identifying the correct sense of words in a particular context. To address this problem, we introduce a novel knowledge-based WSD system. We suggest the adoption of two methods in our system. First, we suggest a novel method to encode the word vector representation by considering the graphical semantic relationships from the lexical knowledge-base. Second, we propose a method for extracting the contextual words from the text for analyzing an ambiguous word based on the similarity of word vector representations. To validate the effectiveness of our WSD system, we conducted experiments on the five benchmark English WSD corpora (Senseval-02, Senseval-03, SemEval-07, SemEval-13, and SemEval-15). The obtained results demonstrated that the suggested methods significantly enhanced the WSD performance. Furthermore, our system outperformed the existing knowledge-based WSD systems and showed a performance comparable to that of the state-of-the-art supervised WSD systems. |
Tasks | Word Sense Disambiguation |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04015v1 |
https://arxiv.org/pdf/1911.04015v1.pdf | |
PWC | https://paperswithcode.com/paper/word-sense-disambiguation-using-knowledge |
Repo | |
Framework | |
Automated detection of oral pre-cancerous tongue lesions using deep learning for early diagnosis of oral cavity cancer
Title | Automated detection of oral pre-cancerous tongue lesions using deep learning for early diagnosis of oral cavity cancer |
Authors | Mohammed Zubair M. Shamim, Sadatullah Syed, Mohammad Shiblee, Mohammed Usman, Syed Ali |
Abstract | Discovering oral cavity cancer (OCC) at an early stage is an effective way to increase patient survival rate. However, the current initial screening process is done manually and is expensive for the average individual, especially in developing countries worldwide. This problem is further compounded due to the lack of specialists in such areas. Automating the initial screening process using artificial intelligence (AI) to detect pre-cancerous lesions can prove to be an effective and inexpensive technique that would allow patients to be triaged accordingly to receive appropriate clinical management. In this study, we have applied and evaluated the efficacy of six deep convolutional neural network (DCNN) models using transfer learning, for identifying pre-cancerous tongue lesions directly using a small data set of clinically annotated photographic images to diagnose early signs of OCC. A DCNN model based on the VGG19 architecture was able to differentiate between benign and pre-cancerous tongue lesions with a mean classification accuracy of 0.98, sensitivity 0.89 and specificity 0.97. Additionally, the ResNet50 DCNN model was able to distinguish between five types of tongue lesions, i.e., hairy tongue, fissured tongue, geographic tongue, strawberry tongue and oral hairy leukoplakia, with a mean classification accuracy of 0.97. Preliminary results using an (AI+Physician) ensemble model demonstrate that an automated initial screening process of tongue lesions using DCNNs can achieve near-human level classification performance for diagnosing early signs of OCC in patients. |
Tasks | Transfer Learning |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08987v1 |
https://arxiv.org/pdf/1909.08987v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-detection-of-oral-pre-cancerous |
Repo | |
Framework | |
MobileFAN: Transferring Deep Hidden Representation for Face Alignment
Title | MobileFAN: Transferring Deep Hidden Representation for Face Alignment |
Authors | Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong |
Abstract | Facial landmark detection is a crucial prerequisite for many face analysis applications. Deep learning-based methods currently dominate facial landmark detection. However, such works generally introduce a large number of parameters, resulting in high memory cost. In this paper, we aim for a lightweight as well as effective solution to facial landmark detection. To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder. The proposed MobileFAN, with only 8% of the model size and lower computational cost, achieves superior or equivalent performance compared with state-of-the-art models. Moreover, by transferring the geometric structural information of a face graph from a large complex model to our proposed MobileFAN through feature-aligned distillation and feature-similarity distillation, the performance of MobileFAN is further improved in effectiveness and efficiency for face alignment. Extensive experiment results on three challenging facial landmark estimation benchmarks including COFW, 300W and WFLW show the superiority of our proposed MobileFAN against state-of-the-art methods. |
Tasks | Face Alignment, Facial Landmark Detection |
Published | 2019-08-11 |
URL | https://arxiv.org/abs/1908.03839v3 |
https://arxiv.org/pdf/1908.03839v3.pdf | |
PWC | https://paperswithcode.com/paper/mobilefan-transferring-deep-hidden |
Repo | |
Framework | |