July 30, 2019

3407 words 16 mins read

Paper Group AWR 27

Demystifying Neural Style Transfer

Title Demystifying Neural Style Transfer
Authors Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
Title Demystifying Neural Style Transfer
Abstract Neural Style Transfer has recently demonstrated very exciting results that have caught wide attention in both academia and industry. Despite the amazing results, the principle of neural style transfer, especially why the Gram matrices could represent style, remains unclear. In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we theoretically show that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second-order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style images and the generated images. To further support our standpoint, we experiment with several other distribution alignment methods and achieve appealing results. We believe this novel interpretation connects these two important research fields and could inspire future research.
Tasks Domain Adaptation, Style Transfer
Published 2017-01-04
URL http://arxiv.org/abs/1701.01036v2
PDF http://arxiv.org/pdf/1701.01036v2.pdf
PWC https://paperswithcode.com/paper/demystifying-neural-style-transfer
Repo https://github.com/aryan-mann/style-transfer
Framework none
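
The paper’s central identity is easy to verify numerically. The sketch below (illustrative variable names and shapes, not the authors’ code) checks that the Gram-matrix style loss equals the squared MMD with the second-order polynomial kernel $k(x, y) = (x^\top y)^2$, up to a $1/N^2$ factor:

```python
# Numerical check of the paper's key identity (illustrative shapes, not the
# authors' code): matching Gram matrices of feature maps is the same as
# minimizing MMD^2 with the kernel k(x, y) = (x . y)^2, up to a 1/N^2 factor.
import numpy as np

rng = np.random.default_rng(0)
C, N = 64, 256                       # channels, spatial positions
F = rng.standard_normal((C, N))      # features of the generated image
S = rng.standard_normal((C, N))      # features of the style image

gram_loss = np.linalg.norm(F @ F.T - S @ S.T, "fro") ** 2

k = lambda A, B: (A.T @ B) ** 2      # pairwise second-order polynomial kernel
mmd2 = (k(F, F).sum() + k(S, S).sum() - 2 * k(F, S).sum()) / N ** 2

assert np.isclose(gram_loss / N ** 2, mmd2)
```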

Deep Keyphrase Generation

Title Deep Keyphrase Generation
Authors Rui Meng, Sanqiang Zhao, Shuguang Han, Daqing He, Peter Brusilovsky, Yu Chi
Abstract Keyphrases provide highly summative information that can be effectively used for understanding, organizing and retrieving text content. Though previous studies have provided many workable solutions for automated keyphrase extraction, they commonly divided the to-be-summarized content into multiple text chunks, then ranked and selected the most meaningful ones. These approaches could neither identify keyphrases that do not appear in the text, nor capture the real semantic meaning behind the text. We propose a generative model for keyphrase prediction with an encoder-decoder framework, which can effectively overcome the above drawbacks. We name it deep keyphrase generation since it attempts to capture the deep semantic meaning of the content with a deep learning method. Empirical analysis on six datasets demonstrates that our proposed model not only achieves a significant performance boost on extracting keyphrases that appear in the source text, but can also generate absent keyphrases based on the semantic meaning of the text. Code and dataset are available at https://github.com/memray/seq2seq-keyphrase.
Tasks
Published 2017-04-23
URL http://arxiv.org/abs/1704.06879v2
PDF http://arxiv.org/pdf/1704.06879v2.pdf
PWC https://paperswithcode.com/paper/deep-keyphrase-generation
Repo https://github.com/supercoderhawk/deep-keyphrase
Framework pytorch
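
As a rough illustration of the paper’s framing, here is a minimal GRU-based encoder-decoder in PyTorch. The authors’ CopyRNN model adds attention and a copy mechanism on top of this skeleton; all sizes and names below are illustrative only:

```python
# A stripped-down GRU encoder-decoder illustrating the generative framing.
# The paper's CopyRNN adds attention and a copy mechanism; all sizes here
# are illustrative.
import torch
import torch.nn as nn

class Seq2SeqKeyphrase(nn.Module):
    def __init__(self, vocab_size, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        _, h = self.encoder(self.embed(src))           # encode the document
        dec, _ = self.decoder(self.embed(tgt), h)      # teacher-forced decode
        return self.out(dec)                           # per-step vocab logits

model = Seq2SeqKeyphrase(vocab_size=50_000)
src = torch.randint(0, 50_000, (2, 120))   # a batch of two documents
tgt = torch.randint(0, 50_000, (2, 5))     # gold keyphrase tokens
logits = model(src, tgt)                   # shape (2, 5, 50000)
```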

Learning a CNN-based End-to-End Controller for a Formula SAE Racecar

Title Learning a CNN-based End-to-End Controller for a Formula SAE Racecar
Authors Skanda Koppula
Abstract We present a set of CNN-based end-to-end models for controls of a Formula SAE racecar, along with various benchmarking and visualization tools to understand model performance. We tackled three main problems in the context of cone-delineated racetrack driving: (1) discretized steering, which maps a first-person frame along the track to a predicted steering direction; (2) real-value steering, which maps a frame view to a real-valued steering angle; and (3) a network design for predicting brake and throttle. We demonstrate high accuracy on our discretization task, low theoretical testing errors with our model for real-value steering, and a starting point for future work regarding a controller for our vehicle’s brake and throttle. Timing benchmarks suggest that the networks we propose have the latency and throughput required for real-time controllers, when run on GPU-enabled hardware.
Tasks
Published 2017-07-12
URL http://arxiv.org/abs/1708.02215v1
PDF http://arxiv.org/pdf/1708.02215v1.pdf
PWC https://paperswithcode.com/paper/learning-a-cnn-based-end-to-end-controller
Repo https://github.com/vasubansal1033/FS-Electric
Framework none
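
A hedged sketch of the discretized-steering task: a small CNN classifier that maps one camera frame to a steering bin. The architecture and bin count below are illustrative assumptions, not the paper’s exact network:

```python
# Illustrative CNN classifier for the discretized-steering task; the
# architecture and the three steering bins are assumptions, not the
# paper's exact network.
import torch
import torch.nn as nn

steering_net = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 3),                      # bins: left / straight / right
)

frame = torch.randn(1, 3, 120, 160)        # one first-person camera frame
direction = steering_net(frame).argmax(1)  # predicted steering bin
```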

Drug-Drug Interaction Extraction from Biomedical Text Using Long Short Term Memory Network

Title Drug-Drug Interaction Extraction from Biomedical Text Using Long Short Term Memory Network
Authors Sunil Kumar Sahu, Ashish Anand
Abstract Simultaneous administration of multiple drugs can have synergistic or antagonistic effects, as one drug can affect the activities of other drugs. Synergistic effects lead to improved therapeutic outcomes, whereas antagonistic effects can be life-threatening, may lead to increased healthcare cost, or may even cause death. Thus, identification of unknown drug-drug interactions (DDIs) is an important concern for efficient and effective healthcare. Although multiple resources for DDI exist, they are often unable to keep pace with the wealth of information available in fast-growing biomedical texts. Most existing methods model DDI extraction from text as a classification problem and mainly rely on handcrafted features. Some of these features further depend on domain-specific tools. Recently, neural network models using latent features have been shown to give similar or better performance than other existing models dependent on handcrafted features. In this paper, we present three models, namely {\it B-LSTM}, {\it AB-LSTM} and {\it Joint AB-LSTM}, based on the long short-term memory (LSTM) network. All three models utilize word and position embeddings as latent features and thus do not rely on explicit feature engineering. The further use of bidirectional long short-term memory (Bi-LSTM) networks allows implicit feature extraction from the whole sentence. Two of the models, {\it AB-LSTM} and {\it Joint AB-LSTM}, also use attentive pooling on the output of the Bi-LSTM layer to assign weights to features. Our experimental results on the SemEval-2013 DDI extraction dataset show that the {\it Joint AB-LSTM} model outperforms all existing methods, including those relying on handcrafted features. The other two proposed LSTM models also perform competitively with state-of-the-art methods.
Tasks Feature Engineering, Medical Relation Extraction
Published 2017-01-28
URL http://arxiv.org/abs/1701.08303v2
PDF http://arxiv.org/pdf/1701.08303v2.pdf
PWC https://paperswithcode.com/paper/drug-drug-interaction-extraction-from
Repo https://github.com/sunilitggu/DDI-extraction-through-LSTM
Framework tf
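
To make the model structure concrete, here is a minimal PyTorch sketch of the attentive Bi-LSTM idea (the paper’s code is in TensorFlow; dimensions, names, and the single attention layer here are illustrative):

```python
# Minimal PyTorch sketch of the attentive Bi-LSTM: word + position
# embeddings -> Bi-LSTM -> attentive pooling -> class logits. Dimensions
# and the single attention layer are illustrative.
import torch
import torch.nn as nn

class ABLSTM(nn.Module):
    def __init__(self, vocab, n_pos, n_classes, emb=100, pos=10, hidden=128):
        super().__init__()
        self.word = nn.Embedding(vocab, emb)
        self.pos1 = nn.Embedding(n_pos, pos)    # distance to first drug
        self.pos2 = nn.Embedding(n_pos, pos)    # distance to second drug
        self.lstm = nn.LSTM(emb + 2 * pos, hidden,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.clf = nn.Linear(2 * hidden, n_classes)

    def forward(self, words, d1, d2):
        x = torch.cat([self.word(words), self.pos1(d1), self.pos2(d2)], -1)
        h, _ = self.lstm(x)                      # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attentive pooling weights
        return self.clf((w * h).sum(1))          # weighted sum -> logits

model = ABLSTM(vocab=20_000, n_pos=200, n_classes=5)
words = torch.randint(0, 20_000, (2, 40))
d1, d2 = torch.randint(0, 200, (2, 40)), torch.randint(0, 200, (2, 40))
logits = model(words, d1, d2)                    # shape (2, 5)
```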

Question Answering through Transfer Learning from Large Fine-grained Supervision Data

Title Question Answering through Transfer Learning from Large Fine-grained Supervision Data
Authors Sewon Min, Minjoon Seo, Hannaneh Hajishirzi
Abstract We show that the task of question answering (QA) can significantly benefit from the transfer learning of models trained on a different large, fine-grained QA dataset. We achieve the state of the art on two well-studied QA datasets, WikiQA and SemEval-2016 (Task 3A), through a basic transfer learning technique from SQuAD. For WikiQA, our model outperforms the previous best model by more than 8%. We demonstrate that finer supervision provides better guidance for learning lexical and syntactic information than coarser supervision, through quantitative results and visual analysis. We also show that a similar transfer learning procedure achieves the state of the art on an entailment task.
Tasks Question Answering, Transfer Learning
Published 2017-02-07
URL http://arxiv.org/abs/1702.02171v6
PDF http://arxiv.org/pdf/1702.02171v6.pdf
PWC https://paperswithcode.com/paper/question-answering-through-transfer-learning
Repo https://github.com/shmsw25/qa-transfer
Framework tf
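
The transfer recipe itself is simple: pretrain on the fine-grained dataset, then continue training the same weights on the coarse target task. Below is a runnable toy version, with a linear model and random tensors standing in for a real QA model and the SQuAD/WikiQA data:

```python
# Runnable toy of the recipe: train on a fine-grained objective first, then
# keep training the same parameters on the coarse target task. The linear
# model and random tensors stand in for a real QA model and the datasets.
import torch
import torch.nn as nn

model = nn.Linear(64, 2)                    # stand-in QA scorer
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(inputs, labels):
    for x, y in zip(inputs, labels):        # one pass over a toy dataset
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Phase 1: large fine-grained source (SQuAD-like, dense supervision).
train(torch.randn(100, 8, 64), torch.randint(0, 2, (100, 8)))
# Phase 2: fine-tune the same weights on the coarse target (WikiQA-like).
train(torch.randn(20, 8, 64), torch.randint(0, 2, (20, 8)))
```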

Model-Powered Conditional Independence Test

Title Model-Powered Conditional Independence Test
Authors Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros G. Dimakis, Sanjay Shakkottai
Abstract We consider the problem of non-parametric Conditional Independence testing (CI testing) for continuous random variables. Given i.i.d. samples from the joint distribution $f(x,y,z)$ of continuous random vectors $X$, $Y$ and $Z$, we determine whether $X \perp Y \mid Z$. We approach this by converting the conditional independence test into a classification problem. This allows us to harness very powerful classifiers like gradient-boosted trees and deep neural networks. These models can handle complex probability distributions and allow us to perform significantly better compared to the prior state of the art for high-dimensional CI testing. The main technical challenge in the classification problem is the need for samples from the conditional product distribution $f^{CI}(x,y,z) = f(x \mid z) f(y \mid z) f(z)$ (which equals the joint distribution if and only if $X \perp Y \mid Z$) when given access only to i.i.d. samples from the true joint distribution $f(x,y,z)$. To tackle this problem we propose a novel nearest neighbor bootstrap procedure and theoretically show that our generated samples are indeed close to $f^{CI}$ in terms of total variation distance. We then develop theoretical results regarding the generalization bounds for classification in our problem, which translate into error bounds for CI testing. We provide a novel analysis of Rademacher-type classification bounds in the presence of non-i.i.d., near-independent samples. We empirically validate the performance of our algorithm on simulated and real datasets and show performance gains over previous methods.
Tasks
Published 2017-09-18
URL http://arxiv.org/abs/1709.06138v1
PDF http://arxiv.org/pdf/1709.06138v1.pdf
PWC https://paperswithcode.com/paper/model-powered-conditional-independence-test
Repo https://github.com/rajatsen91/CCIT
Framework none
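
The nearest neighbor bootstrap is the heart of the method and fits in a few lines for scalar $Z$ (the paper handles higher-dimensional $Z$ and provides the accompanying theory). A hedged numpy sketch:

```python
# Hedged numpy sketch of the nearest neighbor bootstrap for scalar z: keep
# each (x_i, z_i) but replace y_i with the y of the sample whose z is
# nearest, which approximately samples from f(x|z) f(y|z) f(z).
import numpy as np

def nn_bootstrap(x, y, z):
    d = np.abs(z[:, None] - z[None, :])   # pairwise |z_i - z_j|
    np.fill_diagonal(d, np.inf)           # never pick the sample itself
    j = d.argmin(axis=1)                  # nearest neighbor in z
    return x, y[j], z                     # y swapped within z-neighborhoods

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(size=500)   # X depends on Z only
y = z + rng.normal(size=500)   # Y depends on Z only, so X and Y are CI given Z
xb, yb, zb = nn_bootstrap(x, y, z)   # approximately from f^CI
```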

Semantic Document Distance Measures and Unsupervised Document Revision Detection

Title Semantic Document Distance Measures and Unsupervised Document Revision Detection
Authors Xiaofeng Zhu, Diego Klabjan, Patrick Bless
Abstract In this paper, we model the document revision detection problem as a minimum cost branching problem that relies on computing document distances. Furthermore, we propose two new document distance measures, word vector-based Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED). Our revision detection system is designed for a large-scale corpus and implemented in Apache Spark. We demonstrate that our system can detect revisions more precisely than state-of-the-art methods by utilizing the Wikipedia revision dumps (https://snap.stanford.edu/data/wiki-meta.html) and simulated data sets.
Tasks
Published 2017-09-05
URL http://arxiv.org/abs/1709.01256v2
PDF http://arxiv.org/pdf/1709.01256v2.pdf
PWC https://paperswithcode.com/paper/semantic-document-distance-measures-and
Repo https://github.com/XiaofengZhu/wDTW-wTED
Framework none
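
As an illustration of wDTW, here is classic dynamic time warping with word-vector distances as the local cost. The random vectors stand in for the word2vec-based paragraph representations the paper actually uses:

```python
# Classic DTW with word-vector distances as the local cost, the core of
# wDTW. Random vectors stand in for the word2vec-based representations
# the paper computes.
import numpy as np

def wdtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # vector distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
doc1 = rng.normal(size=(12, 50))    # 12 segments, 50-dim embeddings
doc2 = rng.normal(size=(15, 50))
print(wdtw(doc1, doc2))             # smaller = more similar documents
```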

Feature-Fused SSD: Fast Detection for Small Objects

Title Feature-Fused SSD: Fast Detection for Small Objects
Authors Guimei Cao, Xuemei Xie, Wenzhe Yang, Quan Liao, Guangming Shi, Jinjian Wu
Abstract Small object detection is a challenging task in computer vision due to the limited resolution and information of small objects. To solve this problem, the majority of existing methods sacrifice speed for improvements in accuracy. In this paper, we aim to detect small objects at fast speed, using the Single Shot MultiBox Detector (SSD), the object detector with the best accuracy-vs-speed trade-off, as the base architecture. We propose a multi-level feature fusion method for introducing contextual information into SSD, in order to improve the accuracy for small objects. Concretely, we design two feature fusion modules, a concatenation module and an element-sum module, which differ in the way they add contextual information. Experimental results show that these two fusion modules obtain higher mAP on PASCAL VOC2007 than the baseline SSD by 1.6 and 1.7 points respectively, with 2-3 point improvements on some small-object categories in particular. They run at 43 and 40 FPS respectively, 29.4 and 26.4 FPS faster than the state-of-the-art Deconvolutional Single Shot Detector (DSSD). Code is available at https://github.com/wnzhyee/Feature-Fused-SSD. Keywords: small object detection, feature fusion, real-time, single shot multi-box detector
Tasks Object Detection, Small Object Detection
Published 2017-09-15
URL http://arxiv.org/abs/1709.05054v3
PDF http://arxiv.org/pdf/1709.05054v3.pdf
PWC https://paperswithcode.com/paper/feature-fused-ssd-fast-detection-for-small
Repo https://github.com/wnzhyee/Feature-Fused-SSD
Framework none
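
The two fusion modules are easy to picture in code. The sketch below shows the general pattern (channel counts, layer choices, and the upsampling mode are illustrative assumptions, not the paper’s exact configuration):

```python
# General pattern of the two fusion modules: fuse a shallow, high-resolution
# map with an upsampled deeper map, by channel concatenation or by
# element-wise sum. Channel counts and layers are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

shallow = torch.randn(1, 256, 38, 38)   # e.g. a conv4_3-like SSD map
deep = torch.randn(1, 512, 19, 19)      # deeper, context-rich map

up = F.interpolate(deep, size=shallow.shape[-2:],
                   mode="bilinear", align_corners=False)

# Concatenation module: stack channels, then fuse with a 1x1 conv.
concat_fused = nn.Conv2d(256 + 512, 256, 1)(torch.cat([shallow, up], 1))

# Element-sum module: project to matching channels, then add.
sum_fused = shallow + nn.Conv2d(512, 256, 1)(up)
```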

Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture

Title Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
Authors Katsunori Ohnishi, Shohei Yamamoto, Yoshitaka Ushiku, Tatsuya Harada
Abstract Learning to represent and generate videos from unlabeled data is a very challenging problem. To generate realistic videos, it is important not only to ensure that the appearance of each frame is real, but also to ensure the plausibility of a video’s motion and the consistency of its appearance over time. The process of video generation should be divided according to these intrinsic difficulties. In this study, we focus on motion and appearance information as two important orthogonal components of a video, and propose Flow-and-Texture Generative Adversarial Networks (FTGAN), consisting of FlowGAN and TextureGAN. To avoid a huge annotation cost, we have to explore a way to learn from unlabeled data; thus, we employ optical flow as motion information to generate videos. FlowGAN generates optical flow, which contains only the edges and motion of the videos to be generated. TextureGAN, on the other hand, specializes in giving a texture to the optical flow generated by FlowGAN. This hierarchical approach produces more realistic videos with plausible motion and appearance consistency. Our experiments show that our model generates more plausible motion videos and also achieves significantly improved performance on unsupervised action classification in comparison to previous GAN works. In addition, because our model generates videos from two independent sources of information, it can generate new combinations of motion and attributes not seen in the training data, such as a video in which a person is doing sit-ups on a baseball field.
Tasks Action Classification, Optical Flow Estimation, Video Generation
Published 2017-11-27
URL http://arxiv.org/abs/1711.09618v2
PDF http://arxiv.org/pdf/1711.09618v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-video-generation-from-orthogonal
Repo https://github.com/mil-tokyo/FTGAN
Framework none
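
A high-level sketch of the two-stage decomposition: one generator maps noise to optical flow, a second conditions on that flow to produce textured frames. The tiny MLPs below are placeholders for the paper’s convolutional architectures:

```python
# Two-stage decomposition in miniature: noise -> optical flow -> video.
# Tiny MLPs replace the paper's convolutional generators; T=16 frames of
# 8x8 "video" keep the example small.
import torch
import torch.nn as nn

T, H, W = 16, 8, 8
flow_gen = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                         nn.Linear(256, T * 2 * H * W))     # 2-channel flow
tex_gen = nn.Sequential(nn.Linear(T * 2 * H * W + 100, 256), nn.ReLU(),
                        nn.Linear(256, T * 3 * H * W))      # RGB frames

z_motion, z_texture = torch.randn(1, 100), torch.randn(1, 100)
flow = flow_gen(z_motion)                                   # stage 1: motion
video = tex_gen(torch.cat([flow, z_texture], 1))            # stage 2: texture
video = video.view(1, T, 3, H, W)
```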

Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser

Title Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser
Authors Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, Jun Zhu
Abstract Neural networks are vulnerable to adversarial examples, which poses a threat to their application in security-sensitive systems. We propose the high-level representation guided denoiser (HGD) as a defense for image classification. A standard denoiser suffers from the error amplification effect, in which small residual adversarial noise is progressively amplified and leads to wrong classifications. HGD overcomes this problem by using a loss function defined as the difference between the target model’s outputs activated by the clean image and by the denoised image. Compared with ensemble adversarial training, which is the state-of-the-art defense method on large images, HGD has three advantages. First, with HGD as a defense, the target model is more robust to both white-box and black-box adversarial attacks. Second, HGD can be trained on a small subset of the images and generalizes well to other images and unseen classes. Third, HGD can be transferred to defend models other than the one guiding it. In the NIPS competition on defense against adversarial attacks, our HGD solution won first place and outperformed other models by a large margin.
Tasks Adversarial Attack, Adversarial Defense, Image Classification
Published 2017-12-08
URL http://arxiv.org/abs/1712.02976v2
PDF http://arxiv.org/pdf/1712.02976v2.pdf
PWC https://paperswithcode.com/paper/defense-against-adversarial-attacks-using
Repo https://github.com/anishathalye/Guided-Denoise
Framework tf
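
The HGD training signal can be sketched in a few lines: the denoiser is optimized so that a frozen target network produces similar high-level activations for the denoised adversarial image and the clean image. The small networks below are stand-ins for the paper’s DUNET denoiser and ImageNet classifier:

```python
# The HGD objective in miniature: train the denoiser so a frozen guide
# network gives similar high-level activations for the denoised and clean
# images. Both small nets are stand-ins for the paper's DUNET and classifier.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))
guide = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.Flatten())
for p in guide.parameters():                 # the guiding model stays frozen
    p.requires_grad_(False)

clean = torch.randn(4, 3, 32, 32)
adv = clean + 0.03 * torch.randn_like(clean)   # stand-in adversarial noise

# Loss is the gap in high-level representations, not in raw pixels.
loss = (guide(denoiser(adv)) - guide(clean)).abs().mean()
loss.backward()                                # gradients flow to the denoiser
```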

Preserving Differential Privacy in Convolutional Deep Belief Networks

Title Preserving Differential Privacy in Convolutional Deep Belief Networks
Authors NhatHai Phan, Xintao Wu, Dejing Dou
Abstract The remarkable development of deep learning in the medicine and healthcare domain presents obvious privacy issues when deep neural networks are built on users’ personal and highly sensitive data, e.g., clinical records, user profiles, biomedical images, etc. However, only a few scientific studies on preserving privacy in deep learning have been conducted. In this paper, we focus on developing a private convolutional deep belief network (pCDBN), which is essentially a convolutional deep belief network (CDBN) under differential privacy. Our main idea for enforcing epsilon-differential privacy is to leverage the functional mechanism to perturb the energy-based objective functions of traditional CDBNs, rather than their results. One key contribution of this work is that we propose the use of Chebyshev expansion to derive the approximate polynomial representation of objective functions. Our theoretical analysis shows that we can further derive the sensitivity and error bounds of the approximate polynomial representation. As a result, preserving differential privacy in CDBNs is feasible. We applied our model to a health social network, i.e., YesiWell data, and to a handwritten digit dataset, i.e., MNIST data, for human behavior prediction, human behavior classification, and handwritten digit recognition tasks. Theoretical analysis and rigorous experimental evaluations show that the pCDBN is highly effective. It significantly outperforms existing solutions.
Tasks
Published 2017-06-25
URL http://arxiv.org/abs/1706.08839v2
PDF http://arxiv.org/pdf/1706.08839v2.pdf
PWC https://paperswithcode.com/paper/preserving-differential-privacy-in
Repo https://github.com/haiphanNJIT/PrivateDeepLearning
Framework tf
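
For intuition about the Chebyshev-expansion step, the snippet below fits a degree-3 Chebyshev approximation to a sigmoid and perturbs its coefficients with Laplace noise, the functional-mechanism pattern the paper builds on. The degree and noise scale are arbitrary illustrative choices:

```python
# The Chebyshev-expansion idea on a toy target: fit a low-degree polynomial
# surrogate, then perturb its coefficients with Laplace noise (the
# functional-mechanism pattern). Degree and noise scale are arbitrary.
import numpy as np
from numpy.polynomial import chebyshev as cheb

xs = np.linspace(-1, 1, 200)
target = 1 / (1 + np.exp(-xs))                 # a sigmoid-like objective term
coeffs = cheb.chebfit(xs, target, deg=3)       # Chebyshev approximation

rng = np.random.default_rng(0)
noisy = coeffs + rng.laplace(scale=0.01, size=coeffs.shape)

print(np.max(np.abs(cheb.chebval(xs, noisy) - target)))   # surrogate error
```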

StarSpace: Embed All The Things!

Title StarSpace: Embed All The Things!
Authors Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston
Abstract We present StarSpace, a general-purpose neural embedding model that can solve a wide variety of problems: labeling tasks such as text classification, ranking tasks such as information retrieval/web search, collaborative filtering-based or content-based recommendation, embedding of multi-relational graphs, and learning word, sentence or document level embeddings. In each case the model works by embedding those entities comprised of discrete features and comparing them against each other, learning similarities dependent on the task. Empirical results on a number of tasks show that StarSpace is highly competitive with existing methods, whilst also being generally applicable to new cases where those methods are not.
Tasks Text Classification, Word Embeddings
Published 2017-09-12
URL http://arxiv.org/abs/1709.03856v5
PDF http://arxiv.org/pdf/1709.03856v5.pdf
PWC https://paperswithcode.com/paper/starspace-embed-all-the-things
Repo https://github.com/facebookresearch/StarSpace
Framework none
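
A toy rendition of the StarSpace recipe (not the library’s C++ implementation): embed every entity as a sum of discrete-feature vectors, and train with a margin loss so related pairs score higher than random negatives. Vocabulary size, margin, and features are illustrative:

```python
# Toy StarSpace: entities are bags of discrete features, embedded by
# summing feature vectors; a margin loss pushes related pairs above random
# negatives. Vocabulary, margin, and features are illustrative.
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 50)               # one vector per discrete feature

def entity(features):                      # bag-of-features embedding
    return emb(torch.tensor(features)).sum(0)

doc = entity([3, 17, 42])                  # e.g. words of a document
label = entity([7])                        # e.g. its class label
negative = entity([900])                   # a randomly sampled label

loss = torch.relu(0.1 - doc @ label + doc @ negative)   # margin ranking loss
loss.backward()                            # pulls doc toward its true label
```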

Can you tell where in India I am from? Comparing humans and computers on fine-grained race face classification

Title Can you tell where in India I am from? Comparing humans and computers on fine-grained race face classification
Authors Harish Katti, S. P. Arun
Abstract Faces form the basis for a rich variety of judgments in humans, yet the underlying features remain poorly understood. Although fine-grained distinctions within a race might more strongly constrain the possible facial features used by humans than coarse categories such as race or gender, such fine-grained distinctions are relatively less studied. Fine-grained race classification is also interesting because even humans may not be perfectly accurate on these tasks. This allows us to compare errors made by humans and machines, in contrast to standard object detection tasks where human performance is nearly perfect. We have developed a novel face database of close to 1650 diverse Indian faces labeled for fine-grained race (South vs North India) as well as for age, weight, height and gender. We then asked close to 130 human subjects to categorize each face as belonging to a Northern or Southern state in India, and compared human performance on this task with that of computational models trained on the ground-truth labels. Our main results are as follows: (1) Humans are highly consistent (average accuracy: 63.6%), with some faces being consistently classified with > 90% accuracy and others consistently misclassified with < 30% accuracy; (2) Models trained on ground-truth labels showed slightly worse performance (average accuracy: 62%) but showed higher accuracy (72.2%) on faces classified with > 80% accuracy by humans. This was true both for models trained on simple spatial and intensity measurements extracted from faces and for deep neural networks trained on race or gender classification; (3) Using overcomplete banks of features derived from each face part, we found that mouth shape was the single largest contributor to fine-grained race classification, whereas distances between face parts were the strongest predictor of gender.
Tasks Object Detection
Published 2017-03-22
URL http://arxiv.org/abs/1703.07595v2
PDF http://arxiv.org/pdf/1703.07595v2.pdf
PWC https://paperswithcode.com/paper/can-you-tell-where-in-india-i-am-from
Repo https://github.com/harish2006/IISCIFD
Framework none

Deep Hashing Network for Unsupervised Domain Adaptation

Title Deep Hashing Network for Unsupervised Domain Adaptation
Authors Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, Sethuraman Panchanathan
Abstract In recent years, deep neural networks have emerged as a dominant machine learning tool for a wide variety of application domains. However, training a deep neural network requires a large amount of labeled data, which is an expensive process in terms of time, labor and human expertise. Domain adaptation or transfer learning algorithms address this challenge by leveraging labeled data in a different, but related source domain, to develop a model for the target domain. Further, the explosive growth of digital data has posed a fundamental challenge concerning its storage and retrieval. Due to its storage and retrieval efficiency, recent years have witnessed a wide application of hashing in a variety of computer vision applications. In this paper, we first introduce a new dataset, Office-Home, to evaluate domain adaptation algorithms. The dataset contains images of a variety of everyday objects from multiple domains. We then propose a novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes, to accurately classify unseen target data. To the best of our knowledge, this is the first research effort to exploit the feature learning capabilities of deep neural networks to learn representative hash codes to address the domain adaptation problem. Our extensive empirical studies on multiple transfer tasks corroborate the usefulness of the framework in learning efficient hash codes which outperform existing competitive baselines for unsupervised domain adaptation.
Tasks Domain Adaptation, Transfer Learning, Unsupervised Domain Adaptation
Published 2017-06-22
URL http://arxiv.org/abs/1706.07522v1
PDF http://arxiv.org/pdf/1706.07522v1.pdf
PWC https://paperswithcode.com/paper/deep-hashing-network-for-unsupervised-domain
Repo https://github.com/hemanthdv/da-hash
Framework none
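
A hedged sketch of the hashing head at the core of the approach: a tanh layer keeps codes continuous and trainable, and sign() binarizes them at retrieval time. The paper adds supervised similarity and domain-adaptation losses on top; sizes below are illustrative:

```python
# Illustrative deep-hashing head: tanh keeps codes continuous during
# training; sign() binarizes them for retrieval. The paper adds supervised
# similarity and domain-adaptation losses on top; sizes are illustrative.
import torch
import torch.nn as nn

hash_head = nn.Sequential(nn.Linear(512, 64), nn.Tanh())  # 64-bit codes

feats = torch.randn(8, 512)            # backbone features for 8 images
relaxed = hash_head(feats)             # in (-1, 1), differentiable
codes = relaxed.sign()                 # discrete +/-1 codes for retrieval
hamming = (codes[0] != codes[1]).sum() # distance used at query time
```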

Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning

Title Adversarial-Playground: A Visualization Suite Showing How Adversarial Examples Fool Deep Learning
Authors Andrew P. Norton, Yanjun Qi
Abstract Recent studies have shown that attackers can force deep learning models to misclassify so-called “adversarial examples”: maliciously generated images formed by making imperceptible modifications to pixel values. With growing interest in deep learning for security applications, it is important for security experts and users of machine learning to recognize how learning systems may be attacked. Due to the complex nature of deep learning, it is challenging to understand how deep models can be fooled by adversarial examples. Thus, we present a web-based visualization tool, Adversarial-Playground, to demonstrate the efficacy of common adversarial methods against a convolutional neural network (CNN) system. Adversarial-Playground is educational, modular and interactive. (1) It enables non-experts to compare examples visually and to understand why an adversarial example can fool a CNN-based image classifier. (2) It helps security experts explore the vulnerabilities of deep learning as a software module. (3) Building an interactive visualization is challenging in this domain due to the large feature space of image classification (generating adversarial examples is slow in general, and visualizing images is costly). Through multiple novel design choices, our tool provides fast and accurate responses to user requests. Empirically, we find that our client-server division strategy reduced response time by an average of 1.5 seconds per sample. Our other innovation, a faster variant of the JSMA evasion algorithm, empirically ran twice as fast as JSMA while maintaining a comparable evasion rate. Project source code and data from our experiments are available at: https://github.com/QData/AdversarialDNN-Playground
Tasks Adversarial Attack, Adversarial Defense, Image Classification
Published 2017-08-01
URL http://arxiv.org/abs/1708.00807v1
PDF http://arxiv.org/pdf/1708.00807v1.pdf
PWC https://paperswithcode.com/paper/adversarial-playground-a-visualization-suite
Repo https://github.com/QData/AdversarialDNN-Playground
Framework tf
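
For context, here is a minimal adversarial perturbation in the FGSM style; note this is a simpler attack than the accelerated JSMA variant the tool implements. Model and data are toy stand-ins:

```python
# Minimal FGSM-style perturbation, for context only; the tool itself
# implements an accelerated JSMA variant. Model and data are toy stand-ins.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28, requires_grad=True)   # stand-in MNIST image
y = torch.tensor([3])                              # its true class

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
x_adv = (x + 0.1 * x.grad.sign()).clamp(0, 1)      # small per-pixel nudge
```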