October 20, 2019

Paper Group AWR 320

Two Local Models for Neural Constituent Parsing. Detecting Malicious PowerShell Commands using Deep Neural Networks. Data Augmentation using Random Image Cropping and Patching for Deep CNNs. Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion. Shift-Net: Image Inpainting via Deep Feature Rearrangement. DeepFlux for Skele …

Two Local Models for Neural Constituent Parsing


Title	Two Local Models for Neural Constituent Parsing
Authors	Zhiyang Teng, Yue Zhang
Abstract	Non-local features have been exploited by syntactic parsers for capturing dependencies between sub output structures. Such features have been a key to the success of state-of-the-art statistical parsers. With the rise of deep learning, however, it has been shown that local output decisions can give highly competitive accuracies, thanks to the power of dense neural input representations that embody global syntactic information. We investigate two conceptually simple local neural models for constituent parsing, which make local decisions to constituent spans and CFG rules, respectively. Consistent with previous findings along the line, our best model gives highly competitive results, achieving the labeled bracketing F1 scores of 92.4% on PTB and 87.3% on CTB 5.1.
Tasks
Published	2018-08-14
URL	http://arxiv.org/abs/1808.04850v2
PDF	http://arxiv.org/pdf/1808.04850v2.pdf
PWC	https://paperswithcode.com/paper/two-local-models-for-neural-constituent
Repo	https://github.com/zeeeyang/two-local-neural-conparsers
Framework	none

Detecting Malicious PowerShell Commands using Deep Neural Networks


Title	Detecting Malicious PowerShell Commands using Deep Neural Networks
Authors	Danny Hendler, Shay Kels, Amir Rubin
Abstract	Microsoft’s PowerShell is a command-line shell and scripting language that is installed by default on Windows machines. While PowerShell can be configured by administrators for restricting access and reducing vulnerabilities, these restrictions can be bypassed. Moreover, PowerShell commands can be easily generated dynamically, executed from memory, encoded and obfuscated, thus making the logging and forensic analysis of code executed by PowerShell challenging. For all these reasons, PowerShell is increasingly used by cybercriminals as part of their attacks’ tool chain, mainly for downloading malicious contents and for lateral movement. Indeed, a recent comprehensive technical report by Symantec dedicated to PowerShell’s abuse by cybercriminals reported on a sharp increase in the number of malicious PowerShell samples they received and in the number of penetration tools and frameworks that use PowerShell. This highlights the urgent need to develop effective methods for detecting malicious PowerShell commands. In this work, we address this challenge by implementing several novel detectors of malicious PowerShell commands and evaluating their performance. We implemented both “traditional” natural language processing (NLP) based detectors and detectors based on character-level convolutional neural networks (CNNs). Detectors’ performance was evaluated using a large real-world dataset. Our evaluation results show that, although our detectors individually yield high performance, an ensemble detector that combines an NLP-based classifier with a CNN-based classifier provides the best performance, since the latter classifier is able to detect malicious commands that succeed in evading the former. Our analysis of these evasive commands reveals that some obfuscation patterns automatically detected by the CNN classifier are intrinsically difficult to detect using the NLP techniques we applied.
Tasks
Published	2018-04-11
URL	http://arxiv.org/abs/1804.04177v2
PDF	http://arxiv.org/pdf/1804.04177v2.pdf
PWC	https://paperswithcode.com/paper/detecting-malicious-powershell-commands-using
Repo	https://github.com/ruchikagargdiwakar/ml_cyber_security_usecases
Framework	none

Data Augmentation using Random Image Cropping and Patching for Deep CNNs


Title	Data Augmentation using Random Image Cropping and Patching for Deep CNNs
Authors	Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara
Abstract	Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP) which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, resulting in an advantage similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) by comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of 2.19% on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft COCO.
Tasks	Data Augmentation, Image Augmentation, Image Cropping
Published	2018-11-22
URL	https://arxiv.org/abs/1811.09030v2
PDF	https://arxiv.org/pdf/1811.09030v2.pdf
PWC	https://paperswithcode.com/paper/data-augmentation-using-random-image-cropping
Repo	https://github.com/jackryo/ricap
Framework	pytorch
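
The cropping-and-patching step described in the abstract can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the function name `ricap`, the Beta-distributed boundary position, the `beta=0.3` default, and mixing labels in proportion to patch area are all assumptions made for the sketch.

```python
import numpy as np

def ricap(images, labels, num_classes, beta=0.3, rng=None):
    """Patch crops from four randomly chosen images into one training image.

    images: array of shape (N, H, W, C); labels: int array of shape (N,).
    Returns one patched image (H, W, C) and a soft label (num_classes,).
    All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w, _ = images.shape
    # A single boundary point splits the canvas into four rectangles.
    bw = int(np.round(w * rng.beta(beta, beta)))
    bh = int(np.round(h * rng.beta(beta, beta)))
    widths = [bw, w - bw, bw, w - bw]
    heights = [bh, bh, h - bh, h - bh]
    offsets = [(0, 0), (0, bw), (bh, 0), (bh, bw)]  # (row, col) of each patch

    patched = np.zeros(images.shape[1:], dtype=images.dtype)
    soft_label = np.zeros(num_classes, dtype=np.float64)
    for (top, left), ph, pw in zip(offsets, heights, widths):
        if ph == 0 or pw == 0:
            continue  # degenerate patch contributes no pixels and no label mass
        idx = rng.integers(n)                # source image for this patch
        y = rng.integers(0, h - ph + 1)      # random crop position in the source
        x = rng.integers(0, w - pw + 1)
        patched[top:top + ph, left:left + pw] = images[idx, y:y + ph, x:x + pw]
        # Assumed mixing rule: label weight equals the patch's share of the area.
        soft_label[labels[idx]] += (ph * pw) / (h * w)
    return patched, soft_label
```

Because the four rectangles tile the canvas, the soft label always sums to one, which is what gives the label-smoothing-like effect mentioned above.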

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion


Title	Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
Authors	Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave
Abstract	Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a least-square regression problem to learn a rotation aligning a small bilingual lexicon, and use a retrieval criterion for inference. In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion. Our experiments on standard benchmarks show that our approach outperforms the state of the art on word translation, with the biggest improvements observed for distant language pairs such as English-Chinese.
Tasks
Published	2018-04-20
URL	http://arxiv.org/abs/1804.07745v3
PDF	http://arxiv.org/pdf/1804.07745v3.pdf
PWC	https://paperswithcode.com/paper/loss-in-translation-learning-bilingual-word
Repo	https://github.com/facebookresearch/MUSE
Framework	pytorch
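
For context, the baseline pipeline the abstract contrasts against (least-squares fit of a rotation, then nearest-neighbour retrieval) can be sketched as follows. This illustrates that baseline only, not the retrieval-criterion objective the paper proposes; the function names and the use of plain cosine similarity for retrieval are assumptions.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal W minimizing ||X @ W.T - Y||_F for paired rows of X and Y."""
    # Closed-form orthogonal Procrustes solution via SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def retrieve(W, src_vecs, tgt_vecs):
    """Index of the nearest target vector (cosine similarity) for each mapped source."""
    mapped = src_vecs @ W.T
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt.T, axis=1)
```

Here the rows of `X` and `Y` are the source and target embeddings of the lexicon pairs. The mismatch the abstract points to is visible in the code: `procrustes_rotation` optimizes a squared distance, while `retrieve` ranks by similarity.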

Shift-Net: Image Inpainting via Deep Feature Rearrangement


Title	Shift-Net: Image Inpainting via Deep Feature Rearrangement
Authors	Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, Shiguang Shan
Abstract	Deep convolutional networks (CNNs) have exhibited their potential in image inpainting for producing plausible results. However, in most existing methods, e.g., context encoder, the missing parts are predicted by propagating the surrounding convolutional features through a fully connected layer, which tends to produce semantically plausible but blurry results. In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts. A guidance loss is introduced on the decoder feature to minimize the distance between the decoder feature after the fully connected layer and the ground-truth encoder feature of the missing parts. With such a constraint, the decoder feature in the missing region can be used to guide the shift of the encoder feature in the known region. An end-to-end learning algorithm is further developed to train the Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate the efficiency and effectiveness of our Shift-Net in producing sharper, fine-detailed, and visually plausible results. The codes and pre-trained models are available at https://github.com/Zhaoyi-Yan/Shift-Net.
Tasks	Image Inpainting
Published	2018-01-29
URL	http://arxiv.org/abs/1801.09392v2
PDF	http://arxiv.org/pdf/1801.09392v2.pdf
PWC	https://paperswithcode.com/paper/shift-net-image-inpainting-via-deep-feature
Repo	https://github.com/Zhaoyi-Yan/Shift-Net
Framework	pytorch

DeepFlux for Skeletons in the Wild


Title	DeepFlux for Skeletons in the Wild
Authors	Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi
Abstract	Computing object skeletons in natural images is challenging, owing to large variations in object appearance and scale, and the complexity of handling background clutter. Many recent methods frame object skeleton detection as a binary pixel classification problem, which is similar in spirit to learning-based edge detection, as well as to semantic segmentation methods. In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms. This “image context flux” representation has two major advantages over previous approaches. First, it explicitly encodes the relative position of skeletal pixels to semantically meaningful entities, such as the image points in their spatial context, and hence also the implied object boundaries. Second, since the skeleton detection context is a region-based vector field, it is better able to cope with object parts of large width. We evaluate the proposed method on three benchmark datasets for skeleton detection and two for symmetry detection, achieving consistently superior performance over state-of-the-art methods.
Tasks	Edge Detection, Object Skeleton Detection, Semantic Segmentation
Published	2018-11-30
URL	http://arxiv.org/abs/1811.12608v1
PDF	http://arxiv.org/pdf/1811.12608v1.pdf
PWC	https://paperswithcode.com/paper/deepflux-for-skeletons-in-the-wild
Repo	https://github.com/YukangWang/DeepFlux
Framework	none

Neural Abstract Style Transfer for Chinese Traditional Painting


Title	Neural Abstract Style Transfer for Chinese Traditional Painting
Authors	Bo Li, Caiming Xiong, Tianfu Wu, Yu Zhou, Lun Zhang, Rufeng Chu
Abstract	Chinese traditional painting is one of the most historical artworks in the world. It is very popular in Eastern and Southeast Asia due to being aesthetically appealing. Compared with western artistic painting, it is usually more visually abstract and textureless. Recently, neural network based style transfer methods have shown promising and appealing results which are mainly focused on western painting. It remains a challenging problem to preserve abstraction in neural style transfer. In this paper, we present a Neural Abstract Style Transfer method for Chinese traditional painting. It learns to preserve abstraction and other style jointly end-to-end via a novel MXDoG-guided filter (Modified version of the eXtended Difference-of-Gaussians) and three fully differentiable loss terms. To the best of our knowledge, there is little work on neural style transfer of Chinese traditional painting. To promote research in this direction, we collect a new dataset with diverse photo-realistic images and Chinese traditional paintings. In experiments, the proposed method shows more appealing stylized results in transferring the style of Chinese traditional painting than state-of-the-art neural style transfer methods.
Tasks	Style Transfer
Published	2018-12-08
URL	http://arxiv.org/abs/1812.03264v2
PDF	http://arxiv.org/pdf/1812.03264v2.pdf
PWC	https://paperswithcode.com/paper/neural-abstract-style-transfer-for-chinese
Repo	https://github.com/lbsswu/Chinese_style_transfer
Framework	none

Morphological and Language-Agnostic Word Segmentation for NMT


Title	Morphological and Language-Agnostic Word Segmentation for NMT
Authors	Dominik Macháček, Jonáš Vidra, Ondřej Bojar
Abstract	The state of the art of handling rich morphology in neural machine translation (NMT) is to break word forms into subword units, so that the overall vocabulary size of these units fits the practical limits given by the NMT model and GPU memory capacity. In this paper, we compare two common but linguistically uninformed methods of subword construction (BPE and STE, the method implemented in Tensor2Tensor toolkit) and two linguistically-motivated methods: Morfessor and one novel method, based on a derivational dictionary. Our experiments with German-to-Czech translation, both morphologically rich, document that so far, the non-motivated methods perform better. Furthermore, we identify a critical difference between BPE and STE and show a simple preprocessing step for BPE that considerably increases translation quality as evaluated by automatic measures.
Tasks	Machine Translation
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05482v1
PDF	http://arxiv.org/pdf/1806.05482v1.pdf
PWC	https://paperswithcode.com/paper/morphological-and-language-agnostic-word
Repo	https://github.com/Gldkslfmsd/t2t_second
Framework	tf

On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters


Title	On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters
Authors	Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor
Abstract	The Kalman filter is a key tool for time-series forecasting and analysis. We show that the dependence of a prediction of the Kalman filter on the past decays exponentially whenever the process noise is non-degenerate. Therefore, the Kalman filter may be approximated by regression on a few recent observations. Surprisingly, we also show that having some process noise is essential for the exponential decay. With no process noise, it may happen that the forecast depends on all of the past uniformly, which makes forecasting more difficult. Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations. We use our decay results to provide the first regret bounds w.r.t. Kalman filters within learning an LDS. That is, we compare the results of our algorithm to the best, in hindsight, Kalman filter for a given signal. Also, the algorithm is practical: its per-update run-time is linear in the regression depth.
Tasks	Time Series, Time Series Forecasting
Published	2018-09-16
URL	http://arxiv.org/abs/1809.05870v1
PDF	http://arxiv.org/pdf/1809.05870v1.pdf
PWC	https://paperswithcode.com/paper/on-line-learning-of-linear-dynamical-systems
Repo	https://github.com/jmarecek/OnlineLDS
Framework	none
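
The idea of forecasting by regression on a few recent observations can be sketched with plain online gradient descent on an autoregressive predictor. This is an illustrative stand-in, not the paper's algorithm; the function name and the `depth` and `lr` defaults are assumptions.

```python
import numpy as np

def online_ar_forecast(signal, depth=3, lr=0.05):
    """One-step-ahead forecasts from online regression on the last `depth` values."""
    signal = np.asarray(signal, dtype=np.float64)
    theta = np.zeros(depth)            # regression weights, learned on the fly
    preds = np.zeros(len(signal))      # the first `depth` entries stay at zero
    for t in range(depth, len(signal)):
        window = signal[t - depth:t][::-1]   # most recent observation first
        preds[t] = theta @ window            # forecast before seeing signal[t]
        err = preds[t] - signal[t]
        theta -= lr * err * window           # gradient step on 0.5 * err**2
    return preds
```

Each update touches only `depth` weights, so the per-step cost is linear in the regression depth, matching the run-time property stated in the abstract.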

SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging


Title	SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging
Authors	Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y. Chén, Maarten De Vos
Abstract	Automatic sleep staging has been often treated as a simple classification problem that aims at determining the label of individual target polysomnography (PSG) epochs one at a time. In this work, we tackle the task as a sequence-to-sequence classification problem that receives a sequence of multiple epochs as input and classifies all of their labels at once. For this purpose, we propose a hierarchical recurrent neural network named SeqSleepNet. At the epoch processing level, the network consists of a filterbank layer tailored to learn frequency-domain filters for preprocessing and an attention-based recurrent layer designed for short-term sequential modelling. At the sequence processing level, a recurrent layer is placed on top of the learned epoch-wise features for long-term modelling of sequential epochs. The classification is then carried out on the output vectors at every time step of the top recurrent layer to produce the sequence of output labels. Despite being hierarchical, we present a strategy to train the network in an end-to-end fashion. We show that the proposed network outperforms state-of-the-art approaches, achieving an overall accuracy, macro F1-score, and Cohen’s kappa of 87.1%, 83.3%, and 0.815 on a publicly available dataset with 200 subjects.
Tasks	Sleep Stage Detection
Published	2018-09-28
URL	http://arxiv.org/abs/1809.10932v3
PDF	http://arxiv.org/pdf/1809.10932v3.pdf
PWC	https://paperswithcode.com/paper/seqsleepnet-end-to-end-hierarchical-recurrent
Repo	https://github.com/pquochuy/SeqSleepNet
Framework	tf

A Dataset for Document Grounded Conversations


Title	A Dataset for Document Grounded Conversations
Authors	Kangyan Zhou, Shrimai Prabhumoye, Alan W Black
Abstract	This paper introduces a document grounded dataset for text conversations. We define “Document Grounded Conversations” as conversations that are about the contents of a specified document. In this dataset the specified documents were Wikipedia articles about popular movies. The dataset contains 4112 conversations with an average of 21.43 turns per conversation. This positions this dataset to not only provide a relevant chat history while generating responses but also provide a source of information that the models could use. We describe two neural architectures that provide benchmark performance on the task of generating the next response. We also evaluate our models for engagement and fluency, and find that the information from the document helps in generating more engaging and fluent responses.
Tasks
Published	2018-09-19
URL	http://arxiv.org/abs/1809.07358v1
PDF	http://arxiv.org/pdf/1809.07358v1.pdf
PWC	https://paperswithcode.com/paper/a-dataset-for-document-grounded-conversations
Repo	https://github.com/lizekang/ITDD
Framework	pytorch

Training and Inference with Integers in Deep Neural Networks


Title	Training and Inference with Integers in Deep Neural Networks
Authors	Shuang Wu, Guoqi Li, Feng Chen, Luping Shi
Abstract	Research on deep neural networks with discrete parameters and their deployment in embedded systems has been an active and promising topic. Although previous works have successfully reduced precision in inference, transferring both training and inference processes to low-bitwidth integers has not been demonstrated simultaneously. In this work, we develop a new method termed “WAGE” to discretize both training and inference, where weights (W), activations (A), gradients (G) and errors (E) among layers are shifted and linearly constrained to low-bitwidth integers. To perform pure discrete dataflow for fixed-point devices, we further replace batch normalization by a constant scaling layer and simplify other components that are arduous for integer implementation. Improved accuracies can be obtained on multiple datasets, which indicates that WAGE somehow acts as a type of regularization. Empirically, we demonstrate the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands.
Tasks	Continual Learning
Published	2018-02-13
URL	http://arxiv.org/abs/1802.04680v1
PDF	http://arxiv.org/pdf/1802.04680v1.pdf
PWC	https://paperswithcode.com/paper/training-and-inference-with-integers-in-deep
Repo	https://github.com/Tiiiger/QPyTorch
Framework	pytorch
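
Constraining values linearly to a low-bitwidth grid can be illustrated with a uniform quantizer. This is a generic sketch under stated assumptions (a step size of 2^(1-bits) and clipping to the open interval (-1, 1)); it is not claimed to match the exact operators used for each of W, A, G and E in the paper, and the function name is hypothetical.

```python
import numpy as np

def quantize(x, bits):
    """Round x onto a uniform grid with 2**(1 - bits) spacing, clipped inside (-1, 1)."""
    step = 2.0 ** (1 - bits)                        # assumed grid resolution
    q = step * np.round(np.asarray(x, dtype=np.float64) / step)
    return np.clip(q, -1 + step, 1 - step)          # keep values on representable levels
```

With `bits=2` the only representable levels are -0.5, 0 and 0.5, so every input is mapped to one of those three values. Dividing the result by `step` yields the integer codes a fixed-point device would store.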

High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach


Title	High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach
Authors	Tim Pearce, Mohamed Zaki, Alexandra Brintrup, Andy Neely
Abstract	This paper considers the generation of prediction intervals (PIs) by neural networks for quantifying uncertainty in regression tasks. It is axiomatic that high-quality PIs should be as narrow as possible, whilst capturing a specified portion of data. We derive a loss function directly from this axiom that requires no distributional assumption. We show how its form derives from a likelihood principle, that it can be used with gradient descent, and that model uncertainty is accounted for in ensembled form. Benchmark experiments show the method outperforms current state-of-the-art uncertainty quantification methods, reducing average PI width by over 10%.
Tasks
Published	2018-02-20
URL	http://arxiv.org/abs/1802.07167v3
PDF	http://arxiv.org/pdf/1802.07167v3.pdf
PWC	https://paperswithcode.com/paper/high-quality-prediction-intervals-for-deep
Repo	https://github.com/Zaoyee/F-CNN-pytorch-simple
Framework	pytorch
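
The two competing goals named in the abstract, capturing a specified share of the data while keeping intervals narrow, can be measured with a small helper. This sketch only evaluates given intervals; it does not reproduce the paper's loss function, and the function name is an assumption.

```python
import numpy as np

def pi_quality(y, lower, upper):
    """Return (coverage, mean width) for prediction intervals [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=np.float64) for a in (y, lower, upper))
    captured = (y >= lower) & (y <= upper)   # True where the target falls inside
    coverage = captured.mean()               # fraction of targets captured
    mean_width = (upper - lower).mean()      # average interval width
    return coverage, mean_width
```

A high-quality set of intervals, in the sense used above, reaches the target coverage (for example 0.95) with the smallest mean width.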

A Knowledge-Grounded Multimodal Search-Based Conversational Agent


Title	A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Authors	Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser
Abstract	Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversational model where an encoded knowledge base (KB) representation is appended to the decoder input. Our model substantially outperforms strong baselines in terms of text-based similarity measures (over 9 BLEU points, 3 of which are solely due to the use of additional information from the KB).
Tasks	Question Answering
Published	2018-10-20
URL	http://arxiv.org/abs/1810.11954v1
PDF	http://arxiv.org/pdf/1810.11954v1.pdf
PWC	https://paperswithcode.com/paper/a-knowledge-grounded-multimodal-search-based
Repo	https://github.com/shubhamagarwal92/mmd
Framework	pytorch

Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations


Title	Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations
Authors	Konda Reddy Mopuri, Aditya Ganeshan, R. Venkatesh Babu
Abstract	Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of the target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, the fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable and data-free approach for crafting universal adversarial perturbations. Independent of the underlying task, our objective achieves fooling via corrupting the extracted features at multiple layers. Therefore, the proposed objective is generalizable to craft image-agnostic perturbations across multiple vision tasks such as object recognition, semantic segmentation, and depth estimation. In the practical setting of a black-box attack scenario (when the attacker does not have access to the target model and its training data), we show that our objective outperforms the data-dependent objectives to fool the learned models. Further, via exploiting simple priors related to the data distribution, our objective remarkably boosts the fooling ability of the crafted perturbations. Significant fooling rates achieved by our objective emphasize that the current deep learning models are now at an increased risk, since our objective generalizes across multiple tasks without the requirement of training data for crafting the perturbations. To encourage reproducible research, we have released the codes for our proposed algorithm.
Tasks	Adversarial Attack, Depth Estimation, Object Recognition, Semantic Segmentation
Published	2018-01-24
URL	http://arxiv.org/abs/1801.08092v3
PDF	http://arxiv.org/pdf/1801.08092v3.pdf
PWC	https://paperswithcode.com/paper/generalizable-data-free-objective-for
Repo	https://github.com/val-iisc/gd-uap
Framework	tf