Paper Group AWR 320
Two Local Models for Neural Constituent Parsing. Detecting Malicious PowerShell Commands using Deep Neural Networks. Data Augmentation using Random Image Cropping and Patching for Deep CNNs. Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion. Shift-Net: Image Inpainting via Deep Feature Rearrangement. DeepFlux for Skele …
Two Local Models for Neural Constituent Parsing
Title | Two Local Models for Neural Constituent Parsing |
Authors | Zhiyang Teng, Yue Zhang |
Abstract | Non-local features have been exploited by syntactic parsers for capturing dependencies between sub output structures. Such features have been a key to the success of state-of-the-art statistical parsers. With the rise of deep learning, however, it has been shown that local output decisions can give highly competitive accuracies, thanks to the power of dense neural input representations that embody global syntactic information. We investigate two conceptually simple local neural models for constituent parsing, which make local decisions to constituent spans and CFG rules, respectively. Consistent with previous findings along the line, our best model gives highly competitive results, achieving the labeled bracketing F1 scores of 92.4% on PTB and 87.3% on CTB 5.1. |
Tasks | |
Published | 2018-08-14 |
URL | http://arxiv.org/abs/1808.04850v2 |
http://arxiv.org/pdf/1808.04850v2.pdf | |
PWC | https://paperswithcode.com/paper/two-local-models-for-neural-constituent |
Repo | https://github.com/zeeeyang/two-local-neural-conparsers |
Framework | none |
Detecting Malicious PowerShell Commands using Deep Neural Networks
Title | Detecting Malicious PowerShell Commands using Deep Neural Networks |
Authors | Danny Hendler, Shay Kels, Amir Rubin |
Abstract | Microsoft’s PowerShell is a command-line shell and scripting language that is installed by default on Windows machines. While PowerShell can be configured by administrators for restricting access and reducing vulnerabilities, these restrictions can be bypassed. Moreover, PowerShell commands can be easily generated dynamically, executed from memory, encoded and obfuscated, thus making the logging and forensic analysis of code executed by PowerShell challenging. For all these reasons, PowerShell is increasingly used by cybercriminals as part of their attacks’ tool chain, mainly for downloading malicious contents and for lateral movement. Indeed, a recent comprehensive technical report by Symantec dedicated to PowerShell’s abuse by cybercriminals reported on a sharp increase in the number of malicious PowerShell samples they received and in the number of penetration tools and frameworks that use PowerShell. This highlights the urgent need of developing effective methods for detecting malicious PowerShell commands. In this work, we address this challenge by implementing several novel detectors of malicious PowerShell commands and evaluating their performance. We implemented both “traditional” natural language processing (NLP) based detectors and detectors based on character-level convolutional neural networks (CNNs). Detectors’ performance was evaluated using a large real-world dataset. Our evaluation results show that, although our detectors individually yield high performance, an ensemble detector that combines an NLP-based classifier with a CNN-based classifier provides the best performance, since the latter classifier is able to detect malicious commands that succeed in evading the former. Our analysis of these evasive commands reveals that some obfuscation patterns automatically detected by the CNN classifier are intrinsically difficult to detect using the NLP techniques we applied. |
Tasks | |
Published | 2018-04-11 |
URL | http://arxiv.org/abs/1804.04177v2 |
http://arxiv.org/pdf/1804.04177v2.pdf | |
PWC | https://paperswithcode.com/paper/detecting-malicious-powershell-commands-using |
Repo | https://github.com/ruchikagargdiwakar/ml_cyber_security_usecases |
Framework | none |
Data Augmentation using Random Image Cropping and Patching for Deep CNNs
Title | Data Augmentation using Random Image Cropping and Patching for Deep CNNs |
Authors | Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara |
Abstract | Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP) which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, resulting in an advantage similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) by comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of 2.19% on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft COCO. |
Tasks | Data Augmentation, Image Augmentation, Image Cropping |
Published | 2018-11-22 |
URL | https://arxiv.org/abs/1811.09030v2 |
https://arxiv.org/pdf/1811.09030v2.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-using-random-image-cropping |
Repo | https://github.com/jackryo/ricap |
Framework | pytorch |
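The RICAP abstract above describes cropping four images, patching them into one training image, and mixing their class labels. A minimal sketch of that idea follows, using plain nested lists as single-channel images. The uniform choice of the boundary position and the area-proportional label weights are assumptions made for illustration, and the function name `ricap` and its parameters are hypothetical rather than taken from the linked repository.

```python
import random

def ricap(images, labels, num_classes, rng):
    """Patch random crops of four equally sized images into one new image."""
    height, width = len(images[0]), len(images[0][0])
    # Boundary position (assumption: drawn uniformly for this sketch).
    bw = rng.randint(0, width)
    bh = rng.randint(0, height)
    # (row offset, col offset, crop height, crop width) for each output quadrant.
    quads = [
        (0, 0, bh, bw),
        (0, bw, bh, width - bw),
        (bh, 0, height - bh, bw),
        (bh, bw, height - bh, width - bw),
    ]
    out = [[0] * width for _ in range(height)]
    soft = [0.0] * num_classes
    for img, label, (r0, c0, ch, cw) in zip(images, labels, quads):
        # Random top-left corner of the crop inside the source image.
        sr = rng.randint(0, height - ch)
        sc = rng.randint(0, width - cw)
        for r in range(ch):
            for c in range(cw):
                out[r0 + r][c0 + c] = img[sr + r][sc + c]
        # Label weight proportional to the patch area (assumption).
        soft[label] += (ch * cw) / (height * width)
    return out, soft
```

The returned soft label sums to one, so it can stand in for a one-hot target in a cross-entropy loss.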
Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion
Title | Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion |
Authors | Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave |
Abstract | Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a least-square regression problem to learn a rotation aligning a small bilingual lexicon, and use a retrieval criterion for inference. In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion. Our experiments on standard benchmarks show that our approach outperforms the state of the art on word translation, with the biggest improvements observed for distant language pairs such as English-Chinese. |
Tasks | |
Published | 2018-04-20 |
URL | http://arxiv.org/abs/1804.07745v3 |
http://arxiv.org/pdf/1804.07745v3.pdf | |
PWC | https://paperswithcode.com/paper/loss-in-translation-learning-bilingual-word |
Repo | https://github.com/facebookresearch/MUSE |
Framework | pytorch |
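The baseline pipeline mentioned in the abstract above (fit a rotation by least squares on a small lexicon, then translate by retrieval) can be illustrated in two dimensions, where the optimal rotation angle has a closed form. This is a toy sketch only: real embeddings are high-dimensional and are usually aligned with an SVD-based solver, and the function names here are hypothetical. It does not implement the paper's proposed end-to-end retrieval loss.

```python
import math

def fit_rotation_2d(src, tgt):
    """Angle minimizing sum ||R(theta) x - y||^2 over paired 2-D vectors."""
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(src, tgt))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(src, tgt))
    return math.atan2(cross, dot)

def rotate(v, theta):
    """Apply a counter-clockwise rotation by theta to a 2-D vector."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def retrieve(query, candidates):
    """Return the key of the candidate with the highest cosine similarity."""
    def cosine(a, b):
        return (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
    return max(candidates, key=lambda k: cosine(query, candidates[k]))
```

A source word is translated by rotating its vector with the fitted angle and calling `retrieve` against the target-language vectors.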
Shift-Net: Image Inpainting via Deep Feature Rearrangement
Title | Shift-Net: Image Inpainting via Deep Feature Rearrangement |
Authors | Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, Shiguang Shan |
Abstract | Deep convolutional networks (CNNs) have exhibited their potential in image inpainting for producing plausible results. However, in most existing methods, e.g., context encoder, the missing parts are predicted by propagating the surrounding convolutional features through a fully connected layer, which tends to produce semantically plausible but blurry results. In this paper, we introduce a special shift-connection layer to the U-Net architecture, namely Shift-Net, for filling in missing regions of any shape with sharp structures and fine-detailed textures. To this end, the encoder feature of the known region is shifted to serve as an estimation of the missing parts. A guidance loss is introduced on decoder feature to minimize the distance between the decoder feature after fully connected layer and the ground-truth encoder feature of the missing parts. With such constraint, the decoder feature in missing region can be used to guide the shift of encoder feature in known region. An end-to-end learning algorithm is further developed to train the Shift-Net. Experiments on the Paris StreetView and Places datasets demonstrate the efficiency and effectiveness of our Shift-Net in producing sharper, fine-detailed, and visually plausible results. The codes and pre-trained models are available at https://github.com/Zhaoyi-Yan/Shift-Net. |
Tasks | Image Inpainting |
Published | 2018-01-29 |
URL | http://arxiv.org/abs/1801.09392v2 |
http://arxiv.org/pdf/1801.09392v2.pdf | |
PWC | https://paperswithcode.com/paper/shift-net-image-inpainting-via-deep-feature |
Repo | https://github.com/Zhaoyi-Yan/Shift-Net |
Framework | pytorch |
DeepFlux for Skeletons in the Wild
Title | DeepFlux for Skeletons in the Wild |
Authors | Yukang Wang, Yongchao Xu, Stavros Tsogkas, Xiang Bai, Sven Dickinson, Kaleem Siddiqi |
Abstract | Computing object skeletons in natural images is challenging, owing to large variations in object appearance and scale, and the complexity of handling background clutter. Many recent methods frame object skeleton detection as a binary pixel classification problem, which is similar in spirit to learning-based edge detection, as well as to semantic segmentation methods. In the present article, we depart from this strategy by training a CNN to predict a two-dimensional vector field, which maps each scene point to a candidate skeleton pixel, in the spirit of flux-based skeletonization algorithms. This “image context flux” representation has two major advantages over previous approaches. First, it explicitly encodes the relative position of skeletal pixels to semantically meaningful entities, such as the image points in their spatial context, and hence also the implied object boundaries. Second, since the skeleton detection context is a region-based vector field, it is better able to cope with object parts of large width. We evaluate the proposed method on three benchmark datasets for skeleton detection and two for symmetry detection, achieving consistently superior performance over state-of-the-art methods. |
Tasks | Edge Detection, Object Skeleton Detection, Semantic Segmentation |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1811.12608v1 |
http://arxiv.org/pdf/1811.12608v1.pdf | |
PWC | https://paperswithcode.com/paper/deepflux-for-skeletons-in-the-wild |
Repo | https://github.com/YukangWang/DeepFlux |
Framework | none |
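The DeepFlux abstract above describes a two-dimensional vector field that maps each image point to a candidate skeleton pixel. One way to picture such a field is sketched below: every pixel stores a unit vector pointing at its nearest skeleton pixel. The brute-force nearest-pixel search, the unit normalization, and the function name `context_flux` are assumptions for illustration and may differ from the paper's exact definition.

```python
import math

def context_flux(skeleton, height, width):
    """Unit vectors (d_row, d_col) from each pixel to its nearest skeleton pixel."""
    field = [[(0.0, 0.0)] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Brute-force nearest skeleton pixel by squared Euclidean distance.
            nr, nc = min(skeleton, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
            dr, dc = nr - r, nc - c
            norm = math.hypot(dr, dc)
            if norm > 0:  # skeleton pixels themselves keep the zero vector
                field[r][c] = (dr / norm, dc / norm)
    return field
```

Here `skeleton` is a non-empty list of `(row, col)` coordinates; a network trained on such targets would regress the two channels per pixel.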
Neural Abstract Style Transfer for Chinese Traditional Painting
Title | Neural Abstract Style Transfer for Chinese Traditional Painting |
Authors | Bo Li, Caiming Xiong, Tianfu Wu, Yu Zhou, Lun Zhang, Rufeng Chu |
Abstract | Chinese traditional painting is one of the most historic art forms in the world. It is very popular in Eastern and Southeast Asia due to being aesthetically appealing. Compared with western artistic painting, it is usually more visually abstract and textureless. Recently, neural network based style transfer methods have shown promising and appealing results which are mainly focused on western painting. It remains a challenging problem to preserve abstraction in neural style transfer. In this paper, we present a Neural Abstract Style Transfer method for Chinese traditional painting. It learns to preserve abstraction and other style jointly end-to-end via a novel MXDoG-guided filter (Modified version of the eXtended Difference-of-Gaussians) and three fully differentiable loss terms. To the best of our knowledge, there is little work on neural style transfer of Chinese traditional painting. To promote research in this direction, we collect a new dataset with diverse photo-realistic images and Chinese traditional paintings. In experiments, the proposed method shows more appealing stylized results in transferring the style of Chinese traditional painting than state-of-the-art neural style transfer methods. |
Tasks | Style Transfer |
Published | 2018-12-08 |
URL | http://arxiv.org/abs/1812.03264v2 |
http://arxiv.org/pdf/1812.03264v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-abstract-style-transfer-for-chinese |
Repo | https://github.com/lbsswu/Chinese_style_transfer |
Framework | none |
Morphological and Language-Agnostic Word Segmentation for NMT
Title | Morphological and Language-Agnostic Word Segmentation for NMT |
Authors | Dominik Macháček, Jonáš Vidra, Ondřej Bojar |
Abstract | The state of the art of handling rich morphology in neural machine translation (NMT) is to break word forms into subword units, so that the overall vocabulary size of these units fits the practical limits given by the NMT model and GPU memory capacity. In this paper, we compare two common but linguistically uninformed methods of subword construction (BPE and STE, the method implemented in Tensor2Tensor toolkit) and two linguistically-motivated methods: Morfessor and one novel method, based on a derivational dictionary. Our experiments with German-to-Czech translation, both morphologically rich, document that so far, the non-motivated methods perform better. Furthermore, we identify a critical difference between BPE and STE and show a simple pre-processing step for BPE that considerably increases translation quality as evaluated by automatic measures. |
Tasks | Machine Translation |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05482v1 |
http://arxiv.org/pdf/1806.05482v1.pdf | |
PWC | https://paperswithcode.com/paper/morphological-and-language-agnostic-word |
Repo | https://github.com/Gldkslfmsd/t2t_second |
Framework | tf |
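For readers unfamiliar with BPE, one of the subword methods compared in the abstract above, the sketch below shows the standard merge-learning loop: repeatedly count adjacent symbol pairs across the vocabulary and merge the most frequent one. It is a generic illustration, not the paper's pre-processing step or the Tensor2Tensor STE method, and the function name `learn_bpe` is hypothetical.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: frequency} mapping."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite each word, fusing occurrences of the chosen pair.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + count
        vocab = new_vocab
    return merges, vocab
```

The number of merges controls the final vocabulary size, which is the quantity the abstract says must fit the model and GPU memory limits.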
On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters
Title | On-Line Learning of Linear Dynamical Systems: Exponential Forgetting in Kalman Filters |
Authors | Mark Kozdoba, Jakub Marecek, Tigran Tchrakian, Shie Mannor |
Abstract | The Kalman filter is a key tool for time-series forecasting and analysis. We show that the dependence of a prediction of the Kalman filter on the past is decaying exponentially, whenever the process noise is non-degenerate. Therefore, the Kalman filter may be approximated by regression on a few recent observations. Surprisingly, we also show that having some process noise is essential for the exponential decay. With no process noise, it may happen that the forecast depends on all of the past uniformly, which makes forecasting more difficult. Based on this insight, we devise an on-line algorithm for improper learning of a linear dynamical system (LDS), which considers only a few most recent observations. We use our decay results to provide the first regret bounds w.r.t. Kalman filters within learning an LDS. That is, we compare the results of our algorithm to the best, in hindsight, Kalman filter for a given signal. Also, the algorithm is practical: its per-update run-time is linear in the regression depth. |
Tasks | Time Series, Time Series Forecasting |
Published | 2018-09-16 |
URL | http://arxiv.org/abs/1809.05870v1 |
http://arxiv.org/pdf/1809.05870v1.pdf | |
PWC | https://paperswithcode.com/paper/on-line-learning-of-linear-dynamical-systems |
Repo | https://github.com/jmarecek/OnlineLDS |
Framework | none |
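The abstract above argues that a Kalman filter's forecast can be approximated by regression on a few recent observations, with a per-update cost linear in the regression depth. A minimal sketch of that idea is an autoregressive predictor trained by online gradient descent on squared error. The plain gradient step, the fixed learning rate `lr`, and the function name are assumptions for illustration; the paper's exact update rule and step sizes may differ.

```python
def online_ar_forecast(signal, depth, lr):
    """One-step-ahead forecasts from a linear model on the last `depth` values."""
    weights = [0.0] * depth
    forecasts = []
    for t in range(depth, len(signal)):
        window = signal[t - depth:t]
        pred = sum(w * x for w, x in zip(weights, window))
        forecasts.append(pred)
        err = pred - signal[t]
        # Gradient step on squared error; the cost is linear in `depth`.
        weights = [w - lr * err * x for w, x in zip(weights, window)]
    return forecasts, weights
```

Each forecast is made before the corresponding observation is used for the update, so the sequence of errors is a genuine online loss.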
SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging
Title | SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging |
Authors | Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y. Chén, Maarten De Vos |
Abstract | Automatic sleep staging has often been treated as a simple classification problem that aims at determining the label of individual target polysomnography (PSG) epochs one at a time. In this work, we tackle the task as a sequence-to-sequence classification problem that receives a sequence of multiple epochs as input and classifies all of their labels at once. For this purpose, we propose a hierarchical recurrent neural network named SeqSleepNet. At the epoch processing level, the network consists of a filterbank layer tailored to learn frequency-domain filters for preprocessing and an attention-based recurrent layer designed for short-term sequential modelling. At the sequence processing level, a recurrent layer is placed on top of the learned epoch-wise features for long-term modelling of sequential epochs. The classification is then carried out on the output vectors at every time step of the top recurrent layer to produce the sequence of output labels. Despite being hierarchical, we present a strategy to train the network in an end-to-end fashion. We show that the proposed network outperforms state-of-the-art approaches, achieving an overall accuracy, macro F1-score, and Cohen’s kappa of 87.1%, 83.3%, and 0.815 on a publicly available dataset with 200 subjects. |
Tasks | Sleep Stage Detection |
Published | 2018-09-28 |
URL | http://arxiv.org/abs/1809.10932v3 |
http://arxiv.org/pdf/1809.10932v3.pdf | |
PWC | https://paperswithcode.com/paper/seqsleepnet-end-to-end-hierarchical-recurrent |
Repo | https://github.com/pquochuy/SeqSleepNet |
Framework | tf |
A Dataset for Document Grounded Conversations
Title | A Dataset for Document Grounded Conversations |
Authors | Kangyan Zhou, Shrimai Prabhumoye, Alan W Black |
Abstract | This paper introduces a document grounded dataset for text conversations. We define “Document Grounded Conversations” as conversations that are about the contents of a specified document. In this dataset the specified documents were Wikipedia articles about popular movies. The dataset contains 4112 conversations with an average of 21.43 turns per conversation. This positions this dataset to not only provide a relevant chat history while generating responses but also provide a source of information that the models could use. We describe two neural architectures that provide benchmark performance on the task of generating the next response. We also evaluate our models for engagement and fluency, and find that the information from the document helps in generating more engaging and fluent responses. |
Tasks | |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07358v1 |
http://arxiv.org/pdf/1809.07358v1.pdf | |
PWC | https://paperswithcode.com/paper/a-dataset-for-document-grounded-conversations |
Repo | https://github.com/lizekang/ITDD |
Framework | pytorch |
Training and Inference with Integers in Deep Neural Networks
Title | Training and Inference with Integers in Deep Neural Networks |
Authors | Shuang Wu, Guoqi Li, Feng Chen, Luping Shi |
Abstract | Research on deep neural networks with discrete parameters and their deployment in embedded systems has been an active and promising topic. Although previous works have successfully reduced precision in inference, transferring both training and inference processes to low-bitwidth integers has not been demonstrated simultaneously. In this work, we develop a new method termed “WAGE” to discretize both training and inference, where weights (W), activations (A), gradients (G) and errors (E) among layers are shifted and linearly constrained to low-bitwidth integers. To perform pure discrete dataflow for fixed-point devices, we further replace batch normalization by a constant scaling layer and simplify other components that are arduous for integer implementation. Improved accuracies can be obtained on multiple datasets, which indicates that WAGE somehow acts as a type of regularization. Empirically, we demonstrate the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial to future AI applications in variable scenarios with transfer and continual learning demands. |
Tasks | Continual Learning |
Published | 2018-02-13 |
URL | http://arxiv.org/abs/1802.04680v1 |
http://arxiv.org/pdf/1802.04680v1.pdf | |
PWC | https://paperswithcode.com/paper/training-and-inference-with-integers-in-deep |
Repo | https://github.com/Tiiiger/QPyTorch |
Framework | pytorch |
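The WAGE abstract above speaks of linearly constraining values to low-bitwidth integers. The sketch below shows one simple form of such a linear quantizer: snap a real value to a symmetric fixed-point grid and return both the integer code and the value it represents. The grid spacing `2**(1 - bits)`, the clipping range, and the function name `quantize` are assumptions chosen for illustration and are not confirmed by the abstract.

```python
def quantize(x, bits):
    """Snap x to a symmetric fixed-point grid with 2**(1 - bits) spacing.

    Returns (integer code, represented value). Assumes bits >= 2.
    """
    step = 2.0 ** (1 - bits)
    limit = 1.0 - step
    # Integer code for the value; Python's round() rounds halves to even.
    code = round(x / step)
    max_code = round(limit / step)
    # Clip so the represented value stays inside [-limit, limit].
    code = max(-max_code, min(max_code, code))
    return code, code * step
```

With 8 bits the codes span -127 to 127, so values outside roughly (-1, 1) saturate at the ends of the grid.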
High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach
Title | High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach |
Authors | Tim Pearce, Mohamed Zaki, Alexandra Brintrup, Andy Neely |
Abstract | This paper considers the generation of prediction intervals (PIs) by neural networks for quantifying uncertainty in regression tasks. It is axiomatic that high-quality PIs should be as narrow as possible, whilst capturing a specified portion of data. We derive a loss function directly from this axiom that requires no distributional assumption. We show how its form derives from a likelihood principle, that it can be used with gradient descent, and that model uncertainty is accounted for in ensembled form. Benchmark experiments show the method outperforms current state-of-the-art uncertainty quantification methods, reducing average PI width by over 10%. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07167v3 |
http://arxiv.org/pdf/1802.07167v3.pdf | |
PWC | https://paperswithcode.com/paper/high-quality-prediction-intervals-for-deep |
Repo | https://github.com/Zaoyee/F-CNN-pytorch-simple |
Framework | pytorch |
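The abstract above states that high-quality prediction intervals should be as narrow as possible while capturing a specified portion of the data. Those two quantities can be measured directly, as in the sketch below, which computes the fraction of targets inside their intervals and the mean interval width. It only evaluates intervals; it does not reproduce the paper's loss function, and the function name is hypothetical.

```python
def interval_quality(lower, upper, targets):
    """Return (coverage, mean width) for a set of prediction intervals."""
    n = len(targets)
    # A target is captured when it lies inside its interval (bounds inclusive).
    captured = [lo <= y <= hi for lo, hi, y in zip(lower, upper, targets)]
    coverage = sum(captured) / n
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return coverage, mean_width
```

Comparing two methods at the same coverage level, the one with the smaller mean width produces the tighter intervals.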
A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Title | A Knowledge-Grounded Multimodal Search-Based Conversational Agent |
Authors | Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser |
Abstract | Multimodal search-based dialogue is a challenging new task: It extends visually grounded question answering systems into multi-turn conversations with access to an external database. We address this new challenge by learning a neural response generation system from the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017). We introduce a knowledge-grounded multimodal conversational model where an encoded knowledge base (KB) representation is appended to the decoder input. Our model substantially outperforms strong baselines in terms of text-based similarity measures (over 9 BLEU points, 3 of which are solely due to the use of additional information from the KB). |
Tasks | Question Answering |
Published | 2018-10-20 |
URL | http://arxiv.org/abs/1810.11954v1 |
http://arxiv.org/pdf/1810.11954v1.pdf | |
PWC | https://paperswithcode.com/paper/a-knowledge-grounded-multimodal-search-based |
Repo | https://github.com/shubhamagarwal92/mmd |
Framework | pytorch |
Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations
Title | Generalizable Data-free Objective for Crafting Universal Adversarial Perturbations |
Authors | Konda Reddy Mopuri, Aditya Ganeshan, R. Venkatesh Babu |
Abstract | Machine learning models are susceptible to adversarial perturbations: small changes to input that can cause large changes in output. It is also demonstrated that there exist input-agnostic perturbations, called universal adversarial perturbations, which can change the inference of target model on most of the data samples. However, existing methods to craft universal perturbations are (i) task specific, (ii) require samples from the training data distribution, and (iii) perform complex optimizations. Additionally, because of the data dependence, fooling ability of the crafted perturbations is proportional to the available training data. In this paper, we present a novel, generalizable and data-free approach for crafting universal adversarial perturbations. Independent of the underlying task, our objective achieves fooling via corrupting the extracted features at multiple layers. Therefore, the proposed objective is generalizable to craft image-agnostic perturbations across multiple vision tasks such as object recognition, semantic segmentation, and depth estimation. In the practical setting of black-box attack scenario (when the attacker does not have access to the target model and its training data), we show that our objective outperforms the data dependent objectives to fool the learned models. Further, via exploiting simple priors related to the data distribution, our objective remarkably boosts the fooling ability of the crafted perturbations. Significant fooling rates achieved by our objective emphasize that the current deep learning models are now at an increased risk, since our objective generalizes across multiple tasks without the requirement of training data for crafting the perturbations. To encourage reproducible research, we have released the codes for our proposed algorithm. |
Tasks | Adversarial Attack, Depth Estimation, Object Recognition, Semantic Segmentation |
Published | 2018-01-24 |
URL | http://arxiv.org/abs/1801.08092v3 |
http://arxiv.org/pdf/1801.08092v3.pdf | |
PWC | https://paperswithcode.com/paper/generalizable-data-free-objective-for |
Repo | https://github.com/val-iisc/gd-uap |
Framework | tf |