October 21, 2019

Paper Group AWR 50

Parameter Sharing Methods for Multilingual Self-Attentional Translation Models. Neural Networks Regularization Through Representation Learning. Structured Adversarial Attack: Towards General Implementation and Better Interpretability. Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation. Sequential Preference-Based Optimization. Au …

Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

Title Parameter Sharing Methods for Multilingual Self-Attentional Translation Models
Authors Devendra Singh Sachan, Graham Neubig
Abstract In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able to accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model. We find that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family. However, even in the case where target languages are from different families where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy.
Tasks Machine Translation
Published 2018-09-01
URL http://arxiv.org/abs/1809.00252v2
PDF http://arxiv.org/pdf/1809.00252v2.pdf
PWC https://paperswithcode.com/paper/parameter-sharing-methods-for-multilingual
Repo https://github.com/DevSinghSachan/multilingual_nmt
Framework pytorch

Neural Networks Regularization Through Representation Learning

Title Neural Networks Regularization Through Representation Learning
Authors Soufiane Belharbi
Abstract Neural network models and deep models are among the leading, state-of-the-art models in machine learning. The most successful deep neural models are those with many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which are not always available. One of the fundamental issues in neural networks is overfitting, which is the issue tackled in this thesis. This problem often occurs when large models are trained using few training samples. Many approaches have been proposed to prevent the network from overfitting and improve its generalization performance, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, batch normalization, etc. In this thesis, we tackle the neural network overfitting issue from a representation learning perspective by considering the situation where few training samples are available, which is the case in many real-world applications. We propose three contributions. The first one, presented in chapter 2, is dedicated to dealing with structured output problems to perform multivariate regression when the output variable y contains structural dependencies between its components. The second contribution, described in chapter 3, deals with the classification task, where we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. Our last contribution, presented in chapter 4, shows the interest of transfer learning in applications where only a few samples are available. In this contribution, we provide an automatic system based on such a learning scheme with an application to the medical domain. In this application, the task consists in localizing the third lumbar vertebra in a 3D CT scan. This work has been done in collaboration with the clinic Rouen Henri Becquerel Center, who provided us with data.
Tasks Data Augmentation, Representation Learning, Transfer Learning
Published 2018-07-13
URL http://arxiv.org/abs/1807.05292v1
PDF http://arxiv.org/pdf/1807.05292v1.pdf
PWC https://paperswithcode.com/paper/neural-networks-regularization-through
Repo https://github.com/sbelharbi/learning-class-invariant-features
Framework none

Structured Adversarial Attack: Towards General Implementation and Better Interpretability

Title Structured Adversarial Attack: Towards General Implementation and Better Interpretability
Authors Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, Xue Lin
Abstract When generating adversarial examples to attack deep neural networks (DNNs), Lp norm of the added perturbation is usually used to measure the similarity between original image and adversarial example. However, such adversarial attacks perturbing the raw input spaces may fail to capture structural information hidden in the input. This work develops a more general attack model, i.e., the structured attack (StrAttack), which explores group sparsity in adversarial perturbations by sliding a mask through images aiming for extracting key spatial structures. An ADMM (alternating direction method of multipliers)-based framework is proposed that can split the original problem into a sequence of analytically solvable subproblems and can be generalized to implement other attacking methods. Strong group sparsity is achieved in adversarial perturbations even with the same level of Lp norm distortion as the state-of-the-art attacks. We demonstrate the effectiveness of StrAttack by extensive experimental results on MNIST, CIFAR-10, and ImageNet. We also show that StrAttack provides better interpretability (i.e., better correspondence with discriminative image regions) through adversarial saliency map (Papernot et al., 2016b) and class activation map (Zhou et al., 2016).
Tasks Adversarial Attack
Published 2018-08-05
URL http://arxiv.org/abs/1808.01664v3
PDF http://arxiv.org/pdf/1808.01664v3.pdf
PWC https://paperswithcode.com/paper/structured-adversarial-attack-towards-general
Repo https://github.com/KaidiXu/StrAttack
Framework tf
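The group sparsity mentioned in the StrAttack abstract is commonly obtained with a group-lasso penalty, whose proximal step is block soft-thresholding. The sketch below illustrates only that generic step, assuming non-overlapping groups over a flattened perturbation; the function name, the toy values, and the threshold `tau` are hypothetical, and this is not the authors' ADMM implementation.

```python
import math


def group_soft_threshold(perturbation, groups, tau):
    """Proximal operator of tau * sum_g ||delta_g||_2 (non-overlapping groups).

    perturbation: flat list of floats (hypothetical flattened image perturbation)
    groups: list of index lists, one per group (e.g. one per mask position)
    tau: non-negative threshold; larger values zero out more groups
    """
    result = list(perturbation)
    for idx in groups:
        norm = math.sqrt(sum(perturbation[i] ** 2 for i in idx))
        # Shrink the whole group toward zero; drop it entirely if its norm <= tau.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        for i in idx:
            result[i] = scale * perturbation[i]
    return result


# Toy example: the first group is strong and survives, the second is zeroed.
delta = [3.0, 4.0, 0.1, 0.1]
groups = [[0, 1], [2, 3]]
shrunk = group_soft_threshold(delta, groups, tau=1.0)
```

Whole groups are either kept (shrunk) or removed together, which is what makes the resulting perturbation structured rather than scattered pixel by pixel.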

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Title Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Authors Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou
Abstract We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from an excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the case of RKHS. We support our method with both theoretical and empirical analyses.
Tasks
Published 2018-10-29
URL http://arxiv.org/abs/1810.12429v1
PDF http://arxiv.org/pdf/1810.12429v1.pdf
PWC https://paperswithcode.com/paper/breaking-the-curse-of-horizon-infinite
Repo https://github.com/zt95/infinite-horizon-off-policy-estimation
Framework none
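To make the idea of applying IS to stationary state-visitation distributions concrete, here is a minimal sketch of a self-normalized estimator of the average reward. It assumes the state density ratio is already available (estimating it is the paper's actual contribution and is not shown); `state_ratio`, `policy_ratio`, and the toy transitions are hypothetical placeholders.

```python
def stationary_is_estimate(transitions, state_ratio, policy_ratio):
    """Self-normalized off-policy estimate of the target policy's average reward.

    transitions: iterable of (state, action, reward) collected by the behavior policy
    state_ratio(s): assumed-known ratio of stationary state densities, target / behavior
    policy_ratio(s, a): ratio of action probabilities, target / behavior
    """
    numerator = 0.0
    denominator = 0.0
    for state, action, reward in transitions:
        # One weight per transition, instead of a product over the whole trajectory.
        weight = state_ratio(state) * policy_ratio(state, action)
        numerator += weight * reward
        denominator += weight
    if denominator == 0.0:
        raise ValueError("all importance weights are zero")
    return numerator / denominator


# Toy data with made-up ratios: the target policy visits s1 more often.
transitions = [("s0", "a", 1.0), ("s1", "b", 0.0), ("s0", "a", 1.0), ("s1", "a", 4.0)]
state_ratio = {"s0": 0.5, "s1": 1.5}.get
policy_ratio = lambda state, action: 1.0
estimate = stationary_is_estimate(transitions, state_ratio, policy_ratio)
```

Because each sample carries a single weight rather than a product of per-step ratios, the weights do not compound with the horizon, which is the intuition behind avoiding the exploding variance.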

Sequential Preference-Based Optimization

Title Sequential Preference-Based Optimization
Authors Ian Dewancker, Jakob Bauer, Michael McCourt
Abstract Many real-world engineering problems rely on human preferences to guide their design and optimization. We present PrefOpt, an open source package to simplify sequential optimization tasks that incorporate human preference feedback. Our approach extends an existing latent variable model for binary preferences to allow for observations of equivalent preference from users.
Tasks
Published 2018-01-09
URL http://arxiv.org/abs/1801.02788v1
PDF http://arxiv.org/pdf/1801.02788v1.pdf
PWC https://paperswithcode.com/paper/sequential-preference-based-optimization
Repo https://github.com/prefopt/prefopt
Framework none

Automatic L3 slice detection in 3D CT images using fully-convolutional networks

Title Automatic L3 slice detection in 3D CT images using fully-convolutional networks
Authors Fahdi Kanavati, Shah Islam, Eric O. Aboagye, Andrea Rockall
Abstract The analysis of single CT slices extracted at the third lumbar vertebra (L3) has garnered significant clinical interest in the past few years, in particular with regard to quantifying sarcopenia (muscle loss). In this paper, we propose an efficient method to automatically detect the L3 slice in 3D CT images. Our method works with images with a variety of fields of view, occlusions, and slice thicknesses. 3D CT images are first converted into 2D via Maximal Intensity Projection (MIP), reducing the dimensionality of the problem. The MIP images are then used as input to a 2D fully-convolutional network to predict the L3 slice locations in the form of 2D confidence maps. In addition, we propose a variant architecture with fewer parameters allowing 1D confidence map prediction and slightly faster prediction time without loss of accuracy. Quantitative evaluation of our method on a dataset of 1006 3D CT images yields a median error of 1mm, similar to the inter-rater median error of 1mm obtained from two annotators, demonstrating the effectiveness of our method in efficiently and accurately detecting the L3 slice.
Tasks
Published 2018-11-22
URL http://arxiv.org/abs/1811.09244v1
PDF http://arxiv.org/pdf/1811.09244v1.pdf
PWC https://paperswithcode.com/paper/automatic-l3-slice-detection-in-3d-ct-images
Repo https://github.com/fk128/ct-slice-detection
Framework none
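The 3D-to-2D conversion described in this abstract, Maximal Intensity Projection, simply keeps the largest voxel value along one axis. The sketch below shows that operation on a nested-list volume; the `[z][y][x]` index order and the toy values are assumptions for illustration and are unrelated to the authors' preprocessing code.

```python
def max_intensity_projection(volume, axis):
    """Collapse one axis of a 3D volume by taking the maximum along it.

    volume: nested lists indexed as volume[z][y][x] (assumed layout)
    axis: 0 collapses z, 1 collapses y, 2 collapses x
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:  # result indexed [y][x]
        return [[max(volume[z][y][x] for z in range(depth))
                 for x in range(width)] for y in range(height)]
    if axis == 1:  # result indexed [z][x]
        return [[max(volume[z][y][x] for y in range(height))
                 for x in range(width)] for z in range(depth)]
    if axis == 2:  # result indexed [z][y]
        return [[max(volume[z][y][x] for x in range(width))
                 for y in range(height)] for z in range(depth)]
    raise ValueError("axis must be 0, 1, or 2")


# Toy 2x2x2 volume with made-up intensities.
volume = [[[1, 5], [2, 0]],
          [[7, 3], [4, 6]]]
projection_y = max_intensity_projection(volume, axis=1)
projection_z = max_intensity_projection(volume, axis=0)
```

Projecting along an in-plane axis keeps the slice (z) dimension in the 2D image, so a row of the projection still corresponds to one slice position in the original scan.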

An Analysis by Synthesis Approach for Automatic Vertebral Shape Identification in Clinical QCT

Title An Analysis by Synthesis Approach for Automatic Vertebral Shape Identification in Clinical QCT
Authors Stefan Reinhold, Timo Damm, Lukas Huber, Reimer Andresen, Reinhard Barkmann, Claus-C. Glüer, Reinhard Koch
Abstract Quantitative computed tomography (QCT) is a widely used tool for osteoporosis diagnosis and monitoring. The assessment of cortical markers like cortical bone mineral density (BMD) and thickness is a demanding task, mainly because of the limited spatial resolution of QCT. We propose a direct model-based method to automatically identify the surface through the center of the cortex of human vertebra. We develop a statistical bone model and analyze its probability distribution after the imaging process. Using an as-rigid-as-possible deformation we find the cortical surface that maximizes the likelihood of our model given the input volume. Using the European Spine Phantom (ESP) and a high resolution μCT scan of a cadaveric vertebra, we show that the proposed method is able to accurately identify the real center of cortex ex-vivo. To demonstrate the in-vivo applicability of our method we use manually obtained surfaces for comparison.
Tasks
Published 2018-12-03
URL http://arxiv.org/abs/1812.00693v1
PDF http://arxiv.org/pdf/1812.00693v1.pdf
PWC https://paperswithcode.com/paper/an-analysis-by-synthesis-approach-for
Repo https://github.com/ithron/CortidQCT
Framework none

A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation

Title A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation
Authors Nabila Abraham, Naimul Mefraz Khan
Abstract We propose a generalized focal loss function based on the Tversky index to address the issue of data imbalance in medical image segmentation. Compared to the commonly used Dice loss, our loss function achieves a better trade-off between precision and recall when training on small structures such as lesions. To evaluate our loss function, we improve the attention U-Net model by incorporating an image pyramid to preserve contextual features. We experiment on the BUS 2017 dataset and ISIC 2018 dataset, where lesions occupy 4.84% and 21.4% of the image area, and improve segmentation accuracy when compared to the standard U-Net by 25.7% and 3.6%, respectively.
Tasks Lesion Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2018-10-18
URL http://arxiv.org/abs/1810.07842v1
PDF http://arxiv.org/pdf/1810.07842v1.pdf
PWC https://paperswithcode.com/paper/a-novel-focal-tversky-loss-function-with
Repo https://github.com/nabsabraham/focal-tversky-unet
Framework tf
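A focal loss built on the Tversky index can be sketched in a few lines. The version below works on flat lists of predicted probabilities and binary labels; the default values of `alpha`, `beta`, and `gamma`, and the convention that `alpha` weights false negatives, are assumptions of this sketch rather than values stated in the abstract above.

```python
def tversky_index(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index between predicted probabilities and binary labels.

    alpha weights false negatives and beta weights false positives
    (assumed convention); alpha = beta = 0.5 recovers the Dice coefficient.
    """
    true_pos = sum(p * t for p, t in zip(pred, target))
    false_neg = sum((1 - p) * t for p, t in zip(pred, target))
    false_pos = sum(p * (1 - t) for p, t in zip(pred, target))
    return (true_pos + eps) / (true_pos + alpha * false_neg + beta * false_pos + eps)


def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=4 / 3):
    """(1 - TI) ** (1 / gamma); gamma > 1 emphasizes poorly segmented examples."""
    index = tversky_index(pred, target, alpha, beta)
    return max(0.0, 1.0 - index) ** (1.0 / gamma)


# Toy example: one true lesion pixel, one false-positive pixel.
pred = [1.0, 1.0, 0.0, 0.0]
target = [1.0, 0.0, 0.0, 0.0]
loss = focal_tversky_loss(pred, target)
```

Setting `alpha` above `beta` penalizes missed lesion pixels more than spurious ones, which is one way to trade precision for recall on small structures.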

Subword Semantic Hashing for Intent Classification on Small Datasets

Title Subword Semantic Hashing for Intent Classification on Small Datasets
Authors Kumar Shridhar, Ayushman Dash, Amit Sahu, Gustav Grund Pihlgren, Pedro Alonso, Vinaychandran Pondenkandath, Gyorgy Kovacs, Foteini Simistira, Marcus Liwicki
Abstract In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise by the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and Web Application. Our benchmarks are available online: https://github.com/kumar-shridhar/Know-Your-Intent
Tasks Chatbot, Intent Classification, Text Classification, Word Embeddings
Published 2018-10-16
URL https://arxiv.org/abs/1810.07150v3
PDF https://arxiv.org/pdf/1810.07150v3.pdf
PWC https://paperswithcode.com/paper/subword-semantic-hashing-for-intent
Repo https://github.com/MJahangeerQureshi/Text-Classification
Framework none
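The robustness to spelling errors claimed for subword hashing can be illustrated with character trigrams. The sketch below wraps each word in `#` boundary markers and extracts overlapping trigrams; the lowercasing, whitespace tokenization, and function name are assumptions for illustration and may differ from the authors' preprocessing.

```python
def semantic_hash_tokens(text, n=3):
    """Split text into character n-grams of '#'-delimited words.

    Example: "hello" -> "#hello#" -> ["#he", "hel", "ell", "llo", "lo#"].
    """
    tokens = []
    for word in text.lower().split():
        padded = "#" + word + "#"
        tokens.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return tokens


correct = semantic_hash_tokens("hello")
misspelled = semantic_hash_tokens("helo")
# A misspelled word still shares most of its subword tokens with the original.
shared = set(correct) & set(misspelled)
```

A word-level vocabulary would treat "helo" as entirely unknown, whereas here three of its four trigrams also occur in "hello", so a classifier trained on these tokens still sees familiar features.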

A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection

Title A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection
Authors Alex D. Pon, Oles Andrienko, Ali Harakeh, Steven L. Waslander
Abstract Traffic light and sign detectors on autonomous cars are integral for road scene perception. The literature is abundant with deep learning networks that detect either lights or signs, not both, which makes them unsuitable for real-life deployment due to the limited graphics processing unit (GPU) memory and power available on embedded systems. The root cause of this issue is that no public dataset contains both traffic light and sign labels, which leads to difficulties in developing a joint detection framework. We present a deep hierarchical architecture in conjunction with a mini-batch proposal selection mechanism that allows a network to detect both traffic lights and signs from training on separate traffic light and sign datasets. Our method solves the overlapping issue where instances from one dataset are not labelled in the other dataset. We are the first to present a network that performs joint detection on traffic lights and signs. We measure our network on the Tsinghua-Tencent 100K benchmark for traffic sign detection and the Bosch Small Traffic Lights benchmark for traffic light detection and show it outperforms the existing Bosch Small Traffic light state-of-the-art method. We focus on autonomous car deployment and show our network is more suitable than others because of its low memory footprint and real-time image processing time. Qualitative results can be viewed at https://youtu.be/_YmogPzBXOw
Tasks Traffic Sign Recognition
Published 2018-06-20
URL http://arxiv.org/abs/1806.07987v2
PDF http://arxiv.org/pdf/1806.07987v2.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-deep-architecture-and-mini
Repo https://github.com/bosch-ros-pkg/bstld
Framework tf

Neural Architecture Optimization

Title Neural Architecture Optimization
Authors Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu
Abstract Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) An encoder embeds/maps neural network architectures into a continuous space. (2) A predictor takes the continuous representation of a network as input and predicts its accuracy. (3) A decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for image classification task on CIFAR-10 and language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction of computational resources. Specifically we obtain 1.93% test set error rate for CIFAR-10 image classification task and 56.0 test set perplexity of PTB language modeling task. Furthermore, combined with the recently proposed weight sharing mechanism, we discover powerful architecture on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks.
Tasks Image Classification, Language Modelling, Neural Architecture Search
Published 2018-08-22
URL https://arxiv.org/abs/1808.07233v5
PDF https://arxiv.org/pdf/1808.07233v5.pdf
PWC https://paperswithcode.com/paper/neural-architecture-optimization
Repo https://github.com/dicarlolab/archconvnets
Framework none

Intrusion Detection Using Mouse Dynamics

Title Intrusion Detection Using Mouse Dynamics
Authors Margit Antal, Elod Egyed-Zsigmond
Abstract Compared to other behavioural biometrics, mouse dynamics is a less explored area. General purpose data sets containing unrestricted mouse usage data are usually not available. The Balabit data set was released in 2016 for a data science competition, which, despite its few subjects, can be considered the first adequate publicly available one. This paper presents a performance evaluation study on this data set for impostor detection. The existence of very short test sessions makes this data set challenging. Raw data were segmented into mouse move, point-and-click, and drag-and-drop types of mouse actions, then several features were extracted. In contrast to keystroke dynamics, mouse data is not sensitive, therefore it is possible to collect negative mouse dynamics data and to use two-class classifiers for impostor detection. Both action- and set of actions-based evaluations were performed. Set of actions-based evaluation achieves 0.92 AUC on the test part of the data set. However, the same type of evaluation conducted on the training part of the data set resulted in maximal AUC (1) using only 13 actions. Drag and drop mouse actions proved to be the best actions for impostor detection.
Tasks Intrusion Detection
Published 2018-10-10
URL http://arxiv.org/abs/1810.04668v1
PDF http://arxiv.org/pdf/1810.04668v1.pdf
PWC https://paperswithcode.com/paper/intrusion-detection-using-mouse-dynamics
Repo https://github.com/margitantal68/mouse_dynamics_balabit_chaoshen_dfl
Framework none

SEGEN: Sample-Ensemble Genetic Evolutional Network Model

Title SEGEN: Sample-Ensemble Genetic Evolutional Network Model
Authors Jiawei Zhang, Limeng Cui, Fisher B. Gouza
Abstract Deep learning, a rebranding of deep neural network research works, has achieved a remarkable success in recent years. With multiple hidden layers, deep learning models aim at computing the hierarchical feature representations of the observational data. Meanwhile, due to its severe disadvantages in data consumption, computational resources, parameter tuning costs and the lack of result explainability, deep learning has also suffered from lots of criticism. In this paper, we will introduce a new representation learning model, namely “Sample-Ensemble Genetic Evolutionary Network” (SEGEN), which can serve as an alternative approach to deep learning models. Instead of building one single deep model, based on a set of sampled sub-instances, SEGEN adopts a genetic-evolutionary learning strategy to build a group of unit models generations by generations. The unit models incorporated in SEGEN can be either traditional machine learning models or the recent deep learning models with a much “narrower” and “shallower” architecture. The learning results of each instance at the final generation will be effectively combined from each unit model via diffusive propagation and ensemble learning strategies. From the computational perspective, SEGEN requires far less data, fewer computational resources and parameter tuning efforts, but has sound theoretic interpretability of the learning process and results. Extensive experiments have been done on several different real-world benchmark datasets, and the experimental results obtained by SEGEN have demonstrated its advantages over the state-of-the-art representation learning models.
Tasks Representation Learning
Published 2018-03-23
URL http://arxiv.org/abs/1803.08631v2
PDF http://arxiv.org/pdf/1803.08631v2.pdf
PWC https://paperswithcode.com/paper/segen-sample-ensemble-genetic-evolutional
Repo https://github.com/jwzhanggy/Graph-Bert
Framework pytorch

Generalized Capsule Networks with Trainable Routing Procedure

Title Generalized Capsule Networks with Trainable Routing Procedure
Authors Zhenhua Chen, David Crandall
Abstract CapsNet (Capsule Network) was first proposed by Sabour et al. (2017) and later another version of CapsNet was proposed by Hinton et al. (2018). CapsNet has been proved effective in modeling spatial features with much fewer parameters. However, the routing procedures in both papers are not well incorporated into the whole training process. The optimal number of routing iterations is a mystery and has to be found manually. To overcome these disadvantages of current routing procedures in CapsNet, we embed the routing procedure into the optimization procedure with all other parameters in neural networks, namely, make coupling coefficients in the routing procedure become completely trainable. We call it Generalized CapsNet (G-CapsNet). We implement both “full-connected” version of G-CapsNet and “convolutional” version of G-CapsNet. G-CapsNet achieves a similar performance in the dataset MNIST as in the original papers. We also test two capsule packing methods (cross feature maps or with feature maps) from previous convolutional layers and see no evident difference. Besides, we also explored the possibility of stacking multiple capsule layers. The code is shared at https://github.com/chenzhenhua986/CAFFE-CapsNet.
Tasks
Published 2018-08-27
URL http://arxiv.org/abs/1808.08692v1
PDF http://arxiv.org/pdf/1808.08692v1.pdf
PWC https://paperswithcode.com/paper/generalized-capsule-networks-with-trainable
Repo https://github.com/chenzhenhua986/CAFFE-CapsNet
Framework none

Learning with Random Learning Rates

Title Learning with Random Learning Rates
Authors Léonard Blier, Pierre Wolinski, Yann Ollivier
Abstract Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the ‘All Learning Rates At Once’ (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged on an existing PyTorch model: https://github.com/leonardblier/alrao .
Tasks
Published 2018-10-02
URL http://arxiv.org/abs/1810.01322v3
PDF http://arxiv.org/pdf/1810.01322v3.pdf
PWC https://paperswithcode.com/paper/learning-with-random-learning-rates
Repo https://github.com/leonardblier/alrao
Framework pytorch
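The core idea of Alrao, giving every unit its own randomly drawn learning rate, can be sketched without any deep learning framework. The example below assumes a log-uniform distribution over a hypothetical range `[lr_min, lr_max]` and applies a plain SGD step per unit; the function names and default bounds are illustrative choices, not taken from the authors' PyTorch implementation.

```python
import math
import random


def sample_learning_rates(num_units, lr_min=1e-5, lr_max=10.0, seed=0):
    """Draw one learning rate per unit, log-uniformly between lr_min and lr_max."""
    rng = random.Random(seed)
    log_min, log_max = math.log(lr_min), math.log(lr_max)
    # Uniform in log-space, so each order of magnitude is equally likely.
    return [math.exp(rng.uniform(log_min, log_max)) for _ in range(num_units)]


def sgd_step(weights, grads, unit_lrs):
    """One SGD step where row i of the weights uses learning rate unit_lrs[i]."""
    return [
        [w - lr * g for w, g in zip(weight_row, grad_row)]
        for weight_row, grad_row, lr in zip(weights, grads, unit_lrs)
    ]


# Four units (rows of a weight matrix), each with its own learning rate.
unit_lrs = sample_learning_rates(4)
updated = sgd_step([[1.0, 2.0]], [[0.5, 0.5]], [0.1])
```

Sampling in log-space matters here: a uniform draw over the raw range would almost never produce the small learning rates, whereas a log-uniform draw spreads units evenly across several orders of magnitude.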