Paper Group AWR 50
Parameter Sharing Methods for Multilingual Self-Attentional Translation Models
Title | Parameter Sharing Methods for Multilingual Self-Attentional Translation Models |
Authors | Devendra Singh Sachan, Graham Neubig |
Abstract | In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained models. However, these improvements are not uniform; often multilingual parameter sharing results in a decrease in accuracy due to translation models not being able to accommodate different languages in their limited parameter space. In this work, we examine parameter sharing techniques that strike a happy medium between full sharing and individual training, specifically focusing on the self-attentional Transformer model. We find that the full parameter sharing approach leads to increases in BLEU scores mainly when the target languages are from a similar language family. However, even in the case where target languages are from different families where full parameter sharing leads to a noticeable drop in BLEU scores, our proposed methods for partial sharing of parameters can lead to substantial improvements in translation accuracy. |
Tasks | Machine Translation |
Published | 2018-09-01 |
URL | http://arxiv.org/abs/1809.00252v2 |
http://arxiv.org/pdf/1809.00252v2.pdf | |
PWC | https://paperswithcode.com/paper/parameter-sharing-methods-for-multilingual |
Repo | https://github.com/DevSinghSachan/multilingual_nmt |
Framework | pytorch |
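The paper's design space ranges from fully shared to fully separate Transformers. As a hedged sketch of one partial-sharing point in that space — not the authors' exact configuration — the PyTorch snippet below shares the embeddings and encoder across target languages while keeping a decoder per language.

```python
import torch
import torch.nn as nn

class PartiallySharedNMT(nn.Module):
    """One shared embedding table and encoder, one decoder per target
    language -- a single illustrative point in the sharing design space."""

    def __init__(self, vocab_size, d_model, nhead, tgt_langs, layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # shared joint vocabulary
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=layers)                           # shared across languages
        self.decoders = nn.ModuleDict({                  # language-specific decoders
            lang: nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
                num_layers=layers)
            for lang in tgt_langs})
        self.project = nn.Linear(d_model, vocab_size)    # shared output projection

    def forward(self, src_ids, tgt_ids, tgt_lang):
        # Positional encodings and attention masks omitted for brevity.
        memory = self.encoder(self.embed(src_ids))
        hidden = self.decoders[tgt_lang](self.embed(tgt_ids), memory)
        return self.project(hidden)

model = PartiallySharedNMT(vocab_size=32000, d_model=512, nhead=8,
                           tgt_langs=["de", "nl"])
```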
Neural Networks Regularization Through Representation Learning
Title | Neural Networks Regularization Through Representation Learning |
Authors | Soufiane Belharbi |
Abstract | Neural network models and deep models are among the leading, state-of-the-art models in machine learning. The most successful deep neural models are those with many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which is not always available. One of the fundamental issues in neural networks is overfitting, and it is the issue tackled in this thesis. This problem often occurs when large models are trained using few training samples. Many approaches have been proposed to prevent a network from overfitting and to improve its generalization performance, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, and batch normalization. In this thesis, we tackle the overfitting issue from a representation learning perspective, considering the situation where few training samples are available, which is the case in many real-world applications. We propose three contributions. The first, presented in chapter 2, is dedicated to structured output problems: multivariate regression where the output variable y contains structural dependencies between its components. The second, described in chapter 3, deals with the classification task, where we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. The last, presented in chapter 4, shows the value of transfer learning in applications where only few samples are available. In this contribution, we provide an automatic system based on such a learning scheme, with an application to the medical domain: localizing the third lumbar vertebra in a 3D CT scan. This work was done in collaboration with the Henri Becquerel Center clinic in Rouen, which provided us with the data. |
Tasks | Data Augmentation, Representation Learning, Transfer Learning |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05292v1 |
http://arxiv.org/pdf/1807.05292v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-networks-regularization-through |
Repo | https://github.com/sbelharbi/learning-class-invariant-features |
Framework | none |
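The second contribution exploits prior knowledge about hidden-layer representations as a regularizer. A minimal sketch of that general idea, assuming a penalty that pulls same-class hidden representations together within a mini-batch (the thesis's actual formulation may differ):

```python
import torch

def class_invariance_penalty(hidden, labels):
    """Hedged sketch: encourage hidden representations of same-class
    samples in a mini-batch to be close -- one way to inject prior
    knowledge about internal representations as a regularizer."""
    loss, groups = hidden.new_zeros(()), 0
    for c in labels.unique():
        h_c = hidden[labels == c]
        if len(h_c) > 1:
            # Mean squared distance of each sample to its class centroid.
            loss = loss + ((h_c - h_c.mean(0)) ** 2).sum(1).mean()
            groups += 1
    return loss / max(groups, 1)

# total_loss = task_loss + lambda_reg * class_invariance_penalty(h, y)
```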
Structured Adversarial Attack: Towards General Implementation and Better Interpretability
Title | Structured Adversarial Attack: Towards General Implementation and Better Interpretability |
Authors | Kaidi Xu, Sijia Liu, Pu Zhao, Pin-Yu Chen, Huan Zhang, Quanfu Fan, Deniz Erdogmus, Yanzhi Wang, Xue Lin |
Abstract | When generating adversarial examples to attack deep neural networks (DNNs), the Lp norm of the added perturbation is usually used to measure the similarity between the original image and the adversarial example. However, such adversarial attacks perturbing the raw input space may fail to capture structural information hidden in the input. This work develops a more general attack model, the structured attack (StrAttack), which explores group sparsity in adversarial perturbations by sliding a mask through images, aiming to extract key spatial structures. An ADMM (alternating direction method of multipliers)-based framework is proposed that can split the original problem into a sequence of analytically solvable subproblems and can be generalized to implement other attack methods. Strong group sparsity is achieved in adversarial perturbations even at the same level of Lp-norm distortion as state-of-the-art attacks. We demonstrate the effectiveness of StrAttack with extensive experimental results on MNIST, CIFAR-10, and ImageNet. We also show that StrAttack provides better interpretability (i.e., better correspondence with discriminative image regions) through the adversarial saliency map (Papernot et al., 2016b) and the class activation map (Zhou et al., 2016). |
Tasks | Adversarial Attack |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01664v3 |
http://arxiv.org/pdf/1808.01664v3.pdf | |
PWC | https://paperswithcode.com/paper/structured-adversarial-attack-towards-general |
Repo | https://github.com/KaidiXu/StrAttack |
Framework | tf |
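The workhorse inside an ADMM loop for group-sparse perturbations is a group-wise proximal (soft-thresholding) operator. A hedged numpy sketch over non-overlapping pixel blocks — an illustration of the building block, not StrAttack's full solver:

```python
import numpy as np

def group_soft_threshold(delta, block=2, tau=0.1):
    """Hedged sketch of a group-sparsity proximal step: shrink each
    (block x block) perturbation group toward zero by its l2 norm,
    zeroing weak groups entirely. Block size and tau are illustrative."""
    h, w, _ = delta.shape
    out = delta.copy()
    for i in range(0, h, block):
        for j in range(0, w, block):
            g = out[i:i + block, j:j + block]
            norm = np.linalg.norm(g)
            scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
            out[i:i + block, j:j + block] = g * scale
    return out

perturb = group_soft_threshold(np.random.randn(32, 32, 3) * 0.05)
```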
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Title | Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation |
Authors | Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou |
Abstract | We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique for deriving (nearly) unbiased estimators, but it is known to suffer from excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly to the stationary state-visitation distributions to avoid the exploding-variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled only from the behavior distribution. We develop a mini-max loss function for the estimation problem and derive a closed-form solution for the case of an RKHS. We support our method with both theoretical and empirical analyses. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12429v1 |
http://arxiv.org/pdf/1810.12429v1.pdf | |
PWC | https://paperswithcode.com/paper/breaking-the-curse-of-horizon-infinite |
Repo | https://github.com/zt95/infinite-horizon-off-policy-estimation |
Framework | none |
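Once the stationary density ratio w(s) = d_pi(s)/d_mu(s) has been estimated, the off-policy value estimate itself reduces to a weighted average over transitions. A sketch assuming w, pi, and mu are already-learned callables; the paper's mini-max estimation of w is the actual contribution and is not shown here:

```python
import numpy as np

def infinite_horizon_ois(states, actions, rewards, w, pi, mu):
    """Hedged sketch: estimate the average reward of target policy pi
    from behavior-policy data, weighting each transition by the
    stationary state-density ratio w(s) times the per-step action
    ratio pi(a|s)/mu(a|s)."""
    weights = np.array([w(s) * pi(a, s) / mu(a, s)
                        for s, a in zip(states, actions)])
    # Self-normalized weighted average keeps the estimate bounded.
    return np.sum(weights * np.asarray(rewards)) / np.sum(weights)
```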
Sequential Preference-Based Optimization
Title | Sequential Preference-Based Optimization |
Authors | Ian Dewancker, Jakob Bauer, Michael McCourt |
Abstract | Many real-world engineering problems rely on human preferences to guide their design and optimization. We present PrefOpt, an open source package to simplify sequential optimization tasks that incorporate human preference feedback. Our approach extends an existing latent variable model for binary preferences to allow for observations of equivalent preference from users. |
Tasks | |
Published | 2018-01-09 |
URL | http://arxiv.org/abs/1801.02788v1 |
http://arxiv.org/pdf/1801.02788v1.pdf | |
PWC | https://paperswithcode.com/paper/sequential-preference-based-optimization |
Repo | https://github.com/prefopt/prefopt |
Framework | none |
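The extension to equivalence feedback can be pictured as a latent-utility preference model with a tie band. A hedged scipy sketch in the spirit of PrefOpt — the probit form, eps, and sigma are illustrative assumptions, not the package's API:

```python
from scipy.stats import norm

def preference_likelihood(f_a, f_b, outcome, eps=0.5, sigma=1.0):
    """Hedged sketch of a latent-utility preference model with an
    equivalence region: 'a' is preferred when the latent gap exceeds
    +eps, 'b' below -eps, and a tie is reported in between."""
    z_hi = (f_a - f_b - eps) / sigma
    z_lo = (f_a - f_b + eps) / sigma
    if outcome == "a":                            # a preferred to b
        return norm.cdf(z_hi)
    if outcome == "b":                            # b preferred to a
        return 1.0 - norm.cdf(z_lo)
    return norm.cdf(z_lo) - norm.cdf(z_hi)        # reported as equivalent
```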
Automatic L3 slice detection in 3D CT images using fully-convolutional networks
Title | Automatic L3 slice detection in 3D CT images using fully-convolutional networks |
Authors | Fahdi Kanavati, Shah Islam, Eric O. Aboagye, Andrea Rockall |
Abstract | The analysis of single CT slices extracted at the third lumbar vertebra (L3) has garnered significant clinical interest in the past few years, in particular with regard to quantifying sarcopenia (muscle loss). In this paper, we propose an efficient method to automatically detect the L3 slice in 3D CT images. Our method works with images with a variety of fields of view, occlusions, and slice thicknesses. 3D CT images are first converted into 2D via Maximal Intensity Projection (MIP), reducing the dimensionality of the problem. The MIP images are then used as input to a 2D fully-convolutional network to predict the L3 slice locations in the form of 2D confidence maps. In addition, we propose a variant architecture with fewer parameters that predicts 1D confidence maps, giving slightly faster prediction without loss of accuracy. Quantitative evaluation of our method on a dataset of 1006 3D CT images yields a median error of 1 mm, matching the inter-rater median error of 1 mm obtained from two annotators and demonstrating the effectiveness of our method in efficiently and accurately detecting the L3 slice. |
Tasks | |
Published | 2018-11-22 |
URL | http://arxiv.org/abs/1811.09244v1 |
http://arxiv.org/pdf/1811.09244v1.pdf | |
PWC | https://paperswithcode.com/paper/automatic-l3-slice-detection-in-3d-ct-images |
Repo | https://github.com/fk128/ct-slice-detection |
Framework | none |
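The dimensionality-reduction step is a plain maximal intensity projection. A hedged numpy sketch, assuming a (z, y, x) axis order and a bone-oriented HU window (the paper's exact preprocessing may differ):

```python
import numpy as np

def sagittal_mip(volume, hu_min=100, hu_max=1500):
    """Hedged sketch of the preprocessing step: clip a CT volume to a
    bone-oriented intensity window and collapse it to a 2D sagittal
    maximal intensity projection, the kind of image fed to the 2D FCN."""
    vol = np.clip(volume, hu_min, hu_max)
    return vol.max(axis=2)   # project along the left-right (x) axis

mip = sagittal_mip(np.random.randint(-1000, 2000, size=(300, 512, 512)))
```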
An Analysis by Synthesis Approach for Automatic Vertebral Shape Identification in Clinical QCT
Title | An Analysis by Synthesis Approach for Automatic Vertebral Shape Identification in Clinical QCT |
Authors | Stefan Reinhold, Timo Damm, Lukas Huber, Reimer Andresen, Reinhard Barkmann, Claus-C. Glüer, Reinhard Koch |
Abstract | Quantitative computed tomography (QCT) is a widely used tool for osteoporosis diagnosis and monitoring. The assessment of cortical markers like cortical bone mineral density (BMD) and thickness is a demanding task, mainly because of the limited spatial resolution of QCT. We propose a direct, model-based method to automatically identify the surface through the center of the cortex of a human vertebra. We develop a statistical bone model and analyze its probability distribution after the imaging process. Using an as-rigid-as-possible deformation, we find the cortical surface that maximizes the likelihood of our model given the input volume. Using the European Spine Phantom (ESP) and a high-resolution µCT scan of a cadaveric vertebra, we show that the proposed method is able to accurately identify the real center of the cortex ex vivo. To demonstrate the in-vivo applicability of our method, we use manually obtained surfaces for comparison. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00693v1 |
http://arxiv.org/pdf/1812.00693v1.pdf | |
PWC | https://paperswithcode.com/paper/an-analysis-by-synthesis-approach-for |
Repo | https://github.com/ithron/CortidQCT |
Framework | none |
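At its core, analysis by synthesis searches for model parameters whose synthesized scan best explains the observed volume. A generic, hedged sketch of that loop — `synthesize` is a hypothetical callable standing in for the paper's statistical bone model and imaging simulation:

```python
import numpy as np
from scipy.optimize import minimize

def fit_by_synthesis(observed, synthesize, theta0):
    """Hedged sketch of the analysis-by-synthesis principle: search for
    deformation parameters whose synthesized volume best matches the
    observed QCT volume under a Gaussian noise model. The paper's actual
    model, deformation, and likelihood are far richer than this."""
    def neg_log_likelihood(theta):
        residual = observed - synthesize(theta)
        return 0.5 * np.sum(residual ** 2)
    # Derivative-free search keeps the sketch agnostic to the model form.
    return minimize(neg_log_likelihood, theta0, method="Powell").x
```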
A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation
Title | A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation |
Authors | Nabila Abraham, Naimul Mefraz Khan |
Abstract | We propose a generalized focal loss function based on the Tversky index to address the issue of data imbalance in medical image segmentation. Compared to the commonly used Dice loss, our loss function achieves a better trade-off between precision and recall when training on small structures such as lesions. To evaluate our loss function, we improve the attention U-Net model by incorporating an image pyramid to preserve contextual features. We experiment on the BUS 2017 dataset and the ISIC 2018 dataset, where lesions occupy 4.84% and 21.4% of the image area, and improve segmentation accuracy compared to the standard U-Net by 25.7% and 3.6%, respectively. |
Tasks | Lesion Segmentation, Medical Image Segmentation, Semantic Segmentation |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.07842v1 |
http://arxiv.org/pdf/1810.07842v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-focal-tversky-loss-function-with |
Repo | https://github.com/nabsabraham/focal-tversky-unet |
Framework | tf |
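A common formulation of the focal Tversky loss, sketched in PyTorch for a soft binary mask; the alpha/beta/gamma values follow those reported in the paper, but consult the repo for the exact multi-class form:

```python
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3,
                       gamma=4.0 / 3.0, eps=1e-7):
    """Hedged sketch of the focal Tversky loss: the Tversky index
    weights false negatives (alpha) against false positives (beta);
    raising (1 - TI) to the power 1/gamma focuses training on hard,
    small regions such as lesions."""
    pred, target = pred.reshape(-1), target.reshape(-1)
    tp = (pred * target).sum()
    fn = ((1 - pred) * target).sum()
    fp = (pred * (1 - target)).sum()
    ti = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - ti) ** (1.0 / gamma)
```

With gamma > 1 the exponent 1/gamma is below 1, so easy examples (TI near 1) contribute relatively more gradient than under the plain Tversky loss, counteracting their vanishing signal.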
Subword Semantic Hashing for Intent Classification on Small Datasets
Title | Subword Semantic Hashing for Intent Classification on Small Datasets |
Authors | Kumar Shridhar, Ayushman Dash, Amit Sahu, Gustav Grund Pihlgren, Pedro Alonso, Vinaychandran Pondenkandath, Gyorgy Kovacs, Foteini Simistira, Marcus Liwicki |
Abstract | In this paper, we introduce the use of Semantic Hashing as an embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word-embedding-based methods are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise from the use of internet communication. First, such datasets miss many of the vocabulary terms needed to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors, and it is difficult to anticipate all the ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: AskUbuntu, Chatbot, and Web Application. Our benchmarks are available online: https://github.com/kumar-shridhar/Know-Your-Intent |
Tasks | Chatbot, Intent Classification, Text Classification, Word Embeddings |
Published | 2018-10-16 |
URL | https://arxiv.org/abs/1810.07150v3 |
https://arxiv.org/pdf/1810.07150v3.pdf | |
PWC | https://paperswithcode.com/paper/subword-semantic-hashing-for-intent |
Repo | https://github.com/MJahangeerQureshi/Text-Classification |
Framework | none |
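The hashing itself is easy to picture: pad each token with '#', slice it into character trigrams, and count. A minimal sketch (whitespace tokenization and lowercasing are assumptions):

```python
from collections import Counter

def semantic_hash(text):
    """Hedged sketch of subword semantic hashing: pad each token with
    '#', slice it into character trigrams, and use trigram counts as a
    vocabulary-free text representation. Misspelled or unseen words
    still share most trigrams with their correct forms."""
    trigrams = Counter()
    for token in text.lower().split():
        padded = f"#{token}#"
        trigrams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return trigrams

print(semantic_hash("restart my ubuntu"))
# e.g. Counter({'#re': 1, 'res': 1, 'est': 1, 'sta': 1, ...})
```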
A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection
Title | A Hierarchical Deep Architecture and Mini-Batch Selection Method For Joint Traffic Sign and Light Detection |
Authors | Alex D. Pon, Oles Andrienko, Ali Harakeh, Steven L. Waslander |
Abstract | Traffic light and sign detectors on autonomous cars are integral for road scene perception. The literature is abundant with deep learning networks that detect either lights or signs, not both, which makes them unsuitable for real-life deployment due to the limited graphics processing unit (GPU) memory and power available on embedded systems. The root cause of this issue is that no public dataset contains both traffic light and sign labels, which leads to difficulties in developing a joint detection framework. We present a deep hierarchical architecture in conjunction with a mini-batch proposal selection mechanism that allows a network to detect both traffic lights and signs from training on separate traffic light and sign datasets. Our method solves the overlapping issue where instances from one dataset are not labelled in the other dataset. We are the first to present a network that performs joint detection on traffic lights and signs. We measure our network on the Tsinghua-Tencent 100K benchmark for traffic sign detection and the Bosch Small Traffic Lights benchmark for traffic light detection, and show it outperforms the existing state-of-the-art method on the Bosch Small Traffic Lights benchmark. We focus on autonomous car deployment and show our network is more suitable than others because of its low memory footprint and real-time processing speed. Qualitative results can be viewed at https://youtu.be/_YmogPzBXOw |
Tasks | Traffic Sign Recognition |
Published | 2018-06-20 |
URL | http://arxiv.org/abs/1806.07987v2 |
http://arxiv.org/pdf/1806.07987v2.pdf | |
PWC | https://paperswithcode.com/paper/a-hierarchical-deep-architecture-and-mini |
Repo | https://github.com/bosch-ros-pkg/bstld |
Framework | tf |
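The key idea behind the mini-batch selection can be sketched as loss masking: proposals in an image are never penalized as negatives for classes its source dataset does not label. A hedged PyTorch illustration of the principle, not the authors' proposal-selection code:

```python
import torch
import torch.nn.functional as F

def masked_detection_loss(cls_logits, targets, active_classes):
    """Hedged sketch: compute a per-class binary classification loss,
    but only over the classes the source dataset actually labels, so
    unlabeled traffic lights/signs from the other domain never act as
    false negatives. `active_classes` is a list of column indices."""
    loss = F.binary_cross_entropy_with_logits(cls_logits, targets,
                                              reduction="none")
    mask = torch.zeros_like(loss)
    mask[:, active_classes] = 1.0
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```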
Neural Architecture Optimization
Title | Neural Architecture Optimization |
Authors | Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, Tie-Yan Liu |
Abstract | Automatic neural architecture design has shown its potential in discovering powerful neural network architectures. Existing methods, whether based on reinforcement learning or evolutionary algorithms (EA), conduct architecture search in a discrete space, which is highly inefficient. In this paper, we propose a simple and efficient method for automatic neural architecture design based on continuous optimization. We call this new approach neural architecture optimization (NAO). There are three key components in our proposed approach: (1) an encoder embeds/maps neural network architectures into a continuous space; (2) a predictor takes the continuous representation of a network as input and predicts its accuracy; (3) a decoder maps a continuous representation of a network back to its architecture. The performance predictor and the encoder enable us to perform gradient-based optimization in the continuous space to find the embedding of a new architecture with potentially better accuracy. Such a better embedding is then decoded to a network by the decoder. Experiments show that the architecture discovered by our method is very competitive for the image classification task on CIFAR-10 and the language modeling task on PTB, outperforming or on par with the best results of previous architecture search methods with a significant reduction of computational resources. Specifically, we obtain a 1.93% test set error rate on the CIFAR-10 image classification task and a test set perplexity of 56.0 on the PTB language modeling task. Furthermore, combined with the recently proposed weight sharing mechanism, we discover powerful architectures on CIFAR-10 (with error rate 2.93%) and on PTB (with test set perplexity 56.6), with very limited computational resources (less than 10 GPU hours) for both tasks. |
Tasks | Image Classification, Language Modelling, Neural Architecture Search |
Published | 2018-08-22 |
URL | https://arxiv.org/abs/1808.07233v5 |
https://arxiv.org/pdf/1808.07233v5.pdf | |
PWC | https://paperswithcode.com/paper/neural-architecture-optimization |
Repo | https://github.com/dicarlolab/archconvnets |
Framework | none |
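One NAO search step, sketched in PyTorch with assumed encoder/predictor/decoder interfaces: embed an architecture, take a gradient-ascent step on predicted accuracy in the continuous space, and decode the result.

```python
import torch

def nao_search_step(encoder, predictor, decoder, arch_tokens, eta=1e-2):
    """Hedged sketch of one NAO search step. The module interfaces are
    assumptions: encoder maps architecture tokens to an embedding z,
    predictor maps z to estimated accuracy, decoder maps z back to an
    architecture (the paper's decoder is autoregressive)."""
    z = encoder(arch_tokens)
    z = z.detach().requires_grad_(True)
    predictor(z).sum().backward()      # gradient of predicted accuracy w.r.t. z
    z_new = z + eta * z.grad           # move toward higher predicted accuracy
    return decoder(z_new)
```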
Intrusion Detection Using Mouse Dynamics
Title | Intrusion Detection Using Mouse Dynamics |
Authors | Margit Antal, Elod Egyed-Zsigmond |
Abstract | Compared to other behavioural biometrics, mouse dynamics is a less explored area. General-purpose data sets containing unrestricted mouse usage data are usually not available. The Balabit data set, released in 2016 for a data science competition, can be considered the first adequate publicly available one despite its small number of subjects. This paper presents a performance evaluation study on this data set for impostor detection. The existence of very short test sessions makes this data set challenging. Raw data were segmented into mouse-move, point-and-click, and drag-and-drop actions, then several features were extracted. In contrast to keystroke dynamics, mouse data is not sensitive, therefore it is possible to collect negative mouse dynamics data and to use two-class classifiers for impostor detection. Both action-based and set-of-actions-based evaluations were performed. Set-of-actions-based evaluation achieves 0.92 AUC on the test part of the data set. However, the same type of evaluation conducted on the training part of the data set resulted in a maximal AUC of 1.0 using only 13 actions. Drag-and-drop mouse actions proved to be the best actions for impostor detection. |
Tasks | Intrusion Detection |
Published | 2018-10-10 |
URL | http://arxiv.org/abs/1810.04668v1 |
http://arxiv.org/pdf/1810.04668v1.pdf | |
PWC | https://paperswithcode.com/paper/intrusion-detection-using-mouse-dynamics |
Repo | https://github.com/margitantal68/mouse_dynamics_balabit_chaoshen_dfl |
Framework | none |
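Per-action kinematic features of the kind used in mouse-dynamics studies can be sketched in a few lines; the exact feature set and classifier here are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def action_features(xs, ys, ts):
    """Hedged sketch: kinematic statistics for one segmented mouse
    action, given coordinate and timestamp arrays."""
    dx, dy, dt = np.diff(xs), np.diff(ys), np.maximum(np.diff(ts), 1e-6)
    speed = np.hypot(dx, dy) / dt
    path = np.hypot(dx, dy).sum()
    direct = np.hypot(xs[-1] - xs[0], ys[-1] - ys[0])
    return [speed.mean(), speed.std(), speed.max(),
            path, direct / max(path, 1e-6),   # straightness in [0, 1]
            ts[-1] - ts[0]]                   # action duration

feats = action_features(np.array([0, 5, 12, 20]),
                        np.array([0, 3, 4, 4]),
                        np.array([0.0, 0.05, 0.11, 0.18]))

# Two-class setup: the legal user's actions vs. pooled actions of others.
clf = RandomForestClassifier(n_estimators=200)
```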
SEGEN: Sample-Ensemble Genetic Evolutional Network Model
Title | SEGEN: Sample-Ensemble Genetic Evolutional Network Model |
Authors | Jiawei Zhang, Limeng Cui, Fisher B. Gouza |
Abstract | Deep learning, a rebranding of deep neural network research, has achieved remarkable success in recent years. With multiple hidden layers, deep learning models aim at computing hierarchical feature representations of the observational data. Meanwhile, due to its severe disadvantages in data consumption, computational resources, parameter tuning costs and the lack of result explainability, deep learning has also suffered a lot of criticism. In this paper, we introduce a new representation learning model, the “Sample-Ensemble Genetic Evolutionary Network” (SEGEN), which can serve as an alternative approach to deep learning models. Instead of building one single deep model, SEGEN adopts a genetic-evolutionary learning strategy to build a group of unit models generation by generation, each based on a set of sampled sub-instances. The unit models incorporated in SEGEN can be either traditional machine learning models or recent deep learning models with a much “narrower” and “shallower” architecture. The learning results for each instance at the final generation are combined across unit models via diffusive propagation and ensemble learning strategies. From the computational perspective, SEGEN requires far less data, fewer computational resources and less parameter tuning effort, while having sound theoretical interpretability of the learning process and results. Extensive experiments have been done on several different real-world benchmark datasets, and the experimental results obtained by SEGEN demonstrate its advantages over state-of-the-art representation learning models. |
Tasks | Representation Learning |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08631v2 |
http://arxiv.org/pdf/1803.08631v2.pdf | |
PWC | https://paperswithcode.com/paper/segen-sample-ensemble-genetic-evolutional |
Repo | https://github.com/jwzhanggy/Graph-Bert |
Framework | pytorch |
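The generation-by-generation loop from the abstract, sketched generically; every callable (make_unit, fitness, crossover, mutate, sample_subset) is a placeholder for components the paper instantiates concretely:

```python
import random

def segen_style_training(make_unit, fitness, crossover, mutate,
                         sample_subset, generations=10, pop=20):
    """Hedged sketch of SEGEN's genetic-evolutionary loop: train small
    unit models on sampled sub-instances, keep the fittest, and breed
    the next generation; survivors are ensembled at inference time."""
    population = [make_unit(sample_subset()) for _ in range(pop)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop // 2]                    # selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop - len(parents))]
        population = parents + children
    return population   # combine these unit models via ensemble strategies
```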
Generalized Capsule Networks with Trainable Routing Procedure
Title | Generalized Capsule Networks with Trainable Routing Procedure |
Authors | Zhenhua Chen, David Crandall |
Abstract | CapsNet (Capsule Network) was first proposed by Sabour et al. (2017), and another version of CapsNet was later proposed by Hinton et al. (2018). CapsNet has been shown to be effective at modeling spatial features with far fewer parameters. However, the routing procedures in both papers are not well incorporated into the whole training process: the optimal number of routing iterations is unknown and has to be found manually. To overcome this disadvantage of current routing procedures, we embed the routing procedure into the optimization procedure together with all the other parameters of the neural network; that is, we make the coupling coefficients in the routing procedure completely trainable. We call this Generalized CapsNet (G-CapsNet). We implement both a “fully-connected” version and a “convolutional” version of G-CapsNet. G-CapsNet achieves performance on MNIST similar to that reported in the original papers. We also test two capsule packing methods (across feature maps or within feature maps) from previous convolutional layers and observe no evident difference. In addition, we explore the possibility of stacking multiple capsule layers. The code is shared at https://github.com/chenzhenhua986/CAFFE-CapsNet. |
Tasks | |
Published | 2018-08-27 |
URL | http://arxiv.org/abs/1808.08692v1 |
http://arxiv.org/pdf/1808.08692v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-capsule-networks-with-trainable |
Repo | https://github.com/chenzhenhua986/CAFFE-CapsNet |
Framework | none |
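G-CapsNet's central change — coupling coefficients as ordinary trainable parameters rather than outputs of an inner routing loop — can be sketched in PyTorch as follows (layer sizes and initialization are assumptions):

```python
import torch
import torch.nn as nn

def squash(s, dim=-1, eps=1e-8):
    """Standard capsule squashing nonlinearity."""
    n2 = (s ** 2).sum(dim, keepdim=True)
    return (n2 / (1 + n2)) * s / (n2.sqrt() + eps)

class TrainableRouting(nn.Module):
    """Hedged sketch: coupling logits are a plain nn.Parameter optimized
    jointly with all other weights, replacing the inner routing loop."""

    def __init__(self, n_in, n_out, d_in, d_out):
        super().__init__()
        self.W = nn.Parameter(0.01 * torch.randn(n_in, n_out, d_out, d_in))
        self.b = nn.Parameter(torch.zeros(n_in, n_out))  # trainable couplings

    def forward(self, u):                    # u: (batch, n_in, d_in)
        u_hat = torch.einsum('iodk,bik->biod', self.W, u)
        c = self.b.softmax(dim=1)            # couplings over output capsules
        return squash((c.unsqueeze(-1) * u_hat).sum(dim=1))
```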
Learning with Random Learning Rates
Title | Learning with Random Learning Rates |
Authors | Léonard Blier, Pierre Wolinski, Yann Ollivier |
Abstract | Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of the gradient descent. We present the ‘All Learning Rates At Once’ (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged on an existing PyTorch model: https://github.com/leonardblier/alrao . |
Tasks | |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.01322v3 |
http://arxiv.org/pdf/1810.01322v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-with-random-learning-rates |
Repo | https://github.com/leonardblier/alrao |
Framework | pytorch |
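The Alrao recipe is easy to approximate: draw each learning rate log-uniformly over several orders of magnitude. The sketch below samples per parameter tensor, a simplification of the paper's per-unit sampling and model-averaged output layer:

```python
import math
import random
import torch

def alrao_style_param_groups(model, lr_min=1e-5, lr_max=10.0):
    """Hedged sketch of the Alrao idea: give each parameter tensor its
    own learning rate drawn log-uniformly across several orders of
    magnitude; the paper samples per unit/feature, which is finer."""
    groups = []
    for p in model.parameters():
        lr = math.exp(random.uniform(math.log(lr_min), math.log(lr_max)))
        groups.append({"params": [p], "lr": lr})
    return groups

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(alrao_style_param_groups(model))
```

Since every parameter group carries its own `lr`, no global learning rate needs to be tuned — which is the point of the method.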