October 21, 2019

Paper Group AWR 10

BCWS: Bilingual Contextual Word Similarity

Title BCWS: Bilingual Contextual Word Similarity
Authors Ta-Chung Chi, Ching-Yen Shih, Yun-Nung Chen
Abstract This paper introduces the first dataset for evaluating English-Chinese Bilingual Contextual Word Similarity, namely BCWS (https://github.com/MiuLab/BCWS). The dataset consists of 2,091 English-Chinese word pairs with the corresponding sentential contexts and their similarity scores annotated by humans. Our annotated dataset has higher consistency than other similar datasets. We establish several baselines for the bilingual embedding task to benchmark the experiments. Modeling cross-lingual sense representations as provided in this dataset has the potential to move artificial intelligence from monolingual understanding towards multilingual understanding.
Tasks
Published 2018-10-21
URL http://arxiv.org/abs/1810.08951v1
PDF http://arxiv.org/pdf/1810.08951v1.pdf
PWC https://paperswithcode.com/paper/bcws-bilingual-contextual-word-similarity
Repo https://github.com/MiuLab/BCWS
Framework none

Logit Pairing Methods Can Fool Gradient-Based Attacks

Title Logit Pairing Methods Can Fool Gradient-Based Attacks
Authors Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow
Abstract Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers. We show that the computationally fast methods they propose - Clean Logit Pairing (CLP) and Logit Squeezing (LSQ) - just make the gradient-based optimization problem of crafting adversarial examples harder without providing actual robustness. We find that Adversarial Logit Pairing (ALP) may indeed provide robustness against adversarial examples, especially when combined with adversarial training, and we examine it in a variety of settings. However, the increase in adversarial accuracy is much smaller than previously claimed. Finally, our results suggest that the evaluation against an iterative PGD attack relies heavily on the parameters used and may result in false conclusions regarding robustness of a model.
Tasks
Published 2018-10-29
URL http://arxiv.org/abs/1810.12042v3
PDF http://arxiv.org/pdf/1810.12042v3.pdf
PWC https://paperswithcode.com/paper/logit-pairing-methods-can-fool-gradient-based
Repo https://github.com/uds-lsv/evaluating-logit-pairing-methods
Framework tf
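
The abstract names three logit regularizers without spelling them out. The sketch below shows one plausible form of the penalty terms, assuming squared L2 norms and a batch mean; the function names and that exact form are assumptions for illustration, not taken from the paper or its repository. Logit Squeezing penalizes the size of the logits, while the pairing methods penalize the distance between two sets of logits (two clean examples for CLP, a clean example and its adversarial counterpart for ALP).

```python
import numpy as np


def logit_squeezing_penalty(logits):
    """Mean over the batch of the squared L2 norm of each logit vector.

    Assumed form of an LSQ-style term; it would be added to the usual
    classification loss with a weighting coefficient.
    """
    logits = np.asarray(logits, dtype=float)
    return float(np.mean(np.sum(logits ** 2, axis=1)))


def logit_pairing_penalty(logits_a, logits_b):
    """Mean over the batch of the squared L2 distance between paired logits.

    For a CLP-style term the two batches hold logits of different clean
    examples; for an ALP-style term they hold clean and adversarial logits
    of the same examples.
    """
    logits_a = np.asarray(logits_a, dtype=float)
    logits_b = np.asarray(logits_b, dtype=float)
    return float(np.mean(np.sum((logits_a - logits_b) ** 2, axis=1)))
```

Both terms shrink or flatten the logits, which is consistent with the abstract's point that they can make gradient-based attacks harder to optimize without necessarily making the classifier more robust.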

Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction

Title Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction
Authors Yunzhe Tao, Lin Ma, Weizhong Zhang, Jian Liu, Wei Liu, Qiang Du
Abstract Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. Besides, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features at different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and sudden oscillations of time series.
Tasks Time Series, Time Series Prediction
Published 2018-06-02
URL http://arxiv.org/abs/1806.00685v1
PDF http://arxiv.org/pdf/1806.00685v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-attention-based-recurrent
Repo https://github.com/KurochkinAlexey/Hierarchical-Attention-Based-Recurrent-Highway-Networks-for-Time-Series-Prediction
Framework pytorch

A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices

Title A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices
Authors Rudrasis Chakraborty, Chun-Hao Yang, Xingjian Zhen, Monami Banerjee, Derek Archer, David Vaillancourt, Vikas Singh, Baba C. Vemuri
Abstract In a number of disciplines, the data (e.g., graphs, manifolds) to be analyzed are non-Euclidean in nature. Geometric deep learning corresponds to techniques that generalize deep neural network models to such non-Euclidean spaces. Several recent papers have shown how convolutional neural networks (CNNs) can be extended to learn with graph-based data. In this work, we study the setting where the data (or measurements) are ordered, longitudinal or temporal in nature and live on a Riemannian manifold – this setting is common in a variety of problems in statistical machine learning, vision and medical imaging. We show how statistical recurrent network models can be defined in such spaces. We give an efficient algorithm and conduct a rigorous analysis of its statistical properties. We perform extensive numerical experiments demonstrating competitive performance with state-of-the-art methods but with significantly fewer parameters. We also show applications to a statistical analysis task in brain imaging, a regime where deep neural network models have only been utilized in limited ways.
Tasks
Published 2018-05-29
URL http://arxiv.org/abs/1805.11204v2
PDF http://arxiv.org/pdf/1805.11204v2.pdf
PWC https://paperswithcode.com/paper/a-statistical-recurrent-model-on-the-manifold
Repo https://github.com/zhenxingjian/SPD-SRU
Framework tf

A Minimax Surrogate Loss Approach to Conditional Difference Estimation

Title A Minimax Surrogate Loss Approach to Conditional Difference Estimation
Authors Siong Thye Goh, Cynthia Rudin
Abstract We present a new machine learning approach to estimate personalized treatment effects in the classical potential outcomes framework with binary outcomes. To overcome the problem that both treatment and control outcomes for the same unit are required for supervised learning, we propose surrogate loss functions that incorporate both treatment and control data. The new surrogates yield tighter bounds than the sum of losses for treatment and control groups. A specific choice of loss function, namely a type of hinge loss, yields a minimax support vector machine formulation. The resulting optimization problem requires the solution to only a single convex optimization problem, incorporating both treatment and control units, and it enables the kernel trick to be used to handle nonlinear (also non-parametric) estimation. Statistical learning bounds and experimental results are also presented for the framework.
Tasks
Published 2018-03-10
URL http://arxiv.org/abs/1803.03769v2
PDF http://arxiv.org/pdf/1803.03769v2.pdf
PWC https://paperswithcode.com/paper/a-minimax-surrogate-loss-approach-to
Repo https://github.com/shangtai/githubcausalsvm
Framework none

Harmonious Attention Network for Person Re-Identification

Title Harmonious Attention Network for Person Re-Identification
Authors Wei Li, Xiatian Zhu, Shaogang Gong
Abstract Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching in arbitrarily aligned person images potentially with large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model for joint learning of soft pixel attention and hard regional attention along with simultaneous optimisation of feature representations, dedicated to optimise person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks including CUHK03, Market-1501, and DukeMTMC-ReID.
Tasks Person Re-Identification
Published 2018-02-22
URL http://arxiv.org/abs/1802.08122v1
PDF http://arxiv.org/pdf/1802.08122v1.pdf
PWC https://paperswithcode.com/paper/harmonious-attention-network-for-person-re
Repo https://github.com/milkplz/keras-frcnn
Framework tf

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Title Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees
Authors L. Elisa Celis, Lingxiao Huang, Vijay Keswani, Nisheeth K. Vishnoi
Abstract Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers and many of the aforementioned algorithms do not come with theoretical guarantees; perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that were previously not possible.
Tasks
Published 2018-06-15
URL http://arxiv.org/abs/1806.06055v2
PDF http://arxiv.org/pdf/1806.06055v2.pdf
PWC https://paperswithcode.com/paper/classification-with-fairness-constraints-a
Repo https://github.com/aif360-learn/aif360-learn
Framework tf

Micro-Attention for Micro-Expression recognition

Title Micro-Attention for Micro-Expression recognition
Authors Chongyang Wang, Min Peng, Tao Bi, Tong Chen
Abstract Owing to its high objectivity in emotion detection, micro-expression has emerged as a promising modality in affective computing. Recently, deep learning methods have been successfully introduced into the micro-expression recognition area. Although higher recognition accuracy has been achieved, substantial challenges in micro-expression recognition remain. The confinement of micro-expressions to small local areas of the face and the limited size of available databases still constrain the recognition accuracy of such emotional facial behavior. In this work, to tackle such challenges, we propose a novel attention mechanism called micro-attention that cooperates with a residual network. Micro-attention enables the network to learn to focus on facial areas of interest covering different action units. Moreover, to cope with small datasets, the micro-attention is designed without adding noticeable parameters, and a simple yet efficient transfer learning approach is used alongside it to alleviate the overfitting risk. With extensive experimental evaluations on three benchmarks (CASMEII, SAMM and SMIC) and post-hoc feature visualizations, we demonstrate the effectiveness of the proposed micro-attention and push the boundary of automatic recognition of micro-expression.
Tasks Transfer Learning
Published 2018-11-06
URL https://arxiv.org/abs/1811.02360v5
PDF https://arxiv.org/pdf/1811.02360v5.pdf
PWC https://paperswithcode.com/paper/micro-attention-for-micro-expression
Repo https://github.com/CodeShareBot/Micro-Attention-for-Micro-Expression
Framework none

Variational Memory Encoder-Decoder

Title Variational Memory Encoder-Decoder
Authors Hung Le, Truyen Tran, Thin Nguyen, Svetha Venkatesh
Abstract Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.09950v2
PDF http://arxiv.org/pdf/1807.09950v2.pdf
PWC https://paperswithcode.com/paper/variational-memory-encoder-decoder
Repo https://github.com/thaihungle/VMED
Framework tf

Meta-Learning for Semi-Supervised Few-Shot Classification

Title Meta-Learning for Semi-Supervised Few-Shot Classification
Authors Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel
Abstract In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained on episodes representing different classification problems, each with a small labeled training set and its corresponding test set. In this work, we advance this few-shot classification paradigm towards a scenario where unlabeled examples are also available within each episode. We consider two situations: one where all unlabeled examples are assumed to belong to the same set of classes as the labeled examples of the episode, as well as the more challenging situation where examples from other distractor classes are also provided. To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes. These models are trained in an end-to-end way on episodes, to learn to leverage the unlabeled examples successfully. We evaluate these methods on versions of the Omniglot and miniImageNet benchmarks, adapted to this new framework augmented with unlabeled examples. We also propose a new split of ImageNet, consisting of a large set of classes, with a hierarchical structure. Our experiments confirm that our Prototypical Networks can learn to improve their predictions due to unlabeled examples, much like a semi-supervised algorithm would.
Tasks Meta-Learning, Omniglot
Published 2018-03-02
URL http://arxiv.org/abs/1803.00676v1
PDF http://arxiv.org/pdf/1803.00676v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-for-semi-supervised-few-shot
Repo https://github.com/y2l/meta-transfer-learning-pytorch
Framework pytorch
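
The abstract says the extended Prototypical Networks use unlabeled examples when producing prototypes. One way to picture this is a single soft k-means step: start from the class means of the labeled embeddings, softly assign each unlabeled embedding to the nearest prototypes, then recompute the means with those soft weights. The sketch below assumes this form and works on precomputed embedding vectors; the function names are illustrative and the handling of distractor classes is not modeled.

```python
import numpy as np


def class_prototypes(features, labels, n_classes):
    """Prototype of each class = mean embedding of its labeled examples."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])


def refine_prototypes(protos, features, labels, unlabeled):
    """One soft k-means step that folds unlabeled embeddings into the prototypes."""
    n_classes = protos.shape[0]
    # Squared Euclidean distance from every unlabeled point to every prototype.
    d2 = ((unlabeled[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    # Soft assignment: softmax over negative distances (shifted for stability).
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)
    soft = np.exp(logits)
    soft = soft / soft.sum(axis=1, keepdims=True)
    # Labeled points keep hard one-hot assignments.
    hard = np.eye(n_classes)[labels]
    weights = np.concatenate([hard, soft], axis=0)
    points = np.concatenate([features, unlabeled], axis=0)
    # Weighted mean of all points for each class.
    return (weights.T @ points) / weights.sum(axis=0)[:, None]
```

Classification then proceeds as in an ordinary Prototypical Network, by comparing a query embedding with the refined prototypes.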

A General Approach to Adding Differential Privacy to Iterative Training Procedures

Title A General Approach to Adding Differential Privacy to Iterative Training Procedures
Authors H. Brendan McMahan, Galen Andrew, Ulfar Erlingsson, Steve Chien, Ilya Mironov, Nicolas Papernot, Peter Kairouz
Abstract In this work we address the practical challenges of training machine learning models on privacy-sensitive datasets by introducing a modular approach that minimizes changes to training algorithms, provides a variety of configuration strategies for the privacy mechanism, and then isolates and simplifies the critical logic that computes the final privacy guarantees. A key challenge is that training algorithms often require estimating many different quantities (vectors) from the same set of examples, such as gradients of different layers in a deep learning architecture, as well as metrics and batch normalization parameters. Each of these may have different properties like dimensionality, magnitude, and tolerance to noise. By extending previous work on the Moments Accountant for the subsampled Gaussian mechanism, we can provide privacy for such heterogeneous sets of vectors, while also structuring the approach to minimize software engineering challenges.
Tasks
Published 2018-12-15
URL http://arxiv.org/abs/1812.06210v2
PDF http://arxiv.org/pdf/1812.06210v2.pdf
PWC https://paperswithcode.com/paper/a-general-approach-to-adding-differential
Repo https://github.com/facebookresearch/pytorch-dp
Framework pytorch
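
The Gaussian mechanism mentioned in the abstract is commonly applied to a batch of per-example vectors by clipping each vector to a fixed L2 norm, summing, and adding Gaussian noise scaled to the clipping bound. The sketch below illustrates only that step, under the assumption of a single vector per example; the function and parameter names are hypothetical, and the subsampling and the privacy accounting (the Moments Accountant) are deliberately left out.

```python
import numpy as np


def clip_by_l2_norm(vec, clip_norm):
    """Scale vec down so that its L2 norm is at most clip_norm."""
    vec = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(vec)
    return vec * min(1.0, clip_norm / max(norm, 1e-12))


def private_mean(per_example_vectors, clip_norm, noise_multiplier, rng):
    """Clip each vector, sum, add Gaussian noise, and average over the batch.

    The noise standard deviation is noise_multiplier * clip_norm, i.e. it is
    proportional to the most any single example can contribute to the sum.
    """
    clipped = np.stack([clip_by_l2_norm(v, clip_norm) for v in per_example_vectors])
    total = clipped.sum(axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_vectors)
```

A call such as `private_mean(grads, clip_norm=1.0, noise_multiplier=1.1, rng=np.random.default_rng(0))` returns a noisy average gradient; the values 1.0 and 1.1 are placeholders, not recommendations from the paper.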

Monte Carlo Methods for the Game Kingdomino

Title Monte Carlo Methods for the Game Kingdomino
Authors Magnus Gedda, Mikael Z. Lagerkvist, Martin Butler
Abstract Kingdomino is introduced as an interesting game for studying game playing: the game is multiplayer (4 independent players per game); it has a limited game depth (13 moves per player); and it has limited but not insignificant interaction among players. Several strategies based on locally greedy players, Monte Carlo Evaluation (MCE), and Monte Carlo Tree Search (MCTS) are presented with variants. We examine a variation of UCT called progressive win bias and a playout policy (Player-greedy) focused on selecting good moves for the player. A thorough evaluation is done showing how the strategies perform and how to choose parameters given specific time constraints. The evaluation shows that surprisingly MCE is stronger than MCTS for a game like Kingdomino. All experiments use a cloud-native design, with a game server in a Docker container, and agents communicating using a REST-style JSON protocol. This enables a multi-language approach to separating the game state, the strategy implementations, and the coordination layer.
Tasks
Published 2018-07-12
URL http://arxiv.org/abs/1807.04458v2
PDF http://arxiv.org/pdf/1807.04458v2.pdf
PWC https://paperswithcode.com/paper/monte-carlo-methods-for-the-game-kingdomino
Repo https://github.com/mgedda/kdom-ai
Framework none
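
Monte Carlo Evaluation, as the term is generally used, is flat: every legal move is scored by the average result of a number of random playouts, with no search tree built on top. The sketch below shows that idea for a generic game supplied through callbacks; the callback names and signatures are assumptions for illustration and do not reflect the kdom-ai code or the Kingdomino rules.

```python
import random


def monte_carlo_evaluation(state, legal_moves, apply_move, playout, n_playouts, rng):
    """Return the legal move with the highest mean playout score.

    legal_moves(state) -> iterable of moves
    apply_move(state, move) -> new state
    playout(state, rng) -> final score of one random game from state
    """
    best_move, best_mean = None, float("-inf")
    for move in legal_moves(state):
        next_state = apply_move(state, move)
        # Average the outcome of n_playouts independent random games.
        mean = sum(playout(next_state, rng) for _ in range(n_playouts)) / n_playouts
        if mean > best_mean:
            best_move, best_mean = move, mean
    return best_move


# Toy usage: the state is a running total, a move adds 1 or 2 to it,
# and a "playout" returns the total plus a random bonus in [0, 1).
choice = monte_carlo_evaluation(
    0,
    lambda s: [1, 2],
    lambda s, m: s + m,
    lambda s, r: s + r.random(),
    n_playouts=20,
    rng=random.Random(0),
)
```

MCTS differs by reusing playout statistics in a growing tree; the abstract reports that the flat variant was nonetheless stronger for Kingdomino.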

Finding Influential Training Samples for Gradient Boosted Decision Trees

Title Finding Influential Training Samples for Gradient Boosted Decision Trees
Authors Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke
Abstract We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency.
Tasks
Published 2018-02-19
URL http://arxiv.org/abs/1802.06640v2
PDF http://arxiv.org/pdf/1802.06640v2.pdf
PWC https://paperswithcode.com/paper/finding-influential-training-samples-for
Repo https://github.com/bsharchilev/influence_boosting
Framework tf

BAM: Bottleneck Attention Module

Title BAM: Bottleneck Attention Module
Authors Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon
Abstract Recent advances in deep neural networks have been developed via architecture search for stronger representational power. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural networks. Our module infers an attention map along two separate pathways, channel and spatial. We place our module at each bottleneck of models where the downsampling of feature maps occurs. Our module constructs a hierarchical attention at bottlenecks with a number of parameters and it is trainable in an end-to-end manner jointly with any feed-forward models. We validate our BAM through extensive experiments on CIFAR-100, ImageNet-1K, VOC 2007 and MS COCO benchmarks. Our experiments show consistent improvement in classification and detection performances with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available.
Tasks Neural Architecture Search
Published 2018-07-17
URL http://arxiv.org/abs/1807.06514v2
PDF http://arxiv.org/pdf/1807.06514v2.pdf
PWC https://paperswithcode.com/paper/bam-bottleneck-attention-module
Repo https://github.com/gan3sh500/custom-pooling
Framework pytorch
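
To make the two-pathway idea concrete, the sketch below computes a channel term and a spatial term from a single feature map, merges them into one attention map with a sigmoid, and applies it residually. It is a heavy simplification under stated assumptions: the channel pathway is a two-layer MLP on globally averaged features, the spatial pathway is reduced to a single 1x1 projection, and the way the two are combined is assumed rather than quoted from the abstract. All weight names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bottleneck_attention(feat, w1, w2, w_spatial):
    """Apply a combined channel + spatial attention map to feat.

    feat:      (C, H, W) feature map
    w1:        (R, C) first MLP layer of the channel pathway
    w2:        (C, R) second MLP layer of the channel pathway
    w_spatial: (C,)   1x1 projection used as a stand-in spatial pathway
    """
    # Channel pathway: global average pool, then a small ReLU MLP -> (C,)
    pooled = feat.mean(axis=(1, 2))
    channel = w2 @ np.maximum(w1 @ pooled, 0.0)
    # Spatial pathway: collapse channels at every location -> (H, W)
    spatial = np.tensordot(w_spatial, feat, axes=([0], [0]))
    # Broadcast both terms to (C, H, W) and squash into (0, 1).
    attention = sigmoid(channel[:, None, None] + spatial[None, :, :])
    # Residual application keeps the original signal and adds the attended one.
    return feat + feat * attention
```

The output has the same shape as the input, which is what allows a module of this kind to be dropped between existing stages of a feed-forward network.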

3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection

Title 3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection
Authors Ke Yan, Mohammadhadi Bagheri, Ronald M. Summers
Abstract Detecting lesions from computed tomography (CT) scans is an important but difficult problem because non-lesions and true lesions can appear similar. 3D context is known to be helpful in this differentiation task. However, existing end-to-end detection frameworks of convolutional neural networks (CNNs) are mostly designed for 2D images. In this paper, we propose 3D context enhanced region-based CNN (3DCE) to incorporate 3D context information efficiently by aggregating feature maps of 2D images. 3DCE is easy to train and end-to-end in training and inference. A universal lesion detector is developed to detect all kinds of lesions with one algorithm using the DeepLesion dataset. Experimental results on this challenging task prove the effectiveness of 3DCE. We have released the code of 3DCE at https://github.com/rsummers11/CADLab/tree/master/lesion_detector_3DCE.
Tasks Computed Tomography (CT)
Published 2018-06-25
URL http://arxiv.org/abs/1806.09648v2
PDF http://arxiv.org/pdf/1806.09648v2.pdf
PWC https://paperswithcode.com/paper/3d-context-enhanced-region-based
Repo https://github.com/fsafe/Capstone
Framework pytorch