Paper Group AWR 10
BCWS: Bilingual Contextual Word Similarity. Logit Pairing Methods Can Fool Gradient-Based Attacks. Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction. A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices. A Minimax Surrogate Loss Approach to Conditional Difference Estimation. Harmonious Attention Network for Person Re-Identification. Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees. Micro-Attention for Micro-Expression recognition. Variational Memory Encoder-Decoder. Meta-Learning for Semi-Supervised Few-Shot Classification. A General Approach to Adding Differential Privacy to Iterative Training Procedures. Monte Carlo Methods for the Game Kingdomino. Finding Influential Training Samples for Gradient Boosted Decision Trees. BAM: Bottleneck Attention Module. 3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection.
BCWS: Bilingual Contextual Word Similarity
Title | BCWS: Bilingual Contextual Word Similarity |
Authors | Ta-Chung Chi, Ching-Yen Shih, Yun-Nung Chen |
Abstract | This paper introduces the first dataset for evaluating English-Chinese Bilingual Contextual Word Similarity, namely BCWS (https://github.com/MiuLab/BCWS). The dataset consists of 2,091 English-Chinese word pairs with the corresponding sentential contexts and their similarity scores annotated by humans. Our annotated dataset has higher consistency than other similar datasets. We establish several baselines for the bilingual embedding task to benchmark the experiments. Modeling cross-lingual sense representations as provided in this dataset has the potential to move artificial intelligence from monolingual understanding towards multilingual understanding. |
Tasks | |
Published | 2018-10-21 |
URL | http://arxiv.org/abs/1810.08951v1 |
http://arxiv.org/pdf/1810.08951v1.pdf | |
PWC | https://paperswithcode.com/paper/bcws-bilingual-contextual-word-similarity |
Repo | https://github.com/MiuLab/BCWS |
Framework | none |
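
The evaluation protocol implied by the abstract pairs a model's contextual similarity scores with the human annotations. A minimal sketch of that loop, assuming a hypothetical `embed(word, context)` function and an already-parsed list of pairs (consult the linked repo for the dataset's actual file format):

```python
from scipy.stats import spearmanr
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(pairs, embed):
    """pairs: list of (en_word, en_ctx, zh_word, zh_ctx, human_score).
    embed(word, context) -> np.ndarray contextual embedding (hypothetical)."""
    model_scores = [cosine(embed(en, ec), embed(zh, zc))
                    for en, ec, zh, zc, _ in pairs]
    human_scores = [s for *_, s in pairs]
    rho, _ = spearmanr(model_scores, human_scores)
    return rho  # rank correlation with the human similarity judgments
```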
Logit Pairing Methods Can Fool Gradient-Based Attacks
Title | Logit Pairing Methods Can Fool Gradient-Based Attacks |
Authors | Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow |
Abstract | Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers. We show that the computationally fast methods they propose - Clean Logit Pairing (CLP) and Logit Squeezing (LSQ) - just make the gradient-based optimization problem of crafting adversarial examples harder without providing actual robustness. We find that Adversarial Logit Pairing (ALP) may indeed provide robustness against adversarial examples, especially when combined with adversarial training, and we examine it in a variety of settings. However, the increase in adversarial accuracy is much smaller than previously claimed. Finally, our results suggest that the evaluation against an iterative PGD attack relies heavily on the parameters used and may result in false conclusions regarding robustness of a model. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12042v3 |
http://arxiv.org/pdf/1810.12042v3.pdf | |
PWC | https://paperswithcode.com/paper/logit-pairing-methods-can-fool-gradient-based |
Repo | https://github.com/uds-lsv/evaluating-logit-pairing-methods |
Framework | tf |
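
For reference, the two computationally cheap regularizers the paper attacks are simple to write down. A minimal PyTorch sketch, assuming a squared-norm form of Logit Squeezing and within-batch pairing for Clean Logit Pairing (the paper's exact coefficients and pairing scheme may differ):

```python
import torch
import torch.nn.functional as F

def logit_squeezing_loss(logits, labels, lam=0.5):
    """LSQ: cross-entropy plus a squared-norm penalty on the logits."""
    return F.cross_entropy(logits, labels) + lam * logits.pow(2).sum(dim=1).mean()

def clean_logit_pairing_loss(logits, labels, lam=0.5):
    """CLP: additionally penalize the logit distance between pairs of clean
    examples drawn from the same batch (batch size assumed even)."""
    a, b = logits.chunk(2, dim=0)
    pair_term = (a - b).pow(2).sum(dim=1).mean()
    return F.cross_entropy(logits, labels) + lam * pair_term
```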
Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction
Title | Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction |
Authors | Yunzhe Tao, Lin Ma, Weizhong Zhang, Jian Liu, Wei Liu, Qiang Du |
Abstract | Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. In addition, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features at different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and oscillations of time series. |
Tasks | Time Series, Time Series Prediction |
Published | 2018-06-02 |
URL | http://arxiv.org/abs/1806.00685v1 |
http://arxiv.org/pdf/1806.00685v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-attention-based-recurrent |
Repo | https://github.com/KurochkinAlexey/Hierarchical-Attention-Based-Recurrent-Highway-Networks-for-Time-Series-Prediction |
Framework | pytorch |
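
The attention component the abstract describes can be illustrated at a single level: score the encoded exogenous features against the current decoder state and pool them into a context vector. This sketch shows one level of what HRHN applies hierarchically; the shapes and the bilinear scoring form are illustrative assumptions:

```python
import torch

def attend(decoder_state, exo_feats, W):
    """decoder_state: (B, D); exo_feats: (B, T, D); W: (D, D)."""
    scores = torch.einsum('bd,de,bte->bt', decoder_state, W, exo_feats)
    weights = torch.softmax(scores, dim=1)        # relevance of each timestep
    context = (weights.unsqueeze(-1) * exo_feats).sum(dim=1)   # (B, D)
    return context, weights
```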
A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices
Title | A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices |
Authors | Rudrasis Chakraborty, Chun-Hao Yang, Xingjian Zhen, Monami Banerjee, Derek Archer, David Vaillancourt, Vikas Singh, Baba C. Vemuri |
Abstract | In a number of disciplines, the data (e.g., graphs, manifolds) to be analyzed are non-Euclidean in nature. Geometric deep learning corresponds to techniques that generalize deep neural network models to such non-Euclidean spaces. Several recent papers have shown how convolutional neural networks (CNNs) can be extended to learn with graph-based data. In this work, we study the setting where the data (or measurements) are ordered, longitudinal or temporal in nature and live on a Riemannian manifold – this setting is common in a variety of problems in statistical machine learning, vision and medical imaging. We show how statistical recurrent network models can be defined in such spaces. We give an efficient algorithm and conduct a rigorous analysis of its statistical properties. We perform extensive numerical experiments demonstrating competitive performance with state-of-the-art methods but with significantly fewer parameters. We also show applications to a statistical analysis task in brain imaging, a regime where deep neural network models have only been utilized in limited ways. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11204v2 |
http://arxiv.org/pdf/1805.11204v2.pdf | |
PWC | https://paperswithcode.com/paper/a-statistical-recurrent-model-on-the-manifold |
Repo | https://github.com/zhenxingjian/SPD-SRU |
Framework | tf |
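
Working on the SPD manifold means replacing ordinary averages with Fréchet means. A minimal sketch of a weighted mean under the log-Euclidean metric, the kind of geometric primitive such a recurrent model composes (the paper develops a more efficient recursive estimator; this only shows the underlying idea):

```python
import numpy as np

def spd_log(X):
    """Matrix log of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    """Matrix exp of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(mats, weights):
    """Weighted Fréchet mean of SPD matrices: expm(sum_i w_i logm(X_i))."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return spd_exp(sum(wi * spd_log(X) for wi, X in zip(w, mats)))
```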
A Minimax Surrogate Loss Approach to Conditional Difference Estimation
Title | A Minimax Surrogate Loss Approach to Conditional Difference Estimation |
Authors | Siong Thye Goh, Cynthia Rudin |
Abstract | We present a new machine learning approach to estimate personalized treatment effects in the classical potential outcomes framework with binary outcomes. To overcome the problem that both treatment and control outcomes for the same unit are required for supervised learning, we propose surrogate loss functions that incorporate both treatment and control data. The new surrogates yield tighter bounds than the sum of losses for treatment and control groups. A specific choice of loss function, namely a type of hinge loss, yields a minimax support vector machine formulation. The resulting optimization problem requires the solution to only a single convex optimization problem, incorporating both treatment and control units, and it enables the kernel trick to be used to handle nonlinear (also non-parametric) estimation. We also present statistical learning bounds for the framework, along with experimental results. |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03769v2 |
http://arxiv.org/pdf/1803.03769v2.pdf | |
PWC | https://paperswithcode.com/paper/a-minimax-surrogate-loss-approach-to |
Repo | https://github.com/shangtai/githubcausalsvm |
Framework | none |
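
The paper's minimax SVM hinges on its specific surrogate loss, which is not reproduced here. As a point of contrast, the sketch below shows the naive baseline the surrogates improve on: fitting separate models to treated and control units and differencing their predicted probabilities (a T-learner). This is explicitly not the paper's method:

```python
from sklearn.svm import SVC

def t_learner_cate(X_treat, y_treat, X_ctrl, y_ctrl, X_new):
    """Naive conditional-difference baseline: two independent kernel SVMs."""
    m1 = SVC(kernel='rbf', probability=True).fit(X_treat, y_treat)
    m0 = SVC(kernel='rbf', probability=True).fit(X_ctrl, y_ctrl)
    # Estimated P(y=1 | treated) - P(y=1 | control) for new units.
    return m1.predict_proba(X_new)[:, 1] - m0.predict_proba(X_new)[:, 1]
```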
Harmonious Attention Network for Person Re-Identification
Title | Harmonious Attention Network for Person Re-Identification |
Authors | Wei Li, Xiatian Zhu, Shaogang Gong |
Abstract | Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching in arbitrarily aligned person images potentially with large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model for joint learning of soft pixel attention and hard regional attention along with simultaneous optimisation of feature representations, dedicated to optimise person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks including CUHK03, Market-1501, and DukeMTMC-ReID. |
Tasks | Person Re-Identification |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.08122v1 |
http://arxiv.org/pdf/1802.08122v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonious-attention-network-for-person-re |
Repo | https://github.com/milkplz/keras-frcnn |
Framework | tf |
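
The "soft" half of the harmonious attention described above can be sketched as a jointly learned per-pixel and per-channel mask applied multiplicatively to a feature map. The layer shapes below are illustrative, and the paper's full model additionally learns hard regional attention:

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)  # pixel scores
        self.channel = nn.Linear(channels, channels)          # channel scores

    def forward(self, x):                         # x: (B, C, H, W)
        s = torch.sigmoid(self.spatial(x))        # (B, 1, H, W) soft pixel mask
        c = torch.sigmoid(self.channel(x.mean(dim=(2, 3))))   # (B, C)
        return x * s * c[:, :, None, None]        # joint soft attention
```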
Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees
Title | Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees |
Authors | L. Elisa Celis, Lingxiao Huang, Vijay Keswani, Nisheeth K. Vishnoi |
Abstract | Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problems as constrained optimization problems, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers, and many of the aforementioned algorithms do not come with theoretical guarantees, perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that previously could not be handled. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.06055v2 |
http://arxiv.org/pdf/1806.06055v2.pdf | |
PWC | https://paperswithcode.com/paper/classification-with-fairness-constraints-a |
Repo | https://github.com/aif360-learn/aif360-learn |
Framework | tf |
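
The meta-algorithm takes fairness constraints expressed as group-level metrics. As an example of one such metric (not the paper's algorithm itself), a minimal sketch of the statistical parity ratio between two sensitive groups, which a constraint might require to stay near 1:

```python
import numpy as np

def statistical_parity_ratio(y_pred, sensitive):
    """y_pred: array of 0/1 predictions; sensitive: array of 0/1 groups."""
    r0 = y_pred[sensitive == 0].mean()   # positive rate, group 0
    r1 = y_pred[sensitive == 1].mean()   # positive rate, group 1
    return min(r0, r1) / max(r0, r1)     # 1.0 means perfect parity
```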
Micro-Attention for Micro-Expression recognition
Title | Micro-Attention for Micro-Expression recognition |
Authors | Chongyang Wang, Min Peng, Tao Bi, Tong Chen |
Abstract | Micro-expression, for its high objectivity in emotion detection, has emerged as a promising modality in affective computing. Recently, deep learning methods have been successfully introduced into the micro-expression recognition area. While higher recognition accuracy has been achieved, substantial challenges in micro-expression recognition remain. The occurrence of micro-expressions in small, local facial areas and the limited size of available databases still constrain recognition accuracy on such emotional facial behavior. In this work, to tackle these challenges, we propose a novel attention mechanism, called micro-attention, that cooperates with a residual network. Micro-attention enables the network to learn to focus on facial areas of interest covering different action units. Moreover, to cope with small datasets, micro-attention is designed to add no noticeable parameters, and a simple yet efficient transfer learning approach is used to alleviate the risk of overfitting. With extensive experimental evaluations on three benchmarks (CASMEII, SAMM and SMIC) and post-hoc feature visualizations, we demonstrate the effectiveness of the proposed micro-attention and push the boundary of automatic micro-expression recognition. |
Tasks | Transfer Learning |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02360v5 |
https://arxiv.org/pdf/1811.02360v5.pdf | |
PWC | https://paperswithcode.com/paper/micro-attention-for-micro-expression |
Repo | https://github.com/CodeShareBot/Micro-Attention-for-Micro-Expression |
Framework | none |
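
The transfer learning recipe the abstract alludes to can be sketched generically: start from a pretrained backbone, freeze the early layers, and retrain only a new classification head. The backbone choice and freezing policy below are assumptions, not the paper's exact protocol:

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_finetune_model(num_classes):
    model = resnet18(weights='IMAGENET1K_V1')   # stand-in pretrained backbone
    for p in model.parameters():
        p.requires_grad = False                 # freeze pretrained features
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh head
    return model                                # train only model.fc
```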
Variational Memory Encoder-Decoder
Title | Variational Memory Encoder-Decoder |
Authors | Hung Le, Truyen Tran, Thin Nguyen, Svetha Venkatesh |
Abstract | Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations. |
Tasks | |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09950v2 |
http://arxiv.org/pdf/1807.09950v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-memory-encoder-decoder |
Repo | https://github.com/thaihungle/VMED |
Framework | tf |
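
The core sampling step the abstract describes, with each memory read parameterizing one mode of a Gaussian mixture over the latent variable, can be sketched as follows. The parameterization details (diagonal Gaussians, categorical mode choice) are assumptions:

```python
import torch

def sample_mixture_latent(means, log_vars, mode_logits):
    """means, log_vars: (B, K, D), one Gaussian mode per memory read;
    mode_logits: (B, K) mixture weights."""
    B, K, D = means.shape
    k = torch.distributions.Categorical(logits=mode_logits).sample()  # (B,)
    idx = k[:, None, None].expand(B, 1, D)
    mu = means.gather(1, idx).squeeze(1)                   # chosen mode mean
    std = (0.5 * log_vars.gather(1, idx).squeeze(1)).exp()
    return mu + std * torch.randn_like(std)                # reparameterized
```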
Meta-Learning for Semi-Supervised Few-Shot Classification
Title | Meta-Learning for Semi-Supervised Few-Shot Classification |
Authors | Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel |
Abstract | In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained on episodes representing different classification problems, each with a small labeled training set and its corresponding test set. In this work, we advance this few-shot classification paradigm towards a scenario where unlabeled examples are also available within each episode. We consider two situations: one where all unlabeled examples are assumed to belong to the same set of classes as the labeled examples of the episode, as well as the more challenging situation where examples from other distractor classes are also provided. To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes. These models are trained in an end-to-end way on episodes, to learn to leverage the unlabeled examples successfully. We evaluate these methods on versions of the Omniglot and miniImageNet benchmarks, adapted to this new framework augmented with unlabeled examples. We also propose a new split of ImageNet, consisting of a large set of classes, with a hierarchical structure. Our experiments confirm that our Prototypical Networks can learn to improve their predictions due to unlabeled examples, much like a semi-supervised algorithm would. |
Tasks | Meta-Learning, Omniglot |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00676v1 |
http://arxiv.org/pdf/1803.00676v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-semi-supervised-few-shot |
Repo | https://github.com/y2l/meta-transfer-learning-pytorch |
Framework | pytorch |
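
One of the paper's proposed extensions refines each class prototype with unlabeled examples via a soft k-means step. A minimal single-iteration sketch, assuming embeddings are precomputed and using plain Euclidean distances:

```python
import torch

def refine_prototypes(protos, counts, unlabeled):
    """protos: (N, D) per-class means of labeled support embeddings;
    counts: (N,) float tensor of labeled examples per class;
    unlabeled: (M, D) embeddings of the episode's unlabeled examples."""
    d = torch.cdist(unlabeled, protos)            # (M, N) distances
    w = torch.softmax(-d, dim=1)                  # soft class assignments
    new_sum = counts[:, None] * protos + w.t() @ unlabeled
    new_cnt = counts + w.sum(dim=0)
    return new_sum / new_cnt[:, None]             # refined prototypes
```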
A General Approach to Adding Differential Privacy to Iterative Training Procedures
Title | A General Approach to Adding Differential Privacy to Iterative Training Procedures |
Authors | H. Brendan McMahan, Galen Andrew, Ulfar Erlingsson, Steve Chien, Ilya Mironov, Nicolas Papernot, Peter Kairouz |
Abstract | In this work we address the practical challenges of training machine learning models on privacy-sensitive datasets by introducing a modular approach that minimizes changes to training algorithms, provides a variety of configuration strategies for the privacy mechanism, and then isolates and simplifies the critical logic that computes the final privacy guarantees. A key challenge is that training algorithms often require estimating many different quantities (vectors) from the same set of examples — for example, gradients of different layers in a deep learning architecture, as well as metrics and batch normalization parameters. Each of these may have different properties like dimensionality, magnitude, and tolerance to noise. By extending previous work on the Moments Accountant for the subsampled Gaussian mechanism, we can provide privacy for such heterogeneous sets of vectors, while also structuring the approach to minimize software engineering challenges. |
Tasks | |
Published | 2018-12-15 |
URL | http://arxiv.org/abs/1812.06210v2 |
http://arxiv.org/pdf/1812.06210v2.pdf | |
PWC | https://paperswithcode.com/paper/a-general-approach-to-adding-differential |
Repo | https://github.com/facebookresearch/pytorch-dp |
Framework | pytorch |
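
The framework is built around steps like the clipped, noised gradient aggregate of the subsampled Gaussian mechanism. A minimal numpy sketch of that single step (the paper's contribution is the modular configuration and accounting around it):

```python
import numpy as np

def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """per_example_grads: (n, d) array with one gradient per example."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)   # bounded sum
    noise = rng.normal(0.0, noise_multiplier * clip_norm, clipped_sum.shape)
    return (clipped_sum + noise) / len(per_example_grads)

# usage: dp_average_gradient(g, clip_norm=1.0, noise_multiplier=1.1,
#                            rng=np.random.default_rng(0))
```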
Monte Carlo Methods for the Game Kingdomino
Title | Monte Carlo Methods for the Game Kingdomino |
Authors | Magnus Gedda, Mikael Z. Lagerkvist, Martin Butler |
Abstract | Kingdomino is introduced as an interesting game for studying game playing: the game is multiplayer (4 independent players per game); it has a limited game depth (13 moves per player); and it has limited but not insignificant interaction among players. Several strategies based on locally greedy players, Monte Carlo Evaluation (MCE), and Monte Carlo Tree Search (MCTS) are presented with variants. We examine a variation of UCT called progressive win bias and a playout policy (Player-greedy) focused on selecting good moves for the player. A thorough evaluation is done showing how the strategies perform and how to choose parameters given specific time constraints. The evaluation shows that, surprisingly, MCE is stronger than MCTS for a game like Kingdomino. All experiments use a cloud-native design, with a game server in a Docker container, and agents communicating using a REST-style JSON protocol. This enables a multi-language approach to separating the game state, the strategy implementations, and the coordination layer. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04458v2 |
http://arxiv.org/pdf/1807.04458v2.pdf | |
PWC | https://paperswithcode.com/paper/monte-carlo-methods-for-the-game-kingdomino |
Repo | https://github.com/mgedda/kdom-ai |
Framework | none |
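
Monte Carlo Evaluation, the surprisingly strong strategy here, is easy to state game-agnostically: for every legal move, average the outcomes of uniformly random playouts and play the best-scoring move. The game interface below is hypothetical:

```python
import random

def mce_choose_move(state, playouts_per_move=50):
    """state: hypothetical interface with legal_moves(), apply(move) -> state,
    is_terminal(), and score() giving the evaluating player's final score."""
    def playout(s):
        while not s.is_terminal():
            s = s.apply(random.choice(s.legal_moves()))
        return s.score()
    return max(state.legal_moves(),
               key=lambda m: sum(playout(state.apply(m))
                                 for _ in range(playouts_per_move)))
```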
Finding Influential Training Samples for Gradient Boosted Decision Trees
Title | Finding Influential Training Samples for Gradient Boosted Decision Trees |
Authors | Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke |
Abstract | We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency. |
Tasks | |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06640v2 |
http://arxiv.org/pdf/1802.06640v2.pdf | |
PWC | https://paperswithcode.com/paper/finding-influential-training-samples-for |
Repo | https://github.com/bsharchilev/influence_boosting |
Framework | tf |
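
The fixed-tree-structure assumption is what makes leave-one-out tractable: only leaf values change, not splits. A minimal sketch for a single regression tree whose leaf value is the mean of its training targets, where the shift from removing one sample has a closed form:

```python
def loo_leaf_shift(leaf_targets, i):
    """leaf_targets: targets of training samples in sample i's leaf;
    i: index of the removed sample within that list.
    Returns the change in the leaf's prediction if sample i is dropped."""
    n = len(leaf_targets)
    full = sum(leaf_targets) / n
    without = (sum(leaf_targets) - leaf_targets[i]) / (n - 1)
    return without - full   # equals (full - y_i) / (n - 1)
```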
BAM: Bottleneck Attention Module
Title | BAM: Bottleneck Attention Module |
Authors | Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon |
Abstract | Recent advances in deep neural networks have been developed via architecture search for stronger representational power. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural network. Our module infers an attention map along two separate pathways, channel and spatial. We place our module at each bottleneck of models where the downsampling of feature maps occurs. Our module constructs hierarchical attention at bottlenecks with a small number of additional parameters and is trainable in an end-to-end manner jointly with any feed-forward model. We validate our BAM through extensive experiments on CIFAR-100, ImageNet-1K, VOC 2007 and MS COCO benchmarks. Our experiments show consistent improvement in classification and detection performances with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available. |
Tasks | Neural Architecture Search |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06514v2 |
http://arxiv.org/pdf/1807.06514v2.pdf | |
PWC | https://paperswithcode.com/paper/bam-bottleneck-attention-module |
Repo | https://github.com/gan3sh500/custom-pooling |
Framework | pytorch |
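
The two branches the abstract names combine as an additive attention map applied residually. A compact PyTorch sketch following that description, with the reduction ratio, dilation value, and omission of batch norm as simplifying assumptions:

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    def __init__(self, c, r=16, dilation=4):
        super().__init__()
        self.channel = nn.Sequential(               # global pool -> MLP
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
        self.spatial = nn.Sequential(               # reduce -> dilated conv
            nn.Conv2d(c, c // r, 1),
            nn.Conv2d(c // r, c // r, 3, padding=dilation, dilation=dilation),
            nn.ReLU(),
            nn.Conv2d(c // r, 1, 1))

    def forward(self, x):                           # x: (B, C, H, W)
        mc = self.channel(x)[:, :, None, None]      # (B, C, 1, 1)
        ms = self.spatial(x)                        # (B, 1, H, W)
        att = torch.sigmoid(mc + ms)                # broadcast to (B, C, H, W)
        return x + x * att                          # residual attention
```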
3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection
Title | 3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection |
Authors | Ke Yan, Mohammadhadi Bagheri, Ronald M. Summers |
Abstract | Detecting lesions from computed tomography (CT) scans is an important but difficult problem because non-lesions and true lesions can appear similar. 3D context is known to be helpful in this differentiation task. However, existing end-to-end detection frameworks of convolutional neural networks (CNNs) are mostly designed for 2D images. In this paper, we propose 3D context enhanced region-based CNN (3DCE) to incorporate 3D context information efficiently by aggregating feature maps of 2D images. 3DCE is easy to train and end-to-end in training and inference. A universal lesion detector is developed to detect all kinds of lesions in one algorithm using the DeepLesion dataset. Experimental results on this challenging task prove the effectiveness of 3DCE. We have released the code of 3DCE in https://github.com/rsummers11/CADLab/tree/master/lesion_detector_3DCE. |
Tasks | Computed Tomography (CT) |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09648v2 |
http://arxiv.org/pdf/1806.09648v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-context-enhanced-region-based |
Repo | https://github.com/fsafe/Capstone |
Framework | pytorch |
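
The aggregation idea at the heart of 3DCE: run a shared 2D backbone over neighboring CT slices and concatenate the per-slice feature maps along the channel axis, so a 2D detection head sees 3D context. Backbone and group size below are illustrative:

```python
import torch.nn as nn

def context_enhanced_features(slices, backbone):
    """slices: (B, S, 1, H, W) tensor, S neighboring CT slices per sample;
    backbone: any 2D CNN applied with shared weights to every slice."""
    B, S, C, H, W = slices.shape
    feats = backbone(slices.reshape(B * S, C, H, W))   # shared 2D weights
    _, Cf, Hf, Wf = feats.shape
    return feats.reshape(B, S * Cf, Hf, Wf)            # channel-wise 3D context

# e.g. backbone = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
```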