Paper Group AWR 10
BCWS: Bilingual Contextual Word Similarity. Logit Pairing Methods Can Fool Gradient-Based Attacks. Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction. A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices. A Minimax Surrogate Loss Approach to Conditional Difference Estimation. Harmonious Attention Network for Person Re-Identification. Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees. Micro-Attention for Micro-Expression recognition. Variational Memory Encoder-Decoder. Meta-Learning for Semi-Supervised Few-Shot Classification. A General Approach to Adding Differential Privacy to Iterative Training Procedures. Monte Carlo Methods for the Game Kingdomino. Finding Influential Training Samples for Gradient Boosted Decision Trees. BAM: Bottleneck Attention Module. 3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection.
BCWS: Bilingual Contextual Word Similarity
Title | BCWS: Bilingual Contextual Word Similarity |
Authors | Ta-Chung Chi, Ching-Yen Shih, Yun-Nung Chen |
Abstract | This paper introduces the first dataset for evaluating English-Chinese Bilingual Contextual Word Similarity, namely BCWS (https://github.com/MiuLab/BCWS). The dataset consists of 2,091 English-Chinese word pairs with the corresponding sentential contexts and their similarity scores annotated by humans. Our annotated dataset has higher consistency than other similar datasets. We establish several baselines for the bilingual embedding task to benchmark the experiments. Modeling cross-lingual sense representations as provided in this dataset has the potential to move artificial intelligence from monolingual understanding towards multilingual understanding. |
Tasks | |
Published | 2018-10-21 |
URL | http://arxiv.org/abs/1810.08951v1 |
http://arxiv.org/pdf/1810.08951v1.pdf | |
PWC | https://paperswithcode.com/paper/bcws-bilingual-contextual-word-similarity |
Repo | https://github.com/MiuLab/BCWS |
Framework | none |
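
The evaluation protocol implied by the abstract pairs a model's contextual similarity scores with the human annotations. A minimal sketch of that loop, assuming a hypothetical `embed(word, context)` function and an already-parsed list of pairs (consult the linked repo for the dataset's actual file format):

```python
from scipy.stats import spearmanr
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(pairs, embed):
    """pairs: list of (en_word, en_ctx, zh_word, zh_ctx, human_score).
    embed(word, context) -> np.ndarray contextual embedding (hypothetical)."""
    model_scores = [cosine(embed(en, ec), embed(zh, zc))
                    for en, ec, zh, zc, _ in pairs]
    human_scores = [s for *_, s in pairs]
    rho, _ = spearmanr(model_scores, human_scores)
    return rho  # rank correlation with the human similarity judgments
```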
Logit Pairing Methods Can Fool Gradient-Based Attacks
Title | Logit Pairing Methods Can Fool Gradient-Based Attacks |
Authors | Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow |
Abstract | Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers. We show that the computationally fast methods they propose - Clean Logit Pairing (CLP) and Logit Squeezing (LSQ) - just make the gradient-based optimization problem of crafting adversarial examples harder without providing actual robustness. We find that Adversarial Logit Pairing (ALP) may indeed provide robustness against adversarial examples, especially when combined with adversarial training, and we examine it in a variety of settings. However, the increase in adversarial accuracy is much smaller than previously claimed. Finally, our results suggest that the evaluation against an iterative PGD attack relies heavily on the parameters used and may result in false conclusions regarding robustness of a model. |
Tasks | |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12042v3 |
http://arxiv.org/pdf/1810.12042v3.pdf | |
PWC | https://paperswithcode.com/paper/logit-pairing-methods-can-fool-gradient-based |
Repo | https://github.com/uds-lsv/evaluating-logit-pairing-methods |
Framework | tf |
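
For reference, the two computationally cheap regularizers the paper attacks are simple to write down. A minimal PyTorch sketch, assuming a squared-norm form of Logit Squeezing and within-batch pairing for Clean Logit Pairing (the paper's exact coefficients and pairing scheme may differ):

```python
import torch
import torch.nn.functional as F

def logit_squeezing_loss(logits, labels, lam=0.5):
    """LSQ: cross-entropy plus a squared-norm penalty on the logits."""
    return F.cross_entropy(logits, labels) + lam * logits.pow(2).sum(dim=1).mean()

def clean_logit_pairing_loss(logits, labels, lam=0.5):
    """CLP: additionally penalize the logit distance between pairs of clean
    examples drawn from the same batch (batch size assumed even)."""
    a, b = logits.chunk(2, dim=0)
    pair_term = (a - b).pow(2).sum(dim=1).mean()
    return F.cross_entropy(logits, labels) + lam * pair_term
```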
Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction
Title | Hierarchical Attention-Based Recurrent Highway Networks for Time Series Prediction |
Authors | Yunzhe Tao, Lin Ma, Weizhong Zhang, Jian Liu, Wei Liu, Qiang Du |
Abstract | Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. In addition, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features at different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and oscillations of time series. |
Tasks | Time Series, Time Series Prediction |
Published | 2018-06-02 |
URL | http://arxiv.org/abs/1806.00685v1 |
http://arxiv.org/pdf/1806.00685v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-attention-based-recurrent |
Repo | https://github.com/KurochkinAlexey/Hierarchical-Attention-Based-Recurrent-Highway-Networks-for-Time-Series-Prediction |
Framework | pytorch |
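
The attention component the abstract describes can be illustrated at a single level: score the encoded exogenous features against the current decoder state and pool them into a context vector. This sketch shows one level of what HRHN applies hierarchically; the shapes and the bilinear scoring form are illustrative assumptions:

```python
import torch

def attend(decoder_state, exo_feats, W):
    """decoder_state: (B, D); exo_feats: (B, T, D); W: (D, D)."""
    scores = torch.einsum('bd,de,bte->bt', decoder_state, W, exo_feats)
    weights = torch.softmax(scores, dim=1)        # relevance of each timestep
    context = (weights.unsqueeze(-1) * exo_feats).sum(dim=1)   # (B, D)
    return context, weights
```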
A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices
Title | A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices |
Authors | Rudrasis Chakraborty, Chun-Hao Yang, Xingjian Zhen, Monami Banerjee, Derek Archer, David Vaillancourt, Vikas Singh, Baba C. Vemuri |
Abstract | In a number of disciplines, the data (e.g., graphs, manifolds) to be analyzed are non-Euclidean in nature. Geometric deep learning corresponds to techniques that generalize deep neural network models to such non-Euclidean spaces. Several recent papers have shown how convolutional neural networks (CNNs) can be extended to learn with graph-based data. In this work, we study the setting where the data (or measurements) are ordered, longitudinal or temporal in nature and live on a Riemannian manifold – this setting is common in a variety of problems in statistical machine learning, vision and medical imaging. We show how statistical recurrent network models can be defined in such spaces. We give an efficient algorithm and conduct a rigorous analysis of its statistical properties. We perform extensive numerical experiments demonstrating competitive performance with state-of-the-art methods but with significantly fewer parameters. We also show applications to a statistical analysis task in brain imaging, a regime where deep neural network models have only been utilized in limited ways. |
Tasks | |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11204v2 |
http://arxiv.org/pdf/1805.11204v2.pdf | |
PWC | https://paperswithcode.com/paper/a-statistical-recurrent-model-on-the-manifold |
Repo | https://github.com/zhenxingjian/SPD-SRU |
Framework | tf |
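
Working on the SPD manifold means replacing ordinary averages with Fréchet means. A minimal sketch of a weighted mean under the log-Euclidean metric, the kind of geometric primitive such a recurrent model composes (the paper develops a more efficient recursive estimator; this only shows the underlying idea):

```python
import numpy as np

def spd_log(X):
    """Matrix log of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    """Matrix exp of a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_euclidean_mean(mats, weights):
    """Weighted Fréchet mean of SPD matrices: expm(sum_i w_i logm(X_i))."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return spd_exp(sum(wi * spd_log(X) for wi, X in zip(w, mats)))
```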
A Minimax Surrogate Loss Approach to Conditional Difference Estimation
Title | A Minimax Surrogate Loss Approach to Conditional Difference Estimation |
Authors | Siong Thye Goh, Cynthia Rudin |
Abstract | We present a new machine learning approach to estimate personalized treatment effects in the classical potential outcomes framework with binary outcomes. To overcome the problem that both treatment and control outcomes for the same unit are required for supervised learning, we propose surrogate loss functions that incorporate both treatment and control data. The new surrogates yield tighter bounds than the sum of losses for treatment and control groups. A specific choice of loss function, namely a type of hinge loss, yields a minimax support vector machine formulation. The resulting optimization problem requires the solution to only a single convex optimization problem, incorporating both treatment and control units, and it enables the kernel trick to be used to handle nonlinear (also non-parametric) estimation. We also present statistical learning bounds for the framework, along with experimental results. |
Tasks | |
Published | 2018-03-10 |
URL | http://arxiv.org/abs/1803.03769v2 |
http://arxiv.org/pdf/1803.03769v2.pdf | |
PWC | https://paperswithcode.com/paper/a-minimax-surrogate-loss-approach-to |
Repo | https://github.com/shangtai/githubcausalsvm |
Framework | none |
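
The paper's minimax SVM hinges on its specific surrogate loss, which is not reproduced here. As a point of contrast, the sketch below shows the naive baseline the surrogates improve on: fitting separate models to treated and control units and differencing their predicted probabilities (a T-learner). This is explicitly not the paper's method:

```python
from sklearn.svm import SVC

def t_learner_cate(X_treat, y_treat, X_ctrl, y_ctrl, X_new):
    """Naive conditional-difference baseline: two independent kernel SVMs."""
    m1 = SVC(kernel='rbf', probability=True).fit(X_treat, y_treat)
    m0 = SVC(kernel='rbf', probability=True).fit(X_ctrl, y_ctrl)
    # Estimated P(y=1 | treated) - P(y=1 | control) for new units.
    return m1.predict_proba(X_new)[:, 1] - m0.predict_proba(X_new)[:, 1]
```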
Harmonious Attention Network for Person Re-Identification
Title | Harmonious Attention Network for Person Re-Identification |
Authors | Wei Li, Xiatian Zhu, Shaogang Gong |
Abstract | Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching in arbitrarily aligned person images potentially with large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model for joint learning of soft pixel attention and hard regional attention along with simultaneous optimisation of feature representations, dedicated to optimise person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks including CUHK03, Market-1501, and DukeMTMC-ReID. |
Tasks | Person Re-Identification |
Published | 2018-02-22 |
URL | http://arxiv.org/abs/1802.08122v1 |
http://arxiv.org/pdf/1802.08122v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonious-attention-network-for-person-re |
Repo | https://github.com/milkplz/keras-frcnn |
Framework | tf |
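
The "soft" half of the harmonious attention described above can be sketched as a jointly learned per-pixel and per-channel mask applied multiplicatively to a feature map. The layer shapes below are illustrative, and the paper's full model additionally learns hard regional attention:

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)  # pixel scores
        self.channel = nn.Linear(channels, channels)          # channel scores

    def forward(self, x):                         # x: (B, C, H, W)
        s = torch.sigmoid(self.spatial(x))        # (B, 1, H, W) soft pixel mask
        c = torch.sigmoid(self.channel(x.mean(dim=(2, 3))))   # (B, C)
        return x * s * c[:, :, None, None]        # joint soft attention
```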
Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees
Title | Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees |
Authors | L. Elisa Celis, Lingxiao Huang, Vijay Keswani, Nisheeth K. Vishnoi |
Abstract | Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problems as constrained optimization problems, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers, and many of the aforementioned algorithms do not come with theoretical guarantees, perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that previously could not be handled. |
Tasks | |
Published | 2018-06-15 |
URL | http://arxiv.org/abs/1806.06055v2 |
http://arxiv.org/pdf/1806.06055v2.pdf | |
PWC | https://paperswithcode.com/paper/classification-with-fairness-constraints-a |
Repo | https://github.com/aif360-learn/aif360-learn |
Framework | tf |
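
The meta-algorithm takes fairness constraints expressed as group-level metrics. As an example of one such metric (not the paper's algorithm itself), a minimal sketch of the statistical parity ratio between two sensitive groups, which a constraint might require to stay near 1:

```python
import numpy as np

def statistical_parity_ratio(y_pred, sensitive):
    """y_pred: array of 0/1 predictions; sensitive: array of 0/1 groups."""
    r0 = y_pred[sensitive == 0].mean()   # positive rate, group 0
    r1 = y_pred[sensitive == 1].mean()   # positive rate, group 1
    return min(r0, r1) / max(r0, r1)     # 1.0 means perfect parity
```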
Micro-Attention for Micro-Expression recognition
Title | Micro-Attention for Micro-Expression recognition |
Authors | Chongyang Wang, Min Peng, Tao Bi, Tong Chen |
Abstract | Micro-expression, for its high objectivity in emotion detection, has emerged as a promising modality in affective computing. Recently, deep learning methods have been successfully introduced into the micro-expression recognition area. While higher recognition accuracy has been achieved, substantial challenges in micro-expression recognition remain. The occurrence of micro-expressions in small, local facial areas and the limited size of available databases still constrain recognition accuracy on such emotional facial behavior. In this work, to tackle these challenges, we propose a novel attention mechanism, called micro-attention, that cooperates with a residual network. Micro-attention enables the network to learn to focus on facial areas of interest covering different action units. Moreover, to cope with small datasets, micro-attention is designed to add no noticeable parameters, and a simple yet efficient transfer learning approach is used to alleviate the risk of overfitting. With extensive experimental evaluations on three benchmarks (CASMEII, SAMM and SMIC) and post-hoc feature visualizations, we demonstrate the effectiveness of the proposed micro-attention and push the boundary of automatic micro-expression recognition. |
Tasks | Transfer Learning |
Published | 2018-11-06 |
URL | https://arxiv.org/abs/1811.02360v5 |
https://arxiv.org/pdf/1811.02360v5.pdf | |
PWC | https://paperswithcode.com/paper/micro-attention-for-micro-expression |
Repo | https://github.com/CodeShareBot/Micro-Attention-for-Micro-Expression |
Framework | none |
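
The transfer learning recipe the abstract alludes to can be sketched generically: start from a pretrained backbone, freeze the early layers, and retrain only a new classification head. The backbone choice and freezing policy below are assumptions, not the paper's exact protocol:

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_finetune_model(num_classes):
    model = resnet18(weights='IMAGENET1K_V1')   # stand-in pretrained backbone
    for p in model.parameters():
        p.requires_grad = False                 # freeze pretrained features
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh head
    return model                                # train only model.fc
```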
Variational Memory Encoder-Decoder
Title | Variational Memory Encoder-Decoder |
Authors | Hung Le, Truyen Tran, Thin Nguyen, Svetha Venkatesh |
Abstract | Introducing variability while maintaining coherence is a core task in learning to generate utterances in conversation. Standard neural encoder-decoder models and their extensions using conditional variational autoencoder often result in either trivial or digressive responses. To overcome this, we explore a novel approach that injects variability into neural encoder-decoder via the use of external memory as a mixture model, namely Variational Memory Encoder-Decoder (VMED). By associating each memory read with a mode in the latent mixture distribution at each timestep, our model can capture the variability observed in sequential data such as natural conversations. We empirically compare the proposed model against other recent approaches on various conversational datasets. The results show that VMED consistently achieves significant improvement over others in both metric-based and qualitative evaluations. |
Tasks | |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09950v2 |
http://arxiv.org/pdf/1807.09950v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-memory-encoder-decoder |
Repo | https://github.com/thaihungle/VMED |
Framework | tf |
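
The core sampling step the abstract describes, with each memory read parameterizing one mode of a Gaussian mixture over the latent variable, can be sketched as follows. The parameterization details (diagonal Gaussians, categorical mode choice) are assumptions:

```python
import torch

def sample_mixture_latent(means, log_vars, mode_logits):
    """means, log_vars: (B, K, D), one Gaussian mode per memory read;
    mode_logits: (B, K) mixture weights."""
    B, K, D = means.shape
    k = torch.distributions.Categorical(logits=mode_logits).sample()  # (B,)
    idx = k[:, None, None].expand(B, 1, D)
    mu = means.gather(1, idx).squeeze(1)                   # chosen mode mean
    std = (0.5 * log_vars.gather(1, idx).squeeze(1)).exp()
    return mu + std * torch.randn_like(std)                # reparameterized
```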
Meta-Learning for Semi-Supervised Few-Shot Classification
Title | Meta-Learning for Semi-Supervised Few-Shot Classification |
Authors | Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Joshua B. Tenenbaum, Hugo Larochelle, Richard S. Zemel |
Abstract | In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Recent progress in few-shot classification has featured meta-learning, in which a parameterized model for a learning algorithm is defined and trained on episodes representing different classification problems, each with a small labeled training set and its corresponding test set. In this work, we advance this few-shot classification paradigm towards a scenario where unlabeled examples are also available within each episode. We consider two situations: one where all unlabeled examples are assumed to belong to the same set of classes as the labeled examples of the episode, as well as the more challenging situation where examples from other distractor classes are also provided. To address this paradigm, we propose novel extensions of Prototypical Networks (Snell et al., 2017) that are augmented with the ability to use unlabeled examples when producing prototypes. These models are trained in an end-to-end way on episodes, to learn to leverage the unlabeled examples successfully. We evaluate these methods on versions of the Omniglot and miniImageNet benchmarks, adapted to this new framework augmented with unlabeled examples. We also propose a new split of ImageNet, consisting of a large set of classes, with a hierarchical structure. Our experiments confirm that our Prototypical Networks can learn to improve their predictions due to unlabeled examples, much like a semi-supervised algorithm would. |
Tasks | Meta-Learning, Omniglot |
Published | 2018-03-02 |
URL | http://arxiv.org/abs/1803.00676v1 |
http://arxiv.org/pdf/1803.00676v1.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-semi-supervised-few-shot |
Repo | https://github.com/y2l/meta-transfer-learning-pytorch |
Framework | pytorch |
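
One of the paper's proposed extensions refines each class prototype with unlabeled examples via a soft k-means step. A minimal single-iteration sketch, assuming embeddings are precomputed and using plain Euclidean distances:

```python
import torch

def refine_prototypes(protos, counts, unlabeled):
    """protos: (N, D) per-class means of labeled support embeddings;
    counts: (N,) float tensor of labeled examples per class;
    unlabeled: (M, D) embeddings of the episode's unlabeled examples."""
    d = torch.cdist(unlabeled, protos)            # (M, N) distances
    w = torch.softmax(-d, dim=1)                  # soft class assignments
    new_sum = counts[:, None] * protos + w.t() @ unlabeled
    new_cnt = counts + w.sum(dim=0)
    return new_sum / new_cnt[:, None]             # refined prototypes
```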
A General Approach to Adding Differential Privacy to Iterative Training Procedures
Title | A General Approach to Adding Differential Privacy to Iterative Training Procedures |
Authors | H. Brendan McMahan, Galen Andrew, Ulfar Erlingsson, Steve Chien, Ilya Mironov, Nicolas Papernot, Peter Kairouz |
Abstract | In this work we address the practical challenges of training machine learning models on privacy-sensitive datasets by introducing a modular approach that minimizes changes to training algorithms, provides a variety of configuration strategies for the privacy mechanism, and then isolates and simplifies the critical logic that computes the final privacy guarantees. A key challenge is that training algorithms often require estimating many different quantities (vectors) from the same set of examples — for example, gradients of different layers in a deep learning architecture, as well as metrics and batch normalization parameters. Each of these may have different properties like dimensionality, magnitude, and tolerance to noise. By extending previous work on the Moments Accountant for the subsampled Gaussian mechanism, we can provide privacy for such heterogeneous sets of vectors, while also structuring the approach to minimize software engineering challenges. |
Tasks | |
Published | 2018-12-15 |
URL | http://arxiv.org/abs/1812.06210v2 |
http://arxiv.org/pdf/1812.06210v2.pdf | |
PWC | https://paperswithcode.com/paper/a-general-approach-to-adding-differential |
Repo | https://github.com/facebookresearch/pytorch-dp |
Framework | pytorch |
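
The framework is built around steps like the clipped, noised gradient aggregate of the subsampled Gaussian mechanism. A minimal numpy sketch of that single step (the paper's contribution is the modular configuration and accounting around it):

```python
import numpy as np

def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """per_example_grads: (n, d) array with one gradient per example."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * scale).sum(axis=0)   # bounded sum
    noise = rng.normal(0.0, noise_multiplier * clip_norm, clipped_sum.shape)
    return (clipped_sum + noise) / len(per_example_grads)

# usage: dp_average_gradient(g, clip_norm=1.0, noise_multiplier=1.1,
#                            rng=np.random.default_rng(0))
```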
Monte Carlo Methods for the Game Kingdomino
Title | Monte Carlo Methods for the Game Kingdomino |
Authors | Magnus Gedda, Mikael Z. Lagerkvist, Martin Butler |
Abstract | Kingdomino is introduced as an interesting game for studying game playing: the game is multiplayer (4 independent players per game); it has a limited game depth (13 moves per player); and it has limited but not insignificant interaction among players. Several strategies based on locally greedy players, Monte Carlo Evaluation (MCE), and Monte Carlo Tree Search (MCTS) are presented with variants. We examine a variation of UCT called progressive win bias and a playout policy (Player-greedy) focused on selecting good moves for the player. A thorough evaluation is done showing how the strategies perform and how to choose parameters given specific time constraints. The evaluation shows that, surprisingly, MCE is stronger than MCTS for a game like Kingdomino. All experiments use a cloud-native design, with a game server in a Docker container, and agents communicating using a REST-style JSON protocol. This enables a multi-language approach to separating the game state, the strategy implementations, and the coordination layer. |
Tasks | |
Published | 2018-07-12 |
URL | http://arxiv.org/abs/1807.04458v2 |
http://arxiv.org/pdf/1807.04458v2.pdf | |
PWC | https://paperswithcode.com/paper/monte-carlo-methods-for-the-game-kingdomino |
Repo | https://github.com/mgedda/kdom-ai |
Framework | none |
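
Monte Carlo Evaluation, the surprisingly strong strategy here, is easy to state game-agnostically: for every legal move, average the outcomes of uniformly random playouts and play the best-scoring move. The game interface below is hypothetical:

```python
import random

def mce_choose_move(state, playouts_per_move=50):
    """state: hypothetical interface with legal_moves(), apply(move) -> state,
    is_terminal(), and score() giving the evaluating player's final score."""
    def playout(s):
        while not s.is_terminal():
            s = s.apply(random.choice(s.legal_moves()))
        return s.score()
    return max(state.legal_moves(),
               key=lambda m: sum(playout(state.apply(m))
                                 for _ in range(playouts_per_move)))
```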
Finding Influential Training Samples for Gradient Boosted Decision Trees
Title | Finding Influential Training Samples for Gradient Boosted Decision Trees |
Authors | Boris Sharchilev, Yury Ustinovsky, Pavel Serdyukov, Maarten de Rijke |
Abstract | We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency. |
Tasks | |
Published | 2018-02-19 |
URL | http://arxiv.org/abs/1802.06640v2 |
http://arxiv.org/pdf/1802.06640v2.pdf | |
PWC | https://paperswithcode.com/paper/finding-influential-training-samples-for |
Repo | https://github.com/bsharchilev/influence_boosting |
Framework | tf |
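
The fixed-tree-structure assumption is what makes leave-one-out tractable: only leaf values change, not splits. A minimal sketch for a single regression tree whose leaf value is the mean of its training targets, where the shift from removing one sample has a closed form:

```python
def loo_leaf_shift(leaf_targets, i):
    """leaf_targets: targets of training samples in sample i's leaf;
    i: index of the removed sample within that list.
    Returns the change in the leaf's prediction if sample i is dropped."""
    n = len(leaf_targets)
    full = sum(leaf_targets) / n
    without = (sum(leaf_targets) - leaf_targets[i]) / (n - 1)
    return without - full   # equals (full - y_i) / (n - 1)
```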
BAM: Bottleneck Attention Module
Title | BAM: Bottleneck Attention Module |
Authors | Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon |
Abstract | Recent advances in deep neural networks have been developed via architecture search for stronger representational power. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural network. Our module infers an attention map along two separate pathways, channel and spatial. We place our module at each bottleneck of models where the downsampling of feature maps occurs. Our module constructs hierarchical attention at bottlenecks with a small number of additional parameters and is trainable in an end-to-end manner jointly with any feed-forward model. We validate our BAM through extensive experiments on CIFAR-100, ImageNet-1K, VOC 2007 and MS COCO benchmarks. Our experiments show consistent improvement in classification and detection performances with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available. |
Tasks | Neural Architecture Search |
Published | 2018-07-17 |
URL | http://arxiv.org/abs/1807.06514v2 |
http://arxiv.org/pdf/1807.06514v2.pdf | |
PWC | https://paperswithcode.com/paper/bam-bottleneck-attention-module |
Repo | https://github.com/gan3sh500/custom-pooling |
Framework | pytorch |
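
The two branches the abstract names combine as an additive attention map applied residually. A compact PyTorch sketch following that description, with the reduction ratio, dilation value, and omission of batch norm as simplifying assumptions:

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    def __init__(self, c, r=16, dilation=4):
        super().__init__()
        self.channel = nn.Sequential(               # global pool -> MLP
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))
        self.spatial = nn.Sequential(               # reduce -> dilated conv
            nn.Conv2d(c, c // r, 1),
            nn.Conv2d(c // r, c // r, 3, padding=dilation, dilation=dilation),
            nn.ReLU(),
            nn.Conv2d(c // r, 1, 1))

    def forward(self, x):                           # x: (B, C, H, W)
        mc = self.channel(x)[:, :, None, None]      # (B, C, 1, 1)
        ms = self.spatial(x)                        # (B, 1, H, W)
        att = torch.sigmoid(mc + ms)                # broadcast to (B, C, H, W)
        return x + x * att                          # residual attention
```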
3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection
Title | 3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection |
Authors | Ke Yan, Mohammadhadi Bagheri, Ronald M. Summers |
Abstract | Detecting lesions from computed tomography (CT) scans is an important but difficult problem because non-lesions and true lesions can appear similar. 3D context is known to be helpful in this differentiation task. However, existing end-to-end detection frameworks of convolutional neural networks (CNNs) are mostly designed for 2D images. In this paper, we propose 3D context enhanced region-based CNN (3DCE) to incorporate 3D context information efficiently by aggregating feature maps of 2D images. 3DCE is easy to train and end-to-end in training and inference. A universal lesion detector is developed to detect all kinds of lesions in one algorithm using the DeepLesion dataset. Experimental results on this challenging task prove the effectiveness of 3DCE. We have released the code of 3DCE in https://github.com/rsummers11/CADLab/tree/master/lesion_detector_3DCE. |
Tasks | Computed Tomography (CT) |
Published | 2018-06-25 |
URL | http://arxiv.org/abs/1806.09648v2 |
http://arxiv.org/pdf/1806.09648v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-context-enhanced-region-based |
Repo | https://github.com/fsafe/Capstone |
Framework | pytorch |
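
The aggregation idea at the heart of 3DCE: run a shared 2D backbone over neighboring CT slices and concatenate the per-slice feature maps along the channel axis, so a 2D detection head sees 3D context. Backbone and group size below are illustrative:

```python
import torch.nn as nn

def context_enhanced_features(slices, backbone):
    """slices: (B, S, 1, H, W) tensor, S neighboring CT slices per sample;
    backbone: any 2D CNN applied with shared weights to every slice."""
    B, S, C, H, W = slices.shape
    feats = backbone(slices.reshape(B * S, C, H, W))   # shared 2D weights
    _, Cf, Hf, Wf = feats.shape
    return feats.reshape(B, S * Cf, Hf, Wf)            # channel-wise 3D context

# e.g. backbone = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
```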