October 21, 2019

Paper Group AWR 33

A comparison of methods for model selection when estimating individual treatment effects

Title A comparison of methods for model selection when estimating individual treatment effects
Authors Alejandro Schuler, Michael Baiocchi, Robert Tibshirani, Nigam Shah
Abstract Practitioners in medicine, business, political science, and other fields are increasingly aware that decisions should be personalized to each patient, customer, or voter. A given treatment (e.g. a drug or advertisement) should be administered only to those who will respond most positively, and certainly not to those who will be harmed by it. Individual-level treatment effects can be estimated with tools adapted from machine learning, but different models can yield contradictory estimates. Unlike risk prediction models, however, treatment effect models cannot be easily evaluated against each other using a held-out test set because the true treatment effect itself is never directly observed. Besides outcome prediction accuracy, several metrics that can leverage held-out data to evaluate treatment effects models have been proposed, but they are not widely used. We provide a didactic framework that elucidates the relationships between the different approaches and compare them all using a variety of simulations of both randomized and observational data. Our results show that researchers estimating heterogenous treatment effects need not limit themselves to a single model-fitting algorithm. Instead of relying on a single method, multiple models fit by a diverse set of algorithms should be evaluated against each other using an objective function learned from the validation set. The model minimizing that objective should be used for estimating the individual treatment effect for future individuals.
Tasks Model Selection
Published 2018-04-14
URL http://arxiv.org/abs/1804.05146v2
PDF http://arxiv.org/pdf/1804.05146v2.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-methods-for-model-selection
Repo https://github.com/tonyduan/hte-prediction-rcts
Framework none

AllenNLP: A Deep Semantic Natural Language Processing Platform

Title AllenNLP: A Deep Semantic Natural Language Processing Platform
Authors Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer
Abstract This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding. AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily. It is built on top of PyTorch, allowing for dynamic computation graphs, and provides (1) a flexible data API that handles intelligent batching and padding, (2) high-level abstractions for common operations in working with text, and (3) a modular and extensible experiment framework that makes doing good science easy. It also includes reference implementations of high quality approaches for both core semantic problems (e.g. semantic role labeling (Palmer et al., 2005)) and language understanding applications (e.g. machine comprehension (Rajpurkar et al., 2016)). AllenNLP is an ongoing open-source effort maintained by engineers and researchers at the Allen Institute for Artificial Intelligence.
Tasks Reading Comprehension, Semantic Role Labeling
Published 2018-03-20
URL http://arxiv.org/abs/1803.07640v2
PDF http://arxiv.org/pdf/1803.07640v2.pdf
PWC https://paperswithcode.com/paper/allennlp-a-deep-semantic-natural-language
Repo https://github.com/allenai/allennlp
Framework pytorch

Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality

Title Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
Authors Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E. Houle, James Bailey
Abstract Deep Neural Networks (DNNs) have recently been shown to be vulnerable against adversarial examples, which are carefully crafted instances that can mislead DNNs to make errors during prediction. To better understand such attacks, a characterization is needed of the properties of regions (the so-called ‘adversarial subspaces’) in which adversarial examples lie. We tackle this challenge by characterizing the dimensional properties of adversarial regions, via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We first provide explanations about how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof-of-concept, we show that a potential application of LID is to distinguish adversarial examples, and the preliminary results show that it can outperform several state-of-the-art detection measures by large margins for five attack strategies considered in this paper across three benchmark datasets. Our analysis of the LID characteristic for adversarial regions not only motivates new directions of effective adversarial defense, but also opens up more challenges for developing new attacks to better understand the vulnerabilities of DNNs.
Tasks Adversarial Defense
Published 2018-01-08
URL http://arxiv.org/abs/1801.02613v3
PDF http://arxiv.org/pdf/1801.02613v3.pdf
PWC https://paperswithcode.com/paper/characterizing-adversarial-subspaces-using
Repo https://github.com/xingjunm/lid_adversarial_subspace_detection
Framework tf
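
The abstract above describes LID as a quantity derived from the distances between a reference example and its neighbors. As a rough illustration, the sketch below uses the maximum-likelihood estimator that is commonly used for LID; the abstract does not state which estimator the authors use, so the formula, the function name `lid_mle`, and the parameter `k` are assumptions made for this example.

```python
import math


def lid_mle(distances, k):
    """Estimate LID from the k smallest positive neighbor distances.

    Assumed estimator (not stated in the abstract):
        LID = -1 / mean(log(r_i / r_max)) over the k nearest neighbors,
    where r_max is the largest of those k distances.
    """
    nearest = sorted(d for d in distances if d > 0)[:k]
    if len(nearest) < 2:
        raise ValueError("need at least two positive distances")
    r_max = nearest[-1]
    mean_log_ratio = sum(math.log(r / r_max) for r in nearest) / len(nearest)
    if mean_log_ratio == 0:
        # All neighbors are equally far away; the estimate diverges.
        return float("inf")
    return -1.0 / mean_log_ratio
```

Under this estimator, neighbor distances that grow like the square root of the neighbor rank (as they would for points spread over a plane) give an estimate close to 2, while distances that crowd near the maximum give a larger value.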

Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

Title Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior
Authors Zi Wang, Beomjoon Kim, Leslie Pack Kaelbling
Abstract Bayesian optimization usually assumes that a Bayesian prior is given. However, the strong theoretical guarantees in Bayesian optimization are often regrettably compromised in practice because of unknown parameters in the prior. In this paper, we adopt a variant of empirical Bayes and show that, by estimating the Gaussian process prior from offline data sampled from the same prior and constructing unbiased estimators of the posterior, variants of both GP-UCB and probability of improvement achieve a near-zero regret bound, which decreases to a constant proportional to the observational noise as the number of offline data and the number of online evaluations increase. Empirically, we have verified our approach on challenging simulated robotic problems featuring task and motion planning.
Tasks Motion Planning
Published 2018-11-23
URL http://arxiv.org/abs/1811.09558v1
PDF http://arxiv.org/pdf/1811.09558v1.pdf
PWC https://paperswithcode.com/paper/regret-bounds-for-meta-bayesian-optimization
Repo https://github.com/beomjoonkim/MetaLearnBO
Framework none

Simplicial Closure and higher-order link prediction

Title Simplicial Closure and higher-order link prediction
Authors Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, Jon Kleinberg
Abstract Networks provide a powerful formalism for modeling complex systems by using a model of pairwise interactions. But much of the structure within these systems involves interactions that take place among more than two nodes at once; for example, communication within a group rather than person-to-person, collaboration among a team rather than a pair of coauthors, or biological interaction between a set of molecules rather than just two. Such higher-order interactions are ubiquitous, but their empirical study has received limited attention, and little is known about possible organizational principles of such structures. Here we study the temporal evolution of 19 datasets with explicit accounting for higher-order interactions. We show that there is a rich variety of structure in our datasets but datasets from the same system types have consistent patterns of higher-order structure. Furthermore, we find that tie strength and edge density are competing positive indicators of higher-order organization, and these trends are consistent across interactions involving differing numbers of nodes. To systematically further the study of theories for such higher-order structures, we propose higher-order link prediction as a benchmark problem to assess models and algorithms that predict higher-order structure. We find fundamental differences from traditional pairwise link prediction, with a greater role for local rather than long-range information in predicting the appearance of new interactions.
Tasks Link Prediction
Published 2018-02-20
URL http://arxiv.org/abs/1802.06916v2
PDF http://arxiv.org/pdf/1802.06916v2.pdf
PWC https://paperswithcode.com/paper/simplicial-closure-and-higher-order-link
Repo https://github.com/ZerixU/CSNS
Framework none
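
To make the idea of higher-order structure more concrete, the sketch below separates "closed" triangles (three nodes that appear together in at least one interaction) from "open" triangles (three nodes that are pairwise connected but never appear together). This open/closed terminology and the function name `open_and_closed_triangles` are assumptions for illustration; the abstract itself does not define them.

```python
from itertools import combinations


def open_and_closed_triangles(simplices):
    """Split node triples into open and closed triangles.

    `simplices` is an iterable of node collections, each one interaction.
    Nodes must be sortable (for example, integers or strings).
    """
    edges = set()
    closed = set()
    for simplex in simplices:
        nodes = sorted(set(simplex))
        edges.update(combinations(nodes, 2))   # pairwise projection
        closed.update(combinations(nodes, 3))  # triples seen together

    all_nodes = sorted({v for edge in edges for v in edge})
    open_triangles = set()
    for triple in combinations(all_nodes, 3):
        if triple in closed:
            continue
        if all(pair in edges for pair in combinations(triple, 2)):
            open_triangles.add(triple)
    return open_triangles, closed
```

In this framing, a higher-order link prediction task would ask which open triangles are likely to become closed later. The brute-force loop over all triples is only suitable for small examples.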

Transfer learning for time series classification

Title Transfer learning for time series classification
Authors Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, Pierre-Alain Muller
Abstract Transfer learning for deep neural networks is the process of first training a base network on a source dataset, and then transferring the learned features (the network’s weights) to a second network to be trained on a target dataset. This idea has been shown to improve deep neural networks’ generalization capabilities in many computer vision tasks such as image recognition and object localization. Apart from these applications, deep Convolutional Neural Networks (CNNs) have also recently gained popularity in the Time Series Classification (TSC) community. However, unlike for image recognition problems, transfer learning techniques have not yet been investigated thoroughly for the TSC task. This is surprising as the accuracy of deep learning models for TSC could potentially be improved if the model is fine-tuned from a pre-trained neural network instead of training it from scratch. In this paper, we fill this gap by investigating how to transfer deep CNNs for the TSC task. To evaluate the potential of transfer learning, we performed extensive experiments using the UCR archive which is the largest publicly available TSC benchmark containing 85 datasets. For each dataset in the archive, we pre-trained a model and then fine-tuned it on the other datasets resulting in 7140 different deep neural networks. These experiments revealed that transfer learning can improve or degrade the model’s predictions depending on the dataset used for transfer. Therefore, in an effort to predict the best source dataset for a given target dataset, we propose a new method relying on Dynamic Time Warping to measure inter-dataset similarities. We describe how our method can guide the transfer to choose the best source dataset leading to an improvement in accuracy on 71 out of 85 datasets.
Tasks Object Localization, Time Series, Time Series Classification, Transfer Learning
Published 2018-11-05
URL http://arxiv.org/abs/1811.01533v1
PDF http://arxiv.org/pdf/1811.01533v1.pdf
PWC https://paperswithcode.com/paper/transfer-learning-for-time-series
Repo https://github.com/hfawaz/bigdata18
Framework tf
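
The abstract above relies on Dynamic Time Warping (DTW) to compare datasets. The sketch below shows the classic DTW distance between two individual series, using absolute difference as the local cost. It is only the basic building block: how the authors aggregate series into a dataset-level similarity is not described in the abstract, and the function name `dtw_distance` is chosen here for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two numeric series.

    Local cost is |a[i] - b[j]|; each step may advance one series or both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may repeat points, a series and a time-stretched copy of it (for example `[1, 2, 3]` and `[1, 2, 2, 3]`) are at distance zero, which is the property that makes DTW attractive for comparing series of different lengths.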

Im2Avatar: Colorful 3D Reconstruction from a Single Image

Title Im2Avatar: Colorful 3D Reconstruction from a Single Image
Authors Yongbin Sun, Ziwei Liu, Yue Wang, Sanjay E. Sarma
Abstract Existing works on single-image 3D reconstruction mainly focus on shape recovery. In this work, we study a new problem, that is, simultaneously recovering 3D shape and surface color from a single image, namely “colorful 3D reconstruction”. This problem is both challenging and intriguing because the ability to infer a textured 3D model from a single image is at the core of visual understanding. Here, we propose an end-to-end trainable framework, Colorful Voxel Network (CVN), to tackle this problem. Conditioned on a single 2D input, CVN learns to decompose shape and surface color information of a 3D object into a 3D shape branch and a surface color branch, respectively. Specifically, for the shape recovery, we generate a shape volume with the state of its voxels indicating occupancy. For the surface color recovery, we combine the strength of appearance hallucination and geometric projection by concurrently learning a regressed color volume and a 2D-to-3D flow volume, which are then fused into a blended color volume. The final textured 3D model is obtained by sampling color from the blended color volume at the positions of occupied voxels in the shape volume. To handle the severely sparse volume representations, a novel loss function, Mean Squared False Cross-Entropy Loss (MSFCEL), is designed. Extensive experiments demonstrate that our approach achieves significant improvement over baselines, and shows great generalization across diverse object categories and arbitrary viewpoints.
Tasks 3D Reconstruction
Published 2018-04-17
URL http://arxiv.org/abs/1804.06375v1
PDF http://arxiv.org/pdf/1804.06375v1.pdf
PWC https://paperswithcode.com/paper/im2avatar-colorful-3d-reconstruction-from-a
Repo https://github.com/syb7573330/im2avatar
Framework tf

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

Title Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
Authors Bei Liu, Jianlong Fu, Makoto P. Kato, Masatoshi Yoshikawa
Abstract Automatic generation of natural language from images has attracted extensive attention. In this paper, we take one step further to investigate generation of poetic language (with multiple lines) to an image for automatic poetry creation. This task involves multiple challenges, including discovering poetic clues from the image (e.g., hope from green), and generating poems to satisfy both relevance to the image and poeticness in language level. To solve the above challenges, we formulate the task of poem generation into two correlated sub-tasks by multi-adversarial training via policy gradient, through which the cross-modal relevance and poetic language style can be ensured. To extract poetic clues from images, we propose to learn a deep coupled visual-poetic embedding, in which the poetic representation from objects, sentiments and scenes in an image can be jointly learned. Two discriminative networks are further introduced to guide the poem generation, including a multi-modal discriminator and a poem-style discriminator. To facilitate the research, we have released two poem datasets by human annotators with two distinct properties: 1) the first human annotated image-to-poem pair dataset (with 8,292 pairs in total), and 2) to-date the largest public English poem corpus dataset (with 92,265 different poems in total). Extensive experiments are conducted with 8K images, among which 1.5K images are randomly picked for evaluation. Both objective and subjective evaluations show the superior performances against the state-of-the-art methods for poem generation from images. A Turing test carried out with over 500 human subjects, among which 30 evaluators are poetry experts, demonstrates the effectiveness of our approach.
Tasks
Published 2018-04-23
URL http://arxiv.org/abs/1804.08473v4
PDF http://arxiv.org/pdf/1804.08473v4.pdf
PWC https://paperswithcode.com/paper/beyond-narrative-description-generating
Repo https://github.com/forrestbing/chinese-poetry-generation
Framework none

Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

Title Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift
Authors Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang
Abstract This paper first answers the question “why do the two most powerful techniques Dropout and Batch Normalization (BN) often lead to a worse performance when they are combined together?” in both theoretical and statistical aspects. Theoretically, we find that Dropout would shift the variance of a specific neural unit when we transfer the state of that network from train to test. However, BN would maintain its statistical variance, which is accumulated from the entire learning procedure, in the test phase. The inconsistency of that variance (we name this scheme as “variance shift”) causes the unstable numerical behavior in inference that leads to more erroneous predictions finally, when applying Dropout before BN. Thorough experiments on DenseNet, ResNet, ResNeXt and Wide ResNet confirm our findings. According to the uncovered mechanism, we next explore several strategies that modify Dropout and try to overcome the limitations of their combination by avoiding the variance shift risks.
Tasks
Published 2018-01-16
URL http://arxiv.org/abs/1801.05134v1
PDF http://arxiv.org/pdf/1801.05134v1.pdf
PWC https://paperswithcode.com/paper/understanding-the-disharmony-between-dropout
Repo https://github.com/m090009/Behavioral_Cloning
Framework tf
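
The "variance shift" described above can be illustrated with a small exact calculation. The sketch below assumes inverted Dropout (each activation is kept with probability `keep_prob` and rescaled by `1 / keep_prob` during training, and left untouched at inference); this formulation and the function names are assumptions for the example, not details given in the abstract.

```python
def train_time_variance(activations, keep_prob):
    """Exact variance of mask * x / keep_prob for x drawn uniformly from
    `activations` and mask ~ Bernoulli(keep_prob), independent of x.

    E[mask * x / p] = E[x] and E[(mask * x / p)^2] = E[x^2] / p.
    """
    n = len(activations)
    mean = sum(activations) / n
    second_moment = sum(x * x for x in activations) / n
    return second_moment / keep_prob - mean * mean


def inference_variance(activations):
    """Variance of the same unit at inference, where Dropout is a no-op."""
    n = len(activations)
    mean = sum(activations) / n
    second_moment = sum(x * x for x in activations) / n
    return second_moment - mean * mean
```

For a zero-mean, unit-variance unit and `keep_prob = 0.5`, the training-time variance is 2 while the inference-time variance is 1. A BN layer placed after Dropout would accumulate statistics around the first value and then receive inputs with the second, which is the mismatch the abstract refers to.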

SMILER: Saliency Model Implementation Library for Experimental Research

Title SMILER: Saliency Model Implementation Library for Experimental Research
Authors Calden Wloka, Toni Kunić, Iuliia Kotseruba, Ramin Fahimi, Nicholas Frosst, Neil D. B. Bruce, John K. Tsotsos
Abstract The Saliency Model Implementation Library for Experimental Research (SMILER) is a new software package which provides an open, standardized, and extensible framework for maintaining and executing computational saliency models. This work drastically reduces the human effort required to apply saliency algorithms to new tasks and datasets, while also ensuring consistency and procedural correctness for results and conclusions produced by different parties. At its launch SMILER already includes twenty-three saliency models (fourteen models based in MATLAB and nine supported through containerization), and the open design of SMILER encourages this number to grow with future contributions from the community. The project may be downloaded and contributed to through its GitHub page: https://github.com/tsotsoslab/smiler
Tasks
Published 2018-12-20
URL http://arxiv.org/abs/1812.08848v1
PDF http://arxiv.org/pdf/1812.08848v1.pdf
PWC https://paperswithcode.com/paper/smiler-saliency-model-implementation-library
Repo https://github.com/tsotsoslab/smiler
Framework none

Learning Sequence Encoders for Temporal Knowledge Graph Completion

Title Learning Sequence Encoders for Temporal Knowledge Graph Completion
Authors Alberto García-Durán, Sebastijan Dumančić, Mathias Niepert
Abstract Research on link prediction in knowledge graphs has mainly focused on static multi-relational data. In this work we consider temporal knowledge graphs where relations between entities may only hold for a time interval or a specific point in time. In line with previous work on static knowledge graphs, we propose to address this problem by learning latent entity and relation type representations. To incorporate temporal information, we utilize recurrent neural networks to learn time-aware representations of relation types which can be used in conjunction with existing latent factorization methods. The proposed approach is shown to be robust to common challenges in real-world KGs: the sparsity and heterogeneity of temporal expressions. Experiments show the benefits of our approach on four temporal KGs. The data sets are available under a permissive BSD-3 license.
Tasks Knowledge Graph Completion, Knowledge Graphs, Link Prediction
Published 2018-09-10
URL http://arxiv.org/abs/1809.03202v1
PDF http://arxiv.org/pdf/1809.03202v1.pdf
PWC https://paperswithcode.com/paper/learning-sequence-encoders-for-temporal
Repo https://github.com/nle-ml/mmkb
Framework none

Generating Multi-Agent Trajectories using Programmatic Weak Supervision

Title Generating Multi-Agent Trajectories using Programmatic Weak Supervision
Authors Eric Zhan, Stephan Zheng, Yisong Yue, Long Sha, Patrick Lucey
Abstract We study the problem of training sequential generative models for capturing coordinated multi-agent trajectory behavior, such as offensive basketball gameplay. When modeling such settings, it is often beneficial to design hierarchical models that can capture long-term coordination using intermediate variables. Furthermore, these intermediate variables should capture interesting high-level behavioral semantics in an interpretable and manipulatable way. We present a hierarchical framework that can effectively learn such sequential generative models. Our approach is inspired by recent work on leveraging programmatically produced weak labels, which we extend to the spatiotemporal regime. In addition to synthetic settings, we show how to instantiate our framework to effectively model complex interactions between basketball players and generate realistic multi-agent trajectories of basketball gameplay over long time periods. We validate our approach using both quantitative and qualitative evaluations, including a user study comparison conducted with professional sports analysts.
Tasks Imitation Learning
Published 2018-03-20
URL http://arxiv.org/abs/1803.07612v6
PDF http://arxiv.org/pdf/1803.07612v6.pdf
PWC https://paperswithcode.com/paper/generating-multi-agent-trajectories-using
Repo https://github.com/ezhan94/multiagent-programmatic-supervision
Framework pytorch

Human Motion Analysis with Deep Metric Learning

Title Human Motion Analysis with Deep Metric Learning
Authors Huseyin Coskun, David Joseph Tan, Sailesh Conjeti, Nassir Navab, Federico Tombari
Abstract Effectively measuring the similarity between two human motions is necessary for several computer vision tasks such as gait analysis, person identification and action retrieval. Nevertheless, we believe that traditional approaches such as L2 distance or Dynamic Time Warping based on hand-crafted local pose metrics fail to appropriately capture the semantic relationship across motions and, as such, are not suitable for being employed as metrics within these tasks. This work addresses this limitation by means of a triplet-based deep metric learning specifically tailored to deal with human motion data, in particular with the problem of varying input size and computationally expensive hard negative mining due to motion pair alignment. Specifically, we propose (1) a novel metric learning objective based on a triplet architecture and Maximum Mean Discrepancy; as well as, (2) a novel deep architecture based on attentive recurrent neural networks. One benefit of our objective function is that it enforces a better separation within the learned embedding space of the different motion categories by means of the associated distribution moments. At the same time, our attentive recurrent neural network allows processing varying input sizes to a fixed size of embedding while learning to focus on those motion parts that are semantically distinctive. Our experiments on two different datasets demonstrate significant improvements over conventional human motion metrics.
Tasks Metric Learning
Published 2018-07-30
URL http://arxiv.org/abs/1807.11176v2
PDF http://arxiv.org/pdf/1807.11176v2.pdf
PWC https://paperswithcode.com/paper/human-motion-analysis-with-deep-metric
Repo https://github.com/xrenaa/Human-Motion-Analysis-with-Deep-Metric-Learning
Framework pytorch
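
For readers unfamiliar with triplet-based metric learning, the sketch below shows the standard triplet hinge loss on fixed-size embeddings. It is not the objective proposed in the paper, which additionally involves Maximum Mean Discrepancy; the function name `triplet_loss`, the Euclidean distance, and the default margin are assumptions for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss on embedding vectors.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)  # anchor to same-category sample
    d_neg = math.dist(anchor, negative)  # anchor to other-category sample
    return max(0.0, d_pos - d_neg + margin)
```

A triplet whose negative is already far away contributes nothing to training, which is why such methods typically search for "hard" negatives, the costly step the abstract mentions.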

Self-Attentive Sequential Recommendation

Title Self-Attentive Sequential Recommendation
Authors Wang-Cheng Kang, Julian McAuley
Abstract Sequential dynamics are a key feature of many modern recommender systems, which seek to capture the ‘context’ of users' activities on the basis of actions they have performed recently. To capture such patterns, two approaches have proliferated: Markov Chains (MCs) and Recurrent Neural Networks (RNNs). Markov Chains assume that a user's next action can be predicted on the basis of just their last (or last few) actions, while RNNs in principle allow for longer-term semantics to be uncovered. Generally speaking, MC-based methods perform best in extremely sparse datasets, where model parsimony is critical, while RNNs perform better in denser datasets where higher model complexity is affordable. The goal of our work is to balance these two goals, by proposing a self-attention based sequential model (SASRec) that allows us to capture long-term semantics (like an RNN), but, using an attention mechanism, makes its predictions based on relatively few actions (like an MC). At each time step, SASRec seeks to identify which items are ‘relevant’ from a user’s action history, and use them to predict the next item. Extensive empirical studies show that our method outperforms various state-of-the-art sequential models (including MC/CNN/RNN-based approaches) on both sparse and dense datasets. Moreover, the model is an order of magnitude more efficient than comparable CNN/RNN-based models. Visualizations on attention weights also show how our model adaptively handles datasets with various density, and uncovers meaningful patterns in activity sequences.
Tasks Recommendation Systems
Published 2018-08-20
URL http://arxiv.org/abs/1808.09781v1
PDF http://arxiv.org/pdf/1808.09781v1.pdf
PWC https://paperswithcode.com/paper/180809781
Repo https://github.com/kang205/SASRec
Framework tf
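
The abstract above describes attention over a user's action history at each time step. The sketch below is a deliberately simplified single-head version: it applies scaled dot-product attention directly to item embeddings, with each position attending only to itself and earlier positions. The absence of learned projections, the causal restriction, and the function name `causal_self_attention` are simplifying assumptions and are not taken from the abstract.

```python
import math


def causal_self_attention(embeddings):
    """Single-head scaled dot-product self-attention with a causal mask.

    `embeddings` is a list of equal-length vectors, one per action, in
    chronological order. Returns (outputs, weights), where weights[t]
    holds the attention over positions 0..t.
    """
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for t, query in enumerate(embeddings):
        history = embeddings[: t + 1]  # no access to future actions
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in history
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * vec[j] for w, vec in zip(weights, history)) for j in range(dim)]
        )
        all_weights.append(weights)
    return outputs, all_weights
```

Inspecting `weights[t]` shows which earlier actions dominate the representation at step `t`, which is the kind of attention-weight visualization the abstract refers to.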

DKN: Deep Knowledge-Aware Network for News Recommendation

Title DKN: Deep Knowledge-Aware Network for News Recommendation
Authors Hongwei Wang, Fuzheng Zhang, Xing Xie, Minyi Guo
Abstract Online news recommender systems aim to address the information explosion of news and make personalized recommendation for users. In general, news language is highly condensed, full of knowledge entities and common sense. However, existing methods are unaware of such external knowledge and cannot fully discover latent knowledge-level connections among news. The recommended results for a user are consequently limited to simple patterns and cannot be extended reasonably. Moreover, news recommendation also faces the challenges of high time-sensitivity of news and dynamic diversity of users’ interests. To solve the above problems, in this paper, we propose a deep knowledge-aware network (DKN) that incorporates knowledge graph representation into news recommendation. DKN is a content-based deep recommendation framework for click-through rate prediction. The key component of DKN is a multi-channel and word-entity-aligned knowledge-aware convolutional neural network (KCNN) that fuses semantic-level and knowledge-level representations of news. KCNN treats words and entities as multiple channels, and explicitly keeps their alignment relationship during convolution. In addition, to address users’ diverse interests, we also design an attention module in DKN to dynamically aggregate a user’s history with respect to current candidate news. Through extensive experiments on a real online news platform, we demonstrate that DKN achieves substantial gains over state-of-the-art deep recommendation models. We also validate the efficacy of the usage of knowledge in DKN.
Tasks Click-Through Rate Prediction, Common Sense Reasoning, Recommendation Systems
Published 2018-01-25
URL http://arxiv.org/abs/1801.08284v2
PDF http://arxiv.org/pdf/1801.08284v2.pdf
PWC https://paperswithcode.com/paper/dkn-deep-knowledge-aware-network-for-news
Repo https://github.com/hwwang55/DKN
Framework tf