Paper Group AWR 389
Orthogonal Statistical Learning. Learning to Transfer: Unsupervised Meta Domain Translation. WONDER: Weighted one-shot distributed ridge regression in high dimensions. TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning. Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network. Overcoming Data Limitation in Medical Visual Question Answering. KitcheNette: Predicting and Recommending Food Ingredient Pairings using Siamese Neural Networks. Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation. Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks. Physiological and Affective Computing through Thermal Imaging: A Survey. PST900: RGB-Thermal Calibration, Dataset and Segmentation Network. Region Proposal by Guided Anchoring. Chirality Nets for Human Pose Regression. Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams. Joint Parsing and Generation for Abstractive Summarization.
Orthogonal Statistical Learning
Title | Orthogonal Statistical Learning |
Authors | Dylan J. Foster, Vasilis Syrgkanis |
Abstract | We provide excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target model depends on an unknown model that must be estimated from data (a “nuisance model”). We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target model and one for the nuisance model. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from the statistical learning and machine learning literature to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class. We characterize conditions on the metric entropy such that oracle rates—rates of the same order as if we knew the nuisance model—are achieved. We also analyze the rates achieved by specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation, and sparse high-dimensional linear model estimation. We highlight the applicability of our results in four settings of central importance in the literature: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data. |
Tasks | Domain Adaptation |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.09036v2 |
PDF | http://arxiv.org/pdf/1901.09036v2.pdf |
PWC | https://paperswithcode.com/paper/orthogonal-statistical-learning |
Repo | https://github.com/Microsoft/EconML |
Framework | none |
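The two-stage sample-splitting meta-algorithm described in the abstract is straightforward to instantiate. Below is a minimal sketch (not the EconML implementation) for a partially linear treatment-effect setup: the nuisance regressions E[Y|X] and E[T|X] are fit on one fold, and the target (effect) model is fit on the held-out fold with a Neyman-orthogonal residual-on-residual loss. The data-generating process, estimator choices, and the linear effect model are illustrative assumptions.

```python
# Minimal sketch of the two-stage sample-splitting meta-algorithm (assumed toy
# setup, not the EconML implementation): stage 1 fits nuisance regressions on
# one fold, stage 2 fits the target effect model on the held-out fold.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
T = X[:, 0] + rng.normal(size=n)                # treatment with confounding
tau = 1.0 + 0.5 * X[:, 1]                       # heterogeneous effect
Y = tau * T + np.sin(X[:, 0]) + rng.normal(size=n)

# Stage 1: nuisance models E[Y|X] and E[T|X] on the first fold.
idx1, idx2 = train_test_split(np.arange(n), test_size=0.5, random_state=0)
q_hat = GradientBoostingRegressor().fit(X[idx1], Y[idx1])   # E[Y|X]
p_hat = GradientBoostingRegressor().fit(X[idx1], T[idx1])   # E[T|X]

# Stage 2: target model on the second fold, using a Neyman-orthogonal
# (residual-on-residual) loss, so nuisance errors enter only at second order.
Y_res = Y[idx2] - q_hat.predict(X[idx2])
T_res = T[idx2] - p_hat.predict(X[idx2])
# Model tau(x) as linear in (1, x_2): Y_res ~ (theta0 + theta1 * x_2) * T_res.
Z = np.column_stack([np.ones(len(idx2)), X[idx2, 1]])
target = LinearRegression(fit_intercept=False).fit(Z * T_res[:, None], Y_res)
print("estimated effect coefficients:", target.coef_)   # roughly [1.0, 0.5]
```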
Learning to Transfer: Unsupervised Meta Domain Translation
Title | Learning to Transfer: Unsupervised Meta Domain Translation |
Authors | Jianxin Lin, Yijun Wang, Tianyu He, Zhibo Chen |
Abstract | Unsupervised domain translation has recently achieved impressive performance with Generative Adversarial Networks (GANs) and sufficient (unpaired) training data. However, existing domain translation frameworks are built in a disposable way: learning experiences are discarded, and the obtained model cannot be adapted to a newly arriving domain. In this work, we approach unsupervised domain translation problems from a meta-learning perspective. We propose a model called Meta-Translation GAN (MT-GAN) to find a good initialization for translation models. In the meta-training procedure, MT-GAN is explicitly trained with a primary translation task and a synthesized dual translation task. A cycle-consistency meta-optimization objective is designed to ensure generalization ability. We demonstrate the effectiveness of our model on ten diverse two-domain translation tasks and multiple face identity translation tasks. Our proposed approach significantly outperforms existing domain translation methods when each domain contains no more than ten training samples. |
Tasks | Meta-Learning |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00181v3 |
PDF | https://arxiv.org/pdf/1906.00181v3.pdf |
PWC | https://paperswithcode.com/paper/190600181 |
Repo | https://github.com/linjx-ustc1106/MT-GAN-PyTorch |
Framework | pytorch |
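As a rough illustration of learning a good initialization for translation models, the sketch below runs a Reptile-style outer loop over synthetic translation tasks with a cycle-consistency inner objective. It is a toy stand-in, not MT-GAN: the adversarial losses, real image domains, and the paper's exact meta-optimization are omitted, and the MLP translators and random linear-shift tasks are assumptions.

```python
# Drastically simplified sketch of meta-learning an initialization for domain
# translators (toy Reptile-style outer loop with a cycle-consistency inner
# loss; the adversarial terms and the exact MT-GAN procedure are omitted).
import copy
import torch
import torch.nn as nn

def make_translator(dim=8):
    return nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, dim))

G_ab, G_ba = make_translator(), make_translator()   # meta-initializations

def sample_task(dim=8):
    # Hypothetical "domain pair": each task is a random shift from A to B.
    shift = torch.randn(dim)
    x_a = torch.randn(64, dim)
    return x_a, x_a + shift

meta_lr, inner_lr, inner_steps = 0.1, 1e-2, 5
for meta_step in range(100):
    f_ab, f_ba = copy.deepcopy(G_ab), copy.deepcopy(G_ba)
    opt = torch.optim.Adam(list(f_ab.parameters()) + list(f_ba.parameters()),
                           lr=inner_lr)
    x_a, x_b = sample_task()
    for _ in range(inner_steps):
        # Cycle-consistency objective: A -> B -> A and B -> A -> B.
        loss = ((f_ba(f_ab(x_a)) - x_a) ** 2).mean() \
             + ((f_ab(f_ba(x_b)) - x_b) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    # Reptile outer update: move the initialization toward the adapted weights.
    with torch.no_grad():
        meta_params = list(G_ab.parameters()) + list(G_ba.parameters())
        task_params = list(f_ab.parameters()) + list(f_ba.parameters())
        for meta_p, task_p in zip(meta_params, task_params):
            meta_p += meta_lr * (task_p - meta_p)
```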
WONDER: Weighted one-shot distributed ridge regression in high dimensions
Title | WONDER: Weighted one-shot distributed ridge regression in high dimensions |
Authors | Edgar Dobriban, Yue Sheng |
Abstract | In many areas, practitioners need to analyze large datasets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental and highly important problem in this area: How to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning with several optimality properties, and is thus important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, we discover several new phenomena. 1. Infinite-worker limit: The distributed estimator works well for very large numbers of machines, a phenomenon we call the “infinite-worker limit”. 2. Optimal weights: The optimal weights for combining local estimators sum to more than unity, due to the downward bias of ridge regression; thus, all averaging methods are suboptimal. We also propose a new Weighted ONe-shot DistributEd Ridge regression (WONDER) algorithm. We test WONDER in simulation studies and on the Million Song Dataset as an example, where it saves at least 100x in computation time while nearly preserving test accuracy. |
Tasks | |
Published | 2019-03-22 |
URL | https://arxiv.org/abs/1903.09321v2 |
PDF | https://arxiv.org/pdf/1903.09321v2.pdf |
PWC | https://paperswithcode.com/paper/one-shot-distributed-ridge-regression-in-high |
Repo | https://github.com/dobriban/dist_ridge |
Framework | none |
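The weighted one-shot idea is easy to prototype: each machine solves ridge on its shard, and a single weighted combination of the local estimators is formed afterwards. The sketch below uses validation-based least-squares weights instead of the paper's asymptotically optimal weights; the random-effects data and regularization strength are assumptions.

```python
# Minimal numpy sketch of one-shot distributed ridge regression with a
# weighted combination of local estimators (toy weights chosen by validation
# error rather than the paper's optimal formula).
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 6000, 200, 6                  # samples, dimension, machines
beta = rng.normal(size=p) / np.sqrt(p)  # small random effects
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Each machine computes a local ridge estimate on its shard.
shards = np.array_split(np.arange(n), k)
local = np.stack([ridge(X[idx], y[idx], lam=50.0) for idx in shards])

# One-shot combination: scalar weights fitted on a small validation set
# (allowed to sum to more than 1, compensating ridge's downward bias).
X_val = rng.normal(size=(500, p))
y_val = X_val @ beta + rng.normal(size=500)
preds = local @ X_val.T                              # (k, n_val) local predictions
w, *_ = np.linalg.lstsq(preds.T, y_val, rcond=None)  # least-squares weights
beta_dist = w @ local
print("weights sum to", w.sum())                     # typically > 1
print("MSE distributed:", np.mean((X_val @ beta_dist - y_val) ** 2))
```

On such synthetic data the fitted weights typically sum to more than one, consistent with the abstract's observation that plain averaging under-corrects the downward bias of ridge.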
TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning
Title | TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning |
Authors | Chenrui Zhang, Xiaoqing Lyu, Zhi Tang |
Abstract | Zero-shot and few-shot learning aim to improve generalization to unseen concepts, which are promising in many realistic scenarios. Due to the lack of data in the unseen domain, relation modeling between seen and unseen domains is vital for knowledge transfer in these tasks. Most existing methods capture the seen-unseen relation implicitly via semantic embedding or feature generation, resulting in inadequate use of the relation, and some issues remain (e.g., domain shift). To tackle these challenges, we propose a Transferable Graph Generation (TGG) approach, in which the relation is modeled and utilized explicitly via graph generation. Specifically, our proposed TGG contains two main components: (1) Graph generation for relation modeling. An attention-based aggregation network and a relation kernel are proposed, which generate an instance-level graph based on a class-level prototype graph and visual features. Proximity information aggregation is guided by a multi-head graph attention mechanism, where seen and unseen features synthesized by a GAN are revised as node embeddings. The relation kernel further generates edges with a GCN and a graph kernel method, to capture instance-level topological structure while tackling data imbalance and noise. (2) Relation propagation for relation utilization. A dual relation propagation approach is proposed, where relations captured by the generated graph are propagated separately from the seen and unseen subgraphs. The two propagations learn from each other in a dual learning fashion, which serves as an adaptation mechanism for mitigating domain shift. All components are jointly optimized with a meta-learning strategy, and TGG acts as an end-to-end framework unifying conventional zero-shot, generalized zero-shot and few-shot learning. Extensive experiments demonstrate that it consistently surpasses existing methods in all three settings by a significant margin. |
Tasks | Few-Shot Learning, Graph Generation, Meta-Learning, Transfer Learning |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11503v1 |
PDF | https://arxiv.org/pdf/1908.11503v1.pdf |
PWC | https://paperswithcode.com/paper/tgg-transferable-graph-generation-for-zero |
Repo | https://github.com/zcrwind/tgg-pytorch |
Framework | pytorch |
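To make the aggregation step concrete, the sketch below applies generic multi-head attention restricted to graph neighbours in order to update node embeddings, which is the flavour of attention-based aggregation the abstract describes. It is not the TGG relation kernel; the feature dimensions, adjacency, and self-loops are assumptions.

```python
# Minimal sketch of attention-based aggregation over a graph: node embeddings
# are updated by multi-head attention masked to graph neighbours (a generic
# graph-attention layer, not the exact TGG relation kernel).
import torch
import torch.nn as nn

class GraphAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, adj):
        # x: (num_nodes, dim); adj: (num_nodes, num_nodes), 1 where an edge exists.
        mask = (adj == 0)                      # True where attention is blocked
        h, _ = self.attn(x.unsqueeze(0), x.unsqueeze(0), x.unsqueeze(0),
                         attn_mask=mask)
        return h.squeeze(0)

nodes = torch.randn(10, 64)                    # e.g. seen + synthesized unseen features
adj = (torch.rand(10, 10) > 0.5).float()
adj.fill_diagonal_(1.0)                        # self-loops so every node attends somewhere
out = GraphAttention(64)(nodes, adj)
print(out.shape)                               # torch.Size([10, 64])
```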
Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network
Title | Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network |
Authors | Xin Wang, Bo Wu, Yun Ye, Yueqi Zhong |
Abstract | Existing work on fashion outfit compatibility focuses on predicting the overall compatibility of a set of fashion items using information from different modalities. However, few works explore how to explain the prediction, which limits the persuasiveness and effectiveness of the model. In this work, we propose an approach that not only predicts but also diagnoses outfit compatibility. We introduce an end-to-end framework for this goal with two key features: (1) The overall compatibility is learned from all type-specified pairwise similarities between items, and the backpropagation gradients are used to diagnose the incompatible factors. (2) We leverage the hierarchy of the CNN and compare features at different layers to account for the compatibility of different aspects, from the low level (such as color and texture) to the high level (such as style). To support the proposed method, we build a new type-specified outfit dataset named Polyvore-T based on the Polyvore dataset. We compare our method with the prior state of the art on two tasks: outfit compatibility prediction and fill-in-the-blank. Experiments show that our approach has advantages in both prediction performance and diagnosis ability. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11496v2 |
PDF | https://arxiv.org/pdf/1907.11496v2.pdf |
PWC | https://paperswithcode.com/paper/outfit-compatibility-prediction-and-diagnosis |
Repo | https://github.com/WangXin93/fashion_compatibility_mcn |
Framework | pytorch |
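A minimal way to see the predict-then-diagnose mechanism: compute pairwise similarities between item features taken from several CNN depths, pool them into a compatibility score, and read the gradient of that score with respect to each pairwise similarity as the diagnosis signal. The backbone, pooling, and the stand-in scoring function below are assumptions, not the paper's trained multi-layered comparison network.

```python
# Minimal sketch of multi-layered comparison + gradient-based diagnosis:
# pairwise similarities are collected at several depths of a CNN, pooled into
# a score, and the gradient w.r.t. each similarity is read as a diagnosis.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=None)
# Feature maps at increasing depth (low-level colour/texture -> high-level style).
stages = [nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                        backbone.maxpool, backbone.layer1),
          backbone.layer2, backbone.layer3, backbone.layer4]

items = torch.randn(4, 3, 224, 224)           # one outfit of 4 items
feats, x = [], items
for stage in stages:
    x = stage(x)
    feats.append(x.mean(dim=(2, 3)))           # global-average-pooled features

# Pairwise cosine similarities per layer, flattened into one comparison vector.
pair_sims = []
for f in feats:
    f = nn.functional.normalize(f, dim=1)
    sim = f @ f.t()
    iu = torch.triu_indices(len(items), len(items), offset=1)
    pair_sims.append(sim[iu[0], iu[1]])
comparisons = torch.cat(pair_sims).detach().requires_grad_(True)

score = torch.sigmoid(comparisons.sum())        # stand-in for the learned predictor
score.backward()
# Large-magnitude gradients point at the pairs/layers driving (in)compatibility.
print(comparisons.grad.view(len(stages), -1))
```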
Overcoming Data Limitation in Medical Visual Question Answering
Title | Overcoming Data Limitation in Medical Visual Question Answering |
Authors | Binh D. Nguyen, Thanh-Toan Do, Binh X. Nguyen, Tuong Do, Erman Tjiputra, Quang D. Tran |
Abstract | Traditional approaches to Visual Question Answering (VQA) require a large amount of labeled data for training. Unfortunately, such large-scale data is usually not available in the medical domain. In this paper, we propose a novel medical VQA framework that overcomes this labeled-data limitation. The proposed framework combines an unsupervised Denoising Auto-Encoder (DAE) with supervised Meta-Learning. The advantage of the DAE is that it leverages a large amount of unlabeled images, while the advantage of Meta-Learning is that it learns meta-weights that quickly adapt to the VQA problem with limited labeled data. Leveraging both techniques allows the proposed framework to be trained efficiently using a small labeled training set. Experimental results show that our proposed method significantly outperforms state-of-the-art medical VQA methods. |
Tasks | Denoising, Meta-Learning, Question Answering, Visual Question Answering |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11867v1 |
PDF | https://arxiv.org/pdf/1909.11867v1.pdf |
PWC | https://paperswithcode.com/paper/overcoming-data-limitation-in-medical-visual |
Repo | https://github.com/aioz-ai/MICCAI19-MedVQA |
Framework | pytorch |
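The unsupervised half of the framework can be sketched as a standard denoising auto-encoder pre-trained on unlabeled images, whose encoder is later reused as the VQA image feature extractor. The toy architecture, noise level, and random stand-in images below are assumptions.

```python
# Minimal sketch of the DAE component: reconstruct clean images from noisy
# inputs on unlabeled data, then reuse the encoder as a feature extractor.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)

unlabeled = torch.rand(256, 1, 64, 64)           # stand-in for unlabeled scans
for epoch in range(5):
    for i in range(0, len(unlabeled), 32):
        clean = unlabeled[i:i + 32]
        noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)
        recon = decoder(encoder(noisy))
        loss = nn.functional.mse_loss(recon, clean)   # denoising objective
        opt.zero_grad(); loss.backward(); opt.step()

# The trained encoder is then plugged into the VQA model and fine-tuned,
# together with the meta-learned weights, on the small labeled set.
image_features = encoder(unlabeled[:4])
print(image_features.shape)                       # torch.Size([4, 64, 16, 16])
```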
KitcheNette: Predicting and Recommending Food Ingredient Pairings using Siamese Neural Networks
Title | KitcheNette: Predicting and Recommending Food Ingredient Pairings using Siamese Neural Networks |
Authors | Donghyeon Park, Keonwoo Kim, Yonggyu Park, Jungwoon Shin, Jaewoo Kang |
Abstract | As a vast number of ingredients exist in the culinary world, there are countless food ingredient pairings, but only a small number of pairings have been adopted by chefs and studied by food researchers. In this work, we propose KitcheNette, a model that predicts food ingredient pairing scores and recommends optimal ingredient pairings. KitcheNette employs Siamese neural networks and is trained on our annotated dataset containing 300K scores of pairings generated from numerous ingredients in food recipes. As the results demonstrate, our model not only outperforms other baseline models but can also recommend complementary food pairings and discover novel ingredient pairings. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.07261v1 |
PDF | https://arxiv.org/pdf/1905.07261v1.pdf |
PWC | https://paperswithcode.com/paper/kitchenette-predicting-and-recommending-food |
Repo | https://github.com/dmis-lab/KitcheNette |
Framework | pytorch |
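A Siamese pairing scorer of this kind can be sketched in a few lines: a shared encoder embeds the two ingredients, and a small head maps an order-invariant merge of the two embeddings to a pairing score. The embedding size, merge rule, and random training data below are assumptions rather than the released KitcheNette model.

```python
# Minimal sketch of a Siamese pairing-score model: shared encoder, symmetric
# merge, regression head (toy data; not the released KitcheNette weights).
import torch
import torch.nn as nn

class PairingScorer(nn.Module):
    def __init__(self, in_dim=300, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, a, b):
        ha, hb = self.encoder(a), self.encoder(b)
        # Order-invariant merge so score(a, b) == score(b, a).
        merged = torch.cat([ha + hb, (ha - hb).abs()], dim=-1)
        return self.head(merged).squeeze(-1)

model = PairingScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Stand-ins for ingredient embeddings and annotated pairing scores in [-1, 1].
a, b = torch.randn(64, 300), torch.randn(64, 300)
target = torch.rand(64) * 2 - 1
for _ in range(10):
    loss = nn.functional.mse_loss(model(a, b), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(model(a[:3], b[:3]))
```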
Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
Title | Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation |
Authors | Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara |
Abstract | Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination. In this paper, we strive for the creation of an agent able to tackle three key issues: multi-modality, long-term dependencies, and adaptability to different locomotive settings. To that end, we devise “Perceive, Transform, and Act” (PTA): a fully-attentive VLN architecture that leaves the recurrent approach behind and is the first Transformer-like architecture to incorporate three different modalities - natural language, images, and discrete actions - for agent control. In particular, we adopt an early fusion strategy to merge lingual and visual information efficiently in our encoder. We then refine the decoding phase with a late-fusion extension between the agent’s history of actions and the perception modalities. We experimentally validate our model on two datasets and two different action settings. PTA surpasses previous state-of-the-art architectures for low-level VLN on R2R and achieves first place for both setups on the recently proposed R4R benchmark. Our code is publicly available at https://github.com/aimagelab/perceive-transform-and-act. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12377v1 |
PDF | https://arxiv.org/pdf/1911.12377v1.pdf |
PWC | https://paperswithcode.com/paper/perceive-transform-and-act-multi-modal |
Repo | https://github.com/aimagelab/perceive-transform-and-act |
Framework | pytorch |
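The early-fusion step can be pictured as projecting word embeddings and visual features to a common width and letting a single Transformer encoder attend over the concatenated sequence; the action decoder and late-fusion extension are left out. Feature sizes and sequence lengths below are assumptions.

```python
# Minimal sketch of early fusion for VLN: language tokens and image-region
# features are projected to one width and encoded jointly by a Transformer.
import torch
import torch.nn as nn

d_model = 256
text_proj = nn.Linear(300, d_model)     # word embeddings -> shared width
img_proj = nn.Linear(2048, d_model)     # image-region features -> shared width
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4)

words = torch.randn(2, 20, 300)         # batch of instructions (20 tokens)
regions = torch.randn(2, 36, 2048)      # batch of visual observations (36 regions)
tokens = torch.cat([text_proj(words), img_proj(regions)], dim=1)
fused = encoder(tokens)                 # (2, 56, 256) jointly attended sequence
print(fused.shape)
```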
Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks
Title | Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks |
Authors | Xinchi Chen, Chunchuan Lyu, Ivan Titov |
Abstract | Semantic role labeling (SRL) involves extracting propositions (i.e. predicates and their typed arguments) from natural language sentences. State-of-the-art SRL models rely on powerful encoders (e.g., LSTMs) and do not model non-local interaction between arguments. We propose a new approach to modeling these interactions while maintaining efficient inference. Specifically, we use Capsule Networks: each proposition is encoded as a tuple of capsules, one capsule per argument type (i.e. role). These tuples serve as embeddings of entire propositions. In every network layer, the capsules interact with each other and with representations of words in the sentence. Each iteration results in updated proposition embeddings and updated predictions about the SRL structure. Our model substantially outperforms the non-refinement baseline model on all 7 CoNLL-2009 languages and achieves state-of-the-art results on 5 languages (including English) for dependency SRL. We analyze the types of mistakes corrected by the refinement procedure. For example, each role is typically (but not always) filled with at most one argument. Whereas enforcing this approximate constraint is not useful with a modern SRL system, the iterative procedure corrects such mistakes by capturing this intuition in a flexible and context-sensitive way. |
Tasks | Semantic Role Labeling |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.03136v1 |
PDF | https://arxiv.org/pdf/1910.03136v1.pdf |
PWC | https://paperswithcode.com/paper/capturing-argument-interaction-in-semantic |
Repo | https://github.com/DalstonChen/CapNetSRL |
Framework | none |
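A highly simplified picture of the iterative refinement: one capsule vector per role repeatedly interacts with the contextual word states, and per-word role predictions are re-estimated after every iteration. The sketch below uses plain multi-head attention in place of the paper's capsule routing, and the encoder, sizes, and losses are assumptions.

```python
# Highly simplified sketch of iterative role refinement: role capsules read
# from word states, and role scores are re-estimated each iteration (plain
# attention stands in for capsule routing; untrained toy modules).
import torch
import torch.nn as nn

num_roles, dim, seq_len = 6, 128, 15
words = torch.randn(1, seq_len, dim)                 # contextual word states (e.g. from an LSTM)
capsules = nn.Parameter(torch.randn(1, num_roles, dim))
interact = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
scorer = nn.Linear(dim, dim)

caps = capsules
for it in range(3):
    # Capsules read from the words they currently explain ...
    caps, _ = interact(caps, words, words)
    # ... and role scores for every word are re-estimated from the updated capsules.
    logits = torch.einsum("brd,bwd->bwr", scorer(caps), words)
    roles = logits.argmax(-1)                        # per-word role prediction this iteration
    print(f"iteration {it}:", roles.squeeze(0).tolist())
```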
Physiological and Affective Computing through Thermal Imaging: A Survey
Title | Physiological and Affective Computing through Thermal Imaging: A Survey |
Authors | Youngjun Cho, Nadia Bianchi-Berthouze |
Abstract | Thermal imaging-based physiological and affective computing is an emerging research area enabling technologies that monitor our bodily functions and understand psychological and affective needs in a contactless manner. Until recently, however, research has mainly been carried out in highly controlled lab settings. As small and even low-cost versions of thermal video cameras have started to appear on the market, mobile thermal imaging is opening the door to ubiquitous and real-world applications. Here we review the literature on the use of thermal imaging to track changes in physiological cues relevant to affective computing and the technological requirements set so far. In doing so, we aim to establish computational and methodological pipelines from thermal images of the human skin to affective states, and to outline the research opportunities and challenges to be tackled to make ubiquitous real-life thermal imaging-based affect monitoring a possibility. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10307v1 |
PDF | https://arxiv.org/pdf/1908.10307v1.pdf |
PWC | https://paperswithcode.com/paper/physiological-and-affective-computing-through |
Repo | https://github.com/deepneuroscience/TIPA |
Framework | none |
PST900: RGB-Thermal Calibration, Dataset and Segmentation Network
Title | PST900: RGB-Thermal Calibration, Dataset and Segmentation Network |
Authors | Shreyas S. Shivakumar, Neil Rodrigues, Alex Zhou, Ian D. Miller, Vijay Kumar, Camillo J. Taylor |
Abstract | In this work we propose long wave infrared (LWIR) imagery as a viable supporting modality for semantic segmentation using learning-based techniques. We first address the problem of RGB-thermal camera calibration by proposing a passive calibration target and a procedure that is both portable and easy to use. Second, we present PST900, a dataset of 894 synchronized and calibrated RGB and thermal image pairs with per-pixel human annotations across four distinct classes from the DARPA Subterranean Challenge. Lastly, we propose a CNN architecture for fast semantic segmentation that combines both RGB and thermal imagery in a way that leverages RGB imagery independently. We compare our method against the state of the art and show that it outperforms them on our dataset. |
Tasks | Calibration, Semantic Segmentation |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.10980v1 |
PDF | https://arxiv.org/pdf/1909.10980v1.pdf |
PWC | https://paperswithcode.com/paper/pst900-rgb-thermal-calibration-dataset-and |
Repo | https://github.com/ShreyasSkandanS/pst900_thermal_rgb |
Framework | pytorch |
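The "leverages RGB imagery independently" idea can be sketched as a two-stream network: one stream sees only RGB, the other sees the concatenated RGB-thermal input, and their features are fused before a per-pixel classifier. The layer choices and sizes below are assumptions, not the PST900 architecture.

```python
# Minimal sketch of an RGB + thermal segmentation net: an RGB-only stream and
# a fused RGB-T stream are combined before the per-pixel classifier (toy
# architecture, not the PST900 network).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class RGBThermalSeg(nn.Module):
    def __init__(self, num_classes=5):                  # 4 classes + background
        super().__init__()
        self.rgb_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 32))
        self.fuse_stream = nn.Sequential(conv_block(4, 32), conv_block(32, 32))
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, rgb, thermal):
        f_rgb = self.rgb_stream(rgb)                      # RGB branch usable on its own
        f_fuse = self.fuse_stream(torch.cat([rgb, thermal], dim=1))
        return self.classifier(torch.cat([f_rgb, f_fuse], dim=1))

model = RGBThermalSeg()
rgb = torch.randn(2, 3, 128, 160)
thermal = torch.randn(2, 1, 128, 160)
logits = model(rgb, thermal)
print(logits.shape)                                       # (2, 5, 128, 160) per-pixel scores
```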
Region Proposal by Guided Anchoring
Title | Region Proposal by Guided Anchoring |
Authors | Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin |
Abstract | Region anchors are the cornerstone of modern object detection techniques. State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios. In this paper, we revisit this foundational stage. Our study shows that it can be done much more effectively and efficiently. Specifically, we present an alternative scheme, named Guided Anchoring, which leverages semantic features to guide the anchoring. The proposed method jointly predicts the locations where the centers of objects of interest are likely to exist as well as the scales and aspect ratios at different locations. On top of the predicted anchor shapes, we mitigate feature inconsistency with a feature adaptation module. We also study the use of high-quality proposals to improve detection performance. The anchoring scheme can be seamlessly integrated into proposal methods and detectors. With Guided Anchoring, we achieve 9.1% higher recall on MS COCO with 90% fewer anchors than the RPN baseline. We also adopt Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, improving the detection mAP by 2.2%, 2.7% and 1.2%, respectively. Code will be available at https://github.com/open-mmlab/mmdetection. |
Tasks | Object Detection |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03278v2 |
PDF | http://arxiv.org/pdf/1901.03278v2.pdf |
PWC | https://paperswithcode.com/paper/region-proposal-by-guided-anchoring |
Repo | https://github.com/zhousy1993/paper |
Framework | none |
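The two predictions the abstract describes (where object centres are likely to lie, and what anchor shape to use there) map naturally onto two small convolutional heads over a feature map, with anchors kept only at high-probability locations. The sketch below is a toy version; channel sizes, the shape parameterization, and the omitted feature-adaptation module are assumptions.

```python
# Minimal sketch of guided anchoring heads: a location-probability branch and
# an anchor-shape branch over one feature map (toy version; the feature
# adaptation module and training targets are omitted).
import torch
import torch.nn as nn

class GuidedAnchorHead(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.loc = nn.Conv2d(in_ch, 1, 1)      # probability an object centre falls here
        self.shape = nn.Conv2d(in_ch, 2, 1)    # per-location (dw, dh) in log space

    def forward(self, feat, base=8.0):
        loc_prob = torch.sigmoid(self.loc(feat))
        wh = base * torch.exp(self.shape(feat))    # predicted anchor width/height
        return loc_prob, wh

feat = torch.randn(1, 256, 32, 32)                 # one FPN level
loc_prob, wh = GuidedAnchorHead()(feat)
# Keep anchors only at positions likely to contain an object centre.
keep = loc_prob[0, 0] > 0.5
print("anchors kept:", int(keep.sum()), "with shapes", wh[0][:, keep].shape)
```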
Chirality Nets for Human Pose Regression
Title | Chirality Nets for Human Pose Regression |
Authors | Raymond A. Yeh, Yuan-Ting Hu, Alexander G. Schwing |
Abstract | We propose Chirality Nets, a family of deep nets that is equivariant to the “chirality transform,” i.e., the transformation that creates a chiral pair. Through parameter sharing and odd and even symmetry, we propose and prove variants of the standard building blocks of deep nets that satisfy the equivariance property, including fully connected layers, convolutional layers, batch normalization, and LSTM/GRU cells. The proposed layers lead to a more data-efficient representation and a reduction in computation by exploiting symmetry. We evaluate chirality nets on the task of human pose regression, which naturally exploits the left/right mirroring of the human body. We study three pose regression tasks: 3D pose estimation from video, 2D pose forecasting, and skeleton-based activity recognition. Our approach achieves or matches state-of-the-art results, with more significant gains on small datasets and in limited-data settings. |
Tasks | 3D Pose Estimation, Activity Recognition, Pose Estimation, Skeleton Based Action Recognition |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00029v1 |
PDF | https://arxiv.org/pdf/1911.00029v1.pdf |
PWC | https://paperswithcode.com/paper/chirality-nets-for-human-pose-regression |
Repo | https://github.com/raymondyeh07/chirality_nets |
Framework | pytorch |
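The core weight-sharing trick can be illustrated on a fully connected layer for a simplified chirality transform that only swaps left-side and right-side feature blocks (the paper additionally handles coordinate sign flips via odd/even symmetry, which this toy version omits). With weights arranged as W = [[A, B], [B, A]], mirroring the input provably mirrors the output.

```python
# Minimal sketch of a left/right-swap-equivariant linear layer: sharing the
# same-side weights A and cross-side weights B guarantees
# layer(swap(x)) == swap(layer(x)).
import torch
import torch.nn as nn

class ChiralLinear(nn.Module):
    def __init__(self, half_in, half_out):
        super().__init__()
        self.A = nn.Linear(half_in, half_out, bias=False)   # same-side weights
        self.B = nn.Linear(half_in, half_out, bias=False)   # cross-side weights

    def forward(self, x):
        left, right = x.chunk(2, dim=-1)
        out_left = self.A(left) + self.B(right)
        out_right = self.A(right) + self.B(left)
        return torch.cat([out_left, out_right], dim=-1)

def swap(x):
    left, right = x.chunk(2, dim=-1)
    return torch.cat([right, left], dim=-1)

layer = ChiralLinear(16, 8)
x = torch.randn(4, 32)
# Equivariance check: mirroring the input mirrors the output.
print(torch.allclose(layer(swap(x)), swap(layer(x)), atol=1e-6))   # True
```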
Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams
Title | Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams |
Authors | Charles Ringer, James Alfred Walker, Mihalis A. Nicolaou |
Abstract | Video game streaming provides the viewer with a rich set of audio-visual data, conveying information both about the game itself, through game footage and audio, and about the streamer’s emotional state and behaviour via webcam footage and audio. Analysing player behaviour and discovering correlations with game context is crucial for modelling and understanding important aspects of livestreams, but comes with a significant set of challenges - such as fusing multimodal data captured by different sensors in uncontrolled (‘in-the-wild’) conditions. First, we present, to our knowledge, the first dataset of League of Legends livestreams annotated for both streamer affect and game context. Second, we propose a method that exploits tensor decompositions for high-order fusion of multimodal representations. The proposed method is evaluated on the problem of jointly predicting game context and player affect, and compared with a set of baseline fusion approaches such as late and early fusion. |
Tasks | League of Legends |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13694v1 |
PDF | https://arxiv.org/pdf/1905.13694v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-joint-emotion-and-game-context |
Repo | https://github.com/charlieringer/LoLEmoGameRecognition |
Framework | none |
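A common way to realize tensor-based fusion is a low-rank factorized bilinear pooling of the two modality representations, which approximates a full outer-product interaction without materializing it. The sketch below is such a generic low-rank fusion layer, used here as a stand-in for the paper's tensor-decomposition method; feature sizes and rank are assumptions.

```python
# Minimal sketch of low-rank bilinear (tensor) fusion of two modality
# representations (generic stand-in for the tensor-decomposition fusion).
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dim_a, dim_b, rank=16, out_dim=64):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, rank * out_dim)
        self.proj_b = nn.Linear(dim_b, rank * out_dim)
        self.rank, self.out_dim = rank, out_dim

    def forward(self, a, b):
        # Elementwise product of rank-factor projections, summed over the rank
        # dimension, approximates a full bilinear (outer-product) interaction.
        ha = self.proj_a(a).view(-1, self.rank, self.out_dim)
        hb = self.proj_b(b).view(-1, self.rank, self.out_dim)
        return (ha * hb).sum(dim=1)

streamer_feats = torch.randn(8, 512)      # webcam / audio representation
game_feats = torch.randn(8, 256)          # game-footage representation
fused = LowRankFusion(512, 256)(streamer_feats, game_feats)
print(fused.shape)                        # torch.Size([8, 64]) -> affect + context heads
```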
Joint Parsing and Generation for Abstractive Summarization
Title | Joint Parsing and Generation for Abstractive Summarization |
Authors | Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu |
Abstract | Sentences produced by abstractive summarization systems can be ungrammatical and fail to preserve the original meanings, despite being locally fluent. In this paper we propose to remedy this problem by jointly generating a sentence and its syntactic dependency parse while performing abstraction. If generating a word would introduce an erroneous relation into the summary, the behavior must be discouraged. The proposed method thus holds promise for producing grammatical sentences and encouraging the summary to stay true to the original. The contributions of this work are twofold. First, we present a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder in a synchronized manner to generate a summary sentence and its syntactic parse. Second, we describe a novel human evaluation protocol to assess if, and to what extent, a summary remains true to its original meanings. We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines. |
Tasks | Abstractive Text Summarization |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10389v1 |
PDF | https://arxiv.org/pdf/1911.10389v1.pdf |
PWC | https://paperswithcode.com/paper/joint-parsing-and-generation-for-abstractive |
Repo | https://github.com/ucfnlp/joint-parse-n-summarize |
Framework | pytorch |
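One way to picture generating a word and its dependency head at each step is a decoder that emits a vocabulary token and, via pointer-style attention over previously generated positions, an arc to its syntactic head. The untrained toy loop below only illustrates that synchronization, not the paper's synchronized sequential and tree decoders; the sizes and greedy decoding are assumptions.

```python
# Toy sketch of joint word + dependency-head generation: at each step a GRU
# state produces a word and a pointer to the head among earlier positions
# (untrained modules; illustration only).
import torch
import torch.nn as nn

vocab, dim, steps = 50, 64, 6
emb = nn.Embedding(vocab, dim)
cell = nn.GRUCell(dim, dim)
word_head = nn.Linear(dim, vocab)
ptr_q = nn.Linear(dim, dim)                   # query for head-pointer attention

h = torch.zeros(1, dim)
prev = torch.zeros(1, dtype=torch.long)       # <bos>
states, words, heads = [h], [], []            # states[0] acts as ROOT
for t in range(steps):
    h = cell(emb(prev), h)
    word = word_head(h).argmax(-1)            # greedy next word
    # Pointer attention over earlier states chooses the syntactic head.
    keys = torch.cat(states, dim=0)                      # (t+1, dim)
    scores = (ptr_q(h) @ keys.t()).squeeze(0)            # (t+1,)
    head = int(scores.argmax())                          # 0 = ROOT
    states.append(h); words.append(int(word)); heads.append(head)
    prev = word
print("words:", words)
print("head of each word (0 = ROOT):", heads)
```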