Paper Group AWR 389
Orthogonal Statistical Learning. Learning to Transfer: Unsupervised Meta Domain Translation. WONDER: Weighted one-shot distributed ridge regression in high dimensions. TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning. Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network. Overcoming Data Limitation in Medical Visual Question Answering. KitcheNette: Predicting and Recommending Food Ingredient Pairings using Siamese Neural Networks. Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation. Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks. Physiological and Affective Computing through Thermal Imaging: A Survey. PST900: RGB-Thermal Calibration, Dataset and Segmentation Network. Region Proposal by Guided Anchoring. Chirality Nets for Human Pose Regression. Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams. Joint Parsing and Generation for Abstractive Summarization.
Orthogonal Statistical Learning
Title | Orthogonal Statistical Learning |
Authors | Dylan J. Foster, Vasilis Syrgkanis |
Abstract | We provide excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target model depends on an unknown model that must be estimated from data (a “nuisance model”). We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target model and one for the nuisance model. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from the statistical learning and machine learning literature to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate the case where the target parameter belongs to a complex nonparametric class. We characterize conditions on the metric entropy such that oracle rates—rates of the same order as if we knew the nuisance model—are achieved. We also analyze the rates achieved by specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation, and sparse high-dimensional linear model estimation. We highlight the applicability of our results in four settings of central importance in the literature: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data. |
Tasks | Domain Adaptation |
Published | 2019-01-25 |
URL | http://arxiv.org/abs/1901.09036v2 |
PDF | http://arxiv.org/pdf/1901.09036v2.pdf |
PWC | https://paperswithcode.com/paper/orthogonal-statistical-learning |
Repo | https://github.com/Microsoft/EconML |
Framework | none |
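The two-stage sample-splitting meta-algorithm described in the abstract is straightforward to instantiate. Below is a minimal sketch (not the EconML implementation) for a partially linear treatment-effect setup: the nuisance regressions E[Y|X] and E[T|X] are fit on one fold, and the target (effect) model is fit on the held-out fold with a Neyman-orthogonal residual-on-residual loss. The data-generating process, estimator choices, and the linear effect model are illustrative assumptions.

```python
# Minimal sketch of the two-stage sample-splitting meta-algorithm (assumed toy
# setup, not the EconML implementation): stage 1 fits nuisance regressions on
# one fold, stage 2 fits the target effect model on the held-out fold.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
T = X[:, 0] + rng.normal(size=n)                # treatment with confounding
tau = 1.0 + 0.5 * X[:, 1]                       # heterogeneous effect
Y = tau * T + np.sin(X[:, 0]) + rng.normal(size=n)

# Stage 1: nuisance models E[Y|X] and E[T|X] on the first fold.
idx1, idx2 = train_test_split(np.arange(n), test_size=0.5, random_state=0)
q_hat = GradientBoostingRegressor().fit(X[idx1], Y[idx1])   # E[Y|X]
p_hat = GradientBoostingRegressor().fit(X[idx1], T[idx1])   # E[T|X]

# Stage 2: target model on the second fold, using a Neyman-orthogonal
# (residual-on-residual) loss, so nuisance errors enter only at second order.
Y_res = Y[idx2] - q_hat.predict(X[idx2])
T_res = T[idx2] - p_hat.predict(X[idx2])
# Model tau(x) as linear in (1, x_2): Y_res ~ (theta0 + theta1 * x_2) * T_res.
Z = np.column_stack([np.ones(len(idx2)), X[idx2, 1]])
target = LinearRegression(fit_intercept=False).fit(Z * T_res[:, None], Y_res)
print("estimated effect coefficients:", target.coef_)   # roughly [1.0, 0.5]
```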
Learning to Transfer: Unsupervised Meta Domain Translation
Title | Learning to Transfer: Unsupervised Meta Domain Translation |
Authors | Jianxin Lin, Yijun Wang, Tianyu He, Zhibo Chen |
Abstract | Unsupervised domain translation has recently achieved impressive performance with Generative Adversarial Networks (GANs) and sufficient (unpaired) training data. However, existing domain translation frameworks are built in a disposable way: learning experiences are discarded, and the obtained model cannot be adapted to a newly arriving domain. In this work, we approach unsupervised domain translation problems from a meta-learning perspective. We propose a model called Meta-Translation GAN (MT-GAN) to find a good initialization for translation models. In the meta-training procedure, MT-GAN is explicitly trained with a primary translation task and a synthesized dual translation task. A cycle-consistency meta-optimization objective is designed to ensure generalization ability. We demonstrate the effectiveness of our model on ten diverse two-domain translation tasks and multiple face identity translation tasks. Our proposed approach significantly outperforms existing domain translation methods when each domain contains no more than ten training samples. |
Tasks | Meta-Learning |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00181v3 |
PDF | https://arxiv.org/pdf/1906.00181v3.pdf |
PWC | https://paperswithcode.com/paper/190600181 |
Repo | https://github.com/linjx-ustc1106/MT-GAN-PyTorch |
Framework | pytorch |
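As a rough illustration of learning a good initialization for translation models, the sketch below runs a Reptile-style outer loop over synthetic translation tasks with a cycle-consistency inner objective. It is a toy stand-in, not MT-GAN: the adversarial losses, real image domains, and the paper's exact meta-optimization are omitted, and the MLP translators and random linear-shift tasks are assumptions.

```python
# Drastically simplified sketch of meta-learning an initialization for domain
# translators (toy Reptile-style outer loop with a cycle-consistency inner
# loss; the adversarial terms and the exact MT-GAN procedure are omitted).
import copy
import torch
import torch.nn as nn

def make_translator(dim=8):
    return nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, dim))

G_ab, G_ba = make_translator(), make_translator()   # meta-initializations

def sample_task(dim=8):
    # Hypothetical "domain pair": each task is a random shift from A to B.
    shift = torch.randn(dim)
    x_a = torch.randn(64, dim)
    return x_a, x_a + shift

meta_lr, inner_lr, inner_steps = 0.1, 1e-2, 5
for meta_step in range(100):
    f_ab, f_ba = copy.deepcopy(G_ab), copy.deepcopy(G_ba)
    opt = torch.optim.Adam(list(f_ab.parameters()) + list(f_ba.parameters()),
                           lr=inner_lr)
    x_a, x_b = sample_task()
    for _ in range(inner_steps):
        # Cycle-consistency objective: A -> B -> A and B -> A -> B.
        loss = ((f_ba(f_ab(x_a)) - x_a) ** 2).mean() \
             + ((f_ab(f_ba(x_b)) - x_b) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    # Reptile outer update: move the initialization toward the adapted weights.
    with torch.no_grad():
        meta_params = list(G_ab.parameters()) + list(G_ba.parameters())
        task_params = list(f_ab.parameters()) + list(f_ba.parameters())
        for meta_p, task_p in zip(meta_params, task_params):
            meta_p += meta_lr * (task_p - meta_p)
```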
WONDER: Weighted one-shot distributed ridge regression in high dimensions
Title | WONDER: Weighted one-shot distributed ridge regression in high dimensions |
Authors | Edgar Dobriban, Yue Sheng |
Abstract | In many areas, practitioners need to analyze large datasets that challenge conventional single-machine computing. To scale up data analysis, distributed and parallel computing approaches are increasingly needed. Here we study a fundamental and highly important problem in this area: How to do ridge regression in a distributed computing environment? Ridge regression is an extremely popular method for supervised learning with several optimality properties, and is thus important to study. We study one-shot methods that construct weighted combinations of ridge regression estimators computed on each machine. By analyzing the mean squared error in a high-dimensional random-effects model where each predictor has a small effect, we discover several new phenomena. 1. Infinite-worker limit: The distributed estimator works well for very large numbers of machines, a phenomenon we call the “infinite-worker limit”. 2. Optimal weights: The optimal weights for combining local estimators sum to more than unity, due to the downward bias of ridge regression; thus, all averaging methods are suboptimal. We also propose a new Weighted ONe-shot DistributEd Ridge regression (WONDER) algorithm. We test WONDER in simulation studies and on the Million Song Dataset as an example, where it saves at least 100x in computation time while nearly preserving test accuracy. |
Tasks | |
Published | 2019-03-22 |
URL | https://arxiv.org/abs/1903.09321v2 |
PDF | https://arxiv.org/pdf/1903.09321v2.pdf |
PWC | https://paperswithcode.com/paper/one-shot-distributed-ridge-regression-in-high |
Repo | https://github.com/dobriban/dist_ridge |
Framework | none |
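The weighted one-shot idea is easy to prototype: each machine solves ridge on its shard, and a single weighted combination of the local estimators is formed afterwards. The sketch below uses validation-based least-squares weights instead of the paper's asymptotically optimal weights; the random-effects data and regularization strength are assumptions.

```python
# Minimal numpy sketch of one-shot distributed ridge regression with a
# weighted combination of local estimators (toy weights chosen by validation
# error rather than the paper's optimal formula).
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 6000, 200, 6                  # samples, dimension, machines
beta = rng.normal(size=p) / np.sqrt(p)  # small random effects
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Each machine computes a local ridge estimate on its shard.
shards = np.array_split(np.arange(n), k)
local = np.stack([ridge(X[idx], y[idx], lam=50.0) for idx in shards])

# One-shot combination: scalar weights fitted on a small validation set
# (allowed to sum to more than 1, compensating ridge's downward bias).
X_val = rng.normal(size=(500, p))
y_val = X_val @ beta + rng.normal(size=500)
preds = local @ X_val.T                              # (k, n_val) local predictions
w, *_ = np.linalg.lstsq(preds.T, y_val, rcond=None)  # least-squares weights
beta_dist = w @ local
print("weights sum to", w.sum())                     # typically > 1
print("MSE distributed:", np.mean((X_val @ beta_dist - y_val) ** 2))
```

On such synthetic data the fitted weights typically sum to more than one, consistent with the abstract's observation that plain averaging under-corrects the downward bias of ridge.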
TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning
Title | TGG: Transferable Graph Generation for Zero-shot and Few-shot Learning |
Authors | Chenrui Zhang, Xiaoqing Lyu, Zhi Tang |
Abstract | Zero-shot and few-shot learning aim to improve generalization to unseen concepts, which are promising in many realistic scenarios. Due to the lack of data in the unseen domain, relation modeling between seen and unseen domains is vital for knowledge transfer in these tasks. Most existing methods capture the seen-unseen relation implicitly via semantic embedding or feature generation, resulting in inadequate use of the relation, and some issues remain (e.g., domain shift). To tackle these challenges, we propose a Transferable Graph Generation (TGG) approach, in which the relation is modeled and utilized explicitly via graph generation. Specifically, our proposed TGG contains two main components: (1) Graph generation for relation modeling. An attention-based aggregation network and a relation kernel are proposed, which generate an instance-level graph based on a class-level prototype graph and visual features. Proximity information aggregation is guided by a multi-head graph attention mechanism, where seen and unseen features synthesized by a GAN are revised as node embeddings. The relation kernel further generates edges with a GCN and a graph kernel method, to capture instance-level topological structure while tackling data imbalance and noise. (2) Relation propagation for relation utilization. A dual relation propagation approach is proposed, where relations captured by the generated graph are propagated separately from the seen and unseen subgraphs. The two propagations learn from each other in a dual learning fashion, which serves as an adaptation mechanism for mitigating domain shift. All components are jointly optimized with a meta-learning strategy, and TGG acts as an end-to-end framework unifying conventional zero-shot, generalized zero-shot and few-shot learning. Extensive experiments demonstrate that it consistently surpasses existing methods in all three settings by a significant margin. |
Tasks | Few-Shot Learning, Graph Generation, Meta-Learning, Transfer Learning |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1908.11503v1 |
PDF | https://arxiv.org/pdf/1908.11503v1.pdf |
PWC | https://paperswithcode.com/paper/tgg-transferable-graph-generation-for-zero |
Repo | https://github.com/zcrwind/tgg-pytorch |
Framework | pytorch |
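To make the aggregation step concrete, the sketch below applies generic multi-head attention restricted to graph neighbours in order to update node embeddings, which is the flavour of attention-based aggregation the abstract describes. It is not the TGG relation kernel; the feature dimensions, adjacency, and self-loops are assumptions.

```python
# Minimal sketch of attention-based aggregation over a graph: node embeddings
# are updated by multi-head attention masked to graph neighbours (a generic
# graph-attention layer, not the exact TGG relation kernel).
import torch
import torch.nn as nn

class GraphAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, adj):
        # x: (num_nodes, dim); adj: (num_nodes, num_nodes), 1 where an edge exists.
        mask = (adj == 0)                      # True where attention is blocked
        h, _ = self.attn(x.unsqueeze(0), x.unsqueeze(0), x.unsqueeze(0),
                         attn_mask=mask)
        return h.squeeze(0)

nodes = torch.randn(10, 64)                    # e.g. seen + synthesized unseen features
adj = (torch.rand(10, 10) > 0.5).float()
adj.fill_diagonal_(1.0)                        # self-loops so every node attends somewhere
out = GraphAttention(64)(nodes, adj)
print(out.shape)                               # torch.Size([10, 64])
```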
Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network
Title | Outfit Compatibility Prediction and Diagnosis with Multi-Layered Comparison Network |
Authors | Xin Wang, Bo Wu, Yun Ye, Yueqi Zhong |
Abstract | Existing work on fashion outfit compatibility focuses on predicting the overall compatibility of a set of fashion items using information from different modalities. However, few works explore how to explain the prediction, which limits the persuasiveness and effectiveness of the model. In this work, we propose an approach that not only predicts but also diagnoses outfit compatibility. We introduce an end-to-end framework for this goal with two key features: (1) The overall compatibility is learned from all type-specified pairwise similarities between items, and the backpropagation gradients are used to diagnose the incompatible factors. (2) We leverage the hierarchy of the CNN and compare features at different layers to account for the compatibility of different aspects, from the low level (such as color and texture) to the high level (such as style). To support the proposed method, we build a new type-specified outfit dataset named Polyvore-T based on the Polyvore dataset. We compare our method with the prior state of the art on two tasks: outfit compatibility prediction and fill-in-the-blank. Experiments show that our approach has advantages in both prediction performance and diagnosis ability. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11496v2 |
PDF | https://arxiv.org/pdf/1907.11496v2.pdf |
PWC | https://paperswithcode.com/paper/outfit-compatibility-prediction-and-diagnosis |
Repo | https://github.com/WangXin93/fashion_compatibility_mcn |
Framework | pytorch |
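A minimal way to see the predict-then-diagnose mechanism: compute pairwise similarities between item features taken from several CNN depths, pool them into a compatibility score, and read the gradient of that score with respect to each pairwise similarity as the diagnosis signal. The backbone, pooling, and the stand-in scoring function below are assumptions, not the paper's trained multi-layered comparison network.

```python
# Minimal sketch of multi-layered comparison + gradient-based diagnosis:
# pairwise similarities are collected at several depths of a CNN, pooled into
# a score, and the gradient w.r.t. each similarity is read as a diagnosis.
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet18(weights=None)
# Feature maps at increasing depth (low-level colour/texture -> high-level style).
stages = [nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu,
                        backbone.maxpool, backbone.layer1),
          backbone.layer2, backbone.layer3, backbone.layer4]

items = torch.randn(4, 3, 224, 224)           # one outfit of 4 items
feats, x = [], items
for stage in stages:
    x = stage(x)
    feats.append(x.mean(dim=(2, 3)))           # global-average-pooled features

# Pairwise cosine similarities per layer, flattened into one comparison vector.
pair_sims = []
for f in feats:
    f = nn.functional.normalize(f, dim=1)
    sim = f @ f.t()
    iu = torch.triu_indices(len(items), len(items), offset=1)
    pair_sims.append(sim[iu[0], iu[1]])
comparisons = torch.cat(pair_sims).detach().requires_grad_(True)

score = torch.sigmoid(comparisons.sum())        # stand-in for the learned predictor
score.backward()
# Large-magnitude gradients point at the pairs/layers driving (in)compatibility.
print(comparisons.grad.view(len(stages), -1))
```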
Overcoming Data Limitation in Medical Visual Question Answering
Title | Overcoming Data Limitation in Medical Visual Question Answering |
Authors | Binh D. Nguyen, Thanh-Toan Do, Binh X. Nguyen, Tuong Do, Erman Tjiputra, Quang D. Tran |
Abstract | Traditional approaches to Visual Question Answering (VQA) require a large amount of labeled data for training. Unfortunately, such large-scale data is usually not available in the medical domain. In this paper, we propose a novel medical VQA framework that overcomes this labeled-data limitation. The proposed framework combines an unsupervised Denoising Auto-Encoder (DAE) with supervised Meta-Learning. The advantage of the DAE is that it leverages a large amount of unlabeled images, while the advantage of Meta-Learning is that it learns meta-weights that quickly adapt to the VQA problem with limited labeled data. Leveraging both techniques allows the proposed framework to be trained efficiently using a small labeled training set. Experimental results show that our proposed method significantly outperforms state-of-the-art medical VQA methods. |
Tasks | Denoising, Meta-Learning, Question Answering, Visual Question Answering |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11867v1 |
PDF | https://arxiv.org/pdf/1909.11867v1.pdf |
PWC | https://paperswithcode.com/paper/overcoming-data-limitation-in-medical-visual |
Repo | https://github.com/aioz-ai/MICCAI19-MedVQA |
Framework | pytorch |
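The unsupervised half of the framework can be sketched as a standard denoising auto-encoder pre-trained on unlabeled images, whose encoder is later reused as the VQA image feature extractor. The toy architecture, noise level, and random stand-in images below are assumptions.

```python
# Minimal sketch of the DAE component: reconstruct clean images from noisy
# inputs on unlabeled data, then reuse the encoder as a feature extractor.
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)

unlabeled = torch.rand(256, 1, 64, 64)           # stand-in for unlabeled scans
for epoch in range(5):
    for i in range(0, len(unlabeled), 32):
        clean = unlabeled[i:i + 32]
        noisy = (clean + 0.3 * torch.randn_like(clean)).clamp(0, 1)
        recon = decoder(encoder(noisy))
        loss = nn.functional.mse_loss(recon, clean)   # denoising objective
        opt.zero_grad(); loss.backward(); opt.step()

# The trained encoder is then plugged into the VQA model and fine-tuned,
# together with the meta-learned weights, on the small labeled set.
image_features = encoder(unlabeled[:4])
print(image_features.shape)                       # torch.Size([4, 64, 16, 16])
```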
KitcheNette: Predicting and Recommending Food Ingredient Pairings using Siamese Neural Networks
Title | KitcheNette: Predicting and Recommending Food Ingredient Pairings using Siamese Neural Networks |
Authors | Donghyeon Park, Keonwoo Kim, Yonggyu Park, Jungwoon Shin, Jaewoo Kang |
Abstract | As a vast number of ingredients exist in the culinary world, there are countless food ingredient pairings, but only a small number of pairings have been adopted by chefs and studied by food researchers. In this work, we propose KitcheNette, a model that predicts food ingredient pairing scores and recommends optimal ingredient pairings. KitcheNette employs Siamese neural networks and is trained on our annotated dataset containing 300K scores of pairings generated from numerous ingredients in food recipes. As the results demonstrate, our model not only outperforms other baseline models but can also recommend complementary food pairings and discover novel ingredient pairings. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.07261v1 |
PDF | https://arxiv.org/pdf/1905.07261v1.pdf |
PWC | https://paperswithcode.com/paper/kitchenette-predicting-and-recommending-food |
Repo | https://github.com/dmis-lab/KitcheNette |
Framework | pytorch |
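A Siamese pairing scorer of this kind can be sketched in a few lines: a shared encoder embeds the two ingredients, and a small head maps an order-invariant merge of the two embeddings to a pairing score. The embedding size, merge rule, and random training data below are assumptions rather than the released KitcheNette model.

```python
# Minimal sketch of a Siamese pairing-score model: shared encoder, symmetric
# merge, regression head (toy data; not the released KitcheNette weights).
import torch
import torch.nn as nn

class PairingScorer(nn.Module):
    def __init__(self, in_dim=300, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, a, b):
        ha, hb = self.encoder(a), self.encoder(b)
        # Order-invariant merge so score(a, b) == score(b, a).
        merged = torch.cat([ha + hb, (ha - hb).abs()], dim=-1)
        return self.head(merged).squeeze(-1)

model = PairingScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Stand-ins for ingredient embeddings and annotated pairing scores in [-1, 1].
a, b = torch.randn(64, 300), torch.randn(64, 300)
target = torch.rand(64) * 2 - 1
for _ in range(10):
    loss = nn.functional.mse_loss(model(a, b), target)
    opt.zero_grad(); loss.backward(); opt.step()
print(model(a[:3], b[:3]))
```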
Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation
Title | Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation |
Authors | Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara |
Abstract | Vision-and-Language Navigation (VLN) is a challenging task in which an agent needs to follow a language-specified path to reach a target destination. In this paper, we strive for the creation of an agent able to tackle three key issues: multi-modality, long-term dependencies, and adaptability to different locomotive settings. To that end, we devise “Perceive, Transform, and Act” (PTA): a fully-attentive VLN architecture that leaves the recurrent approach behind and is the first Transformer-like architecture to incorporate three different modalities - natural language, images, and discrete actions - for agent control. In particular, we adopt an early fusion strategy to merge lingual and visual information efficiently in our encoder. We then refine the decoding phase with a late-fusion extension between the agent’s history of actions and the perception modalities. We experimentally validate our model on two datasets and two different action settings. PTA surpasses previous state-of-the-art architectures for low-level VLN on R2R and achieves first place for both setups on the recently proposed R4R benchmark. Our code is publicly available at https://github.com/aimagelab/perceive-transform-and-act. |
Tasks | |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12377v1 |
PDF | https://arxiv.org/pdf/1911.12377v1.pdf |
PWC | https://paperswithcode.com/paper/perceive-transform-and-act-multi-modal |
Repo | https://github.com/aimagelab/perceive-transform-and-act |
Framework | pytorch |
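The early-fusion step can be pictured as projecting word embeddings and visual features to a common width and letting a single Transformer encoder attend over the concatenated sequence; the action decoder and late-fusion extension are left out. Feature sizes and sequence lengths below are assumptions.

```python
# Minimal sketch of early fusion for VLN: language tokens and image-region
# features are projected to one width and encoded jointly by a Transformer.
import torch
import torch.nn as nn

d_model = 256
text_proj = nn.Linear(300, d_model)     # word embeddings -> shared width
img_proj = nn.Linear(2048, d_model)     # image-region features -> shared width
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4)

words = torch.randn(2, 20, 300)         # batch of instructions (20 tokens)
regions = torch.randn(2, 36, 2048)      # batch of visual observations (36 regions)
tokens = torch.cat([text_proj(words), img_proj(regions)], dim=1)
fused = encoder(tokens)                 # (2, 56, 256) jointly attended sequence
print(fused.shape)
```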
Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks
Title | Capturing Argument Interaction in Semantic Role Labeling with Capsule Networks |
Authors | Xinchi Chen, Chunchuan Lyu, Ivan Titov |
Abstract | Semantic role labeling (SRL) involves extracting propositions (i.e. predicates and their typed arguments) from natural language sentences. State-of-the-art SRL models rely on powerful encoders (e.g., LSTMs) and do not model non-local interaction between arguments. We propose a new approach to modeling these interactions while maintaining efficient inference. Specifically, we use Capsule Networks: each proposition is encoded as a tuple of capsules, one capsule per argument type (i.e. role). These tuples serve as embeddings of entire propositions. In every network layer, the capsules interact with each other and with representations of words in the sentence. Each iteration results in updated proposition embeddings and updated predictions about the SRL structure. Our model substantially outperforms the non-refinement baseline model on all 7 CoNLL-2009 languages and achieves state-of-the-art results on 5 languages (including English) for dependency SRL. We analyze the types of mistakes corrected by the refinement procedure. For example, each role is typically (but not always) filled with at most one argument. Whereas enforcing this approximate constraint is not useful with a modern SRL system, the iterative procedure corrects such mistakes by capturing this intuition in a flexible and context-sensitive way. |
Tasks | Semantic Role Labeling |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.03136v1 |
PDF | https://arxiv.org/pdf/1910.03136v1.pdf |
PWC | https://paperswithcode.com/paper/capturing-argument-interaction-in-semantic |
Repo | https://github.com/DalstonChen/CapNetSRL |
Framework | none |
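A highly simplified picture of the iterative refinement: one capsule vector per role repeatedly interacts with the contextual word states, and per-word role predictions are re-estimated after every iteration. The sketch below uses plain multi-head attention in place of the paper's capsule routing, and the encoder, sizes, and losses are assumptions.

```python
# Highly simplified sketch of iterative role refinement: role capsules read
# from word states, and role scores are re-estimated each iteration (plain
# attention stands in for capsule routing; untrained toy modules).
import torch
import torch.nn as nn

num_roles, dim, seq_len = 6, 128, 15
words = torch.randn(1, seq_len, dim)                 # contextual word states (e.g. from an LSTM)
capsules = nn.Parameter(torch.randn(1, num_roles, dim))
interact = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
scorer = nn.Linear(dim, dim)

caps = capsules
for it in range(3):
    # Capsules read from the words they currently explain ...
    caps, _ = interact(caps, words, words)
    # ... and role scores for every word are re-estimated from the updated capsules.
    logits = torch.einsum("brd,bwd->bwr", scorer(caps), words)
    roles = logits.argmax(-1)                        # per-word role prediction this iteration
    print(f"iteration {it}:", roles.squeeze(0).tolist())
```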
Physiological and Affective Computing through Thermal Imaging: A Survey
Title | Physiological and Affective Computing through Thermal Imaging: A Survey |
Authors | Youngjun Cho, Nadia Bianchi-Berthouze |
Abstract | Thermal imaging-based physiological and affective computing is an emerging research area enabling technologies that monitor our bodily functions and understand psychological and affective needs in a contactless manner. Until recently, however, research has mainly been carried out in highly controlled lab settings. As small and even low-cost versions of thermal video cameras have started to appear on the market, mobile thermal imaging is opening the door to ubiquitous and real-world applications. Here we review the literature on the use of thermal imaging to track changes in physiological cues relevant to affective computing and the technological requirements set so far. In doing so, we aim to establish computational and methodological pipelines from thermal images of the human skin to affective states, and to outline the research opportunities and challenges to be tackled to make ubiquitous real-life thermal imaging-based affect monitoring a possibility. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10307v1 |
PDF | https://arxiv.org/pdf/1908.10307v1.pdf |
PWC | https://paperswithcode.com/paper/physiological-and-affective-computing-through |
Repo | https://github.com/deepneuroscience/TIPA |
Framework | none |
PST900: RGB-Thermal Calibration, Dataset and Segmentation Network
Title | PST900: RGB-Thermal Calibration, Dataset and Segmentation Network |
Authors | Shreyas S. Shivakumar, Neil Rodrigues, Alex Zhou, Ian D. Miller, Vijay Kumar, Camillo J. Taylor |
Abstract | In this work we propose long wave infrared (LWIR) imagery as a viable supporting modality for semantic segmentation using learning-based techniques. We first address the problem of RGB-thermal camera calibration by proposing a passive calibration target and a procedure that is both portable and easy to use. Second, we present PST900, a dataset of 894 synchronized and calibrated RGB and thermal image pairs with per-pixel human annotations across four distinct classes from the DARPA Subterranean Challenge. Lastly, we propose a CNN architecture for fast semantic segmentation that combines both RGB and thermal imagery in a way that leverages RGB imagery independently. We compare our method against the state of the art and show that it outperforms them on our dataset. |
Tasks | Calibration, Semantic Segmentation |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.10980v1 |
PDF | https://arxiv.org/pdf/1909.10980v1.pdf |
PWC | https://paperswithcode.com/paper/pst900-rgb-thermal-calibration-dataset-and |
Repo | https://github.com/ShreyasSkandanS/pst900_thermal_rgb |
Framework | pytorch |
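The "leverages RGB imagery independently" idea can be sketched as a two-stream network: one stream sees only RGB, the other sees the concatenated RGB-thermal input, and their features are fused before a per-pixel classifier. The layer choices and sizes below are assumptions, not the PST900 architecture.

```python
# Minimal sketch of an RGB + thermal segmentation net: an RGB-only stream and
# a fused RGB-T stream are combined before the per-pixel classifier (toy
# architecture, not the PST900 network).
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class RGBThermalSeg(nn.Module):
    def __init__(self, num_classes=5):                  # 4 classes + background
        super().__init__()
        self.rgb_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 32))
        self.fuse_stream = nn.Sequential(conv_block(4, 32), conv_block(32, 32))
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, rgb, thermal):
        f_rgb = self.rgb_stream(rgb)                      # RGB branch usable on its own
        f_fuse = self.fuse_stream(torch.cat([rgb, thermal], dim=1))
        return self.classifier(torch.cat([f_rgb, f_fuse], dim=1))

model = RGBThermalSeg()
rgb = torch.randn(2, 3, 128, 160)
thermal = torch.randn(2, 1, 128, 160)
logits = model(rgb, thermal)
print(logits.shape)                                       # (2, 5, 128, 160) per-pixel scores
```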
Region Proposal by Guided Anchoring
Title | Region Proposal by Guided Anchoring |
Authors | Jiaqi Wang, Kai Chen, Shuo Yang, Chen Change Loy, Dahua Lin |
Abstract | Region anchors are the cornerstone of modern object detection techniques. State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios. In this paper, we revisit this foundational stage. Our study shows that it can be done much more effectively and efficiently. Specifically, we present an alternative scheme, named Guided Anchoring, which leverages semantic features to guide the anchoring. The proposed method jointly predicts the locations where the centers of objects of interest are likely to exist as well as the scales and aspect ratios at different locations. On top of the predicted anchor shapes, we mitigate feature inconsistency with a feature adaptation module. We also study the use of high-quality proposals to improve detection performance. The anchoring scheme can be seamlessly integrated into proposal methods and detectors. With Guided Anchoring, we achieve 9.1% higher recall on MS COCO with 90% fewer anchors than the RPN baseline. We also adopt Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, improving the detection mAP by 2.2%, 2.7% and 1.2%, respectively. Code will be available at https://github.com/open-mmlab/mmdetection. |
Tasks | Object Detection |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03278v2 |
PDF | http://arxiv.org/pdf/1901.03278v2.pdf |
PWC | https://paperswithcode.com/paper/region-proposal-by-guided-anchoring |
Repo | https://github.com/zhousy1993/paper |
Framework | none |
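The two predictions the abstract describes (where object centres are likely to lie, and what anchor shape to use there) map naturally onto two small convolutional heads over a feature map, with anchors kept only at high-probability locations. The sketch below is a toy version; channel sizes, the shape parameterization, and the omitted feature-adaptation module are assumptions.

```python
# Minimal sketch of guided anchoring heads: a location-probability branch and
# an anchor-shape branch over one feature map (toy version; the feature
# adaptation module and training targets are omitted).
import torch
import torch.nn as nn

class GuidedAnchorHead(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.loc = nn.Conv2d(in_ch, 1, 1)      # probability an object centre falls here
        self.shape = nn.Conv2d(in_ch, 2, 1)    # per-location (dw, dh) in log space

    def forward(self, feat, base=8.0):
        loc_prob = torch.sigmoid(self.loc(feat))
        wh = base * torch.exp(self.shape(feat))    # predicted anchor width/height
        return loc_prob, wh

feat = torch.randn(1, 256, 32, 32)                 # one FPN level
loc_prob, wh = GuidedAnchorHead()(feat)
# Keep anchors only at positions likely to contain an object centre.
keep = loc_prob[0, 0] > 0.5
print("anchors kept:", int(keep.sum()), "with shapes", wh[0][:, keep].shape)
```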
Chirality Nets for Human Pose Regression
Title | Chirality Nets for Human Pose Regression |
Authors | Raymond A. Yeh, Yuan-Ting Hu, Alexander G. Schwing |
Abstract | We propose Chirality Nets, a family of deep nets that is equivariant to the “chirality transform,” i.e., the transformation that creates a chiral pair. Through parameter sharing and odd and even symmetry, we propose and prove variants of the standard building blocks of deep nets that satisfy the equivariance property, including fully connected layers, convolutional layers, batch normalization, and LSTM/GRU cells. The proposed layers lead to a more data-efficient representation and a reduction in computation by exploiting symmetry. We evaluate chirality nets on the task of human pose regression, which naturally exploits the left/right mirroring of the human body. We study three pose regression tasks: 3D pose estimation from video, 2D pose forecasting, and skeleton-based activity recognition. Our approach achieves or matches state-of-the-art results, with more significant gains on small datasets and in limited-data settings. |
Tasks | 3D Pose Estimation, Activity Recognition, Pose Estimation, Skeleton Based Action Recognition |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1911.00029v1 |
PDF | https://arxiv.org/pdf/1911.00029v1.pdf |
PWC | https://paperswithcode.com/paper/chirality-nets-for-human-pose-regression |
Repo | https://github.com/raymondyeh07/chirality_nets |
Framework | pytorch |
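The core weight-sharing trick can be illustrated on a fully connected layer for a simplified chirality transform that only swaps left-side and right-side feature blocks (the paper additionally handles coordinate sign flips via odd/even symmetry, which this toy version omits). With weights arranged as W = [[A, B], [B, A]], mirroring the input provably mirrors the output.

```python
# Minimal sketch of a left/right-swap-equivariant linear layer: sharing the
# same-side weights A and cross-side weights B guarantees
# layer(swap(x)) == swap(layer(x)).
import torch
import torch.nn as nn

class ChiralLinear(nn.Module):
    def __init__(self, half_in, half_out):
        super().__init__()
        self.A = nn.Linear(half_in, half_out, bias=False)   # same-side weights
        self.B = nn.Linear(half_in, half_out, bias=False)   # cross-side weights

    def forward(self, x):
        left, right = x.chunk(2, dim=-1)
        out_left = self.A(left) + self.B(right)
        out_right = self.A(right) + self.B(left)
        return torch.cat([out_left, out_right], dim=-1)

def swap(x):
    left, right = x.chunk(2, dim=-1)
    return torch.cat([right, left], dim=-1)

layer = ChiralLinear(16, 8)
x = torch.randn(4, 32)
# Equivariance check: mirroring the input mirrors the output.
print(torch.allclose(layer(swap(x)), swap(layer(x)), atol=1e-6))   # True
```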
Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams
Title | Multimodal Joint Emotion and Game Context Recognition in League of Legends Livestreams |
Authors | Charles Ringer, James Alfred Walker, Mihalis A. Nicolaou |
Abstract | Video game streaming provides the viewer with a rich set of audio-visual data, conveying information both about the game itself, through game footage and audio, and about the streamer’s emotional state and behaviour via webcam footage and audio. Analysing player behaviour and discovering correlations with game context is crucial for modelling and understanding important aspects of livestreams, but comes with a significant set of challenges - such as fusing multimodal data captured by different sensors in uncontrolled (‘in-the-wild’) conditions. First, we present, to our knowledge, the first dataset of League of Legends livestreams annotated for both streamer affect and game context. Second, we propose a method that exploits tensor decompositions for high-order fusion of multimodal representations. The proposed method is evaluated on the problem of jointly predicting game context and player affect, and compared with a set of baseline fusion approaches such as late and early fusion. |
Tasks | League of Legends |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13694v1 |
PDF | https://arxiv.org/pdf/1905.13694v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-joint-emotion-and-game-context |
Repo | https://github.com/charlieringer/LoLEmoGameRecognition |
Framework | none |
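A common way to realize tensor-based fusion is a low-rank factorized bilinear pooling of the two modality representations, which approximates a full outer-product interaction without materializing it. The sketch below is such a generic low-rank fusion layer, used here as a stand-in for the paper's tensor-decomposition method; feature sizes and rank are assumptions.

```python
# Minimal sketch of low-rank bilinear (tensor) fusion of two modality
# representations (generic stand-in for the tensor-decomposition fusion).
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dim_a, dim_b, rank=16, out_dim=64):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, rank * out_dim)
        self.proj_b = nn.Linear(dim_b, rank * out_dim)
        self.rank, self.out_dim = rank, out_dim

    def forward(self, a, b):
        # Elementwise product of rank-factor projections, summed over the rank
        # dimension, approximates a full bilinear (outer-product) interaction.
        ha = self.proj_a(a).view(-1, self.rank, self.out_dim)
        hb = self.proj_b(b).view(-1, self.rank, self.out_dim)
        return (ha * hb).sum(dim=1)

streamer_feats = torch.randn(8, 512)      # webcam / audio representation
game_feats = torch.randn(8, 256)          # game-footage representation
fused = LowRankFusion(512, 256)(streamer_feats, game_feats)
print(fused.shape)                        # torch.Size([8, 64]) -> affect + context heads
```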
Joint Parsing and Generation for Abstractive Summarization
Title | Joint Parsing and Generation for Abstractive Summarization |
Authors | Kaiqiang Song, Logan Lebanoff, Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Chen Li, Dong Yu, Fei Liu |
Abstract | Sentences produced by abstractive summarization systems can be ungrammatical and fail to preserve the original meanings, despite being locally fluent. In this paper we propose to remedy this problem by jointly generating a sentence and its syntactic dependency parse while performing abstraction. If generating a word would introduce an erroneous relation into the summary, the behavior must be discouraged. The proposed method thus holds promise for producing grammatical sentences and encouraging the summary to stay true to the original. The contributions of this work are twofold. First, we present a novel neural architecture for abstractive summarization that combines a sequential decoder with a tree-based decoder in a synchronized manner to generate a summary sentence and its syntactic parse. Second, we describe a novel human evaluation protocol to assess if, and to what extent, a summary remains true to its original meanings. We evaluate our method on a number of summarization datasets and demonstrate competitive results against strong baselines. |
Tasks | Abstractive Text Summarization |
Published | 2019-11-23 |
URL | https://arxiv.org/abs/1911.10389v1 |
PDF | https://arxiv.org/pdf/1911.10389v1.pdf |
PWC | https://paperswithcode.com/paper/joint-parsing-and-generation-for-abstractive |
Repo | https://github.com/ucfnlp/joint-parse-n-summarize |
Framework | pytorch |
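One way to picture generating a word and its dependency head at each step is a decoder that emits a vocabulary token and, via pointer-style attention over previously generated positions, an arc to its syntactic head. The untrained toy loop below only illustrates that synchronization, not the paper's synchronized sequential and tree decoders; the sizes and greedy decoding are assumptions.

```python
# Toy sketch of joint word + dependency-head generation: at each step a GRU
# state produces a word and a pointer to the head among earlier positions
# (untrained modules; illustration only).
import torch
import torch.nn as nn

vocab, dim, steps = 50, 64, 6
emb = nn.Embedding(vocab, dim)
cell = nn.GRUCell(dim, dim)
word_head = nn.Linear(dim, vocab)
ptr_q = nn.Linear(dim, dim)                   # query for head-pointer attention

h = torch.zeros(1, dim)
prev = torch.zeros(1, dtype=torch.long)       # <bos>
states, words, heads = [h], [], []            # states[0] acts as ROOT
for t in range(steps):
    h = cell(emb(prev), h)
    word = word_head(h).argmax(-1)            # greedy next word
    # Pointer attention over earlier states chooses the syntactic head.
    keys = torch.cat(states, dim=0)                      # (t+1, dim)
    scores = (ptr_q(h) @ keys.t()).squeeze(0)            # (t+1,)
    head = int(scores.argmax())                          # 0 = ROOT
    states.append(h); words.append(int(word)); heads.append(head)
    prev = word
print("words:", words)
print("head of each word (0 = ROOT):", heads)
```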