Paper Group AWR 309
Cross View Fusion for 3D Human Pose Estimation
Title | Cross View Fusion for 3D Human Pose Estimation |
Authors | Haibo Qiu, Chunyu Wang, Jingdong Wang, Naiyan Wang, Wenjun Zeng |
Abstract | We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into the CNN to jointly estimate 2D poses for multiple views, so that the 2D pose estimate for each view already benefits from the other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses; it gradually improves the accuracy of the 3D pose at affordable computational cost. We test our method on two public datasets, H36M and Total Capture. The Mean Per Joint Position Errors on the two datasets are 26mm and 29mm, outperforming the state of the art by a large margin (26mm vs 52mm, 29mm vs 35mm). Our code is released at \url{https://github.com/microsoft/multiview-human-pose-estimation-pytorch}. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01203v1 |
https://arxiv.org/pdf/1909.01203v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-view-fusion-for-3d-human-pose |
Repo | https://github.com/microsoft/multiview-human-pose-estimation-pytorch |
Framework | pytorch |
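To make the fusion idea concrete, here is a minimal PyTorch sketch of cross-view heatmap fusion: each view's joint heatmaps are refined with a learned linear combination of the flattened heatmaps from every other view. This is a simplification of the paper's fusion layer (which ties the weights to epipolar geometry); the shapes and zero initialisation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, num_views: int, heatmap_size: int = 32):
        super().__init__()
        n = heatmap_size * heatmap_size
        # One learnable pixel-to-pixel mixing matrix per ordered view pair,
        # initialised to zero so fusion starts as the identity mapping.
        self.mix = nn.ParameterList(
            [nn.Parameter(torch.zeros(n, n))
             for _ in range(num_views * (num_views - 1))])
        self.num_views = num_views

    def forward(self, heatmaps):  # (views, batch, joints, H, W)
        v, b, j, h, w = heatmaps.shape
        flat = heatmaps.reshape(v, b * j, h * w)
        fused, k = [], 0
        for i in range(v):
            out = flat[i]
            for o in range(v):
                if o == i:
                    continue
                out = out + flat[o] @ self.mix[k]  # view o's evidence for view i
                k += 1
            fused.append(out.reshape(b, j, h, w))
        return torch.stack(fused)

hm = torch.rand(4, 2, 17, 32, 32)              # 4 views, batch 2, 17 joints
print(CrossViewFusion(num_views=4)(hm).shape)  # torch.Size([4, 2, 17, 32, 32])
```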
Blind Super-Resolution Kernel Estimation using an Internal-GAN
Title | Blind Super-Resolution Kernel Estimation using an Internal-GAN |
Authors | Sefi Bell-Kligler, Assaf Shocher, Michal Irani |
Abstract | Super resolution (SR) methods typically assume that the low-resolution (LR) image was downscaled from the unknown high-resolution (HR) image by a fixed ‘ideal’ downscaling kernel (e.g. Bicubic downscaling). However, this is rarely the case in real LR images, in contrast to synthetically generated SR datasets. When the assumed downscaling kernel deviates from the true one, the performance of SR methods significantly deteriorates. This gave rise to Blind-SR - namely, SR when the downscaling kernel (“SR-kernel”) is unknown. It was further shown that the true SR-kernel is the one that maximizes the recurrence of patches across scales of the LR image. In this paper we show how this powerful cross-scale recurrence property can be realized using Deep Internal Learning. We introduce “KernelGAN”, an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches. Its Generator is trained to produce a downscaled version of the LR test image, such that its Discriminator cannot distinguish between the patch distribution of the downscaled image, and the patch distribution of the original LR image. The Generator, once trained, constitutes the downscaling operation with the correct image-specific SR-kernel. KernelGAN is fully unsupervised, requires no training data other than the input image itself, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms. |
Tasks | Super-Resolution |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06581v6 |
https://arxiv.org/pdf/1909.06581v6.pdf | |
PWC | https://paperswithcode.com/paper/blind-super-resolution-kernel-estimation |
Repo | https://github.com/sefibk/KernelGAN |
Framework | pytorch |
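The training setup can be sketched in a few lines of PyTorch: the generator below is a deep *linear* network (convolutions without activations) that downscales a crop of the LR image by 2x, and a fully convolutional patch discriminator tries to tell downscaled crops from crops of the original image. Architectures, the fixed crops, and iteration counts are placeholders, not the authors' exact choices.

```python
import torch
import torch.nn as nn

G = nn.Sequential(                                # deep linear generator:
    nn.Conv2d(3, 64, 7, padding=3, bias=False),   # composed convs collapse to
    nn.Conv2d(64, 64, 5, padding=2, bias=False),  # a single kernel = SR-kernel
    nn.Conv2d(64, 3, 1, stride=2, bias=False),    # stride 2 = 2x downscaling
)
D = nn.Sequential(                                # fully-convolutional patch critic
    nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(),
    nn.Conv2d(64, 64, 1), nn.ReLU(),
    nn.Conv2d(64, 1, 1), nn.Sigmoid(),
)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

lr_img = torch.rand(1, 3, 128, 128)               # the single LR test image
for step in range(2):                             # real training runs ~3000 steps
    crop = lr_img[..., :64, :64]                  # random crops in practice
    fake = G(crop)                                # 32x32 downscaled crop
    real = lr_img[..., :32, :32]
    pred_real, pred_fake = D(real), D(fake.detach())
    d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
             bce(pred_fake, torch.zeros_like(pred_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    pred = D(fake)                                # fool the updated critic
    g_loss = bce(pred, torch.ones_like(pred))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```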
TabNet: Attentive Interpretable Tabular Learning
Title | TabNet: Attentive Interpretable Tabular Learning |
Authors | Sercan O. Arik, Tomas Pfister |
Abstract | We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features. We demonstrate that TabNet outperforms other neural network and decision tree variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into the global model behavior. Finally, for the first time to our knowledge, we demonstrate self-supervised learning for tabular data, significantly improving performance with unsupervised representation learning when unlabeled data is abundant. |
Tasks | Decision Making, Feature Selection, Representation Learning, Unsupervised Representation Learning |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07442v4 |
https://arxiv.org/pdf/1908.07442v4.pdf | |
PWC | https://paperswithcode.com/paper/tabnet-attentive-interpretable-tabular |
Repo | https://github.com/mgrankin/fast_tabnet |
Framework | pytorch |
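A single TabNet decision step can be sketched as follows: an attentive transformer produces a sparse feature-selection mask via sparsemax, the mask multiplies the input features, and a prior term discourages reusing the same features at later steps. The real model adds shared and step-specific GLU blocks, batch normalisation, and a relaxation parameter; this stripped-down version only illustrates the masking mechanics.

```python
import torch
import torch.nn as nn

def sparsemax(z):
    """Martins & Astudillo (2016): Euclidean projection onto the simplex."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cssv = z_sorted.cumsum(-1) - 1
    support = (z_sorted - cssv / k > 0).sum(dim=-1, keepdim=True).clamp(min=1)
    tau = cssv.gather(-1, support - 1) / support.to(z.dtype)
    return torch.clamp(z - tau, min=0)

class TabNetStep(nn.Module):
    def __init__(self, d_in, d_hidden=8):
        super().__init__()
        self.attn = nn.Linear(d_in, d_in)  # attentive transformer (simplified)
        self.feat = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())

    def forward(self, x, prior):
        mask = sparsemax(self.attn(x) * prior)  # sparse feature-selection mask
        prior = prior * (1.0 - mask)            # discourage reusing features
        return self.feat(x * mask), mask, prior

x = torch.rand(4, 10)              # batch of 4 rows, 10 tabular features
prior = torch.ones_like(x)         # all features available at step 1
step = TabNetStep(d_in=10)
out, mask, prior = step(x, prior)  # `mask` is the interpretable attribution
```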
Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset
Title | Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset |
Authors | Kohei Ozaki, Shuhei Yokoo |
Abstract | The Google-Landmarks-v2 dataset is the biggest worldwide landmarks dataset, characterized by a large degree of noisiness and diversity. We present a novel landmark retrieval/recognition system, robust to a noisy and diverse dataset, by our team, smlyaka. Our approach is based on deep convolutional neural networks with metric learning, trained with cosine-softmax based losses. Deep metric learning methods are usually sensitive to noise, which can hinder learning a reliable metric. To address this issue, we develop an automated data cleaning system. Besides, we devise a discriminative re-ranking method to address the diversity of the dataset for landmark retrieval. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle. |
Tasks | Metric Learning |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04087v2 |
https://arxiv.org/pdf/1906.04087v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-landmark-retrievalrecognition |
Repo | https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution |
Framework | pytorch |
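The "cosine-softmax based losses" mentioned above belong to the normalized-embedding family (CosFace, ArcFace, and relatives). A minimal sketch of a CosFace-style additive-margin variant, chosen here as one concrete instance:

```python
import torch
import torch.nn.functional as F

def cosine_softmax_loss(embeddings, weights, labels, scale=30.0, margin=0.2):
    emb = F.normalize(embeddings, dim=1)  # unit-norm descriptors
    w = F.normalize(weights, dim=1)       # unit-norm class centres
    cos = emb @ w.t()                     # cosine-similarity logits
    cos = cos - margin * F.one_hot(labels, w.size(0)).float()  # target margin
    return F.cross_entropy(scale * cos, labels)

emb = torch.randn(8, 512)                              # CNN descriptors
centres = torch.randn(1000, 512, requires_grad=True)   # one centre per landmark
labels = torch.randint(0, 1000, (8,))
print(cosine_softmax_loss(emb, centres, labels))
```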
Handheld Multi-Frame Super-Resolution
Title | Handheld Multi-Frame Super-Resolution |
Authors | Bartlomiej Wronski, Ignacio Garcia-Dorado, Manfred Ernst, Damien Kelly, Michael Krainin, Chia-Kai Liang, Marc Levoy, Peyman Milanfar |
Abstract | Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light-gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost the signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone. |
Tasks | Demosaicking, Multi-Frame Super-Resolution, Super-Resolution |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03277v1 |
https://arxiv.org/pdf/1905.03277v1.pdf | |
PWC | https://paperswithcode.com/paper/190503277 |
Repo | https://github.com/JVision/Handheld-Multi-Frame-Super-Resolution |
Framework | none |
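As a toy illustration of the merge step only: given already-aligned grayscale frames, pixels that disagree with the reference frame (e.g. due to local motion or occlusion) are down-weighted before averaging. The actual pipeline operates on raw CFA bursts with sub-pixel alignment and anisotropic kernel regression; none of that is reproduced here.

```python
import numpy as np

def merge_burst(frames, sigma=0.05):
    """frames: (N, H, W) aligned burst; returns a robust weighted average."""
    ref = frames[0]
    # Down-weight pixels that disagree with the reference (motion, occlusion).
    weights = np.exp(-((frames - ref) ** 2) / (2 * sigma ** 2))
    return (weights * frames).sum(axis=0) / weights.sum(axis=0)

burst = np.random.rand(8, 64, 64).astype(np.float32)
print(merge_burst(burst).shape)  # (64, 64)
```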
A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce
Title | A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce |
Authors | Abdul-Saboor Sheikh, Romain Guigoures, Evgenii Koriagin, Yuen King Ho, Reza Shirvany, Roland Vollgraf, Urs Bergmann |
Abstract | Personalized size and fit recommendations are of crucial importance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from the extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size- and fit-relevant information from observed customer-article interactions. It further employs customer- and article-specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation. |
Tasks | Entity Embeddings |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09844v1 |
https://arxiv.org/pdf/1907.09844v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-system-for-predicting-size |
Repo | https://github.com/NeverInAsh/fit-recommendation |
Framework | tf |
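A minimal sketch of the content-collaborative structure described above: per-customer and per-article embedding variables are concatenated with additional attributes and mapped by a shared MLP (the population-level parameters) to a fit class. Entity counts, attribute sizes, and the three-class target are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SizeFitModel(nn.Module):
    def __init__(self, n_customers, n_articles, n_attrs, dim=32):
        super().__init__()
        self.cust = nn.Embedding(n_customers, dim)  # per-customer latent
        self.art = nn.Embedding(n_articles, dim)    # per-article latent
        self.mlp = nn.Sequential(                   # population-level parameters
            nn.Linear(2 * dim + n_attrs, 64), nn.ReLU(),
            nn.Linear(64, 3),                       # too small / fit / too large
        )

    def forward(self, cust_id, art_id, attrs):
        z = torch.cat([self.cust(cust_id), self.art(art_id), attrs], dim=1)
        return self.mlp(z)

model = SizeFitModel(n_customers=1000, n_articles=500, n_attrs=8)
logits = model(torch.tensor([3]), torch.tensor([7]), torch.rand(1, 8))
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))  # label: "fit"
```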
Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval
Title | Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval |
Authors | Li Deng, Shuo Zhang, Krisztian Balog |
Abstract | Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as captions, column headings, and cells, for training word and entity embeddings. These embeddings are then utilized in three table-related tasks, row population, column population, and table retrieval, by incorporating them into existing retrieval models as additional semantic similarity signals. Evaluation results show that table embeddings can significantly improve upon the performance of state-of-the-art baselines. |
Tasks | Entity Embeddings, Language Modelling, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00041v1 |
https://arxiv.org/pdf/1906.00041v1.pdf | |
PWC | https://paperswithcode.com/paper/190600041 |
Repo | https://github.com/iai-group/sigir2019-table2vec |
Framework | none |
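A rough sketch of the training setup using gensim's word2vec: each table contributes a pseudo-sentence built from its caption tokens, headings, and linked entities, and skip-gram embeddings are trained over those sequences. The toy tables and the exact way elements are concatenated are assumptions for illustration.

```python
from gensim.models import Word2Vec

tables = [
    {"caption": ["list", "of", "tallest", "buildings"],
     "headings": ["rank", "name", "height"],
     "entities": ["Burj_Khalifa", "Shanghai_Tower"]},
    {"caption": ["european", "capitals"],
     "headings": ["country", "capital"],
     "entities": ["France", "Paris", "Germany", "Berlin"]},
]
# One pseudo-sentence per table; words and entities share a vocabulary.
sentences = [t["caption"] + t["headings"] + t["entities"] for t in tables]
model = Word2Vec(sentences, vector_size=64, window=5, min_count=1, sg=1)
print(model.wv.most_similar("Paris", topn=2))
```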
Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game
Title | Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game |
Authors | Ngoc-Trung Tran, Viet-Hung Tran, Ngoc-Bao Nguyen, Linxiao Yang, Ngai-Man Cheung |
Abstract | Self-supervised (SS) learning is a powerful approach for representation learning using unlabeled data. Recently, it has been applied to Generative Adversarial Network (GAN) training. Specifically, SS tasks were proposed to address the catastrophic forgetting issue in the GAN discriminator. In this work, we perform an in-depth analysis to understand how SS tasks interact with the learning of the generator. From the analysis, we identify issues with existing SS tasks that allow a severely mode-collapsed generator to excel at them. To address these issues, we propose new SS tasks based on a multi-class minimax game. The competition between our proposed SS tasks in the game encourages the generator to learn the data distribution and generate diverse samples. We provide both theoretical and empirical analysis to support that our proposed SS tasks have better convergence properties. We conduct experiments to incorporate our proposed SS tasks into two different GAN baseline models. Our approach establishes state-of-the-art FID scores on CIFAR-10, CIFAR-100, STL-10, CelebA, Imagenet $32\times32$ and Stacked-MNIST datasets, outperforming existing works by considerable margins in some cases. Our unconditional GAN model approaches the performance of conditional GANs without using labeled data. Our code: https://github.com/tntrung/msgan |
Tasks | Image Generation, Representation Learning |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06997v2 |
https://arxiv.org/pdf/1911.06997v2.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-gan-analysis-and-improvement-1 |
Repo | https://github.com/tntrung/msgan |
Framework | tf |
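On our reading of the abstract, the multi-class minimax SS task can be sketched like this: an auxiliary classifier assigns rotated real images to one of four rotation classes and rotated fakes to an extra "fake" class, while the generator is rewarded when its rotated samples are classified as genuine rotations. The classifier below is a placeholder head, and the loss weighting against the usual GAN losses is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_batch(x):
    """Return all four rotations of x and their rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

C = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))  # 4 rots + "fake"
real = torch.rand(8, 3, 32, 32)
fake = torch.rand(8, 3, 32, 32)       # stands in for generator output

# Classifier: real rotations -> their rotation class, fakes -> the 5th class.
r_imgs, r_lbl = rotate_batch(real)
f_imgs, f_lbl = rotate_batch(fake)
fake_cls = torch.full((f_imgs.size(0),), 4, dtype=torch.long)
c_loss = F.cross_entropy(C(r_imgs), r_lbl) + \
         F.cross_entropy(C(f_imgs.detach()), fake_cls)
# Generator: rotated fakes should be classified as genuine rotations.
g_ss_loss = F.cross_entropy(C(f_imgs), f_lbl)
```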
Word Embeddings for Entity-annotated Texts
Title | Word Embeddings for Entity-annotated Texts |
Authors | Satya Almasian, Andreas Spitz, Michael Gertz |
Abstract | Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance. |
Tasks | Entity Embeddings, Information Retrieval, Word Embeddings |
Published | 2019-02-06 |
URL | https://arxiv.org/abs/1902.02078v3 |
https://arxiv.org/pdf/1902.02078v3.pdf | |
PWC | https://paperswithcode.com/paper/word-embeddings-for-entity-annotated-texts |
Repo | https://github.com/satya77/Entity_Embedding |
Framework | none |
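The co-occurrence-graph route the abstract mentions can be sketched as a DeepWalk-style procedure: build a word/entity co-occurrence graph, sample short random walks, and feed the walks to word2vec as sentences. The tiny graph and walk parameters here are illustrative.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

g = nx.Graph()
g.add_edges_from([("Berlin", "capital"), ("Berlin", "Germany"),
                  ("Paris", "capital"), ("Paris", "France")])

def random_walks(graph, num_walks=10, length=5):
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes:
            walk = [node]
            for _ in range(length - 1):
                walk.append(random.choice(list(graph.neighbors(walk[-1]))))
            walks.append(walk)
    return walks

# Walks act as sentences; nodes (words and entities) become the vocabulary.
model = Word2Vec(random_walks(g), vector_size=32, window=3, min_count=1, sg=1)
print(model.wv.similarity("Berlin", "Paris"))
```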
MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming
Title | MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming |
Authors | Yura Perov, Logan Graham, Kostis Gourgoulias, Jonathan G. Richens, Ciarán M. Lee, Adam Baker, Saurabh Johri |
Abstract | We elaborate on using importance sampling for causal reasoning, in particular for counterfactual inference. We show how this can be implemented natively in probabilistic programming. By considering the structure of the counterfactual query, one can significantly optimise the inference process. We also consider design choices to enable further optimisations. We introduce MultiVerse, a probabilistic programming prototype engine for approximate causal reasoning. We provide experimental results and compare with Pyro, an existing probabilistic programming framework that offers some causal reasoning tools. |
Tasks | Counterfactual Inference, Probabilistic Programming |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.08091v2 |
https://arxiv.org/pdf/1910.08091v2.pdf | |
PWC | https://paperswithcode.com/paper/multiverse-causal-reasoning-using-importance |
Repo | https://github.com/babylonhealth/multiverse |
Framework | none |
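The counterfactual-by-importance-sampling recipe is easy to illustrate on a toy structural model (this example is ours, not MultiVerse's API): abduction weights shared noise samples by how well they reproduce the observation, action intervenes on the input, and prediction re-runs the model with the same noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(0.0, 1.0, n)          # exogenous noise, shared by both worlds

def outcome(x, u):                   # structural equation: Y = 2X + U
    return 2 * x + u

x_obs, y_obs = 1.0, 3.5              # observed world
# Abduction: weight each noise sample by the likelihood of the observation.
w = np.exp(-0.5 * ((outcome(x_obs, u) - y_obs) / 0.1) ** 2)
w /= w.sum()
# Action + prediction: intervene do(X = 0) and re-run with the same noise.
y_cf = outcome(0.0, u)
print((w * y_cf).sum())              # E[Y | do(X=0), evidence] ~= 1.5
```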
Consensus Maximization Tree Search Revisited
Title | Consensus Maximization Tree Search Revisited |
Authors | Zhipeng Cai, Tat-Jun Chin, Vladlen Koltun |
Abstract | Consensus maximization is widely used for robust fitting in computer vision. However, solving it exactly, i.e., finding the globally optimal solution, is intractable. A* tree search, which has been shown to be fixed-parameter tractable, is one of the most efficient exact methods, though it is still limited to small inputs. We make two key contributions towards improving A* tree search. First, we show that the consensus maximization tree structure used previously actually contains paths that connect nodes at both adjacent and non-adjacent levels. Crucially, paths connecting non-adjacent levels are redundant for tree search, but they were not avoided previously. We propose a new acceleration strategy that avoids such redundant paths. In the second contribution, we show that the existing branch pruning technique also deteriorates quickly with the problem dimension. We then propose a new branch pruning technique that is less dimension-sensitive to address this issue. Experiments show that both new techniques can significantly accelerate A* tree search, making it reasonably efficient on inputs that were previously out of reach. |
Tasks | |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02021v3 |
https://arxiv.org/pdf/1908.02021v3.pdf | |
PWC | https://paperswithcode.com/paper/consensus-maximization-tree-search-revisited |
Repo | https://github.com/ZhipengCai/MaxConTreeSearch |
Framework | none |
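For intuition, consensus maximization as a tree search can be sketched naively: each tree level removes one more point, and the search stops at the shallowest level where the remaining points fit a model within the inlier threshold. The least-squares fit below stands in for the minimax (Chebyshev) fit used in exact methods, and no A* heuristic or pruning is included.

```python
import itertools
import numpy as np

def max_residual(points):
    x, y = points[:, 0], points[:, 1]
    a, b = np.polyfit(x, y, 1)                 # stand-in for a minimax fit
    return np.abs(y - (a * x + b)).max()

def consensus_search(points, eps=0.1, max_outliers=3):
    n = len(points)
    for level in range(max_outliers + 1):      # tree level = #points removed
        for removed in itertools.combinations(range(n), level):
            keep = np.delete(points, removed, axis=0)
            if max_residual(keep) <= eps:      # feasible node found
                return keep, removed
    return None, None

pts = np.array([[0, 0.0], [1, 1.02], [2, 1.97], [3, 3.01], [1.5, 4.0]])
inliers, removed = consensus_search(pts)
print(removed)  # (4,) -- the gross outlier
```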
DeepClean – self-supervised artefact rejection for intensive care waveform data using deep generative learning
Title | DeepClean – self-supervised artefact rejection for intensive care waveform data using deep generative learning |
Authors | Tom Edinburgh, Peter Smielewski, Marek Czosnyka, Stephen J. Eglen, Ari Ercole |
Abstract | Waveform physiological data is important in the treatment of critically ill patients in the intensive care unit. Such recordings are susceptible to artefacts, which must be removed before the data can be re-used for alerting or reprocessed for other clinical or research purposes. Accurate removal of artefacts reduces bias and uncertainty in clinical assessment, as well as the false positive rate of intensive care unit alarms, and is therefore a key component in providing optimal clinical care. In this work, we present DeepClean, a prototype self-supervised artefact detection system using a convolutional variational autoencoder deep neural network that avoids costly and painstaking manual annotation, requiring only easily obtained ‘good’ data for training. For a test case with invasive arterial blood pressure, we demonstrate that our algorithm can detect the presence of an artefact within a 10-second sample of data with sensitivity and specificity around 90%. Furthermore, DeepClean was able to identify regions of artefact within such samples with high accuracy, and we show that it significantly outperforms a baseline principal component analysis approach in both signal reconstruction and artefact detection. DeepClean learns a generative model and therefore may also be used for imputation of missing data. |
Tasks | Imputation |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03129v4 |
https://arxiv.org/pdf/1908.03129v4.pdf | |
PWC | https://paperswithcode.com/paper/deepclean-self-supervised-artefact-rejection |
Repo | https://github.com/tedinburgh/deepclean |
Framework | none |
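The detection principle reduces to VAE-based anomaly scoring: train a variational autoencoder only on artefact-free windows, then flag test windows whose reconstruction error is large. The paper uses a convolutional VAE on 10-second samples; the dense sketch below, with made-up sizes, only shows the training objective and scoring idea.

```python
import torch
import torch.nn as nn

class WaveformVAE(nn.Module):
    def __init__(self, length=1250, latent=16):   # e.g. 10 s at 125 Hz
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(length, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent)
        self.logvar = nn.Linear(128, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                 nn.Linear(128, length))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return self.dec(z), mu, logvar

def elbo_loss(recon, x, mu, logvar):
    rec = ((recon - x.flatten(1)) ** 2).sum(1).mean()         # reconstruction
    kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(1).mean()
    return rec + kld

vae = WaveformVAE()
x = torch.randn(4, 1, 1250)          # batch of arterial-pressure windows
recon, mu, logvar = vae(x)
loss = elbo_loss(recon, x, mu, logvar)
# At test time, flag windows whose reconstruction error exceeds a threshold.
```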
Depth Growing for Neural Machine Translation
Title | Depth Growing for Neural Machine Translation |
Authors | Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu |
Abstract | While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks onto the NMT model results in no improvement and can even reduce performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which yields significant improvements over strong Transformer baselines on the WMT$14$ English$\to$German and English$\to$French translation tasks\footnote{Our code is available at \url{https://github.com/apeterswu/Depth_Growing_NMT}}. |
Tasks | Machine Translation, Text Classification |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01968v1 |
https://arxiv.org/pdf/1907.01968v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-growing-for-neural-machine-translation |
Repo | https://github.com/apeterswu/Depth_Growing_NMT |
Framework | pytorch |
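The two-stage recipe can be sketched as follows: train a shallow stack to convergence, freeze it, then stack fresh blocks on top and train only those in the second stage. The paper's actual design has three specially designed components connecting the stages; this sketch only shows the freeze-and-grow skeleton with illustrative sizes.

```python
import torch
import torch.nn as nn

layer = lambda: nn.TransformerEncoderLayer(d_model=64, nhead=4)
bottom = nn.TransformerEncoder(layer(), num_layers=4)
# ... stage 1: train `bottom` (with the rest of the NMT model) to convergence ...

for p in bottom.parameters():        # stage 2: freeze the trained stack
    p.requires_grad = False
top = nn.TransformerEncoder(layer(), num_layers=2)
grown = nn.Sequential(bottom, top)   # deeper model; only `top` is trained

out = grown(torch.randn(10, 2, 64))  # (seq_len, batch, d_model)
```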
A Neural Approach to Irony Generation
Title | A Neural Approach to Irony Generation |
Authors | Mengdi Zhu, Zhiwei Yu, Xiaojun Wan |
Abstract | Irony can not only express stronger emotions but also convey a sense of humor. With the development of social media, irony is widely used in public. Although many prior studies have addressed irony detection, few focus on irony generation. The main challenges for irony generation are the lack of a large-scale irony dataset and the difficulty of modeling the ironic pattern. In this work, we first systematically define irony generation as a style transfer task. To address the lack of data, we make use of Twitter to build a large-scale dataset. We also design a combination of rewards for reinforcement learning to control the generation of ironic sentences. Experimental results demonstrate the effectiveness of our model in terms of irony accuracy, sentiment preservation, and content preservation. |
Tasks | Style Transfer |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06200v2 |
https://arxiv.org/pdf/1909.06200v2.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-approach-to-irony-generation |
Repo | https://github.com/zmd971202/IronyGeneration |
Framework | pytorch |
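The combined reinforcement-learning reward can be sketched as a weighted sum of three scores: irony strength from a classifier, sentiment preservation, and content preservation (BLEU here). The scorers and weights below are placeholders, not the paper's trained models.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def combined_reward(source, generated, irony_clf, sentiment, w=(0.4, 0.3, 0.3)):
    r_irony = irony_clf(generated)                 # P(generated is ironic)
    r_sent = 1.0 - abs(sentiment(source) - sentiment(generated))
    r_content = sentence_bleu([source.split()], generated.split(),
                              smoothing_function=SmoothingFunction().method1)
    return w[0] * r_irony + w[1] * r_sent + w[2] * r_content

# Toy lambdas stand in for trained classifier and sentiment models:
reward = combined_reward(
    "the weather ruined our trip",
    "what lovely weather for a trip",
    irony_clf=lambda s: 0.9,
    sentiment=lambda s: 0.2 if "ruined" in s else 0.8,
)
print(reward)
```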
Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates
Title | Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates |
Authors | Miles D. Cranmer, Richard Galvez, Lauren Anderson, David N. Spergel, Shirley Ho |
Abstract | We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multi-dimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallaxes. Dust estimation and dereddening are done iteratively inside the model and without prior distance information, using the Bayestar map. The signal-to-noise (precision) of distance measurements improves on average by more than 48% over the raw Gaia data, and we also demonstrate how the accuracy of distances has improved over other models, especially in the noisy-parallax regime. Applications are discussed, including significantly improved Milky Way disk separation and substructure detection. We conclude with a discussion of future work, which exploits the normalizing flow architecture to allow us to exactly marginalize over missing photometry, enabling the inclusion of many surveys without losing coverage. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.08045v1 |
https://arxiv.org/pdf/1908.08045v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-the-gaia-color-magnitude-diagram |
Repo | https://github.com/MilesCranmer/public_CMD_normalizing_flow |
Framework | none |
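A normalizing flow of the kind described assigns densities via the change-of-variables formula; below is a single RealNVP-style affine coupling layer in PyTorch as one stackable building block (the paper's exact flow architecture may differ). Training maximizes the likelihood of observed (color, magnitude) pairs.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # predicts scale, shift

    def forward(self, x):                # x -> z, with log|det J|
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                # keep scales well-behaved
        z2 = x2 * s.exp() + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

flow = AffineCoupling()
base = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
x = torch.randn(8, 2)                   # e.g. (color, absolute magnitude) pairs
z, logdet = flow(x)
log_prob = base.log_prob(z).sum(dim=1) + logdet  # change of variables
loss = -log_prob.mean()                 # fit by maximum likelihood
```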