February 1, 2020


Paper Group AWR 309


Cross View Fusion for 3D Human Pose Estimation

Title Cross View Fusion for 3D Human Pose Estimation
Authors Haibo Qiu, Chunyu Wang, Jingdong Wang, Naiyan Wang, Wenjun Zeng
Abstract We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into the CNN to jointly estimate 2D poses for multiple views. Consequently, the 2D pose estimate for each view already benefits from the other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses. It gradually improves the accuracy of the 3D pose at affordable computational cost. We test our method on two public datasets, H36M and Total Capture. The Mean Per Joint Position Errors on the two datasets are 26mm and 29mm, outperforming the state of the art by a large margin (26mm vs 52mm, 29mm vs 35mm). Our code is released at \url{https://github.com/microsoft/multiview-human-pose-estimation-pytorch}.
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2019-09-03
URL https://arxiv.org/abs/1909.01203v1
PDF https://arxiv.org/pdf/1909.01203v1.pdf
PWC https://paperswithcode.com/paper/cross-view-fusion-for-3d-human-pose
Repo https://github.com/microsoft/multiview-human-pose-estimation-pytorch
Framework pytorch
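
The cross-view fusion idea can be sketched compactly. Below is a minimal PyTorch illustration in which each view's joint heatmaps are refined by learned linear mappings from the other views' heatmaps; the module layout, shapes, and the plain additive fusion rule are illustrative assumptions, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    """Sketch: refine each view's heatmaps with learned linear mappings
    from the other views (illustrative, not the paper's exact architecture)."""
    def __init__(self, num_views, heatmap_size):
        super().__init__()
        h = heatmap_size * heatmap_size
        # one learned pixel-to-pixel mapping per ordered pair of views
        self.fuse = nn.ModuleList([
            nn.ModuleList([nn.Linear(h, h, bias=False) for _ in range(num_views)])
            for _ in range(num_views)
        ])

    def forward(self, heatmaps):  # (batch, views, joints, H, W)
        b, v, j, hh, ww = heatmaps.shape
        flat = heatmaps.view(b, v, j, hh * ww)
        fused = []
        for i in range(v):
            out = flat[:, i]
            for k in range(v):
                if k != i:  # add information from every other view
                    out = out + self.fuse[i][k](flat[:, k])
            fused.append(out)
        return torch.stack(fused, dim=1).view(b, v, j, hh, ww)

# usage: 4 views, 17 joints, 16x16 heatmaps
model = CrossViewFusion(num_views=4, heatmap_size=16)
print(model(torch.rand(2, 4, 17, 16, 16)).shape)  # torch.Size([2, 4, 17, 16, 16])
```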

Blind Super-Resolution Kernel Estimation using an Internal-GAN

Title Blind Super-Resolution Kernel Estimation using an Internal-GAN
Authors Sefi Bell-Kligler, Assaf Shocher, Michal Irani
Abstract Super resolution (SR) methods typically assume that the low-resolution (LR) image was downscaled from the unknown high-resolution (HR) image by a fixed ‘ideal’ downscaling kernel (e.g. Bicubic downscaling). However, this is rarely the case in real LR images, in contrast to synthetically generated SR datasets. When the assumed downscaling kernel deviates from the true one, the performance of SR methods significantly deteriorates. This gave rise to Blind-SR - namely, SR when the downscaling kernel (“SR-kernel”) is unknown. It was further shown that the true SR-kernel is the one that maximizes the recurrence of patches across scales of the LR image. In this paper we show how this powerful cross-scale recurrence property can be realized using Deep Internal Learning. We introduce “KernelGAN”, an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches. Its Generator is trained to produce a downscaled version of the LR test image, such that its Discriminator cannot distinguish between the patch distribution of the downscaled image, and the patch distribution of the original LR image. The Generator, once trained, constitutes the downscaling operation with the correct image-specific SR-kernel. KernelGAN is fully unsupervised, requires no training data other than the input image itself, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms.
Tasks Super-Resolution
Published 2019-09-14
URL https://arxiv.org/abs/1909.06581v6
PDF https://arxiv.org/pdf/1909.06581v6.pdf
PWC https://paperswithcode.com/paper/blind-super-resolution-kernel-estimation
Repo https://github.com/sefibk/KernelGAN
Framework pytorch
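
A minimal KernelGAN-style training loop looks like the following sketch: the generator G is a single strided convolution that downscales the LR test image by 2x, while a patch discriminator D tries to tell crops of G's output from crops of the LR image itself. The architectures and sizes here are stand-ins (the paper's generator is a deeper linear network); after training, G's convolution kernel is the estimated image-specific SR-kernel.

```python
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, kernel_size=13, stride=2, padding=6, bias=False)  # downscale-by-2 generator
D = nn.Sequential(nn.Conv2d(3, 64, 7), nn.ReLU(), nn.Conv2d(64, 1, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()
lr_img = torch.rand(1, 3, 128, 128)  # stand-in for the LR test image

def crop(img, size):
    _, _, h, w = img.shape
    y = torch.randint(0, h - size + 1, (1,)).item()
    x = torch.randint(0, w - size + 1, (1,)).item()
    return img[:, :, y:y + size, x:x + size]

for step in range(200):
    real = crop(lr_img, 32)      # patch from the LR image's own distribution
    fake = G(crop(lr_img, 64))   # patch from the downscaled image (32x32 output)
    # discriminator update
    p_real, p_fake = D(real), D(fake.detach())
    d_loss = bce(p_real, torch.ones_like(p_real)) + bce(p_fake, torch.zeros_like(p_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator update: make downscaled patches indistinguishable from LR patches
    p_fake = D(fake)
    g_loss = bce(p_fake, torch.ones_like(p_fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
# G.weight now holds the estimated image-specific SR-kernel.
```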

TabNet: Attentive Interpretable Tabular Learning

Title TabNet: Attentive Interpretable Tabular Learning
Authors Sercan O. Arik, Tomas Pfister
Abstract We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features. We demonstrate that TabNet outperforms other neural network and decision tree variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into the global model behavior. Finally, for the first time to our knowledge, we demonstrate self-supervised learning for tabular data, significantly improving performance with unsupervised representation learning when unlabeled data is abundant.
Tasks Decision Making, Feature Selection, Representation Learning, Unsupervised Representation Learning
Published 2019-08-20
URL https://arxiv.org/abs/1908.07442v4
PDF https://arxiv.org/pdf/1908.07442v4.pdf
PWC https://paperswithcode.com/paper/tabnet-attentive-interpretable-tabular
Repo https://github.com/mgrankin/fast_tabnet
Framework pytorch
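
The sequential-attention mechanism can be illustrated with a short sketch of one decision step: an attentive transformer turns the features and a prior into a feature-selection mask, and a feature transformer reasons over the masked input. The paper uses sparsemax masks, a relaxation factor gamma in the prior update, and distinct step modules; softmax, gamma = 1, and a single reused step keep this sketch short.

```python
import torch
import torch.nn as nn

class TabNetStep(nn.Module):
    """One TabNet-style decision step (simplified sketch)."""
    def __init__(self, num_features, hidden):
        super().__init__()
        self.attentive = nn.Linear(num_features, num_features)
        self.feature = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU())

    def forward(self, x, prior):
        mask = torch.softmax(self.attentive(x) * prior, dim=-1)  # soft feature selection
        out = self.feature(x * mask)
        prior = prior * (1.0 - mask)  # discourage re-selecting the same features
        return out, prior

x = torch.rand(8, 10)                 # a batch of 10-feature tabular rows
prior = torch.ones(8, 10)
step = TabNetStep(num_features=10, hidden=16)
outs = []
for _ in range(3):                    # three sequential decision steps
    out, prior = step(x, prior)
    outs.append(out)
head = nn.Linear(16, 2)
logits = head(sum(outs))              # aggregate step outputs into class logits
```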

Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset

Title Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset
Authors Kohei Ozaki, Shuhei Yokoo
Abstract The Google-Landmarks-v2 dataset is the biggest worldwide landmarks dataset, characterized by a large degree of noisiness and diversity. We present a novel landmark retrieval/recognition system, developed by our team (smlyaka), that is robust to such a noisy and diverse dataset. Our approach is based on deep convolutional neural networks with metric learning, trained by cosine-softmax based losses. Deep metric learning methods are usually sensitive to noise, which can hinder learning a reliable metric. To address this issue, we develop an automated data cleaning system. In addition, we devise a discriminative re-ranking method to address the diversity of the dataset for landmark retrieval. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle.
Tasks Metric Learning
Published 2019-06-10
URL https://arxiv.org/abs/1906.04087v2
PDF https://arxiv.org/pdf/1906.04087v2.pdf
PWC https://paperswithcode.com/paper/large-scale-landmark-retrievalrecognition
Repo https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution
Framework pytorch
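
The cosine-softmax idea the losses build on is simple to write down: logits are scaled cosine similarities between L2-normalized embeddings and L2-normalized class weights, fed into an ordinary cross-entropy. The sketch below omits the angular margins that losses like ArcFace add on top; dimensions and the scale value are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineSoftmaxHead(nn.Module):
    """Scaled-cosine classification head (margin-free sketch)."""
    def __init__(self, dim, num_classes, scale=30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, dim))
        self.scale = scale

    def forward(self, emb):
        # cosine similarity between normalized embeddings and class centers
        cos = F.linear(F.normalize(emb), F.normalize(self.weight))
        return self.scale * cos

head = CosineSoftmaxHead(dim=512, num_classes=1000)
emb = torch.randn(4, 512)  # CNN embeddings of landmark images
loss = F.cross_entropy(head(emb), torch.randint(0, 1000, (4,)))
```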

Handheld Multi-Frame Super-Resolution

Title Handheld Multi-Frame Super-Resolution
Authors Bartlomiej Wronski, Ignacio Garcia-Dorado, Manfred Ernst, Damien Kelly, Michael Krainin, Chia-Kai Liang, Marc Levoy, Peyman Milanfar
Abstract Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves both to increase image resolution and to boost the signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone.
Tasks Demosaicking, Multi-Frame Super-Resolution, Super-Resolution
Published 2019-05-08
URL https://arxiv.org/abs/1905.03277v1
PDF https://arxiv.org/pdf/1905.03277v1.pdf
PWC https://paperswithcode.com/paper/190503277
Repo https://github.com/JVision/Handheld-Multi-Frame-Super-Resolution
Framework none
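
A toy version of the align-and-merge idea makes the SNR benefit concrete: burst frames with known small integer offsets are shifted back into alignment and averaged. The real pipeline operates on raw CFA frames with sub-pixel kernel-regression merging and robustness weights, none of which is reproduced here; the offsets are assumed known for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.random((64, 64))
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # hand-tremor shifts (assumed known)
# simulate a noisy burst of shifted frames
burst = [np.roll(truth, s, axis=(0, 1)) + 0.1 * rng.standard_normal((64, 64))
         for s in offsets]

# align: undo each frame's shift; merge: simple average
aligned = [np.roll(f, (-dy, -dx), axis=(0, 1)) for f, (dy, dx) in zip(burst, offsets)]
merged = np.mean(aligned, axis=0)

print("single-frame RMSE:", np.sqrt(np.mean((burst[0] - truth) ** 2)))
print("merged RMSE:      ", np.sqrt(np.mean((merged - truth) ** 2)))  # noticeably lower
```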

A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce

Title A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce
Authors Abdul-Saboor Sheikh, Romain Guigoures, Evgenii Koriagin, Yuen King Ho, Reza Shirvany, Roland Vollgraf, Urs Bergmann
Abstract Personalized size and fit recommendations are of crucial significance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from the extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size and fit relevant information from observed customer-article interactions. It further employs customer and article specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation.
Tasks Entity Embeddings
Published 2019-07-23
URL https://arxiv.org/abs/1907.09844v1
PDF https://arxiv.org/pdf/1907.09844v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-system-for-predicting-size
Repo https://github.com/NeverInAsh/fit-recommendation
Framework tf
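
The content-collaborative structure can be sketched in a few lines: learned customer and article embeddings are concatenated with their attribute vectors and mapped to fit classes. Layer sizes, the attribute dimension, and the three-way fit outcome (small / fit / large) are assumptions for illustration, not the paper's exact configuration (the Framework field above notes the reference code is TensorFlow; PyTorch is used here for consistency with the other sketches).

```python
import torch
import torch.nn as nn

class SizeFitSketch(nn.Module):
    """Entity embeddings + attributes -> fit-outcome logits (illustrative)."""
    def __init__(self, n_customers, n_articles, attr_dim, emb_dim=32):
        super().__init__()
        self.cust = nn.Embedding(n_customers, emb_dim)
        self.art = nn.Embedding(n_articles, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim + attr_dim, 64), nn.ReLU(),
            nn.Linear(64, 3),  # 3 fit outcomes: small / fit / large
        )

    def forward(self, cust_id, art_id, attrs):
        z = torch.cat([self.cust(cust_id), self.art(art_id), attrs], dim=-1)
        return self.mlp(z)

model = SizeFitSketch(n_customers=1000, n_articles=500, attr_dim=8)
logits = model(torch.tensor([3]), torch.tensor([42]), torch.rand(1, 8))
```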

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Title Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval
Authors Li Deng, Shuo Zhang, Krisztian Balog
Abstract Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as captions, column headings, and cells, for training word and entity embeddings. These embeddings are then utilized in three table-related tasks (row population, column population, and table retrieval) by incorporating them into existing retrieval models as additional semantic similarity signals. Evaluation results show that table embeddings can significantly improve upon the performance of state-of-the-art baselines.
Tasks Entity Embeddings, Language Modelling, Semantic Similarity, Semantic Textual Similarity
Published 2019-05-31
URL https://arxiv.org/abs/1906.00041v1
PDF https://arxiv.org/pdf/1906.00041v1.pdf
PWC https://paperswithcode.com/paper/190600041
Repo https://github.com/iai-group/sigir2019-table2vec
Framework none
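
One way to see the idea: serialize table elements into token sequences and train a standard skip-gram model, so words and entities end up in one embedding space. The sketch below (assuming gensim >= 4 for the `vector_size` argument) uses a naive concatenation of caption, heading, and entity tokens; the paper trains element-specific embedding variants rather than this single serialization.

```python
from gensim.models import Word2Vec  # gensim >= 4

# hypothetical mini-corpus of tables
tables = [
    {"caption": ["list", "of", "nobel", "laureates"],
     "headings": ["year", "laureate", "field"],
     "entities": ["ENTITY/Marie_Curie", "ENTITY/Physics"]},
    {"caption": ["fifa", "world", "cup", "finals"],
     "headings": ["year", "winner", "host"],
     "entities": ["ENTITY/Brazil", "ENTITY/France"]},
]
# serialize each table's elements into one token sequence
sequences = [t["caption"] + t["headings"] + t["entities"] for t in tables]
model = Word2Vec(sequences, vector_size=64, window=5, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("ENTITY/Brazil", topn=2))
```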

Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game

Title Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game
Authors Ngoc-Trung Tran, Viet-Hung Tran, Ngoc-Bao Nguyen, Linxiao Yang, Ngai-Man Cheung
Abstract Self-supervised (SS) learning is a powerful approach for representation learning using unlabeled data. Recently, it has been applied to Generative Adversarial Network (GAN) training. Specifically, SS tasks were proposed to address the catastrophic forgetting issue in the GAN discriminator. In this work, we perform an in-depth analysis to understand how SS tasks interact with the learning of the generator. From the analysis, we identify issues with SS tasks that allow a severely mode-collapsed generator to excel at them. To address these issues, we propose new SS tasks based on a multi-class minimax game. The competition between our proposed SS tasks in the game encourages the generator to learn the data distribution and generate diverse samples. We provide both theoretical and empirical analysis to show that our proposed SS tasks have better convergence properties. We conduct experiments to incorporate our proposed SS tasks into two different GAN baseline models. Our approach establishes state-of-the-art FID scores on CIFAR-10, CIFAR-100, STL-10, CelebA, Imagenet $32\times32$ and Stacked-MNIST datasets, outperforming existing works by considerable margins in some cases. Our unconditional GAN model approaches the performance of conditional GANs without using labeled data. Our code: https://github.com/tntrung/msgan
Tasks Image Generation, Representation Learning
Published 2019-11-16
URL https://arxiv.org/abs/1911.06997v2
PDF https://arxiv.org/pdf/1911.06997v2.pdf
PWC https://paperswithcode.com/paper/self-supervised-gan-analysis-and-improvement-1
Repo https://github.com/tntrung/msgan
Framework tf
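
For context, the basic self-supervised task attached to GAN discriminators in prior work is rotation prediction: classify which of four 90-degree rotations was applied to an image. The sketch below shows only that auxiliary loss; the paper's contribution is a multi-class minimax variant in which generator and discriminator compete on the SS labels, which is not reproduced here (and the reference code is TensorFlow; PyTorch is used for consistency).

```python
import torch
import torch.nn as nn

class RotHead(nn.Module):
    """Tiny 4-way rotation classifier standing in for the discriminator's SS head."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, 4))
    def forward(self, x):
        return self.body(x)

def rotate_batch(x):
    """Return the batch under all four 90-degree rotations, plus rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

head = RotHead()
imgs = torch.rand(8, 3, 32, 32)  # real (or generated) images
rot_imgs, rot_labels = rotate_batch(imgs)
ss_loss = nn.functional.cross_entropy(head(rot_imgs), rot_labels)
ss_loss.backward()  # added to the usual adversarial losses during training
```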

Word Embeddings for Entity-annotated Texts

Title Word Embeddings for Entity-annotated Texts
Authors Satya Almasian, Andreas Spitz, Michael Gertz
Abstract Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.
Tasks Entity Embeddings, Information Retrieval, Word Embeddings
Published 2019-02-06
URL https://arxiv.org/abs/1902.02078v3
PDF https://arxiv.org/pdf/1902.02078v3.pdf
PWC https://paperswithcode.com/paper/word-embeddings-for-entity-annotated-texts
Repo https://github.com/satya77/Entity_Embedding
Framework none
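
To make the co-occurrence-graph route concrete, here is a toy stand-in: count term/entity co-occurrences per sentence, build a PPMI matrix, and factorize it with SVD to obtain embeddings. The paper evaluates dedicated node-embedding methods on a weighted co-occurrence graph rather than PPMI+SVD; this sketch only illustrates the shared "embed the co-occurrence structure" idea, and the sentences are invented.

```python
import numpy as np
from collections import Counter
from itertools import combinations

sentences = [
    ["ENTITY/Berlin", "capital", "ENTITY/Germany"],
    ["ENTITY/Paris", "capital", "ENTITY/France"],
    ["ENTITY/Germany", "borders", "ENTITY/France"],
]
vocab = sorted({t for s in sentences for t in s})
idx = {t: i for i, t in enumerate(vocab)}

# symmetric co-occurrence counts within each sentence
counts = Counter()
for s in sentences:
    for a, b in combinations(set(s), 2):
        counts[(idx[a], idx[b])] += 1
        counts[(idx[b], idx[a])] += 1

n = len(vocab)
M = np.zeros((n, n))
for (i, j), c in counts.items():
    M[i, j] = c

# positive pointwise mutual information
total = M.sum()
row, col = M.sum(1, keepdims=True), M.sum(0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((M * total) / (row * col))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

U, S, _ = np.linalg.svd(ppmi)
embeddings = U[:, :2] * S[:2]  # 2-dim embeddings suffice for this toy graph
print(dict(zip(vocab, embeddings.round(2))))
```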

MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming

Title MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming
Authors Yura Perov, Logan Graham, Kostis Gourgoulias, Jonathan G. Richens, Ciarán M. Lee, Adam Baker, Saurabh Johri
Abstract We elaborate on using importance sampling for causal reasoning, in particular for counterfactual inference. We show how this can be implemented natively in probabilistic programming. By considering the structure of the counterfactual query, one can significantly optimise the inference process. We also consider design choices to enable further optimisations. We introduce MultiVerse, a probabilistic programming prototype engine for approximate causal reasoning. We provide experimental results and compare with Pyro, an existing probabilistic programming framework that offers some causal reasoning tools.
Tasks Counterfactual Inference, Probabilistic Programming
Published 2019-10-17
URL https://arxiv.org/abs/1910.08091v2
PDF https://arxiv.org/pdf/1910.08091v2.pdf
PWC https://paperswithcode.com/paper/multiverse-causal-reasoning-using-importance
Repo https://github.com/babylonhealth/multiverse
Framework none
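
The abduction-action-prediction recipe behind importance-sampled counterfactuals fits in a few lines of plain Python. The structural causal model below is a toy invented for illustration, not one from the paper: Z = U_z, X = Z + U_x, Y = X + Z + U_y with standard-normal noise; given evidence X = 1, Y = 2, we estimate what Y would have been had X been set to 0.

```python
import random, math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def run(n=100_000, x_obs=1.0, y_obs=2.0, x_do=0.0):
    total_w, total_wy = 0.0, 0.0
    for _ in range(n):
        u_z = random.gauss(0, 1)       # sample exogenous noise from its prior
        z = u_z
        # abduction: the remaining noise is determined by the evidence,
        # and the sample is weighted by its likelihood
        u_x = x_obs - z
        u_y = y_obs - x_obs - z
        w = normal_pdf(u_x) * normal_pdf(u_y)
        # action + prediction: intervene do(X = x_do), reuse the abducted noise
        y_cf = x_do + z + u_y
        total_w += w
        total_wy += w * y_cf
    return total_wy / total_w

print("E[Y | do(X=0), evidence] ~", round(run(), 3))
```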

Consensus Maximization Tree Search Revisited

Title Consensus Maximization Tree Search Revisited
Authors Zhipeng Cai, Tat-Jun Chin, Vladlen Koltun
Abstract Consensus maximization is widely used for robust fitting in computer vision. However, solving it exactly, i.e., finding the globally optimal solution, is intractable. A* tree search, which has been shown to be fixed-parameter tractable, is one of the most efficient exact methods, though it is still limited to small inputs. We make two key contributions towards improving A* tree search. First, we show that the consensus maximization tree structure used previously actually contains paths that connect nodes at both adjacent and non-adjacent levels. Crucially, paths connecting non-adjacent levels are redundant for tree search, but they were not avoided previously. We propose a new acceleration strategy that avoids such redundant paths. Second, we show that the existing branch pruning technique deteriorates quickly as the problem dimension increases, and we propose a new, less dimension-sensitive branch pruning technique to address this issue. Experiments show that both new techniques can significantly accelerate A* tree search, making it reasonably efficient on inputs that were previously out of reach.
Tasks
Published 2019-08-06
URL https://arxiv.org/abs/1908.02021v3
PDF https://arxiv.org/pdf/1908.02021v3.pdf
PWC https://paperswithcode.com/paper/consensus-maximization-tree-search-revisited
Repo https://github.com/ZhipengCai/MaxConTreeSearch
Framework none
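
For readers unfamiliar with the objective: the consensus of a model is the number of points whose residual falls within an inlier threshold, and consensus maximization seeks the model with the largest consensus. The toy sketch below evaluates that objective by brute force over point pairs on a 2D line-fitting problem; it illustrates what A* tree search optimizes globally, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
inliers = np.column_stack([x, 2 * x + 1 + rng.normal(0, 0.05, 40)])  # near y = 2x + 1
outliers = rng.uniform(0, 10, (20, 2)) * np.array([1.0, 3.0])
pts = np.vstack([inliers, outliers])

def consensus(a, b, pts, eps=0.2):
    """Number of points within eps (vertical residual) of the line y = a*x + b."""
    return int(np.sum(np.abs(pts[:, 1] - (a * pts[:, 0] + b)) <= eps))

# brute force over all lines through two data points
best_score, best_model = -1, None
for i in range(len(pts)):
    for j in range(i + 1, len(pts)):
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        score = consensus(a, b, pts)
        if score > best_score:
            best_score, best_model = score, (a, b)
print("best consensus:", best_score, "line:", best_model)
```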

DeepClean – self-supervised artefact rejection for intensive care waveform data using deep generative learning

Title DeepClean – self-supervised artefact rejection for intensive care waveform data using deep generative learning
Authors Tom Edinburgh, Peter Smielewski, Marek Czosnyka, Stephen J. Eglen, Ari Ercole
Abstract Waveform physiological data is important in the treatment of critically ill patients in the intensive care unit. Such recordings are susceptible to artefacts, which must be removed before the data can be re-used for alerting or reprocessed for other clinical or research purposes. Accurate removal of artefacts reduces bias and uncertainty in clinical assessment, as well as the false positive rate of intensive care unit alarms, and is therefore a key component in providing optimal clinical care. In this work, we present DeepClean, a prototype self-supervised artefact detection system using a convolutional variational autoencoder deep neural network that avoids costly and painstaking manual annotation, requiring only easily obtained ‘good’ data for training. For a test case with invasive arterial blood pressure, we demonstrate that our algorithm can detect the presence of an artefact within a 10-second sample of data with sensitivity and specificity around 90%. Furthermore, DeepClean was able to identify regions of artefact within such samples with high accuracy, and we show that it significantly outperforms a baseline principal component analysis approach in both signal reconstruction and artefact detection. DeepClean learns a generative model and therefore may also be used for imputation of missing data.
Tasks Imputation
Published 2019-08-08
URL https://arxiv.org/abs/1908.03129v4
PDF https://arxiv.org/pdf/1908.03129v4.pdf
PWC https://paperswithcode.com/paper/deepclean-self-supervised-artefact-rejection
Repo https://github.com/tedinburgh/deepclean
Framework none
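
The core mechanism is train-on-good-data-only anomaly detection: a VAE fitted to artefact-free windows reconstructs clean signals well, so a high reconstruction error flags a likely artefact. Below is a minimal sketch with dense layers and a synthetic sinusoid standing in for arterial-pressure windows; the paper's model is convolutional and the window length here (125 samples) is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Small dense VAE over fixed-length waveform windows (sketch)."""
    def __init__(self, win=125, latent=8):
        super().__init__()
        self.enc = nn.Linear(win, 32)
        self.mu, self.logvar = nn.Linear(32, latent), nn.Linear(32, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, win))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = F.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

vae = TinyVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
# synthetic 'good' windows: a clean quasi-periodic signal plus mild noise
good = torch.sin(torch.linspace(0, 50, 125)).repeat(256, 1) + 0.05 * torch.randn(256, 125)
for _ in range(200):  # train on good data only
    recon, mu, logvar = vae(good)
    loss = vae_loss(recon, good, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

def artefact_score(x):
    """Per-window reconstruction error; high values flag likely artefacts."""
    recon, _, _ = vae(x)
    return F.mse_loss(recon, x, reduction="none").mean(dim=1)
```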

Depth Growing for Neural Machine Translation

Title Depth Growing for Neural Machine Translation
Authors Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu
Abstract While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks onto the NMT model yields no improvement and can even reduce performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which results in significant improvements over the strong Transformer baselines on WMT$14$ English$\to$German and English$\to$French translation tasks\footnote{Our code is available at \url{https://github.com/apeterswu/Depth_Growing_NMT}}.
Tasks Machine Translation, Text Classification
Published 2019-07-03
URL https://arxiv.org/abs/1907.01968v1
PDF https://arxiv.org/pdf/1907.01968v1.pdf
PWC https://paperswithcode.com/paper/depth-growing-for-neural-machine-translation
Repo https://github.com/apeterswu/Depth_Growing_NMT
Framework pytorch
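
The two-stage recipe can be sketched on a bare encoder stack (assuming PyTorch >= 1.9 for `batch_first`): stage 1 trains a shallow model; stage 2 freezes it and trains newly stacked top layers. The paper additionally combines bottom and top representations and grows the decoder in the same way, which is omitted here.

```python
import torch
import torch.nn as nn

def make_layer():
    return nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

bottom = nn.TransformerEncoder(make_layer(), num_layers=6)  # stage-1 (pretrained) stack
top = nn.TransformerEncoder(make_layer(), num_layers=2)     # newly grown layers

for p in bottom.parameters():  # stage 2: freeze the shallow model
    p.requires_grad = False

opt = torch.optim.Adam(top.parameters(), lr=1e-4)  # optimize only the new depth
src = torch.rand(8, 20, 64)                        # (batch, tokens, d_model)
out = top(bottom(src))
```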

A Neural Approach to Irony Generation

Title A Neural Approach to Irony Generation
Authors Mengdi Zhu, Zhiwei Yu, Xiaojun Wan
Abstract Irony can not only express stronger emotions but also convey a sense of humor. With the growth of social media, irony is widely used in public. Although many prior studies have addressed irony detection, few have focused on irony generation. The main challenges for irony generation are the lack of a large-scale irony dataset and the difficulty of modeling the ironic pattern. In this work, we first systematically define irony generation as a style transfer task. To address the lack of data, we make use of Twitter to build a large-scale dataset. We also design a combination of rewards for reinforcement learning to control the generation of ironic sentences. Experimental results demonstrate the effectiveness of our model in terms of irony accuracy, sentiment preservation, and content preservation.
Tasks Style Transfer
Published 2019-09-13
URL https://arxiv.org/abs/1909.06200v2
PDF https://arxiv.org/pdf/1909.06200v2.pdf
PWC https://paperswithcode.com/paper/a-neural-approach-to-irony-generation
Repo https://github.com/zmd971202/IronyGeneration
Framework pytorch
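
The reward combination can be expressed as a simple weighted sum over the three evaluation axes the abstract names: an irony-classifier score, a sentiment-preservation score, and a content-preservation metric such as BLEU. The weights and scoring functions below are placeholders for illustration, not the paper's exact formulation.

```python
def combined_reward(irony_prob, sentiment_match, content_bleu,
                    w_irony=1.0, w_sent=0.5, w_content=0.5):
    """Weighted sum of irony accuracy, sentiment preservation, and content
    preservation, used as the reinforcement-learning reward (hypothetical weights)."""
    return w_irony * irony_prob + w_sent * sentiment_match + w_content * content_bleu

# e.g. a candidate judged 0.9 ironic, sentiment preserved (1.0), BLEU 0.4:
print(combined_reward(0.9, 1.0, 0.4))  # 1.6
```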

Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates

Title Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates
Authors Miles D. Cranmer, Richard Galvez, Lauren Anderson, David N. Spergel, Shirley Ho
Abstract We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multi-dimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallaxes. Dust estimation and dereddening are done iteratively inside the model and without prior distance information, using the Bayestar map. The signal-to-noise (precision) of distance measurements improves on average by more than 48% over the raw Gaia data, and we also demonstrate how the accuracy of distances has improved over other models, especially in the noisy-parallax regime. Applications are discussed, including significantly improved Milky Way disk separation and substructure detection. We conclude with a discussion of future work, which exploits the normalizing flow architecture to allow us to exactly marginalize over missing photometry, enabling the inclusion of many surveys without losing coverage.
Tasks
Published 2019-08-21
URL https://arxiv.org/abs/1908.08045v1
PDF https://arxiv.org/pdf/1908.08045v1.pdf
PWC https://paperswithcode.com/paper/modeling-the-gaia-color-magnitude-diagram
Repo https://github.com/MilesCranmer/public_CMD_normalizing_flow
Framework none
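
As background, a normalizing flow learns a density by mapping data through invertible layers to a simple base distribution and training on exact log-likelihood. Below is a minimal RealNVP-style affine-coupling flow fit to a toy 2D density standing in for (color, magnitude); the paper's flow is deeper, handles noisy parallaxes and dereddening inside the model, and none of that is reproduced here.

```python
import torch
import torch.nn as nn

class Coupling(nn.Module):
    """One affine coupling layer over 2D inputs; alternate which half is transformed."""
    def __init__(self, flip):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))
        self.flip = flip

    def forward(self, x):  # returns z and the log|det Jacobian|
        a, b = x.chunk(2, dim=-1)
        if self.flip:
            a, b = b, a
        s, t = self.net(a).chunk(2, dim=-1)   # scale and shift from the untouched half
        b = b * torch.exp(s) + t
        z = torch.cat([b, a] if self.flip else [a, b], dim=-1)
        return z, s.squeeze(-1)

flows = nn.ModuleList([Coupling(flip=i % 2 == 1) for i in range(4)])
base = torch.distributions.Normal(0.0, 1.0)

def log_prob(x):
    logdet = 0.0
    for f in flows:
        x, s = f(x)
        logdet = logdet + s
    return base.log_prob(x).sum(-1) + logdet  # change-of-variables formula

opt = torch.optim.Adam(flows.parameters(), lr=1e-3)
# toy Gaussian 'color-magnitude' data (illustrative, not Gaia)
data = torch.randn(512, 2) * torch.tensor([0.3, 1.5]) + torch.tensor([0.8, 4.0])
for _ in range(300):  # maximize the exact data likelihood
    loss = -log_prob(data).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```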