Paper Group AWR 309
Cross View Fusion for 3D Human Pose Estimation
Title | Cross View Fusion for 3D Human Pose Estimation |
Authors | Haibo Qiu, Chunyu Wang, Jingdong Wang, Naiyan Wang, Wenjun Zeng |
Abstract | We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors in our model. It consists of two separate steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into the CNN to jointly estimate 2D poses for multiple views, so that the 2D pose estimate for each view already benefits from the other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses; it gradually improves the accuracy of the 3D pose at affordable computational cost. We test our method on two public datasets, H36M and Total Capture. The Mean Per Joint Position Errors on the two datasets are 26mm and 29mm, outperforming the state of the art by a large margin (26mm vs 52mm, 29mm vs 35mm). Our code is released at \url{https://github.com/microsoft/multiview-human-pose-estimation-pytorch}. |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.01203v1 |
https://arxiv.org/pdf/1909.01203v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-view-fusion-for-3d-human-pose |
Repo | https://github.com/microsoft/multiview-human-pose-estimation-pytorch |
Framework | pytorch |
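To make the fusion idea concrete, here is a minimal PyTorch sketch of cross-view heatmap fusion: each view's joint heatmaps are refined with a learned linear combination of the flattened heatmaps from every other view. This is a simplification of the paper's fusion layer (which ties the weights to epipolar geometry); the shapes and zero initialisation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, num_views: int, heatmap_size: int = 32):
        super().__init__()
        n = heatmap_size * heatmap_size
        # One learnable pixel-to-pixel mixing matrix per ordered view pair,
        # initialised to zero so fusion starts as the identity mapping.
        self.mix = nn.ParameterList(
            [nn.Parameter(torch.zeros(n, n))
             for _ in range(num_views * (num_views - 1))])
        self.num_views = num_views

    def forward(self, heatmaps):  # (views, batch, joints, H, W)
        v, b, j, h, w = heatmaps.shape
        flat = heatmaps.reshape(v, b * j, h * w)
        fused, k = [], 0
        for i in range(v):
            out = flat[i]
            for o in range(v):
                if o == i:
                    continue
                out = out + flat[o] @ self.mix[k]  # view o's evidence for view i
                k += 1
            fused.append(out.reshape(b, j, h, w))
        return torch.stack(fused)

hm = torch.rand(4, 2, 17, 32, 32)              # 4 views, batch 2, 17 joints
print(CrossViewFusion(num_views=4)(hm).shape)  # torch.Size([4, 2, 17, 32, 32])
```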
Blind Super-Resolution Kernel Estimation using an Internal-GAN
Title | Blind Super-Resolution Kernel Estimation using an Internal-GAN |
Authors | Sefi Bell-Kligler, Assaf Shocher, Michal Irani |
Abstract | Super resolution (SR) methods typically assume that the low-resolution (LR) image was downscaled from the unknown high-resolution (HR) image by a fixed ‘ideal’ downscaling kernel (e.g. Bicubic downscaling). However, this is rarely the case in real LR images, in contrast to synthetically generated SR datasets. When the assumed downscaling kernel deviates from the true one, the performance of SR methods significantly deteriorates. This gave rise to Blind-SR - namely, SR when the downscaling kernel (“SR-kernel”) is unknown. It was further shown that the true SR-kernel is the one that maximizes the recurrence of patches across scales of the LR image. In this paper we show how this powerful cross-scale recurrence property can be realized using Deep Internal Learning. We introduce “KernelGAN”, an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches. Its Generator is trained to produce a downscaled version of the LR test image, such that its Discriminator cannot distinguish between the patch distribution of the downscaled image, and the patch distribution of the original LR image. The Generator, once trained, constitutes the downscaling operation with the correct image-specific SR-kernel. KernelGAN is fully unsupervised, requires no training data other than the input image itself, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms. |
Tasks | Super-Resolution |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06581v6 |
https://arxiv.org/pdf/1909.06581v6.pdf | |
PWC | https://paperswithcode.com/paper/blind-super-resolution-kernel-estimation |
Repo | https://github.com/sefibk/KernelGAN |
Framework | pytorch |
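The training setup can be sketched in a few lines of PyTorch: the generator below is a deep *linear* network (convolutions without activations) that downscales a crop of the LR image by 2x, and a fully convolutional patch discriminator tries to tell downscaled crops from crops of the original image. Architectures, the fixed crops, and iteration counts are placeholders, not the authors' exact choices.

```python
import torch
import torch.nn as nn

G = nn.Sequential(                                # deep linear generator:
    nn.Conv2d(3, 64, 7, padding=3, bias=False),   # composed convs collapse to
    nn.Conv2d(64, 64, 5, padding=2, bias=False),  # a single kernel = SR-kernel
    nn.Conv2d(64, 3, 1, stride=2, bias=False),    # stride 2 = 2x downscaling
)
D = nn.Sequential(                                # fully-convolutional patch critic
    nn.Conv2d(3, 64, 7, padding=3), nn.ReLU(),
    nn.Conv2d(64, 64, 1), nn.ReLU(),
    nn.Conv2d(64, 1, 1), nn.Sigmoid(),
)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

lr_img = torch.rand(1, 3, 128, 128)               # the single LR test image
for step in range(2):                             # real training runs ~3000 steps
    crop = lr_img[..., :64, :64]                  # random crops in practice
    fake = G(crop)                                # 32x32 downscaled crop
    real = lr_img[..., :32, :32]
    pred_real, pred_fake = D(real), D(fake.detach())
    d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
             bce(pred_fake, torch.zeros_like(pred_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    pred = D(fake)                                # fool the updated critic
    g_loss = bce(pred, torch.ones_like(pred))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```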
TabNet: Attentive Interpretable Tabular Learning
Title | TabNet: Attentive Interpretable Tabular Learning |
Authors | Sercan O. Arik, Tomas Pfister |
Abstract | We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and more efficient learning as the learning capacity is used for the most salient features. We demonstrate that TabNet outperforms other neural network and decision tree variants on a wide range of non-performance-saturated tabular datasets and yields interpretable feature attributions plus insights into the global model behavior. Finally, for the first time to our knowledge, we demonstrate self-supervised learning for tabular data, significantly improving performance with unsupervised representation learning when unlabeled data is abundant. |
Tasks | Decision Making, Feature Selection, Representation Learning, Unsupervised Representation Learning |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07442v4 |
https://arxiv.org/pdf/1908.07442v4.pdf | |
PWC | https://paperswithcode.com/paper/tabnet-attentive-interpretable-tabular |
Repo | https://github.com/mgrankin/fast_tabnet |
Framework | pytorch |
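A single TabNet decision step can be sketched as follows: an attentive transformer produces a sparse feature-selection mask via sparsemax, the mask multiplies the input features, and a prior term discourages reusing the same features at later steps. The real model adds shared and step-specific GLU blocks, batch normalisation, and a relaxation parameter; this stripped-down version only illustrates the masking mechanics.

```python
import torch
import torch.nn as nn

def sparsemax(z):
    """Martins & Astudillo (2016): Euclidean projection onto the simplex."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cssv = z_sorted.cumsum(-1) - 1
    support = (z_sorted - cssv / k > 0).sum(dim=-1, keepdim=True).clamp(min=1)
    tau = cssv.gather(-1, support - 1) / support.to(z.dtype)
    return torch.clamp(z - tau, min=0)

class TabNetStep(nn.Module):
    def __init__(self, d_in, d_hidden=8):
        super().__init__()
        self.attn = nn.Linear(d_in, d_in)  # attentive transformer (simplified)
        self.feat = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())

    def forward(self, x, prior):
        mask = sparsemax(self.attn(x) * prior)  # sparse feature-selection mask
        prior = prior * (1.0 - mask)            # discourage reusing features
        return self.feat(x * mask), mask, prior

x = torch.rand(4, 10)              # batch of 4 rows, 10 tabular features
prior = torch.ones_like(x)         # all features available at step 1
step = TabNetStep(d_in=10)
out, mask, prior = step(x, prior)  # `mask` is the interpretable attribution
```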
Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset
Title | Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset |
Authors | Kohei Ozaki, Shuhei Yokoo |
Abstract | The Google-Landmarks-v2 dataset is the biggest worldwide landmarks dataset, characterized by a large degree of noisiness and diversity. We present a novel landmark retrieval/recognition system, robust to a noisy and diverse dataset, by our team, smlyaka. Our approach is based on deep convolutional neural networks with metric learning, trained with cosine-softmax based losses. Deep metric learning methods are usually sensitive to noise, which can hinder learning a reliable metric. To address this issue, we develop an automated data cleaning system. Besides, we devise a discriminative re-ranking method to address the diversity of the dataset for landmark retrieval. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle. |
Tasks | Metric Learning |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04087v2 |
https://arxiv.org/pdf/1906.04087v2.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-landmark-retrievalrecognition |
Repo | https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution |
Framework | pytorch |
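The "cosine-softmax based losses" mentioned above belong to the normalized-embedding family (CosFace, ArcFace, and relatives). A minimal sketch of a CosFace-style additive-margin variant, chosen here as one concrete instance:

```python
import torch
import torch.nn.functional as F

def cosine_softmax_loss(embeddings, weights, labels, scale=30.0, margin=0.2):
    emb = F.normalize(embeddings, dim=1)  # unit-norm descriptors
    w = F.normalize(weights, dim=1)       # unit-norm class centres
    cos = emb @ w.t()                     # cosine-similarity logits
    cos = cos - margin * F.one_hot(labels, w.size(0)).float()  # target margin
    return F.cross_entropy(scale * cos, labels)

emb = torch.randn(8, 512)                              # CNN descriptors
centres = torch.randn(1000, 512, requires_grad=True)   # one centre per landmark
labels = torch.randint(0, 1000, (8,))
print(cosine_softmax_loss(emb, centres, labels))
```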
Handheld Multi-Frame Super-Resolution
Title | Handheld Multi-Frame Super-Resolution |
Authors | Bartlomiej Wronski, Ignacio Garcia-Dorado, Manfred Ernst, Damien Kelly, Michael Krainin, Chia-Kai Liang, Marc Levoy, Peyman Milanfar |
Abstract | Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light-gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost the signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google’s flagship phone. |
Tasks | Demosaicking, Multi-Frame Super-Resolution, Super-Resolution |
Published | 2019-05-08 |
URL | https://arxiv.org/abs/1905.03277v1 |
https://arxiv.org/pdf/1905.03277v1.pdf | |
PWC | https://paperswithcode.com/paper/190503277 |
Repo | https://github.com/JVision/Handheld-Multi-Frame-Super-Resolution |
Framework | none |
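As a toy illustration of the merge step only: given already-aligned grayscale frames, pixels that disagree with the reference frame (e.g. due to local motion or occlusion) are down-weighted before averaging. The actual pipeline operates on raw CFA bursts with sub-pixel alignment and anisotropic kernel regression; none of that is reproduced here.

```python
import numpy as np

def merge_burst(frames, sigma=0.05):
    """frames: (N, H, W) aligned burst; returns a robust weighted average."""
    ref = frames[0]
    # Down-weight pixels that disagree with the reference (motion, occlusion).
    weights = np.exp(-((frames - ref) ** 2) / (2 * sigma ** 2))
    return (weights * frames).sum(axis=0) / weights.sum(axis=0)

burst = np.random.rand(8, 64, 64).astype(np.float32)
print(merge_burst(burst).shape)  # (64, 64)
```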
A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce
Title | A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce |
Authors | Abdul-Saboor Sheikh, Romain Guigoures, Evgenii Koriagin, Yuen King Ho, Reza Shirvany, Roland Vollgraf, Urs Bergmann |
Abstract | Personalized size and fit recommendations are of crucial importance for any fashion e-commerce platform. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred due to size-related returns. Traditional collaborative filtering algorithms seek to model customer preferences based on their previous orders. A typical challenge for such methods stems from the extreme sparsity of customer-article orders. To alleviate this problem, we propose a deep learning based content-collaborative methodology for personalized size and fit recommendation. Our proposed method can ingest arbitrary customer and article data and can model multiple individuals or intents behind a single account. The method optimizes a global set of parameters to learn population-level abstractions of size- and fit-relevant information from observed customer-article interactions. It further employs customer- and article-specific embedding variables to learn their properties. Together with learned entity embeddings, the method maps additional customer and article attributes into a latent space to derive personalized recommendations. Application of our method to two publicly available datasets demonstrates an improvement over the state-of-the-art published results. On two proprietary datasets, one containing fit feedback from fashion experts and the other involving customer purchases, we further outperform comparable methodologies, including a recent Bayesian approach for size recommendation. |
Tasks | Entity Embeddings |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09844v1 |
https://arxiv.org/pdf/1907.09844v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-system-for-predicting-size |
Repo | https://github.com/NeverInAsh/fit-recommendation |
Framework | tf |
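A minimal sketch of the content-collaborative structure described above: per-customer and per-article embedding variables are concatenated with additional attributes and mapped by a shared MLP (the population-level parameters) to a fit class. Entity counts, attribute sizes, and the three-class target are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SizeFitModel(nn.Module):
    def __init__(self, n_customers, n_articles, n_attrs, dim=32):
        super().__init__()
        self.cust = nn.Embedding(n_customers, dim)  # per-customer latent
        self.art = nn.Embedding(n_articles, dim)    # per-article latent
        self.mlp = nn.Sequential(                   # population-level parameters
            nn.Linear(2 * dim + n_attrs, 64), nn.ReLU(),
            nn.Linear(64, 3),                       # too small / fit / too large
        )

    def forward(self, cust_id, art_id, attrs):
        z = torch.cat([self.cust(cust_id), self.art(art_id), attrs], dim=1)
        return self.mlp(z)

model = SizeFitModel(n_customers=1000, n_articles=500, n_attrs=8)
logits = model(torch.tensor([3]), torch.tensor([7]), torch.rand(1, 8))
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))  # label: "fit"
```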
Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval
Title | Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval |
Authors | Li Deng, Shuo Zhang, Krisztian Balog |
Abstract | Tables contain valuable knowledge in a structured form. We employ neural language modeling approaches to embed tabular data into vector spaces. Specifically, we consider different table elements, such as captions, column headings, and cells, for training word and entity embeddings. These embeddings are then utilized in three table-related tasks, row population, column population, and table retrieval, by incorporating them into existing retrieval models as additional semantic similarity signals. Evaluation results show that table embeddings can significantly improve upon the performance of state-of-the-art baselines. |
Tasks | Entity Embeddings, Language Modelling, Semantic Similarity, Semantic Textual Similarity |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1906.00041v1 |
https://arxiv.org/pdf/1906.00041v1.pdf | |
PWC | https://paperswithcode.com/paper/190600041 |
Repo | https://github.com/iai-group/sigir2019-table2vec |
Framework | none |
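A rough sketch of the training setup using gensim's word2vec: each table contributes a pseudo-sentence built from its caption tokens, headings, and linked entities, and skip-gram embeddings are trained over those sequences. The toy tables and the exact way elements are concatenated are assumptions for illustration.

```python
from gensim.models import Word2Vec

tables = [
    {"caption": ["list", "of", "tallest", "buildings"],
     "headings": ["rank", "name", "height"],
     "entities": ["Burj_Khalifa", "Shanghai_Tower"]},
    {"caption": ["european", "capitals"],
     "headings": ["country", "capital"],
     "entities": ["France", "Paris", "Germany", "Berlin"]},
]
# One pseudo-sentence per table; words and entities share a vocabulary.
sentences = [t["caption"] + t["headings"] + t["entities"] for t in tables]
model = Word2Vec(sentences, vector_size=64, window=5, min_count=1, sg=1)
print(model.wv.most_similar("Paris", topn=2))
```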
Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game
Title | Self-supervised GAN: Analysis and Improvement with Multi-class Minimax Game |
Authors | Ngoc-Trung Tran, Viet-Hung Tran, Ngoc-Bao Nguyen, Linxiao Yang, Ngai-Man Cheung |
Abstract | Self-supervised (SS) learning is a powerful approach for representation learning using unlabeled data. Recently, it has been applied to Generative Adversarial Network (GAN) training. Specifically, SS tasks were proposed to address the catastrophic forgetting issue in the GAN discriminator. In this work, we perform an in-depth analysis to understand how SS tasks interact with the learning of the generator. From the analysis, we identify issues with existing SS tasks that allow a severely mode-collapsed generator to excel at them. To address these issues, we propose new SS tasks based on a multi-class minimax game. The competition between our proposed SS tasks in the game encourages the generator to learn the data distribution and generate diverse samples. We provide both theoretical and empirical analysis to support that our proposed SS tasks have better convergence properties. We conduct experiments to incorporate our proposed SS tasks into two different GAN baseline models. Our approach establishes state-of-the-art FID scores on CIFAR-10, CIFAR-100, STL-10, CelebA, Imagenet $32\times32$ and Stacked-MNIST datasets, outperforming existing works by considerable margins in some cases. Our unconditional GAN model approaches the performance of conditional GANs without using labeled data. Our code: https://github.com/tntrung/msgan |
Tasks | Image Generation, Representation Learning |
Published | 2019-11-16 |
URL | https://arxiv.org/abs/1911.06997v2 |
https://arxiv.org/pdf/1911.06997v2.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-gan-analysis-and-improvement-1 |
Repo | https://github.com/tntrung/msgan |
Framework | tf |
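On our reading of the abstract, the multi-class minimax SS task can be sketched like this: an auxiliary classifier assigns rotated real images to one of four rotation classes and rotated fakes to an extra "fake" class, while the generator is rewarded when its rotated samples are classified as genuine rotations. The classifier below is a placeholder head, and the loss weighting against the usual GAN losses is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_batch(x):
    """Return all four rotations of x and their rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(rots), labels

C = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))  # 4 rots + "fake"
real = torch.rand(8, 3, 32, 32)
fake = torch.rand(8, 3, 32, 32)       # stands in for generator output

# Classifier: real rotations -> their rotation class, fakes -> the 5th class.
r_imgs, r_lbl = rotate_batch(real)
f_imgs, f_lbl = rotate_batch(fake)
fake_cls = torch.full((f_imgs.size(0),), 4, dtype=torch.long)
c_loss = F.cross_entropy(C(r_imgs), r_lbl) + \
         F.cross_entropy(C(f_imgs.detach()), fake_cls)
# Generator: rotated fakes should be classified as genuine rotations.
g_ss_loss = F.cross_entropy(C(f_imgs), f_lbl)
```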
Word Embeddings for Entity-annotated Texts
Title | Word Embeddings for Entity-annotated Texts |
Authors | Satya Almasian, Andreas Spitz, Michael Gertz |
Abstract | Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance. |
Tasks | Entity Embeddings, Information Retrieval, Word Embeddings |
Published | 2019-02-06 |
URL | https://arxiv.org/abs/1902.02078v3 |
https://arxiv.org/pdf/1902.02078v3.pdf | |
PWC | https://paperswithcode.com/paper/word-embeddings-for-entity-annotated-texts |
Repo | https://github.com/satya77/Entity_Embedding |
Framework | none |
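The co-occurrence-graph route the abstract mentions can be sketched as a DeepWalk-style procedure: build a word/entity co-occurrence graph, sample short random walks, and feed the walks to word2vec as sentences. The tiny graph and walk parameters here are illustrative.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

g = nx.Graph()
g.add_edges_from([("Berlin", "capital"), ("Berlin", "Germany"),
                  ("Paris", "capital"), ("Paris", "France")])

def random_walks(graph, num_walks=10, length=5):
    walks = []
    for _ in range(num_walks):
        for node in graph.nodes:
            walk = [node]
            for _ in range(length - 1):
                walk.append(random.choice(list(graph.neighbors(walk[-1]))))
            walks.append(walk)
    return walks

# Walks act as sentences; nodes (words and entities) become the vocabulary.
model = Word2Vec(random_walks(g), vector_size=32, window=3, min_count=1, sg=1)
print(model.wv.similarity("Berlin", "Paris"))
```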
MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming
Title | MultiVerse: Causal Reasoning using Importance Sampling in Probabilistic Programming |
Authors | Yura Perov, Logan Graham, Kostis Gourgoulias, Jonathan G. Richens, Ciarán M. Lee, Adam Baker, Saurabh Johri |
Abstract | We elaborate on using importance sampling for causal reasoning, in particular for counterfactual inference. We show how this can be implemented natively in probabilistic programming. By considering the structure of the counterfactual query, one can significantly optimise the inference process. We also consider design choices to enable further optimisations. We introduce MultiVerse, a probabilistic programming prototype engine for approximate causal reasoning. We provide experimental results and compare with Pyro, an existing probabilistic programming framework that offers some causal reasoning tools. |
Tasks | Counterfactual Inference, Probabilistic Programming |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.08091v2 |
https://arxiv.org/pdf/1910.08091v2.pdf | |
PWC | https://paperswithcode.com/paper/multiverse-causal-reasoning-using-importance |
Repo | https://github.com/babylonhealth/multiverse |
Framework | none |
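The counterfactual-by-importance-sampling recipe is easy to illustrate on a toy structural model (this example is ours, not MultiVerse's API): abduction weights shared noise samples by how well they reproduce the observation, action intervenes on the input, and prediction re-runs the model with the same noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(0.0, 1.0, n)          # exogenous noise, shared by both worlds

def outcome(x, u):                   # structural equation: Y = 2X + U
    return 2 * x + u

x_obs, y_obs = 1.0, 3.5              # observed world
# Abduction: weight each noise sample by the likelihood of the observation.
w = np.exp(-0.5 * ((outcome(x_obs, u) - y_obs) / 0.1) ** 2)
w /= w.sum()
# Action + prediction: intervene do(X = 0) and re-run with the same noise.
y_cf = outcome(0.0, u)
print((w * y_cf).sum())              # E[Y | do(X=0), evidence] ~= 1.5
```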
Consensus Maximization Tree Search Revisited
Title | Consensus Maximization Tree Search Revisited |
Authors | Zhipeng Cai, Tat-Jun Chin, Vladlen Koltun |
Abstract | Consensus maximization is widely used for robust fitting in computer vision. However, solving it exactly, i.e., finding the globally optimal solution, is intractable. A* tree search, which has been shown to be fixed-parameter tractable, is one of the most efficient exact methods, though it is still limited to small inputs. We make two key contributions towards improving A* tree search. First, we show that the consensus maximization tree structure used previously actually contains paths that connect nodes at both adjacent and non-adjacent levels. Crucially, paths connecting non-adjacent levels are redundant for tree search, but they were not avoided previously. We propose a new acceleration strategy that avoids such redundant paths. In the second contribution, we show that the existing branch pruning technique also deteriorates quickly with the problem dimension. We then propose a new branch pruning technique that is less dimension-sensitive to address this issue. Experiments show that both new techniques can significantly accelerate A* tree search, making it reasonably efficient on inputs that were previously out of reach. |
Tasks | |
Published | 2019-08-06 |
URL | https://arxiv.org/abs/1908.02021v3 |
https://arxiv.org/pdf/1908.02021v3.pdf | |
PWC | https://paperswithcode.com/paper/consensus-maximization-tree-search-revisited |
Repo | https://github.com/ZhipengCai/MaxConTreeSearch |
Framework | none |
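For intuition, consensus maximization as a tree search can be sketched naively: each tree level removes one more point, and the search stops at the shallowest level where the remaining points fit a model within the inlier threshold. The least-squares fit below stands in for the minimax (Chebyshev) fit used in exact methods, and no A* heuristic or pruning is included.

```python
import itertools
import numpy as np

def max_residual(points):
    x, y = points[:, 0], points[:, 1]
    a, b = np.polyfit(x, y, 1)                 # stand-in for a minimax fit
    return np.abs(y - (a * x + b)).max()

def consensus_search(points, eps=0.1, max_outliers=3):
    n = len(points)
    for level in range(max_outliers + 1):      # tree level = #points removed
        for removed in itertools.combinations(range(n), level):
            keep = np.delete(points, removed, axis=0)
            if max_residual(keep) <= eps:      # feasible node found
                return keep, removed
    return None, None

pts = np.array([[0, 0.0], [1, 1.02], [2, 1.97], [3, 3.01], [1.5, 4.0]])
inliers, removed = consensus_search(pts)
print(removed)  # (4,) -- the gross outlier
```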
DeepClean – self-supervised artefact rejection for intensive care waveform data using deep generative learning
Title | DeepClean – self-supervised artefact rejection for intensive care waveform data using deep generative learning |
Authors | Tom Edinburgh, Peter Smielewski, Marek Czosnyka, Stephen J. Eglen, Ari Ercole |
Abstract | Waveform physiological data is important in the treatment of critically ill patients in the intensive care unit. Such recordings are susceptible to artefacts, which must be removed before the data can be re-used for alerting or reprocessed for other clinical or research purposes. Accurate removal of artefacts reduces bias and uncertainty in clinical assessment, as well as the false positive rate of intensive care unit alarms, and is therefore a key component in providing optimal clinical care. In this work, we present DeepClean, a prototype self-supervised artefact detection system using a convolutional variational autoencoder deep neural network that avoids costly and painstaking manual annotation, requiring only easily obtained ‘good’ data for training. For a test case with invasive arterial blood pressure, we demonstrate that our algorithm can detect the presence of an artefact within a 10-second sample of data with sensitivity and specificity around 90%. Furthermore, DeepClean was able to identify regions of artefact within such samples with high accuracy, and we show that it significantly outperforms a baseline principal component analysis approach in both signal reconstruction and artefact detection. DeepClean learns a generative model and therefore may also be used for imputation of missing data. |
Tasks | Imputation |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03129v4 |
https://arxiv.org/pdf/1908.03129v4.pdf | |
PWC | https://paperswithcode.com/paper/deepclean-self-supervised-artefact-rejection |
Repo | https://github.com/tedinburgh/deepclean |
Framework | none |
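The detection principle reduces to VAE-based anomaly scoring: train a variational autoencoder only on artefact-free windows, then flag test windows whose reconstruction error is large. The paper uses a convolutional VAE on 10-second samples; the dense sketch below, with made-up sizes, only shows the training objective and scoring idea.

```python
import torch
import torch.nn as nn

class WaveformVAE(nn.Module):
    def __init__(self, length=1250, latent=16):   # e.g. 10 s at 125 Hz
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(length, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent)
        self.logvar = nn.Linear(128, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                 nn.Linear(128, length))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return self.dec(z), mu, logvar

def elbo_loss(recon, x, mu, logvar):
    rec = ((recon - x.flatten(1)) ** 2).sum(1).mean()         # reconstruction
    kld = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(1).mean()
    return rec + kld

vae = WaveformVAE()
x = torch.randn(4, 1, 1250)          # batch of arterial-pressure windows
recon, mu, logvar = vae(x)
loss = elbo_loss(recon, x, mu, logvar)
# At test time, flag windows whose reconstruction error exceeds a threshold.
```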
Depth Growing for Neural Machine Translation
Title | Depth Growing for Neural Machine Translation |
Authors | Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu |
Abstract | While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks onto the NMT model results in no improvement and can even reduce performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which yields significant improvements over strong Transformer baselines on the WMT$14$ English$\to$German and English$\to$French translation tasks\footnote{Our code is available at \url{https://github.com/apeterswu/Depth_Growing_NMT}}. |
Tasks | Machine Translation, Text Classification |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.01968v1 |
https://arxiv.org/pdf/1907.01968v1.pdf | |
PWC | https://paperswithcode.com/paper/depth-growing-for-neural-machine-translation |
Repo | https://github.com/apeterswu/Depth_Growing_NMT |
Framework | pytorch |
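The two-stage recipe can be sketched as follows: train a shallow stack to convergence, freeze it, then stack fresh blocks on top and train only those in the second stage. The paper's actual design has three specially designed components connecting the stages; this sketch only shows the freeze-and-grow skeleton with illustrative sizes.

```python
import torch
import torch.nn as nn

layer = lambda: nn.TransformerEncoderLayer(d_model=64, nhead=4)
bottom = nn.TransformerEncoder(layer(), num_layers=4)
# ... stage 1: train `bottom` (with the rest of the NMT model) to convergence ...

for p in bottom.parameters():        # stage 2: freeze the trained stack
    p.requires_grad = False
top = nn.TransformerEncoder(layer(), num_layers=2)
grown = nn.Sequential(bottom, top)   # deeper model; only `top` is trained

out = grown(torch.randn(10, 2, 64))  # (seq_len, batch, d_model)
```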
A Neural Approach to Irony Generation
Title | A Neural Approach to Irony Generation |
Authors | Mengdi Zhu, Zhiwei Yu, Xiaojun Wan |
Abstract | Irony can not only express stronger emotions but also convey a sense of humor. With the development of social media, irony is widely used in public. Although many prior studies have addressed irony detection, few focus on irony generation. The main challenges for irony generation are the lack of a large-scale irony dataset and the difficulty of modeling the ironic pattern. In this work, we first systematically define irony generation as a style transfer task. To address the lack of data, we make use of Twitter to build a large-scale dataset. We also design a combination of rewards for reinforcement learning to control the generation of ironic sentences. Experimental results demonstrate the effectiveness of our model in terms of irony accuracy, sentiment preservation, and content preservation. |
Tasks | Style Transfer |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06200v2 |
https://arxiv.org/pdf/1909.06200v2.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-approach-to-irony-generation |
Repo | https://github.com/zmd971202/IronyGeneration |
Framework | pytorch |
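The combined reinforcement-learning reward can be sketched as a weighted sum of three scores: irony strength from a classifier, sentiment preservation, and content preservation (BLEU here). The scorers and weights below are placeholders, not the paper's trained models.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def combined_reward(source, generated, irony_clf, sentiment, w=(0.4, 0.3, 0.3)):
    r_irony = irony_clf(generated)                 # P(generated is ironic)
    r_sent = 1.0 - abs(sentiment(source) - sentiment(generated))
    r_content = sentence_bleu([source.split()], generated.split(),
                              smoothing_function=SmoothingFunction().method1)
    return w[0] * r_irony + w[1] * r_sent + w[2] * r_content

# Toy lambdas stand in for trained classifier and sentiment models:
reward = combined_reward(
    "the weather ruined our trip",
    "what lovely weather for a trip",
    irony_clf=lambda s: 0.9,
    sentiment=lambda s: 0.2 if "ruined" in s else 0.8,
)
print(reward)
```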
Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates
Title | Modeling the Gaia Color-Magnitude Diagram with Bayesian Neural Flows to Constrain Distance Estimates |
Authors | Miles D. Cranmer, Richard Galvez, Lauren Anderson, David N. Spergel, Shirley Ho |
Abstract | We demonstrate an algorithm for learning a flexible color-magnitude diagram from noisy parallax and photometry measurements using a normalizing flow, a deep neural network capable of learning an arbitrary multi-dimensional probability distribution. We present a catalog of 640M photometric distance posteriors to nearby stars derived from this data-driven model using Gaia DR2 photometry and parallaxes. Dust estimation and dereddening are done iteratively inside the model and without prior distance information, using the Bayestar map. The signal-to-noise (precision) of distance measurements improves on average by more than 48% over the raw Gaia data, and we also demonstrate how the accuracy of distances has improved over other models, especially in the noisy-parallax regime. Applications are discussed, including significantly improved Milky Way disk separation and substructure detection. We conclude with a discussion of future work, which exploits the normalizing flow architecture to allow us to exactly marginalize over missing photometry, enabling the inclusion of many surveys without losing coverage. |
Tasks | |
Published | 2019-08-21 |
URL | https://arxiv.org/abs/1908.08045v1 |
https://arxiv.org/pdf/1908.08045v1.pdf | |
PWC | https://paperswithcode.com/paper/modeling-the-gaia-color-magnitude-diagram |
Repo | https://github.com/MilesCranmer/public_CMD_normalizing_flow |
Framework | none |
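A normalizing flow of the kind described assigns densities via the change-of-variables formula; below is a single RealNVP-style affine coupling layer in PyTorch as one stackable building block (the paper's exact flow architecture may differ). Training maximizes the likelihood of observed (color, magnitude) pairs.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))  # predicts scale, shift

    def forward(self, x):                # x -> z, with log|det J|
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                # keep scales well-behaved
        z2 = x2 * s.exp() + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

flow = AffineCoupling()
base = torch.distributions.Normal(torch.zeros(2), torch.ones(2))
x = torch.randn(8, 2)                   # e.g. (color, absolute magnitude) pairs
z, logdet = flow(x)
log_prob = base.log_prob(z).sum(dim=1) + logdet  # change of variables
loss = -log_prob.mean()                 # fit by maximum likelihood
```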