January 25, 2020

Paper Group NAWR 34

SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data

Title SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data
Authors Trung Ngo Trong, Roger Kramer, Juha Mehtonen, Gerardo González, Ville Hautamäki, Merja Heinäniemi
Abstract Single-cell transcriptomics offers a tool to study the diversity of cell phenotypes through snapshots of the abundance of mRNA in individual cells. Often there is additional information available besides the single cell gene expression counts, such as bulk transcriptome data from the same tissue, or quantification of surface protein levels from the same cells. In this study, we propose models based on the Bayesian generative approach, where protein quantification available as CITE-seq counts from the same cells are used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture.
Tasks Single-cell modeling
Published 2019-05-08
URL https://www.biorxiv.org/content/10.1101/631382v1
PDF https://www.biorxiv.org/content/biorxiv/early/2019/05/08/631382.full-text.pdf
PWC https://paperswithcode.com/paper/sisua-semi-supervised-generative-autoencoder
Repo https://github.com/trungnt13/sisua
Framework none

Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure

Title Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure
Authors Karan Goel, Emma Brunskill
Abstract Clustering methods and latent variable models are often used as tools for pattern mining and discovery of latent structure in time-series data. In this work, we consider the problem of learning procedural abstractions from possibly high-dimensional observational sequences, such as video demonstrations. Given a dataset of time-series, the goal is to identify the latent sequence of steps common to them and label each time-series with the temporal extent of these procedural steps. We introduce a hierarchical Bayesian model called Prism that models the realization of a common procedure across multiple time-series, and can recover procedural abstractions with supervision. We also bring to light two characteristics ignored by traditional evaluation criteria when evaluating latent temporal labelings (temporal clusterings) – segment structure, and repeated structure – and develop new metrics tailored to their evaluation. We demonstrate that our metrics improve interpretability and ease of analysis for evaluation on benchmark time-series datasets. Results on benchmark and video datasets indicate that Prism outperforms standard sequence models as well as state-of-the-art techniques in identifying procedural abstractions.
Tasks Latent Variable Models, Time Series
Published 2019-05-01
URL https://openreview.net/forum?id=ByleB2CcKm
PDF https://openreview.net/pdf?id=ByleB2CcKm
PWC https://paperswithcode.com/paper/learning-procedural-abstractions-and
Repo https://github.com/StanfordAI4HI/ICLR2019_evaluating_discrete_temporal_structure
Framework none

FaceForensics++: Learning to Detect Manipulated Facial Images

Title FaceForensics++: Learning to Detect Manipulated Facial Images
Authors Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Niessner
Abstract The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns for the implications towards society. At best, this leads to a loss of trust in digital content, but could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on Deep-Fakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives for facial manipulations at random compression level and size. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available, forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.
Tasks Face Swapping, Image Generation
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/faceforensics-learning-to-detect-manipulated-1
Repo https://github.com/ondyari/FaceForensics
Framework none

Bayesian Adaptive Superpixel Segmentation

Title Bayesian Adaptive Superpixel Segmentation
Authors Roy Uziel, Meitar Ronen, Oren Freifeld
Abstract Superpixels provide a useful intermediate image representation. Existing superpixel methods, however, suffer from at least some of the following drawbacks: 1) topology is handled heuristically; 2) the number of superpixels is either predefined or estimated at a prohibitive cost; 3) lack of adaptiveness. As a remedy, we propose a novel probabilistic model, self-coined Bayesian Adaptive Superpixel Segmentation (BASS), together with an efficient inference. BASS is a Bayesian nonparametric mixture model that also respects topology and favors spatial coherence. The optimization-based and topology-aware inference is parallelizable and implemented in GPU. Quantitatively, BASS achieves results that are either better than the state-of-the-art or close to it, depending on the performance index and/or dataset. Qualitatively, we argue it achieves the best results; we demonstrate this by not only subjective visual inspection but also objective quantitative performance evaluation of the downstream application of face detection. Our code is available at https://github.com/uzielroy/BASS.
Tasks Face Detection
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Uziel_Bayesian_Adaptive_Superpixel_Segmentation_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Uziel_Bayesian_Adaptive_Superpixel_Segmentation_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/bayesian-adaptive-superpixel-segmentation
Repo https://github.com/uzielroy/BASS
Framework none

Direct Optimization through arg max for Discrete Variational Auto-Encoder

Title Direct Optimization through arg max for Discrete Variational Auto-Encoder
Authors Guy Lorberbom, Tommi Jaakkola, Andreea Gane, Tamir Hazan
Abstract Reparameterization of variational auto-encoders with continuous random variables is an effective method for reducing the variance of their gradient estimates. In the discrete case, one can perform reparameterization using the Gumbel-Max trick, but the resulting objective relies on an arg max operation and is non-differentiable. In contrast to previous works which resort to softmax-based relaxations, we propose to optimize it directly by applying the direct loss minimization approach. Our proposal extends naturally to structured discrete latent variable models when evaluating the arg max operation is tractable. We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables.
Tasks Latent Variable Models
Published 2019-12-01
URL http://papers.nips.cc/paper/8851-direct-optimization-through-arg-max-for-discrete-variational-auto-encoder
PDF http://papers.nips.cc/paper/8851-direct-optimization-through-arg-max-for-discrete-variational-auto-encoder.pdf
PWC https://paperswithcode.com/paper/direct-optimization-through-arg-max-for-2
Repo https://github.com/GuyLor/direct_vae
Framework pytorch
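
The Gumbel-Max trick mentioned in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code and does not implement direct loss minimization; it only shows that adding Gumbel(0, 1) noise to log-probabilities and taking the arg max yields a categorical sample. The function names and the `empirical_frequencies` helper are hypothetical, introduced for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) variate via the inverse-CDF transform."""
    u = max(rng.random(), 1e-12)  # avoid log(0) if the RNG returns exactly 0.0
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng):
    """Return the index of the largest noise-perturbed logit.

    The index is distributed as Categorical(softmax(logits)).
    """
    perturbed = [logit + sample_gumbel(rng) for logit in logits]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def empirical_frequencies(probs, num_samples, seed=0):
    """Estimate class frequencies from repeated Gumbel-Max draws."""
    rng = random.Random(seed)
    logits = [math.log(p) for p in probs]
    counts = [0] * len(probs)
    for _ in range(num_samples):
        counts[gumbel_max_sample(logits, rng)] += 1
    return [count / num_samples for count in counts]
```

Calling `empirical_frequencies([0.1, 0.2, 0.7], 20000)` should return frequencies close to the input probabilities. The arg max step is the piecewise-constant operation that the abstract describes as non-differentiable.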

Deep Metric Learning to Rank

Title Deep Metric Learning to Rank
Authors Fatih Cakir, Kun He, Xide Xia, Brian Kulis, Stan Sclaroff
Abstract We propose a novel deep metric learning method by revisiting the learning to rank approach. Our method, named FastAP, optimizes the rank-based Average Precision measure, using an approximation derived from distance quantization. FastAP has a low complexity compared to existing methods, and is tailored for stochastic gradient descent. To fully exploit the benefits of the ranking formulation, we also propose a new minibatch sampling scheme, as well as a simple heuristic to enable large-batch training. On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles.
Tasks Image Retrieval, Learning-To-Rank, Metric Learning, Quantization
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Cakir_Deep_Metric_Learning_to_Rank_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Cakir_Deep_Metric_Learning_to_Rank_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/deep-metric-learning-to-rank
Repo https://github.com/kunhe/FastAP-metric-learning
Framework pytorch
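
For readers unfamiliar with the rank-based Average Precision measure named in the abstract, the following is a small sketch of the exact, sorting-based definition for a single query. It is not FastAP itself: the paper's distance-quantization approximation is not reproduced here, and the function name and argument layout are assumptions made for illustration.

```python
def average_precision(distances, is_relevant):
    """Exact Average Precision for one query.

    distances[i] is the distance from the query to item i (smaller is better);
    is_relevant[i] says whether item i shares the query's class.
    """
    # Rank items from nearest to farthest.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    hits = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if is_relevant[index]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    return precision_sum / hits if hits else 0.0
```

Because the ranking comes from a sort, this quantity is piecewise constant in the distances, which is why gradient-based training needs some approximation of it.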

Joint Optimization of Cascade Ranking Models

Title Joint Optimization of Cascade Ranking Models
Authors Luke Gallagher, Ruey-Cheng Chen, Roi Blanco, J. Shane Culpepper
Abstract Reducing excessive costs in feature acquisition and model evaluation has been a long-standing challenge in learning-to-rank systems. A cascaded ranking architecture turns ranking into a pipeline of multiple stages, and has been shown to be a powerful approach to balancing efficiency and effectiveness trade-offs in large-scale search systems. However, learning a cascade model is often complex, and usually performed stagewise independently across the entire ranking pipeline. In this work we show that learning a cascade ranking model in this manner is often suboptimal in terms of both effectiveness and efficiency. We present a new general framework for learning an end-to-end cascade of rankers using backpropagation. We show that stagewise objectives can be chained together and optimized jointly to achieve significantly better trade-offs globally. This novel approach is generalizable to not only differentiable models but also state-of-the-art tree-based algorithms such as LambdaMART and cost-efficient gradient boosted trees, and it opens up new opportunities for exploring additional efficiency-effectiveness trade-offs in large-scale search systems.
Tasks Ad-Hoc Information Retrieval, Document Ranking, Information Retrieval, Learning-To-Rank
Published 2019-02-11
URL https://dl.acm.org/citation.cfm?id=3290986
PDF http://culpepper.io/publications/gcbc19-wsdm.pdf
PWC https://paperswithcode.com/paper/joint-optimization-of-cascade-ranking-models
Repo https://github.com/rmit-ir/joint-cascade-ranking
Framework none

KnowledgeNet: A Benchmark Dataset for Knowledge Base Population

Title KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
Authors Filipe Mesquita, Matteo Cannaviccio, Jordan Schmidek, Paramita Mirza, Denilson Barbosa
Abstract KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. KnowledgeNet provides text exhaustively annotated with facts, thus enabling the holistic end-to-end evaluation of knowledge base population systems as a whole, unlike previous benchmarks that are more suitable for the evaluation of individual subcomponents (e.g., entity linking, relation extraction). We discuss five baseline approaches, where the best approach achieves an F1 score of 0.50, significantly outperforming a traditional approach by 79% (0.28). However, our best baseline is far from reaching human performance (0.82), indicating our dataset is challenging. The KnowledgeNet dataset and baselines are available at https://github.com/diffbot/knowledge-net
Tasks Entity Linking, Knowledge Base Population, Relation Extraction
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-1069/
PDF https://www.aclweb.org/anthology/D19-1069
PWC https://paperswithcode.com/paper/knowledgenet-a-benchmark-dataset-for
Repo https://github.com/diffbot/knowledge-net
Framework none
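
The 79% figure in the abstract appears to be a relative improvement of the best baseline (F1 0.50) over the traditional approach (F1 0.28). The short check below works through that arithmetic; the function name is hypothetical and the interpretation as a relative gain is an assumption based on the numbers quoted.

```python
def relative_improvement(new_score, old_score):
    """Fractional gain of new_score over old_score."""
    return (new_score - old_score) / old_score


# F1 values quoted in the abstract.
gain = relative_improvement(0.50, 0.28)
print(f"{gain:.1%}")  # about 78.6%, which rounds to the quoted 79%
```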

Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text

Title Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text
Authors Ahmad Sakor, Isaiah Onando Mulang', Kuldeep Singh, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, Sören Auer
Abstract Short texts challenge NLP tasks such as named entity recognition, disambiguation, linking and relation inference because they do not provide sufficient context or are partially malformed (e.g., with respect to capitalization, long tail entities, implicit relations). In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. Falcon overcomes the challenges of short text using a light-weight linguistic approach relying on a background knowledge graph. Falcon performs joint entity and relation linking of a short text by leveraging several fundamental principles of English morphology (e.g. compounding, headword identification) and utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources. It uses the context of entities for finding relations and does not require training data. Our empirical study using several standard benchmarks and datasets shows that Falcon significantly outperforms state-of-the-art entity and relation linking for short text query inventories.
Tasks Entity Linking, Named Entity Recognition
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1243/
PDF https://www.aclweb.org/anthology/N19-1243
PWC https://paperswithcode.com/paper/old-is-gold-linguistic-driven-approach-for
Repo https://github.com/AhmadSakor/falcon
Framework none

SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images

Title SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images
Authors Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye
Abstract In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in Security Inspection X-ray images. Our dataset, named SIXray, consists of 1,059,231 X-ray images, in which 6 classes of 8,929 prohibited items are manually annotated. It raises a brand new challenge of overlapping image data, meanwhile shares the same properties with existing datasets, including complex yet meaningless contexts and class imbalance. We propose an approach named class-balanced hierarchical refinement (CHR) to deal with these difficulties. CHR assumes that each input image is sampled from a mixture distribution, and that deep networks require an iterative process to infer image contents accurately. To accelerate, we insert reversed connections to different network backbones, delivering high-level visual cues to assist mid-level features. In addition, a class-balanced loss function is designed to maximally alleviate the noise introduced by easy negative samples. We evaluate CHR on SIXray with different ratios of positive/negative samples. Compared to the baselines, CHR enjoys a better ability of discriminating objects especially using mid-level features, which offers the possibility of using a weakly-supervised approach towards accurate object localization. In particular, the advantage of CHR is more significant in the scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection.
Tasks Object Localization
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Miao_SIXray_A_Large-Scale_Security_Inspection_X-Ray_Benchmark_for_Prohibited_Item_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Miao_SIXray_A_Large-Scale_Security_Inspection_X-Ray_Benchmark_for_Prohibited_Item_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/sixray-a-large-scale-security-inspection-x-1
Repo https://github.com/MeioJane/SIXray
Framework none

Learning Unsupervised Video Object Segmentation Through Visual Attention

Title Learning Unsupervised Video Object Segmentation Through Visual Attention
Authors Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling
Abstract This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. By elaborately annotating three popular video segmentation datasets (DAVIS, Youtube-Objects and SegTrack V2) with dynamic eye-tracking data in the UVOS setting, for the first time, we quantitatively verified the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgements during dynamic, task-driven viewing. Such novel observations provide an in-depth insight into the underlying rationale behind UVOS. Inspired by these findings, we decouple UVOS into two sub-tasks: UVOS-driven Dynamic Visual Attention Prediction (DVAP) in spatiotemporal domain, and Attention-Guided Object Segmentation (AGOS) in spatial domain. Our UVOS solution enjoys three major merits: 1) modular training without using expensive video segmentation annotations, instead, using more affordable dynamic fixation data to train the initial video attention module and using existing fixation-segmentation paired static/image data to train the subsequent segmentation module; 2) comprehensive foreground understanding through multi-source learning; and 3) additional interpretability from the biologically-inspired and assessable attention. Experiments on popular benchmarks show that, even without using expensive video object mask annotations, our model achieves compelling performance in comparison with state-of-the-arts.
Tasks Eye Tracking, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/learning-unsupervised-video-object
Repo https://github.com/wenguanwang/AGS
Framework caffe2

Auxiliary Variational MCMC

Title Auxiliary Variational MCMC
Authors Raza Habib, David Barber
Abstract We introduce Auxiliary Variational MCMC, a novel framework for learning MCMC kernels that combines recent advances in variational inference with insights drawn from traditional auxiliary variable MCMC methods such as Hamiltonian Monte Carlo. Our framework exploits low dimensional structure in the target distribution in order to learn a more efficient MCMC sampler. The resulting sampler is able to suppress random walk behaviour and mix between modes efficiently, without the need to compute gradients of the target distribution. We test our sampler on a number of challenging distributions, where the underlying structure is known, and on the task of posterior sampling in Bayesian logistic regression. Code to reproduce all experiments is available at https://github.com/AVMCMC/AuxiliaryVariationalMCMC .
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=r1NJqsRctX
PDF https://openreview.net/pdf?id=r1NJqsRctX
PWC https://paperswithcode.com/paper/auxiliary-variational-mcmc
Repo https://github.com/AVMCMC/AuxiliaryVariationalMCMC
Framework tf

Sparse Bayesian approach for metric learning in latent space

Title Sparse Bayesian approach for metric learning in latent space
Authors Davood Zabihzadeh, Reza Monsefi, Hadi Sadoghi Yazdi
Abstract This paper presents a new and efficient approach for metric learning in latent space. Our method discovers an optimal mapping from the feature space to a latent space that shrinks the distance between similar data items and also increases the distance between dissimilar ones. The proposed approach is based on a Bayesian variational framework which iteratively finds the optimal posterior distribution of parameters and hyperparameters of the model. Advantages of the proposed method to similar work are 1) Learning the noise of the latent variables on the low-dimensional manifold to find a more effective transformation. 2) Automatically finding the dimension of latent space and sparsification of the solution which prevents the overfitting problem. 3) Unlike Mahalanobis metric learning, the proposed algorithm roughly scales linearly to the dimension of data. Also, the present work is extended for learning in the feature space induced by an RKHS kernel. The proposed method is evaluated on small and large datasets coming from real applications such as network intrusion detection, face recognition, handwritten digits, letter recognition, and hyperspectral image classification. The results show that our method outperforms related representative and state-of-the-art methods in many small and large datasets.
Tasks Face Recognition, Hyperspectral Image Classification, Image Classification, Intrusion Detection, Metric Learning, Network Intrusion Detection
Published 2019-08-15
URL https://www.sciencedirect.com/science/article/abs/pii/S0950705119301741
PDF https://www.sciencedirect.com/science/article/abs/pii/S0950705119301741
PWC https://paperswithcode.com/paper/sparse-bayesian-approach-for-metric-learning
Repo https://github.com/GT-Davood/SBML
Framework none

UralicNLP: An NLP Library for Uralic Languages

Title UralicNLP: An NLP Library for Uralic Languages
Authors Mika Hämäläinen
Abstract UralicNLP is a natural language processing library for small Uralic languages. It can produce morphological analysis, generate morphological forms, lemmatize words and give lexical information about words in Uralic languages. At the time of writing, the following languages are supported: Skolt Sami, Ingrian, Meadow & Eastern Mari, Votic, Olonets-Karelian, Erzya, Moksha, Hill Mari, Udmurt, Tundra Nenets, Komi-Permyak and Finnish. This information originates from FST tools and dictionaries developed in the Giellatekno infrastructure. Currently, UralicNLP uses the nightly builds for languages supported by Apertium and less frequently updated FSTs and CGs for the other languages.
Tasks Morphological Analysis
Published 2019-05-09
URL https://www.theoj.org/joss-papers/joss.01345/10.21105.joss.01345.pdf
PDF https://www.theoj.org/joss-papers/joss.01345/10.21105.joss.01345.pdf
PWC https://paperswithcode.com/paper/uralicnlp-an-nlp-library-for-uralic-languages
Repo https://github.com/mikahama/uralicNLP
Framework none

DATA: Differentiable ArchiTecture Approximation

Title DATA: Differentiable ArchiTecture Approximation
Authors Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, Chunhong Pan
Abstract Neural architecture search (NAS) is inherently subject to the gap of architectures during searching and validating. To bridge this gap, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which is capable of converting probability vectors to binary codes and passing gradients from binary codes to probability vectors. Benefiting from such modeling, in searching, architecture parameters and network weights in the NAS model can be jointly optimized with the standard back-propagation, yielding an end-to-end learning mechanism for searching deep models in a large enough search space. Conclusively, during validating, a high-performance architecture that approaches to the learned one during searching is readily built. Extensive experiments on a variety of popular datasets strongly evidence that our method is capable of discovering high-performance architectures for image classification, language modeling and semantic segmentation, while guaranteeing the requisite efficiency during searching.
Tasks Image Classification, Language Modelling, Neural Architecture Search, Semantic Segmentation
Published 2019-12-01
URL http://papers.nips.cc/paper/8374-data-differentiable-architecture-approximation
PDF http://papers.nips.cc/paper/8374-data-differentiable-architecture-approximation.pdf
PWC https://paperswithcode.com/paper/data-differentiable-architecture
Repo https://github.com/XinbangZhang/DATA-NAS
Framework none
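
The abstract describes Gumbel-Softmax estimators that convert probability vectors to binary codes. Below is a forward-pass-only sketch of a single Gumbel-Softmax sample in plain Python. It is not the paper's Ensemble Gumbel-Softmax estimator, and it does not show how gradients are passed back from the binary code, which would need an autodiff framework. The function names and the `temperature` parameter name are assumptions for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) variate via the inverse-CDF transform."""
    u = max(rng.random(), 1e-12)  # avoid log(0) if the RNG returns exactly 0.0
    return -math.log(-math.log(u))


def gumbel_softmax_sample(probs, temperature, rng):
    """Return (soft, hard) for one Gumbel-Softmax draw.

    soft is a relaxed sample on the probability simplex;
    hard is the one-hot binary code at the arg max of soft.
    All entries of probs must be strictly positive.
    """
    noisy = [(math.log(p) + sample_gumbel(rng)) / temperature for p in probs]
    peak = max(noisy)  # subtract the max for a numerically stable softmax
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    winner = max(range(len(soft)), key=soft.__getitem__)
    hard = [1 if i == winner else 0 for i in range(len(soft))]
    return soft, hard
```

Lower temperatures push `soft` toward the one-hot `hard` code, while higher temperatures make it smoother.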