January 25, 2020

Paper Group NAWR 34

SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data


Title	SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data
Authors	Trung Ngo Trong, Roger Kramer, Juha Mehtonen, Gerardo González, Ville Hautamäki, Merja Heinäniemi
Abstract	Single-cell transcriptomics offers a tool to study the diversity of cell phenotypes through snapshots of the abundance of mRNA in individual cells. Often there is additional information available besides the single cell gene expression counts, such as bulk transcriptome data from the same tissue, or quantification of surface protein levels from the same cells. In this study, we propose models based on the Bayesian generative approach, where protein quantification available as CITE-seq counts from the same cells are used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture.
Tasks	Single-cell modeling
Published	2019-05-08
URL	https://www.biorxiv.org/content/10.1101/631382v1
PDF	https://www.biorxiv.org/content/biorxiv/early/2019/05/08/631382.full-text.pdf
PWC	https://paperswithcode.com/paper/sisua-semi-supervised-generative-autoencoder
Repo	https://github.com/trungnt13/sisua
Framework	none

Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure


Title	Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure
Authors	Karan Goel, Emma Brunskill
Abstract	Clustering methods and latent variable models are often used as tools for pattern mining and discovery of latent structure in time-series data. In this work, we consider the problem of learning procedural abstractions from possibly high-dimensional observational sequences, such as video demonstrations. Given a dataset of time-series, the goal is to identify the latent sequence of steps common to them and label each time-series with the temporal extent of these procedural steps. We introduce a hierarchical Bayesian model called Prism that models the realization of a common procedure across multiple time-series, and can recover procedural abstractions with little supervision. We also bring to light two characteristics ignored by traditional evaluation criteria when evaluating latent temporal labelings (temporal clusterings) – segment structure, and repeated structure – and develop new metrics tailored to their evaluation. We demonstrate that our metrics improve interpretability and ease of analysis for evaluation on benchmark time-series datasets. Results on benchmark and video datasets indicate that Prism outperforms standard sequence models as well as state-of-the-art techniques in identifying procedural abstractions.
Tasks	Latent Variable Models, Time Series
Published	2019-05-01
URL	https://openreview.net/forum?id=ByleB2CcKm
PDF	https://openreview.net/pdf?id=ByleB2CcKm
PWC	https://paperswithcode.com/paper/learning-procedural-abstractions-and
Repo	https://github.com/StanfordAI4HI/ICLR2019_evaluating_discrete_temporal_structure
Framework	none

FaceForensics++: Learning to Detect Manipulated Facial Images


Title	FaceForensics++: Learning to Detect Manipulated Facial Images
Authors	Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Niessner
Abstract	The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns for the implications towards society. At best, this leads to a loss of trust in digital content, but could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on DeepFakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives for facial manipulations at random compression level and size. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available, forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.
Tasks	Face Swapping, Image Generation
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/faceforensics-learning-to-detect-manipulated-1
Repo	https://github.com/ondyari/FaceForensics
Framework	none

Bayesian Adaptive Superpixel Segmentation


Title	Bayesian Adaptive Superpixel Segmentation
Authors	Roy Uziel, Meitar Ronen, Oren Freifeld
Abstract	Superpixels provide a useful intermediate image representation. Existing superpixel methods, however, suffer from at least some of the following drawbacks: 1) topology is handled heuristically; 2) the number of superpixels is either predefined or estimated at a prohibitive cost; 3) lack of adaptiveness. As a remedy, we propose a novel probabilistic model, self-coined Bayesian Adaptive Superpixel Segmentation (BASS), together with an efficient inference. BASS is a Bayesian nonparametric mixture model that also respects topology and favors spatial coherence. The optimization-based and topology-aware inference is parallelizable and implemented on the GPU. Quantitatively, BASS achieves results that are either better than the state-of-the-art or close to it, depending on the performance index and/or dataset. Qualitatively, we argue it achieves the best results; we demonstrate this by not only subjective visual inspection but also objective quantitative performance evaluation of the downstream application of face detection. Our code is available at https://github.com/uzielroy/BASS.
Tasks	Face Detection
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Uziel_Bayesian_Adaptive_Superpixel_Segmentation_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Uziel_Bayesian_Adaptive_Superpixel_Segmentation_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/bayesian-adaptive-superpixel-segmentation
Repo	https://github.com/uzielroy/BASS
Framework	none

Direct Optimization through \arg \max for Discrete Variational Auto-Encoder


Title	Direct Optimization through \arg \max for Discrete Variational Auto-Encoder
Authors	Guy Lorberbom, Tommi Jaakkola, Andreea Gane, Tamir Hazan
Abstract	Reparameterization of variational auto-encoders with continuous random variables is an effective method for reducing the variance of their gradient estimates. In the discrete case, one can perform reparametrization using the Gumbel-Max trick, but the resulting objective relies on an $\arg \max$ operation and is non-differentiable. In contrast to previous works which resort to softmax-based relaxations, we propose to optimize it directly by applying the direct loss minimization approach. Our proposal extends naturally to structured discrete latent variable models when evaluating the $\arg \max$ operation is tractable. We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables.
Tasks	Latent Variable Models
Published	2019-12-01
URL	http://papers.nips.cc/paper/8851-direct-optimization-through-arg-max-for-discrete-variational-auto-encoder
PDF	http://papers.nips.cc/paper/8851-direct-optimization-through-arg-max-for-discrete-variational-auto-encoder.pdf
PWC	https://paperswithcode.com/paper/direct-optimization-through-arg-max-for-2
Repo	https://github.com/GuyLor/direct_vae
Framework	pytorch
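
The Gumbel-Max trick mentioned in the abstract above can be illustrated with a minimal sketch. This is not the code from the linked repository, and it does not implement the paper's direct loss minimization; it only shows how a categorical sample is drawn as an argmax over logits perturbed by Gumbel noise. The function names (sample_gumbel, gumbel_max_sample, empirical_distribution) and the example logits are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse transform sampling."""
    # rng.random() lies in [0, 1); clamp away from 0 so log(u) is defined.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng):
    """Return an index distributed as softmax(logits) via the Gumbel-Max trick."""
    perturbed = [logit + sample_gumbel(rng) for logit in logits]
    # The argmax below is the non-differentiable step the abstract refers to.
    return max(range(len(logits)), key=perturbed.__getitem__)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_distribution(logits, n_samples=20000, seed=0):
    """Estimate class frequencies from repeated Gumbel-Max draws."""
    rng = random.Random(seed)
    counts = Counter(gumbel_max_sample(logits, rng) for _ in range(n_samples))
    return [counts[i] / n_samples for i in range(len(logits))]


if __name__ == "__main__":
    logits = [1.0, 0.0, -1.0]  # hypothetical unnormalized log-probabilities
    print("softmax  :", softmax(logits))
    print("empirical:", empirical_distribution(logits))
```

With enough samples the empirical frequencies should approach the softmax probabilities, which is why the trick gives a reparameterization of a categorical variable.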

Deep Metric Learning to Rank


Title	Deep Metric Learning to Rank
Authors	Fatih Cakir, Kun He, Xide Xia, Brian Kulis, Stan Sclaroff
Abstract	We propose a novel deep metric learning method by revisiting the learning to rank approach. Our method, named FastAP, optimizes the rank-based Average Precision measure, using an approximation derived from distance quantization. FastAP has a low complexity compared to existing methods, and is tailored for stochastic gradient descent. To fully exploit the benefits of the ranking formulation, we also propose a new minibatch sampling scheme, as well as a simple heuristic to enable large-batch training. On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles.
Tasks	Image Retrieval, Learning-To-Rank, Metric Learning, Quantization
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Cakir_Deep_Metric_Learning_to_Rank_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Cakir_Deep_Metric_Learning_to_Rank_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/deep-metric-learning-to-rank
Repo	https://github.com/kunhe/FastAP-metric-learning
Framework	pytorch
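
The phrase "an approximation derived from distance quantization" can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the implementation from the linked repository: it uses hard histogram bins, whereas a version trained by gradient descent would need a soft, differentiable bin assignment. The function name histogram_ap and the parameters num_bins and max_dist are hypothetical, and distances are assumed to be non-negative.

```python
def histogram_ap(distances, is_positive, num_bins=10, max_dist=2.0):
    """Approximate Average Precision for one query from binned distances.

    distances   : non-negative distances from the query to each database item
    is_positive : booleans, True where the item shares the query's label
    """
    n_pos = sum(1 for p in is_positive if p)
    if n_pos == 0:
        return 0.0

    # Per-bin counts of all items (h) and of positive items (h_pos).
    h = [0] * num_bins
    h_pos = [0] * num_bins
    for d, pos in zip(distances, is_positive):
        j = min(int(d / max_dist * num_bins), num_bins - 1)
        h[j] += 1
        if pos:
            h_pos[j] += 1

    # Walk bins from nearest to farthest, accumulating cumulative counts.
    ap = 0.0
    cum_all = 0
    cum_pos = 0
    for j in range(num_bins):
        cum_all += h[j]
        cum_pos += h_pos[j]
        if h_pos[j] > 0:
            # Precision at this bin, weighted by the positives that fall in it.
            ap += (cum_pos / cum_all) * h_pos[j]
    return ap / n_pos


if __name__ == "__main__":
    # Positives close to the query, negatives far away: a perfect ranking.
    print(histogram_ap([0.1, 0.15, 1.0, 1.5], [True, True, False, False]))
    # The single positive is the farthest item: the worst ranking.
    print(histogram_ap([0.1, 0.2, 0.3, 1.9], [False, False, False, True]))
```

Because the score depends only on bin counts, its cost grows with the number of bins rather than with a full sort of the retrieved items, which is one plausible reading of the "low complexity" claim.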

Joint Optimization of Cascade Ranking Models


Title	Joint Optimization of Cascade Ranking Models
Authors	Luke Gallagher, Ruey-Cheng Chen, Roi Blanco, J. Shane Culpepper
Abstract	Reducing excessive costs in feature acquisition and model evaluation has been a long-standing challenge in learning-to-rank systems. A cascaded ranking architecture turns ranking into a pipeline of multiple stages, and has been shown to be a powerful approach to balancing efficiency and effectiveness trade-offs in large-scale search systems. However, learning a cascade model is often complex, and usually performed stagewise independently across the entire ranking pipeline. In this work we show that learning a cascade ranking model in this manner is often suboptimal in terms of both effectiveness and efficiency. We present a new general framework for learning an end-to-end cascade of rankers using backpropagation. We show that stagewise objectives can be chained together and optimized jointly to achieve significantly better trade-offs globally. This novel approach is generalizable to not only differentiable models but also state-of-the-art tree-based algorithms such as LambdaMART and cost-efficient gradient boosted trees, and it opens up new opportunities for exploring additional efficiency-effectiveness trade-offs in large-scale search systems.
Tasks	Ad-Hoc Information Retrieval, Document Ranking, Information Retrieval, Learning-To-Rank
Published	2019-02-11
URL	https://dl.acm.org/citation.cfm?id=3290986
PDF	http://culpepper.io/publications/gcbc19-wsdm.pdf
PWC	https://paperswithcode.com/paper/joint-optimization-of-cascade-ranking-models
Repo	https://github.com/rmit-ir/joint-cascade-ranking
Framework	none
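
The cascade architecture described in the abstract above, a pipeline of stages that each re-rank and prune the candidates, can be sketched as follows. This shows only the inference-time pipeline and its cost accounting, not the joint optimization the paper proposes. The function cascade_rank, the (score_fn, cost_per_doc, keep_k) stage layout, and the toy documents are all hypothetical.

```python
def cascade_rank(docs, stages):
    """Run documents through a ranking cascade.

    stages is a list of (score_fn, cost_per_doc, keep_k) tuples, ordered
    from the cheapest ranker to the most expensive one.
    Returns the surviving documents and the total scoring cost.
    """
    candidates = list(docs)
    total_cost = 0.0
    for score_fn, cost_per_doc, keep_k in stages:
        # Every surviving candidate is scored once by this stage.
        total_cost += cost_per_doc * len(candidates)
        # Keep only the top keep_k candidates for the next stage.
        candidates = sorted(candidates, key=score_fn, reverse=True)[:keep_k]
    return candidates, total_cost


# Toy documents with a cheap feature and an expensive feature (made-up values).
DOCS = [
    {"id": "a", "cheap": 0.9, "expensive": 0.2},
    {"id": "b", "cheap": 0.8, "expensive": 0.9},
    {"id": "c", "cheap": 0.7, "expensive": 0.5},
    {"id": "d", "cheap": 0.1, "expensive": 1.0},
    {"id": "e", "cheap": 0.3, "expensive": 0.1},
    {"id": "f", "cheap": 0.2, "expensive": 0.3},
]

STAGES = [
    (lambda doc: doc["cheap"], 1.0, 3),       # stage 1: cheap scorer, keep 3
    (lambda doc: doc["expensive"], 10.0, 2),  # stage 2: costly scorer, keep 2
]

if __name__ == "__main__":
    ranked, cost = cascade_rank(DOCS, STAGES)
    print([doc["id"] for doc in ranked], cost)
```

In this toy data, document "d" has the highest expensive score but is pruned by the first stage. That is the kind of interaction between stages that independent, stagewise training cannot account for.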

KnowledgeNet: A Benchmark Dataset for Knowledge Base Population


Title	KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
Authors	Filipe Mesquita, Matteo Cannaviccio, Jordan Schmidek, Paramita Mirza, Denilson Barbosa
Abstract	KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. KnowledgeNet provides text exhaustively annotated with facts, thus enabling the holistic end-to-end evaluation of knowledge base population systems as a whole, unlike previous benchmarks that are more suitable for the evaluation of individual subcomponents (e.g., entity linking, relation extraction). We discuss five baseline approaches, where the best approach achieves an F1 score of 0.50, significantly outperforming a traditional approach by 79% (0.28). However, our best baseline is far from reaching human performance (0.82), indicating our dataset is challenging. The KnowledgeNet dataset and baselines are available at https://github.com/diffbot/knowledge-net
Tasks	Entity Linking, Knowledge Base Population, Relation Extraction
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-1069/
PDF	https://www.aclweb.org/anthology/D19-1069
PWC	https://paperswithcode.com/paper/knowledgenet-a-benchmark-dataset-for
Repo	https://github.com/diffbot/knowledge-net
Framework	none
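
The "79%" figure in the abstract above is a relative improvement in F1, which a short calculation confirms. The helper names f1_score and relative_improvement are illustrative assumptions; only the three F1 values (0.50, 0.28, 0.82) come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    best, traditional, human = 0.50, 0.28, 0.82  # F1 values from the abstract
    gain = relative_improvement(best, traditional)
    print(f"best vs. traditional: {gain:.1%}")
    print(f"gap to human F1: {human - best:.2f}")
```

The gain is (0.50 - 0.28) / 0.28, roughly 0.786, which rounds to the reported 79%.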

Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text


Title	Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text
Authors	Ahmad Sakor, Isaiah Onando Mulang', Kuldeep Singh, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, Sören Auer
Abstract	Short texts challenge NLP tasks such as named entity recognition, disambiguation, linking and relation inference because they do not provide sufficient context or are partially malformed (e.g. with respect to capitalization, long tail entities, implicit relations). In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. Falcon overcomes the challenges of short text using a light-weight linguistic approach relying on a background knowledge graph. Falcon performs joint entity and relation linking of a short text by leveraging several fundamental principles of English morphology (e.g. compounding, headword identification) and utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources. It uses the context of entities for finding relations and does not require training data. Our empirical study using several standard benchmarks and datasets shows that Falcon significantly outperforms state-of-the-art entity and relation linking for short text query inventories.
Tasks	Entity Linking, Named Entity Recognition
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-1243/
PDF	https://www.aclweb.org/anthology/N19-1243
PWC	https://paperswithcode.com/paper/old-is-gold-linguistic-driven-approach-for
Repo	https://github.com/AhmadSakor/falcon
Framework	none

SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images


Title	SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images
Authors	Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye
Abstract	In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in Security Inspection X-ray images. Our dataset, named SIXray, consists of 1,059,231 X-ray images, in which 6 classes of 8,929 prohibited items are manually annotated. It raises a brand new challenge of overlapping image data, while sharing the same properties with existing datasets, including complex yet meaningless contexts and class imbalance. We propose an approach named class-balanced hierarchical refinement (CHR) to deal with these difficulties. CHR assumes that each input image is sampled from a mixture distribution, and that deep networks require an iterative process to infer image contents accurately. To accelerate, we insert reversed connections to different network backbones, delivering high-level visual cues to assist mid-level features. In addition, a class-balanced loss function is designed to maximally alleviate the noise introduced by easy negative samples. We evaluate CHR on SIXray with different ratios of positive/negative samples. Compared to the baselines, CHR enjoys a better ability of discriminating objects especially using mid-level features, which offers the possibility of using a weakly-supervised approach towards accurate object localization. In particular, the advantage of CHR is more significant in the scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection.
Tasks	Object Localization
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Miao_SIXray_A_Large-Scale_Security_Inspection_X-Ray_Benchmark_for_Prohibited_Item_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Miao_SIXray_A_Large-Scale_Security_Inspection_X-Ray_Benchmark_for_Prohibited_Item_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/sixray-a-large-scale-security-inspection-x-1
Repo	https://github.com/MeioJane/SIXray
Framework	none

Learning Unsupervised Video Object Segmentation Through Visual Attention


Title	Learning Unsupervised Video Object Segmentation Through Visual Attention
Authors	Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling
Abstract	This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. By elaborately annotating three popular video segmentation datasets (DAVIS, Youtube-Objects and SegTrack V2) with dynamic eye-tracking data in the UVOS setting, for the first time, we quantitatively verified the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgements during dynamic, task-driven viewing. Such novel observations provide an in-depth insight into the underlying rationale behind UVOS. Inspired by these findings, we decouple UVOS into two sub-tasks: UVOS-driven Dynamic Visual Attention Prediction (DVAP) in spatiotemporal domain, and Attention-Guided Object Segmentation (AGOS) in spatial domain. Our UVOS solution enjoys three major merits: 1) modular training without using expensive video segmentation annotations, instead, using more affordable dynamic fixation data to train the initial video attention module and using existing fixation-segmentation paired static/image data to train the subsequent segmentation module; 2) comprehensive foreground understanding through multi-source learning; and 3) additional interpretability from the biologically-inspired and assessable attention. Experiments on popular benchmarks show that, even without using expensive video object mask annotations, our model achieves compelling performance in comparison with state-of-the-arts.
Tasks	Eye Tracking, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/learning-unsupervised-video-object
Repo	https://github.com/wenguanwang/AGS
Framework	caffe2

Auxiliary Variational MCMC


Title	Auxiliary Variational MCMC
Authors	Raza Habib, David Barber
Abstract	We introduce Auxiliary Variational MCMC, a novel framework for learning MCMC kernels that combines recent advances in variational inference with insights drawn from traditional auxiliary variable MCMC methods such as Hamiltonian Monte Carlo. Our framework exploits low dimensional structure in the target distribution in order to learn a more efficient MCMC sampler. The resulting sampler is able to suppress random walk behaviour and mix between modes efficiently, without the need to compute gradients of the target distribution. We test our sampler on a number of challenging distributions, where the underlying structure is known, and on the task of posterior sampling in Bayesian logistic regression. Code to reproduce all experiments is available at https://github.com/AVMCMC/AuxiliaryVariationalMCMC .
Tasks
Published	2019-05-01
URL	https://openreview.net/forum?id=r1NJqsRctX
PDF	https://openreview.net/pdf?id=r1NJqsRctX
PWC	https://paperswithcode.com/paper/auxiliary-variational-mcmc
Repo	https://github.com/AVMCMC/AuxiliaryVariationalMCMC
Framework	tf

Sparse Bayesian approach for metric learning in latent space


Title	Sparse Bayesian approach for metric learning in latent space
Authors	Davood Zabihzadeh, Reza Monsefi, Hadi Sadoghi Yazdi
Abstract	This paper presents a new and efficient approach for metric learning in latent space. Our method discovers an optimal mapping from the feature space to a latent space that shrinks the distance between similar data items and also increases the distance between dissimilar ones. The proposed approach is based on a Bayesian variational framework which iteratively finds the optimal posterior distribution of parameters and hyperparameters of the model. Advantages of the proposed method to similar work are 1) Learning the noise of the latent variables on the low-dimensional manifold to find a more effective transformation. 2) Automatically finding the dimension of latent space and sparsification of the solution which prevents the overfitting problem. 3) Unlike Mahalanobis metric learning, the proposed algorithm roughly scales linearly to the dimension of data. Also, the present work is extended for learning in the feature space induced by an RKHS kernel. The proposed method is evaluated on small and large datasets coming from real applications such as network intrusion detection, face recognition, handwritten digits, letter recognition, and hyperspectral image classification. The results show that our method outperforms related representative and state-of-the-art methods in many small and large datasets.
Tasks	Face Recognition, Hyperspectral Image Classification, Image Classification, Intrusion Detection, Metric Learning, Network Intrusion Detection
Published	2019-08-15
URL	https://www.sciencedirect.com/science/article/abs/pii/S0950705119301741
PDF	https://www.sciencedirect.com/science/article/abs/pii/S0950705119301741
PWC	https://paperswithcode.com/paper/sparse-bayesian-approach-for-metric-learning
Repo	https://github.com/GT-Davood/SBML
Framework	none

UralicNLP: An NLP Library for Uralic Languages


Title	UralicNLP: An NLP Library for Uralic Languages
Authors	Mika Hämäläinen
Abstract	UralicNLP is a natural language processing library for small Uralic languages. It can produce morphological analysis, generate morphological forms, lemmatize words and give lexical information about words in Uralic languages. At the time of writing, the following languages are supported: Skolt Sami, Ingrian, Meadow & Eastern Mari, Votic, Olonets-Karelian, Erzya, Moksha, Hill Mari, Udmurt, Tundra Nenets, Komi-Permyak and Finnish. This information originates from FST tools and dictionaries developed in the Giellatekno infrastructure. Currently, UralicNLP uses the nightly builds for languages supported by Apertium and less frequently updated FSTs and CGs for the other languages.
Tasks	Morphological Analysis
Published	2019-05-09
URL	https://www.theoj.org/joss-papers/joss.01345/10.21105.joss.01345.pdf
PDF	https://www.theoj.org/joss-papers/joss.01345/10.21105.joss.01345.pdf
PWC	https://paperswithcode.com/paper/uralicnlp-an-nlp-library-for-uralic-languages
Repo	https://github.com/mikahama/uralicNLP
Framework	none

DATA: Differentiable ArchiTecture Approximation


Title	DATA: Differentiable ArchiTecture Approximation
Authors	Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, Chunhong Pan
Abstract	Neural architecture search (NAS) is inherently subject to the gap of architectures during searching and validating. To bridge this gap, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which is capable of converting probability vectors to binary codes and passing gradients from binary codes to probability vectors. Benefiting from such modeling, in searching, architecture parameters and network weights in the NAS model can be jointly optimized with the standard back-propagation, yielding an end-to-end learning mechanism for searching deep models in a large enough search space. Conclusively, during validating, a high-performance architecture that approaches to the learned one during searching is readily built. Extensive experiments on a variety of popular datasets strongly evidence that our method is capable of discovering high-performance architectures for image classification, language modeling and semantic segmentation, while guaranteeing the requisite efficiency during searching.
Tasks	Image Classification, Language Modelling, Neural Architecture Search, Semantic Segmentation
Published	2019-12-01
URL	http://papers.nips.cc/paper/8374-data-differentiable-architecture-approximation
PDF	http://papers.nips.cc/paper/8374-data-differentiable-architecture-approximation.pdf
PWC	https://paperswithcode.com/paper/data-differentiable-architecture
Repo	https://github.com/XinbangZhang/DATA-NAS
Framework	none
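
The Gumbel-Softmax estimator that the DATA abstract builds on can be sketched in a few lines. This is a plain single-estimator illustration, not the paper's Ensemble Gumbel-Softmax and not code from the linked repository. The function name gumbel_softmax_sample, the temperature parameter tau, and the example logits are assumptions.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse transform sampling."""
    u = max(rng.random(), 1e-12)  # clamp away from 0 so log(u) is defined
    return -math.log(-math.log(u))


def gumbel_softmax_sample(logits, tau, rng):
    """Return a relaxed one-hot vector: softmax((logits + Gumbel noise) / tau).

    A small tau pushes the output towards a one-hot (binary) code, while a
    large tau pushes it towards a uniform vector.
    """
    scaled = [(logit + sample_gumbel(rng)) / tau for logit in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [1.0, 0.5, -1.0]  # hypothetical architecture parameters
    for tau in (5.0, 1.0, 0.1):
        # Re-seeding keeps the Gumbel noise identical across temperatures.
        y = gumbel_softmax_sample(logits, tau, random.Random(0))
        print(tau, [round(p, 3) for p in y])
```

Unlike a hard argmax, the output is a smooth function of the logits, so gradients can flow from the (near-)binary code back to the probability vector, which is the property the abstract relies on.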