Paper Group NAWR 34
SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data
Title | SISUA: Semi-Supervised Generative Autoencoder for Single Cell Data |
Authors | Trung Ngo Trong, Roger Kramer, Juha Mehtonen, Gerardo González, Ville Hautamäki, Merja Heinäniemi |
Abstract | Single-cell transcriptomics offers a tool to study the diversity of cell phenotypes through snapshots of the abundance of mRNA in individual cells. Often there is additional information available besides the single cell gene expression counts, such as bulk transcriptome data from the same tissue, or quantification of surface protein levels from the same cells. In this study, we propose models based on the Bayesian generative approach, where protein quantification available as CITE-seq counts from the same cells are used to constrain the learning process, thus forming a semi-supervised model. The generative model is based on the deep variational autoencoder (VAE) neural network architecture. |
Tasks | Single-cell modeling |
Published | 2019-05-08 |
URL | https://www.biorxiv.org/content/10.1101/631382v1 |
https://www.biorxiv.org/content/biorxiv/early/2019/05/08/631382.full-text.pdf | |
PWC | https://paperswithcode.com/paper/sisua-semi-supervised-generative-autoencoder |
Repo | https://github.com/trungnt13/sisua |
Framework | none |
Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure
Title | Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure |
Authors | Karan Goel, Emma Brunskill |
Abstract | Clustering methods and latent variable models are often used as tools for pattern mining and discovery of latent structure in time-series data. In this work, we consider the problem of learning procedural abstractions from possibly high-dimensional observational sequences, such as video demonstrations. Given a dataset of time-series, the goal is to identify the latent sequence of steps common to them and label each time-series with the temporal extent of these procedural steps. We introduce a hierarchical Bayesian model called Prism that models the realization of a common procedure across multiple time-series, and can recover procedural abstractions with supervision. We also bring to light two characteristics ignored by traditional evaluation criteria when evaluating latent temporal labelings (temporal clusterings) – segment structure, and repeated structure – and develop new metrics tailored to their evaluation. We demonstrate that our metrics improve interpretability and ease of analysis for evaluation on benchmark time-series datasets. Results on benchmark and video datasets indicate that Prism outperforms standard sequence models as well as state-of-the-art techniques in identifying procedural abstractions. |
Tasks | Latent Variable Models, Time Series |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ByleB2CcKm |
https://openreview.net/pdf?id=ByleB2CcKm | |
PWC | https://paperswithcode.com/paper/learning-procedural-abstractions-and |
Repo | https://github.com/StanfordAI4HI/ICLR2019_evaluating_discrete_temporal_structure |
Framework | none |
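The "temporal extent" labeling described in the abstract above can be pictured as run-length segments over a per-frame label sequence. The following is a minimal sketch under that assumption; the function name `to_segments` and the toy label sequence are hypothetical and are not taken from the Prism code.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse a per-frame label sequence into (label, start, end) runs.

    `end` is exclusive, so a segment covers frames start..end-1.
    """
    segments = []
    position = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, position, position + length))
        position += length
    return segments


# Hypothetical frame-level labels for one time-series.
frames = [0, 0, 1, 1, 1, 0, 2, 2]
print(to_segments(frames))
```

In this toy sequence label 0 occurs in two separate segments, which is the kind of repeated structure the abstract says traditional clustering criteria ignore.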
FaceForensics++: Learning to Detect Manipulated Facial Images
Title | FaceForensics++: Learning to Detect Manipulated Facial Images |
Authors | Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Niessner |
Abstract | The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns for the implications towards society. At best, this leads to a loss of trust in digital content, but could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on DeepFakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives for facial manipulations at random compression level and size. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available, forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers. |
Tasks | Face Swapping, Image Generation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/faceforensics-learning-to-detect-manipulated-1 |
Repo | https://github.com/ondyari/FaceForensics |
Framework | none |
Bayesian Adaptive Superpixel Segmentation
Title | Bayesian Adaptive Superpixel Segmentation |
Authors | Roy Uziel, Meitar Ronen, Oren Freifeld |
Abstract | Superpixels provide a useful intermediate image representation. Existing superpixel methods, however, suffer from at least some of the following drawbacks: 1) topology is handled heuristically; 2) the number of superpixels is either predefined or estimated at a prohibitive cost; 3) lack of adaptiveness. As a remedy, we propose a novel probabilistic model, self-coined Bayesian Adaptive Superpixel Segmentation (BASS), together with an efficient inference. BASS is a Bayesian nonparametric mixture model that also respects topology and favors spatial coherence. The optimization-based and topology-aware inference is parallelizable and implemented on GPU. Quantitatively, BASS achieves results that are either better than the state-of-the-art or close to it, depending on the performance index and/or dataset. Qualitatively, we argue it achieves the best results; we demonstrate this by not only subjective visual inspection but also objective quantitative performance evaluation of the downstream application of face detection. Our code is available at https://github.com/uzielroy/BASS. |
Tasks | Face Detection |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Uziel_Bayesian_Adaptive_Superpixel_Segmentation_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Uziel_Bayesian_Adaptive_Superpixel_Segmentation_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-adaptive-superpixel-segmentation |
Repo | https://github.com/uzielroy/BASS |
Framework | none |
Direct Optimization through \arg \max for Discrete Variational Auto-Encoder
Title | Direct Optimization through \arg \max for Discrete Variational Auto-Encoder |
Authors | Guy Lorberbom, Tommi Jaakkola, Andreea Gane, Tamir Hazan |
Abstract | Reparameterization of variational auto-encoders with continuous random variables is an effective method for reducing the variance of their gradient estimates. In the discrete case, one can perform reparameterization using the Gumbel-Max trick, but the resulting objective relies on an $\arg \max$ operation and is non-differentiable. In contrast to previous works which resort to softmax-based relaxations, we propose to optimize it directly by applying the direct loss minimization approach. Our proposal extends naturally to structured discrete latent variable models when evaluating the $\arg \max$ operation is tractable. We demonstrate empirically the effectiveness of the direct loss minimization technique in variational autoencoders with both unstructured and structured discrete latent variables. |
Tasks | Latent Variable Models |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8851-direct-optimization-through-arg-max-for-discrete-variational-auto-encoder |
http://papers.nips.cc/paper/8851-direct-optimization-through-arg-max-for-discrete-variational-auto-encoder.pdf | |
PWC | https://paperswithcode.com/paper/direct-optimization-through-arg-max-for-2 |
Repo | https://github.com/GuyLor/direct_vae |
Framework | pytorch |
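The Gumbel-Max trick mentioned in the abstract above draws a categorical sample by adding independent Gumbel(0, 1) noise to each logit and taking the arg max. The sketch below illustrates only that sampling step in plain Python; it does not implement the paper's direct loss minimization gradient, and the helper names and toy logits are assumptions made for illustration.

```python
import math
import random


def gumbel_max_sample(logits, rng):
    """Draw one category index as argmax_i(logit_i + Gumbel(0, 1) noise)."""
    best_index, best_value = 0, -math.inf
    for i, logit in enumerate(logits):
        # Clamp the uniform draw away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        perturbed = logit - math.log(-math.log(u))
        if perturbed > best_value:
            best_index, best_value = i, perturbed
    return best_index


def softmax(logits):
    """Numerically stable softmax, used here only as the reference distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [1.0, 0.0, -1.0]  # hypothetical unnormalized log-probabilities
n = 20000
counts = [0] * len(logits)
for _ in range(n):
    counts[gumbel_max_sample(logits, rng)] += 1
frequencies = [c / n for c in counts]
print(frequencies)
print(softmax(logits))
```

The empirical frequencies should track the softmax probabilities closely. The arg max itself is piecewise constant in the logits, which is the non-differentiability the abstract refers to.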
Deep Metric Learning to Rank
Title | Deep Metric Learning to Rank |
Authors | Fatih Cakir, Kun He, Xide Xia, Brian Kulis, Stan Sclaroff |
Abstract | We propose a novel deep metric learning method by revisiting the learning to rank approach. Our method, named FastAP, optimizes the rank-based Average Precision measure, using an approximation derived from distance quantization. FastAP has a low complexity compared to existing methods, and is tailored for stochastic gradient descent. To fully exploit the benefits of the ranking formulation, we also propose a new minibatch sampling scheme, as well as a simple heuristic to enable large-batch training. On three few-shot image retrieval datasets, FastAP consistently outperforms competing methods, which often involve complex optimization heuristics or costly model ensembles. |
Tasks | Image Retrieval, Learning-To-Rank, Metric Learning, Quantization |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Cakir_Deep_Metric_Learning_to_Rank_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Cakir_Deep_Metric_Learning_to_Rank_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/deep-metric-learning-to-rank |
Repo | https://github.com/kunhe/FastAP-metric-learning |
Framework | pytorch |
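For reference, the rank-based Average Precision measure named in the abstract above can be computed exactly for a single query as follows. This is a sketch of plain AP only, not of FastAP: the sort makes it non-differentiable, which is the difficulty the abstract's distance-quantization approximation addresses. The function name and the toy distances are assumptions.

```python
def average_precision(distances, is_relevant):
    """Exact AP of a retrieval list ranked by ascending distance to the query."""
    ranked = sorted(zip(distances, is_relevant), key=lambda pair: pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            # Precision at this rank: relevant items so far / items so far.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical distances from one query to four gallery items.
distances = [0.1, 0.4, 0.35, 0.8]
print(average_precision(distances, [1, 0, 1, 0]))  # relevant items ranked first
print(average_precision(distances, [0, 1, 0, 1]))  # relevant items ranked last
```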
Joint Optimization of Cascade Ranking Models
Title | Joint Optimization of Cascade Ranking Models |
Authors | Luke Gallagher, Ruey-Cheng Chen, Roi Blanco, J. Shane Culpepper |
Abstract | Reducing excessive costs in feature acquisition and model evaluation has been a long-standing challenge in learning-to-rank systems. A cascaded ranking architecture turns ranking into a pipeline of multiple stages, and has been shown to be a powerful approach to balancing efficiency and effectiveness trade-offs in large-scale search systems. However, learning a cascade model is often complex, and usually performed stagewise independently across the entire ranking pipeline. In this work we show that learning a cascade ranking model in this manner is often suboptimal in terms of both effectiveness and efficiency. We present a new general framework for learning an end-to-end cascade of rankers using backpropagation. We show that stagewise objectives can be chained together and optimized jointly to achieve significantly better trade-offs globally. This novel approach is generalizable to not only differentiable models but also state-of-the-art tree-based algorithms such as LambdaMART and cost-efficient gradient boosted trees, and it opens up new opportunities for exploring additional efficiency-effectiveness trade-offs in large-scale search systems. |
Tasks | Ad-Hoc Information Retrieval, Document Ranking, Information Retrieval, Learning-To-Rank |
Published | 2019-02-11 |
URL | https://dl.acm.org/citation.cfm?id=3290986 |
http://culpepper.io/publications/gcbc19-wsdm.pdf | |
PWC | https://paperswithcode.com/paper/joint-optimization-of-cascade-ranking-models |
Repo | https://github.com/rmit-ir/joint-cascade-ranking |
Framework | none |
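The cascaded architecture described in the abstract above can be sketched as a pipeline in which each stage re-scores the survivors of the previous stage and prunes them. The code below shows only that inference-time pipeline with hand-written scorers; it does not implement the paper's joint training, and the function name, document tuples, and scorers are all hypothetical.

```python
def cascade_rank(documents, stages):
    """Run candidates through (scorer, keep) stages, pruning after each one.

    Each stage re-scores only the survivors of the previous stage, so later,
    costlier scorers see fewer documents.
    """
    candidates = list(documents)
    for scorer, keep in stages:
        candidates = sorted(candidates, key=scorer, reverse=True)[:keep]
    return candidates


# Hypothetical toy documents: (doc_id, cheap_feature, expensive_feature).
docs = [("a", 0.9, 0.2), ("b", 0.8, 0.9), ("c", 0.1, 1.0), ("d", 0.7, 0.5)]
stages = [
    (lambda d: d[1], 3),         # stage 1: cheap score, keep top 3
    (lambda d: d[1] + d[2], 2),  # stage 2: costlier score, keep top 2
]
print([d[0] for d in cascade_rank(docs, stages)])
```

In this toy run document "c" has the highest expensive feature but is pruned by the first stage before the second stage can see it, which hints at why tuning each stage independently can be suboptimal.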
KnowledgeNet: A Benchmark Dataset for Knowledge Base Population
Title | KnowledgeNet: A Benchmark Dataset for Knowledge Base Population |
Authors | Filipe Mesquita, Matteo Cannaviccio, Jordan Schmidek, Paramita Mirza, Denilson Barbosa |
Abstract | KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. KnowledgeNet provides text exhaustively annotated with facts, thus enabling the holistic end-to-end evaluation of knowledge base population systems as a whole, unlike previous benchmarks that are more suitable for the evaluation of individual subcomponents (e.g., entity linking, relation extraction). We discuss five baseline approaches, where the best approach achieves an F1 score of 0.50, significantly outperforming a traditional approach by 79% (0.28). However, our best baseline is far from reaching human performance (0.82), indicating our dataset is challenging. The KnowledgeNet dataset and baselines are available at https://github.com/diffbot/knowledge-net |
Tasks | Entity Linking, Knowledge Base Population, Relation Extraction |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-1069/ |
https://www.aclweb.org/anthology/D19-1069 | |
PWC | https://paperswithcode.com/paper/knowledgenet-a-benchmark-dataset-for |
Repo | https://github.com/diffbot/knowledge-net |
Framework | none |
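The figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes the 79% figure is a relative gain of the best baseline (F1 0.50) over the traditional approach (F1 0.28); the helper names are hypothetical, and `f1_score` is included only to recall the standard definition of the metric.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# F1 figures as stated in the abstract.
best_f1, traditional_f1, human_f1 = 0.50, 0.28, 0.82
print(round(100 * relative_gain(best_f1, traditional_f1)))  # 79
print(round(100 * relative_gain(human_f1, best_f1)))        # 64
```

Under that reading, (0.50 - 0.28) / 0.28 is about 0.79, which matches the stated 79%, and human performance is still roughly 64% above the best baseline in relative terms.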
Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text
Title | Old is Gold: Linguistic Driven Approach for Entity and Relation Linking of Short Text |
Authors | Ahmad Sakor, Isaiah Onando Mulang', Kuldeep Singh, Saeedeh Shekarpour, Maria Esther Vidal, Jens Lehmann, Sören Auer |
Abstract | Short texts challenge NLP tasks such as named entity recognition, disambiguation, linking and relation inference because they do not provide sufficient context or are partially malformed (e.g. with respect to capitalization, long tail entities, implicit relations). In this work, we present the Falcon approach which effectively maps entities and relations within a short text to its mentions of a background knowledge graph. Falcon overcomes the challenges of short text using a light-weight linguistic approach relying on a background knowledge graph. Falcon performs joint entity and relation linking of a short text by leveraging several fundamental principles of English morphology (e.g. compounding, headword identification) and utilizes an extended knowledge graph created by merging entities and relations from various knowledge sources. It uses the context of entities for finding relations and does not require training data. Our empirical study using several standard benchmarks and datasets shows that Falcon significantly outperforms state-of-the-art entity and relation linking for short text query inventories. |
Tasks | Entity Linking, Named Entity Recognition |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1243/ |
https://www.aclweb.org/anthology/N19-1243 | |
PWC | https://paperswithcode.com/paper/old-is-gold-linguistic-driven-approach-for |
Repo | https://github.com/AhmadSakor/falcon |
Framework | none |
SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images
Title | SIXray: A Large-Scale Security Inspection X-Ray Benchmark for Prohibited Item Discovery in Overlapping Images |
Authors | Caijing Miao, Lingxi Xie, Fang Wan, Chi Su, Hongye Liu, Jianbin Jiao, Qixiang Ye |
Abstract | In this paper, we present a large-scale dataset and establish a baseline for prohibited item discovery in Security Inspection X-ray images. Our dataset, named SIXray, consists of 1,059,231 X-ray images, in which 6 classes of 8,929 prohibited items are manually annotated. It raises a brand new challenge of overlapping image data, meanwhile shares the same properties with existing datasets, including complex yet meaningless contexts and class imbalance. We propose an approach named class-balanced hierarchical refinement (CHR) to deal with these difficulties. CHR assumes that each input image is sampled from a mixture distribution, and that deep networks require an iterative process to infer image contents accurately. To accelerate, we insert reversed connections to different network backbones, delivering high-level visual cues to assist mid-level features. In addition, a class-balanced loss function is designed to maximally alleviate the noise introduced by easy negative samples. We evaluate CHR on SIXray with different ratios of positive/negative samples. Compared to the baselines, CHR enjoys a better ability of discriminating objects especially using mid-level features, which offers the possibility of using a weakly-supervised approach towards accurate object localization. In particular, the advantage of CHR is more significant in the scenarios with fewer positive training samples, which demonstrates its potential application in real-world security inspection. |
Tasks | Object Localization |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Miao_SIXray_A_Large-Scale_Security_Inspection_X-Ray_Benchmark_for_Prohibited_Item_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Miao_SIXray_A_Large-Scale_Security_Inspection_X-Ray_Benchmark_for_Prohibited_Item_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/sixray-a-large-scale-security-inspection-x-1 |
Repo | https://github.com/MeioJane/SIXray |
Framework | none |
Learning Unsupervised Video Object Segmentation Through Visual Attention
Title | Learning Unsupervised Video Object Segmentation Through Visual Attention |
Authors | Wenguan Wang, Hongmei Song, Shuyang Zhao, Jianbing Shen, Sanyuan Zhao, Steven C. H. Hoi, Haibin Ling |
Abstract | This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. By elaborately annotating three popular video segmentation datasets (DAVIS, Youtube-Objects and SegTrack V2) with dynamic eye-tracking data in the UVOS setting, for the first time, we quantitatively verified the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgements during dynamic, task-driven viewing. Such novel observations provide an in-depth insight into the underlying rationale behind UVOS. Inspired by these findings, we decouple UVOS into two sub-tasks: UVOS-driven Dynamic Visual Attention Prediction (DVAP) in spatiotemporal domain, and Attention-Guided Object Segmentation (AGOS) in spatial domain. Our UVOS solution enjoys three major merits: 1) modular training without using expensive video segmentation annotations, instead, using more affordable dynamic fixation data to train the initial video attention module and using existing fixation-segmentation paired static/image data to train the subsequent segmentation module; 2) comprehensive foreground understanding through multi-source learning; and 3) additional interpretability from the biologically-inspired and assessable attention. Experiments on popular benchmarks show that, even without using expensive video object mask annotations, our model achieves compelling performance in comparison with state-of-the-arts. |
Tasks | Eye Tracking, Semantic Segmentation, Unsupervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Learning_Unsupervised_Video_Object_Segmentation_Through_Visual_Attention_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-unsupervised-video-object |
Repo | https://github.com/wenguanwang/AGS |
Framework | caffe2 |
Auxiliary Variational MCMC
Title | Auxiliary Variational MCMC |
Authors | Raza Habib, David Barber |
Abstract | We introduce Auxiliary Variational MCMC, a novel framework for learning MCMC kernels that combines recent advances in variational inference with insights drawn from traditional auxiliary variable MCMC methods such as Hamiltonian Monte Carlo. Our framework exploits low dimensional structure in the target distribution in order to learn a more efficient MCMC sampler. The resulting sampler is able to suppress random walk behaviour and mix between modes efficiently, without the need to compute gradients of the target distribution. We test our sampler on a number of challenging distributions, where the underlying structure is known, and on the task of posterior sampling in Bayesian logistic regression. Code to reproduce all experiments is available at https://github.com/AVMCMC/AuxiliaryVariationalMCMC . |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=r1NJqsRctX |
https://openreview.net/pdf?id=r1NJqsRctX | |
PWC | https://paperswithcode.com/paper/auxiliary-variational-mcmc |
Repo | https://github.com/AVMCMC/AuxiliaryVariationalMCMC |
Framework | tf |
Sparse Bayesian approach for metric learning in latent space
Title | Sparse Bayesian approach for metric learning in latent space |
Authors | Davood Zabihzadeh, Reza Monsefi, Hadi Sadoghi Yazdi |
Abstract | This paper presents a new and efficient approach for metric learning in latent space. Our method discovers an optimal mapping from the feature space to a latent space that shrinks the distance between similar data items and also increases the distance between dissimilar ones. The proposed approach is based on a Bayesian variational framework which iteratively finds the optimal posterior distribution of parameters and hyperparameters of the model. Advantages of the proposed method to similar work are 1) Learning the noise of the latent variables on the low-dimensional manifold to find a more effective transformation. 2) Automatically finding the dimension of latent space and sparsification of the solution which prevents the overfitting problem. 3) Unlike Mahalanobis metric learning, the proposed algorithm roughly scales linearly to the dimension of data. Also, the present work is extended for learning in the feature space induced by an RKHS kernel. The proposed method is evaluated on small and large datasets coming from real applications such as network intrusion detection, face recognition, handwritten digits, letter recognition, and hyperspectral image classification. The results show that our method outperforms related representative and state-of-the-art methods in many small and large datasets. |
Tasks | Face Recognition, Hyperspectral Image Classification, Image Classification, Intrusion Detection, Metric Learning, Network Intrusion Detection |
Published | 2019-08-15 |
URL | https://www.sciencedirect.com/science/article/abs/pii/S0950705119301741 |
https://www.sciencedirect.com/science/article/abs/pii/S0950705119301741 | |
PWC | https://paperswithcode.com/paper/sparse-bayesian-approach-for-metric-learning |
Repo | https://github.com/GT-Davood/SBML |
Framework | none |
UralicNLP: An NLP Library for Uralic Languages
Title | UralicNLP: An NLP Library for Uralic Languages |
Authors | Mika Hämäläinen |
Abstract | UralicNLP is a natural language processing library for small Uralic languages. It can produce morphological analysis, generate morphological forms, lemmatize words and give lexical information about words in Uralic languages. At the time of writing, the following languages are supported: Skolt Sami, Ingrian, Meadow & Eastern Mari, Votic, Olonets-Karelian, Erzya, Moksha, Hill Mari, Udmurt, Tundra Nenets, Komi-Permyak and Finnish. This information originates from FST tools and dictionaries developed in the Giellatekno infrastructure. Currently, UralicNLP uses the nightly builds for languages supported by Apertium and less frequently updated FSTs and CGs for the other languages. |
Tasks | Morphological Analysis |
Published | 2019-05-09 |
URL | https://www.theoj.org/joss-papers/joss.01345/10.21105.joss.01345.pdf |
https://www.theoj.org/joss-papers/joss.01345/10.21105.joss.01345.pdf | |
PWC | https://paperswithcode.com/paper/uralicnlp-an-nlp-library-for-uralic-languages |
Repo | https://github.com/mikahama/uralicNLP |
Framework | none |
DATA: Differentiable ArchiTecture Approximation
Title | DATA: Differentiable ArchiTecture Approximation |
Authors | Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, Chunhong Pan |
Abstract | Neural architecture search (NAS) is inherently subject to the gap of architectures during searching and validating. To bridge this gap, we develop Differentiable ArchiTecture Approximation (DATA) with an Ensemble Gumbel-Softmax (EGS) estimator to automatically approximate architectures during searching and validating in a differentiable manner. Technically, the EGS estimator consists of a group of Gumbel-Softmax estimators, which is capable of converting probability vectors to binary codes and passing gradients from binary codes to probability vectors. Benefiting from such modeling, in searching, architecture parameters and network weights in the NAS model can be jointly optimized with the standard back-propagation, yielding an end-to-end learning mechanism for searching deep models in a large enough search space. Conclusively, during validating, a high-performance architecture that approaches to the learned one during searching is readily built. Extensive experiments on a variety of popular datasets strongly evidence that our method is capable of discovering high-performance architectures for image classification, language modeling and semantic segmentation, while guaranteeing the requisite efficiency during searching. |
Tasks | Image Classification, Language Modelling, Neural Architecture Search, Semantic Segmentation |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8374-data-differentiable-architecture-approximation |
http://papers.nips.cc/paper/8374-data-differentiable-architecture-approximation.pdf | |
PWC | https://paperswithcode.com/paper/data-differentiable-architecture |
Repo | https://github.com/XinbangZhang/DATA-NAS |
Framework | none |
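The Gumbel-Softmax estimator that the abstract above builds on replaces the hard arg max of Gumbel-perturbed logits with a temperature-controlled softmax, giving a relaxed one-hot vector. The sketch below shows that relaxation in plain Python. The `ensemble_binary_code` helper is only an assumed reading of how a group of such estimators could yield a binary code (by OR-ing the arg max of several samples); it is not taken from the DATA repository, and all names and values are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        perturbed.append((logit - math.log(-math.log(u))) / temperature)
    m = max(perturbed)
    exps = [math.exp(x - m) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_binary_code(logits, num_estimators, rng, temperature=0.1):
    """ASSUMPTION: combine several samples by OR-ing their arg max positions."""
    code = [0] * len(logits)
    for _ in range(num_estimators):
        sample = gumbel_softmax(logits, temperature, rng)
        code[sample.index(max(sample))] = 1
    return code


rng = random.Random(0)
logits = [2.0, 0.5, 0.1]  # hypothetical architecture logits
soft = gumbel_softmax(logits, temperature=0.5, rng=rng)
code = ensemble_binary_code(logits, num_estimators=3, rng=rng)
print(soft)
print(code)
```

Lower temperatures push each relaxed sample toward a one-hot vector, while higher temperatures make it smoother; the relaxed form is what allows gradients to flow back to the logits in an autodiff framework.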