January 27, 2020

2917 words 14 mins read

Paper Group ANR 1075

Mixture-Model-based Bounding Box Density Estimation for Object Detection. Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets. Is a Single Vector Enough? Exploring Node Polysemy for Network Embedding. Knowledge Graph Embedding Bi-Vector Models for Symmetric Relation. MinWikiSplit: …

Mixture-Model-based Bounding Box Density Estimation for Object Detection

Title Mixture-Model-based Bounding Box Density Estimation for Object Detection
Authors Jaeyoung Yoo, Geonseok Seo, Inseop Chung, Nojun Kwak
Abstract In this paper, we reformulate the multi-object detection task as density estimation of bounding boxes. We propose a new object detection network, the Mixture-Model-based Object Detector (MMOD), which performs multi-object detection through density estimation with a mixture model. MMOD captures the conditional distribution of bounding boxes for a given input image using a mixture model consisting of Gaussian and categorical distributions. In doing so, we also propose a new network structure and objective function for MMOD. MMOD is not trained by assigning a ground-truth bounding box to specific locations of the network’s output. Instead, the mixture components are learned automatically to represent the distribution of bounding boxes through density estimation. In this way, MMOD is not only trained without ground-truth assignment but also does not suffer from the foreground-background imbalance problem, since background bounding boxes are stochastically sampled from the mixture model that estimates the ground-truth bounding box distribution. We applied MMOD to the MS COCO and Pascal VOC datasets and observed that MMOD outperforms other detection methods in terms of the trade-off between speed and performance. Code will be available.
Tasks Density Estimation, Object Detection, Real-Time Object Detection
Published 2019-11-28
URL https://arxiv.org/abs/1911.12721v2
PDF https://arxiv.org/pdf/1911.12721v2.pdf
PWC https://paperswithcode.com/paper/mixture-model-based-bounding-box-density
Repo
Framework
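
The core idea above, predicting bounding boxes as samples from a learned mixture density and training with a negative log-likelihood instead of ground-truth assignment, can be illustrated with a small mixture-density head. This is a minimal sketch under assumed shapes and component counts, not the authors' MMOD architecture, and the class (categorical) part of the mixture is omitted.

```python
# Hypothetical sketch: bounding boxes as samples from a mixture density
# (diagonal Gaussians over box coordinates, mixed by learned weights), trained
# with negative log-likelihood rather than ground-truth assignment.
# Shapes and component count are illustrative, not the authors' MMOD network.
import torch
import torch.nn as nn
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

class BoxMixtureHead(nn.Module):
    def __init__(self, feat_dim=256, num_components=100):
        super().__init__()
        self.pi = nn.Linear(feat_dim, num_components)          # mixture weights
        self.mu = nn.Linear(feat_dim, num_components * 4)      # box means (cx, cy, w, h)
        self.log_sigma = nn.Linear(feat_dim, num_components * 4)

    def forward(self, feats):                                  # feats: (B, feat_dim)
        B = feats.shape[0]
        mix = Categorical(logits=self.pi(feats))
        mu = self.mu(feats).view(B, -1, 4)
        sigma = self.log_sigma(feats).view(B, -1, 4).exp()
        comp = Independent(Normal(mu, sigma), 1)               # diagonal Gaussian per component
        return MixtureSameFamily(mix, comp)

def box_nll(dist, gt_boxes):                                   # gt_boxes: (B, 4)
    return -dist.log_prob(gt_boxes).mean()                     # density-estimation objective
```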

Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets

Title Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets
Authors Chris Dulhanty, Alexander Wong
Abstract The ImageNet dataset ushered in a flood of academic and industry interest in deep learning for computer vision applications. Despite its significant impact, there has not been a comprehensive investigation into the demographic attributes of images contained within the dataset. Such a study could lead to new insights on inherent biases within ImageNet, which is particularly important given that it is frequently used to pretrain models for a wide variety of computer vision tasks. In this work, we introduce a model-driven framework for the automatic annotation of apparent age and gender attributes in large-scale image datasets. Using this framework, we conduct the first demographic audit of the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) subset of ImageNet and the “person” hierarchical category of ImageNet. We find that 41.62% of faces in ILSVRC appear as female, 1.71% appear as individuals above the age of 60, and males aged 15 to 29 account for the largest subgroup with 27.11%. We note that the presented model-driven framework is not fair for all intersectional groups, so annotations are subject to bias. We present this work as the starting point for future development of unbiased annotation models and for the study of downstream effects of imbalances in the demographics of ImageNet. Code and annotations are available at: http://bit.ly/ImageNetDemoAudit
Tasks Object Recognition
Published 2019-05-03
URL https://arxiv.org/abs/1905.01347v2
PDF https://arxiv.org/pdf/1905.01347v2.pdf
PWC https://paperswithcode.com/paper/auditing-imagenet-towards-a-model-driven
Repo
Framework
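
As a rough illustration of the audit's final aggregation step, the sketch below tallies intersectional subgroup shares from per-face apparent-gender and apparent-age annotations. The CSV layout and column names are assumptions, not the authors' released format, and the upstream face detection and age/gender models are out of scope here.

```python
# Hypothetical sketch of the aggregation step: given per-face apparent-gender
# and apparent-age annotations (produced upstream by a face detector plus
# age/gender models), compute intersectional subgroup shares.
# The CSV layout is an assumption, not the authors' annotation format.
from collections import Counter
import csv

def subgroup_shares(annotation_csv):
    counts, total = Counter(), 0
    with open(annotation_csv, newline="") as f:
        for row in csv.DictReader(f):          # assumed columns: apparent_gender, apparent_age
            age = int(row["apparent_age"])
            bucket = "60+" if age >= 60 else f"{(age // 15) * 15}-{(age // 15) * 15 + 14}"
            counts[(row["apparent_gender"], bucket)] += 1
            total += 1
    return {group: n / total for group, n in counts.items()}   # e.g. ("male", "15-29"): 0.27
```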

Is a Single Vector Enough? Exploring Node Polysemy for Network Embedding

Title Is a Single Vector Enough? Exploring Node Polysemy for Network Embedding
Authors Ninghao Liu, Qiaoyu Tan, Yuening Li, Hongxia Yang, Jingren Zhou, Xia Hu
Abstract Networks have been widely used as the data structure for abstracting real-world systems as well as organizing the relations among entities. Network embedding models are powerful tools for mapping the nodes of a network into continuous vector-space representations that facilitate subsequent tasks such as classification and link prediction. Existing network embedding models comprehensively integrate all the information of each node, such as links and attributes, into a single embedding vector that represents the node’s general role in the network. However, a real-world entity can be multifaceted, connecting to different neighborhoods out of different motives or self-characteristics that are not necessarily correlated. For example, in a movie recommender system, a user may love comedies and horror movies simultaneously, but it is not likely that these two types of movies are close to each other in the embedding space, nor could the user’s embedding vector be sufficiently close to both at the same time. In this paper, we propose a polysemous embedding approach for modeling multiple facets of nodes, motivated by the phenomenon of word polysemy in language modeling. Each facet of a node is mapped to its own embedding vector, while we also maintain an association degree between each node-facet pair. The proposed method can be adapted to various existing embedding models without significantly complicating the optimization process. We also discuss how to combine the embedding vectors of different facets for inference tasks such as classification and link prediction. Experiments on real-world datasets comprehensively evaluate the performance of the proposed method.
Tasks Language Modelling, Link Prediction, Network Embedding, Recommendation Systems
Published 2019-05-25
URL https://arxiv.org/abs/1905.10668v1
PDF https://arxiv.org/pdf/1905.10668v1.pdf
PWC https://paperswithcode.com/paper/is-a-single-vector-enough-exploring-node
Repo
Framework
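
A minimal sketch of the polysemous representation described above: each node keeps several facet vectors plus association degrees, and node similarity takes an association-weighted best match over facet pairs. The dimensions, the random initialization, and the similarity rule are illustrative assumptions; the paper's actual training objective is not reproduced.

```python
# Minimal sketch of the polysemous-embedding idea: one vector per node facet
# plus association degrees, with similarity as an association-weighted best
# match over facet pairs. Illustrative only; training is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, num_facets, dim = 100, 3, 64

facet_emb = rng.normal(size=(num_nodes, num_facets, dim))      # one vector per facet
assoc = rng.random((num_nodes, num_facets))
assoc /= assoc.sum(axis=1, keepdims=True)                      # association degrees per node

def similarity(u, v):
    """Association-weighted maximum cosine similarity over all facet pairs of u and v."""
    best = -np.inf
    for i in range(num_facets):
        for j in range(num_facets):
            a, b = facet_emb[u, i], facet_emb[v, j]
            cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
            best = max(best, assoc[u, i] * assoc[v, j] * cos)
    return best

print(similarity(0, 1))
```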

Knowledge Graph Embedding Bi-Vector Models for Symmetric Relation

Title Knowledge Graph Embedding Bi-Vector Models for Symmetric Relation
Authors Jinkui Yao, Lianghua Xu
Abstract Knowledge graph embedding (KGE) models have been proposed to improve the performance of knowledge graph reasoning. However, most KGEs exhibit a general phenomenon: as training progresses, the embeddings of symmetric relations tend toward the zero vector if the ratio of symmetric triples in the dataset is high enough. This phenomenon causes subsequent tasks on symmetric relations, e.g. link prediction, to fail. The root cause of the problem is that KGEs do not utilize the semantic information of symmetric relations. We propose KGE bi-vector models, which represent each symmetric relation as a vector pair, significantly increasing the capability to process symmetric relations. We generate benchmark datasets based on FB15k and WN18 by completing the symmetric relation triples to verify the models. The experimental results clearly affirm the effectiveness and superiority of our models against the baselines.
Tasks Graph Embedding, Knowledge Graph Embedding, Link Prediction
Published 2019-05-23
URL https://arxiv.org/abs/1905.09557v1
PDF https://arxiv.org/pdf/1905.09557v1.pdf
PWC https://paperswithcode.com/paper/knowledge-graph-embedding-bi-vector-models
Repo
Framework
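
To make the bi-vector idea concrete, the sketch below scores a symmetric relation with a pair of relation vectors on top of a TransE-style distance, taking the better-fitting direction so that (h, r, t) and (t, r, h) no longer push r toward the zero vector. The scoring function is an assumed stand-in; the paper's exact model may differ.

```python
# Hedged sketch of the bi-vector idea on top of a TransE-style score: a
# symmetric relation r is represented by a pair (r_fwd, r_bwd) and a triple is
# scored with the better-fitting direction, so scoring (h, r, t) and (t, r, h)
# need not force r toward the zero vector. The exact model may differ.
import numpy as np

def transe_score(h, r, t):
    return -np.linalg.norm(h + r - t)          # higher means more plausible

def bivector_score(h, r_pair, t):
    r_fwd, r_bwd = r_pair
    return max(transe_score(h, r_fwd, t), transe_score(h, r_bwd, t))

dim = 50
rng = np.random.default_rng(0)
h, t = rng.normal(size=dim), rng.normal(size=dim)
r_pair = (rng.normal(size=dim), rng.normal(size=dim))
print(bivector_score(h, r_pair, t), bivector_score(t, r_pair, h))
```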

MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

Title MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions
Authors Christina Niklaus, Andre Freitas, Siegfried Handschuh
Abstract We compiled a new sentence splitting corpus that is composed of 203K pairs of aligned complex source and simplified target sentences. Contrary to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset where each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn how to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simple and more regular structure. Such representations are easier to process for downstream applications and thus facilitate and improve their performance.
Tasks Text Simplification
Published 2019-09-26
URL https://arxiv.org/abs/1909.12131v1
PDF https://arxiv.org/pdf/1909.12131v1.pdf
PWC https://paperswithcode.com/paper/minwikisplit-a-sentence-splitting-corpus-with
Repo
Framework

SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering

Title SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
Authors Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee
Abstract While end-to-end models for spoken language understanding tasks have been explored recently, there is still no end-to-end model for spoken question answering (SQA), a task that would otherwise be catastrophically affected by speech recognition errors. Meanwhile, pre-trained language models, such as BERT, have performed successfully in text question answering. To bring this advantage of pre-trained language models to spoken question answering, we propose SpeechBERT, a cross-modal transformer-based pre-trained language model. Our model can outperform conventional approaches on a dataset that contains both correctly and incorrectly recognized answers. Our experimental results show the potential of end-to-end SQA models.
Tasks Language Modelling, Question Answering, Speech Recognition, Spoken Language Understanding
Published 2019-10-25
URL https://arxiv.org/abs/1910.11559v2
PDF https://arxiv.org/pdf/1910.11559v2.pdf
PWC https://paperswithcode.com/paper/speechbert-cross-modal-pre-trained-language
Repo
Framework

Evaluation Function Approximation for Scrabble

Title Evaluation Function Approximation for Scrabble
Authors Rishabh Agarwal
Abstract The current state-of-the-art Scrabble agents are not learning-based but depend on truncated Monte Carlo simulations, and the quality of such agents is contingent upon the time available for running the simulations. This thesis takes steps towards building a learning-based Scrabble agent using self-play. Specifically, we try to find a better function approximation for the static evaluation function used in Scrabble, which determines the goodness of a move at a given board configuration. In this work, we experimented with evolutionary algorithms and Bayesian Optimization to learn the weights of an approximate feature-based evaluation function. However, these optimization methods were not quite effective, which led us to explore the given problem from an Imitation Learning point of view. We also tried to imitate the ranking of moves produced by the Quackle simulation agent using supervised learning with a neural network function approximator that takes the raw representation of the Scrabble board as input instead of using only a fixed number of handcrafted features.
Tasks Imitation Learning
Published 2019-01-25
URL http://arxiv.org/abs/1901.08728v1
PDF http://arxiv.org/pdf/1901.08728v1.pdf
PWC https://paperswithcode.com/paper/evaluation-function-approximation-for
Repo
Framework
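
A hedged sketch of one ingredient from the abstract: a linear, feature-based static evaluation function whose weights are tuned by a simple (1+lambda) evolutionary strategy against a user-supplied fitness such as self-play win rate. The feature names and the fitness hook are placeholders, not the thesis's actual setup.

```python
# Sketch of a linear, feature-based static evaluation function for Scrabble
# moves, with weights tuned by a simple (1+lambda) evolutionary strategy
# against a user-supplied fitness (e.g. self-play win rate). Feature names and
# the fitness hook are placeholders.
import numpy as np

FEATURES = ["move_score", "tiles_used", "rack_leave_value", "board_openness"]

def evaluate(move_features, weights):
    return float(np.dot(move_features, weights))             # move goodness

def evolve_weights(fitness_fn, generations=50, offspring=8, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best = rng.normal(size=len(FEATURES))
    best_fit = fitness_fn(best)
    for _ in range(generations):
        for _ in range(offspring):
            cand = best + sigma * rng.normal(size=len(FEATURES))
            fit = fitness_fn(cand)
            if fit > best_fit:
                best, best_fit = cand, fit
        sigma *= 0.99                                         # slowly narrow the search
    return best

# Example with a dummy fitness (replace with a self-play win-rate estimate):
weights = evolve_weights(lambda w: -np.sum((w - 1.0) ** 2))
print(weights)
```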

An image structure model for exact edge detection

Title An image structure model for exact edge detection
Authors Alessandro Dal Palù
Abstract The paper presents a new model for the low-level interpretation of single-channel images. The image is decomposed into a graph which captures a complete set of structural features. The description allows every edge location and its correct connectivity to be accurately identified. The key features of the method are: a vector description of the edges, subpixel precision, and parallelism of the underlying algorithm. The methodology outperforms classical and state-of-the-art edge detectors at both the conceptual and experimental level. It also enables graph-based algorithms for higher-level feature extraction. Any image processing pipeline can benefit from such results: e.g., controlled denoising, edge-preserving filtering, upsampling, compression, vector- and graph-based pattern matching, and neural network training.
Tasks Denoising, Edge Detection
Published 2019-04-21
URL http://arxiv.org/abs/1904.09659v1
PDF http://arxiv.org/pdf/1904.09659v1.pdf
PWC https://paperswithcode.com/paper/an-image-structure-model-for-exact-edge
Repo
Framework

Clustering Images by Unmasking - A New Baseline

Title Clustering Images by Unmasking - A New Baseline
Authors Mariana-Iuliana Georgescu, Radu Tudor Ionescu
Abstract We propose a novel agglomerative clustering method based on unmasking, a technique that was previously used for authorship verification of text documents and for abnormal event detection in videos. In order to join two clusters, we alternate between (i) training a binary classifier to distinguish between the samples from one cluster and the samples from the other cluster, and (ii) removing at each step the most discriminant features. The faster-decreasing accuracy rates of the intermediately-obtained classifiers indicate that the two clusters should be joined. To the best of our knowledge, this is the first work to apply unmasking in order to cluster images. We compare our method with k-means as well as a recent state-of-the-art clustering method. The empirical results indicate that our approach is able to improve performance for various (deep and shallow) feature representations and different tasks, such as handwritten digit recognition, texture classification and fine-grained object recognition.
Tasks Handwritten Digit Recognition, Object Recognition, Texture Classification
Published 2019-05-02
URL https://arxiv.org/abs/1905.00773v1
PDF https://arxiv.org/pdf/1905.00773v1.pdf
PWC https://paperswithcode.com/paper/clustering-images-by-unmasking-a-new-baseline
Repo
Framework
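
The unmasking signature described above can be sketched directly: repeatedly train a linear classifier to separate two candidate clusters, record its accuracy, and drop the most discriminant features; a fast accuracy drop suggests the clusters are hard to tell apart and should be joined. The classifier choice and hyperparameters below are illustrative assumptions.

```python
# Sketch of the unmasking signature between two candidate clusters: repeatedly
# train a linear classifier to separate them, record its accuracy, then drop
# the most discriminant features. A fast drop suggests the clusters should be
# joined. Hyperparameters are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

def unmasking_curve(A, B, rounds=10, drop_per_round=5):
    X = np.vstack([A, B])
    y = np.concatenate([np.zeros(len(A)), np.ones(len(B))])
    active = np.arange(X.shape[1])
    accs = []
    for _ in range(rounds):
        clf = LinearSVC(C=1.0, max_iter=5000).fit(X[:, active], y)
        accs.append(clf.score(X[:, active], y))
        top = np.argsort(np.abs(clf.coef_[0]))[-drop_per_round:]   # most discriminant features
        active = np.delete(active, top)
        if len(active) <= drop_per_round:
            break
    return accs     # join the cluster pair whose accuracy curve decreases fastest
```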

PDQ & TMK + PDQF – A Test Drive of Facebook’s Perceptual Hashing Algorithms

Title PDQ & TMK + PDQF – A Test Drive of Facebook’s Perceptual Hashing Algorithms
Authors Janis Dalins, Campbell Wilson, Douglas Boudry
Abstract Efficient and reliable automated detection of modified image and multimedia files has long been a challenge for law enforcement, compounded by the harm caused by repeated exposure to psychologically harmful materials. In August 2019 Facebook open-sourced their PDQ and TMK + PDQF algorithms for image and video similarity measurement, respectively. In this report, we review the algorithms’ performance on detecting commonly encountered transformations on real-world case data, sourced from contemporary investigations. We also provide a reference implementation to demonstrate the potential application and integration of such algorithms within existing law enforcement systems.
Tasks Video Similarity
Published 2019-12-16
URL https://arxiv.org/abs/1912.07745v1
PDF https://arxiv.org/pdf/1912.07745v1.pdf
PWC https://paperswithcode.com/paper/pdq-tmk-pdqf-a-test-drive-of-facebooks
Repo
Framework
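
For orientation, perceptual-hash matching of the kind the report evaluates typically reduces to a Hamming-distance comparison between fixed-length hashes (PDQ hashes are 256 bits). The sketch below is a generic illustration, not Facebook's reference implementation or the authors' code, and the threshold is only a placeholder.

```python
# Generic sketch of perceptual-hash matching: two images are treated as
# near-duplicates when the Hamming distance between their 256-bit hex-encoded
# hashes falls under a threshold. Not Facebook's reference implementation;
# the threshold value is only a placeholder.
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Hamming distance between two equal-length hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def is_match(hash_a: str, hash_b: str, threshold: int = 31) -> bool:
    return hamming_distance(hash_a, hash_b) <= threshold
```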

Learning Likelihoods with Conditional Normalizing Flows

Title Learning Likelihoods with Conditional Normalizing Flows
Authors Christina Winkler, Daniel Worrall, Emiel Hoogeboom, Max Welling
Abstract Normalizing Flows (NFs) are able to model complicated distributions p(y) with strong inter-dimensional correlations and high multimodality by transforming a simple base density p(z) through an invertible neural network under the change-of-variables formula. Such behavior is desirable in multivariate structured prediction tasks, where handcrafted per-pixel loss-based methods inadequately capture strong correlations between output dimensions. We present a study of conditional normalizing flows (CNFs), a class of NFs where the base-density-to-output-space mapping is conditioned on an input x, to model conditional densities p(y|x). CNFs are efficient in sampling and inference, can be trained with a likelihood-based objective, and, being generative flows, do not suffer from mode collapse or training instabilities. We provide an effective method to train continuous CNFs for binary problems and, in particular, apply these CNFs to super-resolution and vessel segmentation tasks, demonstrating competitive performance on standard benchmark datasets in terms of likelihood and conventional metrics.
Tasks Structured Prediction, Super-Resolution
Published 2019-11-29
URL https://arxiv.org/abs/1912.00042v1
PDF https://arxiv.org/pdf/1912.00042v1.pdf
PWC https://paperswithcode.com/paper/learning-likelihoods-with-conditional-1
Repo
Framework
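
A minimal sketch of a conditional coupling layer in the spirit of the abstract: the scale and shift applied to one half of y are predicted from the other half together with the conditioning input x, so the flow models p(y|x) and can be trained by maximizing the exact change-of-variables log-likelihood. Layer sizes and the single-layer setup are simplifying assumptions, not the paper's architecture.

```python
# Minimal sketch of a conditional affine coupling layer: scale and shift for
# one half of y are predicted from the other half plus the conditioning input
# x; training maximizes the exact change-of-variables log-likelihood of y|x.
# Shapes and the single-layer setup are illustrative.
import math
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, y_dim, x_dim, hidden=128):
        super().__init__()
        self.half = y_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (y_dim - self.half)),
        )

    def forward(self, y, x):                       # maps y -> z, returns log|det Jacobian| too
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, x], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                          # keep the scales well-behaved
        z2 = y2 * torch.exp(s) + t
        return torch.cat([y1, z2], dim=1), s.sum(dim=1)

def conditional_nll(flow, y, x):
    z, log_det = flow(y, x)
    base_logp = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return -(base_logp + log_det).mean()           # minimizing this maximizes log p(y|x)
```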

Detect Toxic Content to Improve Online Conversations

Title Detect Toxic Content to Improve Online Conversations
Authors Deepshi Mediratta, Nikhil Oswal
Abstract Social media is filled with toxic content. The aim of this paper is to build a model that can detect insincere questions. We use the ‘Quora Insincere Questions Classification’ dataset for our analysis. The dataset is composed of sincere and insincere questions, with sincere questions forming the majority. The dataset is processed and analyzed using Python and its libraries such as sklearn, numpy, pandas, and keras. The dataset is converted to vector form using word embeddings such as GloVe, Wiki-news, and TF-IDF. The imbalance in the dataset is handled by resampling techniques. We train and compare various machine learning and deep learning models to come up with the best results. Models discussed include SVM, Naive Bayes, GRU, and LSTM.
Tasks Word Embeddings
Published 2019-10-29
URL https://arxiv.org/abs/1911.01217v1
PDF https://arxiv.org/pdf/1911.01217v1.pdf
PWC https://paperswithcode.com/paper/detect-toxic-content-to-improve-online
Repo
Framework
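
An illustrative classical baseline in the spirit of the models listed above: TF-IDF features with a linear SVM, using class weighting as one simple way to counter the imbalance (the paper itself uses resampling). The file path and column names mirror the Kaggle release of the Quora dataset but are assumptions here.

```python
# Illustrative TF-IDF + linear SVM baseline, not the paper's exact pipeline.
# "train.csv" and the column names (question_text, target) are assumptions
# mirroring the Kaggle Quora Insincere Questions release.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

df = pd.read_csv("train.csv")
X_tr, X_te, y_tr, y_te = train_test_split(
    df["question_text"], df["target"], test_size=0.2, stratify=df["target"], random_state=0)

vec = TfidfVectorizer(max_features=50000, ngram_range=(1, 2))
clf = LinearSVC(class_weight="balanced")                   # one simple way to handle imbalance
clf.fit(vec.fit_transform(X_tr), y_tr)
print("F1:", f1_score(y_te, clf.predict(vec.transform(X_te))))
```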

A Bayesian Approach to Recurrence in Neural Networks

Title A Bayesian Approach to Recurrence in Neural Networks
Authors Philip N. Garner, Sibo Tong
Abstract We begin by reiterating that common neural network activation functions have simple Bayesian origins. In this spirit, we go on to show that Bayes’s theorem also implies a simple recurrence relation; this leads to a Bayesian recurrent unit with a prescribed feedback formulation. We show that introduction of a context indicator leads to a variable feedback that is similar to the forget mechanism in conventional recurrent units. A similar approach leads to a probabilistic input gate. The Bayesian formulation leads naturally to the two pass algorithm of the Kalman smoother or forward-backward algorithm, meaning that inference naturally depends upon future inputs as well as past ones. Experiments on speech recognition confirm that the resulting architecture can perform as well as a bidirectional recurrent network with the same number of parameters as a unidirectional one. Further, when configured explicitly bidirectionally, the architecture can exceed the performance of a conventional bidirectional recurrence.
Tasks Speech Recognition
Published 2019-10-24
URL https://arxiv.org/abs/1910.11247v2
PDF https://arxiv.org/pdf/1910.11247v2.pdf
PWC https://paperswithcode.com/paper/a-bayesian-approach-to-recurrence-in-neural
Repo
Framework

Solving dynamic multi-objective optimization problems via support vector machine

Title Solving dynamic multi-objective optimization problems via support vector machine
Authors Min Jiang, Weizhen Hu, Liming Qiu, Minghui Shi, Kay Chen Tan
Abstract Dynamic Multi-objective Optimization Problems (DMOPs) are optimization problems whose objective functions change over time. Solving DMOPs requires accurately finding the Pareto Optimal Set (POS) at different moments, which is a very difficult job due to the dynamics of the optimization problems. The POS obtained in the past can help us find the POS at the next moment more quickly and accurately. Therefore, in this paper we present a Support Vector Machine (SVM) based Dynamic Multi-Objective Evolutionary optimization Algorithm, called SVM-DMOEA. The algorithm uses the previously obtained POS to train an SVM and then uses the trained SVM to classify the solutions of the dynamic optimization problem at the next moment, thereby generating an initial population that consists of the different individuals recognized by the trained SVM. The initial population can be fed into any population-based optimization algorithm, e.g., the Nondominated Sorting Genetic Algorithm II (NSGA-II), to obtain the POS at that moment. The experimental results show the validity of our proposed approach.
Tasks
Published 2019-10-19
URL https://arxiv.org/abs/1910.08747v1
PDF https://arxiv.org/pdf/1910.08747v1.pdf
PWC https://paperswithcode.com/paper/solving-dynamic-multi-objective-optimization-1
Repo
Framework
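
The seeding step described in the abstract can be sketched as follows: an SVM is trained on solutions from the previous time step (Pareto-optimal versus dominated), then used to filter randomly generated candidates into an initial population for the next environment; the downstream NSGA-II call is left abstract. The kernel choice, candidate counts, and bounds are illustrative assumptions.

```python
# Sketch of the SVM-DMOEA seeding step: train an SVM on solutions from the
# previous time step (Pareto-optimal vs. dominated), then keep randomly
# generated candidates the SVM classifies as near-optimal to form the initial
# population for the next environment. Kernel and counts are illustrative.
import numpy as np
from sklearn.svm import SVC

def seed_initial_population(prev_pos, prev_dominated, n_candidates=2000,
                            pop_size=100, bounds=(0.0, 1.0), seed=0):
    X = np.vstack([prev_pos, prev_dominated])
    y = np.concatenate([np.ones(len(prev_pos)), np.zeros(len(prev_dominated))])
    clf = SVC(kernel="rbf").fit(X, y)

    rng = np.random.default_rng(seed)
    cand = rng.uniform(bounds[0], bounds[1], size=(n_candidates, X.shape[1]))
    keep = cand[clf.predict(cand) == 1]                    # candidates deemed near-optimal
    if len(keep) < pop_size:                               # pad with random solutions if needed
        keep = np.vstack([keep, cand[: pop_size - len(keep)]])
    return keep[:pop_size]                                 # feed into NSGA-II (or any EMO solver)
```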

A Method to Model Conditional Distributions with Normalizing Flows

Title A Method to Model Conditional Distributions with Normalizing Flows
Authors Zhisheng Xiao, Qing Yan, Yali Amit
Abstract In this work, we investigate the use of normalizing flows to model conditional distributions. In particular, we use our proposed method to analyze inverse problems with invertible neural networks by maximizing the posterior likelihood. Our method uses only a single loss and is easy to train. This is an improvement over a previous method that solves similar inverse problems with invertible neural networks but involves a combination of several loss terms with ad-hoc weighting. In addition, our method provides a natural framework to incorporate conditioning in normalizing flows, and therefore we can train an invertible network to perform conditional generation. We analyze our method and perform a careful comparison with previous approaches. Simple experiments show the effectiveness of our method, and more comprehensive experimental evaluations are underway.
Tasks
Published 2019-11-05
URL https://arxiv.org/abs/1911.02052v1
PDF https://arxiv.org/pdf/1911.02052v1.pdf
PWC https://paperswithcode.com/paper/a-method-to-model-conditional-distributions
Repo
Framework
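
Both normalizing-flow entries above maximize a conditional likelihood; for reference, the standard conditional change-of-variables objective they rely on is written out below, with f_x denoting the x-conditioned invertible map and p_Z the simple base density (the notation is an assumption of mine, not the papers').

```latex
% Standard conditional change-of-variables formula; f_x is the invertible map
% conditioned on x and p_Z is the base density (notation assumed).
\log p(y \mid x) = \log p_Z\bigl(f_x(y)\bigr)
                 + \log \left| \det \frac{\partial f_x(y)}{\partial y} \right|
```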