July 29, 2019

3193 words 15 mins read

Paper Group AWR 196

Stability of Topic Modeling via Matrix Factorization

Title Stability of Topic Modeling via Matrix Factorization
Authors Mark Belford, Brian Mac Namee, Derek Greene
Abstract Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initialization phase, which can potentially lead to different results being generated on the same corpus when using the same parameter values. This corresponds to the concept of “instability” which has previously been studied in the context of $k$-means clustering. In many applications of topic modeling, this problem of instability is not considered and topic models are treated as being definitive, even though the results may change considerably if the initialization process is altered. In this paper we demonstrate the inherent instability of popular topic modeling approaches, using a number of new measures to assess stability. To address this issue in the context of matrix factorization for topic modeling, we propose the use of ensemble learning strategies. Based on experiments performed on annotated text corpora, we show that a K-Fold ensemble strategy, combining both ensembles and structured initialization, can significantly reduce instability, while simultaneously yielding more accurate topic models.
Tasks Topic Models
Published 2017-02-23
URL http://arxiv.org/abs/1702.07186v2
PDF http://arxiv.org/pdf/1702.07186v2.pdf
PWC https://paperswithcode.com/paper/stability-of-topic-modeling-via-matrix
Repo https://github.com/derekgreene/topic-ensemble
Framework none
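
A quick way to see the instability the paper targets: run scikit-learn's NMF on the same corpus with the same parameters but different random seeds and compare the top topic descriptors. This is an illustrative sketch, not the authors' ensemble code (which lives in the linked repo).

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data[:2000]
vec = TfidfVectorizer(max_features=5000, stop_words="english")
X = vec.fit_transform(docs)
terms = vec.get_feature_names_out()

# Same corpus, same parameters -- only the random initialization differs.
for seed in (1, 2, 3):
    H = NMF(n_components=8, init="random", random_state=seed).fit(X).components_
    top_terms = [terms[i] for i in H[0].argsort()[-5:][::-1]]
    print(f"seed {seed}: {top_terms}")
```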

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

Title AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks
Authors Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, Xiaodong He
Abstract In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation. With a novel attentional generative network, the AttnGAN can synthesize fine-grained details at different subregions of the image by paying attention to the relevant words in the natural language description. In addition, a deep attentional multimodal similarity model is proposed to compute a fine-grained image-text matching loss for training the generator. The proposed AttnGAN significantly outperforms the previous state of the art, boosting the best reported inception score by 14.14% on the CUB dataset and 170.25% on the more challenging COCO dataset. A detailed analysis is also performed by visualizing the attention layers of the AttnGAN. It shows, for the first time, that the layered attentional GAN is able to automatically select the condition at the word level for generating different parts of the image.
Tasks Image Generation, Text Matching, Text-to-Image Generation
Published 2017-11-28
URL http://arxiv.org/abs/1711.10485v1
PDF http://arxiv.org/pdf/1711.10485v1.pdf
PWC https://paperswithcode.com/paper/attngan-fine-grained-text-to-image-generation
Repo https://github.com/bprabhakar/text-to-image
Framework pytorch
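
The core mechanism in the abstract is attention from image subregions to words. Below is a minimal PyTorch sketch, assuming region and word features have already been projected into a common space; the function and tensor names are mine, not from the AttnGAN codebase.

```python
import torch
import torch.nn.functional as F

def word_region_attention(region_feats, word_feats):
    """region_feats: (B, C, N) image subregion features;
    word_feats: (B, C, T) word features in the same space.
    Returns a per-region context vector built from attended words."""
    attn = torch.bmm(region_feats.transpose(1, 2), word_feats)  # (B, N, T)
    attn = F.softmax(attn, dim=2)            # each region attends over words
    context = torch.bmm(word_feats, attn.transpose(1, 2))       # (B, C, N)
    return context, attn

# Toy check: 289 regions (17x17 grid), 12 words, 64-dim shared space.
ctx, a = word_region_attention(torch.randn(2, 64, 289), torch.randn(2, 64, 12))
```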

RLlib: Abstractions for Distributed Reinforcement Learning

Title RLlib: Abstractions for Distributed Reinforcement Learning
Authors Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
Abstract Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/.
Tasks
Published 2017-12-26
URL http://arxiv.org/abs/1712.09381v4
PDF http://arxiv.org/pdf/1712.09381v4.pdf
PWC https://paperswithcode.com/paper/rllib-abstractions-for-distributed
Repo https://github.com/ray-project/ray
Framework tf
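
A minimal training loop against the library the paper describes, assuming a 2018/19-era `pip install ray[rllib]`; class paths and config keys were renamed in later Ray releases.

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer  # import path in RLlib ~0.6-1.x

ray.init()
trainer = PPOTrainer(config={"env": "CartPole-v0", "num_workers": 2})
for _ in range(5):
    result = trainer.train()   # one iteration of short-running distributed tasks
    print(result["episode_reward_mean"])
```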

Adversarial Occlusion-aware Face Detection

Title Adversarial Occlusion-aware Face Detection
Authors Yujia Chen, Lingxiao Song, Ran He
Abstract Occluded face detection is a challenging detection task due to the large appearance variations incurred by various real-world occlusions. This paper introduces an Adversarial Occlusion-aware Face Detector (AOFD) that simultaneously detects occluded faces and segments occluded areas. Specifically, we employ an adversarial training strategy to generate occlusion-like face features that are difficult for a face detector to recognize. An occlusion mask is predicted while detecting occluded faces, and the occluded area is utilized as an auxiliary signal rather than regarded as a hindrance. Moreover, the supervisory signals from the segmentation branch feed back into the features, aiding the detection of heavily occluded faces. Consequently, AOFD is able to find faces with few exposed facial landmarks with very high confidence, and it maintains high detection accuracy even for masked faces. Extensive experiments demonstrate that AOFD not only significantly outperforms state-of-the-art methods on the MAFA occluded face detection dataset, but also achieves competitive detection accuracy on benchmark datasets for general face detection such as FDDB.
Tasks Face Detection, Occluded Face Detection
Published 2017-09-15
URL http://arxiv.org/abs/1709.05188v6
PDF http://arxiv.org/pdf/1709.05188v6.pdf
PWC https://paperswithcode.com/paper/adversarial-occlusion-aware-face-detection
Repo https://github.com/IssacCyj/Adversarial-Occlussion-aware-Face-Detection
Framework caffe2
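
A hypothetical sketch of one ingredient of the approach: predicting an occlusion-like mask over face features and suppressing the masked regions, so the downstream detector must learn from partial evidence. This is not the authors' AOFD architecture, just the masking idea in PyTorch.

```python
import torch
import torch.nn as nn

class FeatureMasker(nn.Module):
    """Hypothetical sketch: predict an occlusion-like mask over face
    feature maps and suppress those regions (not the AOFD design)."""
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, feats):
        mask = self.mask_head(feats)      # (B, 1, H, W), values in [0, 1]
        return feats * (1 - mask), mask   # zero out "occluded" evidence

masked, m = FeatureMasker(64)(torch.randn(2, 64, 32, 32))
```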

Geometry-Aware Learning of Maps for Camera Localization

Title Geometry-Aware Learning of Maps for Camera Localization
Authors Samarth Brahmbhatt, Jinwei Gu, Kihwan Kim, James Hays, Jan Kautz
Abstract Maps are a key component in image-based camera localization and visual SLAM systems: they are used to establish geometric constraints between images, correct drift in relative pose estimation, and relocalize cameras after lost tracking. The exact definitions of maps, however, are often application-specific and hand-crafted for different scenarios (e.g. 3D landmarks, lines, planes, bags of visual words). We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation. Unlike prior work on learning maps, MapNet exploits cheap and ubiquitous sensory inputs like visual odometry and GPS in addition to images and fuses them together for camera localization. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference. In addition to directly improving localization accuracy, this allows us to update the MapNet (i.e., maps) in a self-supervised manner using additional unlabeled video sequences from the scene. We also propose a novel parameterization for camera rotation which is better suited for deep-learning based camera pose regression. Experimental results on both the indoor 7-Scenes dataset and the outdoor Oxford RobotCar dataset show significant performance improvement over prior work. The MapNet project webpage is https://goo.gl/mRB3Au.
Tasks 3D Pose Estimation, Camera Localization, Pose Estimation, Visual Odometry
Published 2017-12-09
URL http://arxiv.org/abs/1712.03342v3
PDF http://arxiv.org/pdf/1712.03342v3.pdf
PWC https://paperswithcode.com/paper/geometry-aware-learning-of-maps-for-camera
Repo https://github.com/NVlabs/geomapnet
Framework pytorch
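
The rotation parameterization mentioned in the abstract maps a unit quaternion to its logarithm, a 3-vector that is friendlier for regression than a constrained 4-vector. A NumPy sketch of that map, assuming quaternions in (w, x, y, z) order:

```python
import numpy as np

def quat_to_logq(q, eps=1e-8):
    """Unit quaternion (w, x, y, z) -> 3-vector log map,
    logq = (v / ||v||) * arccos(w); the kind of rotation
    parameterization MapNet regresses instead of raw quaternions."""
    w, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < eps:
        return np.zeros(3)   # identity (or near-identity) rotation
    return (v / n) * np.arccos(np.clip(w, -1.0, 1.0))

print(quat_to_logq(np.array([1.0, 0.0, 0.0, 0.0])))  # -> [0. 0. 0.]
```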

Cross-lingual Word Segmentation and Morpheme Segmentation as Sequence Labelling

Title Cross-lingual Word Segmentation and Morpheme Segmentation as Sequence Labelling
Authors Yan Shao
Abstract This paper presents our segmentation system developed for the MLP 2017 shared tasks on cross-lingual word segmentation and morpheme segmentation. We model both word and morpheme segmentation as character-level sequence labelling tasks. The prevalent bidirectional recurrent neural network with conditional random fields as the output interface is adapted as the baseline system, which is further improved via ensemble decoding. Our universal system is applied to and extensively evaluated on all the official data sets without any language-specific adjustment. The official evaluation results indicate that the proposed model achieves outstanding accuracy for both word and morpheme segmentation, across languages of various types, when compared to the other participating systems.
Tasks
Published 2017-09-12
URL http://arxiv.org/abs/1709.03756v1
PDF http://arxiv.org/pdf/1709.03756v1.pdf
PWC https://paperswithcode.com/paper/cross-lingual-word-segmentation-and-morpheme
Repo https://github.com/yanshao9798/segmenter
Framework tf
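
Character-level sequence labelling for segmentation usually means converting segmented words into per-character boundary tags. A sketch using the common B/I/E/S scheme (the paper's exact tag set may differ):

```python
def words_to_tags(words):
    """Segmentation -> per-character labels: B/I/E for multi-character
    words, S for single characters. A common scheme for character-level
    sequence labelling; the paper's exact tag inventory may differ."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(w) - 2) + ["E"])
    return tags

print(words_to_tags(["我们", "爱", "北京"]))  # ['B', 'E', 'S', 'B', 'E']
```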

Activation Maximization Generative Adversarial Nets

Title Activation Maximization Generative Adversarial Nets
Authors Zhiming Zhou, Han Cai, Shu Rong, Yuxuan Song, Kan Ren, Weinan Zhang, Yong Yu, Jun Wang
Abstract Class labels have been empirically shown to be useful in improving the sample quality of generative adversarial nets (GANs). In this paper, we mathematically study the properties of the current variants of GANs that make use of class label information. With class-aware gradient and cross-entropy decomposition, we reveal how class labels and associated losses influence the training of GANs. Based on that, we propose Activation Maximization Generative Adversarial Networks (AM-GAN) as an advanced solution. Comprehensive experiments have been conducted to validate our analysis and evaluate the effectiveness of our solution, where AM-GAN outperforms other strong baselines and achieves the state-of-the-art Inception Score (8.91) on CIFAR-10. In addition, we demonstrate that, with the Inception ImageNet classifier, the Inception Score mainly tracks the diversity of the generator, but there is no reliable evidence that it reflects true sample quality. We thus propose a new metric, called AM Score, to provide a more accurate estimation of sample quality. Our proposed model also outperforms the baseline methods under the new metric.
Tasks
Published 2017-03-06
URL http://arxiv.org/abs/1703.02000v9
PDF http://arxiv.org/pdf/1703.02000v9.pdf
PWC https://paperswithcode.com/paper/activation-maximization-generative
Repo https://github.com/ZhimingZhou/AM-GAN
Framework tf
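
The Inception Score the abstract analyzes is exp(E_x[KL(p(y|x) || p(y))]) over classifier outputs for generated samples. A NumPy sketch of the standard computation (not the AM Score proposed in the paper):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, K) classifier softmax outputs for N generated samples.
    IS = exp( E_x[ KL(p(y|x) || p(y)) ] ): high when each prediction is
    confident yet the marginal over samples is diverse."""
    p_y = probs.mean(axis=0, keepdims=True)                       # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

probs = np.random.dirichlet(np.ones(10), size=500)  # dummy classifier outputs
print(inception_score(probs))
```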

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

Title VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
Authors Dushyant Mehta, Srinath Sridhar, Oleksandr Sotnychenko, Helge Rhodin, Mohammad Shafiei, Hans-Peter Seidel, Weipeng Xu, Dan Casas, Christian Theobalt
Abstract We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control—thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method’s accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e. it works for outdoor scenes, community videos, and low quality commodity RGB cameras.
Tasks 3D Human Pose Estimation, Pose Estimation
Published 2017-05-03
URL http://arxiv.org/abs/1705.01583v1
PDF http://arxiv.org/pdf/1705.01583v1.pdf
PWC https://paperswithcode.com/paper/vnect-real-time-3d-human-pose-estimation-with
Repo https://github.com/XinArkh/VNect
Framework tf
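
VNect-style readout combines per-joint 2D heatmaps with x/y/z "location maps". A hedged NumPy sketch of reading one joint's 3D position at the heatmap maximum; the array names are mine:

```python
import numpy as np

def read_joint(heatmap, loc_x, loc_y, loc_z):
    """Sketch of a VNect-style readout: take the 2D heatmap maximum
    and sample the x/y/z location maps at that pixel to get a
    root-relative 3D joint position (all arrays H x W, one joint)."""
    v, u = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array([loc_x[v, u], loc_y[v, u], loc_z[v, u]]), (u, v)

h = np.random.rand(64, 64)
xyz, uv = read_joint(h, *(np.random.randn(64, 64) for _ in range(3)))
```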

KBGAN: Adversarial Learning for Knowledge Graph Embeddings

Title KBGAN: Adversarial Learning for Knowledge Graph Embeddings
Authors Liwei Cai, William Yang Wang
Abstract We introduce KBGAN, an adversarial learning framework to improve the performance of a wide range of existing knowledge graph embedding models. Because knowledge graphs typically only contain positive facts, sampling useful negative training examples is a non-trivial task. Replacing the head or tail entity of a fact with a uniformly randomly selected entity is a conventional method for generating negative facts, but the majority of the generated negative facts can be easily discriminated from positive facts and contribute little to training. Inspired by generative adversarial networks (GANs), we use one knowledge graph embedding model as a negative sample generator to assist the training of our desired model, which acts as the discriminator in GANs. This framework is independent of the concrete form of the generator and discriminator, and can therefore utilize a wide variety of knowledge graph embedding models as its building blocks. In experiments, we adversarially train two translation-based models, TransE and TransD, each with assistance from one of two probability-based models, DistMult and ComplEx. We evaluate the performance of KBGAN on the link prediction task, using three knowledge base completion datasets: FB15k-237, WN18 and WN18RR. Experimental results show that adversarial training substantially improves the performance of target embedding models under various settings.
Tasks Graph Embedding, Knowledge Base Completion, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs, Link Prediction
Published 2017-11-11
URL http://arxiv.org/abs/1711.04071v3
PDF http://arxiv.org/pdf/1711.04071v3.pdf
PWC https://paperswithcode.com/paper/kbgan-adversarial-learning-for-knowledge
Repo https://github.com/Chenxr1997/KBGAN
Framework pytorch
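
The generator step described in the abstract boils down to sampling one negative per positive from a softmax over candidate corrupted triples, keeping the log-probability for a REINFORCE-style update. A PyTorch sketch of just that step (discriminator and reward omitted):

```python
import torch
import torch.nn.functional as F

def sample_negatives(gen_scores):
    """gen_scores: (B, M) generator scores for M candidate corrupted
    triples per positive fact. Sample one negative per row via softmax
    (the KBGAN generator step); the returned log-probs would feed a
    REINFORCE update, with the discriminator's loss as reward."""
    probs = F.softmax(gen_scores, dim=1)
    idx = torch.multinomial(probs, 1).squeeze(1)                     # (B,)
    log_p = torch.log(probs.gather(1, idx.unsqueeze(1)).squeeze(1) + 1e-12)
    return idx, log_p

idx, log_p = sample_negatives(torch.randn(4, 20))
```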

On the Effectiveness of Least Squares Generative Adversarial Networks

Title On the Effectiveness of Least Squares Generative Adversarial Networks
Authors Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, Stephen Paul Smolley
Abstract Unsupervised learning with generative adversarial networks (GANs) has proven to be hugely successful. Regular GANs model the discriminator as a classifier with the sigmoid cross-entropy loss function. However, we found that this loss function may lead to the vanishing gradients problem during the learning process. To overcome this problem, we propose the Least Squares Generative Adversarial Networks (LSGANs), which adopt the least squares loss for both the discriminator and the generator. We show that minimizing the objective function of LSGAN amounts to minimizing the Pearson $\chi^2$ divergence. We also show that this derived objective function performs better than the classical approach of using least squares for classification. LSGANs offer two benefits over regular GANs. First, LSGANs are able to generate higher quality images than regular GANs. Second, LSGANs perform more stably during the learning process. To evaluate image quality, we conduct both qualitative and quantitative experiments, and the results show that LSGANs generate higher quality images than regular GANs. Furthermore, we evaluate the stability of LSGANs in two groups of experiments. The first compares LSGANs and regular GANs without gradient penalty, using three experiments (a Gaussian mixture distribution, difficult architectures, and a newly proposed setting of datasets with small variability) to illustrate the stability of LSGANs. The second compares LSGANs with gradient penalty (LSGANs-GP) against WGANs with gradient penalty (WGANs-GP). The results show that LSGANs-GP train successfully for all the difficult architectures used in WGANs-GP, including 101-layer ResNet.
Tasks
Published 2017-12-18
URL http://arxiv.org/abs/1712.06391v2
PDF http://arxiv.org/pdf/1712.06391v2.pdf
PWC https://paperswithcode.com/paper/on-the-effectiveness-of-least-squares
Repo https://github.com/xudonmao/improved_LSGAN
Framework tf
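
The least squares objectives from the abstract, with 0/1 target coding, as a short PyTorch sketch; minimizing these is what the paper relates to the Pearson $\chi^2$ divergence.

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    # Discriminator: push outputs on real data to 1, on fake data to 0.
    return 0.5 * ((d_real - 1) ** 2).mean() + 0.5 * (d_fake ** 2).mean()

def lsgan_g_loss(d_fake):
    # Generator: push the discriminator's outputs on fake data to 1.
    return 0.5 * ((d_fake - 1) ** 2).mean()

print(lsgan_d_loss(torch.rand(8), torch.rand(8)), lsgan_g_loss(torch.rand(8)))
```

Compared with the sigmoid cross-entropy loss, these quadratic penalties keep gradients alive for samples that are correctly classified but still far from the decision boundary, which is the stability argument the abstract makes.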

Lost Relatives of the Gumbel Trick

Title Lost Relatives of the Gumbel Trick
Authors Matej Balog, Nilesh Tripuraneni, Zoubin Ghahramani, Adrian Weller
Abstract The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new methods have superior properties in several settings with minimal additional computational cost. In particular, for the Gumbel trick to yield computational benefits for discrete graphical models, Gumbel perturbations on all configurations are typically replaced with so-called low-rank perturbations. We show how a subfamily of our new methods adapts to this setting, proving new upper and lower bounds on the log partition function and deriving a family of sequential samplers for the Gibbs distribution. Finally, we balance the discussion by showing how the simpler analytical form of the Gumbel trick enables additional theoretical results.
Tasks
Published 2017-06-13
URL http://arxiv.org/abs/1706.04161v1
PDF http://arxiv.org/pdf/1706.04161v1.pdf
PWC https://paperswithcode.com/paper/lost-relatives-of-the-gumbel-trick
Repo https://github.com/matejbalog/gumbel-relatives
Framework none
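
Both uses of the trick from the abstract fit in a few lines of NumPy: perturb the log-potentials with Gumbel noise and take the argmax to sample, or average the max to estimate the log partition function.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.log(np.array([0.1, 0.2, 0.3, 0.4]))  # unnormalized log-potentials

# Sampling: argmax of Gumbel-perturbed potentials follows the Gibbs
# distribution over configurations.
samples = [np.argmax(phi + rng.gumbel(size=phi.size)) for _ in range(10000)]

# Partition function: max of perturbed potentials is Gumbel(log Z),
# so E[max] = log Z + Euler-Mascheroni constant gamma.
maxes = [np.max(phi + rng.gumbel(size=phi.size)) for _ in range(10000)]
log_z = np.mean(maxes) - np.euler_gamma
print(np.exp(log_z))  # Z = 1 here, so this should be close to 1.0
```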

Pycobra: A Python Toolbox for Ensemble Learning and Visualisation

Title Pycobra: A Python Toolbox for Ensemble Learning and Visualisation
Authors Benjamin Guedj, Bhargav Srinivasa Desikan
Abstract We introduce pycobra, a Python library devoted to ensemble learning (regression and classification) and visualisation. Its main assets are the implementation of several ensemble learning algorithms, a flexible and generic interface to compare and blend any existing machine learning algorithm available in Python libraries (as long as a predict method is given), and visualisation tools such as Voronoi tessellations. pycobra is fully scikit-learn compatible and is released under the MIT open-source license. pycobra can be downloaded from the Python Package Index (PyPI) and Machine Learning Open Source Software (MLOSS). The current version (along with Jupyter notebooks, extensive documentation, and continuous integration tests) is available at https://github.com/bhargavvader/pycobra and the official documentation website is https://modal.lille.inria.fr/pycobra.
Tasks
Published 2017-04-25
URL https://arxiv.org/abs/1707.00558v3
PDF https://arxiv.org/pdf/1707.00558v3.pdf
PWC https://paperswithcode.com/paper/pycobra-a-python-toolbox-for-ensemble
Repo https://github.com/bhargavvader/pycobra
Framework none
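
A usage sketch based on the scikit-learn-style interface the abstract promises; exact class names and constructor arguments may differ between pycobra versions, so treat this as an assumption to check against the linked docs.

```python
# Assumed API: pycobra.cobra.Cobra with sklearn-style fit/predict.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from pycobra.cobra import Cobra

X, y = make_regression(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = Cobra()          # aggregates several default regression machines
model.fit(X_tr, y_tr)
print(model.predict(X_te)[:5])
```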

Improving Object Localization with Fitness NMS and Bounded IoU Loss

Title Improving Object Localization with Fitness NMS and Bounded IoU Loss
Authors Lachlan Tychsen-Smith, Lars Petersson
Abstract We demonstrate that many detection methods are designed to identify only a sufficiently accurate bounding box, rather than the best available one. To address this issue we propose a simple and fast modification to the existing methods called Fitness NMS. This method is tested with the DeNet model and obtains a significantly improved MAP at greater localization accuracies without a loss in evaluation rate, and can be used in conjunction with Soft NMS for additional improvements. Next we derive a novel bounding box regression loss based on a set of IoU upper bounds that better matches the goal of IoU maximization while still providing good convergence properties. Following these novelties, we investigate RoI clustering schemes for improving evaluation rates for the DeNet wide model variants and provide an analysis of localization performance at various input image dimensions. We obtain a MAP of 33.6%@79Hz and 41.8%@5Hz for MSCOCO and a Titan X (Maxwell). Source code available from: https://github.com/lachlants/denet
Tasks Object Localization
Published 2017-11-01
URL http://arxiv.org/abs/1711.00164v3
PDF http://arxiv.org/pdf/1711.00164v3.pdf
PWC https://paperswithcode.com/paper/improving-object-localization-with-fitness
Repo https://github.com/lachlants/denet
Framework none
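
Fitness NMS changes what greedy NMS ranks by: a fitness score combining class probability with estimated localization quality, instead of class probability alone. A NumPy sketch of greedy NMS with a caller-supplied fitness score (a stand-in for the idea, not the DeNet implementation):

```python
import numpy as np

def iou(a, b):
    """IoU between one box a and an array of boxes b, format (x1, y1, x2, y2)."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, fitness, thresh=0.5):
    """Greedy NMS ranked by `fitness` (e.g. class prob x predicted IoU)
    rather than by class probability alone: the Fitness NMS idea, sketched."""
    order = np.argsort(-fitness)
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < thresh]
    return keep
```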

SPARQL as a Foreign Language

Title SPARQL as a Foreign Language
Authors Tommaso Soru, Edgard Marx, Diego Moussallem, Gustavo Publio, André Valdestilhas, Diego Esteves, Ciro Baron Neto
Abstract In recent years, the Linked Data Cloud has reached a size of more than 100 billion facts pertaining to a multitude of domains. However, accessing this information has been significantly challenging for lay users. Approaches to problems such as Question Answering on Linked Data and Link Discovery have notably played a role in increasing information access. These approaches are often based on handcrafted and/or statistical models derived from data observation. Recently, Deep Learning architectures based on Neural Networks called seq2seq have been shown to achieve state-of-the-art results at translating sequences into sequences. In this direction, we propose Neural SPARQL Machines, end-to-end deep architectures that translate any natural language expression into sentences encoding SPARQL queries. Our preliminary results, restricted to selected DBpedia classes, show that Neural SPARQL Machines are a promising approach for Question Answering on Linked Data, as they can deal with known problems such as vocabulary mismatch and perform graph pattern composition.
Tasks Question Answering
Published 2017-08-25
URL http://arxiv.org/abs/1708.07624v1
PDF http://arxiv.org/pdf/1708.07624v1.pdf
PWC https://paperswithcode.com/paper/sparql-as-a-foreign-language
Repo https://github.com/AKSW/NSpM
Framework tf
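
A hypothetical sketch of the encoding step: flatten a SPARQL query into a token sequence that a seq2seq model can emit. The token inventory here is mine, not the authors' exact encoding.

```python
# Hypothetical flattening of SPARQL into seq2seq-friendly tokens, in the
# spirit of Neural SPARQL Machines; the replacement table is an assumption.
REPLACEMENTS = {"{": " brack_open ", "}": " brack_close ",
                "?": "var_", ".": " sep_dot "}

def encode_sparql(query):
    for sym, tok in REPLACEMENTS.items():
        query = query.replace(sym, tok)
    return query.split()

q = "SELECT ?uri WHERE { ?uri dbo:author dbr:Dan_Brown . }"
print(encode_sparql(q))
# ['SELECT', 'var_uri', 'WHERE', 'brack_open', 'var_uri', 'dbo:author',
#  'dbr:Dan_Brown', 'sep_dot', 'brack_close']
```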

End-to-End Information Extraction without Token-Level Supervision

Title End-to-End Information Extraction without Token-Level Supervision
Authors Rasmus Berg Palm, Dirk Hovy, Florian Laws, Ole Winther
Abstract Most state-of-the-art information extraction approaches rely on token-level labels to find the areas of interest in text. Unfortunately, these labels are time-consuming and costly to create, and consequently, not available for many real-life IE tasks. To make matters worse, token-level labels are usually not the desired output, but just an intermediary step. End-to-end (E2E) models, which take raw text as input and produce the desired output directly, need not depend on token-level labels. We propose an E2E model based on pointer networks, which can be trained directly on pairs of raw input and output text. We evaluate our model on the ATIS data set, MIT restaurant corpus and the MIT movie corpus and compare to neural baselines that do use token-level labels. We achieve competitive results, within a few percentage points of the baselines, showing the feasibility of E2E information extraction without the need for token-level labels. This opens up new possibilities, as for many tasks currently addressed by human extractors, raw input and output data are available, but not token-level labels.
Tasks
Published 2017-07-16
URL http://arxiv.org/abs/1707.04913v1
PDF http://arxiv.org/pdf/1707.04913v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-information-extraction-without
Repo https://github.com/rasmusbergpalm/e2e-ie-release
Framework none
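
The pointer-network idea in the abstract: the decoder's output distribution is a softmax over input positions rather than over a vocabulary, so the model can copy spans straight from the raw input. A PyTorch sketch of one pointer step; the additive-attention parameterization is an assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pointer_step(dec_state, enc_states, W1, W2, v):
    """One pointer step: score each input position j with
    v^T tanh(W1 e_j + W2 d), then softmax over *input positions*
    instead of a vocabulary.
    dec_state: (B, H), enc_states: (B, T, H); returns (B, T)."""
    scores = torch.tanh(enc_states @ W1 + (dec_state @ W2).unsqueeze(1)) @ v
    return F.softmax(scores, dim=1)

B, T, H = 2, 7, 32
probs = pointer_step(torch.randn(B, H), torch.randn(B, T, H),
                     torch.randn(H, H), torch.randn(H, H), torch.randn(H))
```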