Paper Group AWR 61
Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping. Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize. Training a Subsampling Mechanism in Expectation. Deep Active Learning for Named Entity Recognition. Appearance-and-Relation Networks …
Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping
Title | Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping |
Authors | Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari Inamura |
Abstract | In this paper, we propose an online learning algorithm based on a Rao-Blackwellized particle filter for spatial concept acquisition and mapping. We previously proposed a nonparametric Bayesian spatial concept acquisition model (SpCoA). Here, we propose a novel method (SpCoSLAM) that integrates SpCoA and FastSLAM within the theoretical framework of a Bayesian generative model. The proposed method can simultaneously learn place categories and lexicons while incrementally generating an environmental map. Furthermore, the proposed method adds scene image features and a language model to SpCoA. In the experiments, we tested online learning of spatial concepts and environmental maps in a novel environment for which the robot did not have a map, and then evaluated the results of online learning of spatial concepts and lexical acquisition. The experimental results demonstrate that, by using the proposed method, the robot was able to incrementally and more accurately learn the relationships between words and places in the environmental map. |
Tasks | Language Modelling, Simultaneous Localization and Mapping |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04664v2 |
PDF | http://arxiv.org/pdf/1704.04664v2.pdf |
PWC | https://paperswithcode.com/paper/online-spatial-concept-and-lexical |
Repo | https://github.com/EmergentSystemLabStudent/SpCoSLAM_Lets |
Framework | none |
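SpCoSLAM's online learning runs on the same Rao-Blackwellized particle filter backbone as FastSLAM. As a minimal illustration of that backbone only, the sketch below shows a generic importance-resampling step in Python; the map, pose, spatial-concept and lexicon statistics each particle would carry in SpCoSLAM are left abstract, and the function name is ours, not the authors'.

```python
import numpy as np

def importance_resample(particles, weights, rng=None):
    """Generic importance-resampling step of a particle filter. In SpCoSLAM each
    particle would additionally carry a map, a trajectory and spatial-concept
    statistics; here particles are opaque objects and only resampling is shown."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # normalise importance weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    resampled = [particles[i] for i in idx]
    return resampled, np.full(len(particles), 1.0 / len(particles))

# toy usage: three "particles" with unnormalised likelihood weights
parts, new_w = importance_resample(["p0", "p1", "p2"], [0.1, 0.7, 0.2], rng=0)
```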
Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize
Title | Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize |
Authors | Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, Wenzhe Shi |
Abstract | The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels. To combat this problem, smoothness constraints, post-processing and different architecture designs have been proposed. Odena et al. highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. In this note, we propose an initialization method for sub-pixel convolution known as convolution NN resize. Compared to sub-pixel convolution initialized with schemes designed for standard convolution kernels, it is free from checkerboard artifacts immediately after initialization. Compared to resize convolution, at the same computational complexity, it has more modelling power and converges to solutions with smaller test errors. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02937v1 |
PDF | http://arxiv.org/pdf/1707.02937v1.pdf |
PWC | https://paperswithcode.com/paper/checkerboard-artifact-free-sub-pixel |
Repo | https://github.com/imatge-upc/3D-GAN-superresolution |
Framework | tf |
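The proposed initialization (convolution NN resize, widely known as ICNR in later re-implementations) amounts to initializing a kernel for the low-resolution number of output channels and repeating each filter r² times, so that immediately after initialization the sub-pixel (pixel-shuffle) layer is equivalent to nearest-neighbour resize followed by convolution. A minimal PyTorch sketch, assuming r = 2 and Kaiming initialization for the base kernel (the choice of base initializer is an assumption):

```python
import torch
import torch.nn as nn

def icnr_(weight, upscale_factor=2, init=nn.init.kaiming_normal_):
    """Fill a (out_ch, in_ch, k, k) sub-pixel convolution weight so that the
    following PixelShuffle behaves like NN-resize + convolution at initialization."""
    out_ch, in_ch, kh, kw = weight.shape
    sub = torch.zeros(out_ch // upscale_factor ** 2, in_ch, kh, kw)
    init(sub)
    # repeat each base filter r^2 times so all sub-pixel positions start identical
    kernel = sub.repeat_interleave(upscale_factor ** 2, dim=0)
    with torch.no_grad():
        weight.copy_(kernel)
    return weight

conv = nn.Conv2d(64, 3 * 2 ** 2, kernel_size=3, padding=1)   # 3 output channels, r = 2
icnr_(conv.weight, upscale_factor=2)
upsample = nn.Sequential(conv, nn.PixelShuffle(2))           # checkerboard-free at init
```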
Training a Subsampling Mechanism in Expectation
Title | Training a Subsampling Mechanism in Expectation |
Authors | Colin Raffel, Dieterich Lawson |
Abstract | We describe a mechanism for subsampling sequences and show how to compute its expected output so that it can be trained with standard backpropagation. We test this approach on a simple toy problem and discuss its shortcomings. |
Tasks | |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06914v3 |
PDF | http://arxiv.org/pdf/1702.06914v3.pdf |
PWC | https://paperswithcode.com/paper/training-a-subsampling-mechanism-in |
Repo | https://github.com/craffel/subsampling_in_expectation |
Framework | none |
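To illustrate what "computing the expected output" of a subsampler can look like, the toy dynamic program below assumes each input element is emitted independently with a given probability and returns the expected value of every output position (weighted by the probability that the position exists at all). The paper's parameterisation and training setup differ, so treat this purely as a sketch.

```python
import numpy as np

def expected_subsample(x, emit_prob, out_len):
    """Expected output of a stochastic subsampler that keeps x[t] with probability
    emit_prob[t], computed with a simple dynamic program (toy illustration only)."""
    T = len(x)
    # a[t, j] = P(exactly j emissions among the first t inputs)
    a = np.zeros((T + 1, out_len + 1))
    a[0, 0] = 1.0
    for t in range(1, T + 1):
        e = emit_prob[t - 1]
        a[t, 0] = (1 - e) * a[t - 1, 0]
        for j in range(1, out_len + 1):
            a[t, j] = e * a[t - 1, j - 1] + (1 - e) * a[t - 1, j]
    y = np.zeros(out_len)
    for j in range(1, out_len + 1):          # j-th output element
        for t in range(1, T + 1):            # P(x[t-1] is the j-th emission) * x[t-1]
            y[j - 1] += emit_prob[t - 1] * a[t - 1, j - 1] * x[t - 1]
    return y

print(expected_subsample(np.arange(6.0), emit_prob=np.full(6, 0.5), out_len=3))
```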
Deep Active Learning for Named Entity Recognition
Title | Deep Active Learning for Named Entity Recognition |
Authors | Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree Anandkumar |
Abstract | Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model, consisting of convolutional character and word encoders and a long short-term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than the best-performing models. We carry out incremental active learning during the training process and are able to nearly match state-of-the-art performance with just 25% of the original training data. |
Tasks | Active Learning, Named Entity Recognition |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.05928v3 |
PDF | http://arxiv.org/pdf/1707.05928v3.pdf |
PWC | https://paperswithcode.com/paper/deep-active-learning-for-named-entity |
Repo | https://github.com/tonygsw/Joint-Extraction-of-Entities-and-Relations-Based-on-a-Novel-Tagging-Scheme |
Framework | pytorch |
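A compressed PyTorch sketch of a CNN-CNN-LSTM-style tagger follows. Hidden sizes, max-pooling over characters, and the decoder (a plain LSTM over encoder states here, rather than the paper's tag-conditioned greedy decoder) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class CnnCnnLstmTagger(nn.Module):
    """Sketch of a convolutional character encoder + convolutional word encoder
    + LSTM tag decoder. Dimensions and pooling choices are illustrative."""
    def __init__(self, char_vocab, word_vocab, num_tags,
                 char_dim=25, word_dim=100, hidden=200):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_dim, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.word_cnn = nn.Conv1d(word_dim + char_dim, hidden, kernel_size=3, padding=1)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_tags)

    def forward(self, chars, words):
        # chars: (batch, seq_len, max_word_len), words: (batch, seq_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1).transpose(1, 2)
        x = torch.relu(self.word_cnn(x)).transpose(1, 2)
        h, _ = self.decoder(x)
        return self.out(h)                       # per-token tag logits

tagger = CnnCnnLstmTagger(char_vocab=100, word_vocab=5000, num_tags=9)
logits = tagger(torch.randint(0, 100, (2, 7, 12)), torch.randint(0, 5000, (2, 7)))
```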
Appearance-and-Relation Networks for Video Classification
Title | Appearance-and-Relation Networks for Video Classification |
Authors | Limin Wang, Wei Li, Wen Li, Luc Van Gool |
Abstract | Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed the Appearance-and-Relation Network (ARTNet), to learn video representation in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented based on the linear combination of pixels or filter responses in each frame, while the relation branch is designed based on the multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART blocks obtain an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets achieve performance superior to the existing state-of-the-art methods on these three datasets. |
Tasks | Temporal Action Localization, Video Classification |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09125v2 |
PDF | http://arxiv.org/pdf/1711.09125v2.pdf |
PWC | https://paperswithcode.com/paper/appearance-and-relation-networks-for-video |
Repo | https://github.com/wanglimin/ARTNet |
Framework | none |
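The SMART idea of pairing an appearance branch with a relation branch can be sketched roughly as below: a spatial-only 3D convolution for appearance, and a spatiotemporal convolution whose responses are squared (a simple stand-in for multiplicative cross-frame interactions) and reduced across channels for relations. Fusion by concatenation and all layer sizes here are assumptions; this is not ARTNet's exact block design.

```python
import torch
import torch.nn as nn

class SmartStyleBlock(nn.Module):
    """Loose sketch of a SMART-style block: appearance branch (spatial-only conv)
    plus relation branch (spatiotemporal conv, squared, cross-channel reduction)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.appearance = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.relation = nn.Conv3d(in_ch, 2 * out_ch, kernel_size=3, padding=1)
        self.reduce = nn.Conv3d(2 * out_ch, out_ch, kernel_size=1)   # cross-channel pooling
        self.bn_app = nn.BatchNorm3d(out_ch)
        self.bn_rel = nn.BatchNorm3d(out_ch)

    def forward(self, x):                          # x: (batch, C, T, H, W)
        app = self.bn_app(self.appearance(x))
        rel = self.bn_rel(self.reduce(self.relation(x) ** 2))
        return torch.relu(torch.cat([app, rel], dim=1))

block = SmartStyleBlock(in_ch=3, out_ch=32)
out = block(torch.randn(2, 3, 8, 56, 56))          # -> (2, 64, 8, 56, 56)
```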
Supervised Community Detection with Line Graph Neural Networks
Title | Supervised Community Detection with Line Graph Neural Networks |
Authors | Zhengdao Chen, Xiang Li, Joan Bruna |
Abstract | We study data-driven methods for community detection on graphs, an inverse problem that is typically solved in terms of the spectrum of certain operators or via posterior inference under certain probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational signal-to-noise detection thresholds. This graph inference task can be recast as a node-wise graph classification problem, and, as such, computational detection thresholds can be translated in terms of learning within appropriate models. We present a novel family of Graph Neural Networks (GNNs) and show that they can reach those detection thresholds in a purely data-driven manner without access to the underlying generative models, and even improve upon current computational thresholds in hard regimes. For that purpose, we propose to augment GNNs with the non-backtracking operator, defined on the line graph of edge adjacencies. We also perform the first analysis of the optimization landscape of using GNNs to solve community detection problems, demonstrating that under certain simplifications and assumptions, the loss value at local minima is close to the loss value at the global minimum. Finally, the resulting model is also tested on real datasets, performing significantly better than previous models. |
Tasks | Community Detection, Graph Classification |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08415v5 |
PDF | http://arxiv.org/pdf/1705.08415v5.pdf |
PWC | https://paperswithcode.com/paper/supervised-community-detection-with-line |
Repo | https://github.com/afansi/multiscalegnn |
Framework | pytorch |
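The non-backtracking operator used to augment the GNNs has a short, standard definition on directed edges: B[(i→j), (k→l)] = 1 iff j = k and l ≠ i. A NumPy sketch of just that operator (the surrounding line-graph GNN architecture is not reproduced here):

```python
import numpy as np

def non_backtracking_operator(edges):
    """Non-backtracking operator B on the directed edges of an undirected graph:
    B[(i->j), (k->l)] = 1 iff j == k and l != i (no immediate reversal)."""
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    index = {e: n for n, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for i, j in directed:
        for k, l in directed:
            if j == k and l != i:
                B[index[(i, j)], index[(k, l)]] = 1.0
    return B, directed

# toy usage: a triangle has 6 directed edges, so B is 6 x 6
B, edge_order = non_backtracking_operator([(0, 1), (1, 2), (2, 0)])
```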
A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex
Title | A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex |
Authors | Daniel Moyer, Boris A Gutman, Neda Jahanshad, Paul M. Thompson |
Abstract | One of the primary objectives of human brain mapping is the division of the cortical surface into functionally distinct regions, i.e. parcellation. While it is generally agreed that at macro-scale different regions of the cortex have different functions, the exact number and configuration of these regions is not known. Methods for the discovery of these regions are thus important, particularly as the volume of available information grows. Towards this end, we present a parcellation method based on a Bayesian non-parametric mixture model of cortical connectivity. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00981v1 |
PDF | http://arxiv.org/pdf/1703.00981v1.pdf |
PWC | https://paperswithcode.com/paper/a-restaurant-process-mixture-model-for |
Repo | https://github.com/kristianeschenburg/ddCRP |
Framework | none |
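The nonparametric prior behind such a mixture is a (distance-dependent) Chinese restaurant process. The sketch below draws cluster assignments from a plain CRP prior only; the connectivity likelihood and the distance-decay used for cortical parcellation are omitted, and the concentration value is an arbitrary choice.

```python
import numpy as np

def crp_assignments(n, alpha=1.0, rng=None):
    """Draw cluster assignments from a Chinese restaurant process prior:
    a new item joins cluster k with probability proportional to its size,
    or opens a new cluster with probability proportional to alpha."""
    rng = np.random.default_rng(rng)
    assignments, counts = [], []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_assignments(10, alpha=1.0, rng=0))
```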
SemRe-Rank: Improving Automatic Term Extraction By Incorporating Semantic Relatedness With Personalised PageRank
Title | SemRe-Rank: Improving Automatic Term Extraction By Incorporating Semantic Relatedness With Personalised PageRank |
Authors | Ziqi Zhang, Jie Gao, Fabio Ciravegna |
Abstract | Automatic Term Extraction (ATE) deals with the extraction of terminology from a domain-specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task, as no existing ATE method consistently outperforms the others across domains. This work adopts a refreshed perspective on the problem: instead of searching for a ‘one-size-fits-all’ solution that may never exist, we propose to develop generic methods that ‘enhance’ existing ATE methods. We introduce SemRe-Rank, the first method based on this principle, which incorporates semantic relatedness - an often overlooked avenue - into an existing ATE method to further improve its performance. SemRe-Rank incorporates word embeddings into a personalised PageRank process to compute ‘semantic importance’ scores for candidate terms from a graph of semantically related words (nodes), which are then used to revise the scores of candidate terms computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art base ATE methods on four datasets of diverse nature, it is shown to achieve widespread improvement over all base methods and across all datasets, with gains of up to 15 percentage points in Precision over the top-ranked K candidate terms (averaged over a set of K’s), and up to 28 percentage points in F1 measured at a K equal to the expected number of real terms among the candidates (F1 in short). Compared to an alternative approach built on the well-known TextRank algorithm, SemRe-Rank can potentially outperform it by up to 8 points in Precision at top K, or up to 17 points in F1. |
Tasks | Word Embeddings |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03373v3 |
PDF | http://arxiv.org/pdf/1711.03373v3.pdf |
PWC | https://paperswithcode.com/paper/semre-rank-improving-automatic-term |
Repo | https://github.com/ziqizhang/semrerank |
Framework | none |
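The personalised PageRank at the core of SemRe-Rank can be sketched as a short power iteration over a word-relatedness graph, teleporting only to a set of seed words. How the resulting 'semantic importance' scores revise a base ATE ranking is described in the paper and not reproduced here; the damping factor below is a conventional value, not necessarily the paper's.

```python
import numpy as np

def personalised_pagerank(A, seeds, alpha=0.85, tol=1e-10, max_iter=200):
    """Personalised PageRank by power iteration over a word-relatedness graph.
    A: (n, n) non-negative similarity matrix; seeds: mask of seed words."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.full_like(A, 1.0 / n), where=deg > 0)  # row-stochastic
    v = np.asarray(seeds, dtype=float)
    v /= v.sum()                              # teleport only to seed words
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = alpha * (r @ P) + (1 - alpha) * v
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

# toy usage: 4 candidate words, the first two are seeds
scores = personalised_pagerank(np.random.rand(4, 4), seeds=[1, 1, 0, 0])
```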
Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis
Title | Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis |
Authors | Stefanos Angelidis, Mirella Lapata |
Abstract | We consider the task of fine-grained sentiment analysis from the perspective of multiple instance learning (MIL). Our neural model is trained on document sentiment labels, and learns to predict the sentiment of text segments, i.e. sentences or elementary discourse units (EDUs), without segment-level supervision. We introduce an attention-based polarity scoring method for identifying positive and negative text snippets and a new dataset which we call SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating MIL-style sentiment models like ours. Experimental results demonstrate superior performance against multiple baselines, whereas a judgement elicitation study shows that EDU-level opinion extraction produces more informative summaries than sentence-based alternatives. |
Tasks | Multiple Instance Learning, Sentiment Analysis |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09645v2 |
PDF | http://arxiv.org/pdf/1711.09645v2.pdf |
PWC | https://paperswithcode.com/paper/multiple-instance-learning-networks-for-fine |
Repo | https://github.com/stangelid/milnet-sent |
Framework | none |
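A hedged sketch of the MIL idea: per-segment polarity distributions are pooled into a document-level prediction through attention weights, so only document labels are needed during training while segment-level scores fall out as a by-product. The segment encoder and the gated attention used by the paper's model are omitted, and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MilAttentionSentiment(nn.Module):
    """Toy MIL sentiment head: per-segment class distributions combined with
    attention weights into a document-level distribution."""
    def __init__(self, seg_dim, num_classes):
        super().__init__()
        self.polarity = nn.Linear(seg_dim, num_classes)   # per-segment sentiment scores
        self.attention = nn.Linear(seg_dim, 1)            # per-segment importance

    def forward(self, segments):                          # (batch, n_segments, seg_dim)
        seg_dist = torch.softmax(self.polarity(segments), dim=-1)
        weights = torch.softmax(self.attention(segments), dim=1)
        doc_dist = (weights * seg_dist).sum(dim=1)        # (batch, num_classes)
        return doc_dist, seg_dist, weights.squeeze(-1)

model = MilAttentionSentiment(seg_dim=300, num_classes=2)
doc, segs, att = model(torch.randn(4, 10, 300))
```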
Visual Translation Embedding Network for Visual Relation Detection
Title | Visual Translation Embedding Network for Visual Relation Detection |
Authors | Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua |
Abstract | Visual relations, such as “person ride bike” and “bike next to car”, offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate $\approx$ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion and supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu’s multi-modal model with language priors. |
Tasks | Object Detection, Representation Learning, Scene Understanding, Transfer Learning |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08319v1 |
PDF | http://arxiv.org/pdf/1702.08319v1.pdf |
PWC | https://paperswithcode.com/paper/visual-translation-embedding-network-for |
Repo | https://github.com/zawlin/cvpr17_vtranse |
Framework | caffe2 |
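The translation idea (subject + predicate ≈ object) lends itself to a very small classification head: project subject and object features into a relation space and predict the predicate from their difference. The sketch below is a toy head in that spirit, not VTransE's full pipeline (its feature-extraction layer and knowledge-transfer machinery are omitted, and all dimensions are assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TranslationEmbeddingHead(nn.Module):
    """Toy relation-classification head in the spirit of subject + predicate ≈ object."""
    def __init__(self, feat_dim, num_predicates, rel_dim=128):
        super().__init__()
        self.subj_proj = nn.Linear(feat_dim, rel_dim)
        self.obj_proj = nn.Linear(feat_dim, rel_dim)
        self.predicate = nn.Linear(rel_dim, num_predicates, bias=False)

    def forward(self, subj_feat, obj_feat):
        translation = self.obj_proj(obj_feat) - self.subj_proj(subj_feat)  # ≈ predicate vector
        return self.predicate(translation)                                 # predicate logits

head = TranslationEmbeddingHead(feat_dim=1024, num_predicates=70)
logits = head(torch.randn(8, 1024), torch.randn(8, 1024))
loss = F.cross_entropy(logits, torch.randint(0, 70, (8,)))
```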
Video Highlights Detection and Summarization with Lag-Calibration based on Concept-Emotion Mapping of Crowd-sourced Time-Sync Comments
Title | Video Highlights Detection and Summarization with Lag-Calibration based on Concept-Emotion Mapping of Crowd-sourced Time-Sync Comments |
Authors | Qing Ping, Chaomei Chen |
Abstract | With the prevalence of video sharing, there are increasing demands for automatic video digestion such as highlight detection. Recently, platforms with crowdsourced time-sync video comments have emerged worldwide, providing a good opportunity for highlight detection. However, this task is non-trivial: (1) time-sync comments often lag behind their corresponding shot; (2) time-sync comments are semantically sparse and noisy; (3) determining which shots are highlights is highly subjective. The present paper aims to tackle these challenges by proposing a framework that (1) uses concept-mapped lexical chains for lag calibration; (2) models video highlights based on comment intensity and a combination of the emotion and concept concentration of each shot; (3) summarizes each detected highlight using an improved SumBasic with emotion and concept mapping. Experiments on large real-world datasets show that our highlight detection method and summarization method both outperform other benchmarks by considerable margins. |
Tasks | Calibration |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02210v1 |
PDF | http://arxiv.org/pdf/1708.02210v1.pdf |
PWC | https://paperswithcode.com/paper/video-highlights-detection-and-summarization-1 |
Repo | https://github.com/ChanningPing/VideoHighlightDetection |
Framework | none |
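For reference, plain SumBasic (the base summariser the paper improves with emotion and concept mapping) scores sentences by their average word probability, picks the best, then squares the probabilities of the words it used so later picks favour new content. A short sketch without the paper's improvements:

```python
from collections import Counter

def sumbasic(sentences, max_sentences=3):
    """Plain SumBasic summariser; the paper's improved variant with emotion and
    concept mapping is not reproduced here."""
    tokenised = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenised for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}
    chosen = []
    while len(chosen) < min(max_sentences, len(sentences)):
        # score each unchosen sentence by its average word probability
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: sum(prob[w] for w in tokenised[i]) / max(len(tokenised[i]), 1),
        )
        chosen.append(best)
        for w in tokenised[best]:        # discount words already covered
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]

print(sumbasic(["great goal by the striker", "the striker scores again", "crowd goes wild"], 2))
```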
Unsupervised Holistic Image Generation from Key Local Patches
Title | Unsupervised Holistic Image Generation from Key Local Patches |
Authors | Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh |
Abstract | We introduce a new problem of generating an image based on a small number of key local patches without any geometric prior. In this work, key local patches are defined as informative regions of the target object or scene. This is a challenging problem since it requires generating realistic images and predicting locations of parts at the same time. We construct adversarial networks to tackle this problem. A generator network generates a fake image as well as a mask based on the encoder-decoder framework. On the other hand, a discriminator network aims to detect fake images. The network is trained with three losses to consider spatial, appearance, and adversarial information. The spatial loss determines whether the locations of predicted parts are correct. The appearance loss ensures that input patches are restored in the output image without much modification. The adversarial loss ensures output images are realistic. The proposed network is trained without supervisory signals since no labels of key parts are required. Experimental results on six datasets demonstrate that the proposed algorithm performs favorably on challenging objects and scenes. |
Tasks | Image Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10730v2 |
PDF | http://arxiv.org/pdf/1703.10730v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-holistic-image-generation-from |
Repo | https://github.com/hellbell/KeyPatchGan |
Framework | pytorch |
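The three losses can be combined into a generator objective roughly as below. This is a hedged sketch only: the loss weights, the binary-cross-entropy adversarial term, and the exact form of the spatial and appearance terms are assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def generator_objective(fake_img, input_patches, patch_mask,
                        pred_mask, true_mask, d_fake_logits,
                        w_spatial=1.0, w_app=10.0, w_adv=1.0):
    """Toy three-term generator objective: spatial (mask prediction), appearance
    (keep input patches intact in the output), adversarial (fool the discriminator)."""
    spatial = F.binary_cross_entropy_with_logits(pred_mask, true_mask)
    appearance = F.l1_loss(fake_img * patch_mask, input_patches * patch_mask)
    adversarial = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return w_spatial * spatial + w_app * appearance + w_adv * adversarial

loss = generator_objective(
    fake_img=torch.rand(2, 3, 64, 64), input_patches=torch.rand(2, 3, 64, 64),
    patch_mask=torch.randint(0, 2, (2, 1, 64, 64)).float(),
    pred_mask=torch.randn(2, 1, 64, 64),
    true_mask=torch.randint(0, 2, (2, 1, 64, 64)).float(),
    d_fake_logits=torch.randn(2, 1))
```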
BEGAN: Boundary Equilibrium Generative Adversarial Networks
Title | BEGAN: Boundary Equilibrium Generative Adversarial Networks |
Authors | David Berthelot, Thomas Schumm, Luke Metz |
Abstract | We propose a new equilibrium enforcing method paired with a loss derived from the Wasserstein distance for training auto-encoder based Generative Adversarial Networks. This method balances the generator and discriminator during training. Additionally, it provides a new approximate convergence measure, fast and stable training and high visual quality. We also derive a way of controlling the trade-off between image diversity and visual quality. We focus on the image generation task, setting a new milestone in visual quality, even at higher resolutions. This is achieved while using a relatively simple model architecture and a standard training procedure. |
Tasks | Image Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10717v4 |
PDF | http://arxiv.org/pdf/1703.10717v4.pdf |
PWC | https://paperswithcode.com/paper/began-boundary-equilibrium-generative |
Repo | https://github.com/eliceio/vocal-style-transfer |
Framework | tf |
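The equilibrium mechanism is compact enough to write down directly: the discriminator loss down-weights the fake term by a variable k that is nudged with a proportional controller toward the balance gamma·L(x) = L(G(z)), and the same quantities yield a convergence measure. A sketch of that bookkeeping, where L_real/L_fake stand for the discriminator's autoencoder reconstruction losses and the gamma and lambda_k values are typical choices rather than prescribed ones:

```python
# Sketch of BEGAN's equilibrium bookkeeping (not a full training loop).
gamma, lambda_k = 0.5, 0.001

def began_step(L_real, L_fake, k):
    """Return the two losses, the updated equilibrium variable k (clamped to [0, 1])
    and the global convergence measure M."""
    d_loss = L_real - k * L_fake
    g_loss = L_fake
    balance = gamma * L_real - L_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    m_global = L_real + abs(balance)
    return d_loss, g_loss, k_next, m_global

d_loss, g_loss, k, M = began_step(L_real=0.31, L_fake=0.12, k=0.0)
```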
Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking
Title | Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking |
Authors | Xin Li, Qiao Liu, Nana Fan, Zhenyu He, Hongzhi Wang |
Abstract | Most thermal infrared (TIR) tracking methods are discriminative, treating the tracking problem as a classification task. However, the objective of the classifier (label prediction) is not coupled to the objective of the tracker (location estimation). The classification task focuses on the between-class difference of arbitrary objects, while the tracking task mainly deals with the within-class difference of the same object. In this paper, we cast the TIR tracking problem as a similarity verification task, which is coupled well to the objective of the tracking task. We propose a TIR tracker via a Hierarchical Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To obtain both spatial and semantic features of the TIR object, we design a Siamese CNN that coalesces multiple hierarchical convolutional layers. Then, we propose a spatial-aware network to enhance the discriminative ability of the coalesced hierarchical feature. Subsequently, we train this network end-to-end on a large visible-spectrum video detection dataset to learn the similarity between paired objects before transferring the network into the TIR domain. Next, this pre-trained Siamese network is used to evaluate the similarity between the target template and target candidates. Finally, we locate the candidate that is most similar to the tracked target. Extensive experimental results on the benchmarks VOT-TIR 2015 and VOT-TIR 2016 show that our proposed method achieves favourable performance compared to the state-of-the-art methods. |
Tasks | Object Tracking, Thermal Infrared Object Tracking |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09539v2 |
PDF | http://arxiv.org/pdf/1711.09539v2.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-spatial-aware-siamese-network |
Repo | https://github.com/QiaoLiuHit/HSSNet |
Framework | none |
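Siamese trackers of this kind score target candidates by cross-correlating template and search-region features from a shared backbone. The sketch below shows only that similarity step; HSSNet's hierarchical feature coalescing and spatial-aware module are omitted, and the feature shapes are made up for illustration.

```python
import torch
import torch.nn.functional as F

def siamese_response(template_feat, search_feat):
    """Similarity map between a target template and a search region, obtained by
    cross-correlating their feature maps (as in Siamese trackers generally)."""
    # template_feat: (C, h, w), search_feat: (C, H, W) with H >= h, W >= w
    return F.conv2d(search_feat.unsqueeze(0), template_feat.unsqueeze(0)).squeeze(0)

resp = siamese_response(torch.randn(256, 6, 6), torch.randn(256, 22, 22))
print(resp.shape)   # (1, 17, 17): the peak indicates the predicted target position
```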
Hidden Talents of the Variational Autoencoder
Title | Hidden Talents of the Variational Autoencoder |
Authors | Bin Dai, Yu Wang, John Aston, Gang Hua, David Wipf |
Abstract | Variational autoencoders (VAE) represent a popular, flexible form of deep generative model that can be stochastically fit to samples from a given random process using an information-theoretic variational bound on the true underlying distribution. Once so-obtained, the model can be putatively used to generate new samples from this distribution, or to provide a low-dimensional latent representation of existing samples. While quite effective in numerous application domains, certain important mechanisms which govern the behavior of the VAE are obfuscated by the intractable integrals and resulting stochastic approximations involved. Moreover, as a highly non-convex model, it remains unclear exactly how minima of the underlying energy relate to original design purposes. We attempt to better quantify these issues by analyzing a series of tractable special cases of increasing complexity. In doing so, we unveil interesting connections with more traditional dimensionality reduction models, as well as an intrinsic yet underappreciated propensity for robustly dismissing sparse outliers when estimating latent manifolds. With respect to the latter, we demonstrate that the VAE can be viewed as the natural evolution of recent robust PCA models, capable of learning nonlinear manifolds of unknown dimension obscured by gross corruptions. |
Tasks | Dimensionality Reduction |
Published | 2017-06-16 |
URL | https://arxiv.org/abs/1706.05148v5 |
PDF | https://arxiv.org/pdf/1706.05148v5.pdf |
PWC | https://paperswithcode.com/paper/hidden-talents-of-the-variational-autoencoder |
Repo | https://github.com/Kismuz/crypto_spread_test |
Framework | tf |
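For reference, the objective whose tractable special cases the paper analyses is the standard VAE negative ELBO with a Gaussian encoder: a reconstruction term plus a KL term. A minimal sketch of that generic objective (not the paper's own code, and the mean-squared reconstruction term assumes a Gaussian decoder):

```python
import torch
import torch.nn.functional as F

def gaussian_vae_loss(x, x_recon, mu, logvar):
    """Standard VAE negative ELBO: reconstruction error plus KL divergence between
    the Gaussian posterior q(z|x) = N(mu, diag(exp(logvar))) and the N(0, I) prior."""
    recon = F.mse_loss(x_recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

loss = gaussian_vae_loss(torch.rand(8, 784), torch.rand(8, 784),
                         mu=torch.zeros(8, 20), logvar=torch.zeros(8, 20))
```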