Paper Group AWR 61
Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping. Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize. Training a Subsampling Mechanism in Expectation. Deep Active Learning for Named Entity Recognition. Appearance-and-Relation Networks …
Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping
Title | Online Spatial Concept and Lexical Acquisition with Simultaneous Localization and Mapping |
Authors | Akira Taniguchi, Yoshinobu Hagiwara, Tadahiro Taniguchi, Tetsunari Inamura |
Abstract | In this paper, we propose an online learning algorithm based on a Rao-Blackwellized particle filter for spatial concept acquisition and mapping. We previously proposed a nonparametric Bayesian spatial concept acquisition model (SpCoA). Here, we propose a novel method (SpCoSLAM) that integrates SpCoA and FastSLAM within the theoretical framework of a Bayesian generative model. The proposed method can simultaneously learn place categories and lexicons while incrementally generating an environmental map. Furthermore, the proposed method adds scene image features and a language model to SpCoA. In the experiments, we tested online learning of spatial concepts and environmental maps in a novel environment for which the robot did not have a map, and then evaluated the results of online learning of spatial concepts and lexical acquisition. The experimental results demonstrate that, by using the proposed method, the robot was able to incrementally and more accurately learn the relationships between words and places in the environmental map. |
Tasks | Language Modelling, Simultaneous Localization and Mapping |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04664v2 |
PDF | http://arxiv.org/pdf/1704.04664v2.pdf |
PWC | https://paperswithcode.com/paper/online-spatial-concept-and-lexical |
Repo | https://github.com/EmergentSystemLabStudent/SpCoSLAM_Lets |
Framework | none |
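SpCoSLAM's online learning runs on the same Rao-Blackwellized particle filter backbone as FastSLAM. As a minimal illustration of that backbone only, the sketch below shows a generic importance-resampling step in Python; the map, pose, spatial-concept and lexicon statistics each particle would carry in SpCoSLAM are left abstract, and the function name is ours, not the authors'.

```python
import numpy as np

def importance_resample(particles, weights, rng=None):
    """Generic importance-resampling step of a particle filter. In SpCoSLAM each
    particle would additionally carry a map, a trajectory and spatial-concept
    statistics; here particles are opaque objects and only resampling is shown."""
    rng = np.random.default_rng(rng)
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # normalise importance weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    resampled = [particles[i] for i in idx]
    return resampled, np.full(len(particles), 1.0 / len(particles))

# toy usage: three "particles" with unnormalised likelihood weights
parts, new_w = importance_resample(["p0", "p1", "p2"], [0.1, 0.7, 0.2], rng=0)
```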
Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize
Title | Checkerboard artifact free sub-pixel convolution: A note on sub-pixel convolution, resize convolution and convolution resize |
Authors | Andrew Aitken, Christian Ledig, Lucas Theis, Jose Caballero, Zehan Wang, Wenzhe Shi |
Abstract | The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels. To combat this problem, smoothness constraints, post-processing and different architecture designs have been proposed. Odena et al. highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. In this note, we propose an initialization method for sub-pixel convolution known as convolution NN resize. Compared to sub-pixel convolution initialized with schemes designed for standard convolution kernels, it is free from checkerboard artifacts immediately after initialization. Compared to resize convolution, at the same computational complexity, it has more modelling power and converges to solutions with smaller test errors. |
Tasks | |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02937v1 |
PDF | http://arxiv.org/pdf/1707.02937v1.pdf |
PWC | https://paperswithcode.com/paper/checkerboard-artifact-free-sub-pixel |
Repo | https://github.com/imatge-upc/3D-GAN-superresolution |
Framework | tf |
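The proposed initialization (convolution NN resize, widely known as ICNR in later re-implementations) amounts to initializing a kernel for the low-resolution number of output channels and repeating each filter r² times, so that immediately after initialization the sub-pixel (pixel-shuffle) layer is equivalent to nearest-neighbour resize followed by convolution. A minimal PyTorch sketch, assuming r = 2 and Kaiming initialization for the base kernel (the choice of base initializer is an assumption):

```python
import torch
import torch.nn as nn

def icnr_(weight, upscale_factor=2, init=nn.init.kaiming_normal_):
    """Fill a (out_ch, in_ch, k, k) sub-pixel convolution weight so that the
    following PixelShuffle behaves like NN-resize + convolution at initialization."""
    out_ch, in_ch, kh, kw = weight.shape
    sub = torch.zeros(out_ch // upscale_factor ** 2, in_ch, kh, kw)
    init(sub)
    # repeat each base filter r^2 times so all sub-pixel positions start identical
    kernel = sub.repeat_interleave(upscale_factor ** 2, dim=0)
    with torch.no_grad():
        weight.copy_(kernel)
    return weight

conv = nn.Conv2d(64, 3 * 2 ** 2, kernel_size=3, padding=1)   # 3 output channels, r = 2
icnr_(conv.weight, upscale_factor=2)
upsample = nn.Sequential(conv, nn.PixelShuffle(2))           # checkerboard-free at init
```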
Training a Subsampling Mechanism in Expectation
Title | Training a Subsampling Mechanism in Expectation |
Authors | Colin Raffel, Dieterich Lawson |
Abstract | We describe a mechanism for subsampling sequences and show how to compute its expected output so that it can be trained with standard backpropagation. We test this approach on a simple toy problem and discuss its shortcomings. |
Tasks | |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06914v3 |
PDF | http://arxiv.org/pdf/1702.06914v3.pdf |
PWC | https://paperswithcode.com/paper/training-a-subsampling-mechanism-in |
Repo | https://github.com/craffel/subsampling_in_expectation |
Framework | none |
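To illustrate what "computing the expected output" of a subsampler can look like, the toy dynamic program below assumes each input element is emitted independently with a given probability and returns the expected value of every output position (weighted by the probability that the position exists at all). The paper's parameterisation and training setup differ, so treat this purely as a sketch.

```python
import numpy as np

def expected_subsample(x, emit_prob, out_len):
    """Expected output of a stochastic subsampler that keeps x[t] with probability
    emit_prob[t], computed with a simple dynamic program (toy illustration only)."""
    T = len(x)
    # a[t, j] = P(exactly j emissions among the first t inputs)
    a = np.zeros((T + 1, out_len + 1))
    a[0, 0] = 1.0
    for t in range(1, T + 1):
        e = emit_prob[t - 1]
        a[t, 0] = (1 - e) * a[t - 1, 0]
        for j in range(1, out_len + 1):
            a[t, j] = e * a[t - 1, j - 1] + (1 - e) * a[t - 1, j]
    y = np.zeros(out_len)
    for j in range(1, out_len + 1):          # j-th output element
        for t in range(1, T + 1):            # P(x[t-1] is the j-th emission) * x[t-1]
            y[j - 1] += emit_prob[t - 1] * a[t - 1, j - 1] * x[t - 1]
    return y

print(expected_subsample(np.arange(6.0), emit_prob=np.full(6, 0.5), out_len=3))
```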
Deep Active Learning for Named Entity Recognition
Title | Deep Active Learning for Named Entity Recognition |
Authors | Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, Animashree Anandkumar |
Abstract | Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model, consisting of convolutional character and word encoders and a long short-term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than the best-performing models. We carry out incremental active learning during the training process and are able to nearly match state-of-the-art performance with just 25% of the original training data. |
Tasks | Active Learning, Named Entity Recognition |
Published | 2017-07-19 |
URL | http://arxiv.org/abs/1707.05928v3 |
PDF | http://arxiv.org/pdf/1707.05928v3.pdf |
PWC | https://paperswithcode.com/paper/deep-active-learning-for-named-entity |
Repo | https://github.com/tonygsw/Joint-Extraction-of-Entities-and-Relations-Based-on-a-Novel-Tagging-Scheme |
Framework | pytorch |
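A compressed PyTorch sketch of a CNN-CNN-LSTM-style tagger follows. Hidden sizes, max-pooling over characters, and the decoder (a plain LSTM over encoder states here, rather than the paper's tag-conditioned greedy decoder) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class CnnCnnLstmTagger(nn.Module):
    """Sketch of a convolutional character encoder + convolutional word encoder
    + LSTM tag decoder. Dimensions and pooling choices are illustrative."""
    def __init__(self, char_vocab, word_vocab, num_tags,
                 char_dim=25, word_dim=100, hidden=200):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_dim, kernel_size=3, padding=1)
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.word_cnn = nn.Conv1d(word_dim + char_dim, hidden, kernel_size=3, padding=1)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_tags)

    def forward(self, chars, words):
        # chars: (batch, seq_len, max_word_len), words: (batch, seq_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1).transpose(1, 2)
        x = torch.relu(self.word_cnn(x)).transpose(1, 2)
        h, _ = self.decoder(x)
        return self.out(h)                       # per-token tag logits

tagger = CnnCnnLstmTagger(char_vocab=100, word_vocab=5000, num_tags=9)
logits = tagger(torch.randint(0, 100, (2, 7, 12)), torch.randint(0, 5000, (2, 7)))
```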
Appearance-and-Relation Networks for Video Classification
Title | Appearance-and-Relation Networks for Video Classification |
Authors | Limin Wang, Wei Li, Wen Li, Luc Van Gool |
Abstract | Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed the Appearance-and-Relation Network (ARTNet), to learn video representation in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented based on the linear combination of pixels or filter responses in each frame, while the relation branch is designed based on the multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART blocks obtain an evident improvement over 3D convolutions for spatiotemporal feature learning. Under the same training setting, ARTNets achieve performance superior to the existing state-of-the-art methods on these three datasets. |
Tasks | Temporal Action Localization, Video Classification |
Published | 2017-11-24 |
URL | http://arxiv.org/abs/1711.09125v2 |
PDF | http://arxiv.org/pdf/1711.09125v2.pdf |
PWC | https://paperswithcode.com/paper/appearance-and-relation-networks-for-video |
Repo | https://github.com/wanglimin/ARTNet |
Framework | none |
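The SMART idea of pairing an appearance branch with a relation branch can be sketched roughly as below: a spatial-only 3D convolution for appearance, and a spatiotemporal convolution whose responses are squared (a simple stand-in for multiplicative cross-frame interactions) and reduced across channels for relations. Fusion by concatenation and all layer sizes here are assumptions; this is not ARTNet's exact block design.

```python
import torch
import torch.nn as nn

class SmartStyleBlock(nn.Module):
    """Loose sketch of a SMART-style block: appearance branch (spatial-only conv)
    plus relation branch (spatiotemporal conv, squared, cross-channel reduction)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.appearance = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.relation = nn.Conv3d(in_ch, 2 * out_ch, kernel_size=3, padding=1)
        self.reduce = nn.Conv3d(2 * out_ch, out_ch, kernel_size=1)   # cross-channel pooling
        self.bn_app = nn.BatchNorm3d(out_ch)
        self.bn_rel = nn.BatchNorm3d(out_ch)

    def forward(self, x):                          # x: (batch, C, T, H, W)
        app = self.bn_app(self.appearance(x))
        rel = self.bn_rel(self.reduce(self.relation(x) ** 2))
        return torch.relu(torch.cat([app, rel], dim=1))

block = SmartStyleBlock(in_ch=3, out_ch=32)
out = block(torch.randn(2, 3, 8, 56, 56))          # -> (2, 64, 8, 56, 56)
```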
Supervised Community Detection with Line Graph Neural Networks
Title | Supervised Community Detection with Line Graph Neural Networks |
Authors | Zhengdao Chen, Xiang Li, Joan Bruna |
Abstract | We study data-driven methods for community detection on graphs, an inverse problem that is typically solved in terms of the spectrum of certain operators or via posterior inference under certain probabilistic graphical models. Focusing on random graph families such as the stochastic block model, recent research has unified both approaches and identified both statistical and computational signal-to-noise detection thresholds. This graph inference task can be recast as a node-wise graph classification problem, and, as such, computational detection thresholds can be translated in terms of learning within appropriate models. We present a novel family of Graph Neural Networks (GNNs) and show that they can reach those detection thresholds in a purely data-driven manner without access to the underlying generative models, and even improve upon current computational thresholds in hard regimes. For that purpose, we propose to augment GNNs with the non-backtracking operator, defined on the line graph of edge adjacencies. We also perform the first analysis of the optimization landscape of using GNNs to solve community detection problems, demonstrating that under certain simplifications and assumptions, the loss value at local minima is close to the loss value at the global minimum. Finally, the resulting model is also tested on real datasets, performing significantly better than previous models. |
Tasks | Community Detection, Graph Classification |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08415v5 |
PDF | http://arxiv.org/pdf/1705.08415v5.pdf |
PWC | https://paperswithcode.com/paper/supervised-community-detection-with-line |
Repo | https://github.com/afansi/multiscalegnn |
Framework | pytorch |
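The non-backtracking operator used to augment the GNNs has a short, standard definition on directed edges: B[(i→j), (k→l)] = 1 iff j = k and l ≠ i. A NumPy sketch of just that operator (the surrounding line-graph GNN architecture is not reproduced here):

```python
import numpy as np

def non_backtracking_operator(edges):
    """Non-backtracking operator B on the directed edges of an undirected graph:
    B[(i->j), (k->l)] = 1 iff j == k and l != i (no immediate reversal)."""
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    index = {e: n for n, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for i, j in directed:
        for k, l in directed:
            if j == k and l != i:
                B[index[(i, j)], index[(k, l)]] = 1.0
    return B, directed

# toy usage: a triangle has 6 directed edges, so B is 6 x 6
B, edge_order = non_backtracking_operator([(0, 1), (1, 2), (2, 0)])
```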
A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex
Title | A Restaurant Process Mixture Model for Connectivity Based Parcellation of the Cortex |
Authors | Daniel Moyer, Boris A Gutman, Neda Jahanshad, Paul M. Thompson |
Abstract | One of the primary objectives of human brain mapping is the division of the cortical surface into functionally distinct regions, i.e. parcellation. While it is generally agreed that at macro-scale different regions of the cortex have different functions, the exact number and configuration of these regions is not known. Methods for the discovery of these regions are thus important, particularly as the volume of available information grows. Towards this end, we present a parcellation method based on a Bayesian non-parametric mixture model of cortical connectivity. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00981v1 |
PDF | http://arxiv.org/pdf/1703.00981v1.pdf |
PWC | https://paperswithcode.com/paper/a-restaurant-process-mixture-model-for |
Repo | https://github.com/kristianeschenburg/ddCRP |
Framework | none |
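The nonparametric prior behind such a mixture is a (distance-dependent) Chinese restaurant process. The sketch below draws cluster assignments from a plain CRP prior only; the connectivity likelihood and the distance-decay used for cortical parcellation are omitted, and the concentration value is an arbitrary choice.

```python
import numpy as np

def crp_assignments(n, alpha=1.0, rng=None):
    """Draw cluster assignments from a Chinese restaurant process prior:
    a new item joins cluster k with probability proportional to its size,
    or opens a new cluster with probability proportional to alpha."""
    rng = np.random.default_rng(rng)
    assignments, counts = [], []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_assignments(10, alpha=1.0, rng=0))
```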
SemRe-Rank: Improving Automatic Term Extraction By Incorporating Semantic Relatedness With Personalised PageRank
Title | SemRe-Rank: Improving Automatic Term Extraction By Incorporating Semantic Relatedness With Personalised PageRank |
Authors | Ziqi Zhang, Jie Gao, Fabio Ciravegna |
Abstract | Automatic Term Extraction (ATE) deals with the extraction of terminology from a domain-specific corpus, and has long been an established research area in data and knowledge acquisition. ATE remains a challenging task, as no existing ATE method consistently outperforms the others across domains. This work adopts a refreshed perspective on the problem: instead of searching for a ‘one-size-fits-all’ solution that may never exist, we propose to develop generic methods that ‘enhance’ existing ATE methods. We introduce SemRe-Rank, the first method based on this principle, which incorporates semantic relatedness - an often overlooked avenue - into an existing ATE method to further improve its performance. SemRe-Rank incorporates word embeddings into a personalised PageRank process to compute ‘semantic importance’ scores for candidate terms from a graph of semantically related words (nodes), which are then used to revise the scores of candidate terms computed by a base ATE algorithm. Extensively evaluated with 13 state-of-the-art base ATE methods on four datasets of diverse nature, it is shown to achieve widespread improvement over all base methods and across all datasets, with gains of up to 15 percentage points in Precision over the top-ranked K candidate terms (averaged over a set of K’s), and up to 28 percentage points in F1 measured at a K equal to the expected number of real terms among the candidates (F1 in short). Compared to an alternative approach built on the well-known TextRank algorithm, SemRe-Rank can potentially outperform it by up to 8 points in Precision at top K, or up to 17 points in F1. |
Tasks | Word Embeddings |
Published | 2017-11-09 |
URL | http://arxiv.org/abs/1711.03373v3 |
PDF | http://arxiv.org/pdf/1711.03373v3.pdf |
PWC | https://paperswithcode.com/paper/semre-rank-improving-automatic-term |
Repo | https://github.com/ziqizhang/semrerank |
Framework | none |
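The personalised PageRank at the core of SemRe-Rank can be sketched as a short power iteration over a word-relatedness graph, teleporting only to a set of seed words. How the resulting 'semantic importance' scores revise a base ATE ranking is described in the paper and not reproduced here; the damping factor below is a conventional value, not necessarily the paper's.

```python
import numpy as np

def personalised_pagerank(A, seeds, alpha=0.85, tol=1e-10, max_iter=200):
    """Personalised PageRank by power iteration over a word-relatedness graph.
    A: (n, n) non-negative similarity matrix; seeds: mask of seed words."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.full_like(A, 1.0 / n), where=deg > 0)  # row-stochastic
    v = np.asarray(seeds, dtype=float)
    v /= v.sum()                              # teleport only to seed words
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = alpha * (r @ P) + (1 - alpha) * v
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r

# toy usage: 4 candidate words, the first two are seeds
scores = personalised_pagerank(np.random.rand(4, 4), seeds=[1, 1, 0, 0])
```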
Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis
Title | Multiple Instance Learning Networks for Fine-Grained Sentiment Analysis |
Authors | Stefanos Angelidis, Mirella Lapata |
Abstract | We consider the task of fine-grained sentiment analysis from the perspective of multiple instance learning (MIL). Our neural model is trained on document sentiment labels, and learns to predict the sentiment of text segments, i.e. sentences or elementary discourse units (EDUs), without segment-level supervision. We introduce an attention-based polarity scoring method for identifying positive and negative text snippets and a new dataset which we call SPOT (as shorthand for Segment-level POlariTy annotations) for evaluating MIL-style sentiment models like ours. Experimental results demonstrate superior performance against multiple baselines, whereas a judgement elicitation study shows that EDU-level opinion extraction produces more informative summaries than sentence-based alternatives. |
Tasks | Multiple Instance Learning, Sentiment Analysis |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09645v2 |
PDF | http://arxiv.org/pdf/1711.09645v2.pdf |
PWC | https://paperswithcode.com/paper/multiple-instance-learning-networks-for-fine |
Repo | https://github.com/stangelid/milnet-sent |
Framework | none |
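A hedged sketch of the MIL idea: per-segment polarity distributions are pooled into a document-level prediction through attention weights, so only document labels are needed during training while segment-level scores fall out as a by-product. The segment encoder and the gated attention used by the paper's model are omitted, and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class MilAttentionSentiment(nn.Module):
    """Toy MIL sentiment head: per-segment class distributions combined with
    attention weights into a document-level distribution."""
    def __init__(self, seg_dim, num_classes):
        super().__init__()
        self.polarity = nn.Linear(seg_dim, num_classes)   # per-segment sentiment scores
        self.attention = nn.Linear(seg_dim, 1)            # per-segment importance

    def forward(self, segments):                          # (batch, n_segments, seg_dim)
        seg_dist = torch.softmax(self.polarity(segments), dim=-1)
        weights = torch.softmax(self.attention(segments), dim=1)
        doc_dist = (weights * seg_dist).sum(dim=1)        # (batch, num_classes)
        return doc_dist, seg_dist, weights.squeeze(-1)

model = MilAttentionSentiment(seg_dim=300, num_classes=2)
doc, segs, att = model(torch.randn(4, 10, 300))
```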
Visual Translation Embedding Network for Visual Relation Detection
Title | Visual Translation Embedding Network for Visual Relation Detection |
Authors | Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua |
Abstract | Visual relations, such as “person ride bike” and “bike next to car”, offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate $\approx$ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion and supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive with Lu’s multi-modal model with language priors. |
Tasks | Object Detection, Representation Learning, Scene Understanding, Transfer Learning |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08319v1 |
PDF | http://arxiv.org/pdf/1702.08319v1.pdf |
PWC | https://paperswithcode.com/paper/visual-translation-embedding-network-for |
Repo | https://github.com/zawlin/cvpr17_vtranse |
Framework | caffe2 |
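The translation idea (subject + predicate ≈ object) lends itself to a very small classification head: project subject and object features into a relation space and predict the predicate from their difference. The sketch below is a toy head in that spirit, not VTransE's full pipeline (its feature-extraction layer and knowledge-transfer machinery are omitted, and all dimensions are assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TranslationEmbeddingHead(nn.Module):
    """Toy relation-classification head in the spirit of subject + predicate ≈ object."""
    def __init__(self, feat_dim, num_predicates, rel_dim=128):
        super().__init__()
        self.subj_proj = nn.Linear(feat_dim, rel_dim)
        self.obj_proj = nn.Linear(feat_dim, rel_dim)
        self.predicate = nn.Linear(rel_dim, num_predicates, bias=False)

    def forward(self, subj_feat, obj_feat):
        translation = self.obj_proj(obj_feat) - self.subj_proj(subj_feat)  # ≈ predicate vector
        return self.predicate(translation)                                 # predicate logits

head = TranslationEmbeddingHead(feat_dim=1024, num_predicates=70)
logits = head(torch.randn(8, 1024), torch.randn(8, 1024))
loss = F.cross_entropy(logits, torch.randint(0, 70, (8,)))
```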
Video Highlights Detection and Summarization with Lag-Calibration based on Concept-Emotion Mapping of Crowd-sourced Time-Sync Comments
Title | Video Highlights Detection and Summarization with Lag-Calibration based on Concept-Emotion Mapping of Crowd-sourced Time-Sync Comments |
Authors | Qing Ping, Chaomei Chen |
Abstract | With the prevalence of video sharing, there are increasing demands for automatic video digestion such as highlight detection. Recently, platforms with crowdsourced time-sync video comments have emerged worldwide, providing a good opportunity for highlight detection. However, this task is non-trivial: (1) time-sync comments often lag behind their corresponding shot; (2) time-sync comments are semantically sparse and noisy; (3) determining which shots are highlights is highly subjective. The present paper aims to tackle these challenges by proposing a framework that (1) uses concept-mapped lexical chains for lag calibration; (2) models video highlights based on comment intensity and a combination of the emotion and concept concentration of each shot; (3) summarizes each detected highlight using an improved SumBasic with emotion and concept mapping. Experiments on large real-world datasets show that our highlight detection method and summarization method both outperform other benchmarks by considerable margins. |
Tasks | Calibration |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02210v1 |
PDF | http://arxiv.org/pdf/1708.02210v1.pdf |
PWC | https://paperswithcode.com/paper/video-highlights-detection-and-summarization-1 |
Repo | https://github.com/ChanningPing/VideoHighlightDetection |
Framework | none |
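For reference, plain SumBasic (the base summariser the paper improves with emotion and concept mapping) scores sentences by their average word probability, picks the best, then squares the probabilities of the words it used so later picks favour new content. A short sketch without the paper's improvements:

```python
from collections import Counter

def sumbasic(sentences, max_sentences=3):
    """Plain SumBasic summariser; the paper's improved variant with emotion and
    concept mapping is not reproduced here."""
    tokenised = [s.lower().split() for s in sentences]
    counts = Counter(w for toks in tokenised for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}
    chosen = []
    while len(chosen) < min(max_sentences, len(sentences)):
        # score each unchosen sentence by its average word probability
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: sum(prob[w] for w in tokenised[i]) / max(len(tokenised[i]), 1),
        )
        chosen.append(best)
        for w in tokenised[best]:        # discount words already covered
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]

print(sumbasic(["great goal by the striker", "the striker scores again", "crowd goes wild"], 2))
```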
Unsupervised Holistic Image Generation from Key Local Patches
Title | Unsupervised Holistic Image Generation from Key Local Patches |
Authors | Donghoon Lee, Sangdoo Yun, Sungjoon Choi, Hwiyeon Yoo, Ming-Hsuan Yang, Songhwai Oh |
Abstract | We introduce a new problem of generating an image based on a small number of key local patches without any geometric prior. In this work, key local patches are defined as informative regions of the target object or scene. This is a challenging problem since it requires generating realistic images and predicting locations of parts at the same time. We construct adversarial networks to tackle this problem. A generator network generates a fake image as well as a mask based on the encoder-decoder framework. On the other hand, a discriminator network aims to detect fake images. The network is trained with three losses to consider spatial, appearance, and adversarial information. The spatial loss determines whether the locations of predicted parts are correct. The appearance loss ensures that input patches are restored in the output image without much modification. The adversarial loss ensures output images are realistic. The proposed network is trained without supervisory signals since no labels of key parts are required. Experimental results on six datasets demonstrate that the proposed algorithm performs favorably on challenging objects and scenes. |
Tasks | Image Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10730v2 |
PDF | http://arxiv.org/pdf/1703.10730v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-holistic-image-generation-from |
Repo | https://github.com/hellbell/KeyPatchGan |
Framework | pytorch |
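The three losses can be combined into a generator objective roughly as below. This is a hedged sketch only: the loss weights, the binary-cross-entropy adversarial term, and the exact form of the spatial and appearance terms are assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def generator_objective(fake_img, input_patches, patch_mask,
                        pred_mask, true_mask, d_fake_logits,
                        w_spatial=1.0, w_app=10.0, w_adv=1.0):
    """Toy three-term generator objective: spatial (mask prediction), appearance
    (keep input patches intact in the output), adversarial (fool the discriminator)."""
    spatial = F.binary_cross_entropy_with_logits(pred_mask, true_mask)
    appearance = F.l1_loss(fake_img * patch_mask, input_patches * patch_mask)
    adversarial = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return w_spatial * spatial + w_app * appearance + w_adv * adversarial

loss = generator_objective(
    fake_img=torch.rand(2, 3, 64, 64), input_patches=torch.rand(2, 3, 64, 64),
    patch_mask=torch.randint(0, 2, (2, 1, 64, 64)).float(),
    pred_mask=torch.randn(2, 1, 64, 64),
    true_mask=torch.randint(0, 2, (2, 1, 64, 64)).float(),
    d_fake_logits=torch.randn(2, 1))
```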
BEGAN: Boundary Equilibrium Generative Adversarial Networks
Title | BEGAN: Boundary Equilibrium Generative Adversarial Networks |
Authors | David Berthelot, Thomas Schumm, Luke Metz |
Abstract | We propose a new equilibrium enforcing method paired with a loss derived from the Wasserstein distance for training auto-encoder based Generative Adversarial Networks. This method balances the generator and discriminator during training. Additionally, it provides a new approximate convergence measure, fast and stable training and high visual quality. We also derive a way of controlling the trade-off between image diversity and visual quality. We focus on the image generation task, setting a new milestone in visual quality, even at higher resolutions. This is achieved while using a relatively simple model architecture and a standard training procedure. |
Tasks | Image Generation |
Published | 2017-03-31 |
URL | http://arxiv.org/abs/1703.10717v4 |
PDF | http://arxiv.org/pdf/1703.10717v4.pdf |
PWC | https://paperswithcode.com/paper/began-boundary-equilibrium-generative |
Repo | https://github.com/eliceio/vocal-style-transfer |
Framework | tf |
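The equilibrium mechanism is compact enough to write down directly: the discriminator loss down-weights the fake term by a variable k that is nudged with a proportional controller toward the balance gamma·L(x) = L(G(z)), and the same quantities yield a convergence measure. A sketch of that bookkeeping, where L_real/L_fake stand for the discriminator's autoencoder reconstruction losses and the gamma and lambda_k values are typical choices rather than prescribed ones:

```python
# Sketch of BEGAN's equilibrium bookkeeping (not a full training loop).
gamma, lambda_k = 0.5, 0.001

def began_step(L_real, L_fake, k):
    """Return the two losses, the updated equilibrium variable k (clamped to [0, 1])
    and the global convergence measure M."""
    d_loss = L_real - k * L_fake
    g_loss = L_fake
    balance = gamma * L_real - L_fake
    k_next = min(max(k + lambda_k * balance, 0.0), 1.0)
    m_global = L_real + abs(balance)
    return d_loss, g_loss, k_next, m_global

d_loss, g_loss, k, M = began_step(L_real=0.31, L_fake=0.12, k=0.0)
```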
Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking
Title | Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking |
Authors | Xin Li, Qiao Liu, Nana Fan, Zhenyu He, Hongzhi Wang |
Abstract | Most thermal infrared (TIR) tracking methods are discriminative, treating the tracking problem as a classification task. However, the objective of the classifier (label prediction) is not coupled to the objective of the tracker (location estimation). The classification task focuses on the between-class difference of arbitrary objects, while the tracking task mainly deals with the within-class difference of the same object. In this paper, we cast the TIR tracking problem as a similarity verification task, which is coupled well to the objective of the tracking task. We propose a TIR tracker via a Hierarchical Spatial-aware Siamese Convolutional Neural Network (CNN), named HSSNet. To obtain both spatial and semantic features of the TIR object, we design a Siamese CNN that coalesces multiple hierarchical convolutional layers. Then, we propose a spatial-aware network to enhance the discriminative ability of the coalesced hierarchical feature. Subsequently, we train this network end-to-end on a large visible-spectrum video detection dataset to learn the similarity between paired objects before transferring the network into the TIR domain. Next, this pre-trained Siamese network is used to evaluate the similarity between the target template and target candidates. Finally, we locate the candidate that is most similar to the tracked target. Extensive experimental results on the benchmarks VOT-TIR 2015 and VOT-TIR 2016 show that our proposed method achieves favourable performance compared to the state-of-the-art methods. |
Tasks | Object Tracking, Thermal Infrared Object Tracking |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09539v2 |
PDF | http://arxiv.org/pdf/1711.09539v2.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-spatial-aware-siamese-network |
Repo | https://github.com/QiaoLiuHit/HSSNet |
Framework | none |
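Siamese trackers of this kind score target candidates by cross-correlating template and search-region features from a shared backbone. The sketch below shows only that similarity step; HSSNet's hierarchical feature coalescing and spatial-aware module are omitted, and the feature shapes are made up for illustration.

```python
import torch
import torch.nn.functional as F

def siamese_response(template_feat, search_feat):
    """Similarity map between a target template and a search region, obtained by
    cross-correlating their feature maps (as in Siamese trackers generally)."""
    # template_feat: (C, h, w), search_feat: (C, H, W) with H >= h, W >= w
    return F.conv2d(search_feat.unsqueeze(0), template_feat.unsqueeze(0)).squeeze(0)

resp = siamese_response(torch.randn(256, 6, 6), torch.randn(256, 22, 22))
print(resp.shape)   # (1, 17, 17): the peak indicates the predicted target position
```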
Hidden Talents of the Variational Autoencoder
Title | Hidden Talents of the Variational Autoencoder |
Authors | Bin Dai, Yu Wang, John Aston, Gang Hua, David Wipf |
Abstract | Variational autoencoders (VAE) represent a popular, flexible form of deep generative model that can be stochastically fit to samples from a given random process using an information-theoretic variational bound on the true underlying distribution. Once so-obtained, the model can be putatively used to generate new samples from this distribution, or to provide a low-dimensional latent representation of existing samples. While quite effective in numerous application domains, certain important mechanisms which govern the behavior of the VAE are obfuscated by the intractable integrals and resulting stochastic approximations involved. Moreover, as a highly non-convex model, it remains unclear exactly how minima of the underlying energy relate to original design purposes. We attempt to better quantify these issues by analyzing a series of tractable special cases of increasing complexity. In doing so, we unveil interesting connections with more traditional dimensionality reduction models, as well as an intrinsic yet underappreciated propensity for robustly dismissing sparse outliers when estimating latent manifolds. With respect to the latter, we demonstrate that the VAE can be viewed as the natural evolution of recent robust PCA models, capable of learning nonlinear manifolds of unknown dimension obscured by gross corruptions. |
Tasks | Dimensionality Reduction |
Published | 2017-06-16 |
URL | https://arxiv.org/abs/1706.05148v5 |
PDF | https://arxiv.org/pdf/1706.05148v5.pdf |
PWC | https://paperswithcode.com/paper/hidden-talents-of-the-variational-autoencoder |
Repo | https://github.com/Kismuz/crypto_spread_test |
Framework | tf |
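For reference, the objective whose tractable special cases the paper analyses is the standard VAE negative ELBO with a Gaussian encoder: a reconstruction term plus a KL term. A minimal sketch of that generic objective (not the paper's own code, and the mean-squared reconstruction term assumes a Gaussian decoder):

```python
import torch
import torch.nn.functional as F

def gaussian_vae_loss(x, x_recon, mu, logvar):
    """Standard VAE negative ELBO: reconstruction error plus KL divergence between
    the Gaussian posterior q(z|x) = N(mu, diag(exp(logvar))) and the N(0, I) prior."""
    recon = F.mse_loss(x_recon, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

loss = gaussian_vae_loss(torch.rand(8, 784), torch.rand(8, 784),
                         mu=torch.zeros(8, 20), logvar=torch.zeros(8, 20))
```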