May 7, 2019

Paper Group AWR 27


Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering

Title Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering
Authors Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, Mingyi Hong
Abstract Most learning approaches treat dimensionality reduction (DR) and clustering separately (i.e., sequentially), but recent research has shown that optimizing the two tasks jointly can substantially improve the performance of both. The premise behind the latter genre is that the data samples are obtained via linear transformation of latent representations that are easy to cluster; but in practice, the transformation from the latent space to the data can be more complicated. In this work, we assume that this transformation is an unknown and possibly nonlinear function. To recover the ‘clustering-friendly’ latent representations and to better cluster the data, we propose a joint DR and K-means clustering approach in which DR is accomplished via learning a deep neural network (DNN). The motivation is to keep the advantages of jointly optimizing the two tasks, while exploiting the deep neural network’s ability to approximate any nonlinear function. This way, the proposed approach can work well for a broad class of generative models. Towards this end, we carefully design the DNN structure and the associated joint optimization criterion, and propose an effective and scalable algorithm to handle the formulated optimization problem. Experiments using different real datasets are employed to showcase the effectiveness of the proposed approach.
Tasks Dimensionality Reduction
Published 2016-10-15
URL http://arxiv.org/abs/1610.04794v2
PDF http://arxiv.org/pdf/1610.04794v2.pdf
PWC https://paperswithcode.com/paper/towards-k-means-friendly-spaces-simultaneous
Repo https://github.com/boyangumn/DCN
Framework none
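
The joint objective described in the abstract lends itself to a compact illustration. Below is a minimal sketch of the idea, not the authors’ DCN implementation (see the linked repo for that): an autoencoder’s reconstruction loss is combined with a K-means penalty that pulls each latent code toward its nearest centroid. Layer sizes and the trade-off weight `lam` are illustrative.

```python
import torch
import torch.nn as nn

class DCNSketch(nn.Module):
    def __init__(self, in_dim=784, latent_dim=10, n_clusters=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 500), nn.ReLU(),
                                     nn.Linear(500, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 500), nn.ReLU(),
                                     nn.Linear(500, in_dim))
        # Cluster centroids, one row per cluster.
        self.centroids = nn.Parameter(torch.randn(n_clusters, latent_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def dcn_loss(x, z, x_hat, centroids, lam=0.1):
    assign = torch.cdist(z, centroids).argmin(dim=1)           # nearest centroid
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()               # autoencoder term
    kmeans = ((z - centroids[assign]) ** 2).sum(dim=1).mean()  # clustering term
    return recon + lam * kmeans

model = DCNSketch()
x = torch.randn(32, 784)
z, x_hat = model(x)
loss = dcn_loss(x, z, x_hat, model.centroids)                  # backprop as usual
```

The paper alternates between updating the network, the cluster assignments, and the centroids; the sketch simply exposes everything to one gradient optimizer for brevity.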

Preliminaries of a Space Situational Awareness Ontology

Title Preliminaries of a Space Situational Awareness Ontology
Authors Robert John Rovetto, T. S. Kelso
Abstract Space situational awareness (SSA) is vital for international safety and security, and for the future of space travel. By improving SSA data-sharing we improve global SSA. Computational ontology may provide one means toward that goal. This paper develops the ontology of the SSA domain and takes steps in the creation of the space situational awareness ontology. Ontology objectives, requirements and desiderata are outlined, and both the SSA domain and the discipline of ontology are described. The purposes of the ontology include exploring the potential for ontology development and engineering to (i) represent SSA data, general domain knowledge, objects and relationships; (ii) annotate and express the meaning of that data; and (iii) foster SSA data-exchange and integration among SSA actors, orbital debris databases, space object catalogs and other SSA data repositories. By improving SSA via data- and knowledge-sharing, we can (iv) expand our scientific knowledge of the space environment, (v) advance our capacity for planetary defense from near-Earth objects, and (vi) ensure the future of safe space flight for generations to come.
Tasks
Published 2016-06-02
URL http://arxiv.org/abs/1606.01924v2
PDF http://arxiv.org/pdf/1606.01924v2.pdf
PWC https://paperswithcode.com/paper/preliminaries-of-a-space-situational
Repo https://github.com/rrovetto/space-situational-awareness-domain-ontology
Framework none
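
As a toy illustration of objective (i), representing SSA objects and relationships, here is a hypothetical fragment built with rdflib; the namespace and class names are invented for the example and are not taken from the paper’s ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

SSA = Namespace("http://example.org/ssa#")   # placeholder namespace
g = Graph()
g.bind("ssa", SSA)

# A small class hierarchy: orbital debris is a kind of space object.
g.add((SSA.SpaceObject, RDF.type, RDFS.Class))
g.add((SSA.OrbitalDebris, RDFS.subClassOf, SSA.SpaceObject))

# An individual space object with a human-readable label.
g.add((SSA.object_25544, RDF.type, SSA.SpaceObject))
g.add((SSA.object_25544, RDFS.label, Literal("ISS (ZARYA)")))

print(g.serialize(format="turtle"))
```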

Drawing and Recognizing Chinese Characters with Recurrent Neural Network

Title Drawing and Recognizing Chinese Characters with Recurrent Neural Network
Authors Xu-Yao Zhang, Fei Yin, Yan-Ming Zhang, Cheng-Lin Liu, Yoshua Bengio
Abstract Recent deep learning based approaches have achieved great success on handwriting recognition. Chinese characters belong to one of the most widely adopted writing systems in the world. Previous research has mainly focused on recognizing handwritten Chinese characters. However, recognition is only one aspect of understanding a language; another challenging and interesting task is to teach a machine to automatically write (pictographic) Chinese characters. In this paper, we propose a framework that uses the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters. To recognize Chinese characters, previous methods usually adopt convolutional neural network (CNN) models, which require transforming the online handwriting trajectory into image-like representations. Instead, our RNN based approach is an end-to-end system which directly deals with the sequential structure and does not require any domain-specific knowledge. With the RNN system (combining an LSTM and GRU), state-of-the-art performance can be achieved on the ICDAR-2013 competition database. Furthermore, under the RNN framework, a conditional generative model with character embedding is proposed for automatically drawing recognizable Chinese characters. The generated characters (in vector format) are human-readable and can also be recognized by the discriminative RNN model with high accuracy. Experimental results verify the effectiveness of using RNNs as both generative and discriminative models for the tasks of drawing and recognizing Chinese characters.
Tasks
Published 2016-06-21
URL http://arxiv.org/abs/1606.06539v1
PDF http://arxiv.org/pdf/1606.06539v1.pdf
PWC https://paperswithcode.com/paper/drawing-and-recognizing-chinese-characters
Repo https://github.com/YifeiY/hanzi_recognition
Framework none
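
A minimal sketch of the discriminative side of such a system, assuming pen trajectories are encoded as (dx, dy, pen-lift) triples. The paper combines LSTM and GRU layers; a stacked bidirectional GRU stands in here, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    def __init__(self, n_classes=3755, hidden=256):   # 3755-class ICDAR-2013 set
        super().__init__()
        self.rnn = nn.GRU(input_size=3, hidden_size=hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, strokes):              # strokes: [batch, steps, 3]
        out, _ = self.rnn(strokes)           # per-step features
        return self.fc(out.mean(dim=1))      # pool over time, then classify

logits = TrajectoryClassifier()(torch.randn(4, 100, 3))   # [4, 3755]
```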

Fully Convolutional Networks for Semantic Segmentation

Title Fully Convolutional Networks for Semantic Segmentation
Authors Evan Shelhamer, Jonathan Long, Trevor Darrell
Abstract Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.
Tasks Real-Time Semantic Segmentation, Scene Segmentation, Semantic Segmentation
Published 2016-05-20
URL http://arxiv.org/abs/1605.06211v1
PDF http://arxiv.org/pdf/1605.06211v1.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-networks-for-semantic
Repo https://github.com/DaiLisen/Eye-gaze-Point-Detection-Modified-sec-
Framework caffe2
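
The core construction is easy to see in miniature. The sketch below is not the paper’s VGG-based network, just an illustration of the recipe: convolutional layers only, a 1x1-convolution scoring head, bilinear upsampling back to input resolution, and one skip connection fusing a shallow layer with a deep one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, n_classes=21):                  # 21 = PASCAL VOC classes
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))   # 1/2 resolution
        self.block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))   # 1/4 resolution
        self.score2 = nn.Conv2d(128, n_classes, 1)     # deep, coarse scores
        self.score1 = nn.Conv2d(64, n_classes, 1)      # shallow skip scores

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        s = F.interpolate(self.score2(f2), scale_factor=2, mode="bilinear",
                          align_corners=False) + self.score1(f1)   # fuse skip
        return F.interpolate(s, size=x.shape[2:], mode="bilinear",
                             align_corners=False)      # per-pixel class scores

out = TinyFCN()(torch.randn(1, 3, 64, 64))             # [1, 21, 64, 64]
```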

Sequence-to-point learning with neural networks for nonintrusive load monitoring

Title Sequence-to-point learning with neural networks for nonintrusive load monitoring
Authors Chaoyun Zhang, Mingjun Zhong, Zongzuo Wang, Nigel Goddard, Charles Sutton
Abstract Energy disaggregation (a.k.a. nonintrusive load monitoring, NILM), a single-channel blind source separation problem, aims to decompose the mains signal, which records whole-house electricity consumption, into appliance-wise readings. This problem is difficult because it is inherently unidentifiable. Recent approaches have shown that the identifiability problem can be reduced by introducing domain knowledge into the model. Deep neural networks have been shown to be a promising approach for these problems, but sliding windows are necessary to handle the long sequences which arise in signal processing problems, which raises the issue of how to combine predictions from different sliding windows. In this paper, we propose sequence-to-point learning, where the input is a window of the mains and the output is a single point of the target appliance. We use convolutional neural networks to train the model. Interestingly, we systematically show that the convolutional neural networks can inherently learn the signatures of the target appliances, which are automatically added into the model to reduce the identifiability problem. We applied the proposed neural network approaches to real-world household energy data, and show that the methods achieve state-of-the-art performance, improving two standard error measures by 84% and 92%.
Tasks
Published 2016-12-29
URL http://arxiv.org/abs/1612.09106v3
PDF http://arxiv.org/pdf/1612.09106v3.pdf
PWC https://paperswithcode.com/paper/sequence-to-point-learning-with-neural
Repo https://github.com/OdysseasKr/online-nilm
Framework tf
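
A sketch of the sequence-to-point setup, assuming a 1-D CNN that maps a window of mains readings to the target appliance’s consumption at the window midpoint; the layer sizes follow the spirit of the paper’s architecture rather than reproducing it.

```python
import torch
import torch.nn as nn

window = 599                     # odd window length, so a single midpoint exists

seq2point = nn.Sequential(
    nn.Conv1d(1, 30, kernel_size=10), nn.ReLU(),
    nn.Conv1d(30, 30, kernel_size=8), nn.ReLU(),
    nn.Conv1d(30, 40, kernel_size=6), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(40 * (window - 21), 1024), nn.ReLU(),   # 21 = total conv shrinkage
    nn.Linear(1024, 1),          # one point: appliance power at the midpoint
)

mains = torch.randn(16, 1, window)      # a batch of mains windows
midpoint_power = seq2point(mains)       # [16, 1]
```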

Deep Reinforcement Learning for Mention-Ranking Coreference Models

Title Deep Reinforcement Learning for Mention-Ranking Coreference Models
Authors Kevin Clark, Christopher D. Manning
Abstract Coreference resolution systems are typically trained with heuristic loss functions that require careful tuning. In this paper we instead apply reinforcement learning to directly optimize a neural mention-ranking model for coreference evaluation metrics. We experiment with two approaches: the REINFORCE policy gradient algorithm and a reward-rescaled max-margin objective. We find the latter to be more effective, resulting in significant improvements over the current state-of-the-art on the English and Chinese portions of the CoNLL 2012 Shared Task.
Tasks Coreference Resolution
Published 2016-09-27
URL http://arxiv.org/abs/1609.08667v3
PDF http://arxiv.org/pdf/1609.08667v3.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-mention
Repo https://github.com/clarkkev/deep-coref
Framework none
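
The reward-rescaled max-margin objective mentioned in the abstract can be sketched in a few lines: for one mention, the hinge between each candidate antecedent’s score and the best correct antecedent’s score is scaled by a per-candidate mistake cost. Shapes and cost values below are illustrative.

```python
import torch

def slack_rescaled_loss(scores, is_true, costs):
    """scores: [n_candidates] model scores for one mention's antecedents;
    is_true: boolean mask of correct antecedents;
    costs: per-candidate mistake cost (0 for correct antecedents)."""
    best_true = scores[is_true].max()              # highest-scoring correct one
    hinge = (1 + scores - best_true).clamp(min=0)  # margin violation per candidate
    return (costs * hinge).max()                   # worst cost-rescaled violation

scores = torch.tensor([2.0, 0.5, 1.8], requires_grad=True)
loss = slack_rescaled_loss(scores,
                           is_true=torch.tensor([False, True, False]),
                           costs=torch.tensor([1.2, 0.0, 0.5]))
loss.backward()
```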

Horizon Lines in the Wild

Title Horizon Lines in the Wild
Authors Scott Workman, Menghua Zhai, Nathan Jacobs
Abstract The horizon line is an important contextual attribute for a wide variety of image understanding tasks. As such, many methods have been proposed to estimate its location from a single image. These methods typically require the image to contain specific cues, such as vanishing points, coplanar circles, and regular textures, thus limiting their real-world applicability. We introduce a large, realistic evaluation dataset, Horizon Lines in the Wild (HLW), containing natural images with labeled horizon lines. Using this dataset, we investigate the application of convolutional neural networks for directly estimating the horizon line, without requiring any explicit geometric constraints or other special cues. An extensive evaluation shows that using our CNNs, either in isolation or in conjunction with a previous geometric approach, we achieve state-of-the-art results on the challenging HLW dataset and two existing benchmark datasets.
Tasks Horizon Line Estimation
Published 2016-04-07
URL http://arxiv.org/abs/1604.02129v2
PDF http://arxiv.org/pdf/1604.02129v2.pdf
PWC https://paperswithcode.com/paper/horizon-lines-in-the-wild
Repo https://github.com/scottworkman/deephorizon
Framework caffe2
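
A minimal sketch of the direct-estimation idea: a small CNN regresses the horizon line’s slope and offset from the raw image. The backbone and the two-parameter line encoding are illustrative; the paper also studies a classification formulation over binned line parameters.

```python
import torch
import torch.nn as nn

horizon_net = nn.Sequential(
    nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2),                    # (slope, offset) of the horizon line
)

slope_offset = horizon_net(torch.randn(8, 3, 224, 224))   # [8, 2]
```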

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

Title Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Authors Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
Abstract Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-B candidates, resulting in sequences that differ only slightly from each other. Producing lists of nearly identical sequences is not only computationally wasteful but also typically fails to capture the inherent ambiguity of complex AI tasks. To overcome this problem, we propose Diverse Beam Search (DBS), an alternative to BS that decodes a list of diverse outputs by optimizing for a diversity-augmented objective. We observe that our method finds better top-1 solutions by controlling for the exploration and exploitation of the search space, implying that DBS is a better search algorithm. Moreover, these gains are achieved with minimal computational or memory overhead as compared to beam search. To demonstrate the broad applicability of our method, we present results on image captioning, machine translation and visual question generation using both standard quantitative metrics and qualitative human studies. Further, we study the role of diversity for image-grounded language generation tasks as the complexity of the image changes. We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.
Tasks Image Captioning, Machine Translation, Question Generation, Text Generation, Time Series
Published 2016-10-07
URL http://arxiv.org/abs/1610.02424v2
PDF http://arxiv.org/pdf/1610.02424v2.pdf
PWC https://paperswithcode.com/paper/diverse-beam-search-decoding-diverse
Repo https://github.com/pytorch/fairseq
Framework pytorch
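
One decode step of the diversity-augmented objective can be sketched as follows. The beam is partitioned into groups; each group’s token scores are penalized by how often earlier groups already picked each token at this step (a Hamming diversity term). This is a deliberately simplified single-step version with a per-beam argmax in place of full beam expansion, and `step_logprobs` stands in for a real model’s next-token distribution.

```python
import torch

def dbs_step(step_logprobs, n_groups=2, beam_per_group=2, diversity=0.5):
    """step_logprobs: [n_groups * beam_per_group, vocab] for one decode step.
    Returns the chosen token per beam, processed group by group."""
    vocab = step_logprobs.size(1)
    token_counts = torch.zeros(vocab)      # how often each token was picked
    chosen = []
    for g in range(n_groups):
        rows = step_logprobs[g * beam_per_group:(g + 1) * beam_per_group]
        # Penalize tokens already selected by earlier groups at this step.
        penalized = rows - diversity * token_counts
        tokens = penalized.argmax(dim=1)
        chosen.append(tokens)
        for t in tokens:
            token_counts[t] += 1
    return torch.stack(chosen)             # [n_groups, beam_per_group]

print(dbs_step(torch.log_softmax(torch.randn(4, 10), dim=1)))
```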

Automatic Differentiation Variational Inference

Title Automatic Differentiation Variational Inference
Authors Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei
Abstract Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models; no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.
Tasks Probabilistic Programming
Published 2016-03-02
URL http://arxiv.org/abs/1603.00788v1
PDF http://arxiv.org/pdf/1603.00788v1.pdf
PWC https://paperswithcode.com/paper/automatic-differentiation-variational
Repo https://github.com/yiyuezhuo/bayes-torch
Framework pytorch
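
The recipe ADVI automates can be shown on a toy target: posit a Gaussian variational approximation with a free mean and (log) standard deviation, draw a reparameterized sample, and ascend a one-sample ELBO estimate. The log-joint below is a stand-in, not a model from the paper.

```python
import math
import torch

def log_joint(z):                       # toy unnormalized log p(z, data) ~ N(3, 1)
    return -0.5 * ((z - 3.0) ** 2).sum()

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for _ in range(500):
    eps = torch.randn(1)
    z = mu + log_sigma.exp() * eps      # reparameterization trick
    # One-sample ELBO: E_q[log p] plus the Gaussian entropy (closed form).
    elbo = log_joint(z) + log_sigma + 0.5 * math.log(2 * math.pi * math.e)
    (-elbo).backward()
    opt.step()
    opt.zero_grad()

print(mu.item(), log_sigma.exp().item())   # should approach 3.0 and 1.0
```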

FastMask: Segment Multi-scale Object Candidates in One Shot

Title FastMask: Segment Multi-scale Object Candidates in One Shot
Authors Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
Abstract Objects appear at widely different scales in natural images. This fact requires methods dealing with object-centric tasks (e.g. object proposal) to have robust performance over variance in object scale. In this paper, we present a novel segment proposal framework, namely FastMask, which takes advantage of hierarchical features in deep convolutional neural networks to segment multi-scale objects in one shot. Innovatively, we divide the segment proposal network into three different functional components (body, neck and head). We further propose a weight-shared residual neck module as well as a scale-tolerant attentional head module for efficient one-shot inference. On the MS COCO benchmark, the proposed FastMask outperforms all state-of-the-art segment proposal methods in average recall while being 2~5 times faster. Moreover, with a slight trade-off in accuracy, FastMask can segment objects in near real time (~13 fps) with 800×600 resolution images, demonstrating its potential in practical applications. Our implementation is available on https://github.com/voidrank/FastMask.
Tasks
Published 2016-12-28
URL http://arxiv.org/abs/1612.08843v4
PDF http://arxiv.org/pdf/1612.08843v4.pdf
PWC https://paperswithcode.com/paper/fastmask-segment-multi-scale-object
Repo https://github.com/voidrank/FastMask
Framework caffe2
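
Of the three components, the weight-shared residual neck is the easiest to illustrate: one residual module is applied repeatedly to build a feature pyramid, so every scale shares parameters. The sketch below shows only that module, with illustrative sizes; the body and head are omitted.

```python
import torch
import torch.nn as nn

class SharedResidualNeck(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.pool = nn.AvgPool2d(2)
        self.res = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                 nn.ReLU(),
                                 nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, feat, n_scales=3):
        pyramid = [feat]
        for _ in range(n_scales - 1):
            pooled = self.pool(pyramid[-1])
            pyramid.append(pooled + self.res(pooled))   # residual refinement
        return pyramid          # the same weights are used at every scale

scales = SharedResidualNeck()(torch.randn(1, 64, 64, 64))
print([s.shape for s in scales])    # 64x64, 32x32, 16x16 feature maps
```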

A Threshold-based Scheme for Reinforcement Learning in Neural Networks

Title A Threshold-based Scheme for Reinforcement Learning in Neural Networks
Authors Thomas H. Ward
Abstract A generic and scalable Reinforcement Learning scheme for Artificial Neural Networks is presented, providing a general-purpose learning machine. By reference to a node threshold, three features are described: 1) a mechanism for Primary Reinforcement, capable of solving linearly inseparable problems; 2) an extension of the learning scheme that adds a mechanism for Conditioned Reinforcement, capable of forming long-term strategy; 3) a modification of the learning scheme to use a threshold-based deep learning algorithm, providing a robust and biologically inspired alternative to backpropagation. The model may be used for supervised as well as unsupervised training regimes.
Tasks
Published 2016-09-12
URL http://arxiv.org/abs/1609.03348v4
PDF http://arxiv.org/pdf/1609.03348v4.pdf
PWC https://paperswithcode.com/paper/a-threshold-based-scheme-for-reinforcement
Repo https://github.com/thward/neural_agent
Framework none

Saliency Driven Image Manipulation

Title Saliency Driven Image Manipulation
Authors Roey Mechrez, Eli Shechtman, Lihi Zelnik-Manor
Abstract Have you ever taken a picture only to find out that an unimportant background object ended up being overly salient? Or one of those team sports photos where your favorite player blends with the rest? Wouldn’t it be nice if you could tweak these pictures just a little bit so that the distractor would be attenuated and your favorite player would stand out among her peers? Manipulating images in order to control the saliency of objects is the goal of this paper. We propose an approach that considers the internal color and saliency properties of the image. It changes the saliency map via an optimization framework that relies on patch-based manipulation, using only patches from within the same image to achieve realistic-looking results. Applications include object enhancement, distractor attenuation and background decluttering. Comparing our method to previous ones shows significant improvement, both in the achieved saliency manipulation and in the realistic appearance of the resulting images.
Tasks
Published 2016-12-07
URL http://arxiv.org/abs/1612.02184v3
PDF http://arxiv.org/pdf/1612.02184v3.pdf
PWC https://paperswithcode.com/paper/saliency-driven-image-manipulation
Repo https://github.com/roimehrez/photorealism
Framework none

We don’t need no bounding-boxes: Training object class detectors using only human verification

Title We don’t need no bounding-boxes: Training object class detectors using only human verification
Authors Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari
Abstract Training object class detectors typically requires a large set of images in which objects are annotated with bounding-boxes. However, manually drawing bounding-boxes is very time consuming. We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation, which makes these steps different from what is normally done in a weakly supervised setting. Extensive experiments on PASCAL VOC 2007 show that (1) using human verification to update detectors and reduce the search space leads to the rapid production of high-quality bounding-box annotations; (2) our scheme delivers detectors performing almost as well as those trained in a fully supervised setting, without ever drawing any bounding-box; (3) as the verification task is very quick, our scheme substantially reduces total annotation time, by a factor of 6x-9x.
Tasks
Published 2016-02-26
URL http://arxiv.org/abs/1602.08405v3
PDF http://arxiv.org/pdf/1602.08405v3.pdf
PWC https://paperswithcode.com/paper/we-dont-need-no-bounding-boxes-training
Repo https://github.com/EscVM/OIDv4_ToolKit
Framework none
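
The iteration the abstract describes can be written down as a schematic loop. Every callable below (train_detector, relocalize, ask_human) is a hypothetical stand-in for the paper’s corresponding component, so this is structure only, not a runnable pipeline.

```python
def train_with_verification(images, train_detector, relocalize, ask_human,
                            n_rounds=5):
    verified = {}                        # image -> human-accepted box
    weak = {}                            # image -> latest machine box
    detector = None
    for _ in range(n_rounds):
        detector = train_detector(images, verified, weak)
        for img in images:
            if img in verified:
                continue                 # already confirmed by an annotator
            box = relocalize(detector, img, verified)
            if ask_human(img, box):      # quick yes/no check, no drawing
                verified[img] = box
            else:
                weak[img] = box          # rejected: keep only as a weak label
    return detector, verified
```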

A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering

Title A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
Authors Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, Christopher Pal
Abstract While deep convolutional neural networks frequently approach or exceed human-level performance at benchmark tasks involving static images, extending this success to moving images is not straightforward. Models which can learn to understand video are of interest for many applications, including content recommendation, prediction, summarization, event/object detection and understanding human visual perception, but many domains lack sufficient data to explore and perfect video models. In order to address the need for a simple, quantitative benchmark for developing and understanding video, we present MovieFIB, a fill-in-the-blank question-answering dataset with over 300,000 examples, based on descriptive video annotations for the visually impaired. In addition to presenting statistics and a description of the dataset, we perform a detailed analysis of five different models’ predictions and compare these with human performance. We investigate the relative importance of language, static (2D) visual features, and moving (3D) visual features, as well as the effects of increasing dataset size, the number of frames sampled, and vocabulary size. We illustrate that this task is not solvable by a language model alone; that our model combining 2D and 3D visual information indeed provides the best result; and that all models perform significantly below human level. We provide human evaluations for responses given by different models and find that accuracy on the MovieFIB evaluation corresponds well with human judgement. We suggest avenues for improving video models, and hope that the proposed dataset can be useful for measuring and encouraging progress in this very interesting field.
Tasks Language Modelling, Object Detection, Question Answering
Published 2016-11-23
URL http://arxiv.org/abs/1611.07810v2
PDF http://arxiv.org/pdf/1611.07810v2.pdf
PWC https://paperswithcode.com/paper/a-dataset-and-exploration-of-models-for
Repo https://github.com/teganmaharaj/movieFIB
Framework none

Region-based semantic segmentation with end-to-end training

Title Region-based semantic segmentation with end-to-end training
Authors Holger Caesar, Jasper Uijlings, Vittorio Ferrari
Abstract We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, neither of which targets pixel labeling performance at the end of the pipeline. More recent fully convolutional methods are capable of end-to-end training for the final pixel labeling, but resort to fixed patches as spatial support. We show how to modify modern region-based approaches to enable end-to-end training for semantic segmentation. This is achieved via a differentiable region-to-pixel layer and a differentiable free-form Region-of-Interest pooling layer. Our method improves the state of the art in class-average accuracy, with 64.0% on SIFT Flow and 49.9% on PASCAL Context, and is particularly accurate at object boundaries.
Tasks Semantic Segmentation
Published 2016-07-26
URL http://arxiv.org/abs/1607.07671v1
PDF http://arxiv.org/pdf/1607.07671v1.pdf
PWC https://paperswithcode.com/paper/region-based-semantic-segmentation-with-end
Repo https://github.com/nightrome/matconvnet-calvin
Framework none
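
A region-to-pixel layer in the spirit of the one described above can be sketched directly: each pixel takes, per class, the highest score among the regions containing it, so gradients flow back to the winning region scores. Shapes are illustrative, and pixels covered by no region get -inf here (in the paper, regions cover the image).

```python
import torch

def region_to_pixel(region_scores, region_masks):
    """region_scores: [R, C] class scores per region;
    region_masks: [R, H, W] binary membership of each pixel in each region."""
    # Broadcast region scores over the spatial grid, mask out non-members.
    member = region_masks[:, None, :, :]               # [R, 1, H, W]
    per_region = region_scores[:, :, None, None] * member
    per_region = per_region.masked_fill(member == 0, float("-inf"))
    return per_region.max(dim=0).values                # [C, H, W] pixel scores

scores = torch.randn(3, 5, requires_grad=True)         # 3 regions, 5 classes
masks = (torch.rand(3, 8, 8) > 0.5).float()
pixel_logits = region_to_pixel(scores, masks)          # differentiable in scores
```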