Paper Group AWR 27
Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering
Title | Towards K-means-friendly Spaces: Simultaneous Deep Learning and Clustering |
Authors | Bo Yang, Xiao Fu, Nicholas D. Sidiropoulos, Mingyi Hong |
Abstract | Most learning approaches treat dimensionality reduction (DR) and clustering separately (i.e., sequentially), but recent research has shown that optimizing the two tasks jointly can substantially improve the performance of both. The premise behind the latter genre is that the data samples are obtained via linear transformation of latent representations that are easy to cluster; but in practice, the transformation from the latent space to the data can be more complicated. In this work, we assume that this transformation is an unknown and possibly nonlinear function. To recover the ‘clustering-friendly’ latent representations and to better cluster the data, we propose a joint DR and K-means clustering approach in which DR is accomplished via learning a deep neural network (DNN). The motivation is to keep the advantages of jointly optimizing the two tasks, while exploiting the deep neural network’s ability to approximate any nonlinear function. This way, the proposed approach can work well for a broad class of generative models. Towards this end, we carefully design the DNN structure and the associated joint optimization criterion, and propose an effective and scalable algorithm to handle the formulated optimization problem. Experiments using different real datasets are employed to showcase the effectiveness of the proposed approach. |
Tasks | Dimensionality Reduction |
Published | 2016-10-15 |
URL | http://arxiv.org/abs/1610.04794v2
PDF | http://arxiv.org/pdf/1610.04794v2.pdf
PWC | https://paperswithcode.com/paper/towards-k-means-friendly-spaces-simultaneous |
Repo | https://github.com/boyangumn/DCN |
Framework | none |
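To make the joint objective concrete, here is a minimal sketch of the paper's idea: an autoencoder whose latent space is regularized by a K-means term. The layer sizes, the weight `lam`, the toy data, and the fully joint gradient update are all assumptions for illustration; the paper itself alternates between network updates and discrete centroid/assignment updates.

```python
# Minimal sketch: reconstruction loss + K-means loss in the latent space.
import torch
import torch.nn as nn

class DCN(nn.Module):
    def __init__(self, in_dim=50, latent_dim=10, k=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, in_dim))
        self.centroids = nn.Parameter(torch.randn(k, latent_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

model = DCN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1                                # weight on the clustering term (assumed)
x = torch.randn(256, 50)                 # stand-in for real data

for step in range(100):
    z, x_hat = model(x)
    assign = torch.cdist(z, model.centroids).argmin(dim=1)   # nearest centroid
    recon = ((x_hat - x) ** 2).mean()
    cluster = ((z - model.centroids[assign]) ** 2).mean()    # K-means-friendliness
    loss = recon + lam * cluster
    opt.zero_grad()
    loss.backward()
    opt.step()
```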
Preliminaries of a Space Situational Awareness Ontology
Title | Preliminaries of a Space Situational Awareness Ontology |
Authors | Robert John Rovetto, T. S. Kelso |
Abstract | Space situational awareness (SSA) is vital for international safety and security, and the future of space travel. By improving SSA data-sharing we improve global SSA. Computational ontology may provide one means toward that goal. This paper develops the ontology of the SSA domain and takes steps in the creation of the space situational awareness ontology. Ontology objectives, requirements and desiderata are outlined; and both the SSA domain and the discipline of ontology are described. The purposes of the ontology include: exploring the potential for ontology development and engineering to (i) represent SSA data, general domain knowledge, objects and relationships (ii) annotate and express the meaning of that data, and (iii) foster SSA data-exchange and integration among SSA actors, orbital debris databases, space object catalogs and other SSA data repositories. By improving SSA via data- and knowledge-sharing, we can (iv) expand our scientific knowledge of the space environment, (v) advance our capacity for planetary defense from near-Earth objects, and (vi) ensure the future of safe space flight for generations to come. |
Tasks | |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.01924v2
PDF | http://arxiv.org/pdf/1606.01924v2.pdf
PWC | https://paperswithcode.com/paper/preliminaries-of-a-space-situational |
Repo | https://github.com/rrovetto/space-situational-awareness-domain-ontology |
Framework | none |
Drawing and Recognizing Chinese Characters with Recurrent Neural Network
Title | Drawing and Recognizing Chinese Characters with Recurrent Neural Network |
Authors | Xu-Yao Zhang, Fei Yin, Yan-Ming Zhang, Cheng-Lin Liu, Yoshua Bengio |
Abstract | Recent deep learning based approaches have achieved great success on handwriting recognition. Chinese characters are among the most widely adopted writing systems in the world. Previous research has mainly focused on recognizing handwritten Chinese characters. However, recognition is only one aspect of understanding a language; another challenging and interesting task is to teach a machine to automatically write (pictographic) Chinese characters. In this paper, we propose a framework that uses the recurrent neural network (RNN) as both a discriminative model for recognizing Chinese characters and a generative model for drawing (generating) Chinese characters. To recognize Chinese characters, previous methods usually adopt convolutional neural network (CNN) models, which require transforming the online handwriting trajectory into image-like representations. Instead, our RNN based approach is an end-to-end system which directly deals with the sequential structure and does not require any domain-specific knowledge. With the RNN system (combining an LSTM and GRU), state-of-the-art performance can be achieved on the ICDAR-2013 competition database. Furthermore, under the RNN framework, a conditional generative model with character embedding is proposed for automatically drawing recognizable Chinese characters. The generated characters (in vector format) are human-readable and can also be recognized by the discriminative RNN model with high accuracy. Experimental results verify the effectiveness of using RNNs as both generative and discriminative models for the tasks of drawing and recognizing Chinese characters. |
Tasks | |
Published | 2016-06-21 |
URL | http://arxiv.org/abs/1606.06539v1
PDF | http://arxiv.org/pdf/1606.06539v1.pdf
PWC | https://paperswithcode.com/paper/drawing-and-recognizing-chinese-characters |
Repo | https://github.com/YifeiY/hanzi_recognition |
Framework | none |
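As a concrete illustration of the discriminative half of this framework, the sketch below classifies an online handwriting trajectory directly from its point sequence, with no image rendering. The (dx, dy, pen-state) feature layout, the bidirectional-LSTM sizes, and the 3755-way output are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: a bidirectional LSTM over the raw pen trajectory, pooled and classified.
import torch
import torch.nn as nn

class TrajectoryClassifier(nn.Module):
    def __init__(self, n_classes=3755, in_dim=3, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, strokes):          # strokes: (batch, time, 3)
        out, _ = self.rnn(strokes)
        return self.fc(out.mean(dim=1))  # average over time, then classify

model = TrajectoryClassifier()
batch = torch.randn(4, 60, 3)            # 4 trajectories of 60 pen points each
logits = model(batch)                    # (4, 3755) class scores
```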
Fully Convolutional Networks for Semantic Segmentation
Title | Fully Convolutional Networks for Semantic Segmentation |
Authors | Evan Shelhamer, Jonathan Long, Trevor Darrell |
Abstract | Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves improved segmentation of PASCAL VOC (30% relative improvement to 67.2% mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image. |
Tasks | Real-Time Semantic Segmentation, Scene Segmentation, Semantic Segmentation |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06211v1
PDF | http://arxiv.org/pdf/1605.06211v1.pdf
PWC | https://paperswithcode.com/paper/fully-convolutional-networks-for-semantic |
Repo | https://github.com/DaiLisen/Eye-gaze-Point-Detection-Modified-sec- |
Framework | caffe2 |
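The core move of the paper can be shown in a few lines: replace fully connected scoring with 1×1 convolutions so the network accepts inputs of any size, then fuse a coarse prediction with a finer layer through a skip connection before upsampling. The tiny two-stage backbone below is an illustrative stand-in for the paper's adapted AlexNet/VGG/GoogLeNet.

```python
# Sketch: fully convolutional scoring with one skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, n_classes=21):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))          # 1/2 resolution
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))          # 1/4 resolution
        self.score2 = nn.Conv2d(64, n_classes, 1)  # coarse scores via 1x1 conv
        self.score1 = nn.Conv2d(32, n_classes, 1)  # skip scores from stage1

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        coarse = F.interpolate(self.score2(f2), scale_factor=2,
                               mode='bilinear', align_corners=False)
        fused = coarse + self.score1(f1)            # skip connection
        return F.interpolate(fused, scale_factor=2,
                             mode='bilinear', align_corners=False)

pred = TinyFCN()(torch.randn(1, 3, 64, 96))   # arbitrary input size works
```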
Sequence-to-point learning with neural networks for nonintrusive load monitoring
Title | Sequence-to-point learning with neural networks for nonintrusive load monitoring |
Authors | Chaoyun Zhang, Mingjun Zhong, Zongzuo Wang, Nigel Goddard, Charles Sutton |
Abstract | Energy disaggregation (a.k.a. nonintrusive load monitoring, NILM), a single-channel blind source separation problem, aims to decompose the mains signal, which records the whole house's electricity consumption, into appliance-wise readings. This problem is difficult because it is inherently unidentifiable. Recent approaches have shown that the identifiability problem can be reduced by introducing domain knowledge into the model. Deep neural networks have been shown to be a promising approach for these problems, but sliding windows are necessary to handle the long sequences that arise in signal processing, which raises the issue of how to combine predictions from different sliding windows. In this paper, we propose sequence-to-point learning, where the input is a window of the mains and the output is a single point of the target appliance. We use convolutional neural networks to train the model. Interestingly, we systematically show that the convolutional neural networks can inherently learn the signatures of the target appliances, which are automatically added into the model to reduce the identifiability problem. We applied the proposed neural network approaches to real-world household energy data, and show that the methods achieve state-of-the-art performance, improving two standard error measures by 84% and 92%. |
Tasks | |
Published | 2016-12-29 |
URL | http://arxiv.org/abs/1612.09106v3
PDF | http://arxiv.org/pdf/1612.09106v3.pdf
PWC | https://paperswithcode.com/paper/sequence-to-point-learning-with-neural |
Repo | https://github.com/OdysseasKr/online-nilm |
Framework | tf |
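A minimal sketch of sequence-to-point learning: a 1-D CNN reads a window of the aggregate mains signal and emits the target appliance's reading at the window midpoint, so overlapping-window predictions never need to be merged. The window length and filter sizes are assumptions, smaller than the paper's actual architecture.

```python
# Sketch: window of mains in, single appliance point out.
import torch
import torch.nn as nn

window = 99                                    # odd length, so there is a midpoint
net = nn.Sequential(
    nn.Conv1d(1, 30, kernel_size=10), nn.ReLU(),   # 99 -> 90 samples
    nn.Conv1d(30, 30, kernel_size=8), nn.ReLU(),   # 90 -> 83 samples
    nn.Flatten(),
    nn.Linear(30 * 83, 1024), nn.ReLU(),
    nn.Linear(1024, 1),                        # appliance power at the midpoint
)

mains = torch.randn(16, 1, window)             # 16 windows of mains readings
target = torch.randn(16, 1)                    # midpoint appliance readings
loss = nn.functional.mse_loss(net(mains), target)
```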
Deep Reinforcement Learning for Mention-Ranking Coreference Models
Title | Deep Reinforcement Learning for Mention-Ranking Coreference Models |
Authors | Kevin Clark, Christopher D. Manning |
Abstract | Coreference resolution systems are typically trained with heuristic loss functions that require careful tuning. In this paper we instead apply reinforcement learning to directly optimize a neural mention-ranking model for coreference evaluation metrics. We experiment with two approaches: the REINFORCE policy gradient algorithm and a reward-rescaled max-margin objective. We find the latter to be more effective, resulting in significant improvements over the current state-of-the-art on the English and Chinese portions of the CoNLL 2012 Shared Task. |
Tasks | Coreference Resolution |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08667v3
PDF | http://arxiv.org/pdf/1609.08667v3.pdf
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-mention |
Repo | https://github.com/clarkkev/deep-coref |
Framework | none |
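The reward-rescaled max-margin objective the authors find most effective is easy to state in code: the margin demanded between the best correct antecedent and each alternative is scaled by the cost that mistake incurs under the coreference metric. The scores, costs, and function signature below are my own toy illustration.

```python
# Sketch of a reward-rescaled max-margin loss for mention ranking.
import torch

def reward_rescaled_loss(scores, gold_mask, mistake_cost):
    """scores: (A,) antecedent scores; gold_mask: (A,) bool for correct ones;
    mistake_cost: (A,) metric cost incurred by choosing each antecedent."""
    best_gold = scores[gold_mask].max()
    slack = mistake_cost * (1.0 + scores - best_gold)   # cost-scaled margins
    return slack.clamp(min=0).max()                     # worst violated margin

scores = torch.tensor([0.2, 1.1, 0.7], requires_grad=True)
gold = torch.tensor([False, True, False])
cost = torch.tensor([0.9, 0.0, 0.4])   # e.g. "false link" vs "wrong link" costs
loss = reward_rescaled_loss(scores, gold, cost)
loss.backward()                        # gradients push wrong antecedents down
```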
Horizon Lines in the Wild
Title | Horizon Lines in the Wild |
Authors | Scott Workman, Menghua Zhai, Nathan Jacobs |
Abstract | The horizon line is an important contextual attribute for a wide variety of image understanding tasks. As such, many methods have been proposed to estimate its location from a single image. These methods typically require the image to contain specific cues, such as vanishing points, coplanar circles, and regular textures, thus limiting their real-world applicability. We introduce a large, realistic evaluation dataset, Horizon Lines in the Wild (HLW), containing natural images with labeled horizon lines. Using this dataset, we investigate the application of convolutional neural networks for directly estimating the horizon line, without requiring any explicit geometric constraints or other special cues. An extensive evaluation shows that using our CNNs, either in isolation or in conjunction with a previous geometric approach, we achieve state-of-the-art results on the challenging HLW dataset and two existing benchmark datasets. |
Tasks | Horizon Line Estimation |
Published | 2016-04-07 |
URL | http://arxiv.org/abs/1604.02129v2
PDF | http://arxiv.org/pdf/1604.02129v2.pdf
PWC | https://paperswithcode.com/paper/horizon-lines-in-the-wild |
Repo | https://github.com/scottworkman/deephorizon |
Framework | caffe2 |
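In the spirit of the paper's direct estimation approach, the sketch below regresses a horizon line's slope and offset from raw pixels with a small CNN; the paper builds on a much deeper network and also explores classification over binned line parameters. Everything about this architecture is an assumption.

```python
# Sketch: direct horizon-line regression from pixels.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                  # (slope, offset) of the horizon line
)

imgs = torch.randn(8, 3, 224, 224)
slope_offset = net(imgs)               # (8, 2); train with e.g. a Huber loss
```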
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Title | Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models |
Authors | Ashwin K Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra |
Abstract | Neural sequence models are widely used to model time-series data. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-B candidates - resulting in sequences that differ only slightly from each other. Producing lists of nearly identical sequences is not only computationally wasteful but also typically fails to capture the inherent ambiguity of complex AI tasks. To overcome this problem, we propose Diverse Beam Search (DBS), an alternative to BS that decodes a list of diverse outputs by optimizing for a diversity-augmented objective. We observe that our method finds better top-1 solutions by controlling for the exploration and exploitation of the search space - implying that DBS is a better search algorithm. Moreover, these gains are achieved with minimal computational or memory overhead as compared to beam search. To demonstrate the broad applicability of our method, we present results on image captioning, machine translation and visual question generation using both standard quantitative metrics and qualitative human studies. Further, we study the role of diversity for image-grounded language generation tasks as the complexity of the image changes. We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models. |
Tasks | Image Captioning, Machine Translation, Question Generation, Text Generation, Time Series |
Published | 2016-10-07 |
URL | http://arxiv.org/abs/1610.02424v2
PDF | http://arxiv.org/pdf/1610.02424v2.pdf
PWC | https://paperswithcode.com/paper/diverse-beam-search-decoding-diverse |
Repo | https://github.com/pytorch/fairseq |
Framework | pytorch |
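Diverse Beam Search is simple enough to sketch end to end: the beam is split into groups, each group runs ordinary beam search, and tokens already chosen by earlier groups at the same time step are penalized (a Hamming diversity term). The toy scorer below stands in for a real sequence model; `n_groups`, `width`, and `lam` are assumed hyperparameters.

```python
# Runnable sketch of Diverse Beam Search with a Hamming diversity penalty.
import math
import random

def toy_logprob(prefix, token):
    # stand-in for a neural model: pseudo-random but repeatable within a run
    rng = random.Random(hash((tuple(prefix), token)))
    return math.log(rng.uniform(0.1, 1.0))

def diverse_beam_search(vocab, n_groups=2, width=2, steps=4, lam=0.5):
    groups = [[([], 0.0)] for _ in range(n_groups)]  # per group: (tokens, score)
    for _ in range(steps):
        used = []                   # tokens chosen by earlier groups this step
        for g in range(n_groups):
            cands = []
            for seq, score in groups[g]:
                for tok in vocab:
                    penalty = lam * used.count(tok)   # Hamming diversity term
                    cands.append((seq + [tok],
                                  score + toy_logprob(seq, tok) - penalty))
            groups[g] = sorted(cands, key=lambda c: -c[1])[:width]
            used += [seq[-1] for seq, _ in groups[g]]
    return [beam for group in groups for beam in group]

for seq, score in diverse_beam_search(vocab=list("abcd")):
    print("".join(seq), round(score, 3))
```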
Automatic Differentiation Variational Inference
Title | Automatic Differentiation Variational Inference |
Authors | Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei |
Abstract | Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models; no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use. |
Tasks | Probabilistic Programming |
Published | 2016-03-02 |
URL | http://arxiv.org/abs/1603.00788v1
PDF | http://arxiv.org/pdf/1603.00788v1.pdf
PWC | https://paperswithcode.com/paper/automatic-differentiation-variational |
Repo | https://github.com/yiyuezhuo/bayes-torch |
Framework | pytorch |
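The ADVI recipe can be sketched with any autodiff library: posit a mean-field Gaussian over the latent variable, draw reparameterized samples, and ascend a Monte Carlo ELBO estimate by automatic differentiation. The Gaussian-mean toy model below is an assumption standing in for "any differentiable log-joint"; Stan's implementation additionally maps constrained variables to an unconstrained space.

```python
# Sketch: mean-field Gaussian variational inference via the
# reparameterization trick and stochastic ELBO ascent.
import torch

data = torch.randn(100) * 2.0 + 3.0        # observations with unknown mean

def log_joint(mu):                         # log p(data, mu) with an N(0,1) prior
    return -0.5 * (mu ** 2).sum() - 0.5 * ((data - mu) ** 2).sum()

loc = torch.zeros(1, requires_grad=True)   # variational parameters of q(mu)
raw_scale = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([loc, raw_scale], lr=0.05)

for step in range(500):
    scale = torch.nn.functional.softplus(raw_scale)
    mu = loc + scale * torch.randn(1)      # reparameterized sample from q
    entropy = 0.5 * torch.log(2 * torch.pi * torch.e * scale ** 2).sum()
    loss = -(log_joint(mu) + entropy)      # negative single-sample ELBO
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loc.item())                          # approaches the posterior mean
```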
FastMask: Segment Multi-scale Object Candidates in One Shot
Title | FastMask: Segment Multi-scale Object Candidates in One Shot |
Authors | Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha |
Abstract | Objects appear to scale differently in natural images. This fact requires methods dealing with object-centric tasks (e.g. object proposal) to have robust performance over variances in object scales. In the paper, we present a novel segment proposal framework, namely FastMask, which takes advantage of hierarchical features in deep convolutional neural networks to segment multi-scale objects in one shot. Innovatively, we adapt the segment proposal network into three different functional components (body, neck and head). We further propose a weight-shared residual neck module as well as a scale-tolerant attentional head module for efficient one-shot inference. On the MS COCO benchmark, the proposed FastMask outperforms all state-of-the-art segment proposal methods in average recall while being 2~5 times faster. Moreover, with a slight trade-off in accuracy, FastMask can segment objects in near real time (~13 fps) with 800×600 resolution images, demonstrating its potential in practical applications. Our implementation is available on https://github.com/voidrank/FastMask. |
Tasks | |
Published | 2016-12-28 |
URL | http://arxiv.org/abs/1612.08843v4
PDF | http://arxiv.org/pdf/1612.08843v4.pdf
PWC | https://paperswithcode.com/paper/fastmask-segment-multi-scale-object |
Repo | https://github.com/voidrank/FastMask |
Framework | caffe2 |
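One reusable idea here is the weight-shared residual neck: a single residual downscaling module applied repeatedly to a feature map, yielding a pyramid of scales in one forward pass with no extra parameters per scale. The sketch below captures that pattern; channel counts, depth, and the exact block layout are assumptions.

```python
# Sketch: one downscaling module, reused at every scale of the pyramid.
import torch
import torch.nn as nn

class SharedResidualNeck(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.pool = nn.AvgPool2d(2)        # identity path, downscaled
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, stride=2, padding=1),
                                  nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return self.pool(x) + self.conv(x)  # residual downscaling

neck = SharedResidualNeck()
feat = torch.randn(1, 64, 32, 32)
pyramid = [feat]
for _ in range(3):                          # same weights reused per scale
    pyramid.append(neck(pyramid[-1]))
print([p.shape[-1] for p in pyramid])       # 32, 16, 8, 4
```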
A Threshold-based Scheme for Reinforcement Learning in Neural Networks
Title | A Threshold-based Scheme for Reinforcement Learning in Neural Networks |
Authors | Thomas H. Ward |
Abstract | A generic and scalable Reinforcement Learning scheme for Artificial Neural Networks is presented, providing a general purpose learning machine. By reference to a node threshold, three features are described: 1) a mechanism for Primary Reinforcement, capable of solving linearly inseparable problems; 2) an extension of the learning scheme to include a mechanism for Conditioned Reinforcement, capable of forming long-term strategy; 3) a modification of the learning scheme to use a threshold-based deep learning algorithm, providing a robust and biologically inspired alternative to backpropagation. The model may be used for supervised as well as unsupervised training regimes. |
Tasks | |
Published | 2016-09-12 |
URL | http://arxiv.org/abs/1609.03348v4
PDF | http://arxiv.org/pdf/1609.03348v4.pdf
PWC | https://paperswithcode.com/paper/a-threshold-based-scheme-for-reinforcement |
Repo | https://github.com/thward/neural_agent |
Framework | none |
Saliency Driven Image Manipulation
Title | Saliency Driven Image Manipulation |
Authors | Roey Mechrez, Eli Shechtman, Lihi Zelnik-Manor |
Abstract | Have you ever taken a picture only to find out that an unimportant background object ended up being overly salient? Or one of those team sports photos where your favorite player blends with the rest? Wouldn’t it be nice if you could tweak these pictures just a little bit so that the distractor would be attenuated and your favorite player would stand out among her peers? Manipulating images in order to control the saliency of objects is the goal of this paper. We propose an approach that considers the internal color and saliency properties of the image. It changes the saliency map via an optimization framework that relies on patch-based manipulation using only patches from within the same image to achieve realistic looking results. Applications include object enhancement, distractor attenuation and background decluttering. Comparing our method to previous ones shows significant improvement, both in the achieved saliency manipulation and in the realistic appearance of the resulting images. |
Tasks | |
Published | 2016-12-07 |
URL | http://arxiv.org/abs/1612.02184v3
PDF | http://arxiv.org/pdf/1612.02184v3.pdf
PWC | https://paperswithcode.com/paper/saliency-driven-image-manipulation |
Repo | https://github.com/roimehrez/photorealism |
Framework | none |
We don’t need no bounding-boxes: Training object class detectors using only human verification
Title | We don’t need no bounding-boxes: Training object class detectors using only human verification |
Authors | Dim P. Papadopoulos, Jasper R. R. Uijlings, Frank Keller, Vittorio Ferrari |
Abstract | Training object class detectors typically requires a large set of images in which objects are annotated by bounding-boxes. However, manually drawing bounding-boxes is very time consuming. We propose a new scheme for training object detectors which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme iterates between re-training the detector, re-localizing objects in the training images, and human verification. We use the verification signal both to improve re-training and to reduce the search space for re-localisation, which makes these steps different to what is normally done in a weakly supervised setting. Extensive experiments on PASCAL VOC 2007 show that (1) using human verification to update detectors and reduce the search space leads to the rapid production of high-quality bounding-box annotations; (2) our scheme delivers detectors performing almost as well as those trained in a fully supervised setting, without ever drawing any bounding-box; (3) as the verification task is very quick, our scheme substantially reduces total annotation time, by a factor of 6x-9x. |
Tasks | |
Published | 2016-02-26 |
URL | http://arxiv.org/abs/1602.08405v3
PDF | http://arxiv.org/pdf/1602.08405v3.pdf
PWC | https://paperswithcode.com/paper/we-dont-need-no-bounding-boxes-training |
Repo | https://github.com/EscVM/OIDv4_ToolKit |
Framework | none |
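The verification signal at the heart of this scheme reduces to a yes/no question about box quality, which is commonly proxied by an intersection-over-union test. The toy check below shows how an accept/reject verdict is produced; accepted boxes would feed the next re-training round, while rejections shrink the re-localization search space. The boxes and the 0.5 threshold are illustrative values.

```python
# Sketch: IoU-based verification of automatically produced boxes.
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

detections = [(10, 10, 50, 60), (80, 15, 120, 40)]
ground_truth = (12, 14, 52, 58)

for box in detections:
    verdict = "accept" if iou(box, ground_truth) >= 0.5 else "reject"
    print(box, verdict)   # accepted boxes feed the next re-training round
```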
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
Title | A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering |
Authors | Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, Christopher Pal |
Abstract | While deep convolutional neural networks frequently approach or exceed human-level performance at benchmark tasks involving static images, extending this success to moving images is not straightforward. Having models which can learn to understand video is of interest for many applications, including content recommendation, prediction, summarization, event/object detection and understanding human visual perception, but many domains lack sufficient data to explore and perfect video models. In order to address the need for a simple, quantitative benchmark for developing and understanding video, we present MovieFIB, a fill-in-the-blank question-answering dataset with over 300,000 examples, based on descriptive video annotations for the visually impaired. In addition to presenting statistics and a description of the dataset, we perform a detailed analysis of 5 different models’ predictions, and compare these with human performance. We investigate the relative importance of language, static (2D) visual features, and moving (3D) visual features; the effects of increasing dataset size, of the number of frames sampled, and of vocabulary size. We illustrate that this task is not solvable by a language model alone; that our model combining 2D and 3D visual information indeed provides the best result; and that all models perform significantly worse than human level. We provide human evaluations for responses given by different models and find that accuracy on the MovieFIB evaluation corresponds well with human judgement. We suggest avenues for improving video models, and hope that the proposed dataset can be useful for measuring and encouraging progress in this very interesting field. |
Tasks | Language Modelling, Object Detection, Question Answering |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07810v2
PDF | http://arxiv.org/pdf/1611.07810v2.pdf
PWC | https://paperswithcode.com/paper/a-dataset-and-exploration-of-models-for |
Repo | https://github.com/teganmaharaj/movieFIB |
Framework | none |
Region-based semantic segmentation with end-to-end training
Title | Region-based semantic segmentation with end-to-end training |
Authors | Holger Caesar, Jasper Uijlings, Vittorio Ferrari |
Abstract | We propose a novel method for semantic segmentation, the task of labeling each pixel in an image with a semantic class. Our method combines the advantages of the two main competing paradigms. Methods based on region classification offer proper spatial support for appearance measurements, but typically operate in two separate stages, neither of which targets pixel labeling performance at the end of the pipeline. More recent fully convolutional methods are capable of end-to-end training for the final pixel labeling, but resort to fixed patches as spatial support. We show how to modify modern region-based approaches to enable end-to-end training for semantic segmentation. This is achieved via a differentiable region-to-pixel layer and a differentiable free-form Region-of-Interest pooling layer. Our method improves the state-of-the-art in terms of class-average accuracy with 64.0% on SIFT Flow and 49.9% on PASCAL Context, and is particularly accurate at object boundaries. |
Tasks | Semantic Segmentation |
Published | 2016-07-26 |
URL | http://arxiv.org/abs/1607.07671v1
PDF | http://arxiv.org/pdf/1607.07671v1.pdf
PWC | https://paperswithcode.com/paper/region-based-semantic-segmentation-with-end |
Repo | https://github.com/nightrome/matconvnet-calvin |
Framework | none |
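The differentiable region-to-pixel layer can be sketched directly: each pixel takes, per class, the highest score among the regions containing it, so a pixel-level loss back-propagates into the region classifier. The mask and score values below are toy data, and this dense implementation is a simplification of the authors' actual MatConvNet layer.

```python
# Sketch: region scores broadcast to pixels via a max over covering regions.
import torch

def region_to_pixel(masks, scores):
    """masks: (R, H, W) bool region supports; scores: (R, C) region scores.
    Returns (C, H, W) pixel scores: max over the regions covering each pixel."""
    per_region = scores[:, :, None, None] * masks[:, None, :, :].float()
    neg_inf = torch.finfo(per_region.dtype).min
    per_region = per_region.masked_fill(~masks[:, None, :, :], neg_inf)
    return per_region.max(dim=0).values

masks = torch.zeros(2, 4, 4, dtype=torch.bool)
masks[0, :2] = True                   # region 0 covers the top two rows
masks[1, 1:] = True                   # region 1 covers the bottom three rows
scores = torch.tensor([[1.0, -0.5], [0.3, 2.0]], requires_grad=True)

pixel_scores = region_to_pixel(masks, scores)   # (2, 4, 4)
pixel_scores.sum().backward()                   # gradients reach the regions
```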